Quantcast
Channel: Statalist
Viewing all articles
Browse latest Browse all 72795

New on SSC: range join - a program to form pairwise combinations using a range

$
0
0
Thanks for Kit Baum, a new program called rangejoin is now available from SSC. Stata 11 is required.

rangejoin forms pairwise combinations between observations in memory and observations in a using dataset when the value of a key variable in the using dataset is within the range specified by observations in the data in memory.

To install, type in Stata's command window:

Code:
ssc install rangejoin
Once installed, type

Code:
help rangejoin
to get more information.

rangejoin leverages the power of the recently announced rangestat (from SSC, with Roberto Ferrer and Nick Cox) to implement what amounts to a joinby over a range of values. The observations in memory define the low and high bounds of the interval to use and the values of a key variable in the using dataset determines which observations are to be paired.

Without rangejoin, you would have to follow these steps to achieve the same results:
  1. rename variables in memory or in the using dataset to avoid name conflicts, and
  2. use cross or joinby to form all pairwise combinations, and
  3. use keep if inrange(keyvar, low, high) to reduce the data to pairwise combinations that are within the desired range.
With even moderately small datasets, the overhead required to form all pairwise combinations first and then pare down to observations in range is hugely inefficient. For larger datasets, this brute force approach may not be feasible at all.

rangejoin can be used with a stored copy of the data in memory to form the same pairwise combinations that rangestat makes to calculates its statistics. This could be useful if you need to impose an additional condition that can't be specified with rangestat.

You can also do things that you could not do with rangestat, that is you can form pairwise combinations over completely different datasets. Suppose that you have data on medical events:

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float(patient event_id event_type) str10 event_date float eventday
1 1 345 "01/02/2015" 20090
1 2 543 "01/02/2015" 20090
1 3 676 "01/20/2015" 20108
2 1 567 "01/10/2015" 20098
2 2 999 "01/20/2015" 20108
end
format %td eventday
save "events.dta", replace
and you have in memory data on prescription fills:

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float(patient Rx_id) str10 fill_date float(supply drug_id fillday)
1 1 "01/01/2015" 30 88 20089
1 2 "01/01/2015" 90 99 20089
1 3 "01/15/2015" 15 77 20103
1 4 "01/17/2015" 21 33 20105
2 1 "01/01/2015" 15 22 20089
2 2 "01/07/2015" 30 44 20095
end
format %td fillday
You can match each prescription with medical events that occurred during the fill period:
Code:
* the day of the last pill
gen lastday = fillday + supply - 1
format %td lastday

rangejoin eventday fillday lastday using "events.dta", by(patient)
list patient Rx_id drug_id fillday lastday event_date event_type , sepby(patient Rx_id)
and the results:

Code:
. list patient Rx_id drug_id fillday lastday event_date event_type , sepby(patient Rx_id)

     +---------------------------------------------------------------------------+
     | patient   Rx_id   drug_id     fillday     lastday   event_date   event~pe |
     |---------------------------------------------------------------------------|
  1. |       1       1        88   01jan2015   30jan2015   01/02/2015        345 |
  2. |       1       1        88   01jan2015   30jan2015   01/02/2015        543 |
  3. |       1       1        88   01jan2015   30jan2015   01/20/2015        676 |
     |---------------------------------------------------------------------------|
  4. |       1       2        99   01jan2015   31mar2015   01/02/2015        345 |
  5. |       1       2        99   01jan2015   31mar2015   01/02/2015        543 |
  6. |       1       2        99   01jan2015   31mar2015   01/20/2015        676 |
     |---------------------------------------------------------------------------|
  7. |       1       3        77   15jan2015   29jan2015   01/20/2015        676 |
     |---------------------------------------------------------------------------|
  8. |       1       4        33   17jan2015   06feb2015   01/20/2015        676 |
     |---------------------------------------------------------------------------|
  9. |       2       1        22   01jan2015   15jan2015   01/10/2015        567 |
     |---------------------------------------------------------------------------|
 10. |       2       2        44   07jan2015   05feb2015   01/10/2015        567 |
 11. |       2       2        44   07jan2015   05feb2015   01/20/2015        999 |
     +---------------------------------------------------------------------------+
You can reorder the observations and show, for each medical event, which prescriptions were being taken at the time of the event:
Code:
. sort patient event_id Rx_id

. list patient event_id eventday event_type Rx_id drug_id fillday lastday, sepby(patient event_id)

     +-------------------------------------------------------------------------------------+
     | patient   event_id    eventday   event~pe   Rx_id   drug_id     fillday     lastday |
     |-------------------------------------------------------------------------------------|
  1. |       1          1   02jan2015        345       1        88   01jan2015   30jan2015 |
  2. |       1          1   02jan2015        345       2        99   01jan2015   31mar2015 |
     |-------------------------------------------------------------------------------------|
  3. |       1          2   02jan2015        543       1        88   01jan2015   30jan2015 |
  4. |       1          2   02jan2015        543       2        99   01jan2015   31mar2015 |
     |-------------------------------------------------------------------------------------|
  5. |       1          3   20jan2015        676       1        88   01jan2015   30jan2015 |
  6. |       1          3   20jan2015        676       2        99   01jan2015   31mar2015 |
  7. |       1          3   20jan2015        676       3        77   15jan2015   29jan2015 |
  8. |       1          3   20jan2015        676       4        33   17jan2015   06feb2015 |
     |-------------------------------------------------------------------------------------|
  9. |       2          1   10jan2015        567       1        22   01jan2015   15jan2015 |
 10. |       2          1   10jan2015        567       2        44   07jan2015   05feb2015 |
     |-------------------------------------------------------------------------------------|
 11. |       2          2   20jan2015        999       2        44   07jan2015   05feb2015 |
     +-------------------------------------------------------------------------------------+

Viewing all articles
Browse latest Browse all 72795

Trending Articles



<script src="https://jsc.adskeeper.com/r/s/rssing.com.1596347.js" async> </script>