Dear all,
I have a dataset which contains actual observations and a varying number of control observations per group.
The goal is to assign 3 control observations (marked as control*==1) to each actual observation (marked as actual==1) in order to have a fixed ratio between actuals and controls. Therefore, I would like to randomly select 3 observations from all potential controls per group.
The group variable (group_id) is the combination of "worker_id & firm_id". So, for each actual observation per group, the potential pool of controls consists of observations which share the same "worker_id & firm_id" and are marked as control1==1 or control2==1. For example, for the actual observation in line 1 below, the possible control observations according to control1 are those in lines 2-7 because they have the same worker_id & firm_id and are marked as control1==1. According to control2, the eligible controls are lines 6 and 7.
The variables control1 and control2 simply apply a different set of criteria under which an observation can qualify as a control to the actual. The goal is to conduct 2 separate random selections of 3 controls per group, first according to control1 and second according to control2. As a result, ideally 2 new variables would be generated which tag the randomly selected control observations per group according to control1 and control2.
Do you have any suggestions on how to approach this? I would really appreciate any help with the code.
There are in particular two aspects that I struggle to incorporate:
First, note that there can be more than one "actual" per group (max. 4) (compare, for example, lines 1 and 8). In this case, the potential pool of controls is the same; however, the random set of 3 control observations should be drawn independently for each "actual".
Second, for some groups there may be fewer than 3 potential controls available. In this case, I would like to flag this "actual" in order to exclude it from the analysis later.
Please find below a data excerpt. Many thanks again for your help.
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input byte worker_id int(firm_id group_id coworker_id) byte(actual control1 control2)
1 999 1999 100 1 0 0
1 999 1999 589 0 1 0
1 999 1999 877 0 1 0
1 999 1999 234 0 1 0
1 999 1999 205 0 1 0
1 999 1999 743 0 1 1
1 999 1999 284 0 1 1
1 999 1999 104 1 0 0
2 876 2876 874 1 0 0
2 876 2876 432 1 0 0
2 876 2876 434 0 1 1
2 876 2876 546 0 1 1
2 876 2876 342 0 1 1
2 876 2876 689 0 1 0
2 876 2876 65 0 1 1
2 876 2876 439 0 1 0
2 876 2876 234 0 1 0
2 876 2876 543 0 1 0
end
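To make the target concrete, here is a minimal sketch of one way the selection could be approached. It is not a definitive solution. It assumes the -dataex- excerpt above is already in memory, and every new variable name (n_actual, actual_no, n_pool1, too_few1, sel1_1, ...) as well as the seed value is made up for illustration. The idea is to draw a fresh uniform random number for each eligible control, sort within group on that number, and tag the first 3; repeating this once per possible actual (1 to 4) gives independent draws from the same pool.

```stata
* Sketch only: assumes the -dataex- excerpt is in memory.
* All new variable names and the seed are illustrative assumptions.
set seed 12345                      // arbitrary; makes the draws reproducible

* count the actuals per group and number them 1, 2, ... within group
bysort group_id: egen byte n_actual = total(actual)
bysort group_id (coworker_id): gen byte actual_no = sum(actual) if actual == 1

forvalues c = 1/2 {
    * size of the control pool under criterion c
    bysort group_id: egen int n_pool`c' = total(control`c')

    * flag actuals whose group offers fewer than 3 eligible controls
    gen byte too_few`c' = (n_pool`c' < 3) if actual == 1

    * one independent draw per actual: sel`c'_k belongs to actual_no == k
    forvalues k = 1/4 {
        * random sort key; missing for non-controls, so they sort last
        gen double u = runiform() if control`c' == 1
        bysort group_id (u): gen byte sel`c'_`k' = (_n <= 3) if control`c' == 1 & n_pool`c' >= 3 & n_actual >= `k'
        drop u
    }
}
```

In this layout, sel1_2 == 1 would mark the 3 controls drawn under control1 for the second actual of a group (actual_no == 2). The sel variables are missing for every observation in a group that has fewer than k actuals or fewer than 3 eligible controls, and too_few1 / too_few2 mark the actuals to exclude later. In the excerpt, group 1999 has only 2 controls under control2, so both of its actuals would get too_few2 == 1.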