I have a large dataset that contains information about pieces of equipment. I need to be able to aggregate up individual equipment characteristics to a group level.
Variables are labeled in the following way: <Group ID>_<Subgroup ID>_<element>_<equipment #>. For example, "PP_cam_old_2" captures the old format of the second piece of equipment listed under "PP" and "cam."
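This naming convention can be split into its parts with Stata's regexm() and regexs() functions. The sketch below makes two assumptions. First, the group ID and the element contain no underscores. Second, the subgroup ID may contain them, as in PP_scg_switch_old_1. The local names grp, sub, elem and num are illustrative only.

```stata
* Split one variable name into <Group>_<Subgroup>_<element>_<equipment #>.
* Assumption: group and element contain no underscores; the greedy (.+)
* lets the subgroup keep its own underscores (e.g. "scg_switch").
local name PP_scg_switch_old_1
if regexm("`name'", "^([^_]+)_(.+)_([^_]+)_([0-9]+)$") {
    local grp  = regexs(1)    // group ID
    local sub  = regexs(2)    // subgroup ID
    local elem = regexs(3)    // element, e.g. old or new
    local num  = regexs(4)    // equipment number
}
display "`grp' | `sub' | `elem' | `num'"
```

With PP_cam_old_2 as the name, the same pattern yields PP, cam, old and 2.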
I would like to be able to compare information about individual pieces of equipment and to aggregate up within groups and subgroups. This requires me to match variables on their Group and Subgroup IDs, as well as on the equipment # (to make sure I’m comparing information about the same piece of equipment). For example, to compare the old format of the second piece of equipment under "PP" and "cam" with its new format, I need to compare PP_cam_old_2 with PP_cam_new_2. This would tell me whether the equipment is being upgraded, and to what.
I would prefer not to do this manually for every piece of equipment in the dataset. Some combination of a foreach loop with an embedded regular expression seems like the way to go. I am new to regular expressions, but my experience using them to relabel this dataset has been encouraging.
In the example above, my initial thought was to start by generating a new variable that records the difference between <Group ID>_<Subgroup ID>_old_<equipment #> and <Group ID>_<Subgroup ID>_new_<equipment #>. I started playing around with the code below, but it is clearly not correct:
foreach v of varlist PP_*_old* {
    local var: var `v'
    if regexm(`"`var'"', "^(PP_)(.+)_old_([0-9]+$"){
        gen (1)_(2)_1 = .
        if regexs(1)_(2)_old_regexs(3) = "hardware" & regexs(1)_(2)_new_regexs(3) = "hardware"
        replace (1)_(2)_1 = 1
    }
}
I get the following error:
PP_scg_switch_old_1 not allowed
r(101);
Interestingly, this is not the first variable that matches the pattern in -if regexm(`"`var'"', "^(PP_)(.+)_old_([0-9]+$")-.
Any thoughts?
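For reference, here is one possible working version of the loop above. It is a sketch under stated assumptions, not a definitive solution. The old/new variables are assumed to be strings. The flag is 1 when both formats are "hardware", as in the attempt. The flag name <group>_<subgroup>_upg_<#> is a hypothetical choice, and it must stay within Stata's 32-character limit on variable names.

It differs from the attempt in four ways:
- Inside -foreach v of varlist-, `v' already holds the variable name, so the -local var: var `v'- line (which appears to be what triggers the "not allowed" error) is dropped.
- The missing closing parenthesis after [0-9]+ is restored.
- The captured pieces are stored in locals with regexs() before variable names are built from them.
- The comparison happens inside -generate- with ==, because a programming -if- looks only at the first observation.

The toy data at the top are made up for illustration. Drop them when running on the real dataset.

```stata
* Toy data for illustration only (values are made up).
clear
input str10 PP_cam_old_2 str10 PP_cam_new_2 str10 PP_scg_switch_old_1 str10 PP_scg_switch_new_1
"hardware" "hardware" "hardware" "software"
"software" "hardware" "hardware" "hardware"
end

foreach v of varlist PP_*_old_* {
    * The loop macro v already holds the variable name.
    * Note the closing parenthesis after [0-9]+ in the pattern.
    if regexm("`v'", "^(PP)_(.+)_old_([0-9]+)$") {
        local grp = regexs(1)
        local sub = regexs(2)
        local num = regexs(3)
        local new `grp'_`sub'_new_`num'
        * Skip old variables that have no matching new variable
        capture confirm variable `new', exact
        if _rc continue
        * Hypothetical flag: 1 if both old and new formats are hardware.
        * == compares values observation by observation.
        generate byte `grp'_`sub'_upg_`num' = (`v' == "hardware") & (`new' == "hardware")
    }
}
list
```

To record what each piece is upgraded to, rather than a 0/1 flag, the -generate- line could instead build a string from the old and new values.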