Hello,
I have the following problem: I want to drop observations depending on how often each combination of two variables occurs. The variables are "Mother's place of birth" ("mplbir") and "Child's birth state" ("birthstate"). As a first step, I looked at the frequencies for one state, for example Alabama (Geocode 01), by using
table mplbir if birthstate==01
----------------------
   Mplbir |      Freq.
----------+-----------
       01 |     41,341
       02 |         58
       03 |        132
       04 |        131
       05 |          6
This means that in a specific year, 41,341 children born in Alabama had a mother who was also born in Alabama, 58 had a mother born in Alaska (Geocode 02), 132 had a mother born in Arizona (Geocode 03), and so on.
Now I would like to drop all observations that belong to a birthstate/mplbir combination with a frequency of more than 500, i.e. I only want to keep working with the rare "birthpairs". I guess writing a loop would be the easiest option, but I can't figure out how to tell Stata to drop the observations whose combination occurs more than 500 times.
Thank you very much in advance for helping me!
Best,
Max
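For what it's worth, a loop may not be needed here. Below is a minimal sketch of one possible approach. It assumes that each observation in the dataset is a single birth, so the frequency of a pair is simply the number of observations sharing the same birthstate and mplbir. The variable name pair_n is made up for illustration and is not part of the original data.

```stata
* Count how many observations share each (birthstate, mplbir) combination.
* pair_n is a hypothetical helper variable introduced for this sketch.
bysort birthstate mplbir: generate long pair_n = _N

* Drop every observation whose combination occurs more than 500 times.
drop if pair_n > 500

* Optional check for Alabama: only frequencies of 500 or less should remain.
* (This assumes birthstate is numeric; for a string variable the condition
* would be birthstate == "01" instead.)
table mplbir if birthstate == 1
```

With the frequencies shown above, the Alabama/Alabama pair (41,341) would be dropped, while the pairs with 58, 132, 131 and 6 births would be kept. If the data are aggregated or carry frequency weights rather than one row per birth, the count would have to be built from those weights instead of _N.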