Using Stata version 13.1.
I am relatively new to using Stata and have been approaching my problem with a particular strategy. I need either a way to make that strategy work, or a whole new approach to the problem, and would appreciate any suggestions.
I have a data set of around 3 million cases, spanning 19 years, with 61 variables at the moment. These are records of criminal convictions. Many of the individuals appear multiple times in the data set, ranging from once to 40 times.
I want to create an indicator variable that flags, for any given conviction, whether that individual was also convicted at some point in the previous 5 years.
It is very simple to generate an indicator for a previous conviction at any time. The variables here are:
yearconvict: the year in which the conviction took place
findate: the date of the conviction (important when there are 2 convictions in one year)
spistr: the unique person identification variable, which is a string
Here's an example of my data, with some different situations:
The code is:
However, the 5-year rolling window is a problem for me. I've tried a few different approaches, but my current strategy is to count the number of convictions within the 5-year window. If that count is greater than the number of the loop iteration we're up to, then prior history gets a "1". The code is:
Next I created a series of variables giving the number of convictions within the 5-year window for each year. That is, "earliest2012" would count how many convictions an individual had between 2008 and 2012. "priorcon" is my prior convictions variable, and ends up full of zeroes here.
Last, I ran a loop to replace priorcon with a 1 if the number of convictions in the previous 5 years is greater than the number of the loop iteration we're up to. I suspect there is a more efficient way to do this, but as I said, I'm relatively new, so I just wrote another loop for each year: replacing "earliest2013" with "earliest2012" and yearconvict==2013 with yearconvict==2012, for example. If you're playing along at home with the example dataset, run this for 2013, 2012 and 2011.
This worked except where there were 2 convictions in one year, and the earliest conviction had no priors. Priorcon will contain a 1 in that case, because the earliest* variable counts both convictions from that year, so 2>1 in both convictions. See where spistr is 0000001362 - two convictions in 2012, but nothing prior. The first case should have priorcon=1 and the second priorcon=0, but both have priorcon=1. Look at spistr=0000000913 for where the code works the way I want it to.
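For comparison, here is a minimal sketch of one possible alternative that avoids the yearly counts altogether. It is not tested on the full data set and rests on some assumptions: findate is a Stata daily date, "5 years" is approximated as 1826 days, and prior5 is a hypothetical variable name introduced only for illustration. The idea is that once the data are sorted by date within person, the row immediately above is the most recent earlier conviction, so only that one gap needs checking.

```stata
* Sketch only: prior5 is a hypothetical name, not a variable from my data.
* Assumes findate is a daily date (%td) and spistr identifies a person.
sort spistr findate

* Within person, compare each conviction with the one just before it.
* For the first row of each person findate[_n-1] is missing, and the
* (_n > 1) condition keeps that row at 0.
* 1826 days is an approximation of 5 years (an assumption).
by spistr: gen byte prior5 = (_n > 1) & (findate - findate[_n-1] <= 1826)

* Cases without a person ID cannot be linked, so set them to missing.
replace prior5 = . if spistr == ""
```

On the example data this should give 0 for the earlier 2012 conviction of 0000001362 and 1 for the later one. Note that two convictions on the same date have a gap of 0 days, so the second of them would be flagged as having a prior; whether that is desirable depends on how same-day convictions should be treated.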
For those of you who have got this far in this post, thank you, and thanks in advance for any suggestions.
Shannon
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str10 spistr float(yearconvict findate)
"0000000913" 2013 19400
"0000000913" 2012 19313
"0000000913" 2012 19253
"0000000913" 2011 18949
"0000001359" 2012 19304
"0000001362" 2012 19341
"0000001362" 2012 19229
end
format %td findate
The code is:
Code:
*First create an indicator for where there is no unique person identifier for a case
sort yearconvict
gen idmissind=(spistr=="")
tab idmissind, missing
label variable idmissind "1 where single person ID missing"
*Then create priorsimp, a simple prior conviction measure
sort spistr findate
gen priorsimp=0
replace priorsimp=. if spistr==""
by spistr: replace priorsimp=1 if findate[_n]>findate[1]&idmissind==0
Code:
*Sort so that all cases with single person id are first, then sort into groups, then so that most recent
*case is first within the group
gsort idmissind spistr -findate
*generate a numbered order for each case within its group
*(by-variables must match the sort order above, so spistr is used here)
by idmissind spistr: gen numcons=_n
replace numcons=. if (idmissind==1)
Code:
*Start priorcon as missing; it is set to 0 year by year in the loop
gen priorcon=.
levelsof yearconvict, local(ycon)
foreach l of local ycon {
    replace priorcon=0 if yearconvict==`l'
    egen earliest`l'=count(numcons) if yearconvict>=`l'-5 & yearconvict<=`l', by(spistr)
}
Code:
summarize numcons
quietly forvalues i = 1/`r(max)' {
    tempvar temp
    gen `temp'=`i'
    replace priorcon=1 if earliest2013>`temp' & yearconvict==2013
    drop `temp'
}
*Then once done running all of the loops
drop earliest*