Channel: Statalist

Cleaning a variable using the generate command with the 'real' function

Hi all

I have a string variable 'patientheight' in my dataset that records height in cm for more than 25,000 patients. While most of these measurements are recorded correctly, a small proportion have been entered in the wrong format. For example, a listing of the first fifteen recorded heights is as follows:

Code:
list patientheight in 1/15
168
171
171
156
None
135
1.2
175cm
132 cm
125
136
148.7cm
N/A
148.7
None


From the above, I would like to set the values coded as 'None' and 'N/A' to missing, as they cannot be used. I generally use the following code to take care of this:

Code:
tab patientheight if missing(real(patientheight))

gen heightnew = real(patientheight)
tab heightnew

However, the above will also set to missing those heights entered as '175cm', '132 cm', etc. There are almost 3,000 heights entered with a 'cm' suffix, which I would like to retain. Is there a way to tell Stata to keep the heights entered with the 'cm' suffix and convert them to numeric?

Thanks

/Amal
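
One possible approach, offered only as a minimal sketch: strip the 'cm' suffix and any surrounding blanks from the string before passing it to real(). The helper variable 'heightclean' and the small example dataset below are assumptions for illustration, not part of the original data.

```stata
* Sketch only: a few example values copied from the listing above
clear
input str10 patientheight
"168"
"None"
"1.2"
"175cm"
"132 cm"
"148.7cm"
"N/A"
end

* heightclean is a hypothetical helper variable.
* Lower-case the string, remove every "cm", then trim leading/trailing blanks.
gen heightclean = trim(subinstr(lower(patientheight), "cm", "", .))

* real() returns missing for anything that is still not a number
gen heightnew = real(heightclean)

* Check which original values still fail to convert
tab patientheight if missing(heightnew)
```

With these example values, only 'None' and 'N/A' should remain missing in 'heightnew'. Values such as 1.2 convert as they are, so they may still need a separate check.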
