Channel: Statalist

Replacing string using regexm/regexs

Hi,

My data consists of a list of viral mutations, separated by a comma. Here is some dummy data:

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str33 nrti
"K65R,Y115F,M184V"           
"D67N,K70R,M184MV,K219E"      
"D67N,K70E,M184V"            
"D67DN,K70R,M184V,T215I,K219E"
"D67DN,K70E,M184V,K219KR"    
"K70Q,M184V"                 
"M184V"                      
end
Each mutation (separated by commas, with no spaces) should consist of one capital letter, 1-3 digits, and one final capital letter (e.g., K65R). However, sometimes there are two letters at the end (e.g., K65KR). I want to remove the first of the two trailing letters (e.g., K65KR -> K65R).

I am trying to achieve this using the regexm/regexs string functions. I can identify the problem by using regexr to replace each malformed mutation with placeholder text (repeating the command to catch cells with more than one problem mutation, since regexr replaces only the first match).

Code:
gen dup = nrti
replace dup = regexr(dup, "[A-Z][0-9]+[A-Z][A-Z]","issue")
But this isn't exactly what I want to do. I am trying various iterations using regexs but can't quite seem to get there. Does anyone have any advice on how I could achieve this?

I would really appreciate any help on this.

Bryony
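
One possible approach, offered as a sketch rather than a definitive answer: if Stata 14 or later is available, the Unicode regex function ustrregexra() replaces every match in a string and lets the replacement refer to capture groups as $1, $2, and so on. The variable name fixed is hypothetical, and the pattern assumes the only malformed case is exactly two capital letters after the digits.

```stata
* Sketch only: assumes Stata 14+ (ustrregexra is not available in earlier versions)
clear
input str33 nrti
"K65R,Y115F,M184V"
"D67N,K70R,M184MV,K219E"
"D67DN,K70E,M184V,K219KR"
"M184V"
end

* Group 1 captures the leading letter and digits (e.g. K219).
* The next [A-Z] is the unwanted first trailing letter; it is matched but not captured.
* Group 2 captures the final letter, which is kept.
* "fixed" is a hypothetical name for the cleaned variable.
gen fixed = ustrregexra(nrti, "([A-Z][0-9]+)[A-Z]([A-Z])", "$1$2")

list nrti fixed, clean
```

Because ustrregexra() replaces all matches in one pass, cells with several malformed mutations should not need the command to be repeated. Well-formed mutations such as K65R are left alone, since the pattern requires two consecutive capital letters after the digits.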
