Channel: Statalist

by id: replace var1[1] = max(var1), error "weights not allowed"

Dear community
I have the following type of panel data.
The actual dataset is huge, so I have reduced the example as much as possible to minimise the workload.

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float(id var1 target1 target2)
1 . 4 4
1 4 . 4
1 4 . 4
1 . . .
2 . 8 8
2 . . 8
2 8 . 8
2 . . .
end
Using the var1 column, I want to fill in the target1 or target2 column in the simplest way possible.
It doesn't matter whether it is target1 or target2.

The way I thought about it is the following code:
Code:
by id: replace var1[1] = max(var1)
This returns the error "weights not allowed".
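A sketch of one way to get this, assuming the goal is to spread each id's maximum of var1 to every row of a new column:

Code:
* subscripts are not allowed on the left-hand side of -replace-;
* Stata parses the [1] as a weight specification, hence the error.
* egen's max() computes the by-group maximum, ignoring missings:
bysort id: egen target = max(var1)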

thanks!

ASDOC: Dropping base level of factor variables

Dear all users of Statalist

I cannot find an option in the asdoc command to drop the base level of factor variables (i.variable), such as a binary dummy variable. I do not want the coefficient on 0bn.variable to appear in the nested table.

Or is the only way to use the drop() option, which specifies the list of coefficients to be dropped from the produced table?
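If drop() is indeed the route, a minimal sketch might look like this (variable names hypothetical):

Code:
* hypothetical example: suppress the base level 0b.female from the
* nested table via asdoc's drop() option
asdoc regress wage i.female tenure, nested drop(0b.female) replace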

Ryo

Time variable issue when trying a difference in difference approach

Good day

I want to perform a difference-in-differences regression defined as

Code:
mi estimate, esampvaryok: reg job_hours treatment survey survey#treatment, robust
treatment is a self-created binary variable that seems to behave correctly.

However, for some reason my time variable is causing issues, as can be seen in the following output:


Code:
. mi estimate, esampvaryok: reg job_hours treatment survey survey#treatment, robust

Multiple-imputation estimates                     Imputations     =          5
Linear regression                                 Number of obs   =         47
                                                  Average RVI     =     0.1641
                                                  Largest FMI     =     0.3495
                                                  Complete DF     =         43
DF adjustment:   Small sample                     DF:     min     =      16.96
                                                          avg     =      30.94
                                                          max     =      39.37
Model F test:       Equal FMI                     F(   3,   32.3) =       1.60
Within VCE type:       Robust                     Prob > F        =     0.2078

----------------------------------------------------------------------------------
       job_hours |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-----------------+----------------------------------------------------------------
       treatment |  -7.204348   10.75615    -0.67   0.512    -29.90156    15.49287
          survey |  -6.755652   3.470882    -1.95   0.059     -13.7762    .2648965
                 |
survey#treatment |
            1 1  |  -2.855652   17.76597    -0.16   0.873    -39.22317    33.51187
            2 0  |          0  (omitted)
            2 1  |          0  (omitted)
                 |
           _cons |   50.41565   5.211848     9.67   0.000     39.87688    60.95442
----------------------------------------------------------------------------------
Survey is coded as 1 or 2 for the two respective waves. However, for some reason Stata will not perform the interaction, and I do not understand why. There is variation in all the variables used. I have used survey elsewhere in my code, so why won't it use the second wave in the regression, and instead omit it? Is it an operator error?

If I use "##" instead of "#" (and shorten my equation accordingly) to run the following equation

Code:
mi estimate, esampvaryok: reg job_hours survey##treatment, robust
I get the following output

Code:
 . mi estimate, esampvaryok: reg job_hours survey##treatment, robust

Multiple-imputation estimates                     Imputations     =          5
Linear regression                                 Number of obs   =         47
                                                  Average RVI     =     0.1641
                                                  Largest FMI     =     0.1763
                                                  Complete DF     =         43
DF adjustment:   Small sample                     DF:     min     =      28.42
                                                          avg     =      34.41
                                                          max     =      39.55
Model F test:       Equal FMI                     F(   3,   32.3) =       1.60
Within VCE type:       Robust                     Prob > F        =     0.2078

----------------------------------------------------------------------------------
       job_hours |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-----------------+----------------------------------------------------------------
        2.survey |  -6.755652   3.470882    -1.95   0.059     -13.7762    .2648965
     1.treatment |     -10.06   14.64265    -0.69   0.497    -39.93717    19.81717
                 |
survey#treatment |
            2 1  |   2.855652   17.76597     0.16   0.873    -33.51187    39.22317
                 |
           _cons |      43.66   2.246678    19.43   0.000     39.11769    48.20231
----------------------------------------------------------------------------------


Here I get the interaction term, so is this now correct? It is basically the same as the first output, except that the interaction term is now positive instead of negative; almost all other values are identical or very similar. So is this a properly specified diff-in-diff interaction?
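For reference, a conventional factor-variable DiD specification marks both variables as factors explicitly; a sketch:

Code:
* a sketch: one plausible reading of the first output is that survey
* entered once as continuous and once as a factor, making some
* interaction cells collinear with the main effects (hence "omitted").
* i. makes the factor treatment explicit throughout:
mi estimate, esampvaryok: reg job_hours i.survey##i.treatment, vce(robust)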


Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input byte job_hours float treatment byte survey
 . . 2
 . . 2
 . . 2
13 . 2
 . . 2
 . . 2
46 0 1
42 0 2
46 0 1
42 0 2
46 0 1
42 0 2
46 0 1
42 0 2
46 0 1
42 0 2
46 0 1
42 0 2
 . 1 1
 . 1 2
51 1 1
35 1 2
37 1 1
35 1 2
55 1 1
58 1 2
65 1 1
36 1 2
48 1 1
38 1 2
38 0 1
40 0 2
38 0 1
40 0 2
38 0 1
40 0 2
38 0 1
40 0 2
38 0 1
40 0 2
38 0 1
40 0 2
 . 0 1
 . 0 2
 . 0 1
 . 0 2
 . 0 1
 . 0 2
 . 0 1
 . 0 2
 . 0 1
 . 0 2
 . 0 1
 . 0 2
41 0 1
40 0 2
41 0 1
40 0 2
41 0 1
40 0 2
41 0 1
40 0 2
41 0 1
40 0 2
41 0 1
40 0 2
 0 . 1
 0 . 1
 0 . 1
 0 . 1
 0 . 1
 0 . 1
31 . 1
31 . 1
31 . 1
31 . 1
31 . 1
31 . 1
52 . 1
52 . 1
52 . 1
52 . 1
52 . 1
52 . 1
38 . 1
38 . 1
38 . 1
38 . 1
38 . 1
38 . 1
38 0 1
45 0 2
38 0 1
45 0 2
38 0 1
45 0 2
38 0 1
45 0 2
38 0 1
45 0 2
end

Help with xtreg, splines and trends?

Hi all, since my last post I have read the FAQs, so this should be a better post.

Now, I am doing a research paper on the effects of economic crisis on mental health problems in Russia.

As such, I have age specific (5 year groups) mortality data (by cause) on Russia from 1980 to 2000, and the causes are divided into 3: suicide rate, chronic alcoholism and "other psychoses" (ICD 9/10)

The initial variables were then: Age Group, Year, Suicide Rate, Chronic Alcoholism and Other Psychoses: for the rest of this post I focus on the trends of Suicide Rate

After uploading the data to Stata, I used
Code:
egen panel =group(AgeGroup)
to create a panel based on the Age Group and then used
Code:
xtset panel Year
to define my data as panel data.

After this, I created 2 dummy variables, afterfall and aftercrisis. afterfall is a dummy=1 after 1991 (after the fall of the Soviet Union) and aftercrisis is a dummy=1 after 1997 (after the start of the Ruble crisis)

I then ran
Code:
xtreg SuicideRate Year afterfall i.panel,re
and
Code:
xtreg SuicideRate Year aftercrisis i.panel,re
to check the magnitude of the effects of the trend breaks.

A question here: what exactly is the meaning of the coefficients attached to Year and afterfall/aftercrisis in these regressions? Also, since I used
Code:
i.panel,re
it shows coefficients for each panel: what do these coefficients mean?

Anyway, after this, I wanted to check what the trend was before and after the two breaks, so I created splines using
Code:
mkspline prereform 11 reform 17 crisis = time
so the splines break the data into 1980 to 1991, then 1992 to 1997, and then 1998 to 2000; three new spline variables are formed: prereform, reform and crisis.

However I am quite unsure of what regression to use now to check the difference in trends before and after the crisis: would I use
Code:
xtreg SuicideRate prereform reform crisis aftercrisis
or individually do them two splines at a time to get the trend before/after the fall and before/after the crisis like this
Code:
xtreg SuicideRate prereform reform afterfall
and
Code:
xtreg SuicideRate reform crisis aftercrisis
?

Also, how do I check for the trends in each age group before and after the fall and the crisis?
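For the last question, one sketch (not necessarily the right model for your design) interacts the splines with the age-group indicator so that each group gets its own slope on each segment:

Code:
* c. marks the splines as continuous; ## adds group-specific slopes
xtreg SuicideRate c.prereform##i.panel c.reform##i.panel ///
    c.crisis##i.panel, re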


My data looks like this:

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str6 AgeGroup int(Year SuicideRate ChronicAlcoholism OtherPsychoses) float(panel afterfall aftercrisis time) byte(prereform reform crisis)
"15to19" 1980  225   1  1 1 0 0  0  0 0 0
"15to19" 1981  222   0  2 1 0 0  1  1 0 0
"15to19" 1982  225   1  2 1 0 0  2  2 0 0
"15to19" 1983  221   0  1 1 0 0  3  3 0 0
"15to19" 1984  228   0  1 1 0 0  4  4 0 0
"15to19" 1985  202   0  1 1 0 0  5  5 0 0
"15to19" 1986  163   0  2 1 0 0  6  6 0 0
"15to19" 1987  160   0  1 1 0 0  7  7 0 0
"15to19" 1988  183   0  1 1 0 0  8  8 0 0
"15to19" 1989  185   0  1 1 0 0  9  9 0 0
"15to19" 1990  235   0  2 1 0 0 10 10 0 0
"15to19" 1991  242   0  1 1 0 0 11 11 0 0
"15to19" 1992  254   0  2 1 1 0 12 11 1 0
"15to19" 1993  323   1  2 1 1 0 13 11 2 0
"15to19" 1994  354   0  1 1 1 0 14 11 3 0
"15to19" 1995  366   1  1 1 1 0 15 11 4 0
"15to19" 1996  351   1  1 1 1 0 16 11 5 0
"15to19" 1997  347   0  1 1 1 0 17 11 6 0
"15to19" 1998  335   1  2 1 1 1 18 11 6 1
"15to19" 1999  339  12  1 1 1 1 19 11 6 2
"15to19" 2000  363  16  0 1 1 1 20 11 6 3
"20to24" 1980  540   2  3 2 0 0  0  0 0 0
"20to24" 1981  494   4  3 2 0 0  1  1 0 0
"20to24" 1982  526   3  2 2 0 0  2  2 0 0
"20to24" 1983  478   2  2 2 0 0  3  3 0 0
"20to24" 1984  513   4  3 2 0 0  4  4 0 0
"20to24" 1985  436   3  3 2 0 0  5  5 0 0
"20to24" 1986  311   2  1 2 0 0  6  6 0 0
"20to24" 1987  288   1  1 2 0 0  7  7 0 0
"20to24" 1988  298   0  2 2 0 0  8  8 0 0
"20to24" 1989  332   0  1 2 0 0  9  9 0 0
"20to24" 1990  341   3  1 2 0 0 10 10 0 0
"20to24" 1991  354   1  1 2 0 0 11 11 0 0
"20to24" 1992  429   1  2 2 1 0 12 11 1 0
"20to24" 1993  534   4  3 2 1 0 13 11 2 0
"20to24" 1994  649   6  4 2 1 0 14 11 3 0
"20to24" 1995  725   5  5 2 1 0 15 11 4 0
"20to24" 1996  734   5  5 2 1 0 16 11 5 0
"20to24" 1997  724   6  2 2 1 0 17 11 6 0
"20to24" 1998  709   4  1 2 1 1 18 11 6 1
"20to24" 1999  757  27  5 2 1 1 19 11 6 2
"20to24" 2000  796  35  3 2 1 1 20 11 6 3
"25to29" 1980  750  20  4 3 0 0  0  0 0 0
"25to29" 1981  742  24  2 3 0 0  1  1 0 0
"25to29" 1982  770  21  3 3 0 0  2  2 0 0
"25to29" 1983  721  19  4 3 0 0  3  3 0 0
"25to29" 1984  781  17  3 3 0 0  4  4 0 0
"25to29" 1985  639  14  3 3 0 0  5  5 0 0
"25to29" 1986  431   6  2 3 0 0  6  6 0 0
"25to29" 1987  432   5  3 3 0 0  7  7 0 0
"25to29" 1988  448   3  3 3 0 0  8  8 0 0
"25to29" 1989  494   4  1 3 0 0  9  9 0 0
"25to29" 1990  498   7  2 3 0 0 10 10 0 0
"25to29" 1991  513   6  3 3 0 0 11 11 0 0
"25to29" 1992  600   8  5 3 1 0 12 11 1 0
"25to29" 1993  747  21  5 3 1 0 13 11 2 0
"25to29" 1994  863  31  3 3 1 0 14 11 3 0
"25to29" 1995  847  31  6 3 1 0 15 11 4 0
"25to29" 1996  828  19  7 3 1 0 16 11 5 0
"25to29" 1997  767  13  5 3 1 0 17 11 6 0
"25to29" 1998  722  11  4 3 1 1 18 11 6 1
"25to29" 1999  800  42  5 3 1 1 19 11 6 2
"25to29" 2000  867  56  3 3 1 1 20 11 6 3
end
Thank you so much!

Unzipping file

Hello. I am having trouble unzipping a file with Stata. Please see picture attached.
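For reference, Stata's built-in unzipfile extracts an archive into the current working directory; a minimal sketch (folder and file names hypothetical):

Code:
cd "C:/projects/data"            // folder where the contents should land
unzipfile "myfile.zip", replace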

How to use value labels in graph legend rather than variable names

I have a bar graph of 11 variables (B1_1 - B1_11); each variable is binary, with a value label attached to the value 1.

I would like to know if it is possible to get a bar graph where the bars are labelled in the legend with the value labels on each variable, rather than the variable labels.

I have seen the following for using variable labels:

Code:
graph bar (count) B1_*, legend(order(1 "`: var label B1_1'"  2  "`: var label B1_2'"... ))......
But I have not seen an equivalent for the value labels.
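There is an extended macro function for value labels, parallel to the variable-label one above; a sketch, assuming the labels are attached to the value 1 of each variable:

Code:
* `: label (varname) #' returns the value label attached to value # of varname
local lbl1 : label (B1_1) 1
local lbl2 : label (B1_2) 1
graph bar (count) B1_1 B1_2, legend(order(1 "`lbl1'" 2 "`lbl2'"))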

Any help would be greatly appreciated

Dropping dummies from output table

Hi guys,

I hope you can help me. I am working with the following regressions:

Code:
regress ylist xlist year_*
estimates store Time_FE

regress ylist xlist importer_* exporter_*
estimates store Country_FE

regress ylist xlist importer_* exporter_* year_*
estimates store Time_Country_FE
In order to show my results in one comprehensive table at the end, I use the command:

Code:
estimates table Time_FE Country_FE Time_Country_FE, star stats(N)

The problem is I am analysing data for a period of 59 years, as well as bilateral trade for 22 countries, which results in an endlessly long output table (see attachment).

How is it possible to drop all these time and country dummies so that I do not see them in my table?
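estimates table has keep() and drop() options; a sketch, with xlist standing in for your substantive regressors:

Code:
* keep() restricts the displayed rows to the listed coefficients
estimates table Time_FE Country_FE Time_Country_FE, keep(xlist) star stats(N)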

Thank you,

Daniel.

Random allocation of observations

Dear Community.

I have about five million observations (men, in their 40s).

I'd like to classify these 5 million people into 10 groups randomly, according to the distribution below.
In other words, each observation should be allocated to one of groups 1 to 10, but the overall proportions of the groups should be matched.

group proportion
1 0.07
2 0.19
3 0.16
4 0.12
5 0.21
6 0.01
7 0.05
8 0.04
9 0.1
10 0.05
(total 1.00)


In addition, I have about 20 million observations overall, and the above task should be performed within gender*age groups.
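A sketch of one approach, using the proportions above: draw a uniform random number per observation and cut it at the cumulative proportions. This matches the proportions in expectation; for exact counts, one could instead sort on the random draw within each gender*age cell and split at the cumulative-count boundaries.

Code:
set seed 12345
gen double u = runiform()
* cumulative proportions: .07 .26 .42 .54 .75 .76 .81 .85 .95 1
gen byte group = 1 + (u > .07) + (u > .26) + (u > .42) + (u > .54) ///
    + (u > .75) + (u > .76) + (u > .81) + (u > .85) + (u > .95)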


Thanks in advance.


Best regards,
Yunsun

Piecewise regression with panel data

Dear Statalist

I am trying to do a piecewise regression with panel data, but I can't find much about this topic. Does anyone have suggestions on what Stata command to use? Also, what are the issues with doing a piecewise regression with panel data?

This is my equation:
Code:
nl (KOSTBHG = SIZE*{b1} + (SIZE>{c})*(SIZE-{c})*{b2}) if Year==2015, variables(SIZE) initial(b1 0 c 28 b2 0) noconstant

Changing time to minutes

I have a str14 variable where the data are described as 6D 21H 52M 0S (D = days, H = hours, M = minutes, S = seconds).

I need the value in minutes; how can I convert these values to total minutes or hours?
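A sketch, assuming the variable is named duration and always follows the pattern "#D #H #M #S": strip the unit letters, split on spaces, and combine.

Code:
gen str20 s = duration
foreach ltr in D H M S {
    replace s = subinstr(s, "`ltr'", "", .)
}
gen double total_minutes = real(word(s, 1))*24*60 ///
    + real(word(s, 2))*60 + real(word(s, 3)) + real(word(s, 4))/60
drop s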

Best Regards
Massar

Matrix, standard deviation of each row

I have a matrix that is 12 rows by 500 columns. I would like to take the average and standard deviation of each row. I have devised a method to take the mean, but I have limited understanding of how to take the sd. For example,

Code:
clear all
clear mata

global num_rows = 3

matrix A = (1,2,9\2,7,5\2,4,18)
matrix B = J($num_rows, 1, 1)
matrix sum = A*B
matrix M = sum/$num_rows
mat all = A,M
How can I add the standard deviation of each row?
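Continuing the example, one sketch computes each row's standard deviation in Mata (there is no built-in row-sd function, but it follows directly from the row means):

Code:
mata:
A  = st_matrix("A")
m  = mean(A')'                               // column vector of row means
sd = sqrt(rowsum((A :- m):^2) :/ (cols(A) - 1))
st_matrix("SD", sd)
end
matrix list SD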

Thank you very much for your help.

Confusion about Confirmatory Factor Analysis with validscale

Dear all,

I am new to Stata and am trying to test the validity of my measures. I run Stata 16 for Windows. My data contain 425 observations of 16 variables, each measured with 2-5 items on a 6-point Likert scale. I tested for reliability with alpha and have values above 0.75. I tried to conduct an Exploratory Factor Analysis using the factor command, and even specified the number of factors to be extracted from the items by adding factor(x) to the command; however, I did not get the constructs I expected. Almost every item loaded on factors other than the ones expected. As I used validated items from the literature, I decided to conduct a Confirmatory Factor Analysis instead. I used the following command from the 2017 Stata conference (https://www.stata.com/meeting/france...7_Perrot.pdf):

Code:
validscale PE1 PE2 PE3 EE1 EE2 EE3 EE4 SI1 SI2 SI3 FC1 FC2 FC4 PFC1 PFC2 PFC3 IT1 IT2 IT3 DT1 DT2 DT3 DT4 DT5 PFR1 PFR2 PFR3 PVR1 PVR2 PVR3 TR1 TR2 TR3 OR1 OR2 OR3 OR4 BI1 BI2 BI3 MF1 MF2 MF3 MF4 IC1 IC2 IC3 IC4 PD1 PD2 PD3 PD4 UA1 UA2, partition(3 4 3 3 3 3 5 3 3 3 4 3 4 4 4 2) scorename(PE EE SI FC PFC IT DT PFR PVR TR OR BI MF IC PD UA) graphs compscore(stand) cfa cfamethod(ml) cfasb cfacov(PE1*EE1) alpha(0.7) delta(0.9) h(0.3) hjmin(0.3) tconc(0.4)


And got the results visible in the picture attached.

The Goodness of fit results seem to be acceptable:
Goodness of fit (with Satorra-Bentler correction):

chi2: 2203.11
df: 1256
chi2/df: 1.8
RMSEA [90% CI]: 0.042 [ ; ]
SRMR: 0.058
NFI: 0.858
RNI:0.933
CFI: 0.933
IFI: 0.933
MCI: 0.327
(p-value = 0.000)

My problem is the factor loadings, as some have values over 1. How should I decide whether they are good or bad? Usually they should be between 0.70 and 1. I did not find additional information in the conference presentation.

The conference paper is available here: https://www.researchgate.net/publica...urement_scales.

I would appreciate any help!

Kind regards,
Ana

Mathematical function ceil

Hi guys,

I generated the variable "revenue" in steps of 50 million. After that I want to generate the corresponding histogram, but it only shows positive revenue. Since I have negative and positive revenue, how can I change the code?

So far I got:
Code:
gen revenue = 50 * ceil(WC01001/50e6)
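Note that ceil() rounds toward positive infinity, so negative values are pulled up toward zero; if the intent is a consistent left-closed bin edge on both sides, a floor()-based sketch (new variable name hypothetical) might look like:

Code:
* floor() rounds toward negative infinity, giving left-closed 50m bins
* on both sides of zero
gen revenue2 = 50 * floor(WC01001/50e6)
histogram revenue2, width(50)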

Stata 16 - remove "Stata" default folder

Stata 16 creates a "Stata" folder in the Documents folder whenever it starts. I want to change the settings so that the folder is not created. I am running Stata/MP 16.0 for Mac (64-bit Intel)

tsline graph replacing time xlabel with contents of a string variable?

I am using tsline to produce a graph of an environmental measurement (noise) varying with time - but time also represents varying locations as we moved about.

Can I use tsline but replace the X-axis labels with text (from a string variable) showing the locations at which the data were obtained?

Alternatively, can I annotate a tsline graph in another way to show the locations by time range? Using text boxes, perhaps?
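A sketch of the first idea (assuming a numeric time variable t, the noise measurement, and a string variable location): labmask, from the labutil package on SSC, copies the strings into value labels on t, which xlabel can then display.

Code:
* ssc install labutil   // provides -labmask- (Nick Cox)
labmask t, values(location)
tsline noise, xlabel(#10, valuelabel angle(45))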


Mixed frequency data

I am currently testing the response of 3 UK stock indices to macroeconomic shocks. I have policy shocks measured on the day of the MPC announcements; industrial production, unemployment, and inflation shocks (measured at the end of the month); and daily interest rate and oil shocks. The changes in FTSE prices are computed daily.
For data that are not measured at daily intervals, I have 0s in place of the missing observations. Will this impact the results produced by an OLS regression? The rvfplots show scattered residuals, but there are many observations at the value 0.
Is there a better method for handling the variables that occur at a lower frequency? If I replace the 0s with missing values, my regression will not run. Is a regression using mixed-frequency data of this type possible?

Really appreciate any help
Best,
Connor

Combine graphs through graph editor

Dear community!
I have created two graphs and edited both of them in the Graph Editor; now I would like to combine them.
Could you tell me how to do this?
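A sketch, assuming each edited graph has been saved as a .gph file (file names hypothetical):

Code:
* save each graph after editing (File > Save, or -graph save- with the
* graph in memory), then combine the saved files
graph combine graph1.gph graph2.gph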

How to identify a variable's top/bottom 30% of each year in panel data?

Dear Stata users,


I am trying to run a cross-sectional regression on firms in the bottom 30% and top 30% of the distribution of book-to-market value in panel data.
I tried to rank firms every year, but I can't identify the top/bottom 30% of them, because this is an unbalanced panel and each year's total number of firms is different.
I would be grateful if someone could help me identify these firms each year.

Here's the code I use:
Code:
sort gvkey year
local i=1964 // the time period is 1964-2014
while `i'<=2014{
quietly egen per70`i'=pctile(btm), p(70) //btm is the book-to-market value, and I have to find out the firms with top/bottom 30% of the distribution of btm
quietly drop if btm<70`i' 
quietly drop per70`i' 
local i=`i'+1
}
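A sketch of an alternative that avoids the loop and does not drop any data: compute the percentile cutoffs within each year, then flag the two tails.

Code:
* egen's pctile() computed by year handles the unbalanced panel, since
* each year's cutoff uses only that year's firms
bysort year: egen p30 = pctile(btm), p(30)
bysort year: egen p70 = pctile(btm), p(70)
gen byte bottom30 = btm <= p30 if !missing(btm)
gen byte top30    = btm >= p70 if !missing(btm)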
Here's part of my data
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input long gvkey double year float btm
 2403 1964 .16921677
 9103 1964 .16258383
 1608 1964 .18209743
10060 1964 .18040167
 1481 1964 .14549348
 4780 1964  .1590813
 3874 1964  .1805083
 3235 1964  .1769771
11535 1964 .17150614
 4475 1964  .1712105
 4453 1964 .12189302
 4021 1964 .18900825
11264 1964 .17115825
10878 1964 .19931643
 6502 1964 .18697643
 8645 1964 .16715898
 6113 1964 .16654503
 3489 1964  .1921034
11280 1964 .16220094
 9616 1964 .18040165
end
Thank you for your help in advance!

Regression Planes in Stata 15?

Are there any packages to generate three-dimensional regression plots (regression planes) in Stata 15? There are packages available in R that I have used previously, but for consistency with other figures I would like to be able to do this in Stata, should it exist. I have not found anything about it and assume that is because it simply isn't available. Just looking for confirmation before I make the switch.

Renaming and Numbering Variables Sharing Common First 32 Characters

Hi all! I am importing a large dataset from Excel with many variable names that are over 32 characters. Obviously these are truncated to 32 characters on import; however, a large series of them share the same first 32 characters. As such, I am left with a number of variables whose names are the column letters from Excel, but whose labels contain the full name.

Example:
Name Label
PayorGovernmentHealthInsura Payor - Government Health Insurance - A
AJ Payor - Government Health Insurance - B
AK Payor - Government Health Insurance - C
AL Payor - Government Health Insurance - D
AM Payor - Government Health Insurance - E
AN Payor - Government Health Insurance - F
AO Payor - Government Health Insurance - G
AP Payor - Government Health Insurance - H
AQ Payor - Government Health Insurance - I

I've been trying to write code that selects the affected variables, generates a variable name from each label, truncates that name to 30 characters, then adds a sequential two-digit number at the end, to produce the following result:
Name Label
PayorGovernmentHealthInsu01 Payor - Government Health Insurance - A
PayorGovernmentHealthInsu02 Payor - Government Health Insurance - B
PayorGovernmentHealthInsu03 Payor - Government Health Insurance - C
PayorGovernmentHealthInsu04 Payor - Government Health Insurance - D
PayorGovernmentHealthInsu05 Payor - Government Health Insurance - E
PayorGovernmentHealthInsu06 Payor - Government Health Insurance - F
PayorGovernmentHealthInsu07 Payor - Government Health Insurance - G
PayorGovernmentHealthInsu08 Payor - Government Health Insurance - H
PayorGovernmentHealthInsu09 Payor - Government Health Insurance - I

Ideally these numbers would restart at 01 for each common 30 character stem.

I have limited coding experience in languages other than Stata, so I've been struggling for several hours with the following clunky code:

local k = 0
foreach var of varlist _all {
    local label : variable label `var'
    local new_name = lower(substr(strtoname("`label'"), 1, 30))
    if strlen("`new_name'") == 30 {
        local ++k
        rename `var' `new_name'`=string(`k', "%02.0f")'
    }
    else {
        rename `var' `new_name'
        local k = 0
    }
}


I also tried using the addnumber option, but the way I've written it each variable gets a "1" added instead of sequential numbers.

foreach var of varlist _all {
local label : variable label `var'
local new_name = lower(substr(strtoname("`label'"), 1, 30))
rename `var' `new_name'#, addnumber
}


I would love any advice on writing this code or tips on how others have handled a similar issue.

Thanks so much!

Mark