Channel: Statalist

How to calculate the elder generation's maximum education years within one family

Suppose I have a dataset like this:
HTML Code:
   family relationship meanings            edu
1       1 A            respondent           12
2       1 B            respondent's spouse  18
3       1 C            A's father           10
4       1 D            A's mother            9
5       1 E1           A's first son        15
6       1 F1           E1's spouse          14
7       1 G11          E1's first son        3
8       1 G12          E1's second son       1
9       1 E2           A's second son       13
10      2 A            respondent           21
11      2 B            respondent's spouse   6
12      2 C            A's father           12
13      2 D            A's mother           16
14      2 E1           A's first son        18
15      2 F1           E1's spouse          15
16      2 E2           A's second son       17
17      2 E3           A's third son        16
relationship: the person's relationship code within the family.

meanings: the meaning of the second column, "relationship".

I want to calculate the parent generation's maximum education years within each family. We do not need the spouses' information. The expected results are as follows:

HTML Code:
   family id   edu fedu
1       1 A     12   10
2       1 C     10   NA
3       1 E1    15   18
4       1 E2    13   18
5       1 G11    3   15
6       1 G12    1   15
7       2 A     21   16
8       2 C     12   NA
9       2 E1    18   21
10      2 E2    17   21
11      2 E3    16   21
Thanks!
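[A sketch of one possible approach, assuming the first letter of relationship encodes the generation (C/D = A's parents, A/B = respondent's generation, E/F = children and their spouses, G = grandchildren) and that fedu is the maximum edu in the generation directly above:]

```stata
* hedged sketch -- assumes the first letter of -relationship- encodes the
* generation, and fedu = maximum edu of the generation directly above
gen byte glevel = .
replace glevel = 0 if inlist(substr(relationship, 1, 1), "C", "D")
replace glevel = 1 if inlist(substr(relationship, 1, 1), "A", "B")
replace glevel = 2 if inlist(substr(relationship, 1, 1), "E", "F")
replace glevel = 3 if substr(relationship, 1, 1) == "G"

gen fedu = .
forvalues g = 1/3 {
    by family, sort: egen max`g' = max(cond(glevel == `g' - 1, edu, .))
    replace fedu = max`g' if glevel == `g'
}
drop max1-max3

* drop the spouses' own rows, as in the expected output
drop if inlist(substr(relationship, 1, 1), "B", "F")
```

For the topmost generation (C/D) no elder generation exists, so fedu stays missing, matching the NA in the expected results.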

Summarize Table Column Width

I'm new to Stata and sorry for the ignorance, but I can't find an option to make the first column in a summarize table wider. Most of my variable names are well beyond the standard 12 characters, and I have plenty of space on my screen to display a wider column. I've tried fvwrapon and cellwidth, but they don't work.

How to test nonlinearity between hazard ratio and an Xvar after Cox regression with restricted cubic spline?

Dear Statalist members,

Recently I ran a Cox regression with restricted cubic splines (mvrs) in order to graph the non-linear association between the hazard ratio and my Xvar1.

Although the graph showed a clearly U-shaped association, the reviewer required a test for non-linearity (which, as far as I have learned, is usually reported as a P-value in most articles).

Is there any code to test this hypothesis in Stata? Based on my knowledge, it seems that what I am testing is whether the coefficients on the spline-transformed Xvar terms (Xvar_0, Xvar_1, Xvar_2) are equal to 0 (see the results below). Is that enough to establish non-linearity?

Thanks a lot.


Code:
 xi: mvrs stcox Xvar age sex stemi hbp dm hf_2 pci_his ldl crea i.culp timi_0 d2btime thrombo iabp tiro asp

Final multivariable spline model for _t
------------------------------------------------------------------------------
    Variable |    -----Initial-----          -----Final-----
             |   df     Select   Alpha    Status    df    Knot positions
-------------+----------------------------------------------------------------
         Xvar |    4     1.0000   0.0500     in      3     [lin] 2.79 12.07
         age |    4     1.0000   0.0500     in      1     Linear
         sex |    1     1.0000   0.0500     in      2     Linear
       stemi |    1     1.0000   0.0500     in      2     Linear
         hbp |    1     1.0000   0.0500     in      2     Linear
          dm |    1     1.0000   0.0500     in      2     Linear
        hf_2 |    1     1.0000   0.0500     in      2     Linear
     pci_his |    1     1.0000   0.0500     in      2     Linear
         ldl |    4     1.0000   0.0500     in      1     Linear
        crea |    4     1.0000   0.0500     in      1     Linear
    _Iculp_2 |    1     1.0000   0.0500     in      2     Linear
    _Iculp_3 |    1     1.0000   0.0500     in      2     Linear
    _Iculp_4 |    1     1.0000   0.0500     in      2     Linear
    _Iculp_5 |    1     1.0000   0.0500     in      2     Linear
      timi_0 |    1     1.0000   0.0500     in      2     Linear
     d2btime |    4     1.0000   0.0500     in      1     Linear
     thrombo |    1     1.0000   0.0500     in      2     Linear
        iabp |    1     1.0000   0.0500     in      2     Linear
        tiro |    1     1.0000   0.0500     in      2     Linear
         asp |    1     1.0000   0.0500     in      2     Linear
------------------------------------------------------------------------------


Cox regression -- Breslow method for ties
Entry time _t0                                    Number of obs   =       3980
                                                  LR chi2(22)     =     235.83
                                                  Prob > chi2     =     0.0000
Log likelihood = -935.89368                       Pseudo R2       =     0.1119

------------------------------------------------------------------------------
          _t |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       Xvar_0 |   .0350324    .087242     0.40   0.688    -.1359588    .2060236
       Xvar_1 |  -.1742873    .077077    -2.26   0.024    -.3253555   -.0232192
       Xvar_2 |   .2408546   .0900958     2.67   0.008     .0642702    .4174391
         age |   .0468714   .0086012     5.45   0.000     .0300133    .0637295
         sex |   -.513448   .1966388    -2.61   0.009     -.898853   -.1280431
       stemi |  -.2819774   .2841655    -0.99   0.321    -.8389316    .2749769
         hbp |   .3275591   .2001628     1.64   0.102    -.0647527    .7198709
          dm |   .0735698   .1830752     0.40   0.688    -.2852511    .4323906
        hf_2 |   1.216276   .1854233     6.56   0.000     .8528525    1.579699
     pci_his |   .3591419   .2500795     1.44   0.151    -.1310049    .8492887
         ldl |   -.042943   .1036864    -0.41   0.679    -.2461646    .1602787
        crea |   .0104941   .0021757     4.82   0.000     .0062299    .0147584
    _Iculp_2 |   .7382852   .3876987     1.90   0.057    -.0215903    1.498161
    _Iculp_3 |   .9029866   .3816996     2.37   0.018     .1548691    1.651104
    _Iculp_4 |     1.6113     .47715     3.38   0.001     .6761029    2.546497
    _Iculp_5 |   1.771119   .7053499     2.51   0.012     .3886584    3.153579
      timi_0 |   .4584349   .2198601     2.09   0.037     .0275171    .8893528
     d2btime |   1.42e-06   9.61e-06     0.15   0.883    -.0000174    .0000203
     thrombo |   .0092184   .2040014     0.05   0.964    -.3906169    .4090538
        iabp |   .6062582   .2079598     2.92   0.004     .1986645    1.013852
        tiro |  -.0185661   .2624105    -0.07   0.944    -.5328812     .495749
         asp |  -.1059745   .4885523    -0.22   0.828    -1.063519    .8515703
------------------------------------------------------------------------------
Deviance: 1871.787.

. testparm Xvar_*

 ( 1)  Xvar_0 = 0
 ( 2)  Xvar_1 = 0
 ( 3)  Xvar_2 = 0

           chi2(  3) =   10.60
         Prob > chi2 =    0.0141
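[For comparison: the usual test of linearity restricts only the non-linear spline terms, not the linear component. Assuming Xvar_0 carries the linear part of the spline basis here, a hedged sketch would be:]

```stata
* hedged sketch -- test only the non-linear spline terms, assuming Xvar_0
* is the linear component of the basis; a small p-value rejects linearity
testparm Xvar_1 Xvar_2
```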

[spline graph attached in the original post]

add different suffixes to 1500+ variable names using foreach

Dear all,

I have been trying to add different suffixes (numbers) to a set of variables named v1, v2, v3, ... but my code does not seem to work. The suffixes come from the variable id. Namely, I want to rename my variables as follows:

v1 ==> d_1101506022
v2 ==> d_1101506032
v3 ==> d_1101506073
.
.
.

My code is as follows:

foreach x in v* {
local var = id
local suffix = `var'[`_n']
rename `x' d_`suffix'
}

Will you please tell me what I am doing wrong? Thank you
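[Two issues stand out: `foreach x in v*` does not expand the wildcard (that needs `foreach x of varlist v*`), and `` `_n' `` is not a defined local inside the loop. A sketch of one possible fix, assuming v1, v2, ... correspond one-to-one, in order, to the observations of id:]

```stata
* hedged sketch -- assumes v1, v2, ... line up, in order, with the
* observations of -id-
forvalues i = 1/`=_N' {
    local suffix = id[`i']     // suffix taken from observation i of id
    rename v`i' d_`suffix'
}
```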

Highly Left Skewed variable

Hello,

I want to run a panel logistic regression with a binary dependent variable Y (0, 1), an explanatory variable X, and other control variables.

X is an index variable scaled between 0 and 1, and it is highly left-skewed (most observations are close to 1; see the histogram below).

Is it fine to include the original X variable in the regression without transformation, or do I need to use another regression model (ordered logit or probit)?


Thanks for your consideration.

[histogram of X attached in the original post]

posterior means for crossed random effects from logistic random intercept model

Dear Statalist members,

I am trying to estimate the posterior means of two crossed random effects from a logistic random-intercept model (estimated with melogit). Although the model is quite large (some 24,000 observations, around 25 individual-level covariates, and two crossed random effects), estimation nevertheless converges without problems after some time (around 50 hours). However, when I try to calculate the posterior means of both random effects (with predict re_*, reffects), this takes a seemingly endless amount of time (currently 15 days).

To check how fast this type of postestimation works on a much smaller problem, I created an artificial data set with just 200 observations, two individual-level covariates, and two crossed random effects for grouping variables with 12 and 18 categories. Model estimation runs smoothly, and the results conform to what I would expect given the structure of the artificial data (all syntax and output below). However, the calculation of the posterior means again takes a lot of time and has not finished yet (after approximately three days). Although the user is warned that "computing empirical Bayes means for a crossed-effects model is very time consuming", the calculation seems extremely slow given the apparently limited size of the estimation problem.

I should probably mention that I have no problems estimating posterior modes (with predict …, reffects rmodes).

I would very much appreciate any comments and ideas on the following points:
1. Is there some fundamental issue with the calculation of posterior means for crossed random effects that I should be aware of?
2. Has anybody successfully used "predict …, reffects" after a logistic mixed model with crossed random effects? What is your experience regarding the amount of time needed?
3. Are there any alternative (and preferably faster) estimation methods for posterior means in this type of situation?

Thanks a lot for any suggestions!
Stefan



My do-file:

Code:
clear
set obs 200

gen x1 = rnormal(0,1)
gen x2 = rnormal(0,1)

gen u1 = rnormal(0,1)
gen u2 = rnormal(0,1)
correlate x* u*

xtile re1 = u1, nq(12)
xtile re2 = u2, nq(18)

by re1, sort: egen postmean1 = mean(u1)
by re2, sort: egen postmean2 = mean(u2)

gen e = rnormal(0,1.5)

gen z = (0.7*x1)+(-0.5*x2)+postmean1+postmean2+e
gen prob = 1/(1+exp(-1*z))
sum prob
recode prob (min/0.6=1)(*=0), gen(d)

melogit d x1 x2 || _all:R.re1 || re2:, diff

* started ca 15:30 pm 21. 12. 2019
predict pre*, reffects


Model-Output:

[model output attached in the original post]

Multilevel Models

Hi; I am running a linear model with four levels. I have not been able to find out the meaning and usefulness of the parameters reported right after the constant (_cons) in the Stata output. They are "lns1_1_1:_cons" through "atr3_1_1_2:_cons". I imagine they are adjustments to the constant for every level; nonetheless, I see five of them, which is difficult to interpret.

Thanks a lot for your help. Leonardo

_cons 4.339***
(0.082)
lns1_1_1:_cons -3.354***
(0.055)
lns2_1_1:_cons -5.163***
(0.225)
lns3_1_1:_cons -5.621***
(0.043)
lns3_1_2:_cons -2.245***
(0.013)
atr3_1_1_2:_cons -0.898***

Identifying the newly employed receptionist and the waitresses who joined the enterprise after the new receptionist assumed office

Hi,
I have the following sample:
Code:
ssc install dataex
clear
input int year str6 staff_id str5 hotel_id byte(waitress receptionist)
2009 "124665" "23453" 1 0
2009 "455543" "23453" 0 1
2009 "334532" "23453" 1 0
2009 "888976" "23453" 1 0
2010 "124665" "23453" 1 0
2010 "455543" "23453" 0 1
2010 "334532" "23453" 1 0
2010 "556333" "23453" 1 0
2011 "124665" "23453" 1 0
2011 "877776" "23453" 0 1
2011 "666755" "23453" 1 0
2011 "556333" "23453" 1 0
2012 "124665" "23453" 1 0
2012 "877776" "23453" 0 1
2012 "666755" "23453" 1 0
2012 "777554" "23453" 1 0
2009 "564400" "66766" 1 0
2009 "990988" "66766" 0 1
2009 "008998" "66766" 1 0
2009 "669009" "66766" 1 0
2010 "564400" "66766" 1 0
2010 "990988" "66766" 0 1
2010 "455543" "66766" 1 0
2010 "556333" "66766" 1 0
2011 "564400" "66766" 1 0
2011 "777799" "66766" 0 1
2011 "881100" "66766" 1 0
2011 "556333" "66766" 1 0
2012 "669009" "66766" 1 0
2012 "990988" "66766" 0 1
2012 "881100" "66766" 1 0
2012 "778811" "66766" 1 0
2009 "564400" "99088" 1 0
2009 "004377" "99088" 0 1
2009 "110925" "99088" 1 0
2009 "778811" "99088" 1 0
2010 "564400" "99088" 1 0
2010 "004377" "99088" 0 1
2010 "669009" "99088" 1 0
2010 "888976" "99088" 1 0                              
end
Firstly, I would like to create a dummy variable (equal to 1, and 0 otherwise) indicating that the receptionist was replaced by a new receptionist, comparing the current year with the previous year(s) for each firm-year observation. For example, comparing the years 2009 and 2010 with 2011, receptionist No. "877776" in Hotel No. "23453" should be identified as newly employed in 2011 (equal to 1).

Secondly, I would like to create a dummy variable (equal to 1, and 0 otherwise) indicating that the waitress in a particular firm-year observation joined the hotel after the current receptionist assumed office. In other words, I would like to identify those who came after the employment of the hotel's receptionist. Note that I only have one receptionist per firm-year observation.

Thanks in advance
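[A sketch of one way to build both dummies, assuming exactly one receptionist per hotel-year and that a receptionist's tenure is an unbroken run of years (a receptionist who leaves and later returns would need extra handling):]

```stata
* hedged sketch -- one receptionist per hotel-year assumed
* 1. put the receptionist's id on every row of the hotel-year
*    (receptionist == 1 sorts last within the hotel-year)
bysort hotel_id year (receptionist): gen rec_id = staff_id[_N]

* 2. new-receptionist dummy: first year this rec_id appears in the hotel,
*    excluding the hotel's first observed year (nothing earlier to compare)
bysort hotel_id: egen first_year = min(year)
bysort hotel_id rec_id (year): gen byte new_rec = (year == year[1]) & (year > first_year)

* 3. waitress-joined-after-receptionist dummy: compare start years
bysort hotel_id rec_id (year): gen rec_start = year[1]
bysort hotel_id staff_id (year): gen staff_start = year[1]
gen byte joined_after = waitress == 1 & staff_start > rec_start
```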

Creating Lagged variables with repeated values

$
0
0
Dear all,

I wanted help in creating lagged variables when there are repeated values in the dataset.

In particular, I want to construct monthly lags for the dependent variable State Wins for the following regression:

reg StateWins ramadan_month lagged_one_month_StateWins lagged_two_month_StateWins

Here is a snapshot of my dataset:

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input byte(StateWins ramadan_month) int yeardecision str9 monthdecided str21 islamicdate
1 0 1953 "July" "Shawwal, 1372"
1 0 1953 "July" "Shawwal, 1372"
1 0 1953 "July" "Shawwal, 1372"
0 1 1953 "June" "Ramadan, 1372"
1 1 1953 "June" "Ramadan, 1372"
1 1 1953 "June" "Ramadan, 1372"
1 0 1953 "March" "Djumadal-Akhira, 1372"
0 0 1954 "March" "Djumadal-Akhira, 1372"
end

tsset month or tsset yeardecision gives me an error that I have repeated time values in the sample, so I cannot use L.StateWins:
Code:
. tsset yeardecision
repeated time values in sample
r(451);

. tsset month
repeated time values in sample
r(451);
How may I construct a lagged variable at monthly level for my dependent variable StateWins?
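[A sketch of one possible approach: build a proper monthly date, compute monthly averages of StateWins in a collapsed copy, lag those, and merge them back, since tsset requires a single observation per time value. Gaps in the monthly series will still yield missing lags unless you tsfill first.]

```stata
* hedged sketch -- builds a monthly date from yeardecision and monthdecided
gen m = month(date("1 " + monthdecided + " 2000", "DMY"))
gen mdate = ym(yeardecision, m)
format mdate %tm

preserve
collapse (mean) mStateWins = StateWins, by(mdate)
tsset mdate
gen lagged_one_month_StateWins = L1.mStateWins
gen lagged_two_month_StateWins = L2.mStateWins
keep mdate lagged_*
tempfile lags
save `lags'
restore
merge m:1 mdate using `lags', nogenerate
```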

Thank you very much. Your help here will really be appreciated.

Kind Regards,
Roger

Specifying IV with panel data, binary treatment and binary/count outcome

Dear all Statalisters

I am working on an application of an instrumental-variable (IV) approach to address unobserved confounding. My main aim at this stage is to understand which estimation method is most suitable and how to implement it in Stata. I have read a lot of the literature on this topic; however, I think input from the Statalist forum could be beneficial, especially as many highly skilled researchers who have published on IV specifications are active in this community (e.g., excellent work by Jeff Wooldridge, Joao Santos Silva, and others).

The context is a just-identified application with one endogenous variable and one instrument with the following specifications:
  • Panel data: 7 years on a population of approximately 10 000 patients.
  • Instrument: continuous (a rate, i.e. a "preference" instrument), which I have also considered implementing as a set of binary indicators using Stata's i. factor-variable notation.
  • Treatment: Binary (medication over a given period, yes/no).
  • Outcome: Count/binary (outcome can be defined both ways).
The approach I find most suitable at the moment is a combination of GLM and GMM as outlined in Johnston et al. (2008), which addresses both binary and count responses (but I have not found specific discussion of the case where the instrument is continuous). I also see Wooldridge's comment in #6 here that the FE Poisson estimator is preferable when addressing endogeneity in count panel models, although I am unsure whether this applies to an application with a continuous instrument.

Additionally, I see the Arellano-Bond estimator as a relevant approach. My (currently vague) understanding of the Arellano-Bond estimator is that lagged values of the dependent variable itself become the instruments, so we would not need an additional instrument.

Hopefully some of you have input on this IV setup and its implementation in Stata (I assume a version of -xtivreg- is the way to go).

Twoway graph X axis

Hi all,

Does anyone know whether there is an option similar to over() in graph bar (asis) that can be used with a stacked twoway bar command?

I have created a stacked bar graph with weeks on the x axis and would like to also include months on the x axis.

Code:
graph twoway (bar AGE_N_0_4 week, color(black) lcolor(black) lwidth(medthick)) ///
    (rbar AGE_N_0_4 A2 week, color(gs7*0.9) lcolor(black) lwidth(medthick)) ///
    (rbar A2 A3 week, color(gs7*0.4) lcolor(black) lwidth(medthick)) ///
    (rbar A3 A4 week, color(dimgray) lcolor(black) lwidth(medthick)) ///
    (rbar A4 A5 week, color(white) lcolor(black) lwidth(medthick)) ///
    (line THREE_WEEK_AVERAGE_COUNT week, lpattern(solid) color(red) recast(connected) msymbol(O)) ///
    (line TWO_SD_THREE_WEEK_COUNT week, lpattern(dash) color(blue) recast(connected) msymbol(x)) ///
    (line HISTORICAL_AGERAVE week, lpattern(dash) color(gray) recast(connected) msymbol(s)), ///
    xlabel(1(2)62, valuelabel angle(90) labsize(*1.0) tick) xtick(1(1)62) ///
    ytitle("X", size(small) height(4)) ylabel(0(1)10, labsize(*1.0) angle(0) nogrid) ///
    graphregion(color(white))

Why is the bias-corrected confidence interval missing?

Hi all,
Why is the confidence interval missing after bootstrapping?

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input byte(gender age edu) float(id serial gpMediator gpIV DV gpDV Mediator)
0 49 2 1 1   .54861116     .15625   5         1  4.944445
0 49 2 1 2  -.17361116    -.09375   4         0 4.2222223
0 49 2 1 3   .54861116    -.09375   4         0  4.944445
0 49 2 1 4      -.0625    -.09375   4         0 4.3333335
0 49 2 1 5    .3263888    -.09375   4         0 4.7222223
0 49 2 1 6   -.3958335    -.09375   3        -1         4
0 49 2 1 7   -.3958335     .40625   4         0         4
0 49 2 1 8   -.3958335    -.09375   4         0         4
0 46 1 2 1      -.4375    1.46875   4     -.375  4.111111
0 46 1 2 2   -.3819447     .71875   4     -.375 4.1666665
0 46 1 2 3   .00694418    -.53125   4     -.375 4.5555553
0 46 1 2 4    .2291665    -.03125   4     -.375  4.777778
0 46 1 2 5  -.54861116   -1.03125   4     -.375         4
0 46 1 2 6    .2291665     .21875   5      .625  4.777778
0 46 1 2 7    .4513888    -.28125   5      .625         5
0 46 1 2 8    .4513888    -.53125   5      .625         5
0 41 3 3 1 -.013888836    -.09375   4     .0625         4
0 41 3 3 2 -.013888836     .15625   4     .0625         4
0 41 3 3 3 -.013888836    -.09375   4     .0625         4
0 41 3 3 4   .09722233    -.09375 3.5    -.4375  4.111111
0 41 3 3 5 -.013888836    -.09375   4     .0625         4
0 41 3 3 6 -.013888836     .15625   4     .0625         4
0 41 3 3 7 -.013888836    -.09375   4     .0625         4
0 41 3 3 8 -.013888836     .15625   4     .0625         4
0 46 3 4 1 -.007936478 .035714388   4 1.2857144  4.111111
0 46 3 4 2   .04761887  -.9642856   3  .2857144 4.1666665
0 46 3 4 3   -.0634923   .2857144   3  .2857144 4.0555553
0 46 3 4 4  -.11904764   .2857144   2 -.7142856         4
0 46 3 4 5   .10317469   .7857144   2 -.7142856 4.2222223
0 46 3 4 6     .325397 .035714388   3  .2857144 4.4444447
0 46 3 4 7           .          .   .         .         .
0 46 3 4 8   -.2857144  -.4642856   2 -.7142856  3.833333
1 59 2 5 1     .583333     .03125 3.5     .8125 4.5555553
1 59 2 5 2   .02777767    -.21875   1   -1.6875         4
1 59 2 5 3    .4166665    -.21875   2    -.6875  4.388889
1 59 2 5 4        -.25     .03125 3.5     .8125  3.722222
1 59 2 5 5   -.3611112     .53125 3.5     .8125  3.611111
1 59 2 5 6  -.19444466     .28125 2.5    -.1875  3.777778
1 59 2 5 7        -.25    -.21875 2.5    -.1875  3.722222
1 59 2 5 8   .02777767    -.21875   3     .3125         4
0 40 5 6 1   -.0833335     .65625   4    -.0625  4.111111
0 40 5 6 2   .47222185     .40625 4.5     .4375 4.6666665
0 40 5 6 3   .02777767     .15625 4.5     .4375 4.2222223
0 40 5 6 4   -.0833335    -.34375   4    -.0625  4.111111
0 40 5 6 5   -.0833335    -.34375   4    -.0625  4.111111
0 40 5 6 6  -.13888931    -.34375 3.5    -.5625 4.0555553
0 40 5 6 7   -.0833335     .15625   4    -.0625  4.111111
0 40 5 6 8  -.02777815    -.34375   4    -.0625 4.1666665
0 41 4 7 1       .0625      -.125   4     -.125 4.1666665
0 41 4 7 2   .34027815      -.125   5      .875 4.4444447
0 41 4 7 3   .11805582      -.125   4     -.125 4.2222223
0 41 4 7 4   -.1041665      -.125   4     -.125         4
0 41 4 7 5   -.1041665       .375   4     -.125         4
0 41 4 7 6   -.1041665       .125   4     -.125         4
0 41 4 7 7   -.1041665       .125   4     -.125         4
0 41 4 7 8   -.1041665      -.125   4     -.125         4
1 36 4 8 1           .          .   .         .         .
1 36 4 8 2           .          .   .         .         .
1 36 4 8 3  -.11111116       .125 4.5       1.5 4.6666665
1 36 4 8 4  -.11111116      -.375 3.5        .5 4.6666665
1 36 4 8 5  -.05555534       .125   3         0 4.7222223
1 36 4 8 6   .22222233      -.375   2        -1         5
1 36 4 8 7     .166667       .375   3         0  4.944445
1 36 4 8 8  -.11111116       .125   2        -1 4.6666665
0 36 3 9 1    .4722221          0   4         0  3.888889
0 36 3 9 2   .58333325        .25   4         0         4
0 36 3 9 3   -2.027778       -.25   4         0  1.388889
0 36 3 9 4         .25        .25   4         0  3.666667
0 36 3 9 5         .25          0   4         0  3.666667
0 36 3 9 6    .4722221       -.25   4         0  3.888889
end
label values gender gender
label def gender 0 "男", modify
label def gender 1 "女", modify
label values edu edu
label def edu 1 "大专以下", modify
label def edu 2 "大专", modify
label def edu 3 "本科", modify
label def edu 4 "硕士", modify
label def edu 5 "博士", modify


xtset id serial

global ctrl gender age edu





****// examine mediation
set seed 10000
capture program drop Med
program define Med, rclass

mixed Mediator $ctrl l.gpMediator cl.gpIV || id: l.gpMediator  cl.gpIV , cov(exc)
    return scalar a=(_b[Mediator:L.gpIV] )
    
mixed DV $ctrl l.gpDV cl.gpIV c.gpMediator|| id:cl.gpIV c.gpMediator l.gpDV,  cov(exc)
    return scalar b=(_b[DV:gpMediator])    
end

bootstrap mediation=(r(a)*r(b)) , reps(5) reject(e(converged)!=1) cluster(id) idcluster(newid) group(serial): Med
estat bootstrap,percentile bc
program drop Med

Just trying to compute least square means for a repeated measures ANOVA!

I can't get adjusted means for both treatment conditions (drug and placebo) across three time points. margins will give me estimates for the time points, but not by condition. The syntax is:

anova *IV* drugcondition/subjid|drugcondition timepoint drugcondition#timepoint c.*covariate*, repeated(timepoint)

I tried adjust, xi: regress, and margins, and even split the data into separate drug and placebo files to try to generate time-point margins. In that case I could get adjusted margins for placebo, but not for the drug condition.

What am I doing wrong? Is there a way to manually calculate least square means from unadjusted means?
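[As a possible starting point, asking margins for the full interaction after the anova should produce an adjusted mean for every condition-by-time cell. A hedged sketch, using the variable names from the syntax above:]

```stata
* hedged sketch -- adjusted (least-squares) means for each
* drug-condition-by-timepoint cell after the anova above
margins drugcondition#timepoint
```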

Cross sectional regression by year and sized-based peer groups

Firstly, Happy Holidays to everyone

Now my problem. I need to run a cross-sectional regression by year and by size-based peer group. Size-based peer groups are chosen by matching on the year and the 50 adjacent observations (the 25 firms with a lower value of total assets and the 25 firms with a higher value of total assets).
Can anyone please suggest how I can generate those peer groups and run the regression?
I am using Stata 14.2
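[A sketch of one brute-force approach, where y, x, and assets are placeholder variable names: rank firms on total assets within each year, then loop over observations and regress on the window of 25 neighbours on either side.]

```stata
* hedged sketch -- y, x, assets are placeholder variable names
bysort year (assets): gen rank = _n    // size rank within each year
gen beta = .
quietly forvalues i = 1/`=_N' {
    local yr = year[`i']
    local r  = rank[`i']
    * peer group: the 25 firms below and 25 above in the size ranking
    capture regress y x if year == `yr' & inrange(rank, `r' - 25, `r' + 25)
    if !_rc replace beta = _b[x] in `i'
}
```

Firms near the top or bottom of a year's size ranking will have fewer than 50 peers; whether to keep or drop those windows is a design choice.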

Thank you.

Winsorization

I have financial data (a dependent variable) that is heavily skewed, with one long tail. Is it acceptable to winsorize only one tail (i.e., asymmetrically)? Also, is it acceptable to winsorize each tail at a different level? Thank you!

order variables using foreach

Dear all,

I have around 1,600 variables with suffixes coming from the variable id. I was wondering whether you could help me revise the code below so that the new order of the variables matches the order of the rows. For instance, con_1504 should come first after id, con_1503 second, and so on.

My code may be nowhere close to the correct version but here it is:

foreach var in con_1-con_2482 {
forval j = 1/1657 {
local this = string(id[`j'])
local that = substr(`var', 5, .)

}
}

Thank you in advance for any suggestions and Merry Christmas to all!

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input int id float(con_4 con_6 con_8 con_16 con_17 con_19 con_23 con_24 con_26)
1504 0 0 0 0 0 0 0 0 0
1503 0 0 0 0 0 0 0 0 0
1625 0 0 0 0 0 0 0 0 0
1502 0 0 0 0 0 0 0 0 0
1626 0 0 0 0 0 0 0 0 0
1627 0 0 0 0 0 0 0 0 0
1761 0 0 0 0 0 0 0 0 0
1760 0 0 0 0 0 0 0 0 0
1762 0 0 0 0 0 0 0 0 0
1759 0 0 0 0 0 0 0 0 0
end
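[A sketch of one possible approach, which builds the desired variable order from the rows of id; it assumes each id value has a matching con_ variable, and capture skips any that do not:]

```stata
* hedged sketch -- order the con_ variables by the row order of -id-
local neworder
forvalues j = 1/`=_N' {
    local suffix = id[`j']
    capture confirm variable con_`suffix'
    if !_rc local neworder `neworder' con_`suffix'
}
order id `neworder'
```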

Convert column into row

Hi,
I want to transpose the following data sample (columns into rows).

----------------------- copy starting from the next line -----------------------
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input byte aggrement_assurance float(p_freq_cust_satifi_ass_cctv p_freq_cust_satifi_ass_dripolite p_freq_cust_satifi_ass_driskill p_freq_cust_satifi_ass_pickpock p_freq_cust_satifi_ass_comfot)
1  1.55  2.06  1.55 20.62  8.25
2  3.09  13.4 10.82 40.21 28.87
3 21.65 52.06 54.12 24.74 44.85
4 61.34 28.87 32.99 11.86 17.53
5 12.37  3.61   .52  2.58   .52
end
label values aggrement_assurance cust_satifi_ass_cctv
label def cust_satifi_ass_cctv 1 "strongly disagreed", modify
label def cust_satifi_ass_cctv 2 "disagreed", modify
label def cust_satifi_ass_cctv 3 "Neutral", modify
label def cust_satifi_ass_cctv 4 "Agreed", modify
label def cust_satifi_ass_cctv 5 "Strongly Agreed", modify
------------------ copy up to and including the previous line ------------------

I want it like this:

aggrement_assurance              strongly disagreed disagreed Neutral Agreed Strongly Agreed
p_freq_cust_satifi_ass_cctv       1.55  3.09 21.65 61.34 12.37
p_freq_cust_satifi_ass_dripolite  2.06  13.4 52.06 28.87  3.61
p_freq_cust_satifi_ass_driskill   1.55 10.82 54.12 32.99  0.52
p_freq_cust_satifi_ass_pickpock  20.62 40.21 24.74 11.86  2.58
p_freq_cust_satifi_ass_comfot     8.25 28.87 44.85 17.53  0.52
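[A sketch using xpose, which transposes the numeric variables and stores the old variable names in _varname. Value labels do not survive the transpose, so the new columns are renamed by hand, assuming the five agreement levels appear in order 1 through 5 as in the example data:]

```stata
* hedged sketch -- transpose with -xpose-
drop aggrement_assurance            // keep only the frequency columns
xpose, clear varname                // rows become variables v1..v5
rename (v1 v2 v3 v4 v5) ///
    (strongly_disagreed disagreed neutral agreed strongly_agreed)
order _varname
list, noobs
```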

Loops

I have two questions about problems I have been facing.

First, after running a probit model I want to calculate the percentage of correctly classified events using different cutoffs. I am trying the following:

first set the cutoffs: y=.09 y=.1 y=.11 y=.12

for each y{

estat classification, cutoff(`y')
}

Here I want to test different cutoffs for the predicted probability in a loop. Should I use foreach or forvalues in this case, and how?
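[foreach over a numlist is one way to do it; forvalues also works with a step, e.g. forvalues y = 0.09(0.01)0.12, though decimal steps can suffer from rounding. A hedged sketch:]

```stata
* hedged sketch -- loop -estat classification- over several cutoffs
foreach y of numlist 0.09 0.10 0.11 0.12 {
    display as text _newline "cutoff = `y'"
    estat classification, cutoff(`y')
}
```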


Second, I want to calculate the sum of wrongly classified events and don't know how to do that with a Stata command.

After command
estat classification, cutoff(.12)


[classification table attached in the original post]

I want to calculate the sum of the false rates (24.54 + 33.75) in a loop. Is there any way to do that in Stata?


Thanks in advance
P-Values for Heteroskedasticity-Robust Inference

Hi there,

What distribution are the p-values based on when an OLS regression of y on 1, x1, ..., xk is estimated with the robust option? How about the p-values for a post-estimation test of multiple exclusion restrictions when the robust option was used?

I would assume for the former that the p-values would be based on a standard normal since the robust standard errors are only robust asymptotically and the t-distributed random variable is asymptotically standard normal.

Thanks!

Ensuring correct model specification (diff-in-diff)

Hello!

I am studying the impact of a non-monetary rewards incentive, introduced by an online platform, on the quality of online reviews. The reward program was introduced by the platform in 2016. Users who self-selected into the rewards program had to complete an additional registration on the platform in order to start earning points for their contributions (reviews). Although both types of users (i.e., program participants and non-participants) are intrinsically motivated, the assumption is that the extrinsically motivated participants will provide reviews of higher quality (measured in terms of readability, cohesion, coherence, length, etc.). To test my hypothesis, I collected post-level yearly (2008-2018) data for a sample of platform users:
Code:
                   Before 2016    After 2016      Total
Non-participants         8,247        20,825     29,072
Participants            13,348       112,498    125,846
Total                   21,595       133,323

Total number of posts: 154918
Total number of users: 8473
To estimate the effect, I am using a difference-in-difference model (assuming it is suitable for my case):
Code:
reg y time##treated, vce(cluster user_id)
where y is the outcome of interest, time = 1 if year >= 2016, and treated = 1 if a given post was written by a user who self-selected into the rewards program. I did check the parallel-trends assumption (which was met). And I do realize there is self-selection bias, which I will address later.

Does my approach to test the impact of the rewards incentive seem plausible in terms of the difference-in-difference analysis?

Additionally, if the platform had randomly selected the users who would participate in the program, I would think of this research design as a natural experiment. Or, if the platform had randomly sent out invitations prompting users to participate and some of them had decided to do so, I would think of it as a randomized controlled trial. However, in my case users had to self-select into the program once it was offered by the platform to all its users -- is there a specific name for such a research design?

Please let me know if you have any additional questions.

Thank you for your feedback!