Channel: Statalist

merge files; only using data on certain participants

Hi there
I am trying to merge two data sets, B into A.

They both started with the same participants (n = 8,000), but:
A is my main data set, on which I've done a complete-case analysis and dropped 2,500 participants (n = 5,500).
B is a data set of just one variable, with measurements on all 8,000 participants.

When I merge, I want the variable from dataset B added 1:1 only for the 5,500 participants remaining in dataset A; I do not want all 8,000 participants' measurements from dataset B added to dataset A.

The latter situation is what happens when I type the code:
merge 1:1 participant_id using database_b.dta


Does anybody know how to merge in data for only the participants I want?
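For reference, -merge- has a keep() option that restricts which match categories are retained in the result. A minimal sketch (file and variable names taken from the post, untested here):

Code:
merge 1:1 participant_id using database_b.dta, keep(master match) nogenerate
Here keep(master match) drops observations that appear only in the using file, so the 2,500 participants present only in B are never added to A.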

Thanks
Al

Where are the critical values of DFGLS unit root test stored?

A DFGLS test provides a results table with critical values at 1%, 5%, and 10%, e.g. like this:

PHP Code:
dfgls ln_GDP_AUT_sa_shw1 if month <= 156, maxlag(12)
 
DF-GLS for ln_CPI_AUT_sa_~1                              Number of obs =   144

               DF-GLS tau      1% Critical      5% Critical     10% Critical
  [lags]     Test Statistic       Value            Value            Value
------------------------------------------------------------------------------
    12           -1.373           -3.512           -2.809           -2.532
    11           -0.688           -3.512           -2.826           -2.548
    10           -0.672           -3.512           -2.843           -2.563
     9           -1.200           -3.512           -2.859           -2.578
     8           -1.457           -3.512           -2.875           -2.593
     7           -2.146           -3.512           -2.890           -2.607
     6           -1.849           -3.512           -2.904           -2.620
     5           -2.451           -3.512           -2.918           -2.632
     4           -2.090           -3.512           -2.930           -2.644
     3           -2.786           -3.512           -2.942           -2.655
     2           -3.344           -3.512           -2.954           -2.665
     1           -3.716           -3.512           -2.964           -2.674

Opt Lag (Ng-Perron seq t) = 12 with RMSE  .0070382
Min SC   =  -9.46413 at lag 12 with RMSE  .0070382
Min MAIC = -9.684341 at lag 12 with RMSE  .0070382
But these critical values are not stored, e.g.
PHP Code:
matrix list r(results)
yields only the following table:

PHP Code:
             k        MAIC         SIC        RMSE       DFGLS
r1          12  -9.6843409  -9.4641303   .00703824  -1.3725487
r1          11   -9.575829  -9.3296057   .00765897  -.68797794
r1          10  -9.5905321  -9.3638727   .00765991  -.67219615
r1           9   -9.501018  -9.3241323   .00794964  -1.1998016
r1           8  -9.4865895  -9.3479864   .00799211  -1.4566636
r1           7   -9.377198  -9.3221978   .00823675  -2.1459997
r1           6  -9.4126014  -9.3408127   .00830248  -1.8488197
r1           5  -9.3416657   -9.344457   .00843162   -2.450565
r1           4  -9.3767739  -9.3561343   .00852844  -2.0897924
r1           3  -9.2949919  -9.3554019   .00868007  -2.7859181
r1           2  -9.2513679  -9.3781925   .00873109  -3.3436531
r1           1  -9.2487849  -9.4117624   .00873521  -3.7159741
Can anybody please tell me where to retrieve the 5% and 10% critical values, so that I can write them to an Excel file with -putexcel-?
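For the export step itself, -putexcel- can write a stored matrix directly; a sketch (the output filename is illustrative, and the critical values would still have to be added separately if dfgls does not store them):

Code:
matrix R = r(results)
putexcel set dfgls_out.xlsx, replace
putexcel A1 = matrix(R), names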

Thanks a lot!
Nora

Counting events in intervals (across variables and unique by observation)

Dear Stata Users,

I am working with a two-period panel of individual survey data (in long format). Each individual has been interviewed on different days in the survey sampling period. Based on the geographic location of the individuals I have merged daily weather data over a period of 30 years (in wide format), which looks something like this:


Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input double id float(survey_date b1_2008 b2_2008 b3_2008 b4_2008 b5_2008 b6_2008 b7_2008 b8_2008 b9_2008 b10_2008)
 1 17905  7.552024  4.619773  3.448182 7.221909 7.731406 7.477907  6.721896  9.776484  8.882518  10.22792
 1 19016  7.552024  4.619773  3.448182 7.221909 7.731406 7.477907  6.721896  9.776484  8.882518  10.22792
 2 18127  7.663669  5.057653  4.334262 6.445424 6.486279 7.203562  4.021591 8.0055685  6.073585 4.5712934
 2 19239  7.663669  5.057653  4.334262 6.445424 6.486279 7.203562  4.021591 8.0055685  6.073585 4.5712934
 3 18127  7.874044   5.43817   4.33868 6.518519 6.899051 7.492062  3.785516  8.283708  6.532521  4.465155
 3 19220  7.874044   5.43817   4.33868 6.518519 6.899051 7.492062  3.785516  8.283708  6.532521  4.465155
 4 18330 10.081314  6.162988 2.8859046 7.558293 7.915502 8.341213  7.818742 10.259943  9.608234  9.788368
 4 19429 10.081314  6.162988 2.8859046 7.558293 7.915502 8.341213  7.818742 10.259943  9.608234  9.788368
 5 18474  9.556124  6.624081  2.664714  7.26975 7.671178 8.176036  7.846913 10.426918  9.642741  10.21776
 5 19591  9.556124  6.624081  2.664714  7.26975 7.671178 8.176036  7.846913 10.426918  9.642741  10.21776
 6 18449  8.467342 4.6309233  3.083597 7.518037 7.600839 7.774574  6.834048  9.931305  9.774855 10.392326
 6 19542  8.467342 4.6309233  3.083597 7.518037 7.600839 7.774574  6.834048  9.931305  9.774855 10.392326
 7 18438  8.467342 4.6309233  3.083597 7.518037 7.600839 7.774574  6.834048  9.931305  9.774855 10.392326
 7 19555  8.467342 4.6309233  3.083597 7.518037 7.600839 7.774574  6.834048  9.931305  9.774855 10.392326
 8 18553  9.031861  5.689291  4.402537 8.380325  8.29756 9.593754 9.4122715 10.620927 10.582848 10.907956
 8 19702  9.031861  5.689291  4.402537 8.380325  8.29756 9.593754 9.4122715 10.620927 10.582848 10.907956
 9 18563  9.929608  8.596379 3.2979546 6.980475  7.20517 6.755096  6.101957  8.037453  7.017509  5.959684
 9 19659  9.028942  6.848519  2.288981 5.800806 6.274816 6.036836  4.974687  7.403302  5.736906  5.278121
10 18562  8.462659  5.483351  3.225748 7.804394 9.113503 8.557564  8.181511 11.001647 10.074936 11.177246
10 19747  8.462659  5.483351  3.225748 7.804394 9.113503 8.557564  8.181511 11.001647 10.074936 11.177246
end
format %td survey_date
Note: b1_2008 is the temperature on 1st January 2008 and so on. I only show 10 days here.

I am now trying to compute different weather variables based on the daily weather observations in the year before the interview (e.g. counting the number of days the daily mean temperature was >10°C). If every individual had been interviewed on the same day (e.g. 01jan2009) this could be achieved with a simple foreach loop:

Code:
gen temp1 = 0
qui foreach v of var b1_2008-b366_2008 {
        replace temp1 = temp1 + (`v' > 10 & `v' !=.)
}
However, the difficulty is that each individual has a different interview date, i.e. the varlist capturing the relevant weather variables differs slightly for each individual. I would like to count the number of days fulfilling a certain condition (as above) within the 365 days prior to each individual's survey date. Hence, I am wondering if there is any way of associating a unique varlist with each observation based on the interview date, or if there is any other workaround.
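One possible workaround, sketched here and untested on the real data: since each daily variable maps to a known calendar date, loop over days rather than observations and restrict each -replace- to the observations whose lookback window covers that day (this assumes b`j'_2008 holds the value for the j-th day of 2008):

Code:
gen temp1 = 0
local d0 = mdy(1, 1, 2008)
quietly forvalues j = 1/366 {
    local day = `d0' + `j' - 1
    replace temp1 = temp1 + (b`j'_2008 > 10 & b`j'_2008 != .) ///
        if inrange(`day', survey_date - 365, survey_date - 1)
}
Because the condition varies by observation through survey_date, each individual accumulates counts only over his or her own 365-day window, without reshaping to long.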

I have considered reshaping the weather data to long format; however, given the size of my dataset (30,000 individuals and at least 8 years of relevant daily weather data), that seems infeasible.

I would greatly appreciate your support.

Thanks,
Paul







Conindex Results

I want to study inequalities in a health variable (the negative of HAZ) and therefore want to generate a concentration index with respect to the wealth index score provided in the dataset.

The CI I got using -conindex- is -0.0275, while the CI generated using the procedure laid down by Owen O'Donnell et al. in their book was -0.129, which matches what other researchers report. Can someone tell me why this difference arises and what it means for interpretation? I have used the wealth index score for ranking the variables and have used svy wherever applicable.

I used the following command
*****For Conindex*****
conindex neghaz, rankvar(hv271) bounded limits(-6 6) svy

******2nd approach******
clear
set maxvar 10000
set more off
gen wt = hv005/1000000
svyset [pw=wt], psu(sh021) strata(hv022)


**** var Creation**

gen y= neghaz
quietly sum y [aw=wt]
sca m_y = r(mean)
display "mean of y", m_y
di m_y

**** GENERATE WEIGHTED FRACTIONAL RANK VARIABLE
gen x = hv271
sort x
egen raw_rank=rank(x), unique
sort raw_rank
quietly sum wt
gen wi=wt/r(sum)
gen cusum=sum(wi)
gen wj=cusum[_n-1]
replace wj=0 if wj==.
gen rank=wj+0.5*wi

qui sum y [aw=wt]
scalar mean=r(mean)
cor y rank [aw=wt], c
sca CI1=(2/mean)*r(cov_12)
display "concentration index by convenient covariance method", CI1

qui sum rank [aw=wt]
sca var_rank=r(Var)
gen lhs=2*var_rank*(y/mean)
regr lhs rank [pw=wt]
sca CI2=_b[rank]
display "concentration index by convenient regression method", CI2

Thanks in advance

Interaction model in STATA

I estimate the following in Stata:

y = a + b1*x1 + b2*x1*x2 + b3*x2 + FE, where x1 is a continuous variable, x2 is an indicator variable, x1*x2 is the interaction between x1 and x2, and FE is a set of fixed effects as controls.

I first run the following command in Stata:

reg y c.x1##i.x2 FE.

I also construct the interaction term myself:

gen x1_x2=x1*x2, and run

reg y x1 x1_x2 x2 FE.

I get really different results for the main effects and interactions across these two methods of estimation. Am I missing something here? Thanks a lot!
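For reference, with a genuine 0/1 indicator and an identical estimation sample the two parameterizations span the same design matrix and should give identical coefficients; a quick illustrative check on a shipped dataset (the variable choices are arbitrary):

Code:
sysuse auto, clear
gen x1 = mpg
gen x2 = foreign              // a 0/1 indicator
gen x1_x2 = x1 * x2
reg price c.x1##i.x2          // factor-variable notation
reg price x1 x1_x2 x2         // hand-built interaction, same fit
Differences typically arise when x2 is not coded 0/1, when a main effect is dropped, or when missing values change the sample between the two runs.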

Power Repeated - Error

I am using -power repeated- for a sample size calculation. It succeeds in calculating the 'between effect' and 'between-within effect' sample sizes, but fails with the 'within effect'.

Code:
. matrix M = (0.415,0.496,0.549,0.588,0.887,0.881\        ///
> 0.91,0.983,1.011,1.645,2.421,2.917\                     ///
> 0.938,0.814,0.735,0.641,0.599,0.71\                     ///
> 0.754,0.848,0.842,1.112,1.796,2.635)

. matrix C = (0.513,0.308,0.308,0.308,0.308,0.308\                ///
> 0.308,0.513,0.308,0.308,0.308,0.308\            ///
> 0.308,0.308,0.513,0.308,0.308,0.308\            ///
> 0.308,0.308,0.308,0.513,0.308,0.308\            ///
> 0.308,0.308,0.308,0.308,0.513,0.308\            ///
> 0.308,0.308,0.308,0.308,0.308,0.513)


. power repeated M, covmatrix(C)

Performing iteration ...

Estimated sample size for repeated-measures ANOVA
F test for between subjects
Ho: delta = 0  versus  Ha: delta != 0

Study parameters:

        alpha =    0.0500
        power =    0.8000
        delta =    0.7143
          N_g =         4
        N_rep =         6
        means =   <matrix>
        Var_b =    0.1746
       Var_be =    0.3422
          Cov =   <matrix>

Estimated sample sizes:

            N =        28
  N per group =         7


. power repeated M, covmatrix(C) factor(bwithin)

Performing iteration ...

Estimated sample size for repeated-measures ANOVA
F test for between-within subjects with Greenhouse-Geisser correction
Ho: delta = 0  versus  Ha: delta != 0

Study parameters:

        alpha =    0.0500
        power =    0.8000
        delta =    1.9218
          N_g =         4
        N_rep =         6
        means =   <matrix>
       Var_bw =    0.1262
      Var_bwe =    0.0342
          Cov =   <matrix>
    spherical =      true

Estimated sample sizes:

            N =        12
  N per group =         3


. power repeated M, covmatrix(C) factor(within)

Performing iteration ...
failure to compute sample size;
    The computed initial value for the search algorithm failed. It is likely that the estimate is not achievable given the input parameters and the domain limits of the power function.
r(498);

I would be grateful for any suggestions on how to resolve this problem.

Thank you,
Martyn

Stata 15.1 (IC) Current update level:27 Jun 2018

For Loop Syntax Error

Hello,

I am trying to use a basic for loop but keep getting the error 'invalid syntax', which I cannot figure out. The variable id has been defined.

Code:
forval i=1/`id'{
preserve
use if (inrange(Census2001_Lon,`minlon',`maxlon') & inrange(Census2001_Lat,`minlat',`maxlat')) using MainData
save data_`id', replace
restore
}
Could anyone tell me the reason for the syntax error? I ran the code without the loop, using individual values, and it worked.
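For reference, -forvalues- requires its range bounds to expand to numeric literals, so a local macro holding a number works while a bare variable name does not; a minimal sketch of setting the bound from the data (assuming id is a numeric variable):

Code:
quietly summarize id
local maxid = r(max)
forvalues i = 1/`maxid' {
    display "iteration `i'"
}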

Thanks!

conindex results not matching?

Dear all,

I want to calculate concentration index as a part of research project.

First, I did the calculations using the package conindex and got a value of -0.026. Then I did the calculations using the code given by Owen O'Donnell et al. in their book and got a value of -0.122, which is comparable to the earlier published literature.

Why this difference, and how can it be interpreted?
I am using the negative of the HAZ scores (with flagged values removed) as the dependent variable and the wealth index score as the ranking variable in both calculations. I am using DHS 2015-16 for India.

I used the following code:
conindex neghaz, rankvar(wis) bounded limits(-6 6) svy

What mistake am I making?

Thanks in advance!!

Options for doing small sample adjustments while using melogit for mixed effects logistic regression

Hi Statalisters!

In linear mixed models, small-sample bias is typically addressed through restricted maximum likelihood (REML) estimation and a Kenward-Roger correction. How do we go about this for a mixed-effects logistic regression model, to obtain results close to an exact-methods approach? Are there other options for running this kind of analysis using exact methods?

Sample Stata code:
xi: melogit VL_afterenrol1 i.age_category i.Sex_c i.core_regimen_current i.Adherence i.months_ART_catnew i.VLatenrollement_categ, || pid:, covariance(unstructured) vce(cluster Site_coded) or intpoints(2)

I am using Stata version 15.1

Thanks for your help!

Comparing coefficients across nested meqrlogit models

Hi all,

I am studying the effect of an individual's relative income on his/her likelihood of voting in elections. For this aim I use a dataset with over 100,000 observations from 71 countries, which includes variables measured at the individual level as well as variables measured at the country level. Therefore I ran multilevel logistic regression models, using meqrlogit. My question concerns the comparison of the effect of income on voting between two nested models. This is my first post ever, so please correct me if I'm doing something wrong. See below for an example of the dataset:

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float(idno cntry2) byte(incquin_h voting_h) float(dissat_h trstpol_h) byte(interest_h educ_h) int age float age2 byte female float(gini_c gdppc_c fhdemoc_c comp_vote)
348746 1 1 1 5   0 1 2 60 3600 1 -.0852427 21.071457 1.5105634 0
348324 1 2 1 1 1.5 2 2 77 5929 1 -.0852427 21.071457 1.5105634 0
347809 1 2 1 3 1.2 0 2 71 5041 0 -.0852427 21.071457 1.5105634 0
348597 1 3 1 1 1.2 1 2 66 4356 0 -.0852427 21.071457 1.5105634 0
347332 1 5 0 2 2.1 1 2 55 3025 1 -.0852427 21.071457 1.5105634 0
348332 1 4 1 1 2.7 0 2 38 1444 1 -.0852427 21.071457 1.5105634 0
347539 1 2 1 4  .9 0 2 72 5184 1 -.0852427 21.071457 1.5105634 0
347647 1 3 1 1 1.5 0 2 69 4761 1 -.0852427 21.071457 1.5105634 0
348086 1 3 1 2 2.1 0 2 31  961 1 -.0852427 21.071457 1.5105634 0
347657 1 2 1 4 2.1 1 2 77 5929 0 -.0852427 21.071457 1.5105634 0
end
label values cntry2 def_country
label def def_country 1 "AT", modify
label values incquin_h def15
label def def15 1 "lowest quintile", modify
label def def15 2 "2nd quintile", modify
label def def15 3 "3rd quintile", modify
label def def15 4 "4th quintile", modify
label def def15 5 "highest quintile", modify
label values voting_h def2
label def def2 0 "no", modify
label def def2 1 "yes", modify
label values dissat_h def11
label def def11 1 "very satisfied", modify
label def def11 2 "fairly satisfied", modify
label def def11 3 "neutral", modify
label def def11 4 "not very satisfied", modify
label def def11 5 "not at all satisfied", modify
label values trstpol_h def13
label def def13 0 "none at all", modify
label values interest_h def8
label def def8 0 "not (very)", modify
label def def8 1 "somewhat", modify
label def def8 2 "very/much", modify
label values educ_h def1
label def def1 2 "middle", modify
label values female female
label def female 0 "0. male", modify
label def female 1 "1. female", modify
label values comp_vote _noyes
label def _noyes 0 "no", modify

In model 1, I ran meqrlogit to test the effect of relative income (incquin) on voting, controlling for some individual-level and country-level variables, and I include a random slope of income at the country level. See below for the code:

Code:
meqrlogit     voting_h    incquin   ///  (the effect of relative income)
            i.educ_h age age2 i.female /// (individual-level controls)
            gini_c gdppc_c fhdemoc_c i.comp_vote    /// (country-level controls)
            || cntry2: incquin , laplace



and corresponding output:

Code:
Refining starting values:

Iteration 0:   log likelihood =  -50356.71  (not concave)
Iteration 1:   log likelihood = -50322.602  (not concave)
Iteration 2:   log likelihood = -50302.917  

Performing gradient-based optimization:

Iteration 0:   log likelihood = -50302.917  (not concave)
Iteration 1:   log likelihood = -50266.829  (not concave)
Iteration 2:   log likelihood = -50250.624  (not concave)
Iteration 3:   log likelihood = -50222.191  
Iteration 4:   log likelihood = -50156.553  (not concave)
Iteration 5:   log likelihood =  -50128.84  
Iteration 6:   log likelihood = -50118.001  
Iteration 7:   log likelihood = -50112.054  
Iteration 8:   log likelihood = -50111.894  
Iteration 9:   log likelihood = -50111.893  

Mixed-effects logistic regression               Number of obs     =    106,535
Group variable: cntry2                          Number of groups  =         71

                                                Obs per group:
                                                              min =        365
                                                              avg =    1,500.5
                                                              max =      3,082

Integration points =   1                        Wald chi2(10)     =   10405.74
Log likelihood = -50111.893                     Prob > chi2       =     0.0000

------------------------------------------------------------------------------
    voting_h |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   incquin_h |   .0965892   .0161354     5.99   0.000     .0649644    .1282141
             |
      educ_h |
     middle  |   .1922642    .024659     7.80   0.000     .1439334    .2405949
       high  |   .7044092    .029537    23.85   0.000     .6465178    .7623006
             |
         age |   .1559657   .0023865    65.35   0.000     .1512883     .160643
        age2 |  -.0011749   .0000244   -48.24   0.000    -.0012226   -.0011271
             |
      female |
  1. female  |   .0241919   .0159423     1.52   0.129    -.0070545    .0554383
      gini_c |   .4381092   1.774624     0.25   0.805    -3.040089    3.916308
     gdppc_c |  -.0118406   .0073833    -1.60   0.109    -.0263116    .0026304
   fhdemoc_c |  -.0820563    .056478    -1.45   0.146    -.1927511    .0286386
             |
   comp_vote |
        yes  |    .661497   .2171799     3.05   0.002     .2358322    1.087162
       _cons |  -3.681004   .1235347   -29.80   0.000    -3.923128   -3.438881
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
cntry2: Independent          |
               var(incqui~h) |    .014761   .0030403      .0098581    .0221024
                  var(_cons) |   .5628691   .1011927      .3957111    .8006385
------------------------------------------------------------------------------
LR test vs. logistic model: chi2(2) = 5693.94             Prob > chi2 = 0.0000

Note: LR test is conservative and provided only for reference.
Note: Log-likelihood calculations are based on the Laplacian approximation.

In model 2 I add three more explanatory variables, and I would then like to compare the effect of relative income in model 1 to its effect in model 2. First, please take a look at the code and output for model 2:


Code:
. meqrlogit       voting_h        incquin ///
>                         dissat_h trstpol_h interest_h ///
>                         i.educ_h age age2 i.female ///
>                         gini_c gdppc_c fhdemoc_c i.comp_vote    ///
>                         || cntry2: incquin , laplace

Refining starting values:

Iteration 0:   log likelihood = -46422.943  (not concave)
Iteration 1:   log likelihood = -46373.814  (not concave)
Iteration 2:   log likelihood = -46337.772  

Performing gradient-based optimization:

Iteration 0:   log likelihood = -46337.772  (not concave)
Iteration 1:   log likelihood = -46303.366  (not concave)
Iteration 2:   log likelihood = -46275.975  
Iteration 3:   log likelihood = -46229.955  
Iteration 4:   log likelihood =  -46176.96  
Iteration 5:   log likelihood = -46176.217  
Iteration 6:   log likelihood = -46176.214  

Mixed-effects logistic regression               Number of obs     =    101,281
Group variable: cntry2                          Number of groups  =         71

                                                Obs per group:
                                                              min =        326
                                                              avg =    1,426.5
                                                              max =      2,992

Integration points =   1                        Wald chi2(13)     =   11051.25
Log likelihood = -46176.214                     Prob > chi2       =     0.0000

------------------------------------------------------------------------------
    voting_h |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   incquin_h |   .0795098    .013856     5.74   0.000     .0523526    .1066671
    dissat_h |  -.0604718   .0078263    -7.73   0.000     -.075811   -.0451326
   trstpol_h |   .1157597    .011594     9.98   0.000     .0930359    .1384834
  interest_h |   .5438414   .0140806    38.62   0.000      .516244    .5714388
             |
      educ_h |
     middle  |    .109462   .0260258     4.21   0.000     .0584524    .1604716
       high  |    .521709   .0311397    16.75   0.000     .4606762    .5827417
             |
         age |   .1604742   .0025156    63.79   0.000     .1555437    .1654047
        age2 |   -.001246   .0000258   -48.27   0.000    -.0012966   -.0011954
             |
      female |
  1. female  |   .1046391   .0167227     6.26   0.000     .0718631    .1374151
      gini_c |  -.0397291   1.562621    -0.03   0.980    -3.102411    3.022953
     gdppc_c |  -.0130286   .0065216    -2.00   0.046    -.0258107   -.0002464
   fhdemoc_c |  -.0466351   .0498051    -0.94   0.349    -.1442513    .0509811
             |
   comp_vote |
        yes  |   .6699579    .191176     3.50   0.000     .2952598    1.044656
       _cons |  -3.837195   .1180434   -32.51   0.000    -4.068556   -3.605834
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
cntry2: Independent          |
               var(incqui~h) |    .009697   .0021587      .0062683    .0150012
                  var(_cons) |   .4289071    .078116       .300148    .6129018
------------------------------------------------------------------------------
LR test vs. logistic model: chi2(2) = 4124.09             Prob > chi2 = 0.0000

Note: LR test is conservative and provided only for reference.
Note: Log-likelihood calculations are based on the Laplacian approximation.
So, again, my question is: how can I test whether the effect of incquin_h in model 1 (b =.0965892; se=.0161354) is significantly different from the effect of incquin_h in model 2 (b= .0795098; se = .013856)?

The only thing I could find so far was to use -suest- for these kinds of comparisons. However, that is not possible here, since -suest- requires that -predict- allow the score option, which is not the case for the models presented.
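(One crude back-of-the-envelope comparison, shown only as a sketch: a Wald-type z statistic that treats the two estimates as independent, which they are not since the models share data, so this gives at best a rough indication:)

Code:
scalar b1  = .0965892
scalar se1 = .0161354
scalar b2  = .0795098
scalar se2 = .013856
scalar z = (b1 - b2) / sqrt(se1^2 + se2^2)
display "z = " z ",  two-sided p = " 2*normal(-abs(z))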

Thanks in advance for considering this post, and please let me know if I should clarify something.

Which tests after -heckman- and -outreg2-?

Dear all,

I regress characteristics of a firm's offers on client and firm characteristics. Since the firm does not make an offer to every client in the first place, I think I need to control for selection into receiving an offer, so I have run a Heckman specification where I add to the selection equation 2 additional variables that can be argued to affect who receives an offer at all, but not the offer characteristics. I then export the results from Stata using Roy Wada's -outreg2-.

Now I'm trying to understand which tests, beyond verbally explaining my exclusion restriction, I need to provide to show how sensible this Heckman specification is, and how to implement these in Stata. I feel a bit strange to ask here, but I'm afraid the answer was not sufficiently clear to me after reading the "heckman" help file. Therefore I would be extremely grateful for any help here.

From the results Stata stores after -heckman- in e(), I can add -e(lambda)- as an option to the -outreg2- output, but strangely I cannot add e(lambdase), e(p), e(rho), or e(sigma). Should I use any of these as well, and if so, how do I best export them with my regression tables to Excel?
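For the export mechanics, -outreg2- has an addstat() option that appends named e() scalars to the exported table; a sketch with hypothetical variable and file names:

Code:
* hypothetical names throughout
heckman offer_char x1 x2, select(got_offer = x1 x2 z1 z2)
outreg2 using results.xls, replace ///
    addstat("lambda", e(lambda), "rho", e(rho), "sigma", e(sigma))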

Thank you so much and apologies if you find my question in any way weird,
PM

Difference-in-differences coding/syntax


Hi,

Just a quick question about Stata code. I'm running a difference-in-differences analysis and I have two models: in one the treatment is whether a woman has given birth to a son, and in the other it is whether a woman has had any additional children. I have two dummy variables: one for whether a woman has given birth, the other for whether she has a son. I also have a variable that counts the total number of children a woman has. To generate the variables for the first model I run the following code:

Code:
**Time period= 2006 pre 2012 post**
***treatment= has a child and that child is a boy***
 
 
*** Difference in Difference estimations with individuals ***
 
*Generate a dummy variable to indicate when the treatment started:2006*
 
gen time=(round>2006) &!missing(round)
 
*Generate a dummy variable to identify the group exposed to treatment*
 
gen treated =(birthstat==1 & sondum2==0)
* Generate an interaction term between time and treated
 
gen did=time*treated
For the second model, where the treatment is whether a woman has had any additional children, I am not quite sure how to generate my treatment variable (total children plus any number of additional children).

Any help with the code is greatly appreciated. Thanks.




I have trouble running countfit since I switched from Stata 14 to 15

I have trouble running countfit since I switched from Stata 14 to 15 two weeks ago.

When I now run a do-file that worked perfectly on 3 July 2018, I get the following error message:

countfit does not work with weights.
graph Graph not found

I did NOT specify weights, nor did I change the syntax for saving the graph.

I have tried removing the ado-file and reinstalling it, but that doesn't solve the problem.

Here's a simplified example of the syntax:

countfit depvar indepvar controls, forcevuong replace inflate(indepvar controls)
graph save Graph "C:\blablabla", replace;




Robustness checks and interpretation

Hello guys,

I'm doing robustness checks for my logit regression model. For example, I run the same regression with modified variables (without year/firm fixed effects, with unwinsorized variables, etc.). My problem is that the significance level increases for the model that doesn't include fixed effects and also for the model with unwinsorized variables. This seems very odd to me, since the fixed effects and winsorization should improve my results. Do you have any idea why this is the case?

Kind regards
Leo

Structural break date

Dear all Statalist members,

May I know whether the estimated break date produced by -estat sbsingle- denotes the last date of a regime or the first date of the next regime? I tried to find this information in the manual, but it doesn't seem to be discussed (maybe I overlooked it). In EViews, the break dates given are the first date of the next regime. Also, does the -estat sbsingle- command apply the structural break test of Andrews (1993)?

Thanks,
Janys

Doubts about conditional and missing values

Hello everyone, I am a beginner and I have questions about missing values. In this example, I do not understand why the condition in the third line guards against missing values, when the first line, per the instructions, already assigned the missing values to level 1.

"We wanted to split people into 3 achievement levels to best address each group's needs.
Generate a variable that divides the students into 3 tracked groups of different levels (call the variable level), those below 25, those above 80 and those in the middle. Assign the missings to the lowest level just for the purposes of this exercise."

gen level = 1 if math < 25 | math == .
replace level = 2 if math >=25 & math < 80
replace level = 3 if math >= 80 & math != .

Thanks

Create a panel data from 2 waves

Dear all,

I am trying to create a panel data set from 2 waves (2010 and 2012) of a household survey, but I could not do it properly. Below are my example data and attempts. Any suggestion is greatly appreciated.

In both data sets, pro (province); dis (district); comm (commune); EA (enumeration area); hhid (household ID); and idmem (ID member) uniquely identify observations.

In the 2012 data set, pro2010, dis2010, comm2010, EA2010, and hhid2010 identify the households that were also observed in 2010.

Data in 2012
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input byte pro int dis long comm int EA byte hhid float idmem int age byte(gender pro2010) int dis2010 long comm2010 int EA2010 byte hhid2010 int age2010 byte gender2010
1 1   4  8 13 1 77 1 0 .   .  .  .  . .
1 1   4  8 13 2 70 2 0 .   .  .  .  . .
1 1   7 22 15 2 60 1 1 1   7 22 15 58 1
1 1  16  3 15 1 68 2 0 .   .  .  .  . .
1 1  22 19 13 1 71 2 1 1  22 19 13 69 2
1 1  28 25 14 1 60 2 0 .   .  .  .  . .
1 1  34 25 13 1 70 1 1 1  34 25 13 67 1
1 1  34 25 13 2 67 2 1 1  34 25 13 65 2
1 2  40 16 13 2 62 1 0 .   .  .  .  . .
1 2  40 16 14 1 78 2 0 .   .  .  .  . .
1 2  55 11 13 1 70 1 1 2  55 11 13 68 1
1 2  55 11 13 2 70 2 1 2  55 11 13 68 2
1 2  55 11 15 1 61 1 1 2  55 11 15 59 1
1 2  55 11 15 2 60 2 1 2  55 11 15 58 2
1 2  67 16 14 4 62 1 0 .   .  .  .  . .
1 2  67 16 14 5 66 2 0 .   .  .  .  . .
1 3  91  6 15 1 68 1 0 .   .  .  .  . .
1 3  91  6 15 2 63 2 0 .   .  .  .  . .
1 3 106 13 15 1 82 2 1 3 106 13 15 80 2
1 3 112 17 15 1 72 2 0 .   .  .  .  . .
1 4 118 39 15 1 84 1 1 4 118 39 15 82 1
1 4 118 39 15 2 82 2 1 4 118 39 15 80 2
1 4 124 22 13 1 88 2 0 .   .  .  .  . .
1 4 124 22 13 2 86 1 0 .   .  .  .  . .
1 4 124 22 15 1 81 2 0 .   .  .  .  . .
1 4 124 22 19 1 70 2 0 .   .  .  .  . .
1 4 133 30 14 5 81 2 1 4 133 30 14  . .
1 4 133 30 15 1 60 1 1 4 133 30 15 58 1
1 4 139 30 15 1 65 1 0 .   .  .  .  . .
1 4 148  6 15 1 62 1 1 4 148  6 15 60 1
end
label values gender gender
label def gender 1 "Male", modify
label def gender 2 "Female", modify
label values gender2010 gender2010
label def gender2010 1 "Male", modify
label def gender2010 2 "Female", modify
Data 2010
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input int(pro dis) float comm int(EA hhid) byte idmem int age byte gender
1 1   4 12 13 1 60 2
1 1   4 12 14 1 61 1
1 1   4 12 15 1 69 1
1 1   4 12 15 2 66 2
1 1  22 19 13 1 69 2
1 1  28 20 13 2 60 1
1 1  28 20 15 1 68 2
1 1  28 20 15 2 72 1
1 1  28 20 19 2 60 1
1 1  34 25 13 1 67 1
1 1  34 25 13 2 65 2
1 2  40  6 14 1 64 2
1 2  40  6 14 2 64 1
1 2  55 11 13 1 68 1
1 2  55 11 13 2 68 2
1 2  67 23 15 1 90 1
1 2  67 23 15 2 92 2
1 3  91  6 13 1 64 1
1 3  91  6 13 2 62 2
1 3  91  6 14 1 66 2
1 3  91  6 14 2 67 1
1 3 106 13 15 1 80 2
1 3 112 20 15 5 90 2
1 3 112 20 20 1 60 1
1 4 118 39 15 1 81 1
1 4 118 39 15 2 80 2
1 4 124 39 15 1 63 2
1 4 124 39 15 2 71 1
1 4 139 50 13 4 66 2
1 4 139 50 19 1 71 2
end
label values gender gender
label def gender 1 "Male", modify
label def gender 2 "Female", modify
My first attempt is as follows, but unfortunately an error message arises saying the observations are not uniquely identified, even when I try -merge- m:1 or 1:m:
Code:
use data2012, clear
    ren pro pro2012
    ren dis dis2012  
    ren comm comm2012  
    ren EA EA2012  
    ren hhid hhid2012

    ren pro2010 pro
    ren dis2010 dis
    ren comm2010 comm
    ren EA2010 EA
    ren hhid2010 hhid
    sort pro dis comm EA hhid
merge 1:1 pro dis comm EA hhid using data2010
My second attempt, though not an efficient way, is to rename age (e.g., age10 and age12) and gender (e.g., gen10 and gen12) so that they have different names in the two data sets, and then use the following commands:

Code:
use data2012
sort pro dis comm EA hhid idmem
merge 1:1 pro dis comm EA hhid idmem using data2010
This code worked, but I do not think I can do the renaming manually with hundreds of variables. Moreover, when I tried to use the -reshape- command to make a panel, the results did not show age and gender for each year.
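One way to automate the renaming is with a loop. This is a sketch only, assuming the two files are saved as data2010.dta and data2012.dta and that age and gender stand in for the full list of time-varying variables:

Code:
* drop the duplicate 2010 columns already carried in the 2012 file,
* suffix the time-varying variables by wave (loop instead of renaming
* each by hand), merge on the identifiers, then reshape to long form
use data2010, clear
foreach v of varlist age gender {
    rename `v' `v'2010
}
tempfile w2010
save `w2010'

use data2012, clear
drop pro2010 dis2010 comm2010 EA2010 hhid2010 age2010 gender2010
foreach v of varlist age gender {
    rename `v' `v'2012
}
merge 1:1 pro dis comm EA hhid idmem using `w2010'
drop _merge

* one row per person-year, with age and gender varying by year
reshape long age gender, i(pro dis comm EA hhid idmem) j(year)

With hundreds of time-varying variables, the foreach varlist can be built once (e.g. from -ds-) rather than typed out, which avoids the manual renaming.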

Thank you.

DL

Difficulty formatting Forest plot

Hello!

I am trying to create some forest plots in Stata, but am having difficulty with the formatting!

I have tried various solutions, but whenever I use commands such as xlabel() I get various error messages...

I have attached an image of what I have achieved so far. Ideally I would like to get rid of the empty space to the right of the plot, add x-axis labels, and add favours(polypill # comparator). It would also be great if I could add totals in bold for the polypill and comparator Ns, along with additional heterogeneity measures, and perhaps have the overall effect marker filled?

The code that I have tried includes:
Code:
admetan total mean sd var7 var5 var6, nostandard random lcols(var1 var19 total var20 var7) boxsca(60) xsize(18) ysize(15) nowt xlabel(0.5, 1, 2, 5, 10, 50)

Any help would be much much appreciated!

Thanks!!

[attached image: forest plot achieved so far]

Setting up 2 x 2 factorial in Stata

Dear Stata users,

I am planning to analyse 2 x 2 factorial data with a binary outcome using Stata and am wondering if anyone can share ideas on ways of achieving this.

There is usually a difference between theory and application. If anyone has suggested readings that demonstrate this in Stata, that would also be appreciated. I haven't had any success with lots of Google searches.
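A minimal sketch of one common setup, assuming hypothetical variables y (binary outcome) and a and b (the two 0/1 factors); factor-variable notation maps directly onto the 2 x 2 factorial:

Code:
* i.a##i.b expands to both main effects plus the a*b interaction,
* so the model covers all four factorial cells
logistic y i.a##i.b

* predicted probabilities for each of the four cells
margins a#b

The coefficient on the interaction term tests whether the effect of one factor differs across levels of the other, which is usually the quantity of interest in a factorial design.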

Thanks

calculating household size

Hi,

Can anyone help me calculate household size by household ID (not the repeated hhsize per repeated hhid as below) from the data example below?
Here 'hhid' means 'household ID' and 'pid' means 'person ID'.
Code:
clear
input float hhsize double hhid float pid
4 1 1
4 1 1
4 1 2
4 1 3
4 1 4
3 2 1
3 2 2
3 2 3
4 3 1
4 3 2
4 3 3
end

Thank you so much, Rumana
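A minimal sketch of one way to do this, assuming the duplicated rows (e.g., hhid 1, pid 1 appears twice) should not be double-counted, so household size is taken as the number of distinct pid values per hhid:

Code:
* flag the first row for each (hhid, pid) pair, then count the flags
* within each household to get the number of distinct members
bysort hhid pid: gen byte first = (_n == 1)
bysort hhid: egen hhsize2 = total(first)

* if instead the goal is one row per household with its hhsize:
* collapse (first) hhsize, by(hhid)

Note that the sketch's distinct-member count need not match the existing hhsize variable (e.g., hhid 3 lists only three members but hhsize = 4), so which of the two is wanted depends on how hhsize was originally defined.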