Channel: Statalist

add new observation in empty dataset

Dear Stata users, I have a quick question. I have a dataset with 10 variables but no observations, and I would like to add a new observation to this empty dataset. I thought the following would work:

Code:
sysuse auto, clear
drop in 1/74
input outcome
1
end

However, I get an error message saying that 1 is not a valid command. Can anyone help me with this? Is there a better way to do this? Thank you very much. Really appreciated. F
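For what it's worth, a sketch of one common way to do this (my own suggestion, not from the post above) is to use set obs instead of input, since input builds data line by line while set obs simply grows the dataset:

```stata
* Sketch: add one observation to a dataset that has variables but no observations
sysuse auto, clear
drop in 1/74            // all observations gone, the variables remain

set obs 1               // create one observation (existing variables are missing)
generate outcome = 1    // add the new variable with its value
```

If outcome were already one of the 10 existing variables, replace would be used instead of generate.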

Using xi to create design matrix of dummies / alternatives to xi3 and desmat?

Dear community,

I'm looking for a way to create a design matrix for a model with categorical variables, using Stata's factor notation as input. Basically I want to do what the no-longer-available xi3 prefix command did: create dummy variables for every category of each variable and for each interaction.

There is the user-written command desmat, which does basically exactly what I want, but I can't use it, because I'm working on code for my own ado-file and desmat would have to be installed separately before it could be used.

Here's an example of what I want to do that uses desmat:
Code:
clear

input A B C f
1 1 1 23
1 1 2 34
1 2 1 31
1 2 2 49
1 3 1 71
1 3 2 19
2 1 1 78
2 1 2 93
2 2 1 16
2 2 2 48
2 3 1 12
2 3 2 93
end

desmat A*B*C

list, noobs clean

mkmat _x_1-_x_11, mat(design)

mat list design
The result looks like this:
Code:
      _x_1   _x_2   _x_3   _x_4   _x_5   _x_6   _x_7   _x_8   _x_9  _x_10  _x_11
 r1      0      0      0      0      0      0      0      0      0      0      0
 r2      0      0      0      0      0      1      0      0      0      0      0
 r3      0      1      0      0      0      0      0      0      0      0      0
 r4      0      1      0      0      0      1      0      1      0      0      0
 r5      0      0      1      0      0      0      0      0      0      0      0
 r6      0      0      1      0      0      1      0      0      1      0      0
 r7      1      0      0      0      0      0      0      0      0      0      0
 r8      1      0      0      0      0      1      1      0      0      0      0
 r9      1      1      0      1      0      0      0      0      0      0      0
r10      1      1      0      1      0      1      1      1      0      1      0
r11      1      0      1      0      1      0      0      0      0      0      0
r12      1      0      1      0      1      1      1      0      1      0      1
I also know that there is the xi prefix, but it doesn't allow interactions of more than two variables.
Does anyone have a suggestion? How can I use xi to create results like that (without inputting the variable names by hand)?
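For reference, an untested sketch of one possible alternative: Stata's built-in fvrevar accepts three-way factor interactions and materializes each column of the expanded design as a (temporary) variable, so after entering the data above one might try:

```stata
* Sketch: expand i.A##i.B##i.C and collect the generated columns in a matrix
fvrevar i.A##i.B##i.C           // creates temp variables; names in r(varlist)
mkmat `r(varlist)', mat(design)
mat list design
* Note: unlike desmat, this keeps base-level (omitted-category) columns,
* so the matrix will have more columns than the desmat output shown above.
```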

Best regards,
Max Hörl

Stata Journal cumulative index

All articles in the Stata Journal are indexed in RePEc services such as IDEAS (http://ideas.repec.org) and EconPapers (http://econpapers.repec.org). If authors are registered with RePEc (which is free) they may "claim" their articles in the RePEc Author Service, http://authors.repec.org, and they will appear on their RePEc CV.

Every year at this time, I produce a cumulative Author Index to the Stata Journal, which has now completed 15 volumes (2001-2015). That index only appears online, and may be accessed as a PDF from https://ideas.repec.org/a/tsj/stataj...4cumindex.html or the equivalent in EconPapers.

This year to date, 176 packages were contributed to the Statistical Software Components (SSC) archive which I maintain. You may use -ssc new- to view the last month's additions and updates, and RePEc services' listings for an overview of what is available. However, you should always use the -ssc- command to download and install items from the SSC archive, and -adoupdate- to check their status.

Best wishes of the season.
Kit Baum

Panel data with differing variables

Hi there,

I have 4 time periods of panel data. I want to conduct a panel logistic regression in Stata. Most variables in the model are included in all 4 time periods, but a handful of variables were added in the last time period and only appear there.

How would I conduct this panel analysis including both the variables that appear across all waves and the variables that are included only in that last wave? (My research question pertains to how the variables that appear in only one wave affect the rest of the panel data.)

Is this possible to do? If not, is it better to treat this as a repeated cross-section, in which I fit one model for each of the 4 time periods (and then add the unique variables to the final cross-section model)?

I should mention that these additional variables in the last wave are not static (like race or sex), but are opinion variables that would have the possibility of changing in other waves.

Hausman Test.

Dear Statalist

Why is it that the hausman test cannot be used after estimation with vce(robust) or vce(cluster clustvar)? What is the solution for that?

Thank you

Two stage regression with complex survey data

For the forum:
How would I set up a two-stage regression for complex survey data in Stata? The outcome variable is either zero or a continuous, normally distributed variable, and the covariates are a mixture of continuous and categorical variables. Also, is there a follow-up routine to obtain the odds ratios for the zero-inflated part and the means for the continuous outcome?
Thank you.
Margaret

Summing a variable for each group

I am having some trouble doing a simple sum in my data. I tried reshaping the data to wide, but it did not work.


My data looks something like this:
District Education Education_code N
1 Primary degree 1 117
1 Professional qualification (degree status) 2 65
1 Both degree and professional qualification 3 53
1 Post-graduate certificate or diploma 4 67
1 Post-graduate degree (masters) 5 68
1 Doctorate (Ph.D) 6 12
1 Total 7 172
2 Primary degree 1 3177
2 Professional qualification (degree status) 2 10
2 Both degree and professional qualification 3 8
2 Post-graduate certificate or diploma 4 20
2 Post-graduate degree (masters) 5 1
2 Doctorate (Ph.D) 6 54
2 Total 7 1040
3 Primary degree 1 2
3 Professional qualification (degree status) 2 10
3 Both degree and professional qualification 3 3
3 Post-graduate certificate or diploma 4 4
3 Post-graduate degree (masters) 5 1
3 Doctorate (Ph.D) 6 14
3 Total 7 355


Basically, I want to

1) Sum N over Education_codes 1, 2, 3, 4, 5 & 6, for each district.
2) Divide this sum by the N for Education_code == 7, for each district.

Any help is greatly appreciated, thank you.
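In case a sketch helps (assuming the variable names in the listing above; adjust them to your dataset):

```stata
* Sketch: within each district, sum N over codes 1-6, then divide by the
* N recorded in the code-7 ("Total") row
bysort District: egen sum_1to6 = total(N * inrange(Education_code, 1, 6))
bysort District: egen n_total  = total(N * (Education_code == 7))
generate ratio = sum_1to6 / n_total
```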



Need help: xtabond with Difference GMM and System GMM in Panel Data

Hello everyone,

I'm trying to use Stata 13 to estimate a dynamic panel-data model with difference GMM and system GMM. The first-difference equations are: Array


Where:
Li,t is the dependent variable (here, leverage)
Xi,t-1 is the matrix of determinants of the dependent variable (including prft, tang, growth, and size)

Now, in the difference GMM (1991), I want to use instruments including: Array


and in the System GMM (1998), the instruments I want to use in the first difference equations are Array


and for the level equations are: Array



My attempt is below:

1. The Difference GMM (1991) estimation:
Code:
. xtabond leverage lagleverage lagprft lagsize laggrowth lagtang, lags(2) twostep vce(robust) artests(2) small
small is a deprecated option
note: lagleverage dropped because of collinearity

Arellano-Bond dynamic panel-data estimation  Number of obs         =        33
Group variable: id                           Number of groups      =        11
Time variable: t
                                             Obs per group:    min =         3
                                                               avg =         3
                                                               max =         3

Number of instruments =     15               Wald chi2(6)          =    109.04
                                             Prob > chi2           =    0.0000
Two-step results
                                     (Std. Err. adjusted for clustering on id)
------------------------------------------------------------------------------
             |              WC-Robust
    leverage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    leverage |
         L1. |  -.0022249   .1895428    -0.01   0.991     -.373722    .3692722
         L2. |   .1227548   .4008548     0.31   0.759    -.6629062    .9084158
             |
     lagprft |   .1286908   .0810308     1.59   0.112    -.0301267    .2875083
     lagsize |   .1109781   .0509849     2.18   0.030     .0110495    .2109066
   laggrowth |   .0180581    .011342     1.59   0.111    -.0041719     .040288
     lagtang |  -.5714003   .3501781    -1.63   0.103    -1.257737    .1149362
       _cons |  -1.062468   .9021153    -1.18   0.239    -2.830581    .7056455
------------------------------------------------------------------------------
Instruments for differenced equation
        GMM-type: L(2/.).leverage
        Standard: D.lagleverage D.lagprft D.lagsize D.laggrowth D.lagtang
Instruments for level equation
        Standard: _cons
2. The System GMM (1998) estimation:
Code:
. xtdpdsys leverage lagprft lagtang lagsize laggrowth, lags(2) twostep vce(robust) artests(2)

System dynamic panel-data estimation         Number of obs         =        44
Group variable: id                           Number of groups      =        11
Time variable: t
                                             Obs per group:    min =         4
                                                               avg =         4
                                                               max =         4

Number of instruments =     18               Wald chi2(6)          =   1236.16
                                             Prob > chi2           =    0.0000
Two-step results
------------------------------------------------------------------------------
             |              WC-Robust
    leverage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    leverage |
         L1. |   .1122024   .2660097     0.42   0.673     -.409167    .6335719
         L2. |    .232506   .1716271     1.35   0.176     -.103877     .568889
             |
     lagprft |   .2078303   .1267164     1.64   0.101    -.0405292    .4561899
     lagtang |  -.4553018   .0780344    -5.83   0.000    -.6082465   -.3023571
     lagsize |   .1162271   .0578604     2.01   0.045     .0028228    .2296314
   laggrowth |   .0109966   .0234193     0.47   0.639    -.0349044    .0568975
       _cons |  -1.304568   .8091934    -1.61   0.107    -2.890558    .2814224
------------------------------------------------------------------------------
Instruments for differenced equation
        GMM-type: L(2/.).leverage
        Standard: D.lagprft D.lagtang D.lagsize D.laggrowth
Instruments for level equation
        GMM-type: LD.leverage
        Standard: _cons
Also, I have tried to use the xtabond2 command and I got this:
Code:
. xtabond2 leverage lagleverage lagprft lagsize lagtang laggrowth, noleveleq two robust small gmm( leverage lagprft lagsize lagtang laggrowth, lag(2 2))
Favoring space over speed. To switch, type or click on mata: mata set matafavor speed, perm.
Warning: Number of instruments may be large relative to number of observations.
Warning: Two-step estimated covariance matrix of moments is singular.
  Using a generalized inverse to calculate optimal weighting matrix for two-step estimation.
  Difference-in-Sargan statistics may be negative.

Dynamic panel-data estimation, two-step difference GMM
------------------------------------------------------------------------------
Group variable: id                              Number of obs      =        55
Time variable : t                               Number of groups   =        11
Number of instruments = 20                      Obs per group: min =         5
F(5, 11)      =      1.43                                      avg =      5.00
Prob > F      =     0.288                                      max =         5
------------------------------------------------------------------------------
             |              Corrected
    leverage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
 lagleverage |  -.0217873   .3974929    -0.05   0.957    -.8966633    .8530887
     lagprft |   .0331341   .2335589     0.14   0.890    -.4809257    .5471938
     lagsize |   .0807723    .066558     1.21   0.250    -.0657209    .2272654
     lagtang |  -.4597916   .5534276    -0.83   0.424    -1.677877    .7582943
   laggrowth |   .0237899   .0384265     0.62   0.548    -.0607863     .108366
------------------------------------------------------------------------------
Instruments for first differences equation
  GMM-type (missing=0, separate instruments for each period unless collapsed)
    L2.(leverage lagprft lagsize lagtang laggrowth)
------------------------------------------------------------------------------
Arellano-Bond test for AR(1) in first differences: z =  -0.91  Pr > z =  0.365
Arellano-Bond test for AR(2) in first differences: z =   0.45  Pr > z =  0.650
------------------------------------------------------------------------------
Sargan test of overid. restrictions: chi2(15)   =  20.23  Prob > chi2 =  0.163
  (Not robust, but not weakened by many instruments.)
Hansen test of overid. restrictions: chi2(15)   =   9.11  Prob > chi2 =  0.872
  (Robust, but can be weakened by many instruments.)
I know there are several mistakes in my commands, but I just could not figure them out and make the estimation work the way I stated earlier.

Here, the p-values are quite large, and the number of instruments is larger than the number of groups. (I know my sample is too small, but sadly I could not change it.) I hope you can help me fix the commands so that I get significant results.

Also, I am a new Stata user, so please forgive me if I have made any foolish mistakes and feel free to let me know.

Thank you in advance! I really do hope to hear from you soon!

non normal residuals in vecm??

Hi, I tried to estimate a vector error-correction model, but unfortunately, when I ran some postestimation diagnostic tests such as the Jarque-Bera test, it turned out that my errors are not normally distributed. Does anyone know how to deal with this? I assume that if the errors aren't normally distributed, inference from my regression results is invalid. Also, my errors are skewed, and the null of the kurtosis test was rejected as well.

Thank you for any suggestions as to how to handle non-normally distributed residuals in a VEC model.

Adjusted R² in Random Effects Model

Hello,

I'm running regressions with the random-effects model, but I can't find the command for the adjusted R-squared.
Is there a command for the adjusted R², or for a similar measure that is comparable to the adjusted R²?

Thank you

Moran test with the command spautoc

Dear statalists

I am trying to perform the Moran test of spatial autocorrelation, but it does not work. I get the error message "varlist not allowed" when I issue the command:

spautoc region nei, weight(x)


where region is a numeric identifier variable,

nei is a string variable (str244) containing neighborhood information such as "5 7 13 15 16 17",

and x is a weight variable, also a string (str244), containing entries such as ".28381 .28381 .28381 .28381 .28381 .28381".

Any help will be very much appreciated!

Many thanks in advance!

Treatment effects with survey data

Hello,

I am working with a stratified sample that has sampling (inverse-probability) weights, which I've set up using svyset.
I am interested in testing a categorical outcome based on a categorical variable which conceptually could be thought of as a "treatment."
Does it make sense to use the Stata teffects command in this case, or does it make more sense not to mix the two paradigms (i.e., the teffects paradigm with the survey strata/cluster/weights specification)?
Does it make more sense just to do the conventional logistic regression with the strata/cluster/weights all set up per the svyset command?
Appreciate any feedback,
John L.

stset multiple failure data

Dear List
I have asked this question in different ways before, but it turns out I'm not sure what I am doing right or wrong. I think my problem is how I stset my multiple-failure data.

i have the following variables:
id
age_at_start_obs
age_at_end_of_obs
age_at_event
event


I want to stset my data so that I can calculate the incidence rates per age band, using, e.g., stptime, or by splitting the data using stsplit.

A dummy dataset
Code:
input id age_at_start_obs age_at_end_of_obs age_at_event event
1 50 60 55 1
1 50 60 56 1
1 50 60 59 1
2 40 45 . 0
2 40 45 44 1
3 75 80 . 0
3 75 80 76 1
3 75 80 77 1
end

I am not sure how to best stset these data to achieve my goal.
I have tried the following

Code:
replace age_at_event=age_at_end_of_obs if age_at_event==.
stset age_at_event, id(id) fail(event) exit(age_at_end_of_obs) enter(age_at_start_obs)
stptime, at(35(1)81)
And this seems to give me the results that I want, but I am still wondering:
Is this the correct way to do it?

But when I stsplit my data:
Code:
stsplit years, every(1)
replace event=0 if event==.
tab years event
For id==1, I would expect this individual to have 10 lines in the split dataset, but there are only 9 (age_at_event = 50-59).
Could anyone try to explain why that is?

Hope you can look through it and see if I'm off by a mile.

Thank you
Lars

Need help: Calculating marginal effect of interaction terms after -ivprobit-

Hi everyone,

I need to determine the marginal effects of interaction terms after the -ivprobit- command. Here is my code:


* ivprobit fminorsucc (csr1 csr1_capint2 csr1_bBindex1 csr1_volt = csr1iv2 c.csr1iv2#c.capint2 c.csr1iv2#c.bBindex1 c.csr1iv2#c.volt) capint2 bBindex1 volt i.indid ydum10-ydum18 dismissal outsider0 pceofminor2 interim ceopay1seq madjemp avaslack dratio eBindex lambda hsstatus2 madjfirmage madjroe , vce(cluster compid) nolog


The endogenous variable is csr1. The instrumental variable is csr1iv2. I also include interaction terms: csr1_capint2, csr1_bBindex1, and csr1_volt. My issue is that -ivprobit- does not allow me to use interaction operators on the endogenous variable list.

Consequently, I have the following questions:

1) Would the -margins- command still provide accurate marginal effects in this case? If not, how should I address this issue?

2) I have also tried the approach specified by Wiersema & Bowen (2009), which provides code to calculate the true interaction effects using the formula in Ai & Norton (2003). However, after -ivprobit-, the standard errors are not accurate. Specifically, I tried to calculate the marginal effects of csr1 in the model without interaction terms using both the Wiersema & Bowen (2009) approach and the -margins- command. The marginal effect is the same, but the Wiersema & Bowen (2009) approach gives me insignificant results, while the -margins- command gives me significant results. Is there any other way to calculate the true interaction effect after -ivprobit-?

Thank you very much for your consideration and guidance.

References:

Ai, C., & Norton, E. C. (2003). Interaction terms in logit and probit models. Economics letters, 80(1), 123-129.
Wiersema, M. F., & Bowen, H. P. (2009). The use of limited dependent variable techniques in strategy research: Issues and methods. Strategic Management Journal, 30(6), 679-692.




Convert 'accessibility' driving time to logarithmic variable, but it contains zeros

Hello everyone,

Currently I am running a convergence regression. I want to include a variable on the accessibility of a region. The data I have contain several variables on accessibility, all of which are driving times in minutes by car to the nearest highway access, train station, airport, or to several measures of agglomeration centers. However, some values are zero, indicating that there is on average no driving time.

The model I have is a log-log model, so except for the dummy variables, all variables have been converted to logarithms. However, when I transform the accessibility variables that contain a value of 0, this of course generates missing values.

My question is, what should I do?
A. Nothing and leave the 'missing values' out, but that would mean that all the regions which are very accessible are not included in the regression.
B. Replace all the missing values for 0, thus "replace lnACC_MAJOR = 0 if (lnACC_MAJOR == .)", however that does not seem like a good practice to me.
C. I should not convert the accessibility variables to logarithms.
D. Other, .....

Additionally, I want to include commuter flows; specifically, the net flow, which is negative for certain regions (higher outflow than inflow of commuters). However, negative values cannot be converted to logs. What should I do here?
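One transformation sometimes used when zeros or negative values rule out a plain log (a suggestion on my part, with hypothetical variable names) is the inverse hyperbolic sine, asinh(x) = ln(x + sqrt(x² + 1)), which is defined for all real x and behaves like a log for large |x|:

```stata
* Sketch: asinh() as a log-like transformation that tolerates zeros
* (accessibility) and negative values (net commuter flows);
* ACC_MAJOR and netflow are hypothetical variable names
generate ihsACC_MAJOR = asinh(ACC_MAJOR)
generate ihs_netflow  = asinh(netflow)
```

Whether the resulting coefficients can still be read as elasticities is a separate question worth checking for your model.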

Thanks in advance.

State-year fixed effects vs state specific year trends

$
0
0
Hello,

I wanted to clarify if my understanding of including interaction terms in a panel estimation model is correct. Say, we observe outcomes for individual i in group j in state s at time t along with some covariates x. Would the factor variables generated by i.state#c.year be referred to what is known as state specific year trends whereas i.state#i.year would mean state by year fixed effects?

Code:
use http://www.stata-press.com/data/r13/nlswork, clear
xtset idcode year

* assume industry code are state codes
rename ind_code state

* state specific year trends?
xtreg ln_wage i.state#c.year, fe 

* state by year fixed effects?
xtreg ln_wage i.state#i.year, fe
Thanks!

xtivreg2 and the endogeneity tests

Hi,

I am using the xtivreg2 command to estimate a FE-IV model. I would like to ask two questions which regard the endogeneity test, and the versions of it, produced by xtivreg2.

1. If I understand it correctly, when the "robust" and "cluster" options are specified, xtivreg2 calculates a version of the endogeneity test that is robust to heteroskedasticity and to serial correlation within panel groups. I would like to see the exact formula used to calculate this endogeneity test. In the documentation for "ivregress postestimation" (http://www.stata.com/manuals13/rivre...estimation.pdf), page 15 gives some information. From this information, I suspect the statistic in question is Wooldridge's (1995) score test, because the documentation states "(...) this test can be made robust to heteroskedasticity, autocorrelation, or clustering by using the appropriate robust VCE (...)". On the other hand, the documentation of ivreg2 does not specify the exact formula used. Which endogeneity test is used here? Where can I find the exact formulas for the versions of the test that are robust to heteroskedasticity, robust to serial correlation in the errors within panel groups, and robust to both at the same time?

2. The command I use to obtain the endogeneity test is "xtivreg2 dependent $model, fe endog(varone vartwo) robust cluster(panelid)" which leads to "-endog- option: Endogeneity test of endogenous regressors: 24.977 Chi-sq(2) P-val = 0.0000". I also wanted to check the result produced by the "estat endogenous" command that can be executed after the ivregress command. For this, I first execute "xtdata dependent independent instruments, i(panelid) fe clear" to obtain the differenced data so that I could execute "ivregress 2sls dependent $model, robust cluster(panelid)". Following this, I execute the "estat endogenous" command which leads to "Robust regression F(2,12814) = 12.1324 (p = 0.0000) (Adjusted for 12815 clusters in HHIDPN)".
I expected that the endog() option of the xtivreg2 command and the output of the estat endogenous command would both use the same endogeneity test and hence lead to the same test value. Apparently they use different statistics, or my approach of differencing the data and then using the ivregress command is incorrect, although I cannot really think of a reason why it would be, because I obtain the very same coefficient estimates in xtivreg2 and in ivregress after differencing. From the documentation of "estat endogenous", the formula used to calculate the endogeneity test that is robust to heteroskedasticity and serial correlation within panel groups is not clear to me. Is there a Stata reference that clearly states the statistics and formulas used by estat endogenous?


Tunga

How to generate a dummy variable with a max value

Hi,

I currently have a variable with birth order, and I would like to generate a new variable that identifies the last-born child; that is, it will have a value of 1 for the last-born child. How can I do this?

Thank you in advance.
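A sketch of the usual approach (with hypothetical variable names family_id and birth_order; adjust to your data) is to compute the within-group maximum and compare:

```stata
* Sketch: flag the highest birth order within each family as the last-born
bysort family_id: egen max_order = max(birth_order)
generate last_born = (birth_order == max_order) if !missing(birth_order)
```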

Inserting image in Stata graph

Is there a way to insert/paste an image into a Stata graph from within the Stata environment? Let me explain what I need.

I have written a program to read in some data, calculate some index numbers, and produce a set of graphs, which are then exported to other formats (TIF/EPS etc.). I need to insert an institutional logo/symbol into each graph. Currently this is being done outside Stata using graphics-editing software, working with the exported graphs. That of course adds a delay to the process, which it would be very desirable to avoid, since this is a repeated task with a tight deadline.

So I was wondering if there is some way to automate this process from within Stata, e.g., using a command to insert an external graphic into a special field within the Stata graph (in its graph/plot region). Has any other member handled this type of task, and could you offer some suggestions on how best to achieve it? Thanks & regards.

xtfmb command: How to store first-stage regression coefficients

Hi, I am using Fama-MacBeth (1973) cross-sectional regressions (the xtfmb command) and am content with my results; however, I want to be able to store the cross-sectional regression coefficients for each time period. Is there a way to extract this information while using the command, or will I need to do this manually with a loop of cross-sectional regressions over the time periods? I want to avoid the latter because the command already computes the coefficients and Newey-West standard errors automatically.
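For reference, a sketch of one way to collect the per-period coefficients without hand-writing the loop (with placeholder names y, x1, x2, and period; note that statsby replaces the data in memory, so save first) is statsby, which runs the cross-sectional regression period by period, as the first stage of Fama-MacBeth does:

```stata
* Sketch: store each period's cross-sectional regression coefficients
* in a new dataset (one row per period, one variable per coefficient)
statsby _b, by(period) clear: regress y x1 x2
```

Whether the coefficients this reproduces match xtfmb's internal first stage exactly is worth verifying on your own data.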