Dear Stata users, I have a quick question. I have a dataset with 10 variables but no observations. I would like to generate a new variable indicating that the dataset has no observations. I tried the following:
Code:
sysuse auto, clear
drop in 1/74
input outcome
1
end
However, I get an error message saying that 1 is not a valid command. Can anyone help me with this? Is there a better way to do this? Thank you very much. Really appreciated.
F
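A minimal sketch of one way to do this, assuming the goal is simply to add a single observation to a dataset that currently has none (-set obs- creates the observation before values are filled in):

```stata
* Sketch: add one observation to an empty dataset
sysuse auto, clear
drop in 1/74              // keeps the variables, leaves 0 observations
generate outcome = .      // new variable; still 0 observations
set obs 1                 // create one observation (all values missing)
replace outcome = 1       // now fill in the value
```

(-input- with a varlist can be fussy when other variables are already in memory, which may be why the `1` was read as a command; -set obs- followed by -replace- sidesteps that.)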
↧
add new observation in empty dataset
↧
Using xi to create design matrix of dummies / alternatives to xi3 and desmat?
Dear community,
I'm looking for a way to create a design matrix for a model with categorical variables, using Stata's factor notation as input. Basically, I want to do what the no-longer-available xi3 prefix command did: create dummy variables for every category of each variable and each interaction.
There is the ado-file desmat, which does basically exactly what I want; however, I can't use it because I'm working on code for my own ado, and desmat has to be installed separately before it can be used.
Here's an example of what I want to do that uses desmat:
Code:
clear
input A B C f
1 1 1 23
1 1 2 34
1 2 1 31
1 2 2 49
1 3 1 71
1 3 2 19
2 1 1 78
2 1 2 93
2 2 1 16
2 2 2 48
2 3 1 12
2 3 2 93
end
desmat A*B*C
list, noobs clean
mkmat _x_1-_x_11, mat(design)
mat list design
The result looks like this:
Code:
      _x_1  _x_2  _x_3  _x_4  _x_5  _x_6  _x_7  _x_8  _x_9  _x_10  _x_11
r1       0     0     0     0     0     0     0     0     0      0      0
r2       0     0     0     0     0     1     0     0     0      0      0
r3       0     1     0     0     0     0     0     0     0      0      0
r4       0     1     0     0     0     1     0     1     0      0      0
r5       0     0     1     0     0     0     0     0     0      0      0
r6       0     0     1     0     0     1     0     0     1      0      0
r7       1     0     0     0     0     0     0     0     0      0      0
r8       1     0     0     0     0     1     1     0     0      0      0
r9       1     1     0     1     0     0     0     0     0      0      0
r10      1     1     0     1     0     1     1     1     0      1      0
r11      1     0     1     0     1     0     0     0     0      0      0
r12      1     0     1     0     1     1     1     0     1      0      1
I also know that there is the prefix xi, but it doesn't allow for interactions of more than two variables. Does anyone have a suggestion? How can I use xi to create results like that (without entering the variable names by hand)?
Best regards,
Max Hörl
↧
↧
Stata Journal cumulative index
All articles in the Stata Journal are indexed in RePEc services such as IDEAS (http://ideas.repec.org) and EconPapers (http://econpapers.repec.org). If authors are registered with RePEc (which is free) they may "claim" their articles in the RePEc Author Service, http://authors.repec.org, and they will appear on their RePEc CV.
Every year at this time, I produce a cumulative Author Index to the Stata Journal, which has now completed 15 volumes (2001-2015). That index only appears online, and may be accessed as a PDF from https://ideas.repec.org/a/tsj/stataj...4cumindex.html or the equivalent in EconPapers.
This year to date, 176 packages were contributed to the Statistical Software Components (SSC) archive which I maintain. You may use -ssc new- to view the last month's additions and updates, and RePEc services' listings for an overview of what is available. However, you should always use the -ssc- command to download and install items from the SSC archive, and -adoupdate- to check their status.
Best wishes of the season.
Kit Baum
↧
Panel data with differing variables
Hi there,
I have 4 time periods of panel data and want to run a panel logistic regression in Stata. Most variables in the model are included in all 4 time periods, but a handful of variables were added in the last period and appear only there.
How would I conduct this panel analysis including both the variables that appear across all waves and the variables that are included only in that last wave? (My research question concerns how the variables included in only one wave affect the rest of the panel data.)
Is this possible to do? If not, is it better to treat this as a repeated cross-section, fitting one model per time period for each of the 4 periods (and adding the unique variables to the final period's cross-section model)?
I should mention that these additional variables in the last wave are not static (like race or sex) but are opinion variables that could change across waves.
↧
Hausman Test.
Dear Statalist
Why is it that the Hausman test cannot be used with vce(robust) or vce(cluster cvar)? What is the solution for that?
Thank you
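For reference, one commonly suggested alternative is the Mundlak/Wooldridge variable-addition version of the Hausman test, which does tolerate robust or clustered VCEs. A hedged sketch, where y, x1, x2, and id are hypothetical variable names:

```stata
* Variable-addition (Mundlak) alternative to -hausman-:
* add panel-level means of the time-varying regressors to the RE model
* and test their joint significance with a cluster-robust Wald test.
foreach v of varlist x1 x2 {
    bysort id: egen mean_`v' = mean(`v')
}
xtreg y x1 x2 mean_x1 mean_x2, re vce(cluster id)
test mean_x1 mean_x2    // rejection favors the fixed-effects specification
```

The user-written -xtoverid- (on SSC) offers a related overidentification test after -xtreg, re- that is also compatible with robust/clustered standard errors.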
↧
↧
Two stage regression with complex survey data
For the forum:
How would I set up a two-stage regression for complex survey data in Stata? The outcome variable is either zero or a continuous, normally distributed variable, and the covariates are a mixture of continuous and categorical variables. Also, is there a follow-up routine to obtain the odds ratios for the zero-inflated part and the means for the continuous outcome?
Thank you.
Margaret
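One possibility, sketched under the assumption that a two-part (hurdle) model is intended; y, x1, x2, and the prior svyset call are placeholders:

```stata
* Part 1: model whether the outcome is nonzero (odds ratios for the zero part)
gen byte any_y = (y > 0) if !missing(y)
svy: logit any_y x1 i.x2, or

* Part 2: model the level of the outcome among the nonzero cases;
* use subpop() rather than -if- so survey variance estimation stays correct
svy, subpop(if y > 0): regress y x1 i.x2
```

The two parts can then be summarized separately: odds ratios from the logit, adjusted means (e.g. via -margins-) from the linear part.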
↧
Summing a variable for each group
I am having some trouble doing a simple sum in my data. I tried reshaping the data to wide, but it did not work.
My data looks something like this:
District | Education | Education_code | N |
1 | Primary degree | 1 | 117 |
1 | Professional qualification (degree status) | 2 | 65 |
1 | Both degree and professional qualification | 3 | 53 |
1 | Post-graduate certificate or diploma | 4 | 67 |
1 | Post-graduate degree (masters) | 5 | 68 |
1 | Doctorate (Ph.D) | 6 | 12 |
1 | Total | 7 | 172 |
2 | Primary degree | 1 | 3177 |
2 | Professional qualification (degree status) | 2 | 10 |
2 | Both degree and professional qualification | 3 | 8 |
2 | Post-graduate certificate or diploma | 4 | 20 |
2 | Post-graduate degree (masters) | 5 | 1 |
2 | Doctorate (Ph.D) | 6 | 54 |
2 | Total | 7 | 1040 |
3 | Primary degree | 1 | 2 |
3 | Professional qualification (degree status) | 2 | 10 |
3 | Both degree and professional qualification | 3 | 3 |
3 | Post-graduate certificate or diploma | 4 | 4 |
3 | Post-graduate degree (masters) | 5 | 1 |
3 | Doctorate (Ph.D) | 6 | 14 |
3 | Total | 7 | 355 |
Basically, I want to
1) Sum Education_code's 1,2,3,4,5 & 6, for each district.
2) Divide this sum by Education_code == 7, for each district
Any help is greatly appreciated, thank you.
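If the variables are named as in the listing (District, Education_code, and N — names assumed from the table), one -egen-based sketch:

```stata
* Sum N over education codes 1-6 within each district
egen sum_1to6 = total(cond(Education_code <= 6, N, .)), by(District)
* Pick up the code-7 ("Total") value of N for each district
egen total_7 = max(cond(Education_code == 7, N, .)), by(District)
* Ratio of the two, constant within each district
gen ratio = sum_1to6 / total_7
```

The cond() inside total() returns missing for rows that should not be counted, and egen's total() ignores missings, so no reshape is needed.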
↧
Need help: xtabond with Difference GMM and System GMM in Panel Data
Hello everyone,
I'm trying to use Stata 13 to estimate a dynamic panel-data model with difference GMM and system GMM. The first-difference equations are: Array
Where:
L(i,t) is the dependent variable (here, leverage)
X(i,t-1) is the matrix of determinants of the dependent variable (including prft, tang, growth, and size)
Now, for the difference GMM (1991), I want to use instruments including: Array
and for the system GMM (1998), the instruments I want to use in the first-difference equations are: Array
and for the level equations: Array
My attempt is below:
1. The Difference GMM (1991) estimation:
Code:
. xtabond leverage lagleverage lagprft lagsize laggrowth lagtang, lags(2) twostep vce(robust) artests(2) small
small is a deprecated option
note: lagleverage dropped because of collinearity

Arellano-Bond dynamic panel-data estimation     Number of obs         =        33
Group variable: id                              Number of groups      =        11
Time variable: t
                                                Obs per group:    min =         3
                                                                  avg =         3
                                                                  max =         3

Number of instruments =     15                  Wald chi2(6)          =    109.04
                                                Prob > chi2           =    0.0000
Two-step results
                                 (Std. Err. adjusted for clustering on id)
------------------------------------------------------------------------------
             |              WC-Robust
    leverage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    leverage |
         L1. |  -.0022249   .1895428    -0.01   0.991     -.373722    .3692722
         L2. |   .1227548   .4008548     0.31   0.759    -.6629062    .9084158
             |
     lagprft |   .1286908   .0810308     1.59   0.112    -.0301267    .2875083
     lagsize |   .1109781   .0509849     2.18   0.030     .0110495    .2109066
   laggrowth |   .0180581    .011342     1.59   0.111    -.0041719     .040288
     lagtang |  -.5714003   .3501781    -1.63   0.103    -1.257737    .1149362
       _cons |  -1.062468   .9021153    -1.18   0.239    -2.830581    .7056455
------------------------------------------------------------------------------
Instruments for differenced equation
        GMM-type: L(2/.).leverage
        Standard: D.lagleverage D.lagprft D.lagsize D.laggrowth D.lagtang
Instruments for level equation
        Standard: _cons
2. The System GMM (1998) estimation:
Code:
. xtdpdsys leverage lagprft lagtang lagsize laggrowth, lags(2) twostep vce(robust) artests(2)

System dynamic panel-data estimation            Number of obs         =        44
Group variable: id                              Number of groups      =        11
Time variable: t
                                                Obs per group:    min =         4
                                                                  avg =         4
                                                                  max =         4

Number of instruments =     18                  Wald chi2(6)          =   1236.16
                                                Prob > chi2           =    0.0000
Two-step results
------------------------------------------------------------------------------
             |              WC-Robust
    leverage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    leverage |
         L1. |   .1122024   .2660097     0.42   0.673     -.409167    .6335719
         L2. |    .232506   .1716271     1.35   0.176     -.103877     .568889
             |
     lagprft |   .2078303   .1267164     1.64   0.101    -.0405292    .4561899
     lagtang |  -.4553018   .0780344    -5.83   0.000    -.6082465   -.3023571
     lagsize |   .1162271   .0578604     2.01   0.045     .0028228    .2296314
   laggrowth |   .0109966   .0234193     0.47   0.639    -.0349044    .0568975
       _cons |  -1.304568   .8091934    -1.61   0.107    -2.890558    .2814224
------------------------------------------------------------------------------
Instruments for differenced equation
        GMM-type: L(2/.).leverage
        Standard: D.lagprft D.lagtang D.lagsize D.laggrowth
Instruments for level equation
        GMM-type: LD.leverage
        Standard: _cons
Also, I have tried to use the xtabond2 command and I got this:
Code:
. xtabond2 leverage lagleverage lagprft lagsize lagtang laggrowth, noleveleq two robust small gmm(leverage lagprft lagsize lagtang laggrowth, lag(2 2))
Favoring space over speed. To switch, type or click on mata: mata set matafavor speed, perm.
Warning: Number of instruments may be large relative to number of observations.
Warning: Two-step estimated covariance matrix of moments is singular.
  Using a generalized inverse to calculate optimal weighting matrix for two-step estimation.
  Difference-in-Sargan statistics may be negative.

Dynamic panel-data estimation, two-step difference GMM
------------------------------------------------------------------------------
Group variable: id                              Number of obs      =        55
Time variable : t                               Number of groups   =        11
Number of instruments = 20                      Obs per group: min =         5
F(5, 11)      =      1.43                                      avg =      5.00
Prob > F      =     0.288                                      max =         5
------------------------------------------------------------------------------
             |              Corrected
    leverage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
 lagleverage |  -.0217873   .3974929    -0.05   0.957    -.8966633    .8530887
     lagprft |   .0331341   .2335589     0.14   0.890    -.4809257    .5471938
     lagsize |   .0807723    .066558     1.21   0.250    -.0657209    .2272654
     lagtang |  -.4597916   .5534276    -0.83   0.424    -1.677877    .7582943
   laggrowth |   .0237899   .0384265     0.62   0.548    -.0607863     .108366
------------------------------------------------------------------------------
Instruments for first differences equation
  GMM-type (missing=0, separate instruments for each period unless collapsed)
    L2.(leverage lagprft lagsize lagtang laggrowth)
------------------------------------------------------------------------------
Arellano-Bond test for AR(1) in first differences: z =  -0.91  Pr > z =  0.365
Arellano-Bond test for AR(2) in first differences: z =   0.45  Pr > z =  0.650
------------------------------------------------------------------------------
Sargan test of overid. restrictions: chi2(15)  =  20.23  Prob > chi2 =  0.163
  (Not robust, but not weakened by many instruments.)
Hansen test of overid. restrictions: chi2(15)  =   9.11  Prob > chi2 =  0.872
  (Robust, but can be weakened by many instruments.)
I know that there were several mistakes in all of my commands, but I just could not figure them out and make it right the way I want, as stated earlier.
Here, the p-values are quite large, and the number of instruments is larger than the number of groups. (I know my sample is too small, but sadly I could not change it.) I hope you can help me fix the commands to get significant results.
Also, I am a new Stata user, so please forgive me if I have made any foolish mistakes, and feel free to let me know.
Thank you in advance! I really hope to hear from you soon!
↧
Non-normal residuals in VECM?
Hi, I tried to estimate a vector error-correction model, but when I ran some postestimation diagnostics such as the Jarque-Bera test, it turned out that my errors are not normally distributed. Does anyone know how to deal with this? I assume that if the errors aren't normally distributed, inference from my regression results is invalid. Also, my errors are skewed, and the null of the kurtosis test was rejected as well.
Thank you for any suggestions on how to handle non-normally distributed residuals in a VEC model.
↧
↧
Adjusted R² in Random Effects Model
Hello,
I'm running regressions with a random-effects model, but I can't find the command for the adjusted R-squared.
Is there a command for the adjusted R², or for a similar measure that is comparable to the adjusted R²?
Thank you
↧
Moran test with the command spautoc
Dear Statalisters,
I am trying to perform the Moran test of spatial autocorrelation, but it does not work. I get the error message "varlist not allowed" when I run:
spautoc region nei, weight(x)
where region is a numeric identifier variable,
nei is a string variable (str244) containing neighborhood information, like "5 7 13 15 16 17",
and x is a weight variable, also a string (str244), containing values like ".28381 .28381 .28381 .28381 .28381 .28381".
Any help will be very much appreciated!
Many thanks in advance!
↧
Treatment effects with survey data
Hello,
I am working with a stratified sample that has sampling (inverse probability) weights, which I've set up using svyset.
I am interested in testing a categorical outcome based on a categorical variable which could conceptually be thought of as a "treatment."
Does it make sense to use the Stata teffects command in this case, or does it make more sense not to mix the two paradigms (i.e., the teffects paradigm with the survey strata/cluster/weights specification)?
Does it make more sense just to do a conventional logistic regression with the strata/cluster/weights all set up per the svyset command?
Appreciate any feedback,
John L.
↧
stset multiple failure data
Dear List
I have asked this question in different ways before, but it turns out I'm not sure whether what I am doing is right or wrong. I think my problem is how I stset my multiple-failure data.
I have the following variables:
id
age_at_start_obs
age_at_end_of_obs
age_at_event
event
I want to stset my data so that I can calculate incidence rates per age band, using e.g. stptime, or by splitting the data with stsplit.
A dummy dataset:
Code:
input id age_at_start_obs age_at_end_of_obs age_at_event event
1 50 60 55 1
1 50 60 56 1
1 50 60 59 1
2 40 45 . 0
2 40 45 44 1
3 75 80 . 0
3 75 80 76 1
3 75 80 77 1
end
I have tried the following
Code:
replace age_at_event = age_at_end_of_obs if age_at_event == .
stset age_at_event, id(id) fail(event) exit(age_at_end_of_obs) enter(age_at_start_obs)
stptime, at(35(1)81)
And this seems to give me the results that I want, but I am still wondering: is this the correct way to do it?
But when I stsplit my data:
Code:
stsplit years, every(1)
replace event = 0 if event == .
tab years event
Could anyone try to explain why that is?
Hope you can look through it and see if I'm off by a mile.
Thank you
Lars
↧
↧
Need help: Calculating marginal effect of interaction terms after -ivprobit-
Hi everyone,
I need to determine the marginal effects of interaction terms after the -ivprobit- command. Here is my code:
* ivprobit fminorsucc (csr1 csr1_capint2 csr1_bBindex1 csr1_volt = csr1iv2 c.csr1iv2#c.capint2 c.csr1iv2#c.bBindex1 c.csr1iv2#c.volt) capint2 bBindex1 volt i.indid ydum10-ydum18 dismissal outsider0 pceofminor2 interim ceopay1seq madjemp avaslack dratio eBindex lambda hsstatus2 madjfirmage madjroe , vce(cluster compid) nolog
The endogenous variable is csr1. The instrumental variable is csr1iv2. I also include interaction terms: csr1_capint2, csr1_bBindex1, and csr1_volt. My issue is that -ivprobit- does not allow me to use interaction operators in the endogenous variable list.
Consequently, I have the following questions:
1) Would the -margins- command still provide accurate marginal effects in this case? If not, how should I address this issue?
2) I have also tried the approach of Wiersema & Bowen (2009), which provides code to calculate the true interaction effects using the formula in Ai & Norton (2003). However, after -ivprobit-, the standard errors are not accurate. Specifically, I calculated the marginal effect of csr1 in the model without interaction terms using both the Wiersema & Bowen (2009) approach and the -margins- command. The marginal effect is the same, but the Wiersema & Bowen (2009) approach gives me insignificant results, while -margins- gives significant results. Is there any other way to calculate the true interaction effect after -ivprobit-?
Thank you very much for your consideration and guidance.
References:
Ai, C., & Norton, E. C. (2003). Interaction terms in logit and probit models. Economics letters, 80(1), 123-129.
Wiersema, M. F., & Bowen, H. P. (2009). The use of limited dependent variable techniques in strategy research: Issues and methods. Strategic Management Journal, 30(6), 679-692.
↧
Convert 'accessibility' driving time to logarithmic variable, but it contains zeros
Hello everyone,
Currently I am running a convergence regression, and I want to include a variable on the accessibility of a region. My data contain several variables on accessibility, all of which are driving times in minutes by car to the nearest highway access, train station, airport, or to several measures of agglomeration centers. However, some values are zero, indicating that there is on average no driving time.
The model is a log-log model, so except for the dummy variables, all variables have been converted to logarithms. Of course, when I transform the accessibility variables that contain a value of 0, this generates missing values.
My question is, what should I do?
A. Nothing: leave the missing values out, but that would mean that all the regions that are very accessible are excluded from the regression.
B. Replace all the missing values with 0, i.e. "replace lnACC_MAJOR = 0 if (lnACC_MAJOR == .)"; however, that does not seem like good practice to me.
C. I should not convert the accessibility variables to logarithms.
D. Other: ...
Additionally, I want to include commuter flows. I would like to include the net flow, which is negative for certain regions (higher outflow than inflow of commuters), and negative values cannot be converted to logs. What should I do here?
Thanks in advance.
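On option D: one transformation often discussed for variables with zeros or negative values is the inverse hyperbolic sine, which behaves like a log for large values but is defined everywhere. A hedged sketch (the raw variable names ACC_MAJOR and net_commute are placeholders):

```stata
* asinh(x) = ln(x + sqrt(x^2 + 1)): defined at 0 and for negative values
gen ihs_acc     = asinh(ACC_MAJOR)    // accessibility; zeros allowed
gen ihs_netflow = asinh(net_commute)  // net commuter flow; negatives allowed
```

Coefficients on IHS-transformed variables can be read approximately like log coefficients for values away from zero.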
↧
State-year fixed effects vs state specific year trends
Hello,
I wanted to check whether my understanding of interaction terms in a panel estimation model is correct. Say we observe outcomes for individual i in group j in state s at time t, along with some covariates x. Would the factor variables generated by i.state#c.year be what is known as state-specific year trends, whereas i.state#i.year would give state-by-year fixed effects?
Thanks!
Code:
use http://www.stata-press.com/data/r13/nlswork, clear
xtset idcode year

* assume industry codes are state codes
rename ind_code state

* state-specific year trends?
xtreg ln_wage i.state#c.year, fe

* state-by-year fixed effects?
xtreg ln_wage i.state#i.year, fe
↧
xtivreg2 and the endogeneity tests
Hi,
I am using the xtivreg2 command to estimate a FE-IV model. I would like to ask two questions which regard the endogeneity test, and the versions of it, produced by xtivreg2.
1. If I understand it correctly, when the "robust" and "cluster" options are specified, xtivreg2 calculates a version of the endogeneity test that is robust to heteroskedasticity and serial correlation within panel groups. I would like to see the exact formula used to calculate this test. The documentation for "ivregress postestimation" (http://www.stata.com/manuals13/rivre...estimation.pdf), page 15, gives some information. From it, I suspect the statistic in question is Wooldridge's (1995) score test, because the documentation states "(...) this test can be made robust to heteroskedasticity, autocorrelation, or clustering by using the appropriate robust VCE (...)". On the other hand, the documentation of ivreg2 does not specify the exact formula. Which endogeneity test is used here? Where can I find the exact formulas for the versions of the test that are robust to heteroskedasticity, robust to serial correlation in the errors within panel groups, and robust to both at the same time?
2. The command I use to obtain the endogeneity test is "xtivreg2 dependent $model, fe endog(varone vartwo) robust cluster(panelid)", which reports "-endog- option: Endogeneity test of endogenous regressors: 24.977 Chi-sq(2) P-val = 0.0000". I also wanted to check the result produced by the "estat endogenous" command that can be run after ivregress. For this, I first execute "xtdata dependent independent instruments, i(panelid) fe clear" to obtain the within-transformed data, so that I can run "ivregress 2sls dependent $model, robust cluster(panelid)". Then "estat endogenous" reports "Robust regression F(2,12814) = 12.1324 (p = 0.0000) (Adjusted for 12815 clusters in HHIDPN)". I expected the endog() option of xtivreg2 and estat endogenous to use the same endogeneity test and hence give the same test value. Apparently they use different statistics, or my approach of transforming the data and then using ivregress is incorrect, although I cannot really think of a reason why it would be, because I obtain the very same coefficient estimates from xtivreg2 and from ivregress after the transformation. From the documentation of "estat endogenous", the formula used to calculate the endogeneity test that is robust to heteroskedasticity and serial correlation within panel groups is not clear to me. Is there a Stata reference that clearly states the statistics and formulas used by estat endogenous?
Tunga
↧
↧
How to generate a dummy variable with a max value
Hi,
I currently have a dataset with birth order, and I would like to generate a new variable identifying the last-born child, i.e. one that takes the value 1 for the last-born child. How can I do this?
Thank you in advance.
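A minimal sketch, assuming one observation per child, with a family identifier famid and a birth-order variable birthorder (both variable names are hypothetical):

Code:
* sort children by birth order within each family;
* the last observation in each family (_n == _N) is the last-born
bysort famid (birthorder): gen byte lastborn = (_n == _N)

Within each family, observations are sorted by birthorder, so lastborn is 1 for the last-born child and 0 for all others.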
↧
Inserting image in Stata graph
Is there a way to insert/paste an image into a Stata graph from within the Stata environment? Let me explain what I need. I have written a program to read in some data, calculate some index numbers, and produce a set of graphs, which are then exported to other formats (TIF/EPS etc.). I need to insert an institutional logo/symbol into each graph. Currently this is done outside Stata using graphics editing software, working on the exported graphs. That of course adds a delay to the process, which it would be very desirable to avoid, since this is a repeated task with a tight deadline. So I was wondering if there is some way to automate this process from within Stata, e.g., a command that inserts an external graphic into a designated spot within the Stata graph (in its graph/plot region). Has any other member handled this type of task, and could you offer suggestions on how best to achieve it? Thanks & regards.
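One way to keep the whole pipeline in a single do-file, assuming an external tool such as ImageMagick is installed, is to overlay the logo on the exported file from within Stata via -shell- (all file names here are hypothetical):

Code:
graph export "index_chart.png", replace
* overlay logo.png in the bottom-right corner of the exported
* graph using ImageMagick's composite tool
shell composite -gravity SouthEast logo.png index_chart.png index_chart_logo.png

This automates the logo step at the cost of depending on software outside Stata, and the overlay happens on the exported bitmap rather than inside the live Stata graph.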
↧
xtfmb command: How to store first-stage regression coefficients
Hi, I am using Fama-MacBeth (1973) cross-sectional regressions (the xtfmb command) and am content with my results; however, I want to store the cross-sectional regression coefficients for each time period. Is there a way to extract this information from the command, or will I need to do it manually by looping over cross-sectional regressions for each time period? I want to avoid the latter because the command already computes the coefficients and Newey-West standard errors automatically.
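In case the per-period loop does turn out to be necessary, statsby keeps it short. A sketch, assuming a time variable year and hypothetical variables ret, beta, and size:

Code:
* run the cross-sectional regression separately for each period
* and keep one row of coefficient estimates per period
statsby _b, by(year) clear: regress ret beta size

The resulting dataset holds one observation per period containing that period's coefficients; averaging them over periods reproduces the Fama-MacBeth point estimates (note that statsby with clear replaces the data in memory, so save your data first).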
↧