Channel: Statalist

Need Help to Produce a Descriptive Statistics Table

Hello

I created a regression table and now I would like to get descriptive data for the variables in my regression. I do not want general descriptive information for each variable, but specific descriptive info for the variables used. For instance, in my model I have income. The variable has 2310 total observations in my data set, but in my regression (with other variables) the observations are limited to 1370. I want to get descriptive information for this subset (1370) without having to drop all missing values for all variables I am using in my regression. Any suggestions?

I've searched the internet to find a solution, but I have not found anything.

Thanks!
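
One way to restrict descriptive statistics to the observations actually used by the model is the `e(sample)` function, which Stata sets after any estimation command. The sketch below uses the shipped `auto` dataset as a stand-in; the variable names are illustrative and are not the poster's.

```stata
sysuse auto, clear

* Fit the model; rep78 has missing values, so some rows are dropped
regress price mpg rep78 weight

* Built-in summary of the estimation sample
estat summarize

* Same idea by hand: summarize only rows used in the regression
summarize price mpg rep78 weight if e(sample)
```

Nothing is dropped from the dataset; `if e(sample)` only filters the rows that enter the summary.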

Multiple imputation using chained equations - imputing specific income from a broad income range

Hello,

I am doing some descriptive analysis (frequency distributions and crosstabulations by income) of cross-sectional survey data (n = 11,002). I have one independent variable which is income and a maximum of 10 dependent variables. The income variable that I will be using in all of the crosstabulations has 30.6% item non-response. I'd like to impute the missing cases as I've done additional analyses that indicate the cases are missing at random. Since I plan to impute income I will also impute the missing values for the other dependent variables (7/10 variables have missing data). I have consulted with a biostatistician who suggested I use multiple imputation using chained equations to impute.

I have two income variables in my database. The first variable is binary and asks respondents to choose from a broad income range (under $45,000 versus over $45,000). There is 22.1% item non-response to this question. The second income variable is more specific with 9 categories. Everyone in the sample who did not respond to the broad income range variable also did not respond to the specific income question -- but an additional 8.5% of people who responded to the broad income question did not respond to the specific income question (making the total item non-response to the specific income variable 30.6%). I'd like to impute the more specific income for the 22.1% who did not respond to the broad income question and I would also like to impute the specific income for the 8.5% of the people who reported a broad income range but not specific income. I'd like to use the broad income range to help in the imputation for this latter group if possible.

Basically I want to say "For those cases where specific income is missing, if broad income = 0 (under $45,000) then impute specific income anywhere between categories 1-4 (which represents incomes below $45,000) or if broad income = 1 (over $45,000) then impute specific income anywhere between categories 5-9 (which represents incomes above $45,000)." I am not sure how to do this in Stata (I have version 12) or whether I would be using bounds to do so.

Thanks so much in advance!
Linda Wood

C-statistic after xtlogit, re

Hello,

I am analyzing a series of nested models using -xtlogit, re-. I would like to compare their model fit using the C-statistic. Is there a way to compute the C-statistic after -xtlogit, re-? Are there other statistics, besides C-statistics that I could use to assess model fit after -xtlogit, re-?

Thank you,
Caroline
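
A rough sketch of one possible approach, on simulated data: predict probabilities with the random effect set to zero (`pu0`) and pass them to `roctab`, which reports the area under the ROC curve. The data, variable names, and seed are all made up for illustration. Note that `pu0` ignores the estimated random effects, so the resulting C-statistic reflects only the fixed part of the model.

```stata
clear
set seed 12345
set obs 200
gen id = _n
gen u = rnormal()              // hypothetical panel-level random effect
expand 5
bysort id: gen t = _n
gen x = rnormal()
gen y = runiform() < invlogit(-0.5 + x + u)

xtset id t
xtlogit y x, re

* Predicted probability assuming the random effect is zero
predict p0, pu0

* Area under the ROC curve (C-statistic) is left in r(area)
roctab y p0
display "C-statistic = " r(area)
```

For nested models, likelihood-ratio tests (`lrtest`) or information criteria (`estat ic`) are other commonly used comparisons.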

Hypothesis test using xtmelogit model

Hi all,
I've created a 3-level mixed effects binary model using the following command:
xtmelogit binvar var1 var2, || var3: || var4: , binomial(3)

My question is how to use this model to find the 95% CI of var1 where binvar would be positive. Or conversely, how to test the hypothesis if (var1 > constant1), then binvar = 1.

I appreciate any tips.

Thank you,
Richard

"Offical Stata Johansen-Juselius tests for cointegration"

Dear all,
I am not sure whether, by stating that "Stata has an official Johansen-Juselius tests for cointegration", Kit Baum referred to -vecrank-, as I have noticed that credit is given only to Johansen (there is no mention of Juselius, K.: http://www.stata.com/manuals13/tsvecrank.pdf).
A second question is whether having stationary variables in the system in -vecrank- is an issue?
Thanks,
Anat

Beta coefficients of marginal effects

Hello,

I am writing my master's thesis and one of the main regressions I perform is a probit. In order to compare the effects of different variables I want to show the beta coefficients, not of the coefficients I obtain from running the probit command but of the marginal effects.

Due to a selection problem in my framework I cannot use the command "dprobit" to estimate the marginal effects. Instead I use the command "margins, dydx()" after having used the command "heckprobit".

Do you know how I can obtain the beta coefficients? Do I just need to multiply the coefficients I get from the command "margins" by σ_j/σ_y (j being the variable of the coefficient and y the dependent variable)?

Thank you very much in advance.

Best regards,

Marco

Generating agegroups with a Loop

Dear all,
I have been trying to find a way to use a 'foreach' or 'forvalues' loop to generate agegroups. I have a survey dataset including an individual age variable. Now I would like to create agegroups in steps of 4 (<1, 1-4, 5-9, 10-14, ....., 95-99, >100). The way I am able to do it is continuing as shown below, but as there are more tasks like this I need to do in my analysis it would be very helpful to be able to do it in a shorter (and less error-prone) way. I tried to use 'forvalues' and give the ages in steps, but I have not found a solution yet.
Eventually I will need to attach a specific growth rate to each age group which I will try to do in a further loop, or, if possible, in the same one. (That is if it is possible to attach different values to each agegroup in a loop, I imagine I would need to enter each value and then the loop would be redundant.)
Any help or suggestion would be very appreciated!
Thank you very much!

gen agegroup=.
replace agegroup=1 if age<1
replace agegroup=2 if age>=1 & age<=4
replace agegroup=3 if age>=5 & age<=9
replace agegroup=4 if age>=10 & age<=14
replace agegroup=5 if age>=15 & age<=19
...

forvalues age = 1(1)4 5(1)9 10(1)14 15(1)19 20(1)24 25(1)29 30(1)34 35(1)39 {
gen agegroup
}
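
A loop may not be needed here. As a minimal sketch, `egen`'s `cut()` function can build the groups from a list of breakpoints. The toy data, the upper bound of 200, and the treatment of the last group as "100 and over" are assumptions made for illustration.

```stata
clear
set obs 120
gen age = _n - 1                   // hypothetical ages 0..119

* Breakpoints: [0,1) [1,5) [5,10) ... [95,100) [100,200)
* icodes numbers the intervals 0, 1, 2, ...; ages outside 0-199 become missing
egen agegroup = cut(age), at(0 1 5(5)100 200) icodes
replace agegroup = agegroup + 1    // start numbering at 1, as in the manual version

tabulate agegroup
```

Group-specific growth rates could then be attached by merging on `agegroup` from a small lookup dataset, which avoids typing one `replace` per group.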

Compare 2 groups, omitted because of fixed effects

Hello Statalist,

I'm comparing a list of ''Best Companies for Working Mothers'' with a peer group of companies that have never been on that list (the best suitable matches, constructed by hand).
I want to compare the financial performance of the two groups (ROA is the financial performance measure).
Y variable = ROA; controls = FirmSize logemployees Risk ResearchandDevelopmentExpense; variables of interest are the dummies HighCompensation SupportforEducation.
ListedYN = dummy variable: 1 if a company is family friendly (listed), 0 otherwise.
A company therefore always has a 1 when listed and a 0 when never listed (no variation within a company, so it is omitted).
I would like to compare the ROA and see whether being listed has a positive or negative effect. However, the variable is omitted since a company always receives the same value (1 or 0).
How can I use Listed yes/no to see the effect on ROA with fixed effects? Or is there another way?

Here is the Stata code:

Stata 13: I have 11 years of panel data, 1 observation per year, 1219 observations total.

destring HighCompensation SupportforEducation
xtset GlobalCompanyKey DataYearFiscal
xtreg ROA ListedYN FirmSize logemployees Risk ResearchandDevelopmentExpense HighCompensation SupportforEducation, fe cluster (CompanyName)


Thank you very much for helping!

Fabian
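
The omission is mechanical: a dummy that never changes within a firm is perfectly collinear with the firm fixed effects. The sketch below reproduces this on simulated data and shows a random-effects fit as one alternative that does estimate the coefficient. All names, the seed, and the data-generating values are hypothetical, and the random-effects estimator relies on the firm effect being uncorrelated with the regressors, which may or may not be reasonable for the real data.

```stata
clear
set seed 2015
set obs 100
gen firm = _n
gen listed = firm <= 50            // time-invariant group dummy
gen a = rnormal()                  // unobserved firm effect
expand 11
bysort firm: gen year = 2003 + _n
gen x = rnormal()
gen roa = 1 + 0.5*listed + 0.3*x + a + rnormal()

xtset firm year

* Fixed effects: listed is omitted (no within-firm variation)
xtreg roa listed x, fe

* Random effects: the coefficient on listed is estimated
xtreg roa listed x, re vce(cluster firm)
```

Within a fixed-effects model, interactions of the listed dummy with time-varying regressors remain identified even though the main effect is not.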

non-nested RE in xtmixed returning error

Hi,

I am using xtmixed to try to calculate non-nested RE for multiple levels and receiving the error “likelihood evaluates to missing” after >15 minutes of evaluation. The code I am using is as follows:

xtmixed depvar ||_all: R.sic ||_all: R.country ||_all: firm_id

There are approximately 15,000 rows in the data and about 11,000 unique firm_ids. I noticed that there was one other post on Statalist that didn’t seem to be answered about the same issue (by David Chan). For reference, I’m using Windows 7 Enterprise and Stata 13.1.

Thanks,
Megan

Filter variables based on middle and last characters of the var name

I need to filter for variables scattered throughout a large-ish dataset based on a combination of characters in the middle of the var names. Although this wildcard notation (awo*b~C) is not valid, it represents conceptually what I would like to filter. For example, I need to select all the variables from all waves (the wave # is the * in "awo*b~C") for scale "b" that have been recoded as "correct", indicated by the letter "C" added to the end of the original var name. I need to repeat variations on this for many different tasks.

This seems like a simple thing to do, but so far I see no obvious way to do this efficiently.
  • Tried various wildcards (like * and ? and ~, in the Variables Manager and in command lines).
  • Searched archives such as Stata Tips (like inlist and inrange).
  • Also, the rather unsystematic construction of the var names does not lend itself to the ways I normally use loop functions like foreach or forvalues.
Can someone please advise how to do this? ~~Thanks!

~~~Using ~ Stata14 ~ Windows~~~
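
In a Stata varlist, `*` matches zero or more characters anywhere in a name, so a pattern with more than one `*` may already do what is described. A small sketch with invented variable names (the real naming scheme is not shown in the post, so these are assumptions):

```stata
clear
set obs 1
* Hypothetical names: wave number after "awo", scale letter, item, optional C
foreach v in awo1b3C awo2b7C awo1b3 awo2a7C xyz {
    gen `v' = 1
}

* All variables that start with awo, contain b, and end in C
ds awo*b*C
local picked `r(varlist)'

* The stored list can be reused in loops or other commands
foreach v of local picked {
    display "`v'"
}
```

Variable names are case sensitive, so the trailing `C` does not match a lowercase `c`. If the pattern is too loose for the real names, a `foreach` over all variables with `regexm()` on the name is a stricter alternative.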

Gravity model problem

Hello,
I am trying to use the gravity model with cross-section and panel data. When I use the "bys year" prefix, the coefficients are reasonable and similar to those in other studies. However, when I use panel data modeling with xtreg, areg, ... and include the year fixed effects as predictors, the coefficients have the opposite sign. Can anyone help me interpret this, or suggest how to modify the regression specification to obtain coefficients with the expected signs?
Thanks

bys year: reg lexp LGDP1 LGDP2 ldist adja intra remote1_head remote2_head ddestination* dorigin*, robust
problem with:
areg lexp LGDP1 LGDP2 ldist adja intra remote1_head remote2_head ddestination* dorigin*, robust cluster(pairid) a(year)
or
xtreg lexp LGDP1 LGDP2 ldist adja intra remote1_head remote2_head ddestination* dorigin* i.year, robust

rounding ID numbers in adjacency matrix

Hi everyone,

I am new to Stata and Mata (please bear with me!). I just recently learned how to create adjacency matrices using Mata. When I cross-checked the original data set with the matrix, I noticed that the matrix looked funny. All the ID numbers in the matrix are rounded to the nearest 100,000 (91600000 instead of 91578541). I adapted the code from UCLA ATS http://www.ats.ucla.edu/stat/stata/code/adj_matrix.htm. I would appreciate any insights.

Best,

Sarah Trinh

Time-varying fixed effects

Dear all,

I am currently working on my master's thesis where I estimate a FE model which looks the following:

y_it = a_0 + a_i + x_it*beta1+ crisis*(d_0 + d_i + x_it*beta2) + epsilon_it,

where "crisis" is a dummy that has value one with the beginning of recent financial crisis in Europe. "a_i" and "d_i" are country-specific fixed effects and "x_it" explanatory variables. The model allows to investigate changing coefficients "beta1" to "beta1 + beta2" and "a_i" to "a_i + d_i" when the crisis occurs.

Currently I am struggling with whether to include country-specific fixed effects in the crisis interaction or not. This is done in a reference paper. However, results seem to be more economically meaningful when not including the "d_i"-dummies.

The economic argumentation would be - as I understand - that there are time-constant unobservables for each country that only change with the beginning of the crisis.
But are there other econometric argumentations to include these "d_i"? Is it somehow necessary because I use FE estimation? Or is there no econometric cause - only the economic theory described at the beginning of this paragraph? And would you agree on this argumentation or not?

I would very much appreciate some thoughts on this problem.
Thanks in advance!

Chris

Multivalued treatments with endogeneity

I have a multi-valued treatment. It is quite possible that the model does not meet the endogeneity conditions for teffects.

When I run each of the treatment values against a background condition, the etregress estimates suggest it violates endogeneity for some of the values of the treatment but not all.

Is there an estimator for multi-valued treatments with endogeneity?

Phil

Using putexcel with return vector of standard errors of regression coefficients

I'm trying to execute what I think is a fairly straightforward task using -putexcel-, but for some reason I can't figure out the right commands.

I run:
sysuse auto
reg price weight mpg i.foreign
putexcel set auto, sh("test2") modify
putexcel A1=matrix(e(b)',rownames)
and now I'd like to put into C1 the vector of standard errors associated with those coefficients. I can retrieve the variance-covariance matrix by using, eg,

putexcel H1=matrix(e(V))
but I don't want the entire variance-covariance matrix, just the standard errors.

I've previously noticed that reg y x1 x2 x3 does create a vector of standard errors, because I can also
putexcel C1 = (_se[weight]) C2 = (_se[mpg]) C3 = (_se[0.foreign]) C4 = (_se[1.foreign]) C5 = (_se[_cons])
which is odd, because ereturn list does not indicate there is a matrix called se(b). I tried to use putexcel E1=matrix(e(se)) in case it was that simple. It's not, of course; Stata executes without error, but overwrites E1 with a blank cell.

In my actual example I have >600 covariates whose coefficients and SE's I'm trying to enter using putexcel, so manually typing out the name of each covariate is not an attractive solution. Nor is returning the variance-covariance matrix in Excel and taking the square roots of the diagonal elements within Excel.

I've seen some solutions to related problems using Mata, but as I have not previously used Mata, I have not been able to figure out how to incorporate those solutions in with putexcel in this framework.

Many thanks for your assistance,
Robert
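
The standard errors are the square roots of the diagonal of `e(V)`, so they can be collected into a column vector with one line of Mata and written out like any other matrix. This is a minimal sketch; the matrix name `se` and the workbook name `auto_se` are arbitrary choices made for illustration.

```stata
sysuse auto, clear
regress price weight mpg i.foreign

* Column vector of standard errors: sqrt of the diagonal of e(V)
mata: st_matrix("se", sqrt(diagonal(st_matrix("e(V)"))))
matrix list se

* Write the vector starting at C1 (hypothetical workbook name)
putexcel set auto_se, sheet("test2") replace
putexcel C1=matrix(se)
```

The rows of `se` follow the column order of `e(b)`, so they line up with the transposed coefficient vector. Omitted base levels such as `0b.foreign` appear with a standard error of 0.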





Low value of estimated Pareto shape

Dear STATA community,

I am trying to estimate the parameters of a productivity distribution. My prior is it follows a Pareto distribution with a shape parameter higher than 1 (so that the mean exists). However, when I use the Paretofit command, it yields a low Pareto shape (around 0.3). I was wondering what the problem is. I realized that when I choose a high scale parameter, the shape parameter comes closer to 1, but that requires me to eliminate a large amount of my observations. Any suggestions are greatly appreciated.
Thanks,

Tuan Luong

Passionate Request for DSGE-Stata Lecture Notes

Dear all,

This request seems rather off-beat but I didn't have any other choice than resorting to this Forum after my search on Google for DSGE models in Stata turned up nothing! I'm a Doctoral student using Stata13. Please I need any 'lecture notes' on DSGE modeling wrt Stata (basic concepts, introduction and advanced). Most lecture notes I saw on Google made use of Matlab, so if you have any material that can be of help kindly send to me. I can assure you that it will be for my personal use and not for re-distribution.

I've always obtained assistance on this Forum and I know that I will obtain same for this request.

Thanks in anticipation!....Ngozi

prediction command after selmlog

Dear Statalist,

I am running selmlog to study the impact of different adaptation strategies on yield. As I have more than one strategy, I had to work with a multinomial endogenous switching regression model and use the selmlog command rather than the movestay command used for the simple endogenous switching regression model. After estimation, the model allows treatment effects to be estimated from the regression estimates. In the help file of "movestay" I found the following command to estimate treatment effects for both adapters and non-adapters:

mspredict newvarname [if exp] [in range] [, statistic]

where statistic is one of

Psel the probability of being in regime 1; this is the default statistic

xb1 the linear prediction for the regression equation in regime 1

xb2 the linear prediction for the regression equation in regime 2

yc1_1 the expected value of the dependent variable in the first regime conditional on the dependent variable being observed in that regime. For example, if earning function is
modeled for two sectors (regimes), then this option predicts the wage rate in sector one for the individual who is currently employed in that sector.

yc2_1 the expected value of the dependent variable in the first equation conditional on the dependent variable not being observed. I.E., that option predicts the wage rate in
sector two for the individual currently working in sector one.

yc1_2 the expected value of the dependent variable in the second equation conditional on the dependent variable not being observed. I.E., that option predicts the wage rate in
sector one for the individual currently working in sector two.

yc2_2 the expected value of the dependent variable in the second equation conditional on the dependent variable being observed in that regime. I.E., that option predicts the
wage rate in sector two for the individual currently working in sector


But I didn't find any such prediction command in the help file of selmlog. As syntax for selmlog is:

selmlog depvar varlist [ifexp][inrange],select(depvar_m=varlist_m) [lee dmf(#) dhl(# [all]) showmlogit wls bootstrap(number_of_replications[sample_size]) mloptions (mlogit options) gen(variable generic name)]

I use the "gen" option for prediction of yield, but it does not report predicted yield and gives the same results that I get when I use gen(rho_1). I am unable to understand what is going on there and how to do predictions of the dependent variable. My base paper is "How African Agriculture Can Adapt to Climate Change? A Counterfactual Analysis from Ethiopia" by Salvatore Di Falco and Marcella Veronesi.

Your help will be highly appreciated.

Regards
Ayesha

Running Poi’s Quaids manually with nlsur. Problem with computing price elasticity

Dear All

I would really appreciate some help on this. I am using Stata 13. I need to run the quaids model manually using nlsur for a four-good model to estimate income and price elasticities. Below is the code I ran to estimate the QUAIDS model, compute expenditure elasticities from it, and compute the own-price elasticity for good 1. The expenditure elasticities match exactly the output of Poi's built-in quaids suite, but the own-price elasticity does not (slight mismatch in the second decimal place). Can anyone help me with this? Thanks a lot.

Sami


#delimit;

* lnexptot= ln of total expenditure on the four goods;
*lnprice= ln of prices;
*w = expenditure shares;

scalar a_0 = 5;
nlsur quaids @ w1 w2 w3 lnprice1-lnprice4 lnexptot, ifgnls nequations(3) param(a1 a2 a3 b1 b2 b3 g11 g12 g13 g22 g23 g33 l1 l2 l3) nolog;

matrix B = e(b)';
matrix list B;

nlcom (a1:_b[/a1]) (a2:_b[/a2]) (a3:_b[/a3])
(a4:1-_b[/a1]-_b[/a2]-_b[/a3])
(b1:_b[/b1]) (b2:_b[/b2]) (b3:_b[/b3])
(b4:-_b[/b1]-_b[/b2]-_b[/b3])
(g11:_b[/g11]) (g12:_b[/g12]) (g13:_b[/g13])
(g14:-_b[/g11]-_b[/g12]-_b[/g13])
(g22:_b[/g22]) (g23:_b[/g23])
(g24:-_b[/g12]-_b[/g22]-_b[/g23])
(g33:_b[/g33]) (g34:-_b[/g13]-_b[/g23]-_b[/g33])
(g44:-(_b[/g11]-_b[/g12]-_b[/g13]) -
(_b[/g12]-_b[/g22]-_b[/g23]) -
(_b[/g13]-_b[/g23]-_b[/g33]))
(l1:_b[/l1]) (l2:_b[/l2]) (l3:_b[/l3])
(l4:-_b[/l1]-_b[/l2]-_b[/l3]);

/* Scalars for parameter estimates */;

scalar a1 = _b[/a1];
scalar a2 = _b[/a2];
scalar a3 = _b[/a3];
scalar a4 = 1 - a1 - a2 - a3;
scalar b1 = _b[/b1];
scalar b2 = _b[/b2];
scalar b3 = _b[/b3];
scalar b4 = 0 - b1 - b2 - b3;
scalar g11 = _b[/g11];
scalar g12 = _b[/g12];
scalar g13 = _b[/g13];
scalar g14 = 0 - (g11 + g12 + g13);
scalar g21 = g12;
scalar g22 = _b[/g22];
scalar g23 = _b[/g23];
scalar g24 = 0 - (g12 + g22 + g23);
scalar g31 = g13;
scalar g32 = g23;
scalar g33 = _b[/g33];
scalar g34 = 0 - (g13 + g23 + g33);
scalar g41 = g14;
scalar g42 = g24;
scalar g43 = g34;
scalar g44 = 0 - (g14 + g24 + g34);
scalar l1 = _b[/l1];
scalar l2 = _b[/l2];
scalar l3 = _b[/l3];
scalar l4 = 0 - (l1 + l2 + l3);

predict eshare*

/* This produces the same estimates as those obtained by running the
aforementioned quaids command */;
* FOR ELASTICITIES;
* Preliminaries;

gen lnb = b1*lnprice1 + b2*lnprice2 + b3*lnprice3 + b4*lnprice4;

gen bindex = exp(lnb);

gen lna = a_0 + a1*lnprice1 + a2*lnprice2 + a3*lnprice3 + a4*lnprice4 +
(1/2)*(g11*lnprice1*lnprice1 + g12*lnprice1*lnprice2 + g13*lnprice1*lnprice3 + g14*lnprice1*lnprice4 +
g21*lnprice2*lnprice1 + g22*lnprice2*lnprice2 + g23*lnprice2*lnprice3 + g24*lnprice2*lnprice4 +
g31*lnprice3*lnprice1 + g32*lnprice3*lnprice2 + g33*lnprice3*lnprice3 + g34*lnprice3*lnprice4 +
g41*lnprice4*lnprice1 + g42*lnprice4*lnprice2 + g43*lnprice4*lnprice3 + g44*lnprice4*lnprice4);

gen aindex = exp(lna);


/* INCOME ELASTICITIES */
gen mu1 = b1 + ((l1*2)/bindex)*(lnexptot - lna);
gen mu2 = b2 +((l2*2)/bindex)*(lnexptot - lna);
gen mu3 = b3 + ((l3*2)/bindex)*(lnexptot - lna);
gen mu4 = b4 + ((l4*2)/bindex)*(lnexptot - lna);
gen em1 = (mu1/w1) + 1;
gen em2 = (mu2/w2) + 1;
gen em3 = (mu3/w3) + 1;
gen em4 = (mu4/w4) + 1;

sum em*;

*calculating own price elasticity for "good 1";
gen muu1=((l1*b1)/bindex)*(lnexptot - lna)^2;
gen mm1=a1+(g23*lnprice2)+(g33*lnprice3);
gen pe1= g11-((b1+mu1)*mm1)-muu1;
gen ope1=-1+(pe1/w1);
sum ope1;

#delimit cr
exit
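
For reference when checking the elasticity lines above, the standard QUAIDS expressions (Banks, Blundell and Lewbel, 1997) are written out below in the notation of the code: `a`, `b`, `g`, `l` correspond to \alpha, \beta, \gamma, \lambda, `lnexptot` to \ln m, `lna` to \ln a(p), and `bindex` to b(p). This is a restatement of the textbook formulas, not of the internals of the quaids suite.

```latex
\begin{align*}
\mu_i    &= \beta_i + \frac{2\lambda_i}{b(p)}\,\bigl[\ln m - \ln a(p)\bigr] \\
\mu_{ij} &= \gamma_{ij}
            - \mu_i\Bigl(\alpha_j + \sum_{k}\gamma_{jk}\ln p_k\Bigr)
            - \frac{\lambda_i\,\beta_j}{b(p)}\,\bigl[\ln m - \ln a(p)\bigr]^2 \\
e_i      &= \frac{\mu_i}{w_i} + 1,
\qquad
e^{u}_{ij} = \frac{\mu_{ij}}{w_i} - \delta_{ij}
\end{align*}
```

If the posted code is meant to implement these expressions for i = j = 1, two lines appear to differ from them. The bracketed sum would be `a1 + g11*lnprice1 + g12*lnprice2 + g13*lnprice3 + g14*lnprice4`, whereas `mm1` uses `g23` and `g33` and only two prices. The multiplier on that sum is \mu_1 itself, whereas `pe1` uses `b1+mu1`, which counts \beta_1 twice because `mu1` already contains `b1`.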

r(199) error in function evaluator program with nl

Dear stata list users,

I want to estimate a non-linear least squares model but seem to be unable to create a function evaluator program.
The Name of my program is myreg.
The error I get is the following:

nlmyreg returned 199
verify that nlmyreg is a function evaluator program



I attach a sample of my data that contains my LHS variable, rel_estab_size_c_interp, and the RHS variable, ageq.


Here is the program :


nl myreg @ rel_estab_size_c_interp ageq, parameters(gg a2 a3 q delta zeta) initial(gg 1.0008873 a2 0.5 a3 0.3 q 20 delta 0.018 zeta 2.8)


program nlmyreg
version 12
syntax varlist (min=2) if, at(name)
local rel_estab_size_c_interp: word 1 of `varlist'
local ageq: word 2 of `varlist'
// Retrieve parameters
tempname gg a2 a3 q delta zeta
scalar `gg' = `at'[1,1]
scalar `a2' = `at'[1,2]
scalar `a3' = `at'[1,3]
scalar `q' = `at'[1,4]
scalar `delta' = `at'[1,5]
scalar `zeta' = `at'[1,6]
// Calculate correction factor
matrix define z = J(`q',2,0)
matlist z
forval s = 0/`q' {
z[s+1,2] = ( `gg'^s / ( 1 + `a2' * `a3'^s) )^`zeta'
z[s+1,1] = `delta' * (1-`delta')^s
}
generate double `size_avg_estab' = sum(z(:,1).*z(:,2))/sum(z(:,1)) `if'
// Fill in dependent variable
replace `rel_estab_size_c_interp' = (((`gg'^`ageq')/(1+`a2'*`a3'^`ageq'))^`zeta'))/`size_avg_estab' `if'
end
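
The program above mixes Mata-style matrix syntax into ado code, uses a temporary variable that is never declared, and has unbalanced parentheses in the final `replace`, any of which could produce r(199). Below is a hedged sketch of how the same calculation might be written with scalars only. It assumes that `q` is meant to be a whole number of periods (it is rounded here) and that the correction factor is the weighted average of the `z` terms over s = 0..q; both are readings of the posted code, not confirmed by the poster.

```stata
capture program drop nlmyreg
program nlmyreg
    version 12
    syntax varlist(min=2 max=2) if, at(name)
    local lhs : word 1 of `varlist'
    local age : word 2 of `varlist'

    // Retrieve parameters from the 1 x 6 row vector passed by -nl-
    tempname gg a2 a3 delta zeta num den w
    scalar `gg'    = `at'[1,1]
    scalar `a2'    = `at'[1,2]
    scalar `a3'    = `at'[1,3]
    local  Q       = max(0, round(`at'[1,4]))   // assumption: q rounded to an integer
    scalar `delta' = `at'[1,5]
    scalar `zeta'  = `at'[1,6]

    // Correction factor: weighted average over s = 0..Q
    scalar `num' = 0
    scalar `den' = 0
    forvalues s = 0/`Q' {
        scalar `w'   = `delta' * (1 - `delta')^`s'
        scalar `num' = `num' + `w' * (`gg'^`s' / (1 + `a2' * `a3'^`s'))^`zeta'
        scalar `den' = `den' + `w'
    }

    // Fill in dependent variable with the function value
    replace `lhs' = ((`gg'^`age' / (1 + `a2' * `a3'^`age'))^`zeta') / (`num' / `den') `if'
end
```

The program has to be defined before the `nl myreg @ ...` call is issued. Because `q` only enters through a rounded loop bound, the objective is a step function of `q`, so it may be more practical to fix `q` at a chosen value than to estimate it.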



Thanks in advance for any help.
Best,
Eniko