Channel: Statalist

Why variables are insignificant

I am trying to identify the factors that most affect access to electricity, using a panel of 24 countries over a two-year span. Even though this is a short panel data set, the article I am replicating used a similar approach.
My results are:
Code:
 xtreg accesstoelectricityofpopulatione loans renew gdp rents edu var24, re

Random-effects GLS regression                   Number of obs      =        48
Group variable: country                         Number of groups   =        24

R-sq:  within  = 0.7520                         Obs per group: min =         2
       between = 0.0001                                        avg =       2.0
       overall = 0.0011                                        max =         2

                                                Wald chi2(6)       =     58.62
corr(u_i, X)   = 0 (assumed)                    Prob > chi2        =    0.0000

------------------------------------------------------------------------------
accesstoel~e |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       loans |   .0287223   .0651532     0.44   0.659    -.0989756    .1564202
       renew |  -.0032136   .1823182    -0.02   0.986    -.3605507    .3541234
         gdp |  -.0009196   .0005703    -1.61   0.107    -.0020374    .0001982
       rents |  -.0590546   .0385032    -1.53   0.125    -.1345195    .0164103
         edu |  -.0051925   .0076635    -0.68   0.498    -.0202127    .0098277
       var24 |   2.309331   .3686467     6.26   0.000     1.586796    3.031865
       _cons |   72.37429    5.85379    12.36   0.000     60.90107     83.8475
-------------+----------------------------------------------------------------
     sigma_u |  28.521862
     sigma_e |  1.1818464
         rho |  .99828596   (fraction of variance due to u_i)
------------------------------------------------------------------------------
My question is that not only are the variables insignificant, but the coefficients also have unexpected signs. For example, GDP per capita (gdp in the model) should be positively correlated with the dependent variable, yet here it is negative.
Can someone please suggest a solution?
The data look like this:
time country access loans renew gdp rents edu
1 1 41 3.9 0 1629 2.37 45
2 1 43 4.3 0 1933 1.75 48
1 2 52.2 66 0 2401 4.5 53
2 2 59.6 85 0 2763 3.8 60
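One diagnostic worth running on data like the above: with T = 2 and rho = 0.998, almost all of the variance is between countries rather than within them, which -xtsum- makes visible. A minimal sketch using the variable names from the regression:

```stata
* Declare the panel, then decompose each variable's variation into
* within and between components; regressors with almost no within
* variation cannot be estimated precisely from two periods per country
xtset country time
xtsum accesstoelectricityofpopulatione loans renew gdp rents edu
```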

Stata v14 and v13 give different results

Running melogit and gsem on Stata v14 produces results, but when I re-run the same thing on Stata v13 (home installation) it gives me an error reading "initial values not feasible". Is the mechanism behind versions 14 and 13 different?

I have read Stata's documentation on convergence problems and have tried changing the number of iterations, obtaining parameter values to use as new starting values, and changing the integration method, but none of it helps.
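One hedged suggestion, assuming the same model specification runs in both versions: save the v14 solution and feed it to v13 through the from() maximize option, which both melogit and gsem accept. Variable names below are placeholders:

```stata
* On v14: fit the model and keep the coefficient vector
melogit y x1 x2 || id:
matrix b14 = e(b)
* (transfer b14 to the v13 machine, e.g. via a do-file that rebuilds it)

* On v13: use the v14 estimates as starting values
melogit y x1 x2 || id:, from(b14)
```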

Commands for regression in difference-in-difference design

Hi experts

I've searched, but couldn't find any threads about the actual execution of regression in a difference-in-difference design.

I'm interested in how employees' perceptions of their leader's leadership style (X) affect employee sickness absence (Y) over a two-year period: before leadership training and after.
So: perceived leadership style --> sickness absence.

- For X (leadership style) I have three indices, one for each style, scored 0-100.
- For Y (sickness absence) I have a variable measured in number of days.
- I have a time variable: 0 = before treatment, 1 = after treatment.
- I have a treatment variable with four groups: one for each of the three leadership styles and a control group.
- An id variable with a unique number for each employee.
- Furthermore, a range of control variables.

I've reshaped the dataset into panel format.

Now, how should I proceed using a difference-in-difference design?

Thanks!
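The setup described above maps onto the standard two-period DiD regression: the treatment effect is the coefficient on the interaction of treatment group and post period. A minimal sketch with hypothetical variable names (sickdays, treatgroup, post, id):

```stata
* Declare the panel, then estimate the DiD interactions with clustered SEs;
* the i.treatgroup#i.post coefficients are the DiD estimates relative to
* the control group
xtset id time
xtreg sickdays i.treatgroup##i.post, re vce(cluster id)
```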

Procedure for a (long-run) error correction model

Hi,

I am currently trying to create my long-run relationship (cointegrating equation) where I mainly look at the relationship between A and B. I have never done an ECM model before, but with the help of others and the internet this is the current procedure I came up with. Can you guys tell me if I am working in the right direction?

To start, it is important to know whether the variables I am using are of the same order of integration. Therefore I test the dependent and independent variables that I possibly want to include in my long-run relationship.

For example, to test the order of integration of a control variable, age, I first look at the number of lags to use for the dfuller test (in this case 2, on the basis of the selection criteria):

The maxlag is set at 36 because I have monthly data (a total of 11 years). Verbeek's A Guide to Modern Econometrics recommends that with monthly data the maximum number of lags should be set to at least 36.

Code:
varsoc age, maxlag(36)
dfuller age, lag(2)
Conclude from the critical values whether there is a unit root or not.

Code:
varsoc d.age, maxlag(36)
dfuller d.age, lag(4)
Conclude by looking at the critical values for the order of integration.

I do this for all the variables that I want to potentially include in my long-run regression. If they are all of the same order of integration I start with "making" the long-run relationship.

This is done by starting with the simplest regression between the dependent and independent variable, then adding variables one at a time and keeping them in if they are significant.

This results in my case in:
Code:
reg mean_A mean_B age ltv fund_c_deposits i.dummy_year
Then I predict the residual and check whether it is I(0).

Code:
predict e, resid
varsoc e, maxlag(36)
Lag selection criteria --> include 5 lags

Code:
dfuller e, lags(5)
Then I check whether the critical values for the residuals indicate I(0).

Can you guys tell me if this is anywhere close to the appropriate procedure for estimating the long-run relationship between A and B?
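If the residual does pass as I(0), the usual final step of the two-step Engle-Granger procedure sketched above is the short-run error-correction regression. A minimal sketch, assuming the data are tsset and using the residual e from the long-run regression:

```stata
* Short-run dynamics plus the lagged long-run residual; the coefficient
* on l.e (the speed of adjustment) should be negative and significant
reg d.mean_A d.mean_B l.e
```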

counting the number of categories of a given variable

Hi,

Is there a command I can use to count how many distinct categories a given variable in my data set comprises, rather than having to count them manually?
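The count asked about above can be read off directly; a couple of built-in options, sketched with a hypothetical variable name myvar:

```stata
* After -tabulate-, r(r) holds the number of distinct values
quietly tabulate myvar
display r(r)

* Alternatively, -levelsof- lists the distinct values themselves
levelsof myvar
```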

Thanks in advance

Using cmp for discrete/continuous estimation

Hi everyone,

I had another thread about this sort of problem, but received good advice about using the cmp command. The context is this. I am estimating a discrete-continuous choice model where individuals choose where to live and, given where they live, how much to work, consume, and "use" their house. Each location has a different amenity (pollution), which also affects their utility.

However, leisure, consumption, and housing are all endogenous, so I also instrument for each of them. cmp is extremely useful to allow for this possibility and still estimate a discrete choice - truly a remarkable command.

However, I ran into some problems when estimating it that I could not fully diagnose. I am copying a subset of the output (not the estimates) below, and will direct attention to some matrices being ill-conditioned and to the collinear regressors. All these variables work if I use reg3 -- there's no reason why they should be ill-conditioned or collinear. The only thing I can think of is that it's an extremely tough problem to optimize. I haven't even added the fixed effects yet...

cmp (lwage_hourly = lleisure lcons_nondur lpoll $aqX $aqstX) (lleisure=$ivweather) (lcons_nondur = $ivcons) (lhprice = lcons_house lcons_nondur lpoll $aqX $aqstX) (lcons_house = $ivhouse) (move = lleisure lcons_nondur lcons_house $aqX $aqstX) (location = lleisure lcons_nondur lcons_house $aqX $aqstX) [w=perwt],indicators($cmp_cont $cmp_cont $cmp_cont $cmp_cont $cmp_cont $cmp_probit $cmp_oprobit) cluster(county)
(sampling weights assumed)

Fitting individual models as starting point for full model fit.
Note: For programming reasons, these initial estimates may deviate from your specification.
For exact fits of each equation alone, run cmp separately on each.

-------------------------------------------------------------------------------

Warning: regressor matrix for lwage_hourly equation appears ill-conditioned. (Condition number = 1526747.4.)
This might prevent convergence. If it does, and if you have not done so already, you may need to remove nearly collinear regressors to achieve convergence. Or you may need to add a nrtolerance(#) or nonrtolerance option to the command line.
See cmp tips.


----------------------------------------------------------------------------------

Warning: regressor matrix for lleisure equation appears ill-conditioned. (Condition number = 196065.85.)
This might prevent convergence. If it does, and if you have not done so already, you may need to remove nearly collinear regressors to achieve convergence. Or you may need to add a nrtolerance(#) or nonrtolerance option to the command line.
See cmp tips.


--------------------------------------------------------------------------------------

Warning: regressor matrix for lcons_nondur equation appears ill-conditioned. (Condition number = 2253.5663.)
This might prevent convergence. If it does, and if you have not done so already, you may need to remove nearly collinear regressors to achieve convergence. Or you may need to add a nrtolerance(#) or nonrtolerance option to the command line.
See cmp tips.

      Source |       SS         df         MS        Number of obs    =  235581188
-------------+--------------------------------      F(34, 235581153) =          .
       Model |  32507335.3        34  956098.096    Prob > F         =     0.0000
    Residual |  99403557.6 235581153  .421950382    R-squared        =     0.2464
-------------+--------------------------------      Adj R-squared    =     0.2464
       Total |   131910893 235581187  .559938145    Root MSE         =     .64958


-------------------------------------------------------------------------------

Warning: regressor matrix for lhprice equation appears ill-conditioned. (Condition number = 1263661.2.)
This might prevent convergence. If it does, and if you have not done so already, you may need to remove nearly collinear regressors to achieve convergence. Or you may need to add a nrtolerance(#) or nonrtolerance option to the command line.
See cmp tips.

      Source |       SS         df         MS        Number of obs    =  238163538
-------------+--------------------------------      F(1, 238163536)  =   80511.68
       Model |  10561.4387         1  10561.4387    Prob > F         =     0.0000
    Residual |  31242044.7 238163536  .131178959    R-squared        =     0.0003
-------------+--------------------------------      Adj R-squared    =     0.0003
       Total |  31252606.2 238163537  .131223304    Root MSE         =     .36219

------------------------------------------------------------------------------
 lcons_house |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   ltrantime |   .0088839   .0000313   283.75   0.000     .0088225    .0089453
       _cons |   9.577865   .0001006  9.5e+04   0.000     9.577668    9.578062
------------------------------------------------------------------------------

Iteration 0: log likelihood = -1.279e+08
Iteration 1: log likelihood = -1.004e+08
Iteration 2: log likelihood = -98963302
Iteration 3: log likelihood = -98953638
Iteration 4: log likelihood = -98953634

Probit regression                               Number of obs   =    1947523
                                                LR chi2(34)     =   5.79e+07
                                                Prob > chi2     =     0.0000
Log likelihood = -98953634                      Pseudo R2       =     0.2263


-------------------------------------------------------------------------------

Warning: regressor matrix for moved equation appears ill-conditioned. (Condition number = 1110405.6.)
This might prevent convergence. If it does, and if you have not done so already, you may need to remove nearly collinear regressors to achieve convergence. Or you may need to add a nrtolerance(#) or nonrtolerance option to the command line.
See cmp tips.

Iteration 0: log likelihood = -2.679e+08
Iteration 1: log likelihood = -2.405e+08
Iteration 2: log likelihood = -2.392e+08
Iteration 3: log likelihood = -2.392e+08
Iteration 4: log likelihood = -2.392e+08


Note: 14 observations completely determined. Standard errors questionable.

Warning: regressor matrix for _cmp_y7 equation appears ill-conditioned. (Condition number = 1110405.6.)
This might prevent convergence. If it does, and if you have not done so already, you may need to remove nearly collinear regressors to achieve convergence. Or you may need to add a nrtolerance(#) or nonrtolerance option to the command line.
See cmp tips.

Fitting full model.

cmp_lnL(): 3499 halton2() not found
<istmt>: - function returned error
Mata run-time error
Mata run-time error
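As an aside, the final Mata error ("halton2() not found") usually indicates a missing or stale compiled Mata library rather than a problem with the model: halton2() ships with the ghk2 package that cmp depends on. One hedged fix to try:

```stata
* Reinstall cmp and its dependency, then rebuild Mata's library index
ssc install ghk2, replace
ssc install cmp, replace
mata: mata mlib index
```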

Information Criteria

Hello everyone,
I have a question and I would appreciate it if you could help,

I have 2 models-one unadjusted, and the other adjusted for a covariate, using quantile regression.

I am trying to obtain the AIC and BIC for each model. After running the first model (qreg V1 V2, quantile(.5)), for example, I typed estat ic, but I get the following error message: "likelihood information not found in last estimation results".

How can I get AIC and BIC in quantile regression?
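Since qreg does not maximize a true likelihood, estat ic has nothing to work from. One convention in the literature (an assumption on my part, not an official Stata formula) builds a pseudo-log-likelihood from the minimized sum of absolute deviations, which qreg stores in e(sum_adev):

```stata
* Fit the median regression, then build AIC/BIC from the pseudo-log-likelihood
qreg V1 V2, quantile(.5)
matrix b   = e(b)
scalar n   = e(N)
scalar k   = colsof(b)                     // number of estimated parameters
scalar ll  = -n * ln(e(sum_adev) / n)      // asymmetric-Laplace pseudo-log-likelihood (up to a constant)
scalar aic = -2*ll + 2*k
scalar bic = -2*ll + k*ln(n)
display "AIC = " aic "   BIC = " bic
```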

Thank you very much,
Nila

Diff-in-Diff regression with panel data using weights from psmatch2: How to use weights with xtreg re?

Dear all,

I am using the current version, Stata 14, on Windows.
First, I want to provide a short explanation of what my analysis is:
I have an unbalanced panel of firm data for the years 2000-2014. I investigate the consequences of successions in family firms on firm performance using a difference-in-differences estimation approach on a matched sample. My initial sample has around 1,600 firms, of which 235 experienced a succession in one year. To create a matched sample with control firms similar to the treated firms, I use propensity score matching via the Stata psmatch2 command. I consider firms that experienced a succession in one year as treated and firms that never experienced a succession as untreated.
After the matching procedure I run a diff-in-diff panel regression (using xtreg, re) to evaluate whether the post-succession performance of firms with a succession differs from that of firms without one. As performance measures I look at several different outcomes (from survey answers or balance-sheet information), such as the expected development of business, the expected development of employment, credit allocation, capital expenditures, debt, cash flow, roa, etc.

So in my first step I run a logit regression and obtain pscores. For the logit regression I collapse my dataset to the firm level and estimate the logit in the cross-section. I regress the treatment dummy (succession yes or no) on several firm characteristics such as firm age, firm age squared, legal form, industry, and employment-size dummies.
Here is the code for that step:

* collapse data to firm level
collapse succession_yes state industry year_of_incorporation legal_form employment employment_size l_employment firm_age firm_age_cat state_business exp_business exp_employment orders diff_finan credit_alloc debt capex total_assets size_assets total_equity tangible_assets cash_flow cash_cash_equivalent roa sales operating_revenue gross_profit_loss, by(IDNUM_ZAEHLER)

*logit
logit succession_yes firm_age firm_age_2 i.r_legal_form i.r_employment_size i.industry
est store model1
predict pscore1


In the next step I apply the matching algorithm using psmatch2. For my baseline I use nearest-neighbor matching (1-to-1) without replacement imposing a caliper of 0.05 and common support option. I had to modify the matching procedure because of the following problems I encountered:
1) I looped over all years to guarantee that treatment and controls are taken always from same year
2) before matching I need to exclude firms that are treated in a year other than i, so that those can't be used as controls in year i (because later in the diff-in-diff I look at performance in the following years after treatment)
3) I need to exclude firms that were used as controls in year i (so they can't be used again as controls in other years)
4) I re-run the matching for every outcome as some of the outcomes have a lot worse data availability (many missing) and I wanted each match to create a sample as big as possible

Here is the code:
* loop over possible outcomes

foreach o in $outcomes_survey $outcomes_bs {

    *go to folder
    cd "${root}/${succession}/results/analysis/1NN-caliper0-05/`o'"

    * loop over all years to guarantee that treatment and controls are taken always from same year

    * replace outcome here
    capture drop outcome
    gen outcome = `o'
    label variable outcome "`o'"

    *1 nearest neighbor without replacement, caliper 0.05
    capture drop ident treated control pscore treated2 support weight2 id_2 nn n1 pdif
    capture drop _pscore _treated _support _weight _id _n1 _nn _pdif _outcome
    foreach var in ident treated control pscore treated2 support weight2 id_2 nn n1 pdif {
        gen `var' = .
    }

    local start = 2000
    local end   = 2014
    forvalue i = `start'(1)`end' {
        qui count if year == `i' & succession == 1 & pscore1 != .
        local decideon = 0
        local decideon = r(N)
        if `decideon' > 0 {
            capture drop _pscore _treated _weight _id _n1 _nn _pdif
            set seed 123456

            *DEALING WITH TREATED
            *before matching I need to somehow exclude firms that are treated in a year other than i, so that those can't be used as controls in year i
            *tagging firms treated in year other than i
            sort IDNUM_ZAEHLER year
            bysort IDNUM_ZAEHLER (year): gen treatnot`i' = 1 if succession == 1 & year != `i'
            count if treatnot`i' == 1
            bysort IDNUM_ZAEHLER: carryforward treatnot`i', gen(treatnot`i'2)
            gsort IDNUM_ZAEHLER - year
            bysort IDNUM_ZAEHLER: carryforward treatnot`i'2, gen(treatnot`i'final)
            cap drop treatnot`i' treatnot`i'2
            xtsum treatnot`i'final
            sort IDNUM_ZAEHLER year

            *save dataset containing firms treated in year other than i
            preserve
            by IDNUM_ZAEHLER (year): keep if treatnot`i'final == 1
            save data/treatnot`i'dataset.dta, replace
            restore

            *drop firms treated in year other than i
            sort IDNUM_ZAEHLER year
            by IDNUM_ZAEHLER (year): drop if treatnot`i'final == 1

            *MATCH
            capture psmatch2 succession if year == `i' & pscore1 != ., out(`o') p(pscore1) neighbor(1) common caliper(.05) noreplacement
            capture replace year_dummy = 1 if _treated != . & year == `i'
            capture replace ident = 1 if _weight != . & year == `i'
            capture replace treated = 1 if _treated == 1 & _support == 1 & year == `i'
            capture replace control = 1 if _treated == 0 & _support == 1 & year == `i'
            capture replace pscore = _pscore if year == `i'
            capture replace treated2 = _treated if year == `i'
            capture replace support = _support if year == `i'
            capture replace weight2 = _weight if year == `i'
            capture replace id_2 = _id if year == `i'
            capture replace n1 = _n1 if year == `i'
            capture replace nn = _nn if year == `i'
            capture replace pdif = _pdif if year == `i'
            qui count if succession == 1 & year == `i'
            di r(N) " treated firms exist in year = `i' "
            qui count if _treated == 1 & year == `i'
            di r(N) " treated firms are identified by the command in year = `i' "
            qui count if _treated == 1 & _support == 0 & year == `i'
            di r(N) " treated firms were off support in year = `i' "

            *drop variable treatnot i
            cap drop treatnot`i'final

            *append dataset containing firms treated in year other than i
            merge 1:1 IDNUM_ZAEHLER year using data/treatnot`i'dataset.dta
            drop _merge
            drop treatnot*final

            *DEALING WITH CONTROLS
            **drop firms that were used as controls in year i (so they can't be used again as controls in other years)
            *tag controls
            sort IDNUM_ZAEHLER year
            bysort IDNUM_ZAEHLER (year): gen control`i' = 1 if _treated == 0 & _weight == 1 & year == `i'
            count if control`i' == 1
            bysort IDNUM_ZAEHLER: carryforward control`i', gen(control`i'2)
            gsort IDNUM_ZAEHLER - year
            bysort IDNUM_ZAEHLER: carryforward control`i'2, gen(control`i'final)
            cap drop control`i' control`i'2
            xtsum control`i'final

            *problem now: as all control firms are dropped, we need to save them and add them back at the end
            preserve
            sort IDNUM_ZAEHLER year
            by IDNUM_ZAEHLER (year): keep if control`i'final != .
            if `i' == `start' {
                save data/controldataset.dta, replace
            }
            else {
                append using data/controldataset.dta
            }
            save data/controldataset.dta, replace
            restore

            *drop controls in i
            sort IDNUM_ZAEHLER year
            by IDNUM_ZAEHLER (year): drop if control`i'final != .
            cap drop control`i'final
        }
    }

    *merge back controls
    merge 1:1 IDNUM_ZAEHLER year using data/controldataset.dta
    drop _merge
    drop control*final
}


After that I looked at the quality of the match (balancing properties and graph pscore density). I will not post this part here.

As my last step I now want to run the difference-in-differences estimation using the matched sample given by the psmatch2 routine.
For the estimation I want to regress my outcomes (= firm performance) on a dummy indicating succession (yes, no) and a dummy indicating the years post-succession (post = 1 for years after succession, 0 otherwise); the treatment effect is then the interaction of the succession and post variables. As further controls I include the firm characteristics I used in the logit regression when I calculated the pscores.

In order to run this regression I first need to define the post variable for the matched control firms. For that I use the year of succession for treated firms to compute the counterfactual year also for the matched control group.

* generate post_c with a fake succession event for control group
gen post_c=1 if ident==1 & treated2==0 & weight2==1
* post_c for all years after fake succession
sort IDNUM_ZAEHLER year
forvalues i = 1/15 {
    bysort IDNUM_ZAEHLER: replace post_c = 1 if ident[_n-`i'] == 1 & treated2[_n-`i'] == 0 & weight2[_n-`i'] == 1
}


The next problem I encountered was that the weight2 variable is only non-missing in the year of succession, but whole firms should be included; otherwise I can't look at the development of performance after succession. So I created a variable that covers the whole firm ID.

* extend weight variable to whole idnum instead of just one year
sort IDNUM_ZAEHLER year
cap drop inmatch
bysort IDNUM_ZAEHLER (year): gen inmatch=1 if weight2 == 1
count if inmatch==1
cap drop inmatch2
bysort IDNUM_ZAEHLER: carryforward inmatch, gen(inmatch2)
gsort IDNUM_ZAEHLER - year
cap drop inmatchfinal
bysort IDNUM_ZAEHLER: carryforward inmatch2, gen(inmatchfinal)
cap drop inmatch inmatch2
xtsum inmatchfinal
sort IDNUM_ZAEHLER year


So now I can finally run my diff-in-diff estimation using the weights from psmatch2, which I extended to cover whole firms:

I first run pooled OLS:
* DiD treatment effect
xi: reg outcome succession_yes##post firm_age firm_age_2 i.legal_form i.employment_size i.industry i.year [aw=inmatchfinal], cluster(IDNUM_ZAEHLER)
estimates store didatt1`v'

But to account for my panel data I actually want to run panel OLS using random effects.

xi: xtreg outcome succession_yes##post firm_age firm_age_2 i.legal_form i.employment_size i.industry i.year if inmatchfinal!=., re rob
estimates store didatt2`v'

My problem here is that aweights are not allowed with panel OLS RE.
Since my weight with the 1-to-1 matching is always 1, it should not matter, and I just run xtreg, re on all non-missings.
But as robustness tests I run different matching algorithms (2NN, 5NN, radius, and caliper). With those matching techniques, weights differ by firm and are smaller than 1. As far as I understand how the diff-in-diff should be run on the matched sample, I would have to use the weights in the xtreg, re regression for my panel data as well. But weights are not allowed with the Stata command xtreg, re. I read that the population-averaged xtreg is supposed to be similar to xtreg, re, so I tried to run xtreg, pa rob instead and include the weights as pweights. But this does not work either, because the weights are not constant within panel.
So how can I run a panel random-effects OLS regression (diff-in-diff) including the weights from matching?



I hope my procedure and estimations are clear. Your help is greatly appreciated.

I have the following questions:
- Is the Stata code how I perform the matching correct given my research question and data structure?
- Is my understanding of the matching procedure and how I apply it to the diff-in-diff estimation later correct? To run the regression on the matched sample, is it enough to use the weights from psmatch2, or do I need to account differently for the pairs created by the match? Because the way it is now, I just run the regression on a smaller sample than the full sample, but I do not account for which controls are matched to which treated firms, correct? Or do the weights take care of that?
- And, especially important for matching algorithms other than 1NN: how can I run a panel OLS with xtreg, re including weights?
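One hedged workaround for the last question (an assumption on my part, not a tested recipe): -mixed- does accept pweights, and a random-intercept mixed model is structurally the same model as xtreg, re, so something along these lines might substitute when the matching weights vary across firms:

```stata
* Random-intercept model with sampling weights from the matching step;
* variable names taken from the post above, so treat this as a sketch
mixed outcome i.succession_yes##i.post firm_age firm_age_2 ///
    i.legal_form i.employment_size i.industry i.year ///
    [pw = inmatchfinal] || IDNUM_ZAEHLER:
```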


Thank you in advance,
Marina

What is the best way to plot regression weights, including interaction from SEM analysis?

I have developed a SEM model (a latent growth model, to be precise) that regresses a latent variable (the latent intercept) on several dichotomous predictors. Trouble is, I also test for interactions, which makes it all the more difficult for the reader to understand.

(I did the analysis with software other than Stata (Mplus), which has limited graphical capabilities. In Stata I would need to use both -sem- and -gsem- (gsem because of categorical indicators in some analyses), at least that's what I believe. Another option is R/lavaan, but I prefer Stata over R.)

So, within Stata, how should I get plots for regression weights estimated in a SEM-model (-sem- and -gsem-)?

I have three main effects: a, b, and c. All three variables are dichotomous; a and b are memberships in religious groups, c is gender. (Nonreligious respondents are scored zero on both a and b.) I then add two dichotomous variables representing interaction effects between religious affiliation and gender (a*c and b*c, both dichotomous).

Thus:

Code:
latent intercept <- religousgroup1 religiousgroup2 gender religiousgroup1female religiousgroup2female
The results are bound to be confusing for many people unless I use plots. I found Ben Jann's presentation of -coefplot- interesting. Before I start digging into this on my own: I wonder which package/approach I should consider first while educating myself on how to plot regression weights obtained with -sem- and -gsem-.
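For what it's worth, a minimal sketch of the -coefplot- route, with hypothetical variable names; coefplot reads the coefficient vector of whatever model was fitted last:

```stata
* Hypothetical latent-intercept model; capitalized names are latent in -sem-
sem (Intercept -> y1 y2 y3) ///
    (Intercept <- relgroup1 relgroup2 female relgroup1female relgroup2female)
* Plot the structural coefficients of the Intercept equation; the keep()
* pattern may need adjusting to the equation names your model reports
* (inspect them with: matrix list e(b))
coefplot, keep(Intercept:*) drop(_cons) xline(0)
```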



Creating difference from the average of the previous period without using tsset (lags)

Hey,

Currently I am trying to create differences between the Rate and the Mean_A of the previous period. So for the 2006m2 observations I want to create the difference 4 - 4.25 = -0.25, for the next one 4.4 - 4.25 = 0.15, et cetera. Since I have multiple observations per period and do not want to collapse them (by creating means), it is not possible to use the
Code:
tsset
and
Code:
l.
commands, since then you get
Code:
. tsset lastrateadj
repeated time values in sample
r(451);
Does any of you have a suggestion on how to create the differences in this situation?

Code:
Date        Rate    Mean_A
2006m1    4.3    4.25
2006m1    3.9    4.25
2006m1    4.8    4.25
2006m2    4       2.39
2006m2    4.4    2.29
Kind regards,

Danny


ml program question: right syntax to call parameters of an equation

I want to display and use a component of an equation in my ml program, but I couldn't find the right syntax to call it. lnsigu2 is modeled with a constant and x1, yet the last three lines below all display the same number. For example, display `lnsigu2:_cons' does not display the lnsigu2 equation's constant during the maximization process, and display _b[lnsigu2:_cons] does not work during maximization either. So what is the right syntax to display the constant of the lnsigu2 equation throughout the maximization process?


Code:
program mlprog
        args todo b lnf

        tempvar xb lnsigu2 lnsigw2
        mleval `xb' = `b', eq(1)
        mleval `lnsigu2' = `b', eq(2)
        mleval `lnsigw2' = `b', eq(3)

.....

display `lnsigu2'
display `lnsigu2:_cons'
display `lnsigu2:x1'

.....

end
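One hedged pointer: inside the evaluator, `b' is the full coefficient row vector (a Stata matrix), so a single parameter can be read off by its equation:column name with el() and colnumb(). A sketch:

```stata
* Inside the evaluator, after the mleval calls: pick out individual
* parameters from the coefficient vector `b' by their full column names
display el(`b', 1, colnumb(`b', "lnsigu2:_cons"))
display el(`b', 1, colnumb(`b', "lnsigu2:x1"))
```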

Export summary statistics using tabstat and estpost (esttab)

Hello,

after reading through the forum I found that esttab often causes issues. I hope you can help me with mine.
I am trying to create a table in Stata and export it to Word. In Stata this works fine, but as soon as I export it, all the data are lost.

My data contains survey results. I have data from two groups (marked by either 1 or 2).

This is how my table looks in Stata:

       group |   e(Y1_5)   e(Y6_10)   e(totalY)   e(totalX)
-------------+----------------------------------------------
           1 |  2.886792   3.924528    6.811321    1.867925
           2 |  3.041667     3.8125    6.854167    1.708333

Code:
 
global list1 var1 var2 var3 var4
global format1 rtf

estpost tabstat $list1, by(group) stat(mean)
esttab . using Table1.$format1, replace label cells("mean(fmt(%12.0fc))")
1. I would like to have stata use my labels instead of variable names.
2. I would like to export this into Word. So far I used the code (see above) but that did not work (i.e. I got an empty table).

This is what I get in Word:
(1)
mean
Observations 101

Thank you so much in advance!
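A hedged sketch of one formulation that typically keeps the by-group means and labels in the exported file; the unstack option splits the groups into columns (the format here is an arbitrary choice):

```stata
* Export by-group means with variable labels, one column per group
estpost tabstat $list1, by(group) statistics(mean) columns(statistics)
esttab . using Table1.rtf, replace label unstack nonumber noobs ///
    main(mean %12.2fc)
```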

Deciles of variable from entire dataset based on breakpoints of variable from part of the dataset

Dear community,


my name is Batuhan and I am new to Stata. I have a problem that I have been dealing with for a long time now and have not been able to resolve. I hope you can help me.

I have monthly stock return data from NYSE, NASDAQ and AMEX (stock exchanges). Based on my data, I have calculated the Momentum (MOM) and now I need to categorize my MOM data in deciles based on NYSE breakpoints. The MOM deciles need to be refreshed monthly.

That means that, first, I have to compute the MOM deciles and the breakpoints based only on my data from the NYSE. I then have to use the calculated breakpoints to compute MOM deciles for the entire database (not only the NYSE).

My problem is that I need to refresh my MOM deciles monthly. What I tried to do is:

I have computed the MOM deciles for my NYSE data only, for each month:
Code:
egen NYSE_decile = xtile(cumul), by(month_id) nq(10)
My problem is that I need the breakpoints (not the deciles 1, 2, 3, ..., 9, 10) in order to compute the MOM deciles for the entire database:
Code:
egen MOM_Decile = xtile(cumul), by(month_id) nq(10) cutpoints(NYSE_decile)
I know that, in order to get the breakpoints, I need to use pctile, not xtile. But when I use egen NYSE_decile = pctile(cumul), by(month_id) nq(10) genp(percent), I am told that nq() and genp() are not allowed.

To sum it up, what could I do in order to compute my MOM deciles for the whole dataset (AMEX, NYSE, NASDAQ stocks) based on the NYSE-breakpoints?
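A sketch of one way to do the monthly refresh, assuming an exchange indicator (here a hypothetical nyse == 1) and the momentum variable cumul from above: compute the nine NYSE cutoffs per month with _pctile, then classify the whole sample against them.

```stata
gen MOM_decile = .
levelsof month_id, local(months)
foreach m of local months {
    * breakpoints from NYSE stocks only
    _pctile cumul if nyse == 1 & month_id == `m', nq(10)
    forvalues j = 1/9 {
        local c`j' = r(r`j')
    }
    * cut the full sample (NYSE, AMEX, NASDAQ) on the NYSE breakpoints
    quietly replace MOM_decile = 1 if cumul <= `c1' & month_id == `m'
    forvalues j = 2/9 {
        local jm1 = `j' - 1
        quietly replace MOM_decile = `j' if cumul > `c`jm1'' & cumul <= `c`j'' & month_id == `m'
    }
    quietly replace MOM_decile = 10 if cumul > `c9' & cumul < . & month_id == `m'
}
```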

It would be a huge help if you could help me out.


Best,

Batuhan

Deleting by ID

Hey guys. First of all, I'm German, so please excuse my English. :D
I have one big problem.
I have a dataset based on gyms. Every gym has an ID (e.g. 2001, 2002, 2003, etc.).
For every gym, 2-10 people answered some questions and were then sorted by their gym.
E.g. Person 1 - 2001
Person 2 - 2003
Person 3 - 2001

Now I need to delete all gyms in which fewer than 5 people answered the questions. But there is no variable that shows how many people answered the questions. Do you have any idea how I can delete them?
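A minimal sketch, assuming the gym ID variable is called gym_id: count the respondents per gym on the fly, then drop the small gyms.

```stata
* _N within each by-group is the number of respondents in that gym
bysort gym_id: gen n_respondents = _N
drop if n_respondents < 5
```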

Greetings !

Outreg2 Summary Stats: How to treat zeros as missing

Dear Users,

I work with survey data containing two types of questions. The first type is basic demographics, which were asked of everyone; the second type is entrepreneurship-related questions, which were asked only of the relevant individuals. When I use the command below to get summary stats:
Code:
bysort female: outreg2 using data.doc, replace sum(log) eqdrop(min max)
I get the wrong number of observations.

More specifically, I would like to get the number of observations for the 1s, not for the 0s. Any suggestions?

Many thanks for your time and help.


age  male  female  businessowner
32   0     1       1
39   0     1       1
40   0     1       1
48   0     1       0
24   0     1       0
33   0     1       0
35   0     1       1
54   0     1       0
36   0     1       1
22   0     1       0
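One possible sketch, under the assumption that zeros in businessowner should simply not be counted: recode them to missing in a temporary copy of the data before summarizing, so outreg2's N reflects only the 1s.

```stata
* Sketch: treat zeros in the conditional question as missing, then summarize.
preserve
replace businessowner = . if businessowner == 0
bysort female: outreg2 using data.doc, replace sum(log) eqdrop(min max)
restore
```

preserve/restore keeps the recode from contaminating the analysis dataset.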





Count distinct observations per row for several variables

Dear Users,

I have a dataset that contains many variables, but in this instance 84 variables of interest: cal_MeX, cal_MiX, and cal_DisX, where X = 11-17, 21-27, 31-37, 41-47 (in other words, cal_Me11-cal_Dis47, 84 variables in total). Each participant (row) has an observation for each of these variables (although there are some missing values within each row).

Each variable represents a depth measurement (integer) at a separate site, and the integers range from -3 to 6.

I am trying to write syntax to count the number of observations that equal 2. I am hoping to generate a variable (nsites2) that, when tabulated, will give a frequency distribution of the number of observations equal to 2 per row, e.g.:

tab nsites2 // would hopefully give:

nsites2   Freq.
      0       x
      1       y
      2       z

and so on, with x, y, z indicating how many rows have that many observations equal to 2.

I have tried egen anycount, egen count, and egen rowtotal (each followed by one of the if statements below), but it appears that a cumulative total/sum results, not a count of the observations.

Also, I am having some trouble with how to indicate the range of variables of interest. I have tried:

if (cal_Me11-cal_Dis47) ==2

and also

if inlist(2, cal_Me11, cal_Me12, cal_Me13, cal_Me14, cal_Me15, cal_Me16, cal_Me17, cal_Me21, cal_Me22, cal_Me23, cal_Me24, cal_Me25, cal_Me26, cal_Me27, cal_Me31, cal_Me32, cal_Me33, cal_Me34, cal_Me35, cal_Me36, cal_Me37, cal_Me41, cal_Me42, cal_Me43, cal_Me44, cal_Me45, cal_Me46, cal_Me47, cal_Mi11, cal_Mi12, cal_Mi13, cal_Mi14, cal_Mi15, cal_Mi16, cal_Mi17, cal_Mi21, cal_Mi22, cal_Mi23, cal_Mi24, cal_Mi25, cal_Mi26, cal_Mi27, cal_Mi31, cal_Mi32, cal_Mi33, cal_Mi34, cal_Mi35, cal_Mi36, cal_Mi37, cal_Mi41, cal_Mi42, cal_Mi43, cal_Mi44, cal_Mi45, cal_Mi46, cal_Mi47, cal_Dis11, cal_Dis12, cal_Dis13, cal_Dis14, cal_Dis15, cal_Dis16, cal_Dis17, cal_Dis21, cal_Dis22, cal_Dis23, cal_Dis24, cal_Dis25, cal_Dis26, cal_Dis27, cal_Dis31, cal_Dis32, cal_Dis33, cal_Dis34, cal_Dis35, cal_Dis36, cal_Dis37, cal_Dis41, cal_Dis42, cal_Dis43, cal_Dis44, cal_Dis45, cal_Dis46, cal_Dis47)

but these each give different results (or an invalid-syntax / r(198) error); therefore, I am not sure which syntax to use.

I apologise if this question has been asked before (I have searched for hours and could not find a solution that fits my particular problem with such a long list of variables of interest).
Please let me know if I need to provide more information.

Thank you for your time and help, it is much appreciated.
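A sketch using egen's anycount() function, which counts per row how many of the listed variables equal the value(s) given in values() — no if qualifier needed. It assumes the 84 variables are stored contiguously, so the hyphenated varlist cal_Me11-cal_Dis47 is valid (order them first if not):

```stata
* Per-row count of variables equal to 2 across the 84 sites.
egen nsites2 = anycount(cal_Me11-cal_Dis47), values(2)
tab nsites2
```

Unlike rowtotal(), which sums the values themselves, anycount() counts how many variables match, which is exactly the frequency distribution sought here.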

WTP econometric models

Hi statalist users,

I'm trying to estimate WTP (willingness-to-pay) values for three different types of public services (water supply, aqueduct, and waste management). My first thought is to use a multinomial logit model (no ordering present, and no correlation assumptions), but I have questions:

1. Is this model right for this kind of setting, or should I estimate a different model for each service? (I ask because all three services will be paid on one bill, i.e. the sum of the prices or values.)
2. Can I use different independent variables for each public-service equation in the multinomial logit?
3. How can I estimate the mean WTP for each one?
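For question 3, a heavily hedged sketch, assuming a standard single-bounded contingent-valuation design per service (yes = 1 if the respondent accepts the stated price, bid = that price; both variable names are hypothetical). Under a linear utility model, mean WTP is the negative of the constant divided by the bid coefficient (the Hanemann approach):

```stata
* Hypothetical sketch for one service; repeat per service.
probit yes bid
nlcom meanWTP: -_b[_cons] / _b[bid]
```

nlcom also delivers a delta-method standard error for the WTP estimate.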

Thanks for any kind of help that you can provide me,

All the best.

Julíán


Using assert with missing observations

Dear all

Suppose I have a dataset with 5 variables. var1 and var2 have 100 observations, while var3, var4, and var5 look like this:

Code:
var3  var4   var5
54     12      .
56     15    167
89     17    190
34     18    198
.      .       .
.      .       .
.      .       .
.      .       .
I am trying to use assert to check whether a condition is satisfied, but I do not get the desired result. I have tried it both with !missing() and without:

Code:
assert var4 < var4[_n+1] & var5 > var5[_n+1]

* missing values sort as larger than any number, so exclude them explicitly:
assert (var4 < var4[_n+1] & var5 > var5[_n+1]) if !missing(var4[_n+1], var5[_n+1])
The assertion ought to be true, but I suspect the missing values are interfering.

How can I get the right answer in the above example?

Downloading the levpet command (Levinsohn and Petrin)

Greetings my fellow researchers, students, professionals.

I have a question to ask you regarding the downloading of the levpet command for Stata 14.0

I have read online, as well as on this forum, that to download this command all you have to do is type ssc install levpet. Unfortunately, when I try this, Stata does not download it. I would also like to mention that I have tried the same procedure with other commands, such as xtmg (ssc install xtmg), and it has worked fine, but with the levpet command there is no progress.

I have downloaded my Stata version from the university archives so I doubt that it is broken or incomplete.

Do you have any suggestions as to why this is happening? Is there another way I can download this command?
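If I recall correctly, levpet was distributed through the Stata Journal (SJ 4-2, package st0060) rather than SSC, which would explain why ssc install fails; the exact package name and URL below are therefore worth verifying via findit first:

```stata
* Locate the package, or install it directly from the Stata Journal archive.
findit levpet
net install st0060, from(http://www.stata-journal.com/software/sj4-2)
```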

I appreciate your time spent on reading my question.

Best regards

Product and sum operators in same variable

Hello,

I am attempting to create the following variable in Stata. I am having trouble with both the sum and product operators. I already have data values for both V and P, so I just need to put everything together. I'm not sure if it's possible to use one loop or potentially several. Can anyone suggest any functions or methodologies that I could employ? Thank you very much.

Trevor


[formula attached as an image; not reproduced here]
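Since the exact formula is not recoverable from the attached image, here is a generic, hypothetical pattern for a variable that nests a running product inside a running sum, e.g. X = sum over t of [ V_t * (product of P_s for s <= t) ]; variable names t, V, P are placeholders:

```stata
* Running product via logs (assumes P > 0), then a running sum over it.
sort t
gen double cumP = exp(sum(ln(P)))   // cumP = product of P_s, s <= t
gen double X    = sum(V * cumP)     // X = running sum of V_t * cumP
```

Stata's sum() inside generate is a running (cumulative) sum, which is what makes both operators expressible without an explicit loop.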