Channel: Statalist

Drop a matrix row

Dear Mata users,

I'm trying to drop a matrix row (or column; the matrix is symmetric), or at least to exclude it from the count returned by rows(). For example:

Consider the following matrix
Code:
mata
M = (0,4,0,0 \ 4,0,2,0 \ 0,2,0,0 \ 0,0,0,0)
r = rows(M)
r
end
rows() counts 4 rows, but the last one is all zeros. In this case I would like to get 3, i.e. the number of non-empty rows.

How could I do that?
Best,
Charlie
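
A minimal sketch of one approach (not necessarily the only one): flag the rows that contain at least one non-zero entry, count them, and reuse the flags to keep the matching rows and columns.

Code:
mata
M = (0,4,0,0 \ 4,0,2,0 \ 0,2,0,0 \ 0,0,0,0)
// 1 for rows with at least one non-zero entry, 0 otherwise
nonempty = rowsum(M :!= 0) :> 0
sum(nonempty)                        // 3, the number of non-empty rows
// keep non-empty rows and, M being symmetric, the matching columns
idx = select(1::rows(M), nonempty)
M2  = M[idx, idx]
end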

Survival analysis - seed germination data

Hi all,

I am trying to analyze germination data. Initially I thought of using ANOVA, but I realized the data are autocorrelated, so I decided to use survival analysis instead. My data look like this:

| parcela   epoca   varied~e   tratam~o   repeti~o   dsemen~a   dcontagem   germina |
|------------------------------------------------------------------------------------|
|       1       1          1          1          1   3/4/2012   10/4/2012        76 |
|       1       1          1          1          1   3/4/2012   17/4/2012         8 |
|       2       1          1          1          2   3/4/2012   10/4/2012        80 |
|       2       1          1          1          2   3/4/2012   17/4/2013        11 |
|       3       1          1          1          3   3/4/2012   10/4/2012        84 |


dsemen~a is the date the experiment began; dcontagem is the date on which we counted the seeds that had germinated. We started with 100 seeds. The first row of the data shows that we began the experiment on April 3, 2012. We counted the seeds that had germinated (24 in total), so the variable germina has a value of 76 (seeds that had not germinated). We did a second count on April 17, after which only 8 seeds had not germinated. So each combination of variety (variedade), treatment (tratamento) and repetition (repeticao) has two rows in the data.

Here is the set of commands I used:

Code:
* parse the count date directly; substr() day/month positions break for
* single-digit days such as "3/4/2012"
gen eventdate = date(dcontagem, "DMY")
format eventdate %td

global time eventdate
global event germina
global group variedade tratamento repeticao

sum $time $event
stset $time, failure($event)

streg i.variedade , nohr distribution(weibull)

The results from streg are as follows:

Weibull regression -- log relative-hazard form

No. of subjects =         800                   Number of obs    =        800
No. of failures =         687
Time at risk    =    15314695
                                                LR chi2(4)       =       0.65
Log likelihood  =   3175.5148                   Prob > chi2      =     0.9569

------------------------------------------------------------------------------
          _t |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   variedade |
          2  |   .0532471   .1224401     0.43   0.664    -.186731    .2932253
          3  |   .0953093   .1211829     0.79   0.432   -.1422048    .3328235
          4  |   .0605199   .1222239     0.50   0.620   -.1790345    .3000743
          5  |   .0669383   .1220104     0.55   0.583   -.1721977    .3060744
             |
       _cons |  -5707.152   172.1548   -33.15   0.000   -6044.569   -5369.735
-------------+----------------------------------------------------------------
       /ln_p |   6.360893   .0301637   210.88   0.000    6.301773    6.420012
-------------+----------------------------------------------------------------
           p |   578.7628    17.4576                     545.5383    614.0107
         1/p |   .0017278   .0000521                     .0016286    .0018331
------------------------------------------------------------------------------



I think my data structure is not appropriate for the survival analysis I am running, because I do not have a censoring variable. Instead, I have the total number of seeds before the experiment began, the number that had germinated by the first count, and the number that had germinated by the second count. Any help is most welcome.

Benedito
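
One possible restructuring (a sketch, not a definitive recipe; variable names are taken from the listing above, and dsementeira for the sowing-date variable is assumed): convert the cumulative counts of seeds not yet germinated into per-interval germination counts, add one right-censored record per parcel for the seeds that never germinated, and then stset with frequency weights.

Code:
* interval failure counts from the cumulative germina variable,
* assuming each parcela starts with 100 seeds
bysort parcela (eventdate): gen n = cond(_n == 1, 100 - germina, germina[_n-1] - germina)
gen byte fail = 1
* duplicate each parcela's last record to carry the right-censored seeds
bysort parcela (eventdate): gen byte last = _n == _N
expand 2 if last, gen(copy)
replace n    = germina if copy == 1
replace fail = 0       if copy == 1
gen time = eventdate - date(dsementeira, "DMY")   // days since sowing
drop if n == 0
stset time [fweight = n], failure(fail)
streg i.variedade, nohr distribution(weibull)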

Modelling random effect interactions in mixed-effects linear regression

Hi everyone,

I've just converted from SPSS to Stata, and I was wondering if anyone knows how to model interactions between random effects in mixed-effects models.

In my design, I have two within-subject factors for every subject (day, and trial blocks within each day) and I modeled this interaction in SPSS using:

/RANDOM = intercept day*block | SUBJECT(subject_ID) COVTYPE(UN)

But I can't find a way to model this same interaction in Stata, as I can only specify each random effect separately:

|| Subject_ID: Day Block, covariance(unstructured)

I just started using Stata, so any help would be awesome!

Cheers,

Al
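
A sketch of one possible translation, assuming day and block are numeric within-subject variables as in the SPSS call (the outcome name y is hypothetical): build the interaction explicitly and give it a random coefficient at the subject level.

Code:
gen day_block = day * block
mixed y c.day##c.block || subject_ID: day_block, covariance(unstructured)

Recent versions of mixed may also accept factor-variable notation such as c.day#c.block directly in the random-effects part.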

Whole dataset changes

I am not sure how to search for this in the help files: is there a Stata command that changes every value of -9999.99 in my dataset to missing, or do I have to write a do-file looping over all variable names?
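
A sketch of the usual route: mvdecode recodes listed values to missing across many variables at once (here restricted to numeric variables via ds).

Code:
ds, has(type numeric)
mvdecode `r(varlist)', mv(-9999.99)
* caution: for variables stored as float, the stored value may not exactly
* match the double literal -9999.99; recast such variables to double first,
* or use a replace loop comparing float(varname) to float(-9999.99)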

Logit Model Sample [Targets (1 year per firm), and Non-targets (multiple years per firm)]

Dear all,

Currently I am conducting research on predicting takeover targets among US public firms. I plan to build a model that estimates the probability that a company is a target. To do so, I am considering a binomial (perhaps conditional) logistic regression with outcomes 0 (non-target) and 1 (target).

To arrive at a valid model, I need to combine the target-company data with the non-target-company data. However, the non-target data consist of multiple firm-years, whereas the target companies have only one year of data. Since these seem hard to compare directly, I think it is appropriate to match the target companies with non-target companies by year, SIC code (Standard Industrial Classification), and size (proxied by total assets).

I hope someone can help me to find an appropriate way to match the targets with non-targets and hence estimate the probability of a company being a target (takeover probability).

Thanks in advance for your time/help!

Yours faithfully,

Wesley
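
A sketch of one common matching recipe (the file and variable names are hypothetical, and this is only one of several defensible designs): pair each target with the same-year, same-industry non-target firm-year closest in total assets.

Code:
* targets.dta:    one row per target (target_id year sic2 assets)
* nontargets.dta: non-target firm-years (firm_id year sic2 assets_nt)
use targets, clear
joinby year sic2 using nontargets            // all candidate pairs by year and 2-digit SIC
gen double sizegap = abs(assets - assets_nt)
bysort target_id (sizegap): keep if _n == 1  // keep the closest-sized candidate per target

With matched pairs in hand, a conditional logit (clogit with the pair as the group variable) is one natural estimator, which speaks to the "(conditional?)" in the post.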


Table descriptive

Hi all,

I've got a question which is probably very basic, but I hope you can help me out, since I've already tried several things.
I've created a variable that divides my sample into three groups, and I want to switch the rows and columns of the summary table below. I've tried tabpost, tab, tabstat and esto already, but the output never ends up the way I want it. Does anyone have code for this?

. tab subsample, sum(lnreldv)

            Summary of lnreldv
Sample |      Mean   Std. Dev.       Freq.
-------+-----------------------------------
     1 | .11146348   1.4157916       9,765
     2 | .61473679   1.3111574       5,869
     3 | .35423343   1.5471736       2,493
-------+-----------------------------------
 Total | .30779693   1.4201971      18,127



Thank you very much!!
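
A sketch of one way to get the groups into columns: tabstat's save option stores one statistics matrix per group, and joining these side by side produces the transposed layout.

Code:
tabstat lnreldv, by(subsample) statistics(mean sd n) save
matrix T = r(Stat1), r(Stat2), r(Stat3), r(StatTotal)
matrix colnames T = sample1 sample2 sample3 total
matrix list T      // rows: mean, sd, n; columns: the three groups and the total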

About adding variables into gmmstyle and ivstyle for xtabond2 command

Dear Statalists!
My name is Hoang Luong, PhD student at the University of Greenwich, London.
I'm working on a project about the determinants of R&D expenditure using GMM.
In my model, the dependent variable is R&D intensity (R&D over sales). My explanatory variables include the price-cost margin (PCM, capturing market competition) and its square.
I am confused about where to put the squared term: should it go in ivstyle() or in gmmstyle()? (I tried several times, and the results are only significant when I add the squared term in ivstyle(), but that does not really fit the theory: if my dependent variable affects the PCM, it should affect the squared term as well.)
I hope my question is clear enough, and I'm looking forward to all advice about working with variables in GMM.
Please find below the attached file with my regression results.
Thanks in advance for reading.

Hoang Luong
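
A sketch of the usual logic, with hypothetical variable names: if the price-cost margin is endogenous (the dependent variable feeds back into it), its square inherits that endogeneity, so both terms would normally go in gmmstyle(), instrumented by their own lags, while ivstyle() is reserved for strictly exogenous regressors such as year dummies.

Code:
gen pcm2 = pcm^2
xtabond2 rdint L.rdint pcm pcm2,         ///
    gmmstyle(rdint pcm pcm2, lag(2 .))   ///
    ivstyle(yr2-yr15) twostep robust     // yr2-yr15: year dummies (assumed)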

Exporting results of -heckoprobit- using -outreg2-

Hello,

I am using the -heckoprobit- command, and I would like to export the final stage and the selection equation separately, in two different tables, using -outreg2-. Unfortunately, I do not know how to export the point estimates separately and could not find the information in the help files of -heckoprobit- or -outreg2-.

The current code looks as follows:

Code:
xi: heckoprobit Y IRRE i.FE, vce(robust) select(offer = distance IRRE)
outreg2 using "$path\RESULTS\result", replace dec(3) label(insert) keep(IRRE) sortvar(IRRE) excel
Any suggestions? I would also welcome suggestions using export commands other than -outreg2-.

Thanks.
Ruediger.
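
A sketch using esttab from the estout package (SSC) instead, assuming the outcome equation is named after the dependent variable Y: equation-qualified keep() patterns split the two equations into separate tables.

Code:
eststo m: heckoprobit Y IRRE i.FE, vce(robust) select(offer = distance IRRE)
esttab m using outcome.rtf,   keep(Y:*)     label replace   // final-stage equation only
esttab m using selection.rtf, keep(offer:*) label replace   // selection equation only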


Problem with Bootstrap standard errors for marginal effects using Two Part model

Dear statalist,

I am fitting a two-part model with ivprobit as the first part and ivregress as the second. The bootstrap option of margins in the twopm package cannot be used with ivprobit and ivregress, so I am trying to write a program that bootstraps the standard errors of the marginal effects myself. However, I am a first-time user of Stata's programming tools and have been struggling for a while. The idea is quite straightforward (references: Federico Belotti, "twopm: Two-part models"; William H. Dow; links below).

1. ivprobit                                   // first part
2. predict phat, xb
3. predict pr
4. generate the inverse Mills ratio: IMR = normalden(phat)/normal(phat)
5. ivregress                                  // second part
6. predict lyhat
7. get e(rmse)
8. generate yhat = exp(lyhat)*exp(0.5*rmse^2)
9. generate beta_twopartmodel (the marginal effect) = (beta_ivregress + beta_ivprobit*IMR)*(normal(pr)*yhat)
10. finally, obtain bootstrapped standard errors of beta_twopartmodel.

Here is the program I attempted to write:

capture program drop myboot
program myboot, rclass
    version 13
    preserve
    bsample
    ivprobit rate1 lncons lnlength tariff PPP imported ground knowledge LB AP SN AG RM (lnloss=lndensity)   // first part
    scalar probloss = _b[lnloss]
    predict pr
    predict phat, xb
    generate IMR = normalden(phat)/normal(phat)
    ivregress 2sls lnrate lnlength lncons tariff PPP imported ground knowledge AG AP SN LB RM (lnloss=lndensity)   // second part
    scalar bivloss = _b[lnloss]
    scalar rmse = e(rmse)
    predict lyhat
    generate yhat = exp(lyhat)*exp(0.5*rmse^2)
    generate margloss = (bivloss + probloss*IMR)*(normal(pr)*yhat)   // marginal effect
    summarize margloss, meanonly
    return scalar bmargloss = r(mean)   // mean marginal effect
    restore
end

* collect the scalar the rclass program returns (r(bmargloss), not r(mean))
simulate bmargloss = r(bmargloss), reps(100) seed(12345): myboot


Reference access:

http://www.econ.uzh.ch/dam/jcr:00000...5-1.pdf#page=9
http://link.springer.com/article/10....7426320#page-1

Thank you for your assistance in advance.

Exporting estimates by group to Excel

Hello,

I am trying to export my estimates to Excel, but I cannot export them by group. Exporting the estimates for the whole panel works, but it fails when I repeat the regression for each group. I tried the following commands:

Code:
by group: ivreg2 variable (a b c = iv*), robust small first ffirst
estimates store vres
xml_tab vres, tstat append below sheet("d") stats(N N_g)
estimates drop *

Could someone help me to export the results to Excel please?
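
A sketch of one workaround: loop over the groups explicitly, so that each run is stored and exported under its own name (statsby is another route if only the coefficients are needed).

Code:
levelsof group, local(groups)
foreach g of local groups {
    ivreg2 variable (a b c = iv*) if group == `g', robust small first ffirst
    estimates store vres`g'
    xml_tab vres`g', tstat append below sheet("group`g'") stats(N)
}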

Post-estimation issues in a discrete choice regression

Dear Statalists,

Thanks to those who have helped in the past. To start with, I use Stata 13. I am currently running a discrete choice model, trying to estimate the effect of some independent variables and events on the transition from renting to owning for those aged 18 to 39. I am working with panel data, and the applicable equation is:

ln(Tit/(1 - Tit)) = ∂0 + ∂1X1 + ∂2X2 + ∂3X3 + ∂4X4 + … + ∂nXn

Some of my respondents never attained home ownership before turning 40.

My post-estimation problems are as follows:

Firstly, I ran the predicted probabilities, and my correct predictions are fewer than the incorrect ones (327 against 373). I am guessing the reason might be model misspecification or inadequate explanatory covariates?

Secondly, I also ran the marginal effects, both with and without the 'atmeans' option, and the coefficients seem unusually large and implausible to interpret.


Below is the list of variables and also my model result, including the marginal effects:

Variable                      Meaning                                  Form         Lags
FTowner                       First-time owner                         discrete     N/A
Female                        Sex is female (default is male)          discrete     N/A
Age2529                       Aged 25-29 (default is below 25)         discrete     N/A
Age3034                       Aged 30-34                               discrete     N/A
Age3539                       Aged 35-39                               discrete     N/A
spouse_present                Presence of spouse                       discrete     0
breakpartner                  Left partner in previous year            discrete     0
nkids                         Number of kids                           continuous   0
nch04                         Child(ren) in household under 5 years    discrete     1
non_white                     Non-white (default is white)             discrete     N/A
unemprate                     Regional unemployment rate               continuous   1
unemployed                    Became unemployed in previous year       discrete     0
volincome                     Volatility of income                     continuous   2
qpermincome                   Quantile permanent income                continuous   2
c.qpermincome#c.qpermincome   Quantile permanent income squared        continuous   2
RUCC_per_000                  Regional user cost of capital per 1000   continuous   2
housgcost000                  Annual housing cost per thousand         continuous   2
netrent000                    Annual net rent per thousand             continuous   2
My model estimation is as follows:

xtlogit FTowner female age2529 age3034 age35plus spouse_present breakpartner nkids L.nch04 non_white L.unemprate unemployed L2.volincome L2.c.qpermincome##c.qpermincome L2.RUCC_per_000 L2.netrent000 L2.housgcost000




Random-effects logistic regression              Number of obs      =      6827
Group variable: pid                             Number of groups   =       976

Random effects u_i ~ Gaussian                   Obs per group: min =         5
                                                               avg =       7.0
                                                               max =         7

Integration method: mvaghermite                 Integration points =        12

                                                Wald chi2(18)      =    141.15
Log likelihood = -1372.4452                     Prob > chi2        =    0.0000

-----------------------------------------------------------------------------------------------
FTowner | Coef. Std. Err. z P>|z| [95% Conf. Interval]
------------------------------+----------------------------------------------------------------
1.female | -.0571591 .1087532 -0.53 0.599 -.2703116 .1559933
1.age2529 | .0996402 .3176867 0.31 0.754 -.5230142 .7222946
1.age3034 | -.2073237 .3331532 -0.62 0.534 -.860292 .4456446
1.age35plus | -.4460012 .3564325 -1.25 0.211 -1.144596 .2525936
1.spouse_present | 1.095416 .145693 7.52 0.000 .8098628 1.380969
1.breakpartner | .9621194 .3469817 2.77 0.006 .2820478 1.642191
nkids | -.1917258 .063391 -3.02 0.002 -.3159698 -.0674819

nch04 |
L1. | -.5038756 .1743783 -2.89 0.004 -.8456507 -.1621004

1.non_white | -2.899464 1.242384 -2.33 0.020 -5.334493 -.4644356

unemprate |
L1. | 6.044926 2.534204 2.39 0.017 1.077978 11.01187

1.unemployed | -.2163111 .3958133 -0.55 0.585 -.9920908 .5594687

volincome |
L2. | -.0249451 .1563159 -0.16 0.873 -.3313187 .2814285

qpermincome |
L2. | .6312718 .1611851 3.92 0.000 .3153548 .9471888
--. | .4267525 .1314694 3.25 0.001 .1690772 .6844278

cL2.qpermincome#c.qpermincome | -.181337 .0449424 -4.03 0.000 -.2694224 -.0932515

RUCC_per_000 |
L2. | -.0161702 .0230305 -0.70 0.483 -.0613091 .0289687

netrent000 |
L2. | .087522 .0223654 3.91 0.000 .0436866 .1313575

housgcost000 |
L2. | -.0549711 .0372986 -1.47 0.141 -.128075 .0181328

_cons | -1.889742 1.342841 -1.41 0.159 -4.521662 .7421772
------------------------------+----------------------------------------------------------------
/lnsig2u | -14.76631 25.12055 -64.00169 34.46907
------------------------------+----------------------------------------------------------------
sigma_u | .0006216 .0078079 1.27e-14 3.05e+07
rho | 1.17e-07 2.95e-06 4.87e-29 1
-----------------------------------------------------------------------------------------------
Likelihood-ratio test of rho=0: chibar2(01) = 0.00 Prob >= chibar2 = 1.000

My post-estimation commands gave the following results:

quietly xtlogit FTowner female age2529 age3034 age35plus spouse_present breakpartner nkids L.nch04 non_white L.unemprate unemployed L2.volincome L2.c.qpermincome##c.qpermincome L2.RUCC_per_000 L2.netrent000 L2.housgcost000

//prediction test
predict prob, xb
gen pred_own =0
replace pred_own =1 if prob>=0.5
tab FTowner pred_own // recreate -estat classification-
drop prob pred_own

           |       pred_own
   FTowner |         0          1 |     Total
-----------+----------------------+----------
         0 |     6,453      4,559 |    11,012
         1 |       373        327 |       700
-----------+----------------------+----------
     Total |     6,826      4,886 |    11,712

The above table shows that the number correctly predicted (327) is smaller than the number incorrectly predicted (373). I am not certain whether this is acceptable.

Lastly, the marginal effects below give the same results with or without the 'atmeans' option:

quietly xtlogit FTowner i.female i.age2529 i.age3034 i.age35plus i.spouse_present i.breakpartner nkids L.nch04 i.non_white L.unemprate i.unemployed L2.volincome L2.c.qpermincome##c.qpermincome L2.RUCC_per_000 L2.netrent000 L2.housgcost000
margins, dydx (spouse_present breakpartner nkids L.nch04 non_white L2.qpermincome L2.netrent000) atmeans

Conditional marginal effects Number of obs = 6827
Model VCE : OIM

Expression : Linear prediction, predict()
dy/dx w.r.t. : 1.spouse_present 1.breakpartner nkids L.nch04 1.non_white L2.qpermincome L2.netrent000
at :1.female = .530394 (mean)
0.age2529 = .7521605 (mean)
1.age2529 = .2478395 (mean)
0.age3034 = .7134906 (mean)
1.age3034 = .2865094 (mean)
0.age35plus = .565988 (mean)
1.age35plus = .434012 (mean)
0.spouse_p~t = .3446609 (mean)
1.spouse_p~t = .6553391 (mean)
0.breakpar~r = .9830086 (mean)
1.breakpar~r = .0169914 (mean)
nkids = .9767101 (mean)
L.nch04 = .2040428 (mean)
0.non_white = .0004394 (mean)
1.non_white = .9995606 (mean)
L.unemprate = .059908 (mean)
0.unemployed = .9756848 (mean)
1.unemployed = .0243152 (mean)
L2.volincome = .2948439 (mean)
L2.qpermin~e = 2.330892 (mean)
qpermincome = 2.871247 (mean)
L2.RUCC_~000 = 1.449172 (mean)
L2.netre~000 = .3392022 (mean)
L2.housg~000 = 1.89443 (mean)

----------------------------------------------------------------------------------
| Delta-method
dy/dx Std. Err. z P>|z| [95% Conf. Interval]
-----------------+----------------------------------------------------------------
1.spouse_present | 1.095416 .145693 7.52 0.000 .8098628 1.380969
1.breakpartner | .9621194 .3469817 2.77 0.006 .2820478 1.642191
nkids | -.1917258 .063391 -3.02 0.002 -.3159698 -.0674819

nch04 |
L1. | -.5038756 .1743783 -2.89 0.004 -.8456507 -.1621004
1.non_white | -2.899464 1.242384 -2.33 0.020 -5.334493 -.4644356

qpermincome |
L2. | .1106087 .0746537 1.48 0.138 -.0357099 .2569273

netrent000 |
L2. | .087522 .0223654 3.91 0.000 .0436866 .1313575
----------------------------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the base level.

As shown above, some mean values and coefficients are too large and look implausible, hence difficult to interpret. Apologies for the long post; I have read similar threads, but so far I cannot see what else I am doing wrong.

I would appreciate your help.
Kind regards,
Dammy
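
A sketch of one thing worth checking (offered tentatively): after xtlogit, predict and margins default to the linear xb scale, which is what the "Expression : Linear prediction, predict()" line in the margins output indicates, and on that scale the marginal effects simply reproduce the coefficients. Requesting probabilities explicitly, for example with the random effect set to zero, puts both the classification table and the marginal effects on the probability scale.

Code:
quietly xtlogit FTowner i.female i.age2529 i.age3034 i.age35plus i.spouse_present ///
    i.breakpartner nkids L.nch04 i.non_white L.unemprate i.unemployed ///
    L2.volincome L2.c.qpermincome##c.qpermincome L2.RUCC_per_000 L2.netrent000 L2.housgcost000
predict prob, pu0                                  // Pr(FTowner = 1) assuming u_i = 0
gen byte pred_own = prob >= 0.5 if !missing(prob)
tab FTowner pred_own
margins, dydx(spouse_present breakpartner nkids) predict(pu0) atmeans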

Repeated time values in sample r(451)

Dear Statalist!

I have panel data and want to select the number of lags. I type:

Code:
tsset quocgia year
       panel variable:  quocgia (strongly balanced)
        time variable:  year, 1998 to 2013
                delta:  1 year

varsoc lngdp lnyt lngd
repeated time values in sample
r(451);

What should I do next?
Thank you!
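
A sketch of the usual explanation: varsoc is a pure time-series command, so with panel data it sees each year once per country and stops with r(451). One workaround is to run it one panel at a time.

Code:
preserve
keep if quocgia == 1        // repeat (or loop) for each country of interest
tsset year
varsoc lngdp lnyt lngd
restore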






Save and manipulate results

I am using the command 'hoi', which first displays a table with the logistic-regression estimates and then a table with the outputs of interest: coverage, dissimilarity, and the Human Opportunity Index.
Since I am doing this for many countries and years, I would like to save those outputs (coverage, dissimilarity and the Human Opportunity Index) in a single table, and I do not know how to do it.
I tried the following command:

Code:
estimates store filename
But this captures only the coefficients estimated by the logistic regression (the first table).
I wonder whether there is a way to do what I want and, if so, how.
I've attached the log file so you can see what my command's output looks like.

Grateful
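
A sketch of a general pattern, assuming hoi leaves its statistics in r() after each run (check with return list; the result names r(coverage), r(D) and r(HOI) below are hypothetical, as is the hoi call itself): a postfile loop collects them into a dataset, one row per country-year.

Code:
tempname h
postfile `h' str40 country int year double(coverage dissim hoi) using hoi_results, replace
levelsof country, local(clist)
foreach c of local clist {
    forvalues y = 2000/2010 {                        // years are illustrative
        quietly hoi outcome circumstances if country == "`c'" & year == `y'
        post `h' ("`c'") (`y') (r(coverage)) (r(D)) (r(HOI))
    }
}
postclose `h'
use hoi_results, clear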

ologit and brant test

Hello everyone,

I have run a Brant test after an ordered logistic regression to test the parallel-regression assumption. My main independent variable, foreign_bakgrnd, does not seem to violate the assumption. However, I'm worried about the significant joint test of all variables, i.e. "All" with p < 0.05. Should I consider mlogit instead, or is ologit better in this case? I also conducted other tests, which seem to point to ordered logistic regression as the best fit, but I'm not 100% sure.

Some background information: the dependent variable is the response, on a scale of 1 to 5 (very good, good, neither good nor bad, bad, very bad), to the proposal "Accept fewer refugees?". foreign_bakgrnd is the percentage of immigrants in each municipality. There are 50 municipalities (kommun).





Ologit

Code:
ologit refugee gender age educ income student unemp foreign_bakgrnd tax total_unemp welfare if raised_swe==1 & mom==1 & dad==1 & f80a==1 & citizen==1, cluster(kommun)

Iteration 0:   log pseudolikelihood = -3262.2904  
Iteration 1:   log pseudolikelihood = -3140.9864  
Iteration 2:   log pseudolikelihood = -3140.2161  
Iteration 3:   log pseudolikelihood = -3140.2157  
Iteration 4:   log pseudolikelihood = -3140.2157  

Ordered logistic regression                       Number of obs   =       2035
                                                  Wald chi2(10)   =     313.20
                                                  Prob > chi2     =     0.0000
Log pseudolikelihood = -3140.2157                 Pseudo R2       =     0.0374

                                   (Std. Err. adjusted for 50 clusters in kommun)
---------------------------------------------------------------------------------
                |               Robust
        refugee |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
----------------+----------------------------------------------------------------
         gender |   .3089709    .070089     4.41   0.000      .171599    .4463427
            age |   .0055836   .0043592     1.28   0.200    -.0029603    .0141275
           educ |    .471975   .0498627     9.47   0.000     .3742459    .5697041
         income |   .0187043   .0176838     1.06   0.290    -.0159553    .0533639
        student |   .7294596    .185481     3.93   0.000     .3659235    1.092996
          unemp |   .2874827   .2435057     1.18   0.238    -.1897796     .764745
foreign_bakgrnd |   .0259353   .0095523     2.72   0.007     .0072131    .0446576
            tax |  -.0011078   .0067755    -0.16   0.870    -.0143875    .0121719
    total_unemp |  -.0617319   .0238923    -2.58   0.010      -.10856   -.0149038
        welfare |   .0026904   .0484414     0.06   0.956    -.0922529    .0976337
----------------+----------------------------------------------------------------
          /cut1 |   .6259649   .7343175                      -.813271    2.065201
          /cut2 |   1.721342    .724796                      .3007678    3.141916
          /cut3 |   2.781114   .7424679                      1.325904    4.236325
          /cut4 |    3.80011   .7355121                      2.358533    5.241687

Mlogit
Code:
mlogit refugee gender age educ income student unemp foreign_bakgrnd tax total_unemp welfare if raised_swe==1 & mom==1 & dad==1 & f80a==1 & citizen==1, cluster(kommun)

Iteration 0:   log pseudolikelihood = -3262.2904  
Iteration 1:   log pseudolikelihood = -3118.6774  
Iteration 2:   log pseudolikelihood = -3114.8747  
Iteration 3:   log pseudolikelihood = -3114.8613  
Iteration 4:   log pseudolikelihood = -3114.8613  

Multinomial logistic regression                   Number of obs   =       2035
                                                  Wald chi2(40)   =    1788.08
                                                  Prob > chi2     =     0.0000
Log pseudolikelihood = -3114.8613                 Pseudo R2       =     0.0452

                                        (Std. Err. adjusted for 50 clusters in kommun)
--------------------------------------------------------------------------------------
                     |               Robust
             refugee |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
---------------------+----------------------------------------------------------------
_Very_Good           |
              gender |   -.394604   .1127504    -3.50   0.000    -.6155907   -.1736173
                 age |   -.010347   .0045848    -2.26   0.024     -.019333    -.001361
                educ |  -.3645869   .0721144    -5.06   0.000    -.5059286   -.2232452
              income |  -.0522729   .0287455    -1.82   0.069     -.108613    .0040672
             student |  -.4549115   .3531302    -1.29   0.198    -1.147034    .2372111
               unemp |   .0304258   .3571199     0.09   0.932    -.6695163    .7303679
     foreign_bakgrnd |   .0016004   .0099909     0.16   0.873    -.0179815    .0211823
                 tax |   .0018852   .0105668     0.18   0.858    -.0188254    .0225957
         total_unemp |   .0154779   .0351648     0.44   0.660    -.0534439    .0843997
             welfare |   .0490317   .0599628     0.82   0.414    -.0684932    .1665565
               _cons |   1.632298   1.059698     1.54   0.123    -.4446714    3.709268
---------------------+----------------------------------------------------------------
Fairly_Good          |
              gender |  -.0450976    .121754    -0.37   0.711    -.2837311    .1935359
                 age |  -.0067947   .0046264    -1.47   0.142    -.0158623    .0022729
                educ |  -.2249964   .0519674    -4.33   0.000    -.3268507   -.1231422
              income |  -.0199047   .0257654    -0.77   0.440     -.070404    .0305945
             student |  -.2329129   .3298539    -0.71   0.480    -.8794146    .4135888
               unemp |  -.4127772   .4627189    -0.89   0.372     -1.31969    .4941352
     foreign_bakgrnd |  -.0044638   .0106793    -0.42   0.676    -.0253949    .0164672
                 tax |   .0096151   .0088208     1.09   0.276    -.0076734    .0269036
         total_unemp |    .006274   .0330595     0.19   0.849    -.0585215    .0710695
             welfare |   .0407527   .0541683     0.75   0.452    -.0654151    .1469206
               _cons |  -.0608351   .9518732    -0.06   0.949    -1.926472    1.804802
---------------------+----------------------------------------------------------------
Neither_good_nor_bad |  (base outcome)
---------------------+----------------------------------------------------------------
Fairly_bad           |
              gender |  -.0628752   .1013472    -0.62   0.535    -.2615122    .1357617
                 age |   .0040965   .0060997     0.67   0.502    -.0078586    .0160516
                educ |   .2815299    .084509     3.33   0.001     .1158953    .4471646
              income |   .0234315   .0260612     0.90   0.369    -.0276474    .0745105
             student |   .5616208   .2876051     1.95   0.051     -.002075    1.125316
               unemp |   .0937818   .3774915     0.25   0.804     -.646088    .8336516
     foreign_bakgrnd |   .0112605   .0156116     0.72   0.471    -.0193376    .0418586
                 tax |   .0125842   .0084345     1.49   0.136    -.0039471    .0291155
         total_unemp |  -.0368523   .0329919    -1.12   0.264    -.1015153    .0278107
             welfare |   .0899803   .0785352     1.15   0.252    -.0639458    .2439064
               _cons |  -2.966431   .9481258    -3.13   0.002    -4.824723   -1.108139
---------------------+----------------------------------------------------------------
Very_bad             |
              gender |   .2793758   .1481659     1.89   0.059     -.011024    .5697756
                 age |  -.0066449   .0054863    -1.21   0.226     -.017398    .0041081
                educ |   .4074613   .0708812     5.75   0.000     .2685368    .5463859
              income |  -.0283097    .033077    -0.86   0.392    -.0931394    .0365201
             student |   .5167056   .3865466     1.34   0.181    -.2409118    1.274323
               unemp |   .3544362   .4450767     0.80   0.426    -.5178981    1.226771
     foreign_bakgrnd |   .0511379   .0121621     4.20   0.000     .0273007    .0749752
                 tax |  -.0007457   .0101354    -0.07   0.941    -.0206107    .0191194
         total_unemp |  -.1032037   .0395097    -2.61   0.009    -.1806413   -.0257661
             welfare |   .0197273   .0590026     0.33   0.738    -.0959156    .1353702
               _cons |  -1.737012   1.161006    -1.50   0.135    -4.012543     .538519
--------------------------------------------------------------------------------------

Code:
 brant

Brant test of parallel regression assumption

                  |       chi2     p>chi2      df
 -----------------+------------------------------
              All |      50.52      0.011      30
 -----------------+------------------------------
           gender |       4.86      0.182       3
              age |       8.33      0.040       3
             educ |       2.06      0.561       3
           income |       5.84      0.119       3
          student |       1.57      0.666       3
            unemp |       2.25      0.522       3
  foreign_bakgrnd |       5.61      0.132       3
              tax |       1.82      0.611       3
      total_unemp |       1.29      0.732       3
          welfare |       1.13      0.769       3

A significant test statistic provides evidence that the parallel
regression assumption has been violated.
I have also used the oparallel command:
Code:
oparallel, ic

Tests of the parallel regression assumption

                 |   Chi2     df  P>Chi2
-----------------+----------------------
     Wolfe Gould |  44.44     30   0.043
           Brant |  50.52     30   0.011
           score |  47.37     30   0.023
likelihood ratio |  45.59     30   0.034
            Wald |  48.76     30   0.017



Information criteria

      |     ologit     gologit  difference
------+------------------------------------
  AIC |    6308.43     6322.84      -14.41
  BIC |    6387.09     6570.05     -182.96
I also compared the saved model (ologit) to the current model (mlogit):


Code:
. fitstat, dif force

                         |     Current        Saved   Difference
-------------------------+---------------------------------------
Log-likelihood           |                                      
                   Model |   -3114.861    -3140.216       25.354
          Intercept-only |   -3262.290    -3262.290        0.000
-------------------------+---------------------------------------
Chi-square               |                                      
    D (df=1991/2021/-30) |    6229.723     6280.431      -50.709
      Wald (df=40/10/30) |    1788.084      313.196     1474.888
                 p-value |       0.000        0.000        0.010
-------------------------+---------------------------------------
R2                       |                                      
                McFadden |       0.045        0.037        0.008
     McFadden (adjusted) |       0.032        0.033       -0.001
            Cox-Snell/ML |       0.135        0.113        0.022
  Cragg-Uhler/Nagelkerke |       0.141        0.118        0.023
                   Count |       0.309        0.290        0.019
        Count (adjusted) |       0.096        0.071        0.025
-------------------------+---------------------------------------
IC                       |                                      
                     AIC |    6317.723     6308.431        9.291
        AIC divided by N |       3.105        3.100        0.005
       BIC (df=44/14/30) |    6564.926     6387.087      177.839

Note: Likelihood-ratio test assumes saved model nested in current model.

Difference of  177.839 in BIC provides very strong support for saved model.
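
A sketch of a common middle path between ologit and mlogit (a suggestion, not a verdict): a partial proportional-odds model keeps the ordinal structure but relaxes the parallel-lines constraint only for the variables that violate it.

Code:
* gologit2 is user-written: ssc install gologit2
gologit2 refugee gender age educ income student unemp foreign_bakgrnd tax total_unemp welfare ///
    if raised_swe==1 & mom==1 & dad==1 & f80a==1 & citizen==1, autofit cluster(kommun)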

Diff in Diff

Dear Users,
I want to run a difference-in-differences regression to understand whether my y variable has been affected by entry into the European Union. My dataset is a panel of 40 countries over 15 years and looks like this:

country year y id ue due d04 did04
Albania 2000 53.6 1 0 0 0 0 0
Albania 2001 56.6 1 0 0 0 0 0
Albania 2002 56.8 1 0 0 0 0 0
Albania 2003 56.8 1 0 0 0 0 0


I generate both the time dummy and the treatment dummy with:

generate due=0
replace due=1 if ue<=year
gen d04 = (year>=2004) & !missing(year)

because my "treatment" is the entrance in the European Union. I generated an interaction variable with:

generate did04=d04*due

When I try to perform the diff-in-diff, writing

regress y d04 due did04, robust

Stata omits the interaction term because of multicollinearity, reporting:

note: did04 omitted because of collinearity

I use Stata 13.1. Where is my error? Thank you in advance for any help, and excuse any unintentional mistakes in how I have expressed the problem.
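
A sketch of the likely cause, under the assumption that every EU entrant in the sample joined in 2004: due as defined is 1 only in years at or after accession, so due = 1 implies d04 = 1, making did04 identical to due and hence collinear. The standard setup interacts the post-period dummy with a time-invariant treatment-group indicator instead.

Code:
* treat = 1 for countries that ever join the EU; assumes ue holds the
* accession year and is 0 (or missing) for never-members
gen byte treat = ue > 0 & !missing(ue)
regress y i.d04##i.treat, robust    // 1.d04#1.treat is the DiD estimate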

Save maximum of a variable as local variable?

Dear All,
as described in the title, I would like to store the maximum of a variable across all observations in a local macro. For example:

Code:
clear
cd "C:\Users\jannic\Desktop\sandbox"
webuse grunfeld, clear
keep if company <8

* the command below does not work, unfortunately:
* local max = maximum(invest)
The local should now contain 1486.7. Is there a way to achieve this?


Thanks in advance,

best,
Harvey
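
A sketch of the standard idiom: summarize stores the maximum in r(max), which can be copied into a local macro.

Code:
webuse grunfeld, clear
keep if company < 8
summarize invest, meanonly    // suppresses output but still fills r(min), r(max)
local max = r(max)
display `max'                 // 1486.7 in this subsample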


tabplot updated on SSC

Thanks as always to Kit Baum, the package tabplot on SSC has been
updated with new ado and help files for that program, which goes back to
1999. Stata 8 is required. tabplot is billed as supporting one-, two-
and three-way bar charts for tables, which understates its possibilities
a little, but the whole story need not be given here.

"Multiple bar charts" would be a good umbrella term, except for the need
to explain that doesn't mean stacked or divided bars and it doesn't mean
bars side by side on the same axis (and except for the puzzle that a
single bar would just get lonely, so don't all bar charts have multiple
bars?). (A single bar does not mean a "singles bar".)

The update in code fixes some awkward, indeed deficient, parsing of
calls to the by() option, which ruled out adjustment of a note() call
together with the by() option.

A bigger deal by comparison is much re-writing of the help file, with
restructured explanation of syntax, better-explained and more numerous
examples, and many more references since the last update several months
ago.

If interested, then use

Code:
 
ssc inst tabplot
to install afresh or

Code:
 
ssc inst tabplot, replace
to update an existing installation; some readers may be using

Code:
 
adoupdate
instead.

Bar charts are basic, and may seem very well supported in Stata, as only
a little acquaintance with the documentation reveals four commands,
graph bar, graph hbar, twoway bar and twoway rbar, which might seem
already three more than one might need.

Another command for bar charts (or more; I have others) thus needs a
little explanation. This one is itself just a wrapper for twoway rbar,
but it can do various plots more easily than you could do yourself,
unless you were willing to do a little programming and a lot of fiddling
around.

The main conceit of tabplot is table-like plots. The name is intended to
evoke commands like tabulate with their structured output of tables in
rows and columns.

Incidentally, I note that there is a tabplot package for R with its main
command tableplot; an old Stata command of mine called tableplot also
exists on SSC, but its main capabilities have long since been folded
into tabplot. I don't doubt that tabplot on R is good, but I've never
used it or studied its documentation closely. I am pretty sure that I
used the name first, not that I mind so long as the name remains
distinct within Stata.

Clearly the help file is there with the details you are expected to
want, so the best I can now do for anyone curious is to give a couple of
self-contained examples, together with a moderate sales pitch.

Other applications of tabplot can be found at

http://www.statalist.org/forums/foru...-and-subgraphs

http://www.statalist.org/forums/foru...something-else

http://www.statalist.org/forums/foru...d-with-grc1leg

http://www.statalist.org/forums/foru...lot-or-tabplot

http://stats.stackexchange.com/quest...inal-variables

http://stats.stackexchange.com/quest...ical-variables

Greenacre (2007, p.42; full reference below) gave these data from the
Encuesta Nacional de la Salud (Spanish National Health Survey), 1997.
They are interesting in themselves, but for my purposes they are useful
as an example large enough to be challenging. As with many tables, the
main handle for understanding is to look at the probability distribution
of the response health given the predictor age. tabplot offers options
to calculate percent or proportional/fractional breakdowns on the fly.
Aesthetic preferences or conventions often encourage presentation in
terms of percents. ("Percentage" seems to me too long a word, whatever
dictionaries may say.)

Code:
 
clear
input byte(agegroup health) long freq
1 1 243
1 2 789
1 3 167
1 4 18
1 5 6
2 1 220
2 2 809
2 3 164
2 4 35
2 5 6
3 1 147
3 2 658
3 3 181
3 4 41
3 5 8
4 1 90
4 2 469
4 3 236
4 4 50
4 5 16
5 1 53
5 2 414
5 3 306
5 4 106
5 5 30
6 1 44
6 2 267
6 3 284
6 4 98
6 5 20
7 1 20
7 2 136
7 3 157
7 4 66
7 5 17
end
label values agegroup agegroup
label def agegroup 1 "16-24", modify
label def agegroup 2 "25-34", modify
label def agegroup 3 "35-44", modify
label def agegroup 4 "45-54", modify
label def agegroup 5 "55-64", modify
label def agegroup 6 "65-74", modify
label def agegroup 7 "75+", modify
label values health health
label def health 1 "very good", modify
label def health 2 "good", modify
label def health 3 "regular", modify
label def health 4 "bad", modify
label def health 5 "very bad", modify

tabplot health agegroup [w=freq] , percent(agegroup) showval subtitle(% of age group) xtitle("") bfcolor(none)
[graph omitted: tabplot output for the health-by-age-group table]

What particularly bites here are some very small percents, which are
perfectly credible and not at all unusual for such data. A merit of the
multiple bar charts design is that small values are discernible as such.
Note especially the showval option, which insists on showing values too.

The graph thus deliberately uses table ideas and graph ideas together.
Sometimes people say to me, "But you shouldn't do that!" and some
prohibition emerges that graphs are graphs and tables are tables, and
ne'er the twain shall meet, which seems to me no more than superstition.

Digression. An intriguing suggestion, which I have borrowed elsewhere,
is that the conventional distinction between graphs and tables was a
side-effect of the development of printing. Before printing there were
manuscripts -- those scripted manually, or written by hand -- to which
writers could add illustrations, say of knights, or dragons, or of
sinners being tormented, or something equally entertaining, as they
liked and where they liked. Printed documents encouraged, or even
enforced, a division of labour between typesetters and those who
prepared illustrations. But now that's obsolete.

A detailed objection to numeric values too is that they clutter up the
graph, to which the answers are that it depends on how you do it and
that, if you strongly object, it's not compulsory. But tabplot gives up
on labelling axes with bar magnitudes, so that reduces clutter too.

Given this dataset, how else would you represent the patterns
graphically? Setting aside any temptation to draw multiple pie charts,
one alternative is a stacked bar chart:

Code:
 
* ssc inst catplot needed before 
catplot health agegroup [w=freq], percent(agegroup) asyvars stack subtitle(% of age group)
In recent Stata versions, graph hbar could also do this directly, but the syntax
differs.

[graph omitted: stacked bar chart from the catplot call above]

I have not tried too hard to optimise this: the colour scheme and legend both need work,
and so forth. Some would prefer vertical bars here.

The key point is whether it could be made better (clearer, more effective,
more attractive) than the previous graph. I note three key issues:

1. Stacking is a well-understood design, but very small amounts are hard
to discern.

2. A legend necessarily springs into being, but a legend obliges mental "back
and forth" from readers (or else readers give up on looking at the detail).

3. The program would let you add numeric values on top of the bars, but that would
be at least a little messy.

Naturally this is a straw graph that I set up to knock down again, but are there good
alternatives? I've had better results with unstacked bars for this example, but I
will move on.

Let's look at graphs for a three-way table.

Aitkin et al. (1989, p.242; full reference below) reported data from a
survey of student opinion on the Vietnam War taken at the University of
North Carolina in Chapel Hill in May 1967. Students were classified by
sex, year of study, and the policy they supported, given choices of

A. The United States should defeat the power of North Vietnam by
widespread bombing of its industries, ports, and harbors and by land
invasion.

B. The United States should follow the present policy in Vietnam.

C. The United States should de-escalate its military activity, stop
bombing North Vietnam, and intensify its efforts to begin negotiation.

D. The United States should withdraw its military forces from Vietnam
immediately.

The labels A ... D are fairly dopey, but even at this distance
suggesting better ones might be thought contentious politically, so I
will desist.

Code:
 
clear
input str6 sex str8 year str1 policy int freq
"male" "1" "A" 175
"male" "1" "B" 116
"male" "1" "C" 131
"male" "1" "D" 17
"male" "2" "A" 160
"male" "2" "B" 126
"male" "2" "C" 135
"male" "2" "D" 21
"male" "3" "A" 132
"male" "3" "B" 120
"male" "3" "C" 154
"male" "3" "D" 29
"male" "4" "A" 145
"male" "4" "B" 95
"male" "4" "C" 185
"male" "4" "D" 44
"male" "Graduate" "A" 118
"male" "Graduate" "B" 176
"male" "Graduate" "C" 345
"male" "Graduate" "D" 141
"female" "1" "A" 13
"female" "1" "B" 19
"female" "1" "C" 40
"female" "1" "D" 5
"female" "2" "A" 5
"female" "2" "B" 9
"female" "2" "C" 33
"female" "2" "D" 3
"female" "3" "A" 22
"female" "3" "B" 29
"female" "3" "C" 110
"female" "3" "D" 6
"female" "4" "A" 12
"female" "4" "B" 21
"female" "4" "C" 58
"female" "4" "D" 10
"female" "Graduate" "A" 19
"female" "Graduate" "B" 27
"female" "Graduate" "C" 128
"female" "Graduate" "D" 13
end

tabplot policy year [w=freq], by(sex, subtitle(% by sex and year, place(w)) note("")) percent(sex year) showval
[graph omitted: tabplot of policy by year, by(sex)]
The way to plot three-way tables is unsurprisingly by using a by() option to repeat two-way tables.
The syntax for tabplot matches standard conventions such that (as in regress and scatter, for
example) it is usually best to mention the response or outcome variable first (as defining rows of
the plot, and as to be shown on the y axis). There can be trade-offs or compromises,
as no layout is best for all purposes, but big differences can safely be put at a distance (so
males and females here differ markedly in their mix of views), while finer distinctions are
easier to make if bars are close. On top of all that, any ordinal scales should naturally be
respected as such.

Aitkin, M., D. Anderson, B. Francis, and J. Hinde. 1989. Statistical
Modelling in GLIM. Oxford: Oxford University Press.

Greenacre, M. 2007. Correspondence Analysis in Practice. Boca Raton, FL:
Chapman & Hall/CRC.


Creating IDs for panel gravity data

I am estimating a gravity model for India's exports of certain goods. I would like to know which commands to use to create a panel ID for each country pair and year, and how to merge in other data such as GDP.
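
A sketch of the usual pattern (variable and file names are hypothetical): egen group() builds a numeric identifier for each country pair, xtset declares the panel, and merge m:1 pulls in country-year covariates such as GDP.

Code:
egen pairid = group(exporter importer)
xtset pairid year
* partner GDP from a country-year lookup file, e.g. gdp.dta (importer year gdp)
merge m:1 importer year using gdp, keep(master match) nogenerate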

teffects psmatch not converging in large sample

Hello,
I'm trying to estimate the ATT using teffects psmatch in a large sample (approx. 1.9 million observations: 1.1M treated and 0.8M controls), but the program cannot reach a solution, even after 24 hours. I do not get any error messages. However, when I run my code on a smaller sample (N = 1,000), I quickly obtain a solution. Can someone tell me whether there is a limit on the sample size that teffects psmatch can handle? Any ideas as to why this might be happening, or suggestions for solutions?

Here is the code I'm trying to run:

Code:
teffects psmatch (readmit) (observation covariate_1 … covariate_k), atet nneighbor(1) caliper(0.116425)

Thanks so much for any assistance you can provide,
Kafuti

Question About Behavior of two Parameters (Share of people working in Manufacture sector and Public sector)

Hello Everyone,
Your cooperation is of great value to me. I am writing my master's thesis, in which I analyze the commuting behavior of males and females in Sweden for three years: 2000, 2007 and 2014. I am using a gravity model, where

Model 1:

MaleCij = size of origin + size of destination - travel time + (control variables)

Size of origin: people employed in the place of origin (employed night population)

Size of destination: people employed at the destination (employed day population)

Control variables include:
  • Housing prices in the destination
  • Wages in the destination
  • Share of people with higher education in the origin
  • Share of people working in the manufacturing sector in the destination

Model 2:

FemaleCij = size of origin + size of destination - travel time + (control variables)

The control variables are all the same, except that here I add the share of people working in the public sector instead of the manufacturing sector.

Main Problem:

In the case of the public sector and female commuting:
I get negative and significant results for the public-sector share, while it is supposed to be positive, since the literature says that women work more in the public sector. I have also checked for correlation. Whether I enter the public-sector share alone or with the other control variables, it still gives negative results.

In the case of the manufacturing sector and male commuting:
I get negative and insignificant results when I enter it as the only control variable, but when I enter it together with housing prices, the results become positive and significant. In 2007, however, the wage coefficient turns negative.

For the public sector I could not solve it either way. Can anyone kindly point out what I am missing or doing wrong? Your cooperation is of great value to me.

The regression commands I used in Stata are (the share-variable names are shortened here):

Code:
reg ln_mCij ln_Oi ln_Dj tij share_higheduc share_manuf if ZeroM==0 & year==2000, robust
reg ln_wCij ln_Oi ln_Dj tij share_higheduc share_public if ZeroW==0 & year==2000, robust