GLM and GSEM syntax

November 2, 2015, 11:00 am

Hi

Is there a difference between the GSEM and GLM family/link syntax? I want to estimate a set of fractional response models. GLM can estimate each equation separately, so I assumed that GSEM would fit the same model if the same family/link specification was used. However, the results are completely different, as shown in the example below.

Any thoughts on how to write this fractional logit in GSEM notation would be highly appreciated.

All the best,

Paul

-------------
Example:

use http://www.ats.ucla.edu/stat/stata/faq/proportion, clear
gsem (meals <- yr_rnd parented api99, link(logit) family(binomial) ), vce(robust) nolog
predict v1
glm meals yr_rnd parented api99, link(logit) family(binomial) vce(robust) nolog
predict v2
gen dif=v1-v2
sum dif

--------------
The output is...

. gsem (meals <- yr_rnd parented api99, family(binomial) link(logit) ), vce(robust) nolog

Generalized structural equation model           Number of obs     =       4257
Log pseudolikelihood = -144.80954

------------------------------------------------------------------------------
             |               Robust
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
meals <-     |
      yr_rnd |   -.806549   1.043966    -0.77   0.440    -2.852684    1.239586
    parented |  -2.741766   .5014297    -5.47   0.000     -3.72455   -1.758981
       api99 |  -.0140403   .0051991    -2.70   0.007    -.0242304   -.0038501
       _cons |   26.72637   4.044309     6.61   0.000     18.79967    34.65307
------------------------------------------------------------------------------

.
. glm meals yr_rnd parented api99, link(logit) family(binomial) vce(robust) nolog
note: meals has noninteger values

Generalized linear models                         No. of obs      =       4257
Optimization     : ML                             Residual df     =       4253
                                                  Scale parameter =          1
Deviance         =  395.8141242                   (1/df) Deviance =    .093067
Pearson          =  374.7025759                   (1/df) Pearson  =   .0881031

Variance function: V(u) = u*(1-u/1)               [Binomial]
Link function    : g(u) = ln(u/(1-u))             [Logit]

                                                  AIC             =   .7220973
Log pseudolikelihood = -1532.984106               BIC             =  -35143.61

------------------------------------------------------------------------------
             |               Robust
       meals |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      yr_rnd |   .0482527   .0321714     1.50   0.134    -.0148021    .1113074
    parented |  -.7662598   .0390715   -19.61   0.000    -.8428386   -.6896811
       api99 |  -.0073046   .0002156   -33.89   0.000    -.0077271   -.0068821
       _cons |    6.75343   .0896767    75.31   0.000     6.577667    6.929193
------------------------------------------------------------------------------

. sum dif

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         dif |      4257    .4718934    .2702031   .0229145   .8948619

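A quick way to tell which of the two fits is the fractional logit is to compare -glm- with a command built for that estimator. This sketch assumes Stata 14 or later, where -fracreg- is available. It uses simulated data with invented variables x and y because the UCLA dataset URL may no longer resolve. -glm, family(binomial) link(logit)- and -fracreg logit- maximize the same Bernoulli quasi-likelihood, so their point estimates should agree. The comparison does not by itself explain what -gsem- does with the noninteger outcome.

```stata
clear
set seed 2015
set obs 1000
gen x = rnormal()
* Simulated fractional outcome strictly between 0 and 1 (illustration only)
gen y = invlogit(0.5 + x + rnormal(0, 0.5))

* Fractional logit via glm: Bernoulli quasi-likelihood with robust SEs
glm y x, family(binomial) link(logit) vce(robust)
matrix b_glm = e(b)

* Same estimator via the dedicated command (Stata 14+)
fracreg logit y x
matrix b_frac = e(b)

* Largest relative difference between the two coefficient vectors
display mreldif(b_glm, b_frac)
```

Adding the -gsem- coefficients from the same data to this comparison shows directly whether that specification departs from the fractional logit.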

Variance Covariance Matrix from a N*2 Matrix

November 2, 2015, 11:57 am

Dear all,

I am writing to ask about generating a simple variance-covariance matrix from an Nx2 matrix. My code is as follows:

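set obs 50
* Note: xvals is not created in this snippet; it must already exist in the data
scalar beta0 = 3
scalar beta1 = 0.5
scalar sigma2 = 9
scalar k = 500
capture matrix drop beta    // start from an empty results matrix on each run
local i = 1
while `i' <= k {
  gen epsilons_`i' = rnormal(0, sqrt(sigma2))
  gen yvals_`i' = beta0 + beta1*xvals + epsilons_`i'
  reg yvals_`i' xvals
  * Append this replication's coefficients (xvals, _cons) as a new row
  matrix beta = (nullmat(beta) \ e(b))
  drop epsilons_`i' yvals_`i'
  local i = `i' + 1
}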

After I get the matrix beta (a 500 x 2 matrix holding the coefficient vectors from the 500 repeated estimations), I don't know how to calculate the variance-covariance matrix from it. From what I know of R, it's simply -var(beta)-, but I do not know how to do this in Stata.

I look forward to hearing from you! Thank you very much in advance!

Best,
Long

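One way to get the sample variance-covariance matrix of the columns of beta, the Stata analogue of R's var(beta), is Mata's variance() function. The toy 3 x 2 matrix below only stands in for the 500 x 2 matrix produced by the loop. Skip that line when beta already exists.

```stata
* Toy stand-in for the simulated coefficient matrix (skip when beta exists)
matrix beta = (1, 2 \ 3, 5 \ 5, 11)

* Mata's variance() returns the covariance matrix of the columns
* (divisor n - 1), like R's var()
mata: st_matrix("V", variance(st_matrix("beta")))

* Label rows and columns after the coefficients stored in e(b)
matrix rownames V = xvals _cons
matrix colnames V = xvals _cons
matrix list V
```

Alternatively, -svmat beta- stores the columns as variables beta1 and beta2, and -correlate beta1 beta2, covariance- then reports the same matrix.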

Creating time-series dummies in stata

November 2, 2015, 12:15 pm

Hello.

I'm currently working with a data sample from 2012 to 2015, with a year variable and a company ID.

xtset pcid year

I want to create year dummies for 2012 and 2013, but when I create them, all observations from 2014 and 2015 end up with missing values.

These are the commands I have used:

generate YEAR2012 = .
replace YEAR2012 = 0 if year==2010
replace YEAR2012 = 0 if year==2011
replace YEAR2012 = 0 if year==2013
replace YEAR2012 = 1 if year==2012

Can anyone help me proceed? I know the power of factor variables (-fvvarlist-), but I want to use year dummies this time.

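A logical expression such as (year == 2012) evaluates to 1 when true and 0 when false. A dummy created this way is therefore 0 or 1 for every observation with a nonmissing year, and the years not listed get 0 instead of missing. Here is a minimal sketch on a toy panel. The four rows are invented; pcid and year are the post's variable names.

```stata
clear
input pcid year
1 2012
1 2013
1 2014
1 2015
end
xtset pcid year

* 1 if the condition holds, 0 otherwise; missing only when year itself is missing
gen byte YEAR2012 = (year == 2012) if !missing(year)
gen byte YEAR2013 = (year == 2013) if !missing(year)

* Alternatively, one dummy per observed year: yr_1 (2012) ... yr_4 (2015)
tabulate year, generate(yr_)
```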

how to do propensity score matching by using estimated probability?

November 2, 2015, 12:44 pm

Dear all,

I have used a random-effects probit model to estimate the propensity score for an unbalanced firm-level dataset, and I would like to use this estimated propensity score to do the matching and calculate treatment effects.

However, it seems that the Stata commands 'psmatch2' and 'teffects' build the estimation of the propensity score into the command and support only pure (pooled) probit or logit models. In other words, I cannot use a random-effects probit model in either 'psmatch2' or 'teffects'.

Therefore, I first used 'xtprobit' to estimate the random-effects probit model and calculated the score, but I have no idea how to use it directly in 'psmatch2' or 'teffects'.

Does anyone know how?

Many thanks

Huan Gao

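-psmatch2- (user-written, available from SSC) has a pscore() option that takes an existing propensity-score variable instead of estimating one. The score from -xtprobit- can therefore be passed in directly. The sketch simulates a small firm panel, and all variable names are hypothetical. It computes the score with the random effect set to zero, which is only one of several possible choices. Both matching commands then treat the score as known, so their standard errors ignore the first-step estimation.

```stata
clear
set seed 2015
set obs 1000
* Toy panel: 200 firms observed for 5 years, with a firm-level effect a
gen firm = ceil(_n / 5)
bysort firm: gen year = 2010 + _n
bysort firm: gen double a = rnormal() if _n == 1
by firm: replace a = a[1]
gen x1 = rnormal()
gen x2 = rnormal()
gen byte treat = (0.5*x1 - 0.3*x2 + a + rnormal() > 0)
gen y = 1 + 2*treat + x1 + x2 + rnormal()
xtset firm year

* Random-effects probit for the treatment
xtprobit treat x1 x2, re
predict double xb_re, xb
* Propensity score with the random effect set to zero (an assumption)
gen double ps = normal(xb_re)

* psmatch2 (ssc install psmatch2) matches on the supplied score
psmatch2 treat, outcome(y) pscore(ps) common

* Built-in alternative: nearest-neighbor matching on ps as a single covariate
teffects nnmatch (y ps) (treat)
```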

Problem with panel data 2sls with first stage dummy variable

November 2, 2015, 12:46 pm

Hello everyone,

I'd like to do a 2SLS (instrumental-variable) estimation on panel data where the first-stage regression is a probit.

In the first stage, I am estimating Y = b0 + b1*X1 + X2, where

Y is a dummy variable of 0,1
X1 is a dummy variable of 0,1
X2 is a set of additional control variables

In the second stage, I am estimating Z = b0 + b1*Y + X2, where

Z is the dependent variable
Y is the predicted value from first stage
X2 is a set of additional control variables

I want my results from a fixed-effects model.

I tried using probit, but I got the message "0 predicts success perfectly" and no results.

I want to follow the two-step procedure described by Wooldridge, but I am not getting any output. http://www.stata.com/statalist/archi.../msg00339.html

I would appreciate any advice on what I need to do.

Many thanks in advance for your response and best regards.
Daniel

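Below is a sketch of one two-step approach of this kind. The fitted probability from a probit is used as an instrument for the binary endogenous variable, not plugged in as a regressor, and the second step is a fixed-effects IV regression. The data are simulated and the names (id, t, y, x1, x2, z) are placeholders. Pairing a pooled probit in the first step with -xtivreg, fe- in the second is an assumption, not necessarily the exact procedure in the linked thread. The -tabulate- line checks whether one value of the dummy always occurs with the same outcome, which is the kind of pattern Stata's "predicts success perfectly" messages refer to.

```stata
clear
set seed 2015
set obs 1000
* Toy panel: 200 units x 5 periods with a unit effect c
gen id = ceil(_n / 5)
bysort id: gen t = _n
bysort id: gen double c = rnormal() if _n == 1
by id: replace c = c[1]
gen byte x1 = runiform() < 0.5
gen x2 = rnormal()
gen v = rnormal()
* y is endogenous in the z equation because v enters both
gen byte y = (0.8*x1 + 0.3*x2 + v > 0.4)
gen z = 1 + 0.7*y + 0.5*x2 + c + 0.5*v + rnormal()
xtset id t

* Check for perfect prediction before fitting the probit
tabulate x1 y

* Step 1: probit of y on the excluded instrument x1 and the controls
probit y x1 x2
predict double ghat, pr

* Step 2: fitted probability as the instrument for y, with fixed effects
xtivreg z x2 (y = ghat), fe
```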

Centering at the mean

November 2, 2015, 1:19 pm

Dear all,
My question is related to centering at the grand mean. In one of the papers I use as a guideline I found the following:

'We follow Cohen et al.’s (2003) recommendations to center the industry-level variable (herfindahl-index) at the grand mean and also center the firm-level variables (cash/assets) by the industry mean when testing their interaction effect (Martin, Cullen, and Parboteeah, 2007)'.

Does this mean that I have to replace, for example, Cash/Assets with (Cash/Assets - Average of Cash/Assets)?
Could someone explain the statistical reasoning behind this?

Kind regards,
Emiel Brak

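In practice, grand-mean centering subtracts one overall mean from every observation. Centering a firm-level variable "by the industry mean" instead subtracts that firm's own industry average. Centering before forming the product term mainly makes each main-effect coefficient interpretable at the average of the other variable and reduces the correlation between the main effects and the interaction. Below is a minimal sketch with invented toy data; industry, hhi, and cash_assets are hypothetical names. Whether the grand mean should be taken over firm rows, as here, or over industries is a choice to check against the cited paper.

```stata
clear
input industry hhi cash_assets
1 0.20 0.10
1 0.20 0.30
2 0.50 0.20
2 0.50 0.40
end

* Grand-mean centering of the industry-level variable
summarize hhi, meanonly
gen double hhi_c = hhi - r(mean)

* Industry-mean (group-mean) centering of the firm-level variable
bysort industry: egen double cash_ind_mean = mean(cash_assets)
gen double cash_c = cash_assets - cash_ind_mean

* Interaction of the two centered variables
gen double hhi_x_cash = hhi_c * cash_c
```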

Generating bins for hospitalization data

November 2, 2015, 1:26 pm

I have a cross-sectional dataset that looks at the effects of health shocks over time (based on respondents' recall). I'm trying to look at whether there is a decline in the severity of health shocks (measured by the number of days hospitalized) after exposure to some program. I have a variable for the year of the health shock (ranging from 2005 to 2010), and variables for the number of days hospitalized in each year from 2005 to 2012. Therefore, someone who experienced a shock in 2005 can have 7 data points for the number of days hospitalized in the years after the shock.

Given that the year of the health shock varies across individuals, I'm not sure of the best way to generate variables that group total days hospitalized into separate bins: 3 years after the shock, 4 years after, and so on. Any advice would be appreciated; let me know if more detail is needed.

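One way to do this is to loop over the number of years since the shock. For each number k, pick the wide variable whose calendar year equals the shock year plus k. The sketch assumes the wide variables are named days2005 ... days2012 and the shock year is shock_year; both names are hypothetical. It counts the shock year itself as k = 0 and uses two invented respondents.

```stata
clear
input id shock_year days2005 days2006 days2007 days2008 days2009 days2010 days2011 days2012
1 2005 3 0 2 0 0 1 0 0
2 2008 0 0 0 5 4 0 2 1
end

* days_after`k' = days hospitalized k years after the shock (k = 0 is the shock year)
forvalues k = 0/7 {
    gen days_after`k' = .
    forvalues y = 2005/2012 {
        quietly replace days_after`k' = days`y' if shock_year + `k' == `y'
    }
}

* Total over years 1-3 after the shock; stays missing if any of those years is unobserved
gen total_1to3 = days_after1 + days_after2 + days_after3
```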

Decile for a group

November 2, 2015, 1:45 pm

Hey everybody,

I’m really sad because I have a big problem with my master’s thesis and I don’t know how to handle it.
Hopefully, you can help me.

I have a dataset with fund returns over a specific time period. Every fund belongs to a segment.
Now, I have to calculate the percentiles of the returns for each segment and time point, for funds larger than $1 million.

As an example:
Segment 1 and time point 1: I have 26 returns, and I have to create a variable that shows which decile each return belongs to.

Later, I have to compare how the funds below and above the median behave.

I found the command:

bysort segment time: egen perc = pctile(ret) if tna>1000, p(10)

But I get only missing values.

Best wishes
Sebastian

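-egen, pctile(ret) p(10)- returns the value of the 10th percentile, the same number for every fund in the group, rather than the decile each fund falls into. The sketch below instead ranks returns within each segment-time cell and converts the ranks to deciles. It keeps only funds above the size cutoff used in the post (tna > 1000). The !missing(tna) condition matters because Stata treats missing as larger than any number, so a missing tna would otherwise pass tna > 1000. The toy data are invented.

```stata
clear
set obs 20
gen segment = 1
gen time = 1
gen ret = _n
* Rows 11-20 fall below the size cutoff (toy data)
gen tna = cond(_n <= 10, 5000, 500)

* Funds that pass the size filter and have a return
gen byte big = tna > 1000 & !missing(tna, ret)

* Rank within each segment-time cell, then map ranks to deciles 1-10
bysort segment time: egen rk = rank(ret) if big
bysort segment time: egen n_big = count(ret) if big
gen decile = ceil(10 * rk / n_big) if big

* Above/below the cell median
bysort segment time: egen med_ret = median(ret) if big
gen byte above_med = ret > med_ret if big
```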

Regress differences in dummy variables and receive error

November 2, 2015, 1:56 pm

Hello,

I generated dummy variables for a dataset on firms that contains two time periods: 2008 & 2014. The response "yes" gets a value of 1 and the response "no" gets a value of 2.

I then generated differences to see if the firms changed their response to the same question from 2008 to 2014.

For example, in one instance:

g taxadm_ref = taxadm if year==2008
* Nonmissing values sort first, so taxadm_ref[1] is the firm's 2008 value
bysort idquest (taxadm_ref): replace taxadm_ref = taxadm_ref[1]
g diff_taxadm = taxadm - taxadm_ref
replace diff_taxadm=. if year==2008
label var diff_taxadm "Diff. in Dummy: is tax administration an obstacle? yes=1"

I ran -regress- with sales growth as the dependent variable and got the error message "no observations".

Is there a step I'm missing when creating differences for dummy variables and using them in regressions?
The difference values are between -1 and 1.

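A few things are worth checking. -regress- uses only rows where the dependent variable and every regressor are nonmissing. Because diff_taxadm is set to missing for 2008, the message would appear if sales growth is missing in the 2014 rows; this is one plausible cause, not something the post confirms. Also, yes = 1/no = 2 is not a 0/1 dummy. The sketch below recodes to 0/1 and counts usable rows before regressing. It uses invented toy data and a hypothetical salesgrowth variable.

```stata
clear
input idquest year taxadm salesgrowth
1 2008 1 .
1 2014 2 0.05
2 2008 2 .
2 2014 2 0.10
3 2008 2 .
3 2014 1 -0.02
end

* Recode yes = 1 / no = 2 into a 0/1 dummy (yes = 1, no = 0)
gen byte taxadm01 = (taxadm == 1) if inlist(taxadm, 1, 2)

* Copy each firm's 2008 value onto all of its rows
bysort idquest: egen taxadm01_2008 = max(cond(year == 2008, taxadm01, .))

* Change from 2008 to 2014, defined on the 2014 rows only
gen diff_taxadm = taxadm01 - taxadm01_2008 if year == 2014

* Rows that -regress- can actually use
count if !missing(salesgrowth, diff_taxadm)
regress salesgrowth diff_taxadm
```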

Create a dummy variable based upon 2 conditions

November 2, 2015, 4:04 pm

I am using world panel data that consist of bilateral trade flows among the countries of the world.

I want to create a dummy variable for European countries.

For example, if the importer is France and the exporter is Spain, the dummy should take the value 1.

Any idea on how I can apply that?

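The sketch below flags each side of the trade pair separately and then combines the two flags. The country list is a hypothetical, incomplete example; extend it to the European countries you need. exporter and importer are assumed to be string variables holding country names.

```stata
clear
input str20 exporter str20 importer
"Spain" "France"
"Spain" "China"
"Brazil" "France"
end

gen byte exp_eu = 0
gen byte imp_eu = 0
* Hypothetical, incomplete list of European countries
foreach c in "France" "Spain" "Germany" "Italy" "United Kingdom" {
    quietly replace exp_eu = 1 if exporter == "`c'"
    quietly replace imp_eu = 1 if importer == "`c'"
}

* 1 only when both partners are European
gen byte both_eu = exp_eu & imp_eu
```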

Legend option in marginsplot: change order of the keys and keep the same symbol

November 2, 2015, 6:28 pm

Hi,
I am wondering why the symbol changes when I change the order of the keys:

sysuse auto, clear
reg price c.mpg##i.rep78
margins, at(mpg=(15(5)25) rep78=(1 2 3 4 5))
marginsplot, legend(col(1))

marginsplot, legend(col(1) order(5 "A" 4 "B" 3 "C" 2 "D" 1 "E"))

Is there a way to change the order of the keys without changing the symbol?
Thanks in advance for your help

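A likely explanation, inferred from the changed symbols rather than confirmed, is that -marginsplot- draws the five confidence-interval plots first and the five connected lines after them. In that case, order(5 ... 1) picks the CI spikes, whose legend keys look different. If that numbering holds, pointing order() at plots 6-10 keeps the line-and-marker symbols. Adding noci, which drops the CIs altogether, is one way to check.

```stata
sysuse auto, clear
regress price c.mpg##i.rep78
margins, at(mpg=(15(5)25) rep78=(1 2 3 4 5))

* Assumes plots 1-5 are the CIs and 6-10 the lines for rep78 = 1..5
marginsplot, legend(col(1) order(10 "A" 9 "B" 8 "C" 7 "D" 6 "E"))

* Without CIs only the five lines remain, numbered 1-5
marginsplot, noci legend(col(1) order(5 "A" 4 "B" 3 "C" 2 "D" 1 "E"))
```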

help in coding. Thanks in advance

November 2, 2015, 7:23 pm

Hi everybody,

I have a patient list that contains a unique ID variable for each patient, called "visitlink". I appended another database that has the same "visitlink" variable but contains emergency department (ED) visit information, and I generated a new variable, "ED_visit", that flags each observation that is an ED visit.

Each ED visit has another variable, "days_to_admission", that gives how many days before the hospital admission the ED visit took place (e.g., -35, -211).

For example:

visitlink   hospital admission   ED_visit   days_to_admission
123         0                    1          -54
123         1                    0          0

I have two questions.
First, how can I add a variable that tells me how many times the patient was admitted in the prior 180 days?
gen ED_count=0
replace ED_count = sum of ED_visit & days_to_admission>-180 ?????
The result should be 2 if, for example, the patient had 2 prior admissions in the past 180 days.

Second, I need to generate another variable that holds the number of visits or 4, whichever is smaller. For example, if the patient had 5 prior visits to the ER, the variable should be 4, and if the patient had 2 visits, it should be 2.

gen ED_score=0
replace ED_score= ED_count but max is 4 ??????

I really appreciate all the help and assistance.

Thanks in advance.

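The sketch below answers both questions under two assumptions. First, there is one index admission per patient, so days_to_admission on each ED row is measured relative to that single admission. Second, ED visits are counted from 180 days before the admission up to the admission day; adjust the bounds in inrange() if the window should be defined differently. hosp_admit and the toy rows are invented for illustration.

```stata
clear
input visitlink hosp_admit ED_visit days_to_admission
123 0 1 -54
123 0 1 -150
123 0 1 -400
123 1 0 0
789 0 1 -10
789 0 1 -30
789 0 1 -60
789 0 1 -90
789 0 1 -120
789 1 0 0
end

* ED visits in the 180 days before the admission, stored on every row of the patient
bysort visitlink: egen ED_count = total(ED_visit == 1 & inrange(days_to_admission, -180, 0))

* Number of visits, capped at 4
gen ED_score = min(ED_count, 4)
```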

Urgent help needed to calculate daily price change with blank cells in the price series

November 2, 2015, 7:44 pm

I would like to know the command to calculate the daily price change of a stock when the price series has blank cells in it.
I could not use the command change=price[_n]-price[_n-1] when the previous day's price is blank.
So, how can I tell Stata to calculate the price change between the two nearest days that have price data available?
Thanks in advance for your help.

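One approach is to carry the last available price forward and measure each day's change against it, so a gap in the series is skipped instead of producing a missing change. The sketch assumes a single stock with a date variable and uses toy data. With several stocks, prefix the two lines that use [_n-1] with bysort stock (date): so that the carry-forward stays within each stock.

```stata
clear
input date price
1 10
2 .
3 12
4 15
end
sort date

* Most recent nonmissing price up to each day (replace runs row by row,
* so a run of gaps is filled from the last observed value)
gen double last_price = price
replace last_price = last_price[_n-1] if missing(last_price)

* Change relative to the previous available price; missing on days with no price
gen double change = price - last_price[_n-1] if !missing(price)
```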

Saving margin estimates in bootstrap program

November 3, 2015, 11:03 am

Hello,

I am attempting to write a bootstrap program that will save the margin estimates from a multinomial logistic regression. I would ultimately like my program to create a dataset that contains just the margin estimates obtained from each iteration of my bootstrap program. My current program saves the coefficient estimates for each iteration, but not the margin estimates. Any suggestions on how to achieve that would be helpful.

I am using a complex survey dataset, the Medical Expenditure Panel Survey, in Stata 13.1. My outcome is a four-level categorical variable, and my exposure variables are year and income level. The (simplified) code I have written is below:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
program savemargins, rclass
svyset varpsu [pweight=perwt], strata(varstr) psu(varpsu)
svy, subpop(subpop): mlogit outcome i.income##c.year
margins, at [specified levels] predict(outcome(1))
matrix list r(b)
end

bootstrap, saving(margins) reps(200): savemargins
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Thank you for your help.

Best,
Doug

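-matrix list r(b)- only displays the margins. -bootstrap- collects only the results named in its expression list, and an rclass program has to hand those results back with -return-. The sketch shows that mechanism with a stand-in logit on the auto data, where the at() values are arbitrary; the svy mlogit model and the post's [specified levels] would replace them. Whether resampling individual observations is appropriate for a complex design such as MEPS is a separate question. Stata's -svy bootstrap-, which uses bootstrap replicate weights, is the design-based alternative.

```stata
sysuse auto, clear
capture program drop savemargins
program savemargins, rclass
    * Stand-in model; the post fits svy, subpop(subpop): mlogit ...
    logit foreign mpg weight
    margins, at(mpg = (20 30))
    * margins leaves its estimates in r(b); copy them into this program's r()
    tempname m
    matrix `m' = r(b)
    return scalar m1 = `m'[1,1]
    return scalar m2 = `m'[1,2]
end

bootstrap m1 = r(m1) m2 = r(m2), reps(50) seed(2015) ///
    saving(bs_margins, replace): savemargins

* One row per replication, one variable per margin
use bs_margins, clear
describe
```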

Use -inlist- with local list

November 3, 2015, 11:12 am

I would like to check whether a string exists in a local. I have:

local X "1" "2" "3" ...

And would like

inlist("2", "`X'")

to return 1 (true).

Could someone offer advice on how to do this properly? Thanks.

Note:

Originally had:

local X "1" "2" "3" ...
local x "1"
inlist("`x'","`X'")

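Two points apply here. First, -inlist()- needs each candidate as a separate, comma-separated argument, so a single macro of quoted words does not work as one argument. Second, for checking membership in a macro list, the extended macro function list posof is simpler. Below is a minimal sketch; X and x follow the post. It assumes the elements can be stored without embedded quotes, which holds for these simple values.

```stata
* Elements stored as plain words
local X 1 2 3
local x 2

* Position of "`x'" in X (0 if absent)
local pos : list posof "`x'" in X
display `pos' > 0

* inlist() works when the candidates are listed explicitly
display inlist("`x'", "1", "2", "3")
```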

Links to FAQ sections: Headers disappear

November 3, 2015, 11:13 am

Please see the excerpt below from a post in the General forum.

See also http://www.statalist.org/forums/help#basic

When I click on the link to the FAQ, the header "4. Can I post an elementary question?" is missing from the page. The same happens with other links to the FAQ:

I get the same result in Chrome, Firefox and Internet Explorer.

Is this a feature of the forum software or a bug?


Exploring funding allocation

November 3, 2015, 11:28 am

Dear Stata users,
I wonder if you could help me with the following.
I am trying to investigate the determinants of annual government funding allocation in 27 geographical regions of a country for thirteen years. The explanatory variables include geographical poverty rate, unemployment rate, educational level of residents, political orientation of each region’s elected governors, voter turnout ratio for each region etc.
Which approach would you consider most appropriate for constructing the outcome variable (funding allocation)? And which method of analysis would you suggest?
Any suggestions you might have would be more than useful.
Thank you in advance,
Magda

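Two common ways to express the outcome are funding per capita and each region's share of the year's total. For 27 regions over 13 years, a panel model with region fixed effects and year dummies is one conventional starting point, though not the only defensible choice. The sketch uses random placeholder data so that it runs; population, funding, poverty, and unemployment are hypothetical names.

```stata
clear
set seed 2015
* Toy panel: 27 regions x 13 years (all values are random placeholders)
set obs 351
gen region = ceil(_n / 13)
bysort region: gen year = 2000 + _n
gen population = 100000 + runiform() * 900000
gen funding = runiform() * 1e7
gen poverty = runiform()
gen unemployment = runiform()

* Two ways to express the outcome
gen double funding_pc = funding / population
bysort year: egen double funding_total = total(funding)
gen double funding_share = funding / funding_total

* Region fixed effects, year dummies, SEs clustered by region
xtset region year
xtreg funding_pc poverty unemployment i.year, fe vce(cluster region)
```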

Euler Investment Model

November 3, 2015, 11:51 am

Hello Statalist,

I am working on an Euler investment model with firm investment as the dependent variable and cash flow to capital stock, sales to capital stock, the squared value of the dependent variable, and debt to capital stock as independent variables. For the estimation, I am using system GMM. Can anyone please tell me which of the independent variables in the Euler model can be treated as predetermined and which are endogenous?

Many thanks in advance
Ahmad Alsaraireh


Confidence Interval for lognormal hurdle and lognormal selection model

November 3, 2015, 12:38 pm

Dear Stata users,
I am estimating a Heckman selection model for a lognormally distributed dependent variable (lnY), following Wooldridge (2010), p. 694, and Cameron and Trivedi (2009), p. 541. My problem is that I cannot find an appropriate method for estimating the standard errors and confidence intervals of mean(Yhat) and median(Yhat), where Yhat is the predicted value of Y. I found some methods described in Parkin et al. (1990) [Parkin et al. 1990. Calculating Confidence Intervals for the Mean of a Lognormally Distributed Variable. Soil Sci. Soc. Am. J. 54:321-326] and in Cameron (1991) [Cameron, T.A. 1991. Interval estimates of non-market resource value for referendum contingent valuation surveys. Land Economics (67)]. However, these methods were developed for other types of regression.
My question is whether the methods above for calculating confidence intervals of the mean and median work equally well for lognormal selection and hurdle models. If they do not, is there another method, and is Stata code available for it? Your suggestions will be greatly appreciated.

-Dadhi Adhikari
Ph.D. Candidate
Department of Economics
University of New Mexico

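One generic alternative is a nonparametric bootstrap of the statistics themselves; this is a different technique from the cited methods. The sketch assumes Yhat is the lognormal retransformation exp(xb + sigma^2/2) from the outcome equation of -heckman-. That is an assumption, since other definitions are possible. It reports the mean and median of Yhat over the sample, using simulated data and placeholder variable names. A hurdle model would need its own prediction formula inside the program.

```stata
clear
set seed 2015
set obs 2000
gen x = rnormal()
gen z = rnormal()
* Correlated errors (rho = 0.5) so that selection matters
gen u1 = rnormal()
gen u2 = 0.5*u1 + sqrt(0.75)*rnormal()
gen byte s = (0.3 + 0.5*x + 0.8*z + u1 > 0)
gen lny = 1 + 0.5*x + 0.6*u2 if s

capture program drop yhatstats
program yhatstats, rclass
    heckman lny x, select(s = x z)
    tempvar xb yhat
    predict double `xb', xb
    * Lognormal retransformation: E(Y|x) = exp(xb + sigma^2/2)
    gen double `yhat' = exp(`xb' + e(sigma)^2/2)
    summarize `yhat', detail
    return scalar mean_yhat = r(mean)
    return scalar median_yhat = r(p50)
end

bootstrap mean_yhat = r(mean_yhat) median_yhat = r(median_yhat), ///
    reps(200) seed(2015): yhatstats
estat bootstrap, percentile
```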

How to transform a file containing time intervals in a reshaped long form?

November 3, 2015, 1:49 pm

Hi everybody

I have a dataset in this form:

Country	StYear	EndYear
USA	2000	2002
FRA	2000	2000

and I need to transform it into this:

Country	Year
USA	2000
USA	2001
USA	2002
FRA	2000

Or, even better, into this one:

Country	Year	Y_N
USA	2000	1
USA	2001	1
USA	2002	1
FRA	2000	1
FRA	2001	0
FRA	2002	0

Any suggestions?

Thanks for your help

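-expand- can duplicate each row once per year in its interval. -fillin- can then add the missing country-year combinations for the balanced version. The sketch below starts from the two example rows in the post. The helper variable spell is hypothetical; it keeps the year numbering correct if a country has more than one interval.

```stata
clear
input str3 Country StYear EndYear
USA 2000 2002
FRA 2000 2000
end

* One row per year from StYear to EndYear
gen spell = _n
expand EndYear - StYear + 1
bysort spell: gen Year = StYear + _n - 1

* Balanced version: add every Country-Year pair and flag the added rows with 0
gen byte Y_N = 1
fillin Country Year
replace Y_N = 0 if _fillin
drop _fillin spell StYear EndYear
sort Country Year
```

Stopping before the Y_N and -fillin- lines (and dropping StYear, EndYear, and spell) gives the first target layout.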