Using -predictnl- to get marginal effects after Heckman

November 19, 2015, 12:44 pm

Dear all,

After fitting my model using Heckman (MLE), I want to use -predictnl- to calculate the marginal effects for categorical variables that appear in both the selection and outcome equations.

I use the formula suggested by Hoffmann and Kassouf (2005). The marginal effect of a regressor on yi in the observed sample consists of two components. The first is the direct effect on the mean of yi, which is β. The second is indirect: if the variable also enters the probability that z*i is positive, it influences yi through its presence in λi, the inverse Mills ratio. (Greene (2012, p. 875) gives the formula for continuous variables.)

Consider the 2nd level of the categorical variable "household_income_R", which appears in both the selection and outcome equations.

Its marginal effect is calculated as:

ME_2.household_income_R = beta_k + beta_lambda*[(normalden(xbprobit_2.household_income_R)/normal(xbprobit_2.household_income_R))-(normalden(xbprobit_base)/normal(xbprobit_base))]

beta_k: coefficient on 2.household_income_R in the outcome equation.
beta_lambda: coefficient on the IMR in the outcome equation. It has to be retrieved separately, because MLE does not estimate lambda directly.
xbprobit_2.household_income_R: linear prediction (not probability) from the selection equation for the 2nd level of the categorical variable household_income_R.

I know how to calculate the linear prediction (not probability) for each level of a categorical variable by using -margins-, as follows:

*xbsel linear prediction for selection equation

Code:

margins household_income_R, atmeans predict(xbsel) noestimcheck

Adjusted predictions                              Number of obs   =       5352
Model VCE    : Robust

Expression   : Linear prediction of prob_buy_OG, predict(xbsel)
at           : LnAnnualPo~G    =   -.8428143 (mean)
...
------------------------------------------------------------------------------------
                   |            Delta-method
                   |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------------+----------------------------------------------------------------
household_income_R |
         <$24,999  |  -1.755006   .0195863   -89.60   0.000    -1.793395   -1.716618
  $25,000-$49,999  |   -1.68961   .0124251  -135.98   0.000    -1.713963   -1.665257
  $50,000-$69,999  |  -1.638268   .0146462  -111.86   0.000    -1.666974   -1.609562
        >=$70,000  |  -1.602625   .0110286  -145.32   0.000    -1.624241   -1.581009
------------------------------------------------------------------------------------

As shown above, the linear prediction (not probability) for each level of a categorical variable is only one part of the calculation.
I don't think I can use -predictnl- to incorporate estimates from -margins-.

Does anyone know how to use -predictnl- or some other commands to calculate those marginal effects?

Thanks in advance.

Xiaojin

Hoffmann, R., & Kassouf, A. L. (2005). Deriving conditional and unconditional marginal effects in log earnings equations estimated by Heckman's procedure. Applied Economics, 37(11), 1303-1311.

↧

Regression on groups of variables

November 19, 2015, 2:02 pm

I am running some exploratory data analyses in which I have several groups of variables that I would like to add to the regressions in different combinations. For instance,
model 1: xtreg performance group1 group2,
model 2: xtreg performance group1 group3.
......
I tried defining the groups with "local", but it does not work. I would appreciate any suggestions or experience from anyone who has been in the same situation. Thanks!
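
One possible approach, as a minimal sketch: store each group of variables in a local macro and expand the macros on the regression line. The auto dataset, the variable groupings, and the use of -regress- instead of -xtreg- are all assumptions chosen so the example runs on its own; they stand in for the poster's data and model.

```stata
* Illustrative only: the auto dataset and these groupings are assumptions.
sysuse auto, clear

* Each local macro holds one group of variables.
local group1 weight length
local group2 mpg headroom
local group3 turn displacement

* A macro is expanded by wrapping its name in a left quote (`) and a right quote (').
regress price `group1' `group2'
regress price `group1' `group3'
```

A common reason locals appear not to work is that they only exist within the do-file run (or interactive session) that defined them. If the -local- lines and the regression lines are executed separately from the do-file editor, the macros expand to nothing.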

↧

Kalman Filter estimation with lags of state variables in observation equation

November 19, 2015, 2:22 pm

Dear Stata Users:

I am trying to estimate potential output using a backward-looking Phillips curve. However, lags of the output gap enter this observation equation, and these include lags of potential output, the unobserved state variable being estimated. Stata therefore returns an error about the presence of lagged state variables, which the sspace syntax does not allow.

I can trick Stata by defining an additional state equation with an auxiliary variable equal to lagged potential output, adding a constraint that sets its coefficient equal to 1, and then using this auxiliary variable in my observation equation (the Phillips curve). This runs without an error message.

However, I am concerned that by doing this I might be overparameterizing the model, because I force Stata to estimate an additional state equation for a variable that I already know, which might change the estimation. Let me know your ideas.

constraint 1 [lgdp]yp = 1 //coefficient of potential output in equation decomposing observed output in cyclical and long run term
constraint 2 [lgdp]yc = 1 // coefficient of cyclical component in equation decomposing observed output in cyclical and long run term
constraint 3 [yp]l.yp = 1 // eq 4: coefficient of y*(t-1) constraint in potential output law of motion modeled as a random walk
constraint 4 [yp]l.g = 1 // eq 4: coefficient of g(t-1) constraint on coefficient on growth rate of potential output, in law of motion of potential output
constraint 5 [g]l.g = 1 // eq 5: coefficient of g(t-1) constraint on coefficient on growth rate of potential output, in its law of motion

constraint 6 [var(yc)]_cons = 1600*[var(g)]_cons // set ratio of variances lambda HP filter = 1600
constraint 7 [ypL]l.yp = 1 // ypL is potential output in t-1; a workaround because Stata does not allow lagged states in measurement equations
constraint 8 [inflagap]ypL = -[inflagap]l.lgdp // eq 7: set same coef.*(-1) for y*(t-1) and y(t-1)

sspace (yp L.yp L.g, state noconstant noerror) /// eq. 4
(g L.g , state noconstant) /// eq. 5
(yc, state noconstant) /// eq. 6
(ypL l.yp, state noconstant noerror) /// auxiliary eq. to work around Stata: creates y*(t-1)
(inflagap l.inflagap ypL l.lgdp m2_growth tcn_growth, noconstant) /// eq. 7
(lgdp yp yc, noconstant noerror), /// eq. 3
constraints(1 2 3 4 5 6 7 8) difficult // constraints on coefficients

↧

How do I count up the total number of households?

November 19, 2015, 2:57 pm

Hi,

I am working on a panel, and I want to know how many households were surveyed in a specific year. When I tabulate the households, I get the full list for that year, but some households show up more than once because more than one individual from that household took the survey. Does anyone know if there is code I can run in Stata to count how many households there actually were, while avoiding the double counting?

Thanks!
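
A minimal sketch of one way to do this, assuming the data have one row per individual with a household identifier and a survey year. The names hhid and year and the toy data are hypothetical.

```stata
* Toy data: one row per surveyed individual (hhid and year are assumed names).
clear
input hhid year
1 2010
1 2010
2 2010
3 2010
3 2010
1 2011
2 2011
2 2011
end

* tag() marks exactly one observation per household-year combination.
egen byte hhtag = tag(hhid year)

* Number of distinct households surveyed in 2010.
count if hhtag & year == 2010

* Or the number of distinct households for every year at once.
tabulate year if hhtag
```

Because only one row per household and year is tagged, households with several respondents are counted once.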

↧

Should all interaction variables be getting omitted from my linear regression?

November 19, 2015, 9:09 pm

I am working on a research project in Stata, and I want to create an interaction between education level and gender. Education level is split into three dummy variables: below a high school education, a high school education, and above a high school education. I multiply each one by male. When I included all of these variables in my linear OLS regression, one of my education levels was omitted (which makes complete sense because of collinearity), but the interaction term that went with it was also omitted due to "collinearity". My question is, does this also make sense? I think it does, but I'm told that this should not be happening.

↧

Does margins control for the effect of variables observed after estimating an "areg" regression command.

November 20, 2015, 5:23 am

Hello,

Let's say that I am estimating an LSDV model with the areg command because of the large number of dummies (representing fixed effects).

xi: areg y x1 x2 x3 x4##x5, absorb(fe_dummy)

where
x4 and x5 are categorical variables interacted with each other to estimate their effects.
fe_dummy = dummy variable for the fixed effects.

After the estimation, I use the "margins" command to estimate the average predicted values for my variables of interest, which are x4, x5 and the interaction x4*x5:

margins x4##x5

My question is: will the "margins" command control for fe_dummy while estimating the average predicted values of x4, x5 and the interaction x4*x5?

Thanks
shreekanth mahendiran

↧

Noisily displaying commands in ado-files

November 20, 2015, 5:30 am

I'm writing an ado-file which generates variables based on user input.
It's all working fine, but I want the commands it runs to be visible.

For instance:

Code:

gen bcgv = .
capture confirm variable h2
if _rc == 0 {
replace bcgv = 0 if inrange(agemo,`vacref') `isalive' `lastborn'
recode bcgv (0=1) if inrange(h2,1,3)
}

I want to display the replace and recode command lines in the console.
I tried surrounding them with noisily { }, running the ado-file as "noisily adoname", and even adding noisily before each command line.
But nothing seemed to work.

Is there a way to do this?

Thanks in advance

↧

-egen- using different criteria

November 20, 2015, 6:44 am

Dear Statalist,

I have the following issue: I do not know how to create a Panel ID using the egen command and further criteria. To give you an idea of what the data I am working with looks like, here is an example:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input year age education previousbirths match
 9 17 12 0 1
 9 20 12 0 1
 9 22 14 0 1
 9 25 14 0 1
 9 26 14 0 1
 9 22 11 1 1
 9 26 13 1 1
 9 27 16 1 1
 9 28 13 1 1
 9 28 14 1 1
 9 40 16 1 1
 9 21 15 2 1
 9 26 10 2 1
 9 28 13 2 1
 9 27 13 3 1
10 19 13 0 1
10 21 12 0 1
10 24 12 0 1
10 25 14 0 1
10 25 16 0 1
end

Based on the variable match (in this case defining a group) I would like to create an ID consistently based on
- each group 1,2,...,n
- age consistency: If a mother is 19 in 1999 at the time of her first birth she can only be 19-21 in 2000 at a potential second birth
- previousbirths consistency: If a mother has her first child in 1999 (previousbirths=0) and a potential second birth in 2000, then previousbirths = 1
- education being the same over years (this is a facilitating assumption)

I would highly appreciate any suggestions on how to code this in Stata since I am completely lost.

Thank you very much in advance for taking the time and helping!

Best,

Max

↧

graphs: options

November 20, 2015, 7:16 am

Hello,

Is there an option I can use to change the colour and font of the text in my graphs?
For example, I need Stata to render all text in my graphs in Times New Roman, size 12.

Thanks in advance
Lisa

↧

Data input

November 20, 2015, 7:45 am

Dear all,

It's probably a dumb question.

I have quarterly data in the format "year quarter", e.g. "1993 1", in a variable named "date".

I would like to create a new time variable via:

generate date2 = date(date, "YQ")

However, in the data editor all entries of the new variable are shown as "-".

Any help would be highly appreciated.

Best,
Christian
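
A minimal sketch of one likely fix, assuming "date" is a string variable holding values such as "1993 1". The date() function builds daily dates from day, month, and year components, so a "YQ" mask gives it nothing to work with. Stata's quarterly() function is the counterpart for quarterly dates.

```stata
* Toy data in the poster's "year quarter" string format (assumed to be a string variable).
clear
input str6 date
"1993 1"
"1993 2"
"1993 3"
"1993 4"
end

* quarterly() converts a string to a Stata quarterly date using a YQ mask.
generate date2 = quarterly(date, "YQ")

* Display the numeric quarterly date in readable form, e.g. 1993q1.
format date2 %tq
list
```

If the original variable is numeric rather than string, a different conversion (for example yq() on separate year and quarter variables) would be needed instead.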

↧

User-written program to compute a two-step model in Stata 12: problem of insufficient observations to compute bootstrap standard errors

November 20, 2015, 9:08 am

Dear Statalisters,

I am using a sample of 845 migrants and I want to estimate a remittance model using both a Heckman selection model and a simple two-step model.

My dependent variables are 1) remit (a binary variable equal to 1 if migrant i remits) and 2) amount_remit (recorded as intervals).

The dependent variable remit is estimated using a probit model, while the variable amount_remit is estimated, conditional on the decision to remit, using interval regression (I have created two variables, amount_remit_L and amount_remit_U, containing the lower and upper endpoints of the amount_remit categories). Of the 845 migrants, 585 send remittances.

Given that I need to estimate the model in one step (to get the correct standard errors), I did it in the following way
(x4 and x11 are my exclusion restrictions):

capture program drop myprog
program myprog, eclass
xi: prob remit x1 x2 x3 i.x4 x5 x6 x7 i.x8 x9 x10 x11 x12
predict double xb1,xb
gen double imr1=normalden(xb1)/normal(xb1)
xi:intreg amount_remit_L and amount_remit_U x1 x2 x3 x5 x6 x7 i.x8 x9 x10 x12 imr1 if remit==1
end
bootstrap, reps(50) seed(2364723): myprog

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 50
insufficient observations to compute bootstrap standard errors
no results will be saved

Can you please help me understand what I am doing wrong? Does it depend on the sample size?
How can I estimate my model in one step to get the correct standard errors?

↧

Compare results from -predict- and -margins- after Heckman

November 20, 2015, 9:49 am

Dear all,

After fitting my model using -heckman- (MLE), I want to predict the probability for the selection equation.

I used -predict- and -margins-, but got slightly different results. I guess the difference arises because -margins- calculated the predicted values at the uncensored-sample means rather than the whole-sample means. Does this mean -margins- gives incorrect predicted values? Which one should I use? Thanks in advance.

Censored obs = 109272
Uncensored obs = 5352

Code:

****Predicted probabilities of the selection equation.

 
predict prob, psel
 tabstat prob, statistics(mean) by(household_income_R)
 
Summary for variables: prob
     by categories of: household_income_R (RECODE of household_income)

household_income_R |      mean
-------------------+----------
          <$24,999 |  .0353926
   $25,000-$49,999 |  .0412268
   $50,000-$69,999 |  .0485454
         >=$70,000 |  .0556965
-------------------+----------
             Total |  .0466653
------------------------------


margins household_income_R, atmeans predict(psel) noestimcheck

Adjusted predictions           Number of obs   =       5352
Model VCE    : Robust

Expression   : Pr(prob_buy_OG), predict(psel)
at           : LnAnnualPo~G    =   -.8428143 (mean)
             1.house~me_R    =    .1207025 (mean)
             ...
---------------------------------------------------------------------------------
                   |            Delta-method
                   |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------------+----------------------------------------------------------------
household_income_R |
         <$24,999  |   .0396291   .0016751    23.66   0.000      .036346    .0429122
  $25,000-$49,999  |   .0455513   .0011893    38.30   0.000     .0432203    .0478824
  $50,000-$69,999  |   .0506829    .001527    33.19   0.000     .0476901    .0536757
        >=$70,000  |   .0545087   .0012182    44.75   0.000     .0521212    .0568963
------------------------------------------------------------------------------------

↧

Is there any efficient way to convert a frequency table to dummy variables.

November 20, 2015, 1:16 pm

Hi, I have a frequency table as below. Is there an efficient way to input this table and generate dummy/categorical variables?

↧

Logit on a dataset re-created by corr2data?

November 20, 2015, 4:21 pm

I am trying to reproduce a published set of empirical models based on the statistical properties reported in the paper. I've done this before for OLS regressions using corr2data to create a new dataset with the same statistical properties as the original data, but this time the models are logistic regressions. Is it still appropriate to use corr2data followed by a logit?

↧

Monthly date variable from integer variable

November 20, 2015, 4:30 pm

I have a variable that takes the values 1, 2, 3, ..., 39. Each value indicates a month-year. How can I create a date variable in Stata from this information, starting in 2009m10?

Thanks in advance.
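
A minimal sketch, assuming the values are consecutive months with 1 corresponding to 2009m10. The variable name month_id is hypothetical.

```stata
* Toy data: month_id (assumed name) runs from 1 to 39.
clear
set obs 39
generate int month_id = _n

* ym(2009, 10) is the Stata monthly date for 2009m10, so month_id == 1 maps to it.
generate mdate = ym(2009, 10) + month_id - 1

* Show the result as a monthly date, e.g. 2009m10.
format mdate %tm
list in 1/3
```

Under this mapping the last value, 39, corresponds to 2012m12.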

↧

How to gen the following result. string add and newlines

November 20, 2015, 5:47 pm

Hi, dear all

Code:

  . do "C:\Users\ADMINI~1\AppData\Local\Temp\STD01000000.tmp"

. clear

. input coef tstat

coef       tstat
1. 0.2 1.96
2. 0.5 2.56
3. 0.65 0.75
4. 0.25 6
5. end

. gen result = string(coef) + "(" + string(tstat) + ")"

. list

+-------------------------+
coef   tstat     result
-------------------------
1.    .2    1.96   .2(1.96)
2.    .5    2.56   .5(2.56)
3.   .65     .75   .65(.75)
4.   .25       6     .25(6)
+-------------------------+

.
end of do-file

I want to obtain the following result:

Code:

  .2
(1.96)

.5
(2.56)

.65
(.75)

.25
(6)

Thanks very much!
Best regards,
wanhaiyou

↧

Mean centering predictor using multiply imputed data (with mi passive)

November 20, 2015, 6:33 pm

Hi Statalist,

I've been puzzling over “mi” commands today, trying to understand what I can and can't do with imputed data, and would very much appreciate any insight into mean centering a predictor using imputed data. The model will examine predictors of work-related injuries, and the variable "dayhours" (hours worked per day) has 36% missing values. My code is:

mi set wide
mi register imputed dayhours
mi register regular daysweek injuries ohr Cash_y threats vio sector Fluent Nodoc agecentred age2 monthscentred months3 Exp dsm_symptomatic sit_break any_memprob
mi impute chained (truncreg, ul(24) ll(1)) dayhours = daysweek injuries ohr Cash_y threats vio sector Fluent Nodoc agecentred age2 monthscentred months3 Exp dsm_symptomatic sit_break any_memprob, add(20) rseed(4409) force
**Create new var "day2" to create new hours worked/week var based on daysweek (days worked/week)
mi passive: gen day2=dayhours
mi passive: gen hoursweek2=.
mi xeq: replace hoursweek2=day2*5 if daysweek==1
mi xeq: replace hoursweek2=day2*6 if daysweek==2
mi xeq: replace hoursweek2=day2*7 if daysweek==3
mi xeq: replace hoursweek2=day2*7 if daysweek==4
**Rescale hours worked/week to 10
mi passive: gen hour10mi=hoursweek2/10

Now, I would like to create a variable, "c_hour10mi", that is mean centered within each imputed dataset. This is where I run into a problem:

mi xeq: sum hour10mi, meanonly
mi passive: gen c_hour10mi = hour10mi - r(mean)

When I try to create the mean-centered variable, the command above generates all missing observations for “c_hour10mi” in all the imputed datasets. I’m not sure where I’m going wrong. Is there a command in the mi suite that could take care of mean centering?

With many thanks, Nicola

↧

merging trouble

November 20, 2015, 7:56 pm

Dear Statalisters, I'm hoping someone may be able to help me with the following problem. I'm not sure how to merge these files or what command may be applicable.

I have two files.
One file has a year column from 1930-2000 and the other columns are cluster numbers ranging from 3 to 500.

year 3 6 8 10
1930 200 13 25 5
1931 252 52 85 89
1932 85 12 20 5
......
1985 5 52 89 10
2000 55 10 5 9

The second file has a column for year and a single column for cluster:

year cluster
1932 3
1985 3
2000 3
1932 8
1985 8
1999 8

I'm trying to merge the data from the first file into the second, matching on year and cluster.

That is, I want to get:

year cluster var
1932 3 85
1985 3 5
2000 3 55

Any help would be appreciated!
Amy
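
A minimal sketch of one way to approach this: reshape the first file from wide to long so that it has one row per year and cluster, then merge it into the second file on both keys. Stata variable names cannot start with a digit, so the cluster columns are assumed here to be named c3, c6, c8, and c10. That prefix, the name value, and the toy data are assumptions.

```stata
* File 1 (toy excerpt): one column per cluster, assumed to be named c3, c6, ...
clear
input year c3 c6 c8 c10
1930 200 13 25 5
1931 252 52 85 89
1932 85 12 20 5
1985 5 52 89 10
2000 55 10 5 9
end

* Wide -> long: one row per year-cluster pair; the suffix becomes the cluster number.
reshape long c, i(year) j(cluster)
rename c value
tempfile clusterlong
save `clusterlong'

* File 2 (toy excerpt): one row per year-cluster pair of interest.
clear
input year cluster
1932 3
1985 3
2000 3
1932 8
1985 8
1999 8
end

* Keep every row of file 2; rows with no counterpart in file 1 get a missing value.
merge 1:1 year cluster using `clusterlong', keep(master match)
sort cluster year
list
```

The _merge variable created by -merge- shows which rows of the second file found a match. In this toy excerpt the 1999 row has none, because 1999 is not among the years entered for file 1.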

↧

Firm fixed Effects Regressions and Standard Errors Clustered at Country Level

November 20, 2015, 11:53 pm

Hi,

I have panel data (companies and years). The companies are from different countries, and I want to run a regression with firm fixed effects and standard errors clustered at the country level. Can I do this?

And can I use a "pooled model" (the regress command in Stata) directly, without checking for individual effects with the Fisher test?

↧

how to find date related value

November 21, 2015, 2:17 am

Dear all,
I am facing a problem with date values.
In my data I have two variables: one is "startdate" and the other is "enddate". I want to find the records with erroneous dates, meaning those that satisfy the condition below:
enddate < startdate (for example, the start date is 02/02/2014 and the end date for the same record is 02/03/2013).
Please help me with the command.
Regards,
Raeed
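
A minimal sketch, assuming the dates are stored as strings in day/month/year order. The names startstr, endstr, and baddate, the "DMY" order, and the toy data are assumptions. If startdate and enddate are already numeric Stata dates, only the last two commands are needed.

```stata
* Toy data: dates held as strings (assumed day/month/year order).
clear
input str10 startstr str10 endstr
"02/02/2014" "02/03/2013"
"01/05/2013" "15/06/2013"
end

* Convert the strings to Stata daily dates.
generate startdate = date(startstr, "DMY")
generate enddate   = date(endstr, "DMY")
format startdate enddate %td

* Flag records where the end date precedes the start date, skipping missing dates.
generate byte baddate = enddate < startdate if !missing(startdate, enddate)
list if baddate == 1
```

The !missing() condition matters because Stata treats missing values as larger than any number, so records with a missing date would otherwise be compared as if the date were very large.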

↧