Channel: Statalist

Heckman with two sources of sample selection and Instrumental Variable (IV)

Dear Statalist,

I would like to estimate the following equation using cross-section data:

Code:
Y = A + B0*X0 + B1*X1 + U
In this equation the dependent variable is only observed when a selection rule applies (the typical sample selection problem à la Heckman). One of the independent variables, say X0, is binary and endogenous, and I'd like to use an instrumental variable to address that issue within the Heckman framework. But there is another problem: X0 also has a selection rule, which differs from the rule for Y. I've been working on two ways to estimate the main equation.

1. The first is to estimate a Heckman model for X0 (in Stata it would be: heckprobit X0 X2, sel(X3), with X3 the exclusion variables for X0), obtain the linear prediction of X0 (predict X0_hat, xb) and use this variable instead of X0 in the main equation, and then estimate again a Heckman model this time for Y, correcting standard errors by bootstrapping.

2. The second procedure follows Wooldridge (2010), Section 19.6.2 in the second edition of "Econometric Analysis of Cross Section and Panel Data" (MIT Press). The steps are: (i) estimate a probit model for the selection equation (I=1 when X0 and Y are both observed) using all exogenous variables (including the instruments Z1 and the selection variables Z2). (ii) Obtain the inverse Mills ratio (IMR). (iii) Estimate the structural equation (Y in this case) by 2SLS, correcting the standard errors by bootstrapping.

Code:
ivregress 2sls Y X1 IMR (X0 = Z1 Z2 IMR)
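For concreteness, steps (i) to (iii) of the second procedure can be sketched as a do-file. This is only a minimal sketch under stated assumptions: the simulated data, the seed, the helper variable xb_sel, and the program name heckiv are hypothetical and chosen for illustration. IMR is listed only among the exogenous regressors here, because ivregress automatically uses included exogenous regressors as instruments.

```stata
* Hypothetical simulated data so the sketch runs on its own.
* Variable names follow the post; the data-generating process is an assumption.
clear
set seed 12345
set obs 2000
generate X1 = rnormal()
generate Z1 = rnormal()                    // instrument for X0
generate Z2 = rnormal()                    // exclusion variable for selection
generate e  = rnormal()                    // selection error
generate U  = 0.5*e + rnormal()            // outcome error, correlated with e
generate v  = 0.5*U + rnormal()            // makes X0 endogenous
generate X0 = (0.8*Z1 + 0.3*X1 + v > 0)
generate I  = (0.5 + Z2 + 0.5*Z1 + 0.5*X1 + e > 0)
generate Y  = 1 + 2*X0 + X1 + U if I == 1  // Y observed only when I == 1
replace  X0 = . if I == 0                  // X0 observed only when I == 1

* Wrap all steps in one program so the bootstrap re-estimates the probit
* and the inverse Mills ratio in every replication.
capture program drop heckiv
program define heckiv, rclass
    capture drop xb_sel
    capture drop IMR
    * (i) probit for selection on all exogenous variables
    quietly probit I X1 Z1 Z2
    * (ii) inverse Mills ratio from the probit index
    quietly predict double xb_sel, xb
    quietly generate double IMR = normalden(xb_sel) / normal(xb_sel)
    * (iii) 2SLS on the selected sample
    quietly ivregress 2sls Y X1 IMR (X0 = Z1 Z2) if I == 1
    return scalar b_X0 = _b[X0]
end

* Point estimate, then bootstrap standard error for the coefficient on X0
heckiv
display "coefficient on X0: " r(b_X0)
bootstrap b_X0 = r(b_X0), reps(200) seed(12345): heckiv
```

Bootstrapping the whole program, rather than only the ivregress step, is what lets the standard errors reflect the fact that IMR is itself estimated.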
The problem is that I don't know which of the two procedures is more appropriate. The estimates I obtain from the two are very different (the first is positive and the second is negative; the instruments may be weak, but there is also another problem). My questions are:


1. As far as I know, the first procedure would be fine if the first stage of 2SLS (done by hand in this case) were estimated by OLS (using regress). Is it okay to obtain the predicted value of the endogenous variable from the Heckman estimates and use it in the second stage to estimate the main equation?

2. I'm aware that the selection rules for X0 and Y are not the same; that is, Y and X0 are not always observed at the same time (there are missing values of X0 when Y is positive). Can I use the procedure anyway?

Do you have any suggestions about this?

Thanks in advance,

Sergio
