I want to see how various healthcare variables affect population health. I have panel data on 10 jurisdictions over 38 years (N = 10, T = 38). My conceptual model is a dynamic panel data model, because health in one year directly affects health in the next.
My key explanatory variables (healthcare capital stock, number of healthcare workers, and drug expenditures) are likely endogenous.
A colleague recommended using system GMM as a way to both represent the dynamic panel model and deal with endogeneity.
I ran the following command:
Code:
xtabond2 ln_trt_mort L.ln_trt_mort ln_total_capital_r ln_prov_drugs_r ///
ln_health_workers ln_population i.year, ///
gmm(L.ln_trt_mort ln_total_capital_r ln_prov_drugs_r ln_health_workers, ///
collapse) iv(i.year ln_population) twostep robust small nodiffsargan
I got the following results:
Code:
Dynamic panel-data estimation, two-step system GMM
------------------------------------------------------------------------------
Group variable: prov_num                        Number of obs      =       300
Time variable : year                            Number of groups   =        10
Number of instruments = 182                     Obs per group: min =        30
F(49, 9)      =  26726.95                                      avg =     30.00
Prob > F      =     0.000                                      max =        30
------------------------------------------------------------------------------------
| Corrected
ln_trt_mort | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------------+----------------------------------------------------------------
ln_trt_mort |
L1. | 0 (omitted)
|
ln_total_capital_r | -.1949693 .033019 -5.90 0.000 -.2696634 -.1202752
ln_prov_drugs_r | .5323356 .0313076 17.00 0.000 .4615128 .6031584
ln_health_workers | 0 (omitted)
ln_population | 0 (omitted)
|
year |
1975 | 0 (empty)
1976 | 0 (omitted)
1977 | 0 (omitted)
1978 | 0 (omitted)
1979 | 0 (omitted)
1980 | 0 (omitted)
1981 | 0 (omitted)
1982 | 0 (omitted)
1983 | 0 (omitted)
1984 | 0 (omitted)
1985 | 0 (omitted)
1986 | 0 (omitted)
1987 | 0 (omitted)
1988 | .3323847 .0520803 6.38 0.000 .2145708 .4501987
1989 | 0 (omitted)
1990 | 0 (omitted)
1991 | 0 (omitted)
1992 | .1061141 .0508995 2.08 0.067 -.0090286 .2212569
1993 | 0 (omitted)
1994 | 0 (omitted)
1995 | .0717928 .0410346 1.75 0.114 -.0210338 .1646195
1996 | 0 (omitted)
1997 | -.0076909 .0741135 -0.10 0.920 -.1753472 .1599654
1998 | 0 (omitted)
1999 | .0157125 .028684 0.55 0.597 -.0491752 .0806002
2000 | -.0480081 .0404153 -1.19 0.265 -.1394338 .0434176
2001 | 0 (omitted)
2002 | 0 (omitted)
2003 | 0 (omitted)
2004 | 0 (omitted)
2005 | 0 (omitted)
2006 | 0 (omitted)
2007 | -.125124 .0278338 -4.50 0.001 -.1880883 -.0621596
2008 | 0 (omitted)
2009 | 0 (omitted)
2010 | 0 (omitted)
2011 | 0 (omitted)
2012 | -.0886373 .0164099 -5.40 0.000 -.1257592 -.0515154
2013 | 0 (omitted)
2014 | 0 (omitted)
2015 | 0 (omitted)
2016 | 0 (omitted)
2017 | 0 (omitted)
2018 | 0 (omitted)
|
_cons | 0 (omitted)
------------------------------------------------------------------------------------
Instruments for first differences equation
Standard
D.(1975b.year 1976.year 1977.year 1978.year 1979.year 1980.year 1981.year
1982.year 1983.year 1984.year 1985.year 1986.year 1987.year 1988.year
1989.year 1990.year 1991.year 1992.year 1993.year 1994.year 1995.year
1996.year 1997.year 1998.year 1999.year 2000.year 2001.year 2002.year
2003.year 2004.year 2005.year 2006.year 2007.year 2008.year 2009.year
2010.year 2011.year 2012.year 2013.year 2014.year 2015.year 2016.year
2017.year 2018.year ln_population)
GMM-type (missing=0, separate instruments for each period unless collapsed)
L(1/43).(L.ln_trt_mort ln_total_capital_r ln_prov_drugs_r
ln_health_workers) collapsed
Instruments for levels equation
Standard
1975b.year 1976.year 1977.year 1978.year 1979.year 1980.year 1981.year
1982.year 1983.year 1984.year 1985.year 1986.year 1987.year 1988.year
1989.year 1990.year 1991.year 1992.year 1993.year 1994.year 1995.year
1996.year 1997.year 1998.year 1999.year 2000.year 2001.year 2002.year
2003.year 2004.year 2005.year 2006.year 2007.year 2008.year 2009.year
2010.year 2011.year 2012.year 2013.year 2014.year 2015.year 2016.year
2017.year 2018.year ln_population
_cons
GMM-type (missing=0, separate instruments for each period unless collapsed)
D.(L.ln_trt_mort ln_total_capital_r ln_prov_drugs_r ln_health_workers)
collapsed
------------------------------------------------------------------------------
Arellano-Bond test for AR(1) in first differences: z = -2.60 Pr > z = 0.009
Arellano-Bond test for AR(2) in first differences: z = 0.18 Pr > z = 0.858
------------------------------------------------------------------------------
Sargan test of overid. restrictions: chi2(132) = 203.88 Prob > chi2 = 0.000
(Not robust, but not weakened by many instruments.)
Hansen test of overid. restrictions: chi2(132) = 0.00 Prob > chi2 = 1.000
(Robust, but weakened by many instruments.)
Obviously, I have far too many instruments (182 for only 10 groups), so the Hansen test result (p = 1.000) is not reliable.
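As a consistency check on the first run: the Sargan and Hansen statistics are compared against a chi2 distribution whose degrees of freedom equal the number of instruments L minus the number of estimated coefficients K. Taking L = 182 from the output and K = 50 (assumed here to be the 49 numerator degrees of freedom in F(49, 9) plus the constant) reproduces the reported chi2(132). A Hansen statistic of 0.00 with p = 1.000 is the pattern usually associated with instrument proliferation, which is consistent with L being far larger than the number of groups.
Code:
\begin{aligned}
J &\sim \chi^2(L - K) \\
L - K &= 182 - 50 = 132, \qquad L = 182 \gg N = 10
\end{aligned}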
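I next tried imposing lag limits, although I wasn't entirely comfortable doing so because it was not directly recommended in the GMM literature I had read. In this specification I also dropped the year dummies and used the GMM-style instruments only in the levels equation (eq(level)).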
Code:
xtabond2 ln_trt_mort L.ln_trt_mort ln_total_capital_r ln_prov_drugs_r ///
ln_health_workers ln_population, ///
gmm(L.ln_trt_mort ln_total_capital_r ln_prov_drugs_r ln_health_workers, ///
collapse eq(level) laglimits(0 1)) iv(ln_population, equation(level)) ///
robust small nodiffsargan twostep
I got the following results:
Code:
Dynamic panel-data estimation, one-step system GMM
------------------------------------------------------------------------------
Group variable: prov_num                        Number of obs      =       300
Time variable : year                            Number of groups   =        10
Number of instruments = 10                      Obs per group: min =        30
F(5, 9)       =   1041.73                                      avg =     30.00
Prob > F      =     0.000                                      max =        30
------------------------------------------------------------------------------------
| Robust
ln_trt_mort | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------------+----------------------------------------------------------------
ln_trt_mort |
L1. | .506323 .3194124 1.59 0.147 -.2162381 1.228884
|
ln_total_capital_r | -.0885838 .0328479 -2.70 0.025 -.162891 -.0142767
ln_prov_drugs_r | -.1461226 .1173413 -1.25 0.244 -.4115671 .1193218
ln_health_workers | .170425 .2320239 0.73 0.481 -.3544496 .6952997
ln_population | .5489733 .4645 1.18 0.268 -.5017986 1.599745
_cons | -1.348713 3.142279 -0.43 0.678 -8.457042 5.759616
------------------------------------------------------------------------------------
Instruments for levels equation
Standard
ln_population
_cons
GMM-type (missing=0, separate instruments for each period unless collapsed)
DL(0/1).(L.ln_trt_mort ln_total_capital_r ln_prov_drugs_r
ln_health_workers) collapsed
------------------------------------------------------------------------------
Arellano-Bond test for AR(1) in first differences: z = -1.79 Pr > z = 0.074
Arellano-Bond test for AR(2) in first differences: z = 1.30 Pr > z = 0.194
------------------------------------------------------------------------------
Sargan test of overid. restrictions: chi2(4) = 23.67 Prob > chi2 = 0.000
(Not robust, but not weakened by many instruments.)
Hansen test of overid. restrictions: chi2(4) = 7.00 Prob > chi2 = 0.136
(Robust, but weakened by many instruments.)
My results now seem okay, but only one of my variables (ln_total_capital_r) is significant at the 5% level, and the number of instruments (10) equals the number of groups (10), whereas the usual recommendation is to keep the instrument count below the number of groups. Furthermore, I cannot add any controls or estimate any extended version of this model without the Hansen test breaking down. Should I trust these results?
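The instrument count in the second run can be reconstructed from the instrument list in its output: the four GMM-style variables are collapsed and used at lags 0 and 1 of their first differences in the levels equation (DL(0/1)), plus ln_population and the constant. With K = 6 estimated coefficients (the five regressors and _cons), instruments minus coefficients gives the chi2(4) degrees of freedom reported for both the Sargan and Hansen tests.
Code:
\begin{aligned}
L &= \underbrace{4 \times 2}_{\text{DL(0/1), collapsed}} + \underbrace{1}_{\text{ln\_population}} + \underbrace{1}_{\text{\_cons}} = 10 = N \\
L - K &= 10 - 6 = 4
\end{aligned}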
Are my problems due to the fact that T > N? I tried re-estimating the model on only the 9 most recent years, so that N > T, but that didn't seem to work either.
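To illustrate why a long T relative to N matters, consider a textbook case (an assumption for illustration, not the exact lag structure used in the commands above): a single variable instrumented in the differenced equation with all lags from 2 onward, as in the standard Arellano–Bond setup. Without collapsing, period t contributes t − 2 instruments, so the count grows quadratically in T; with collapsing there is one column per lag, so it grows linearly.
Code:
\begin{aligned}
\text{uncollapsed: } \sum_{t=3}^{T} (t - 2) &= \frac{(T-1)(T-2)}{2}, &\quad T = 38: 666, \quad T = 9: 28 \\
\text{collapsed: } &\; T - 2, &\quad T = 38: 36, \quad T = 9: 7
\end{aligned}

Under these illustrative assumptions, even the 9-year subsample yields more uncollapsed instruments for one variable (28) than there are groups (N = 10).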