Dear all,
I am analyzing longitudinal count data and have some questions about which estimation method I should use. Could you please answer the following questions?
I am thinking of using xtnbreg rather than xtpoisson, because the means of the variables are not equal to their standard deviations. Please see the summary statistics of my sample:
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
nfges_all | 8740 14.81739 8.152234 1 33
ln_firmsize | 8740 .884222 .3701103 .6931472 3.091043
strg_special | 8529 .9874546 .1113081 0 1
innovative | 8715 .172117 .3775038 0 1
other_pay | 8740 .8061785 .3953133 0 1
-------------+--------------------------------------------------------
avg_age | 8740 42.90567 11.33147 18.5 75.5
ln_avg_sta~p | 8740 .5545851 .4788572 0 2.397895
ln_avg_ind | 8740 1.564368 1.086563 0 3.871201
ln_avg_mgr | 8740 2.158557 .8213584 0 3.821369
family | 8740 .7159039 .4510086 0 1
-------------+--------------------------------------------------------
ln_age_sd | 8740 1.246822 .8209818 0 3.321882
gender_d | 8740 .3393736 .2279552 0 .5
eth_d | 8740 .0479357 .1697156 0 1
ln_ind_sd | 8740 1.149404 .9272638 0 3.044523
ln_startup~d | 8740 .3580668 .3778226 0 1.871802
-------------+--------------------------------------------------------
ln_mgr_sd | 8740 1.398575 .8319291 0 3.686488
f_ln_age_sd | 8740 .8865466 .9115653 0 3.321882
f_gender_d | 8740 .3088132 .2400879 0 .5
f_eth_d | 8740 .0276602 .1307402 0 1
f_ln_ind_sd | 8740 .7876592 .9510918 0 3.044523
-------------+--------------------------------------------------------
f_ln_start~d | 8740 .2247479 .3394624 0 1.871802
f_ln_mgr_sd | 8740 1.023029 .9577016 0 3.068021
However, my first question is whether it is okay to use the xtnbreg command in this case, because the variables seem to be underdispersed rather than overdispersed, and I could not find any specific recommendation for underdispersed data. For cross-sectional data, we can use the 'estat gof' command to help decide which command to use, but for panel data I could not find an appropriate command.
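As a rough check of the raw (unconditional) dispersion of the outcome, the variance, rather than the standard deviation, can be compared with the mean, since a Poisson variable has variance equal to its mean. The Python sketch below does this for nfges_all using only the Mean and Std. Dev. values from the summary table above. The variable names are illustrative, and the check ignores the covariates and the panel structure, so it is only a first look and not a formal test.

```python
# Mean and standard deviation of nfges_all, copied from the summary table.
mean_y = 14.81739
sd_y = 8.152234

# A Poisson variable has variance equal to its mean, so the quantity to
# compare with the mean is the variance (SD squared), not the SD itself.
var_y = sd_y ** 2
ratio = var_y / mean_y

print(f"variance = {var_y:.2f}, variance/mean = {ratio:.2f}")
if ratio > 1:
    print("raw variance exceeds the mean (overdispersed relative to Poisson)")
else:
    print("raw variance does not exceed the mean")
```

With these figures the variance is well above the mean even though the standard deviation is below it, so comparing the mean with the standard deviation can give a misleading impression of underdispersion.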
Anyway, following recommendations from some websites, I ran an 'overdispersion' check to see whether the data are really overdispersed or underdispersed. (1) One website recommended using the s and r estimates at the bottom of the xtnbreg output to calculate δ = s/(r - 1) (please see the bottom of the table below), but it did not say what the criterion for overdispersion or underdispersion is. More specifically, if δ is greater than 1, does that mean overdispersion? And if δ is smaller than 1, does that mean underdispersion?
(2) I also looked at the statistic reported at the bottom of the xtnbreg output: Likelihood-ratio test vs. pooled: chibar2(01) = 5971.35 (please see the bottom of the table below). Since chibar2 is much larger than 1, can I say that the data are overdispersed and use the xtnbreg command?
The table below shows the regression results from the xtnbreg command:
Random-effects negative binomial regression Number of obs = 8504
Group variable: sampid Number of groups = 290
Random effects u_i ~ Beta Obs per group: min = 4
avg = 29.3
max = 65
Wald chi2(37) = 339.60
Log likelihood = -25720.958 Prob > chi2 = 0.0000
---------------------------------------------------------------------------------
nfges_all | Coef. Std. Err. z P>|z| [95% Conf. Interval]
----------------+----------------------------------------------------------------
ln_firmsize | -.1013186 .0855316 -1.18 0.236 -.2689574 .0663203
strg_special | -.0774372 .0733181 -1.06 0.291 -.221138 .0662636
innovative | .0600717 .0350446 1.71 0.087 -.0086144 .1287578
other_pay | -.4664182 .107286 -4.35 0.000 -.676695 -.2561414
avg_age | .023599 .0059207 3.99 0.000 .0119947 .0352032
ln_avg_startup | .322678 .1162967 2.77 0.006 .0947406 .5506154
ln_avg_ind | -.3446654 .0655012 -5.26 0.000 -.4730454 -.2162854
ln_avg_mgr | .1900138 .0746375 2.55 0.011 .0437269 .3363007
family | -.2661666 .242957 -1.10 0.273 -.7423535 .2100203
ln_age_sd | -.6348662 .0905796 -7.01 0.000 -.812399 -.4573335
gender_d | -1.862834 .3287666 -5.67 0.000 -2.507205 -1.218464
eth_d | 1.907085 .541249 3.52 0.000 .8462568 2.967914
ln_ind_sd | .2728686 .0959465 2.84 0.004 .0848169 .4609204
ln_startup_sd | -.9528149 .2106715 -4.52 0.000 -1.365723 -.5399064
ln_mgr_sd | .2921513 .1035819 2.82 0.005 .0891346 .495168
f_ln_age_sd | .5377865 .1053053 5.11 0.000 .3313919 .7441811
f_gender_d | 1.653626 .425103 3.89 0.000 .8204391 2.486812
f_eth_d | -1.202517 .5960664 -2.02 0.044 -2.370785 -.0342479
f_ln_ind_sd | -.0612558 .0939045 -0.65 0.514 -.2453052 .1227937
f_ln_startup_s | .1235049 .2006125 0.62 0.538 -.2696883 .5166981
f_ln_mgr_sd | -.6731969 .1165071 -5.78 0.000 -.9015465 -.4448472
_cons | 3.042684 .2589701 11.75 0.000 2.535112 3.550257
----------------+----------------------------------------------------------------
/ln_r | 1.341416 .1095901 1.126623 1.556208
/ln_s | .7426653 .1170089 .5133321 .9719986
----------------+----------------------------------------------------------------
r | 3.824455 .4191222 3.085221 4.740812
s | 2.101529 .2458976 1.670849 2.643222
---------------------------------------------------------------------------------
Likelihood-ratio test vs. pooled: chibar2(01) = 5971.35 Prob>=chibar2 = 0.000
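The δ = s/(r - 1) calculation mentioned in point (1) can be reproduced from the r and s estimates at the bottom of the table. The Python sketch below is only illustrative. It assumes the parameterization described in the Stata documentation for the random-effects xtnbreg model, in which the within-group dispersion (variance divided by mean) is 1 + δ_i and 1/(1 + δ_i) follows a Beta(r, s) distribution, so that s/(r - 1) is the expected value of δ_i when r > 1. The function name expected_delta is hypothetical.

```python
def expected_delta(r: float, s: float) -> float:
    """Expected delta_i when 1/(1 + delta_i) ~ Beta(r, s); needs r > 1."""
    if r <= 1:
        raise ValueError("the mean of delta_i is finite only when r > 1")
    return s / (r - 1)


# Point estimates of r and s copied from the xtnbreg output above.
r_hat = 3.824455
s_hat = 2.101529

delta = expected_delta(r_hat, s_hat)
dispersion = 1 + delta  # implied variance-to-mean ratio under the assumption

print(f"delta = {delta:.3f}, implied variance/mean = {dispersion:.3f}")
```

Under that assumption the natural comparison is δ against 0 rather than against 1: any positive δ implies a variance-to-mean ratio above 1, so the model as parameterized would not be able to represent underdispersion.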
I hope to receive some responses and to find answers to these questions.
Thank you very much!
EJ