Extremely large AIC & BIC figures (xtpoisson, ppmlhdfe, xtnbreg)

Dear Statalisters, I would appreciate your opinion on the issue below.

I have around 1 million observations on a few thousand manufacturers. The outcome variable is production quantity, which has a large share of zeroes (about 20%).

Code:
. sum dv1

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
         dv1 |  1,128,675    17211.52    223438.7          0   1.35e+07
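By the 20% figure I mean the raw share of exact zeros in dv1, which can be checked along these lines:

Code:
* share of exact zeros among non-missing observations of dv1
quietly count if dv1 == 0
local nzero = r(N)
quietly count if !missing(dv1)
display "share of zeros = " `nzero'/r(N)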
When I run ppmlhdfe or xtpoisson to predict this variable, I get AIC figures in the billions (!) and a very high pseudo R2. I have never come across AIC and BIC figures this large. When I rescale the outcome by dividing it by 1,000, the AIC/BIC figures look more "usual". However, the outcome contains values smaller than 1,000, so the division creates non-integer values and I am concerned that the transformation destroys its count nature.
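For clarity, the rescaling I tried was along these lines (dv1_k is just a placeholder name; the output below is from the original, un-rescaled dv1):

Code:
* rescaled outcome: values below 1,000 become non-integer
gen double dv1_k = dv1/1000
ppmlhdfe dv1_k l1.dv1 l1.z_cv2 l1.z_cv3 l1.z_cv4 l1.z_cv5, a(eproducer year) cluster(eproducer) exp(lagcv1) d
estat ic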

Code:
. ppmlhdfe dv1 l1.dv1   l1.z_cv2 l1.z_cv3 l1.z_cv4 l1.z_cv5 ,a(eproducer year) cluster(eproducer) exp(lagcv1) d
(dropped 51273 observations that are either singletons or separated by a fixed effect)
Iteration 1:   deviance = 1.2802e+10  eps = .         iters = 6    tol = 1.0e-04  min(eta) =  -9.11  P   
Iteration 2:   deviance = 5.7570e+09  eps = 1.22e+00  iters = 6    tol = 1.0e-04  min(eta) =  -9.70      
Iteration 3:   deviance = 4.3877e+09  eps = 3.12e-01  iters = 5    tol = 1.0e-04  min(eta) = -10.07      
Iteration 4:   deviance = 4.0823e+09  eps = 7.48e-02  iters = 5    tol = 1.0e-04  min(eta) = -10.59      
Iteration 5:   deviance = 4.0171e+09  eps = 1.63e-02  iters = 4    tol = 1.0e-04  min(eta) = -11.49      
Iteration 6:   deviance = 4.0040e+09  eps = 3.25e-03  iters = 3    tol = 1.0e-04  min(eta) = -12.30      
Iteration 7:   deviance = 4.0015e+09  eps = 6.25e-04  iters = 2    tol = 1.0e-04  min(eta) = -12.96      
Iteration 8:   deviance = 4.0011e+09  eps = 1.15e-04  iters = 2    tol = 1.0e-04  min(eta) = -13.96      
Iteration 9:   deviance = 4.0010e+09  eps = 2.10e-05  iters = 2    tol = 1.0e-04  min(eta) = -14.96      
Iteration 10:  deviance = 4.0010e+09  eps = 4.28e-06  iters = 2    tol = 1.0e-05  min(eta) = -15.94      
Iteration 11:  deviance = 4.0010e+09  eps = 9.03e-07  iters = 2    tol = 1.0e-06  min(eta) = -16.91   S  
Iteration 12:  deviance = 4.0010e+09  eps = 1.88e-07  iters = 2    tol = 1.0e-07  min(eta) = -17.82   S  
Iteration 13:  deviance = 4.0010e+09  eps = 3.81e-08  iters = 2    tol = 1.0e-07  min(eta) = -18.61   S  
Iteration 14:  deviance = 4.0010e+09  eps = 6.80e-09  iters = 2    tol = 1.0e-08  min(eta) = -19.14   S O
------------------------------------------------------------------------------------------------------------
(legend: p: exact partial-out   s: exact solver   h: step-halving   o: epsilon below tolerance)
Converged in 14 iterations and 45 HDFE sub-iterations (tol = 1.0e-08)

HDFE PPML regression                              No. of obs      =    942,909
Absorbing 2 HDFE groups                           Residual df     =      4,129
Statistics robust to heteroskedasticity           Wald chi2(5)    =      84.28
Deviance             =   4000971161               Prob > chi2     =     0.0000
Log pseudolikelihood =  -2003530151               Pseudo R2       =     0.9694

Number of clusters (eproducer)=     4,130
                          (Std. err. adjusted for 4,130 clusters in eproducer)
------------------------------------------------------------------------------
             |               Robust
         dv1 | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
         dv1 |
         L1. |   1.70e-07   2.86e-08     5.95   0.000     1.14e-07    2.26e-07
             |
       z_cv2 |
         L1. |   .1590041   .2122519     0.75   0.454     -.257002    .5750101
             |
       z_cv3 |
         L1. |  -.0043667   .0082502    -0.53   0.597    -.0205368    .0118033
             |
       z_cv4 |
         L1. |  -.2516427   .1229716    -2.05   0.041    -.4926626   -.0106228
             |
       z_cv5 |
         L1. |   .0085431   .0072125     1.18   0.236    -.0055932    .0226794
             |
       _cons |   5.650281   .1897047    29.78   0.000     5.278467    6.022095
  ln(lagcv1) |          1  (exposure)
------------------------------------------------------------------------------

Absorbed degrees of freedom:
-----------------------------------------------------+
 Absorbed FE | Categories  - Redundant  = Num. Coefs |
-------------+---------------------------------------|
   eproducer |      4130        4130           0    *|
        year |        46           0          46     |
-----------------------------------------------------+
* = FE nested within cluster; treated as redundant for DoF computation

. estat ic

Akaike's information criterion and Bayesian information criterion

-----------------------------------------------------------------------------
       Model |          N   ll(null)  ll(model)      df        AIC        BIC
-------------+---------------------------------------------------------------
           . |    942,909  -6.55e+10  -2.00e+09       6   4.01e+09   4.01e+09
-----------------------------------------------------------------------------
Note: BIC uses N = number of observations. See [R] BIC note.
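For what it's worth, the AIC above follows mechanically from the log pseudolikelihood, AIC = 2*k - 2*ll, so with ll around -2.0e+09 an AIC in the billions is unavoidable:

Code:
* k = 6 model df and ll = -2,003,530,151 from the estat ic output above
display 2*6 - 2*(-2003530151)   // 4,007,060,314, i.e. the 4.01e+09 shown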
On the other hand, when I run xtnbreg, I get smaller AIC and BIC figures (though still in the millions). However, xtnbreg is sensitive to the model specification and often fails to converge when I add or remove predictors.

Code:
. xtnbreg dv1 l1.dv1   l1.z_cv2 l1.z_cv3 l1.z_cv4 l1.z_cv5 i.year ,fe exp(lagcv1)
note: 135 groups (135 obs) dropped because of only one obs per group
note: 629 groups (49921 obs) dropped because of all zero outcomes

Iteration 0:   log likelihood = -1.225e+09  (not concave)
Iteration 1:   log likelihood = -4.990e+08  
Iteration 2:   log likelihood = -8027179.1  
Iteration 3:   log likelihood = -7995697.9  (not concave)
Iteration 4:   log likelihood = -7517186.9  
Iteration 5:   log likelihood = -7057044.2  (backed up)
Iteration 6:   log likelihood = -6798324.5  
Iteration 7:   log likelihood =   -6734128  
Iteration 8:   log likelihood = -6504148.3  
Iteration 9:   log likelihood = -6355948.2  
Iteration 10:  log likelihood = -6330436.9  
Iteration 11:  log likelihood =   -6328037  
Iteration 12:  log likelihood = -6327554.1  
Iteration 13:  log likelihood = -6327459.7  
Iteration 14:  log likelihood = -6327439.4  
Iteration 15:  log likelihood = -6327434.6  
Iteration 16:  log likelihood = -6327433.5  
Iteration 17:  log likelihood = -6327433.2  
Iteration 18:  log likelihood = -6327433.2  
Iteration 19:  log likelihood = -6327433.2  
Iteration 20:  log likelihood = -6327433.2  
Iteration 21:  log likelihood = -6327433.2  
Iteration 22:  log likelihood = -6327433.2  
Iteration 23:  log likelihood = -6327433.2  

Conditional FE negative binomial regression      Number of obs    =    944,126
Group variable: eproducer                        Number of groups =      4,130

                                                 Obs per group:
                                                              min =          2
                                                              avg =      228.6
                                                              max =        553

                                                 Wald chi2(51)    = 1033911.20
Log likelihood = -6327433.2                      Prob > chi2      =     0.0000

------------------------------------------------------------------------------
         dv1 | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
         dv1 |
         L1. |   1.11e-07   5.15e-10   214.94   0.000     1.10e-07    1.12e-07
             |
       z_cv2 |
         L1. |  -.6988149   .0015773  -443.05   0.000    -.7019063   -.6957235
             |
       z_cv3 |
         L1. |  -.0279359   .0004649   -60.09   0.000    -.0288471   -.0270246
             |
       z_cv4 |
         L1. |   .4380268   .0156258    28.03   0.000     .4074009    .4686528
             |
       z_cv5 |
         L1. |    .016793   .0038206     4.40   0.000     .0093048    .0242811
             |
        year |  (output omitted)
             |
       _cons |  -1.826474   .0291059   -62.75   0.000     -1.88352   -1.769427
  ln(lagcv1) |          1  (exposure)
------------------------------------------------------------------------------

. estat ic

Akaike's information criterion and Bayesian information criterion

-----------------------------------------------------------------------------
       Model |          N   ll(null)  ll(model)      df        AIC        BIC
-------------+---------------------------------------------------------------
           . |    944,126          .   -6327433      52   1.27e+07   1.27e+07
-----------------------------------------------------------------------------
Note: BIC uses N = number of observations. See [R] BIC note.
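For context on the overdispersion point: even unconditionally, dv1 is massively overdispersed. From the summary above, var/mean = 223,438.7^2 / 17,211.52, which is roughly 2.9 million. A crude check that ignores covariates and fixed effects:

Code:
* unconditional variance-to-mean ratio of the outcome
quietly summarize dv1
display "var/mean = " r(Var)/r(mean)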
Do you think I should be concerned about using FE Poisson for this variable, given the large AIC figures? Does the overdispersion make xtnbreg the better choice? In any case, the ΔAIC across models is in the millions (always six zeros), so I am also not sure how to present the results.
Thank you!
