Channel: Statalist

Determining the level of fixed effects - applying the Papke and Wooldridge (2023) method

Hi,

I am writing to ask for help in choosing the level of fixed effects to control for, using the test proposed in Papke and Wooldridge (2023), "A simple, robust test for choosing the level of fixed effects in linear panel data models".
(I am studying the effect of a village-level treatment on household-level outcomes, so I am trying to figure out whether I should include household or village FE.)

I was able to implement the procedure in Stata, but I have two questions about the procedure and its final steps.
I am posting here because many Statalist users seem to be familiar with this method; I have seen several users cite this paper in their answers to other questions.


In this post I use Stata's NLSY data as an example. Suppose I estimate the effect of hours worked on ln(wage) as below, with a vector of individual-level controls that includes time-varying variables (age and weeks_worked) and a time-invariant variable (race).
lnwage_it = b0 + b1*hours_it + b2*X_it + (Fixed Effects)
Denote by b1hatiFE and b1hatgFE the estimates of b1 under unit FE and group FE, respectively.

My goal is to test whether the estimate of b1 (the coefficient on hours_it) is robust to the choice of fixed effects: individual-level or group-level (industry-level in this example).

Here's the procedure suggested in the paper (procedure 3.2 in Chapter 3.3 "Testing a single coefficient")


Step 1. Run the unit-FE regression with time dummies and controls, and obtain the residuals. Repeat with group FE.

Step 2. Run the unit-FE regression of the variable of interest (hours_it in this example) on time dummies and controls, and obtain the residuals. Repeat with group FE.

Step 3. Compute the average of (unit-FE residuals)^2 (from step 2) across i and t, and the average of (group-FE residuals)^2 (from step 2) across i and t.

Step 4. Construct q_hat, the difference in {(residuals from step 2) * (residuals from step 1) / (the average from step 3)} between the unit-FE model and the group-FE model (equation 3.16 in the paper).


Step 5. Obtain SE(b1hatiFE - b1hatgFE), the standard error of (b1hatiFE - b1hatgFE), from regressing q_hat (from step 4) on the constant value 1, probably clustering at the group-level or at least at the individual-level. The single estimated coefficient will be identically zero.

The paper then says we can use a t-statistic version of the Hausman test, computed as (b1hatiFE - b1hatgFE) / SE(b1hatiFE - b1hatgFE), to decide between individual FE and group FE.
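
For reference, Steps 3 to 5 as I have described them can be written compactly as below. The notation is my own shorthand chosen to match the variable names in my code (double dots for unit-FE quantities, single dots for group-FE quantities, N individuals and T periods in a balanced panel); it is a transcription of the steps above, not a quotation of equation 3.16.

```latex
\begin{align*}
\hat{\ddot a} &= \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T} \ddot x_{it}^{2},
&
\hat{\dot a} &= \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T} \dot x_{it}^{2}
&& \text{(Step 3)} \\[4pt]
\hat q_{it} &= \frac{\ddot x_{it}\,\hat{\ddot u}_{it}}{\hat{\ddot a}}
             - \frac{\dot x_{it}\,\hat{\dot u}_{it}}{\hat{\dot a}}
&&&& \text{(Step 4)} \\[4pt]
t &= \frac{\hat b_{1}^{\,iFE} - \hat b_{1}^{\,gFE}}
          {\mathrm{SE}\bigl(\hat b_{1}^{\,iFE} - \hat b_{1}^{\,gFE}\bigr)}
&&&& \text{(Step 5)}
\end{align*}
```

Here x with dots denotes the Step 2 residuals and u-hat with dots the Step 1 residuals. Since the Step 1 residuals are orthogonal to the Step 2 residuals within each model by construction of OLS (same sample, same controls), the sample average of q_hat is zero, which is how I understand the remark that the estimated coefficient is identically zero.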


I would like to ask two questions about this procedure.

1. In step 5, how do I obtain SE(b1hatiFE - b1hatgFE) from regressing q_hat on 1? Is it the standard error of the coefficient on the constant? I assume it is, based on what the authors wrote in the previous section ("...and the cluster-robust variance-covariance matrix will be V1_hat"), but I want to double-check.

2. Once I have computed the final t-statistic, can I interpret it like a regular t-statistic reported in a regression? For example, if the absolute value of my t-statistic is greater than 1.96, can I reject at the 5% level the null hypothesis that the unit-FE and group-FE estimators are the same, and stick with the unit-FE estimator?
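
To make question 2 concrete, here is a minimal sketch of how the statistic would be read if it is treated as asymptotically standard normal. That distributional choice is an assumption on my part and is exactly what I would like confirmed. The scalar names are hypothetical, and 0.15 is the t-statistic I obtained in the NLSY example in this post.

```stata
* Hypothetical scalar names; 0.15 is the t-statistic from the NLSY example
scalar t_stat = 0.15

* Two-sided p-value, ASSUMING a standard normal reference distribution
scalar p_two_sided = 2 * (1 - normal(abs(t_stat)))

* Two-sided 5% critical value under the same assumption (about 1.96)
scalar crit_5pct = invnormal(0.975)

display "p-value = " %6.4f p_two_sided "   5% critical value = " %5.3f crit_5pct
```

Under this reading, the null that the two estimators agree is rejected at the 5% level only when abs(t_stat) exceeds crit_5pct.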


Any comments are appreciated.



Here's the code I used to replicate the procedure.
In this example my t-statistic was 0.15, so I fail to reject the null hypothesis. This is consistent with the unit-FE and group-FE estimates being nearly the same.


Code:
use https://www.stata-press.com/data/r18/nlswork, clear
global   Y    ln_wage
global   T    hours
global   G    ind_code // industry-identifier
global   Xs    age race wks_work

*    (0)    Make balanced panel data by keeping balanced individuals only (not strongly required, but for convenience)
        
*    Keep only observations for which all variables in the regression are non-missing.
egen    num_missing    =    rowmiss(${Y}    ${T}    ${G}    ${Xs})
keep    if    num_missing==0
        
*    Keep only individuals surveyed across all years
keep    if    inrange(year,68,73)    //    Keep it shorter to make a larger sample of balanced data
bys    idcode:    egen    num_surveyed    =    count(ln_wage)
keep    if    num_surveyed==6



*    (1-1)    Run individual-FE model with time dummies, and get residuals (SE clustered at individual-level)
    xtset    idcode year // idcode is unit identifier
    xtreg    ${Y} ${T}    ${Xs}    i.year, fe vce(cluster    idcode)
    scalar    b1hat_iFE    =    _b[${T}]    //    individual-FE estimator (referenced by name rather than position)
    predict    uhat_iFE_resid,    e    //    idiosyncratic (within) residuals; "e" is the documented xtreg option

*    (1-2)    Run industry-FE model with time dummies, and get residuals    (SE clustered at individual-level)
    reg    ${Y}    ${T}    ${Xs}    i.${G}    i.year,    vce(cluster    idcode)
    scalar    b1hat_gFE    =    _b[${T}]    //    group-FE estimator
    predict    uhat_gFE_resid,    residual

*    (2-1)    Run unit-level FE regression of T on time dummies and covariates, and get residuals (x_doubledot)
    xtreg    ${T}    ${Xs}    i.year,    fe    vce(cluster idcode)
    predict    x_doubledot,    e    //    idiosyncratic (within) residuals

*    (2-2)    Run group-level FE regression of T on time dummies and covariates, and get residuals (x_singledot)
    reg    ${T}    ${Xs}    i.${G}    i.year,    vce(cluster    idcode)
    predict    x_singledot,    residual



*    (3)    Compute the average of (x_doubledot)^2 (ahat_doubledot) across all i and t
    gen    x_doubledot_sq    =    (x_doubledot)^2
   egen    ahat_doubledot    =    mean(x_doubledot_sq)

* Compute the average of (x_singledot)^2 (ahat_singledot) across all i and t
    gen    x_singledot_sq    =    (x_singledot)^2
   egen    ahat_singledot    =    mean(x_singledot_sq)


*    (4)    Compute q_hat (equation 3.16)

    gen    qhat    =    ((x_doubledot * uhat_iFE_resid) / ahat_doubledot)  ///
                   -    ((x_singledot * uhat_gFE_resid) / ahat_singledot)




*    (5)    Obtain SE(b1hatiFE - b1hatgFE) by regressing qhat on a constant only, clustering at individual-level.
    *    regress already includes a constant, so a separate vector of ones would be collinear with it and dropped
    reg    qhat,    vce(cluster idcode)
    scalar    SE_delta    =    _se[_cons] // SE(b1hatiFE - b1hatgFE)

* (6) Compute t-statistic
    scalar t = (b1hat_iFE - b1hat_gFE) / SE_delta
    scalar    list t    //    Show t-statistic computed
