Determining the level of fixed effects - applying the Papke and Wooldridge (2023) method

Hi,

I am writing to ask for help with the test for choosing the level of fixed effects proposed in Papke and Wooldridge (2023), "A simple, robust test for choosing the level of fixed effects in linear panel data models".
(I am studying the effect of a village-level treatment on household-level outcomes, so I am trying to figure out whether I should include household or village fixed effects.)

I was able to follow the procedure in Stata, but I have two questions about the procedure and its final steps.
I am posting here because many Statalist users seem to be familiar with this method; I have seen the paper cited in answers to other questions.

In this post I use Stata's NLS example data (nlswork). Suppose I estimate the effect of hours worked on ln(wage) as below, with a vector of individual-level controls that includes time-varying variables (age and weeks worked, wks_work) and a time-invariant variable (race).
lnwage_it = b0 + b1*hours_it + b2*X_it + (Fixed Effects)
Denote by b1hat_iFE and b1hat_gFE the estimates of b1 under unit FE and group FE, respectively.

My goal is to test whether b1 (the coefficient on hours_it) is robust to the choice of fixed effects: individual-level or group-level (industry-level in this example).

Here's the procedure suggested in the paper (Procedure 3.2 in Section 3.3, "Testing a single coefficient"):

Step 1. Run unit-FE regression with time dummies and controls, and obtain the residuals. Repeat it with group-FE.

Step 2. Run unit-FE regression of the variable of interest (hours_it in this example) on time dummies and controls, and obtain the residuals. Repeat it with group-FE.

Step 3. Compute the average of (unit-FE residuals)² (from step 2) across i and t, and the average of (group-FE residuals)² (from step 2) across i and t.

Step 4. Construct q_hat, the difference in {(residuals from step 2) * (residuals from step 1) / (the average from step 3)} between unit-FE model and group-FE model (equation 3.16 in the paper)

Step 5. Obtain SE(b1hat_iFE - b1hat_gFE), the standard error of (b1hat_iFE - b1hat_gFE), from regressing q_hat (from step 4) on the constant value 1, probably clustering at the group-level or at least at the individual-level. The single estimated coefficient will be identically zero.

The paper then says we can use a t-statistic version of the Hausman test, computed as (b1hat_iFE - b1hat_gFE) / SE(b1hat_iFE - b1hat_gFE), to decide between individual FE and group FE.
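
To make sure I am reading Steps 3 to 5 correctly, here is how I would write them out. This is only a sketch of my understanding: the symbols are my own labels (they mirror the variable names in my code) and may not match the paper's notation. Here \ddot{x}_{it} and \dot{x}_{it} are the Step 2 residuals, and \hat{u}^{iFE}_{it} and \hat{u}^{gFE}_{it} are the Step 1 residuals, from the unit-FE and group-FE regressions respectively.

```latex
\begin{align*}
\hat{a}_{iFE} &= \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}\ddot{x}_{it}^{2},
\qquad
\hat{a}_{gFE} = \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}\dot{x}_{it}^{2}
&& \text{(Step 3)} \\
\hat{q}_{it} &= \frac{\ddot{x}_{it}\,\hat{u}^{iFE}_{it}}{\hat{a}_{iFE}}
 - \frac{\dot{x}_{it}\,\hat{u}^{gFE}_{it}}{\hat{a}_{gFE}}
&& \text{(Step 4)} \\
t &= \frac{\hat{b}_{1,iFE}-\hat{b}_{1,gFE}}
          {\mathrm{SE}\!\left(\hat{b}_{1,iFE}-\hat{b}_{1,gFE}\right)}
&& \text{(Step 5 and the final statistic)}
\end{align*}
```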

I would like to ask two questions about this procedure.

1. In step 5, how do I obtain SE(b1hat_iFE - b1hat_gFE) from regressing q_hat on 1? Is it the standard error of the coefficient on the constant 1? I assume it is, based on what the authors wrote in the previous section ("...and the cluster-robust variance-covariance matrix will be V1_hat"), but I want to double-check.

2. Once I have computed the final t-statistic, can I interpret it like a regular t-statistic reported in a regression? For example, if the absolute value of my t-statistic is greater than 1.96, can I reject the null hypothesis that the unit-FE and group-FE estimators are the same at the 5% level, and stick to the unit-FE estimator?

Any comments would be appreciated.

Here's the code I used to replicate the procedure.
In this example my t-statistic was 0.15, so I fail to reject the null hypothesis. This is consistent with the unit-FE and group-FE estimates being nearly the same.

Code:

use https://www.stata-press.com/data/r18/nlswork, clear
global   Y    ln_wage
global   T    hours
global   G    ind_code // industry-identifier
global   Xs    age race wks_work

*    (0)    Make balanced panel data by keeping balanced individuals only (not strongly required, but for convenience)
        
*    Keep only observations for which all variables in the regression are non-missing.
egen    num_missing    =    rowmiss(${Y}    ${T}    ${G}    ${Xs})
keep    if    num_missing==0
        
*    Keep only individuals surveyed across all years
keep    if    inrange(year,68,73)    //    Keep it shorter to make a larger sample of balanced data
bys    idcode:    egen    num_surveyed    =    count(ln_wage)
keep    if    num_surveyed==6



*    (1-1)    Run individual-FE model with time dummies, and get residuals (SE clustered at individual-level)
    xtset    idcode year // idcode is unit identifier
    xtreg    ${Y} ${T}    ${Xs}    i.year, fe vce(cluster    idcode)
    scalar    b1hat_iFE=_b[${T}]    //    individual-FE estimator (coefficient on T, referenced by name)
    predict    uhat_iFE_resid,    e    //    within (idiosyncratic) residuals; after xtreg the predict option is e

*    (1-2)    Run industry-FE model with time dummies, and get residuals    (SE clustered at individual-level)
    reg    ${Y}    ${T}    ${Xs}    i.${G}    i.year,    vce(cluster    idcode)
    scalar    b1hat_gFE=_b[${T}]    //    group-FE estimator
    predict    uhat_gFE_resid,    residual

*    (2-1)    Run unit-level FE regression of T on time dummies and covariates, and get residuals (x_doubledot)
    xtreg    ${T}    ${Xs}    i.year,    fe    vce(cluster idcode)
    predict    x_doubledot,    e    //    within residuals; after xtreg the predict option is e

*    (2-2)    Run group-level FE regression of T on time dummies and covariates, and get residuals (x_singledot)
    reg    ${T}    ${Xs}    i.${G}    i.year,    vce(cluster    idcode)
    predict    x_singledot,    residual



*    (3)    Compute the average of (x_doubledot)^2 (ahat_doubledot) across all i and t
    gen    x_doubledot_sq    =    (x_doubledot)^2
    egen    ahat_doubledot    =    mean(x_doubledot_sq)

*    Compute the average of (x_singledot)^2 (ahat_singledot) across all i and t
    gen    x_singledot_sq    =    (x_singledot)^2
    egen    ahat_singledot    =    mean(x_singledot_sq)


*    (4)    Compute q_hat (equation 3.16)

    gen    qhat    =    ((x_doubledot * uhat_iFE_resid) / ahat_doubledot)  ///
                   -    ((x_singledot * uhat_gFE_resid) / ahat_singledot)




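*    (5)    Obtain SE(b1hat_iFE - b1hat_gFE) by regressing qhat on 1 (constant), clustering at individual-level.
    gen    vector_1    =    1
    *    noconstant keeps vector_1 as the only regressor; otherwise it is collinear with _cons and gets omitted
    reg    qhat    vector_1, noconstant vce(cluster idcode)
    scalar    SE_delta    =    _se[vector_1] // SE(b1hat_iFE - b1hat_gFE)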

* (6) Compute t-statistic
    scalar t = (b1hat_iFE - b1hat_gFE) / SE_delta
    scalar    list t    //    Show t-statistic computed
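
* (7) Two extra checks. These are my own additions, not part of the paper's procedure as far as I can tell.
    *    The sample mean of qhat should be zero up to rounding error (see step 5)
    summarize    qhat,    meanonly
    display    "mean of qhat = " r(mean)

    *    Two-sided p-value, ASSUMING the statistic is approximately standard normal under the null
    *    scalar(t) is used so that t is not read as an abbreviation of a variable name
    scalar    p_value    =    2 * normal(-abs(scalar(t)))
    display    "two-sided p-value = " p_value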
