xtreg/mixed standard deviation

I'm doing some analysis on school value added where I estimate a regression of the form

y_it = a + x_it*B + v_i + e_it

I'm specifically interested in the distribution of the unit-specific error term v_i. I have two questions about using the xtreg, fe and mixed commands to estimate these terms; a simulation that illustrates both issues is included below.
  1. Both the xtreg and mixed commands report standard deviations of the unit-specific error term v_i ("sigma_u" for xtreg and "sd(_cons)" for mixed) and of the usual error term e_it ("sigma_e" for xtreg and "sd(Residual)" for mixed). These values differ from what you get if you use the predict postestimation command to estimate v_i and e_it and then use summarize to calculate their standard deviations (a compact version of this comparison is sketched right after this list). In my simulation the values are very close but not exactly equal (in the return list, "reported_sd_fe_u" should equal "calculated_sd_fe_u", and likewise for the other "reported" and "calculated" pairs), but with real data I have seen larger discrepancies. Why are these values different?
  2. Based on my simulation, the standard deviation of the usual error term e_it should be 0.65, yet all reported and calculated values of that standard deviation are about 0.85, which is the standard deviation of (v_i + e_it). Are the reported values or the notation incorrect?
    1. Based on the xtreg manual, e(sigma_e) should return the standard deviation of e_it, but it appears to report the standard deviation of (v_i + e_it).
    2. Based on the xtreg manual, the predict, e postestimation command should calculate the overall error component e_it, but it appears to calculate (v_i + e_it).
    3. Based on the mixed manual, the matrix e(b) should contain ln(standard deviation of e_it), but it appears to contain ln(standard deviation of (v_i + e_it)).
    4. Based on the mixed manual, the predict, residuals command should calculate the responses minus the fitted values, taking into account random effects at all levels of the model, which equals e_it; but it appears to calculate (v_i + e_it).
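In compact form, the comparison in question 1 is the following (y, x, and school_id are placeholders here; the full data-generating process and program are further below):

Code:
* reported values
xtreg y x i.year, i(school_id) fe
display e(sigma_u)   // reported SD of v_i
display e(sigma_e)   // reported SD of e_it
* calculated values
predict double u_hat, u
predict double e_hat, e
summarize u_hat      // SD of the predicted v_i
summarize e_hat      // SD of the predicted e_it

The mixed comparison is analogous, with sd(_cons) and sd(Residual) on the reported side and predict, reffects and predict, residuals on the calculated side.

The full simulation: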
Code:
version 14.1
cap log close
clear all
set more off
set varabbrev off
set graphics off

***************
* Description *
***************
/*
This file simulates school value added and compares the reported and calculated standard deviations of the unit-specific and residual error terms.
*/

*****************
* Begin Do File *
*****************
******** Create Program
cap program drop sd_simulation
program sd_simulation, rclass
    args n_years n_schools min_school_size max_school_size sd_school_value_added sd_student_ability sd_residual
    drop _all
    tempname n_observations
    scalar `n_observations' = `n_schools' * `n_years'
    
    
    **** Data Generating Process
    * Set number of observations
    set obs `n_schools'
    * Generate school IDs
    gen school_id = _n
    * Generate average number of students per year for each school
    gen mean_school_cohort_size = runiformint(`min_school_size', `max_school_size')
    * Generate value added
    gen school_value_added = rnormal(0, `sd_school_value_added')
    * Expand dataset by number of years
    expand `n_years'
    * Generate years
    bysort school_id: gen year = _n
    * Generate number of students in each year (within 10% of school average)
    gen school_cohort_size = mean_school_cohort_size ///
        + round(0.1 * runiform(-1, 1) * mean_school_cohort_size)
    * Expand dataset by number of students
    expand school_cohort_size
    * Generate student IDs
    bysort school_id year: gen local_student_id = _n
    egen student_id = group(school_id year local_student_id)
    * Generate student abilities
    gen student_ability = rnormal(0, `sd_student_ability')
    * Generate test scores
    * Scores 1-8 do not include the school effect; scores 9-12 add school value added
    forvalues t = 1/8 {
        gen test_z_score_`t' = student_ability + rnormal(0, `sd_residual')
    }
    forvalues t = 9/12 {
        gen test_z_score_`t' = student_ability + school_value_added + rnormal(0, `sd_residual')
    }
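    * Note: only test_z_score_8 (drawn without the school effect) and
    * test_z_score_11 (drawn with it) enter the regressions below; the other
    * scores are not used in this simulation.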
    
    
    
    
    **** Regression Analysis
    * Fixed effects
    xtreg test_z_score_11 ///
        test_z_score_8 ///
        i.year ///
        , i(school_id) fe
    tempname reported_sd_fe_u reported_sd_fe_e
    scalar `reported_sd_fe_u' = e(sigma_u)
    scalar `reported_sd_fe_e' = e(sigma_e)
    predict fe_u, u
    predict fe_e, e
    summ fe_u
    tempname calculated_mean_fe_u calculated_sd_fe_u
    scalar `calculated_mean_fe_u' = r(mean)
    scalar `calculated_sd_fe_u' = r(sd)
    summ fe_e
    tempname calculated_mean_fe_e calculated_sd_fe_e
    scalar `calculated_mean_fe_e' = r(mean)
    scalar `calculated_sd_fe_e' = r(sd)
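    * Extra check (my own guess, not something from the manual): summarize the
    * predicted school effects with one observation per school, in case the
    * reported sigma_u is computed across schools rather than across rows.
    bysort school_id: gen byte fe_school_tag = (_n == 1)
    summ fe_u if fe_school_tag
    drop fe_school_tag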
    
    * Bayesian shrinkage estimates
    mixed test_z_score_11 ///
        test_z_score_8 ///
        i.year ///
        || school_id: ///
        , stddeviations
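    * In e(b), mixed stores the variance components as log standard deviations
    * (lns1_1_1:_cons for the school-level intercept, lnsig_e:_cons for the
    * residual), so exp() below recovers the SDs shown in the output.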
    matrix b = e(b)
    local id_var_col = colnumb(b, "lns1_1_1:_cons")
    local res_var_col = colnumb(b, "lnsig_e:_cons")
    tempname reported_sd_bse_u reported_sd_bse_e
    scalar `reported_sd_bse_u' = exp(b[1, `id_var_col'])
    scalar `reported_sd_bse_e' = exp(b[1, `res_var_col'])
    predict bse_u, reffects
    predict bse_e, residuals
    summ bse_u
    tempname calculated_mean_bse_u calculated_sd_bse_u
    scalar `calculated_mean_bse_u' = r(mean)
    scalar `calculated_sd_bse_u' = r(sd)
    summ bse_e
    tempname calculated_mean_bse_e calculated_sd_bse_e
    scalar `calculated_mean_bse_e' = r(mean)
    scalar `calculated_sd_bse_e' = r(sd)
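    * Same school-level summary of the predicted random intercepts from mixed
    * (again just a guess about panel-level versus observation-level summaries).
    bysort school_id: gen byte bse_school_tag = (_n == 1)
    summ bse_u if bse_school_tag
    drop bse_school_tag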
    
    
    
    
    **** Return Values
    return scalar sd_value_added = `sd_school_value_added'
    return scalar reported_sd_fe_u = `reported_sd_fe_u'
    return scalar calculated_sd_fe_u = `calculated_sd_fe_u'
    return scalar reported_sd_bse_u = `reported_sd_bse_u'
    return scalar calculated_sd_bse_u = `calculated_sd_bse_u'

    return scalar sd_residual = `sd_residual'
    return scalar reported_sd_fe_e = `reported_sd_fe_e'
    return scalar calculated_sd_fe_e = `calculated_sd_fe_e'
    return scalar reported_sd_bse_e = `reported_sd_bse_e'
    return scalar calculated_sd_bse_e = `calculated_sd_bse_e'
    
    return scalar mean_value_added = 0
    return scalar calculated_mean_fe_u = `calculated_mean_fe_u'
    return scalar calculated_mean_bse_u = `calculated_mean_bse_u'
    
    return scalar mean_residual = 0
    return scalar calculated_mean_fe_e = `calculated_mean_fe_e'
    return scalar calculated_mean_bse_e = `calculated_mean_bse_e'
end


******** Run Program
**** 10 years of data, 150 schools
set seed 1984
sd_simulation 10 150 10 100 0.2 1 0.65
return list
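
* Differences between reported and calculated SDs
* (the r() results are still available after -return list-)
display "fe  u gap: " r(reported_sd_fe_u)  - r(calculated_sd_fe_u)
display "fe  e gap: " r(reported_sd_fe_e)  - r(calculated_sd_fe_e)
display "bse u gap: " r(reported_sd_bse_u) - r(calculated_sd_bse_u)
display "bse e gap: " r(reported_sd_bse_e) - r(calculated_sd_bse_e)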
