Dear Stata experts,
I could use some help with the following problem. I am running a multivariate OLS regression with (standardized) test scores as the dependent variable and a set of continuous and categorical variables as independent variables. For some of the factor variables, I added an extra category for missing values. This works fine for most categorical variables. For mum_age_deliv_cat (maternal age at delivery), however, the Missing category is dropped from the Stata output without any note explaining why (multicollinearity, etc.).
The code for the regression is as follows:
Code:
regress zks4_GCSE_tot mum_smokes##c.zea1_pgs i.sex ib3.mum_age_deliv_cat zdepression ib3.mum_SES ib3.marital_st_mum ib3.mum_ed_add ib6.cig_change, robust allbaselevels
Linear regression Number of obs = 5,627
F(28, 5598) = 156.00
Prob > F = 0.0000
R-squared = 0.1924
Root MSE = .85361
------------------------------------------------------------------------------------------
| Robust
zks4_GCSE_tot | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------------------+----------------------------------------------------------------
mum_smokes |
doesn't smoke | 0 (base)
smokes | -.1532702 .0622202 -2.46 0.014 -.2752459 -.0312945
|
zea1_pgs | .0896178 .0126652 7.08 0.000 .0647892 .1144464
|
mum_smokes#c.zea1_pgs |
doesn't smoke | 0 (base)
smokes | .055319 .0326839 1.69 0.091 -.0087542 .1193922
|
sex |
Male | 0 (base)
Female | .2701045 .0228218 11.84 0.000 .2253649 .3148442
|
mum_age_deliv_cat |
<20 | -.1404631 .0941917 -1.49 0.136 -.3251154 .0441892
20-24 | -.110315 .036715 -3.00 0.003 -.1822907 -.0383393
25-29 | 0 (base)
30-34 | .0396163 .0277931 1.43 0.154 -.0148689 .0941014
35+ | .1217735 .0380242 3.20 0.001 .0472314 .1963156
|
zdepression | -.0516424 .0123182 -4.19 0.000 -.0757908 -.027494
|
mum_SES |
I | .1243631 .0588214 2.11 0.035 .0090503 .2396759
II | .0022687 .0313728 0.07 0.942 -.059234 .0637715
III (non-manual labour) | 0 (base)
III (manual labour) | -.1506617 .049965 -3.02 0.003 -.2486125 -.052711
IV | -.1566356 .0502407 -3.12 0.002 -.2551268 -.0581443
V | -.380365 .1006184 -3.78 0.000 -.5776161 -.1831139
Missing | -.2539358 .0404962 -6.27 0.000 -.3333241 -.1745475
|
marital_st_mum |
Never married | -.1206388 .0385989 -3.13 0.002 -.1963076 -.0449699
Separated | -.1357422 .0572148 -2.37 0.018 -.2479053 -.023579
Ever married | 0 (base)
Missing | -.1130427 .1802733 -0.63 0.531 -.4664482 .2403628
|
mum_ed_add |
CSE / None | -.2884407 .0391122 -7.37 0.000 -.3651157 -.2117657
Vocational | -.1568851 .0447715 -3.50 0.000 -.2446547 -.0691156
O-levels | 0 (base)
A-levels | .1809204 .0312505 5.79 0.000 .1196574 .2421835
Degree | .4228745 .0440691 9.60 0.000 .336482 .5092671
Missing | -.0050217 .0820701 -0.06 0.951 -.165911 .1558676
|
cig_change |
Went off it | -.1052319 .0450578 -2.34 0.020 -.1935626 -.0169012
Cut down | .0007196 .0611025 0.01 0.991 -.1190651 .1205042
Craved more | -.0448434 .2700721 -0.17 0.868 -.5742895 .4846027
Had more | -.4333814 .0764357 -5.67 0.000 -.5832251 -.2835377
NO Change | -.0952739 .0793533 -1.20 0.230 -.250837 .0602893
Never has this | 0 (base)
|
_cons | .1129289 .0281212 4.02 0.000 .0578005 .1680574
------------------------------------------------------------------------------------------
The Missing category for mum_age_deliv_cat only disappears once I add zdepression or mum_smokes to the regression. Without those two variables it is estimated as expected.
For example:
Code:
regress zks4_GCSE_tot i.mum_age_deliv_cat sex i.marital_st_mum i.mum_ed_add i.mum_SES
Source | SS df MS Number of obs = 11,904
-------------+---------------------------------- F(20, 11883) = 134.48
Model | 2197.07793 20 109.853896 Prob > F = 0.0000
Residual | 9707.18812 11,883 .81689709 R-squared = 0.1846
-------------+---------------------------------- Adj R-squared = 0.1832
Total | 11904.266 11,903 1.00010636 Root MSE = .90382
------------------------------------------------------------------------------------------
zks4_GCSE_tot | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------------------+----------------------------------------------------------------
mum_age_deliv_cat |
20-24 | .0309369 .0453468 0.68 0.495 -.0579502 .119824
25-29 | .1836477 .0453645 4.05 0.000 .0947259 .2725695
30-34 | .246943 .04722 5.23 0.000 .1543841 .339502
35+ | .2855106 .052452 5.44 0.000 .182696 .3883251
Missing | .6171147 .064453 9.57 0.000 .4907763 .7434531
|
sex | .2518333 .0165871 15.18 0.000 .2193198 .2843467
|
marital_st_mum |
Separated | -.0312996 .0435235 -0.72 0.472 -.1166129 .0540136
Ever married | .2037728 .0250902 8.12 0.000 .1545919 .2529536
Missing | -.0736996 .0423146 -1.74 0.082 -.1566431 .0092439
|
mum_ed_add |
Vocational | .1721054 .0345848 4.98 0.000 .1043134 .2398973
O-levels | .3589388 .0260158 13.80 0.000 .3079437 .409934
A-levels | .5817595 .030404 19.13 0.000 .5221626 .6413564
Degree | .9064845 .0398891 22.73 0.000 .8282955 .9846736
Missing | .2245926 .0366603 6.13 0.000 .1527325 .2964527
|
mum_SES |
II | -.1239549 .0526468 -2.35 0.019 -.2271513 -.0207585
III (non-manual labour) | -.0957971 .0546101 -1.75 0.079 -.2028419 .0112477
III (manual labour) | -.2680993 .0634493 -4.23 0.000 -.3924703 -.1437283
IV | -.316245 .0617112 -5.12 0.000 -.437209 -.1952809
V | -.420445 .0846138 -4.97 0.000 -.5863018 -.2545882
Missing | -.3964954 .0566912 -6.99 0.000 -.5076195 -.2853714
|
_cons | -.8257235 .0739581 -11.16 0.000 -.9706935 -.6807534
------------------------------------------------------------------------------------------
This model shows the Missing category for mum_age_deliv_cat correctly.
I checked manually in the data browser whether the observations with missing mum_age_deliv are the same observations that have missing mum_smokes or zdepression, but this is not the case. See also the following tabulations:
Code:
tab mum_age_deliv_cat
Age of |
mother at |
delivery, |
grouped | Freq. Percent Cum.
------------+-----------------------------------
<20 | 656 4.21 4.21
20-24 | 2,705 17.38 21.59
25-29 | 5,440 34.95 56.54
30-34 | 3,878 24.91 81.46
35+ | 1,397 8.98 90.43
Missing | 1,489 9.57 100.00
------------+-----------------------------------
Total | 15,565 100.00
Code:
tab mum_smokes if mum_age_deliv_cat==6
mother smokes |
any amount of |
cigs during |
pregnancy | Freq. Percent Cum.
--------------+-----------------------------------
doesn't smoke | 312 78.00 78.00
smokes | 88 22.00 100.00
--------------+-----------------------------------
Total | 400 100.00
Code:
tab mum_age_deliv_cat if missing(mum_smokes)
Age of |
mother at |
delivery, |
grouped | Freq. Percent Cum.
------------+-----------------------------------
<20 | 177 6.20 6.20
20-24 | 510 17.85 24.05
25-29 | 572 20.02 44.07
30-34 | 350 12.25 56.32
35+ | 159 5.57 61.88
Missing | 1,089 38.12 100.00
------------+-----------------------------------
Total | 2,857 100.00
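A further check that might help here is to tabulate the age categories within the estimation sample of the full model, since regress drops any observation with a missing value on any variable in the model. The lines below are only a sketch I have not run yet. They assume the Missing category is coded 6 (as in the tab command above), and the `///` line continuations assume the code is run from a do-file.

```stata
* Refit the full model quietly so that e(sample) marks the observations actually used
quietly regress zks4_GCSE_tot mum_smokes##c.zea1_pgs i.sex ib3.mum_age_deliv_cat ///
    zdepression ib3.mum_SES ib3.marital_st_mum ib3.mum_ed_add ib6.cig_change, robust

* Which age categories survive listwise deletion?
tabulate mum_age_deliv_cat if e(sample)

* Among mothers coded Missing (6), which model variables are missing together?
misstable patterns zks4_GCSE_tot mum_smokes zea1_pgs zdepression cig_change ///
    if mum_age_deliv_cat == 6
```

If the first tabulation shows no row for Missing, every observation in that category is lost to a missing value elsewhere in the model, and the misstable output should indicate which variable is responsible.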
Finally, when I run a smaller regression with the Missing category set as the base level, this is the output I get:
Code:
. regress zks4_GCSE_tot mum_smokes ib6.mum_age_deliv_cat
note: 5.mum_age_deliv_cat omitted because of collinearity
note: 6b.mum_age_deliv_cat identifies no observations in the sample
Source | SS df MS Number of obs = 9,936
-------------+---------------------------------- F(5, 9930) = 161.76
Model | 729.671972 5 145.934394 Prob > F = 0.0000
Residual | 8958.40157 9,930 .902155244 R-squared = 0.0753
-------------+---------------------------------- Adj R-squared = 0.0749
Total | 9688.07354 9,935 .975145802 Root MSE = .94982
-----------------------------------------------------------------------------------
zks4_GCSE_tot | Coef. Std. Err. t P>|t| [95% Conf. Interval]
------------------+----------------------------------------------------------------
mum_smokes | -.4360266 .0240728 -18.11 0.000 -.4832141 -.3888391
|
mum_age_deliv_cat |
<20 | -.6483968 .0581204 -11.16 0.000 -.7623246 -.534469
20-24 | -.466057 .0384945 -12.11 0.000 -.5415141 -.3905999
25-29 | -.2004997 .0344241 -5.82 0.000 -.267978 -.1330215
30-34 | -.055606 .0358554 -1.55 0.121 -.1258899 .0146779
35+ | 0 (omitted)
Missing | 0 (empty)
|
_cons | .3428121 .0312033 10.99 0.000 .2816473 .4039768
-----------------------------------------------------------------------------------
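I am at a loss as to why this happens. Stata now reports that the Missing category identifies no observations in the estimation sample, even though the tabulations above show 400 observations in that category with a non-missing value for mum_smokes. I hope someone can help me!
PS: This is my first post, so I hope I formatted everything correctly. Apologies in advance if not!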
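One thing I can still check for this two-variable model is whether the mothers in the Missing category have the outcome observed at all. The following is a minimal sketch under the same assumption that Missing is coded 6. The indicator name miss_outcome is one I made up for illustration.

```stata
* How many mothers coded Missing (6) have both the outcome and mum_smokes observed?
count if mum_age_deliv_cat == 6 & !missing(zks4_GCSE_tot, mum_smokes)

* Cross-tabulate outcome missingness against the age categories
* (miss_outcome is a hypothetical helper variable: 1 = outcome missing, 0 = observed)
generate byte miss_outcome = missing(zks4_GCSE_tot)
tabulate mum_age_deliv_cat miss_outcome
```

A count of zero would be consistent with the "identifies no observations in the sample" note, because regress would then have no usable observations for that category.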
Kind regards,
Wouter