Hi - I've been reading this forum for years, but this is my first time posting, so forgive me in advance if I screw up the protocol. I have a question about the way predict works after mlogit. (I am using mlogit right now, but I suppose this could extend to the way predict works after any regression.)
I am using a dataset, which I'll call dataset1.dta, to run a multinomial logit with a DV, yvar, that has three categories: a, b, c. The independent variables are x1 and x2. I estimate the model, close the dataset, open a new dataset (which I'll call dataset2.dta), and use the stored slopes to predict outcomes among the cases in dataset2.dta. I have pasted my code below, using generic variable names, so you can see what I'm doing.
Here is my question: it just so happens that in dataset2, there is no variation on one of the independent variables (x2). In all cases, x2 = HIGH. I had assumed this would mean that when I generate the predictions using the input data from dataset2, all of the cases would be treated as HIGH on x2 (and would be assigned the corresponding slope). Instead, from what I can tell, Stata appears to be dropping x2 from the model. I noticed this because I wanted to see what would happen if I made all cases LOW instead of HIGH. When I made x2 = LOW, the predictions were exactly the same. (Likewise when I made x2 = AVERAGE.) The only way I can get the predictions to change is to introduce some variation on the x2 variable. I didn't realize Stata worked this way until now. It makes sense to me that Stata would drop x2 at estimation (i.e., when I run mlogit) if there is no variation on x2. But it makes much less sense to me that it would drop x2 from the predictions.
Can anyone confirm that this is what Stata is doing? And if the answer is yes, can anyone explain why x2 would be dropped from the model rather than treated as a constant (e.g., x2 = HIGH)?
Many thanks!
Rebecca
***
use dataset1.dta, clear
gen x1 = "AVERAGE"
sum madeupvar1 if madeupvar1 < 1000
replace x1 = "HIGH" if madeupvar1 >= r(mean) + r(sd) & !missing(madeupvar1)
replace x1 = "LOW" if madeupvar1 <= r(mean) - r(sd)
gen x2 = "AVERAGE"
sum madeupvar2 if madeupvar2 < 10000
replace x2 = "HIGH" if madeupvar2 >= r(mean) + r(sd) & !missing(madeupvar2)
replace x2 = "LOW" if madeupvar2 <= r(mean) - r(sd)
encode x1, gen(xvar1)
encode x2, gen(xvar2)
mlogit yvar ib1.xvar1 ib1.xvar2 // run mlogit; the stored slopes are used to predict outcomes among cases in dataset2.dta
use dataset2.dta, clear
gen x1 = "AVERAGE"
sum madeupvar1 if madeupvar1 < 1000
replace x1 = "HIGH" if madeupvar1 >= r(mean) + r(sd) & !missing(madeupvar1)
replace x1 = "LOW" if madeupvar1 <= r(mean) - r(sd)
gen x2 = "AVERAGE"
sum madeupvar2 if madeupvar2 < 10000
replace x2 = "HIGH" if madeupvar2 >= r(mean) + r(sd) & !missing(madeupvar2)
replace x2 = "LOW" if madeupvar2 <= r(mean) - r(sd)
encode x1, gen(xvar1) // predict needs the same covariates (xvar1, xvar2) used at estimation
encode x2, gen(xvar2)
predict yhat_b, outcome(b)
predict yhat_c, outcome(c)
***