Hello
I am doing an analysis of the determinants of census tract unemployment rates. Some of the previous literature on my topic has used straight OLS regression, and I started with this type of analysis, but it seems to me after my own further reading that a Generalized Linear Model is better. This is especially because I am interested in presenting predicted values for the census tracts' unemployment rates based on my regression and I would like these to be appropriately bounded. My unemployment rates include 0s for some census tracts so I would need to take this into account.
My questions are:
1) Is -fracreg logit- equivalent to -glm- with a logit link and binomial family? (I have read about using the -glm- version in a few places, including on this forum, but I see that -fracreg- is a new-ish command which seems to serve the same purpose.) Can I specify an equivalent to the -robust- option when using -fracreg logit-?
2) if using -fracreg-, on what basis should I decide to use a fractional probit (-fracreg probit-) or fractional logit (-fracreg logit-) regression?
3) a simple (probably ignorant) question of interpretation: I see that the -fracreg- and -glm- regressions mentioned above don't report an R-squared value. Is there an equivalent measure for these regressions that I can calculate? My OLS R-squared values have been reasonably high and this has been a point of reassurance for me, so I'd like to see how these models compare (though I know R-squared isn't everything!).
4) if using these models, are there any additional restrictions or assumptions (beyond those that make OLS BLUE) that I should keep in mind? With my OLS regressions I have taken the natural log of unemployment rates (this makes my residuals more normal, gives a higher R-squared, and allows a convenient interpretation). Could I do the same with the -fracreg- or -glm- regressions above?
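For concreteness, these are the two specifications I am comparing in question 1, as far as I understand the syntax. This is only a sketch: unemp_rate (the tract unemployment rate stored as a proportion between 0 and 1, zeros included) and x1, x2 are placeholder names, not my actual variables.

```stata
* Placeholder names: unemp_rate is a proportion in [0,1]; x1 and x2 stand in
* for my tract-level covariates.

* Fractional logit via -fracreg-
fracreg logit unemp_rate x1 x2
predict phat_frac          // predicted rate; I assume the default is the conditional mean

* The -glm- version I have seen recommended, with robust standard errors
glm unemp_rate x1 x2, family(binomial) link(logit) vce(robust)
predict phat_glm, mu       // predicted mean of the unemployment rate
```

If the two really are equivalent, I would expect phat_frac and phat_glm to match and to stay between 0 and 1.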
It's been a while since I formally studied limited dependent variables so please excuse my ignorance on these issues. Thank you very much for any help! This is my first post in this forum but I have always found it to be a very useful place for information on Stata and information on statistical analysis generally.
Regards
Oliver Kendrick