I want to do a seemingly unrelated regression using nbreg as my family, since I have a long, narrow panel of count data that are overdispersed. My research indicates that I should use the gsem framework. The challenge is that my data are in a long format, with a structure as below:
date | locationID | NoOfSwimmers | HoursSunshine | WaveHeight |
4/5  | 1          | 44           | 4             | 1.2        |
5/5  | 1          | 34           | 6             | 1.1        |
6/5  | 1          | 32           |               |            |
4/5  | 1          | 44           |               |            |
5/5  | 2          | 12           |               |            |
One way to use gsem on this data is to use reshape to put it into wide format:
date | locationID | noOfSwimmersAtLocation1 | noOfSwimmersAtLocation2 | hoursSunshine | waveHeight |
4/5  | 1          | 44                      | 12                      | 4             | 1.2        |
5/5  | 1          | 34                      | 23                      | 6             | 1.1        |
6/5  | 1          | 32                      |                         |               |            |
4/5  | 1          |                         |                         |               |            |
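For concreteness, the reshape step might look roughly like the sketch below. It assumes the long-format variables are named as in the first table, that (date, locationID) uniquely identifies each row, and that the sunshine and wave variables take the same value at every location on a given date. The renames are only there to produce the wide-format names shown above.

```stata
* Sketch only: names follow the tables above; adjust to the real data.
* Rename so the reshaped variables match the wide-format table.
rename NoOfSwimmers noOfSwimmersAtLocation
rename (HoursSunshine WaveHeight) (hoursSunshine waveHeight)

* One row per date; locationID becomes the numeric suffix.
* reshape stops with an error if hoursSunshine or waveHeight
* is not constant within date.
reshape wide noOfSwimmersAtLocation, i(date) j(locationID)

* Check the result: noOfSwimmersAtLocation1, noOfSwimmersAtLocation2, ...
describe noOfSwimmersAtLocation*
```

Any covariate that does vary by location would have to be added to the list of variables before the comma in reshape wide, which is where the command starts to grow.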
Then I can run a gsem regression as follows*:
gsem (noOfSwimmersAtLocation1 noOfSwimmersAtLocation2 <- hoursSunshine waveHeight), nbreg
I have got this to work, but the command gets pretty ugly when I have 20 different locations and three variables that are different for each location, as does the reshape command I have to use.
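The wide-format command can at least be assembled in a loop rather than typed out by hand. The following is a minimal sketch, not a tested solution: crowdIndex1 to crowdIndex20 are hypothetical stand-ins for a location-specific covariate and do not exist in my data, and the 20 locations are assumed to be numbered 1 to 20.

```stata
* Sketch only: crowdIndex`k' is a hypothetical location-specific covariate.
* Build one equation per location and collect them in a local macro.
local eqs
forvalues k = 1/20 {
    local eqs `eqs' (noOfSwimmersAtLocation`k' <- hoursSunshine waveHeight crowdIndex`k', nbreg)
}

* Fit all 20 negative binomial equations in a single gsem call.
gsem `eqs'
```

This keeps the do-file short, but the data still have to be held in wide format, so it does not remove the need for the reshape.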
I would like to know whether there is a way to use the original long format data to perform the same regression. I feel it should be something like:
gsem (i.locationID#c.swimmers <- hoursSunshine waveHeight)
or
gsem (swimmers <- hoursSunshine waveHeight by(locationID))
(Note how I optimistically put the made-up "by" option inside the brackets, so that it does not run separate regressions but fits them all in the same seemingly unrelated regression.)
Is there a way of doing the seemingly unrelated regression on the long-format data, or do I just have to reshape my data carefully and learn to deal with hundreds of variables?
*Note that I'm aware that as it stands I actually have a multivariate regression rather than a seemingly unrelated regression, since the explanatory variables are the same for all dependent variables, but I do have some explanatory variables that vary by location.