fixed effect and cluster on the same variable

November 22, 2015, 8:00 pm

My data are an unbalanced panel with large N and small T (6 years).

I plan to run a regression like this (cluster2 is written by Mitchell Petersen http://www.kellogg.northwestern.edu/...rogramming.htm)

cluster2 Y X1 X2 X3 industry_dummies year_dummies, fcluster(firm_id) tcluster(year_id)

I include year fixed effects because there could be an exogenous shock (such as a macroeconomic shock) in some years that affects all firms in that year.
I cluster the standard errors by year because I think the errors for firms within the same year could be correlated due to unobserved or unknown factors.

Stata allows me to run it, but my question is whether it is theoretically correct to include year_dummies (for year fixed effects) and then cluster the standard errors by year again?

Thank you.

↧

Creating multiple observations from a single observation

November 23, 2015, 9:27 am

Hi there
I have a relatively simple problem, or at least I think so, but after two hours of searching I must admit that I still can't solve it.
What I have is a dataset looking like:
id week_start week_end year x1
1 1 52 1998 0
1 53 79 1999 1
1 78 104 1999 1
2 . . .
And so on for all my ids. Each observation has some characteristic for some interval of time (week_start to week_end).
What I want is to split each observation into monthly intervals:
id week_start week_end year x1
1 1 4 1998 0
1 5 8 1998 0
1 9 12 1998 0
1 13 16 1998 0
1 17 20 1998 0
1 21 24 1998 0
1 25 28 1998 0
1 29 32 1998 0
1 33 36 1998 0
1 37 41 1998 0
1 42 46 1998 0
1 47 52 1998 0
1 53 56 1999 1

And so on...
I do not have a clue how to do this. It would be really nice if someone could help me.
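
One possible approach, sketched under the assumption that fixed 4-week blocks are an acceptable stand-in for months (the desired output above ends the year with a few 5- and 6-week blocks, which this does not reproduce): use expand to create one row per block, then rebuild the start and end weeks. The names spell, nblocks, block, new_start and new_end are made up for illustration.

```stata
* Sketch only: assumes week_start and week_end are numeric and nonmissing
gen long spell = _n                                   // tag each original row
gen int nblocks = ceil((week_end - week_start + 1) / 4)
expand nblocks                                        // one copy per 4-week block
bysort spell: gen int block = _n                      // 1, 2, ... within each spell
gen int new_start = week_start + 4 * (block - 1)
gen int new_end   = min(new_start + 3, week_end)      // last block may be shorter
sort id new_start
list id new_start new_end year x1, sepby(id)
```

The other variables (year, x1) are simply copied onto every block of the spell they came from.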

Thanks

↧

Model Comparison after Mixed with REML

November 23, 2015, 11:42 am

I have a similar problem as is described (but not answered) here: http://www.stata.com/statalist/archi.../msg00435.html

I have two nested linear mixed models that I would like to compare in terms of goodness of fit. Because of a small sample size I have been using REML.

I noticed that lrtest is not working after mixed with REML. It gives me the following error:

REML criterion is not comparable under different fixed-effects specifications

I found this page online describing why it makes no sense to compare models fitted with REML using likelihood-based methods: http://stats.stackexchange.com/quest...ed-effects-but

My question is: is there a way of comparing the two models?

Currently, my plan is to compare the models using ML fit and use REML for the final model (as was suggested here: http://stats.stackexchange.com/quest...ed-effects-but).
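
The plan described above might look roughly like this. It is a sketch only: y, x1, x2 and id are hypothetical names standing in for the actual outcome, covariates and grouping variable.

```stata
* Step 1: fit both nested models by maximum likelihood and compare them
mixed y x1 || id:, mle
estimates store m_small
mixed y x1 x2 || id:, mle
estimates store m_large
lrtest m_small m_large

* Step 2: refit the chosen model with REML for reporting
mixed y x1 x2 || id:, reml
```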

↧

grouping question

November 23, 2015, 1:10 pm

I have a question about using 'group' while taking the values of another variable into account.

Dataset is here.
order           name      father   mother    reas_1         note_reas_1  reas_2          note_reas_2  reas_3         note_reas_3  reas_4  note_reas_4
head            john      charles  mary      firstmarriage  jessica      secondmarriage  amy          thirdmarriage  jennifer     death   1999
wife            jessica   michael  emily     death          1985
wife            amy       david    sarah
wife            jennifer  jacob    ashley
first son       danial    john     jessica
second son      tyler     john     jessica
third son       andrew    john     amy
first daughter  amanda    john     jennifer
head            XXX
wife            YYY
first son       ZZZ
...more

I want to assign a 'family_id' to each household, based on father and mother.
At first I did not account for remarriage, so I tried the command below.

Code:

egen family_id = group(father mother)

As a result, the third son and the first daughter, who each have a different natural mother, did not receive the same 'family_id' as the other children.

How can I give the same family_id to the third son and the first daughter, taking stepmothers into account?

Code:

gen remarr = .
* create wife1 and wife2 as empty strings directly (gen = . plus tostring would store ".")
gen wife1 = ""
gen wife2 = ""
forvalues i = 1/8 {
    replace remarr = 1 if regexm(reas_`i', "secondmarriage") & fam_member == "head"
    replace wife1 = substr(note_reas_`i', 1, 6) if remarr == 1 & regexm(reas_`i', "secondmarriage")
    replace wife2 = substr(note_reas_`i', 1, 6) if remarr == 1 & regexm(reas_`i', "thirdmarriage")
}

* I use substr(note_reas_`i',1,6) because all the names were originally in a 2-byte Korean character set.

uhhhm... I have no idea what to do next.

↧

Multiple variable outputs into one variable

November 23, 2015, 1:14 pm

Hi all,
I am trying to figure out how to create a variable that gives me one table with the results from three separate variables.
I made the table using tabm (SSC) listing the three separate variables, but was hoping there's a way I could just create one variable "cost" that shows the results of all three when I put "tab cost".

This is what I have (with made up variables/frequencies)
Variable X: Frequency = 5
Variable Y: Frequency = 10
Variable Z: Frequency = 20

X, Y, & Z are dummy variables with the 0 values marked as missing so that I just see the count of those who chose those specific (X,Y, Z) responses.

What I need is to create one variable with all of these results in a table (if possible).

Something that looks like this:

Cost Frequency
X 5
Y 10
Z 20

Is this possible? Thanks!
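
A minimal sketch of one way to do this, assuming each respondent chose at most one of X, Y and Z (if a respondent can have more than one of them equal to 1, a single variable cannot hold all the answers and the later replace lines would overwrite the earlier ones). The label name costlbl is made up.

```stata
* Combine three 1/missing dummies into one labelled categorical variable
gen byte cost = .
replace cost = 1 if X == 1
replace cost = 2 if Y == 1
replace cost = 3 if Z == 1
label define costlbl 1 "X" 2 "Y" 3 "Z"
label values cost costlbl
tab cost
```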

↧

Connecting to SQL Server via ODBC: Able to see table and fields, but then error SQLSTATE=42S02 / r(682)

November 23, 2015, 1:17 pm

Hello,

I'm attempting to connect Stata to a SQL Server database. I can query the table in SQL Server Management Studio and in Tableau, so I don't believe it's a permissions issue.

I've tried:

odbc load, exec("SELECT * FROM student_metrics_c;") dsn("ODS_PROD")

and
odbc load, exec("SELECT * FROM student_metrics_c;") dsn("ODS_PROD") dialog(prompt)
(then re-indicating the directory in the dialog)

as well as going through stepwise:

odbc query "ODS_PROD"
odbc desc "student_metrics_c"
odbc load, exec("SELECT * FROM student_metrics_c;") dsn("ODS_PROD")

This last approach shows me all the available tables within ODS_PROD, then all the available fields on my table (student_metrics_c), but then gives the same error as the others.

With each, I get the error:

The ODBC driver reported the following diagnostics
[Microsoft][ODBC SQL Server Driver][SQL Server]Invalid object name 'student_metrics_c'.
SQLSTATE=42S02
r(682);

At this point, I'm out of ideas. Any help would be very, very much appreciated!
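
One thing that may be worth ruling out, offered as a guess rather than a diagnosis: "Invalid object name" can occur when the query runs against a different default database or schema than the one holding the table, so the unqualified name is not resolved. The sketch below tries a fully qualified name and, separately, the table() option instead of exec(). The names my_database and dbo are placeholders, not values taken from this post.

```stata
* Hypothetical three-part name: replace my_database and dbo with the real ones
odbc load, exec("SELECT * FROM my_database.dbo.student_metrics_c") dsn("ODS_PROD") clear

* Alternative: let Stata build the query from the table name
odbc load, table("student_metrics_c") dsn("ODS_PROD") clear
```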

Vanessa

↧

Adding rows with elements 0 conditional on if statements

November 23, 2015, 1:37 pm

I have a question regarding inserting observations in my dataset. My dataset currently looks like:

Year X Y Z

2013 1 2 z(1,2)13
2014 1 2 z(1,2)14
2013 1 3 z(1,3)13
2014 1 3 z(1,3)14
2014 1 4 z(1,4)14
2013 1 5 z(1,5)13
2014 1 5 z(1,5)14

Here, think of X and Y as different individuals. The way to read this is that individuals X and Y are linked through a common value that they share Z(x,y) in year (13) or (14). Hence, Z(1,2)13 reads the value of variable Z for individuals 1 and 2 in the year 13. I want to create a new variable Z' such that it is the first difference of the variable Z by individual pairing over the 2 years. However, my problem is as follows. As can be seen in the example dataset, individuals 1,4 only have observations for one time period.

Ideally, I want to create a row of zeros whenever this happens.
A complication is that when I take the difference, I want it to be (value - 0) if the missing year is 2013 but (0 - value) if the missing year is 2014. I do not know how to implement this in the dataset. I have tried numerous things in vain.

I guess if I were able to declare my data as panel, first difference operations may be easier to recognize. However, given that my data is not really longitudinal in the conventional sense, Stata responds with an error message:

repeated time values within panel
r(451);

Any help is greatly appreciated.
Thanks!
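
A possible sketch, assuming Year and Z are numeric and that the panel unit is the (X, Y) pair rather than X alone, which is one way the "repeated time values within panel" error can arise. fillin adds the missing pair-year rows; setting Z to 0 there makes the first difference come out as (value - 0) or (0 - value) as described. The names pair and dZ are made up.

```stata
egen long pair = group(X Y)          // one identifier per (X, Y) pairing
fillin pair Year                     // add missing pair-year rows; marks them with _fillin == 1
bysort pair (_fillin): replace X = X[1] if _fillin   // copy ids from an observed row
bysort pair (_fillin): replace Y = Y[1] if _fillin
replace Z = 0 if _fillin             // the "row of zeros"
xtset pair Year
gen dZ = D.Z                         // Z in 2014 minus Z in 2013, stored on the 2014 row
```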

↧

verifying if one variable has always the same value for the same id_variable

November 23, 2015, 1:44 pm

I have a dataset with two main variables: procedure codes (cod_proc) and an indicator of the type of procedure (type_proc). I found that I have duplicate lines for some procedure codes. I would like to know whether the type-of-procedure variable always has the same value for the same procedure code. How could I do so with:

Code:

by cod_proc:

?
Both variables are numeric ones.
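
A common pattern for this kind of check, sketched here with a made-up flag name (varies): sort type_proc within cod_proc and compare the first and last values in each group. If they are equal, every value in the group is the same.

```stata
bysort cod_proc (type_proc): gen byte varies = type_proc[1] != type_proc[_N]
tab varies
list cod_proc type_proc if varies, sepby(cod_proc)
```

Missing values of type_proc sort last, so a code with both missing and nonmissing types is flagged as varying.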

↧

Manually having a Variable input give an output

November 23, 2015, 2:00 pm

Hi,

I am trying to make a variable that maps years to the number of attacks for a paper assignment, but I don't know how to do it.
attack_date number_times total_killed
2007 8 153
2008 7 58
2009 1 2
2010 2 5
2011 3 20
2012 2 1

I don't know how to set it up manually so that if I type in attack_date it will return the corresponding number_times.
Please and thank you.

I have been struggling with this for a while.

↧

empirical bayes estimator or shrinkage

November 23, 2015, 2:29 pm

Hi,
I wonder if it is already built into Stata how group-level averages (or leave-out means) are often adjusted for more noise in smaller groups. Is such a correction something easily available from -mixed-, or even simpler "multilevel model" tools?

E.g. one can have data on judge-level leniency (share of cases with positive binary outcome) but shrink it towards the population means. How do I get the adjusted (leave-out) means?

I am grateful for any pointers, thanks in advance!
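
A rough sketch of the shrinkage part only, assuming a linear random-intercept model is an acceptable approximation for the binary outcome; it does not construct leave-out means. The names outcome, judge_id, u_judge and shrunk_mean are hypothetical.

```stata
* Random-intercept model: judge effects are shrunk toward the overall mean
mixed outcome || judge_id:
predict double u_judge, reffects            // predicted (empirical Bayes) judge effects
gen double shrunk_mean = _b[_cons] + u_judge
```

Judges with few cases get predictions pulled more strongly toward the overall intercept than judges with many cases.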

↧

Problem with [P] syntax

November 23, 2015, 2:55 pm

I am having a peculiar programming problem. My environment is Stata/SE 13.1 for Mac (64-bit Intel) Revision 12 Nov 2015.

When I run the following code from within the do-file editor in a freshly-launched Stata session

Code:

capture program drop gnxl
program define gnxl
syntax , foo
if _rc {
    display "Syntax return code _rc = " _rc
    exit
    }
if "`foo'"=="foo" {
    display "it's foo!"
    exit
    }
end

gnxl, foo
gnxl, foo

capture program drop gnxl
program define gnxl
syntax , foo
if _rc {
    display "Syntax return code _rc = " _rc
    exit
    }
if "`foo'"=="foo" {
    display "it's foo!"
    exit
    }
end

gnxl, foo
gnxl, foo

the results are

Code:

. capture program drop gnxl

. program define gnxl
  1. syntax , foo
  2. if _rc {
  3.     display "Syntax return code _rc = " _rc
  4.         exit
  5.     }
  6. if "`foo'"=="foo" {
  7.     display "it's foo!"
  8.     exit
  9.     }
 10. end

. 
. gnxl, foo
Syntax return code _rc = 111

. gnxl, foo
Syntax return code _rc = 111

. 
. capture program drop gnxl

. program define gnxl
  1. syntax , foo
  2. if _rc {
  3.     display "Syntax return code _rc = " _rc
  4.         exit
  5.     }
  6. if "`foo'"=="foo" {
  7.     display "it's foo!"
  8.     exit
  9.     }
 10. end

. 
. gnxl, foo
it's foo!

. gnxl, foo
it's foo!

. 
end of do-file
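
The log looks consistent with _rc being left over from the first capture program drop gnxl: in a fresh session no program named gnxl exists yet, so that capture would set _rc to 111, and syntax by itself does not reset _rc. A sketch of one way to make the check refer to syntax itself is to put capture in front of it:

```stata
capture program drop gnxl
program define gnxl
    capture syntax , foo          // now _rc reflects this syntax call
    if _rc {
        display "Syntax return code _rc = " _rc
        exit _rc
    }
    if "`foo'" == "foo" {
        display "it's foo!"
    }
end

gnxl, foo
```

Without capture, a parsing failure in syntax stops the program with its own error message, so the _rc test is not needed at all in that case.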

↧

F test after regression using factor variables

November 23, 2015, 3:47 pm

Hi Stata list,

I am trying to perform a post estimation partial-F (Chow) test after a regression specified using a factor variable.

I have a model of the effect of price on the probability of purchase of different goods. Since the level of purchase and the effect of price may both differ by the good, I estimate a different intercept and slope for each good (a specific type of fixed effect model, I think). Here's the code:

Code:

glm yvar_ind c.price##i.good, family(binomial) link(probit)

I'd like to test whether the slopes for the different goods really differ. Per http://www.jblumenstock.com/files/co...4/FEModels.pdf, I am performing the partial-F (Chow) test.

What is the appropriate code to do this after running the regression? Is it just:

Code:

test c.price#i.good

Thanks!
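
For reference, a sketch of the joint test using testparm, which is commonly used for groups of factor-variable terms. After glm this is a Wald chi-squared test of the interaction coefficients rather than an F test.

```stata
glm yvar_ind c.price##i.good, family(binomial) link(probit)
* Joint test that all price-by-good interaction terms are zero
testparm c.price#i.good
```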

↧

Converting NIS database load program from SAS to STATA do file

November 23, 2015, 5:09 pm

Hello everyone,

I want to use the year 2003 of NIS database (Nationwide Inpatient Sample) in Stata but unfortunately they don't provide the so called 'load program' file —which is basically a do file for recoding and labeling the .asc format file— for Stata for this particular year. I tried to translate it myself but was afraid that I may make a mistake which can result in incorrect analyses later.

This is the load program for 2003 in SAS:
https://www.hcup-us.ahrq.gov/db/nation/nis/tools/pgms/SASload_NIS_2003_SEVERITY.sas
you can right click and download the file. Also, I have attached the .txt format here.

I would greatly appreciate it if anyone could help me in this regard.

Thank you.

Reza

↧

Regression loop and store specific coefficients

November 23, 2015, 5:26 pm

I want to do two things in Stata: (1) loop a regression over a certain criterion many times; (2) store a certain coefficient from each regression result. An example of what I am doing is below:

Code:

clear
sysuse auto.dta
local x = 2000
while `x' < 5000 {
    xi: regress price mpg length gear_ratio i.foreign if weight < `x'
    est sto model_`x'
    local x = `x' + 100
}
est dir

Of all the stored regression coefficients, I actually care about only one, for example mpg here. Ideally, I want Stata to extract the coefficient of mpg from each result and compile them into one separate file (any format is OK; .dta would be great). By doing this, I want to see the trend in the coefficient of `mpg` as `weight` increases. What I am doing right now is using `estout` to export the results, something like:

Code:

esttab * using test.rtf, replace se stats(r2_a N, labels(R-squared)) starl(* 0.10 ** 0.05 *** 0.01) nogap onecell title(regression tables)

`estout` exports everything and I need to edit the results myself. Technically, this works for regressions with a small number of independent variables, but my real dataset has more than 30 variables and the regression will loop at least 100 times (I have a variable `Distance` with a range of 0 to 30,000, and I use it the way `weight` is used in the example above). Therefore, it is really difficult for me to edit the results by hand without making mistakes.

Is there a more efficient way to solve my problem, especially the second part? I would definitely like to know a silver-bullet method (if there is one) that solves both at once. Since I am looping not over a group variable but over a criterion, `statsby` does not seem to work well here. Any comments or suggestions are greatly appreciated!
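
One option that may fit here is postfile, which writes one row per regression to a new dataset. The sketch below reuses the auto example from above; the file name mpg_coefs and the variable names cutoff, b_mpg and se_mpg are made up for illustration.

```stata
sysuse auto, clear
tempname results
postfile `results' cutoff b_mpg se_mpg using mpg_coefs, replace

forvalues x = 2000(100)4900 {
    quietly regress price mpg length gear_ratio i.foreign if weight < `x'
    post `results' (`x') (_b[mpg]) (_se[mpg])   // keep only the mpg coefficient
}
postclose `results'

use mpg_coefs, clear
list
```

The resulting dataset has one row per cutoff, so the trend in the mpg coefficient can be inspected or plotted directly, for example with line b_mpg cutoff.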

↧

sspace and prior distribution

November 23, 2015, 6:38 pm

Can I apply the prior-distribution options used in Stata's Bayesian estimation to the 'sspace' command? (Stata 14)

↧

Problem on raking using survwgt package

November 23, 2015, 7:32 pm

Hi all,

I am currently working on a project where I need to rake my data. I have searched online and found this handy "survwgt" package written by Mr. Nick Winter.

My plan was to rake the data by several demographic variables such as gender, age, ethnicity, etc. So, I generated a set of corresponding variables that contain the population parameters and wished to rake the selected demographic variables with the target variables.

Selected variables:
n_Gender (2 levels)
n_Age (4 levels)
n_Ethnicity (2 levels)

Target variables:
N_Gender
N_Age
N_Ethnicity

Before I rake, I also used the egen function to group the Island variable with each of the selected variables. Island is a categorical variable with 4 levels.

After grouping, I have:

Grouped selected variables:
Gn_Gender
Gn_Age
Gn_Ethnicity

The command is as follows:

Code:

survwgt rake preweight, by(Gn_Gender Gn_Age Gn_Ethnicity) totvar(N_Gender N_Age N_Ethnicity) generate(postweight) maxrep(100)

and I got the error "Total across dimensions 1 and 2 are not equal"

I understand that this error may occur if the total population summing across N_Gender is different from N_Age. But then when I double-checked the total for N_Gender and N_Age, they are summing up to the same number. So, my question is: Are there any other factors that could have caused this error? Any comments or suggestions are deeply appreciated!

↧

hello i want to make a regression model

November 24, 2015, 9:24 am

I'm using Stata version 13.
I'm trying to build a regression model in Stata.
I want to know whether there is a difference in wages based on educational level (master's vs. undergraduate).
However, my .dta file has a lot of value labels,
for example
70 Post Grad, 71 Post Grad, 72 Post Grad, 73 Post Grad, 74 Post Grad, 75 Post Grad, 76 Post Grad, 78 Post Grad

So far I have done this:

dependent variable = wages
z2041_h_educ == 5 (undergraduate)
z2041_h_educ == 70 to 78 (master's)

gen f_master=0 if z2041_h_educ == 5

forval x=70/78 {
replace f_master=1 if z2041_h_educ==`x'
}

But I want to check the linear relationship. How can I do that? I'm stuck here.

regress wages f_master (this is not what I wanted)

I want something like this:

regress wages masters undergrade

Could you please give me a hint? Thanks.
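
A short sketch of the same recode written with inrange(), followed by the regression. With a constant in the model, a single 0/1 indicator already captures the master's vs. undergraduate difference; entering both a masters and an undergrade dummy would make one of them redundant.

```stata
* 0 = undergraduate (code 5), 1 = master's (codes 70 to 78), missing otherwise
gen byte f_master = .
replace f_master = 0 if z2041_h_educ == 5
replace f_master = 1 if inrange(z2041_h_educ, 70, 78)

* The coefficient on 1.f_master is the mean wage difference between the two groups
regress wages i.f_master
```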

↧

How can I produce multiple bar graphs? A matrix of bar graphs? Can it be done with catplot? or tabplot?

November 24, 2015, 9:39 am

Using Stata SE12

Data: Patient-level health data; patient characteristics and responses to a quality of life questionnaire

I want to produce multiple bar charts displaying the categorical data distribution (proportion of patients in each category) for each item on my questionnaire, and show this separately for patients in 4 different settings. The aim is to provide a visual summary of patient responses to the questions, comparing differences between settings of care. I want to show a lot of information on one page as a visual summary. NB: this is not for an academic paper; I'm reporting to health care teams on the data they have collected.

I'm attaching the graph I have produced using catplot (SSC), code: catplot iposq2pain2_3, by(setting) percent(setting) blabel(bar, format(%4.1f) pos(top))

This is how I want each bar graph to look - but I want multiple items/questions included, by setting (4 bar charts per question) - all within the same graph. Is this possible?

I'm also attaching a crude mock-up of the graph I ideally want: a cut and paste of the graphs I have produced with catplot.

One final point - there is an example of a 'matrix of bar graphs' here: http://blog.stata.com/tag/sem/ does anyone know how this was produced?

Thank you

↧

Creating New Variable to Categorize Based on Maximum of Several Variables

November 24, 2015, 9:43 am

I am analyzing survey data for 1000 respondents, and want to categorize them based on where they scored the highest across 7 metrics.

I have 7 continuous variables of scores for class subjects:

1. score_math
2. score_science
3. score_english
4. score_history
5. score_spanish
6. score_reading
7. score_writing

And I want to create a new variable (student_segments) that returns discrete values 1 through 7 depending on which of the above 7 variables has the max score (i.e., if their score for math is the highest of the 7 scores, student_segments would be 1; 2 for science, 3 for English, and so on).

Any advice on the best way to do this is very much appreciated!
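
A minimal sketch of one way to do this: compute the row maximum with egen's rowmax(), then loop over the seven variables in order and record the position of the first one that equals it. Ties therefore go to the earlier subject, which is an assumption; maxscore is a made-up helper name.

```stata
egen double maxscore = rowmax(score_math score_science score_english ///
    score_history score_spanish score_reading score_writing)

gen byte student_segments = .
local i = 0
foreach v of varlist score_math score_science score_english ///
    score_history score_spanish score_reading score_writing {
    local ++i
    * assign only if not yet assigned, so the first maximum wins on ties
    replace student_segments = `i' if `v' == maxscore ///
        & missing(student_segments) & !missing(maxscore)
}
tab student_segments
```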

↧

Interpreting results of -signrank-

November 24, 2015, 10:14 am

In an experiment I ran, a subject made a decision that led to an initial payoff. Next, she made a second decision that updated her initial payoff to a final payoff. I tested for a difference between these two payoffs across subjects using -signrank-.

Plotting the values, the difference looks minimal. The final payoff is always less than or equal to the initial payoff, and there is only the slightest daylight between the two:

[attached plot of initial and final payoffs not shown]

But the test came back highly significant:

signrank initial = final

Wilcoxon signed-rank test

        sign |      obs   sum ranks    expected
-------------+---------------------------------
    positive |        0           0        59.5
    negative |       14         119        59.5
        zero |        1           1           1
-------------+---------------------------------
         all |       15         120         120

unadjusted variance      310.00
adjustment for ties        0.00
adjustment for zeros      -0.25
                     ----------
adjusted variance        309.75

Ho: e = e1
             z =  -3.381
    Prob > |z| =   0.0007

Is the significance of the test driven by the fact that one curve is almost always above the other?
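
As a quick arithmetic check of where the statistic comes from, using only the numbers in the table above: z is the observed positive rank sum minus its expected value, divided by the square root of the adjusted variance. Because all 14 nonzero differences have the same sign, the positive rank sum is 0, as far from 59.5 as it can be, regardless of how small the differences are.

```stata
* z = (observed positive rank sum - expected) / sqrt(adjusted variance)
display (0 - 59.5) / sqrt(309.75)
* two-sided p-value from the normal approximation
display 2 * normal((0 - 59.5) / sqrt(309.75))
```

The first line should reproduce the reported z of -3.381.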

↧