Weak instruments few clusters small panel data

May 16, 2017, 8:12 am

Dear all,

I am running an IV fixed effects estimation (xtivreg2) using a panel of around 3000 observations, for three years.

My dependent variable is a binary variable (at the individual level), while my independent variable is an aggregate variable (regional level) where I have only 20 regions. I use IV, with more instruments than endogenous variables (which are between 2 and 4 depending on the specification). The Sargan test is ok.

My problem is that when I run xtivreg2 with standard errors clustered by region, the instruments turn out to be very weak, which does not happen with robust standard errors or with clustering at the individual level. I checked the first-stage output: the t-statistics are very low in the first-stage regression for only one of the endogenous variables, and therefore so is its first-stage F statistic.

Could this be related to the fact that I have a short panel and only 20 region clusters? Note that region is fixed within individuals, so regional fixed effects are dropped from the fixed effects estimation. What can I do to solve this problem? Can I only cluster at the individual level?

Thanks for any advice,
Ale

↧

Help with gllamm (or other command)

May 16, 2017, 8:26 am

Hi,

Suppose we have an unbalanced panel.

Also suppose that the command below is correct.

Code:

xtset id t
xtreg y x [aweights=w], mle vce(cluster id)

Could you please help me to obtain the equivalent gllamm (or other command) syntax?

Thanks,

Lukas

↧

Assigning different colors to each grouping variable in bar chart

May 16, 2017, 8:31 am

Hi,

I'm trying to make a bar chart with several grouping variables, and I want to assign a colour to each group. I've tried changing this manually in the Graph Editor, but each time I change the colour of a specific bar, all the other bars change to the same colour too.

My variables:

Time - Range of values
Sex - Male, Female, Unknown
Age Group - assigned group numbers - 1,2,3..

My Stata Command:

"graph bar (median) Time, over(Sex) over(Agegroup)"

Awaiting your response.

Many thanks

J
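
One possible approach, sketched under the assumption that the variables are named Time, Sex and Agegroup as in the command above: the asyvars option treats the categories of the first over() variable as if they were separate y variables, so each one gets its own bar style that can be set with bar(#, ...). The colours below are arbitrary choices for illustration.

```stata
* asyvars: treat the categories of the first over() variable (Sex) as separate
* y variables, so each category gets its own bar style and a legend entry.
* bar(1), bar(2), bar(3) follow the sort order of the values of Sex.
graph bar (median) Time, over(Sex) over(Agegroup) asyvars ///
    bar(1, color(navy)) bar(2, color(maroon)) bar(3, color(gs10))
```

Without asyvars there is only one y variable (the median of Time), which would explain why recolouring one bar recolours them all.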

↧

A Loop in a Loop

May 16, 2017, 9:18 am

Hi All,

I am trying to create a series of scatter plots using a loop within a loop. I have 5 variables named X6-X10 for which I would like to assess the linearity assumption. I can create scattergrams the long way using the following code:

Code:

scatter X6 X7
scatter X6 X8
scatter X6 X9
scatter X6 X10

scatter X7 X8
scatter X7 X9
scatter X7 X10

scatter X8 X9
scatter X8 X10

scatter X9 X10

My goal is to construct a "loop within a loop" to accomplish this task. This is what I have so far (I am sure it is way off...).

Code:

foreach var of varlist X6-X10 {
    forvalues i = 6(1)9 {
            scatter `var' X`i'+1
    }
}

Again, I know that is way off. I am new to looping statements, so a big thanks in advance for any help you can provide. :-)

Best,
Adam
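
A minimal sketch of one way to write the nested loop, assuming the variables really are named X6 through X10. Both loops run over the numeric suffixes, and the inner loop starts one above the outer index so that each pair is plotted once. The graph names (sc_6_7 and so on) are made up here so that later plots do not overwrite earlier ones.

```stata
* Outer loop: first variable of the pair (X6 ... X9)
forvalues i = 6/9 {
    * Inner loop starts at i + 1, so pairs are not repeated or mirrored
    local start = `i' + 1
    forvalues j = `start'/10 {
        scatter X`i' X`j', name(sc_`i'_`j', replace)
    }
}
```

This produces the same ten scatter plots as the long-hand list above.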

↧

Working with Durations

May 16, 2017, 10:05 am

I have some data downloaded from the Sleep Cycle app that includes data on my sleep. Two of the relevant columns are Start and End, which hold datetime data about when I started and ended my sleep every night.

What I want to do with the data is create a graph that shows what percentage of nights that I am asleep at any specific time of night. For example I would be able to look at the graph and see that at 11pm, I'm already asleep on 15% of nights, at 1am I'm asleep on 50% of nights, et cetera. I'm a fairly confident Stata user, but I'm not quite sure how I would go about this. Unfortunately a manual approach is out of the question, since I have over 1000 observations.

I know this goes a bit outside of what Stata was originally intended for, so if you have any suggestions for other programs/approaches that would be more effective, that would be fantastic.
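
A rough sketch of one way to do this in Stata, assuming Start and End have already been converted to numeric %tc datetimes (milliseconds since 01jan1960). Each night is measured in hours after the noon that precedes bedtime, so a night that crosses midnight stays in one piece. The variable names noon, s_hr and e_hr, and the quarter-hour grid, are choices made for this illustration. It is meant to be run as a single do-file because it uses temporary names.

```stata
* Assumption: Start and End are numeric %tc datetimes
local ms_hour = 60*60*1000

* Noon preceding each bedtime (so 11pm and 1am the next day share one noon)
generate double noon = cofd(dofc(Start - 12*`ms_hour')) + 12*`ms_hour'

* Sleep start and end in hours after that noon (11 = 11pm, 13 = 1am)
generate double s_hr = (Start - noon) / `ms_hour'
generate double e_hr = (End   - noon) / `ms_hour'

* For each quarter hour from noon to the next noon, record the share of
* nights (out of all observations) on which that time falls within sleep
tempname memhold
tempfile shares
postfile `memhold' hour share using `shares'
forvalues q = 0/96 {
    local h = `q'/4
    quietly count if s_hr <= `h' & `h' < e_hr
    post `memhold' (`h') (r(N)/_N)
}
postclose `memhold'

* Plot the share asleep against time of night
use `shares', clear
line share hour, xtitle("Hours after noon") ytitle("Share of nights asleep")
```

Observations with a missing Start or End are never counted as asleep but still enter the denominator, so they may need to be dropped first.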

↧

Dollar sign when defining local variable

May 16, 2017, 10:24 am

Dear all,

I'm trying to run a do-file and receive an error on this line:

Code:

local carry=$carry_months

Error:

invalid syntax
r(198);

As far as I could find by googling, a dollar sign might denote a global variable or, in some cases, just be a literal dollar sign inside a variable name. Could you please explain what this line might do and how I can avoid the error?

After defining it, there is a loop:

Code:

local names="AS"

foreach k in `names' {
    forval i=1/`carry' {
        sort stock month_id
    
        qui gen `k'_pr_`i'=`k'[_n-`i'] if stock==stock[_n-`i'] & month_id==month_id[_n-`i']+`i'
    }
}

foreach k in `names' {
    forval i=1/`carry' {
        sort stock month_id
    
        qui replace `k'=`k'_pr_`i' if `k'==. & stock==stock[_n-`i'] & month_id==month_id[_n-`i']+`i' 
        qui drop `k'_pr_`i' 
    }
}

Thank you in advance.
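
A short sketch of what the line appears to do, on the assumption that the do-file expects a global macro named carry_months to be set earlier (for example by a master do-file). The value 3 below is hypothetical.

```stata
* $carry_months expands to the contents of the global macro carry_months.
* If that global was never defined, it expands to nothing, the line becomes
* "local carry=" and Stata stops with "invalid syntax", r(198).
global carry_months 3            // hypothetical value; define before the line runs
local carry = $carry_months
display "carry = `carry'"
```

With the global defined, the loops that follow run with i going from 1 to the value of carry.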

↧

How to define a certain range of a period for a regression in a loop?

May 16, 2017, 10:34 am

Dear Stata Experts,

Please help me with the following issue. I need to run a regression for each year using a three-year window. Say I want to obtain results for year 2005: I then use the years 2003, 2004 and 2005 (and save the fixed-effect coefficients of "idcode"). I use the following code, but I am not sure it is correct:

Code:

sum year
scalar minyr=r(min)
scalar maxyr=r(max)
local m=minyr
local n=maxyr

xtset idcode
forvalues i=`m'(1)`n'{
            qui reg ln_w grade age c.age  if inrange(year,`i-2',`i'), vce(cluster idcode)
            fe`i'
            di `i'
}

Please, let me know if I have a mistake in my code.

Best regards,
Alberto
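
A hedged sketch of how such a rolling window could be written, using the variable names from the post. Note that `i-2' is not arithmetic on the macro i, so the lower bound is computed in its own local first. The use of areg with absorb(idcode) to obtain the idcode effects is a technique swapped in here for illustration, not something taken from the original code, and the names fe2005 etc. are made up.

```stata
quietly summarize year
local first = r(min) + 2      // first year that has a full three-year window
local last  = r(max)

forvalues y = `first'/`last' {
    local lo = `y' - 2        // lower bound of the window, evaluated explicitly

    * areg absorbs the idcode dummies (assumption: this is the fixed effect wanted)
    capture noisily areg ln_w grade age if inrange(year, `lo', `y'), ///
        absorb(idcode) vce(cluster idcode)
    if _rc continue           // skip windows that cannot be estimated

    * d = estimated effect of each idcode within this window
    predict double fe`y' if e(sample), d
}
```

Each fe variable holds the estimated idcode effect for the observations used in that window.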

↧

How to save coefficients of fixed effects (dummy) within pooled ols?

May 16, 2017, 10:40 am

Dear Stata Experts,

Sorry if this post is related to my previous one, but I still could not figure out the answer from your responses. The question is: how do I store the coefficients of a fixed effect (dummy variable) in a pooled OLS regression? For the code below I need to store the coefficients of "i.idcode".

Code:

reg ln_w grade age c.age i.idcode, vce(cluster idcode)

Please advise me on this issue.

Best regards,
Alberto

↧

Pseudo R2 for Gamma Regression?

May 16, 2017, 11:02 am

I've been asked to determine the pseudo R2 for my multivariate model, but I am using glm with gamma family and log link. When I look online, I can only find pseudo R2 instructions for logistic regression. Is anyone aware of a way to determine a pseudo R2 for glm with a gamma family, and how I would run this in Stata? Any help would be greatly appreciated!
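
One commonly used option is a deviance-based pseudo R2, defined as 1 minus the ratio of the model deviance to the deviance of an intercept-only model. Whether this is the measure that was requested is an assumption. The variable names y, x1 and x2 are placeholders.

```stata
* Hypothetical variable names: y (outcome), x1 x2 (covariates)
glm y x1 x2, family(gamma) link(log)
scalar dev_full = e(deviance)

* Intercept-only model fitted on the same estimation sample
glm y if e(sample), family(gamma) link(log)
scalar dev_null = e(deviance)

display "Deviance-based pseudo R2 = " (1 - dev_full/dev_null)
```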

↧

Help for new guy in doing time series forecasting with many variables

May 16, 2017, 11:09 am

After watching dozens of videos and spending about 10 hours this week reading blogs and forums trying to figure it out on my own, I decided that I needed to risk looking dumb and ask people more experienced than myself.

I am trying to forecast prices of items based off the previous prices of other items. I found a way to "caveman" my way through it (it takes me about 4 hours to make one equation), but am hoping to find a better way. (In case it matters, I will post the summary of my method below)

The goal is to create a linear regression that is something like: the price of wheat in three months = β1(price of sugar six months ago) + β2(price of wheat ten months ago) + E

I have 275 variables in my database to test with and need equations to predict each 2 through 13 months into the future for 10 items.

I have tried using the Statistics>Multivariate Time Series>Forecasting a few times. The furthest I have gotten is:
tsset t
gen corn5=f5.corn *(this variable is to test what corn price would be 5 months in the future compared to current prices of other items)
reg corn5 var1-var275 if t<109 *(the result is a long list of variables "omitted because of collinearity", followed by the "Source SS df MS" window, followed by a list of most variables listed as "(omitted)" and some with coefficient values)
estimates store corn5
forecast create corn5pred
forecast estimates corn5 *(followed by "Added estimated results from regress. Forecast model corn5pred now contains 1 endogenous variable)

A new variable was created called "_est_corn5" that is mostly zeros with some ones.

Any help with pointing me in the right direction would be highly appreciated.

Caveman method:
In Excel, create a new variable for each of the 275 variables for each possible lag (2 through 13), so 275 variables x 12 to test for a 2 month prediction, 275 x 11 to test for a 3 month, etc. I then do a correlation test in Excel to find which variables at differing lags have the highest correlation with the current price of the item of interest. I take the top 150 variable/lag combinations and do a stepwise regression (sw, pr(.05): reg VarOfInterest var10lagged4 var3lagged2 etc) for as many variables as Stata will take without refusing to work because of collinearity for the first half of the time series. I then remove the variable with the highest VIF one by one until I am able to eventually add in and test all 150 independent variables. Independent variables are removed one by one based on VIF and Pearson value until about a dozen "reasonable" formulas remain. I then test them against the second half of the time series and use the formula with the best results. Rinse repeat for 10 items of interest x 12 lag time periods x 4 hours each.

↧

Understanding performance issues with ODBC/configuring machine to speed up process

May 16, 2017, 11:30 am

Working with Stata 13 MP (8-core) on a Windows 10 Home laptop: i7-4710HQ processor, 16 GB RAM, Samsung Evo 850 SSD.

I'm using the odbc command to export a large Access database that is used by a data visualization program to create dashboards based on the analysis run in Stata.

Code:

odbc insert clave_registro r* i_* m_*, table(Reporte_por_centro) dsn("Reportes por centro") over

-over- option here because the program runs automatically every day to update the databases. I use Access because of performance issues I've had with my data visualization program.

The code takes 5-7 minutes to execute for a 254-variable, 20,000 observation dataset, even though task manager shows there are plenty of resources available (processor, RAM, disk).

I understand these performance issues are common with odbc in Stata, but I am wondering a) what causes the command to execute so slowly when more resources are available, and b) whether a machine can be configured to improve performance. Specifically, I'll be moving this program to a VM on a Windows server soon and am wondering if there is a way I can configure the server to run the do-file more quickly.

Thanks!

↧

Heston-Rouwenhorst Model

May 16, 2017, 11:31 am

Hello,

For my thesis I want to use the Heston-Rouwenhorst Model:

$$r_i = \alpha + \sum_{j} \beta_j I_{ij} + \sum_{k} \gamma_k C_{ik}$$

where r is the return of firm i, I is an industry dummy variable and C is a country dummy variable.

I have gathered the necessary data (15 countries and 11 industries) and can run a regression such as:
reg F i.Country i.Industry (where F is the return of all firms in month x).

This regression has a collinearity problem. Heston and Rouwenhorst get around the problem of collinearity by imposing two restrictions:

$$\sum_{k=1}^{K} m_k \gamma_k = 0$$
$$\sum_{j=1}^{J} n_j \beta_j = 0$$

where $m_k$ and $n_j$ refer to the number of stocks in country k and industry j, respectively. $\gamma$ and $\beta$ are regression coefficients.

When running the regression -reg F i.Country i.Industry- Stata automatically drops 1 country dummy and 1 industry dummy because of collinearity but I need the coefficients of all dummies.
Is there a way to tell Stata not to omit any dummy variables so that I can impose the two restrictions illustrated above? Or should I implement this model in a different way?

For each month I was planning on running the code below:

reg F i.Country i.Industry
loc 1 = _b[Country1]
....
loc 15 = _b[Country15]
constraint 1 m1*γ1+.....+m15*γ15=0
loc 16 = _b[Industry1]
...
loc 26 = _b[industry 11]
constraint 2 n1*𝛽1+....n11*𝛽11=0
cnsreg F i.Country i.Industry, c(1/2)

Thanks in advance
David Sterken

↧

How to perform a comparison between t-statistics and p-value to identify significance level - matrix problem

May 17, 2017, 8:08 am

Dear Members,

I wish to perform the dfgls test for a set of macroeconomic variables for a set of countries.

I put in a matrix the scalar obtained through the following loop:

Code:

local r=1
matrix DFGLS_LATVIA_LEVELS = J(7,5,.)
foreach var of varlist mgsv iad rmp gdpv nc itv xgsv{
dfgls `var' if ifscode==941
matrix DFGLS_LATVIA_LEVELS[`r',1]= r(N)
matrix DFGLS_LATVIA_LEVELS[`r',2]= r(maxlag)
matrix DFGLS_LATVIA_LEVELS[`r',3]= r(sclag)
matrix DFGLS_LATVIA_LEVELS[`r',4]= r(maiclag)
matrix DFGLS_LATVIA_LEVELS[`r',5]= r(optlag)
local r=`r'+1
}
matrix colnames DFGLS_LATVIA_LEVELS = "N" "Maximum lags" "Lag Schwarz criterion" "Lag modified AIC method" "OPTIMAL lag sequential-t method"
matrix rownames DFGLS_LATVIA_LEVELS = "mgsv" "iad" "rmp" "gdpv" "nc" "itv" "xgsv"
matrix list DFGLS_LATVIA_LEVELS
putexcel set myresults.xlsx, sheet(DFGLS_LATVIA_LEVELS)
putexcel A1 = ("Variables")
putexcel B1 = ("N")
putexcel C1 = ("Maximum lags")
putexcel D1 = ("Lag Schwarz criterion")
putexcel E1 = ("Lag modified AIC method")
putexcel F1 = ("OPTIMAL lag sequential-t method")
putexcel A2 = ("mgsv")
putexcel A3 = ("iad")
putexcel A4 = ("rmp")
putexcel A5 = ("gdpv")
putexcel A6 = ("nc")
putexcel A7 = ("itv")
putexcel A8 = ("xgsv")
putexcel B2 = matrix(DFGLS_LATVIA_LEVELS)

I know that dfgls also provides a matrix of results. For example, if I type the following outside the loop, for one variable, itv, and one country, 941:

Code:

dfgls itv if ifscode==941

and ask for the results:

Code:

matrix list r(results)

I get the following result:

Code:

r(results)[11,5]
             k        MAIC         SIC        RMSE       DFGLS
r1          11  -4.7873584  -4.4196004   .07561431  -1.0180128
r1          10  -4.8202692  -4.4771146   .07578605  -.95794406
r1           9  -4.8362183  -4.5344526   .07596488  -1.0720941
r1           8  -4.8607304  -4.5733971   .07684763  -.89233621
r1           7  -4.8669265    -4.62445    .0772714  -1.0596359
r1           6   -4.900552  -4.6820343   .07744419  -.98500844
r1           5  -4.8901824  -4.7233092    .0782529  -1.2342609
r1           4  -4.9131212   -4.784244   .07829661  -1.3305729
r1           3  -4.9482966   -4.842877   .07843056   -1.268984
r1           2  -4.9366561  -4.8354363   .08120361  -.89749689
r1           1  -4.9042244  -4.8181604   .08448918  -.52626399

What I would like to perform I think is a little bit tricky.

I would like to extend my loop, or create a new one to do what follows, which I divide in four steps for the sake of clarity.

1) Go to the fifth and last column of the abovementioned r(results) matrix, which reports the DF-GLS tau-statistic.

2) Compare the DF-GLS tau-statistic with the 1%, 5%, 10% critical values, at the optimal lag indicated by the r(optlag) scalar, i.e. Ng-Perron criterion.

3) Add an additional column with ***, ** and * for the 1%, 5%, 10% critical values, respectively.

4) Export the results obtained above into excel.

I am struggling to identify which path I should follow to reach my aim. One element of further complexity is that I cannot store the values of the three different significance levels at all lags.

I would be very grateful if any Member could provide me with some help.

Many thanks.

Marco

↧

Marginal Effects on Interaction Terms

May 17, 2017, 8:58 am

Hi, I'm trying to calculate the marginal effects of the interaction of two variables in a logistic regression model. Basically, I'm regressing y on x1, x2 and x1*x2. I want to obtain the marginal effects of changing dummy_x1 from 0 to 1, the marginal effects of changing dummy_x1 from 0 to 1 holding dummy_x2 at 1, and the significance of the difference between the two effects. In Ashcraft (2008) Table 5, this was achieved by simply running -dprobit- and calculating the marginal effects of x1*x2, and I'm wondering if the new -margins- command supports such functionality. Thanks.
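
A possible starting point with -margins-, sketched with placeholder names y, x1 and x2 (both x1 and x2 assumed to be 0/1 dummies). The key step is to enter the interaction with factor-variable notation so that -margins- knows x1*x2 is a product of the two dummies. The last line, which contrasts the two effects, reflects my understanding of the contrast operators and should be checked against the -margins, contrast- documentation.

```stata
* Hypothetical names: y (binary outcome), x1 and x2 (0/1 dummies)
logit y i.x1##i.x2

* Effect on Pr(y) of moving x1 from 0 to 1, separately at x2 = 0 and x2 = 1
margins x2, dydx(x1)

* Difference between those two effects (x2 = 1 versus x2 = 0), with a test
* (assumption: r. contrast operator combined with dydx())
margins r.x2, dydx(x1)
```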

↧

Model convergence

May 17, 2017, 9:07 am

Dear fellow Stata users,

I just want to report something that I find a bit strange happening. Not sure if I should worry about my model. My model looks like:

nbreg depvar x1 x2 x3...x8 i.year i.country, vce(cluster country)

My model does not converge even if I use the "difficult" option or other techniques, e.g. bfgs.
However, if I move x3 in front of x1, i.e.

nbreg depvar x3 x1 x2 ...x8 i.year i.country, vce(cluster country)

then, the model is converging!

Any idea as to why this happens? I assume it has to do with different initial values, but I still thought that the order of x1...x8 should not matter.
Ioannis

↧

Histogram axis ??unrelated to data

May 17, 2017, 9:18 am

I wonder if someone could explain to me where Stata is finding my y axis when I plot this histogram? I want it to read N in whole numbers, or % of total. I know how to change the actual titles for the axes, but not how to get Stata to make the graph show N or % of total.
This is to plot the spread of ages in a study of 607 patients.
This is the beginning of the table:
. tab Age_num

        Age |      Freq.     Percent        Cum.
------------+-----------------------------------
          9 |          1        0.17        0.17
         19 |          1        0.17        0.34
         21 |          1        0.17        0.50
         23 |          1        0.17        0.67
         25 |          1        0.17        0.84
         27 |          1        0.17        1.01
         28 |          1        0.17        1.17
         29 |          3        0.50        1.68
         33 |          2        0.34        2.01
Then with histogram Age_num I get this graph. The red box marks my problem: I don't understand why Stata is showing me this, since neither the frequency nor the percentage corresponds to a peak of just over 0.04. Many thanks for any explanations.
[Attached image: histogram of Age_num, with the y axis marked by a red box]
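
A brief sketch of the likely explanation: by default -histogram- scales the y axis as a density, so the bar areas sum to 1 and a peak near 0.04 is neither a count nor a percentage. The frequency and percent options change that scale. The variable name is taken from the post.

```stata
* Default: y axis is a density (bar areas sum to 1)
histogram Age_num

* y axis as the number of patients in each bin
histogram Age_num, frequency

* y axis as the percentage of patients in each bin
histogram Age_num, percent
```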

↧

can't re-fit a model that I previously fit: advice on setting initial values

May 17, 2017, 9:30 am

Hi:

I have a problem I'm hoping someone can help with...

I previously fit a multilevel binomial model using 'meqrlogit'. I stored the estimates after fitting the model, and as I haven't ended the Stata session (i.e. I haven't turned my computer off for a few days!), I can restore the estimates and make predictions etc.

To fit the model, I went through a number of iterations, storing the matrix e(b) and using this as initial values when fitting the next model (i.e. after fitting a model I used 'matrix b = e(b)', and I suffixed the subsequent model with 'from (b, skip)').

The problem I have is that when writing the do-file, I jumped back-and-forth a few times as I thought of additional model configurations I should test. This was foolish, as the order of models in the do-file isn't the order in which I initially fit the models. And now, I'm unable to re-fit my final model as I can't work out the sequence of e(b) matrices I used to fit it: I keep getting the message 'initial values not feasible'.

This means if I turn off my computer, I won't be able to refit the model.

I realise you can save the estimates as an 'est' file using 'estimates save', but if I load these into a new Stata session I can't make predictions with the model (which is my main purpose).

I can see the matrix e(b) from my final model, but I don't know if it's possible to manually replicate this matrix to use it as starting values for re-fitting the final model. If I could manually replicate e(b), presumably I could then easily re-fit the model.

Does anyone know if there's a way to either:

- save the model estimates in such a way that I can load them into a new Stata session and make predictions? (bearing in mind this is a multilevel model)

- manually create a matrix that could then be used as initial values to refit the model in a new Stata session

Any help would be greatly appreciated!

Cheers

↧

problem merging variables

May 17, 2017, 9:33 am

I have 5 different variables with different data for each observation. These 5 variables represent the years 2011-2015.

How can I combine them into one variable named "year" with 5 different labels? Is this even possible?
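
If the goal is one row per observation and year, -reshape long- may be what is needed. This is a sketch under assumptions: the five variables are taken to be named v2011 through v2015 and the data are taken to contain an identifier called id, neither of which is stated in the post.

```stata
* Hypothetical names: id identifies each unit; v2011 ... v2015 hold the yearly data
reshape long v, i(id) j(year)

* Result: one row per id and year, with variables id, year (2011-2015) and v
describe
```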

↧

descriptive statistic table

May 17, 2017, 9:34 am

Hi all,

I would like to create a table that can be exported to a Word file and that shows the mean, sd and percentiles of my variables. These values have to be shown on the same line but in different columns.

With the following commands I get only the mean and sd, with the sd below the mean:

summarize TotalAssets Revenue CMarkCap Equity ROA_0
estpost summarize TotalAssets Revenue CMarkCap Equity ROA_0
eststo summstats
esttab summstats using table4.rtf, replace main(mean %6.2f) aux(sd)

Thank you!
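
A possible variation on the commands above, assuming the user-written estout package is the one in use (it provides estpost and esttab). Adding the detail option makes the percentiles available, and listing the statistics inside one quoted string in cells() places them side by side in a single row per variable. The choice of the 25th, 50th and 75th percentiles and the two-decimal format are illustrative.

```stata
* Requires the user-written estout package (ssc install estout)
estpost summarize TotalAssets Revenue CMarkCap Equity ROA_0, detail

* One quoted string in cells() = statistics in columns on the same row
esttab using table4.rtf, replace ///
    cells("mean(fmt(2)) sd(fmt(2)) p25(fmt(2)) p50(fmt(2)) p75(fmt(2))") ///
    nomtitle nonumber noobs
```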

↧

correlation table

May 17, 2017, 9:47 am

Hi all,

I would like to create a table that can be exported to a Word file and that shows the correlations between all the variables.
More precisely, I would like to create a table that replicates the result I get when I run the following command:
pwcorr y x1 x2 x3 x4, sig star(.05)

Thank you!

↧