Remaining serial correlation in ARDL model

Dear all,

I am using an ARDL model to address autocorrelation in my regression, but how can I check whether autocorrelation might still be present in the error term? In the model below I use an ARDL(1,0) specification with robust standard errors. Is the -robust- option enough to deal with any remaining autocorrelation?

I have a panel dataset of 28 countries and T=15 per country. Total obs = 420

Code:
 xtreg logY L1.logY X control, fe robust

Fixed-effects (within) regression               Number of obs     =        392
Group variable: Country                         Number of groups  =         28

R-sq:                                           Obs per group:
     within  = 0.5706                                         min =         14
     between = 0.9890                                         avg =       14.0
     overall = 0.9578                                         max =         14

                                                F(3,27)           =      63.79
corr(u_i, Xb)  = 0.9123                         Prob > F          =     0.0000

                               (Std. Err. adjusted for 28 clusters in Country)
------------------------------------------------------------------------------
             |               Robust
     logY    |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     logY    |
         L1. |   .6909268   .0520994    13.26   0.000     .5840277    .7978259
             |
       X     |   .0011246   .0004701     2.39   0.024       .00016    .0020891
     control |   .0021688   .0008499     2.55   0.017      .000425    .0039125
       _cons |   .1956171   .0397835     4.92   0.000     .1139881    .2772461
-------------+----------------------------------------------------------------
     sigma_u |  .03386153
     sigma_e |  .01760108
         rho |  .78728529   (fraction of variance due to u_i)
------------------------------------------------------------------------------
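
One informal check is to look at the idiosyncratic residuals directly. A minimal sketch, reusing the variable names above (this is an informal diagnostic, not a formal panel serial-correlation test): predict the within residuals and regress them on their own lag with cluster-robust standard errors; a clearly significant lag coefficient points to remaining serial correlation. Note that, as the output header shows, -xtreg, fe- with -robust- clusters by panel, so the reported standard errors already allow for arbitrary within-panel correlation.

Code:
xtreg logY L1.logY X control, fe
predict double ehat, e                      // idiosyncratic residual e_it
regress ehat L1.ehat, vce(cluster Country)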

VECM interpretation of estimated coefficient.

Hello!

I have estimated a VECM and am about to interpret my results. The model includes export volume (set to unity), world GDP, and the real effective exchange rate (an increase = appreciation), all in log form. The coefficients look, for example, like this.



[coefficient estimates shown in an attached image]

Does one need to put any restrictions on the coefficients, or can they be read directly as elasticities?
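
A minimal sketch of the default setup (the variable names are assumptions): Stata's -vec- applies the Johansen normalization, which already restricts the coefficient on the first variable in the cointegrating equation to unity.

Code:
vec ln_exports ln_worldgdp ln_reer, rank(1) lags(2)

With all variables in logs, the normalized beta coefficients can then be read as long-run elasticities of exports, with the sign reversed, since the cointegrating equation is reported as ce1 = ln_exports + b2*ln_worldgdp + b3*ln_reer + constant.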

I am grateful for any helpful answer.

Best regards

What is next after a referee rejects an instrumental variable strategy?

I have a paper that just got rejected; it appears that my identification strategy is not sound enough. Below I describe the criticism and seek some advice.

My sample is the elderly aged 55 and over. The dependent variable is a health outcome. The two independent variables of interest are part-time (working less than 35 hours a week) and full-time work dummies, so the base category is retired. Since health can affect work decisions (simultaneity), I take an IV approach. As instruments for working part-time and full-time, I use dummies indicating whether the individual has reached age 62 or 65, which are the eligibility ages for early and normal social security benefits. I also consider age 70, for a reason I do not need to explain here. I further consider the same eligibility ages for the partner, the argument being that the partner's retirement status could affect the work decisions of the individual. Hence, in total I have six instruments. There is a literature analyzing the effects of retirement on health outcomes using eligibility ages as instruments for the retirement decision; in this literature the base outcome is working any number of hours. My idea is instead to differentiate between part-time and full-time work and analyze their effects on health, since working different numbers of hours could have different effects. I also consider fixed effects, as the data are panel, but this is irrelevant to the discussion here.

1.png presents the first-stage results for the two endogenous variables. In total I have six instruments. The first-stage regressions are both linear probability models. Since I have two endogenous variables, the instruments should provide independent sources of exogenous variation for both so that their effects can be identified. Hence, I consider the conditional F statistic (of Angrist and Pischke, later improved by others; I do not present the results here), which suggests that the instruments are not weak.

2.png presents the second-stage results. They show that the effect of part-time work is much larger than the effect of full-time work. But the referee points out a problem. Since both part-time and full-time work are dummies, the larger the first-stage coefficients (and hence the predictive power of the instruments), the smaller the IV coefficient will be (as in a Wald estimator). Therefore, it is almost mechanical to observe a larger estimated effect of working part-time on the health outcome, because almost all instruments better predict the probability of working full-time than the probability of working part-time. In fact, a larger effect for part-time work is observed for a couple of other health outcomes, supporting the referee's concern.

I would like to ask two questions:

1. Given the criticism, is the following a lesson to be learned about the IV method in general? Suppose we have one endogenous variable and two instruments, and we consider one instrument at a time: no GIV, just IV estimation. Suppose both instruments are valid and equally significant, but the first instrument has a larger first-stage effect on the endogenous variable than the second. If the referee is right, the first instrument will always result in a smaller IV estimate in the second stage, and the second in a larger one. What do we conclude? That if the effect of the instrument is large in the first stage, the IV estimate will be small in the second stage? I do not recall reading about such a problem in any econometrics textbook.
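
(For reference, the Wald logic the referee invokes: with a single binary instrument Z, a single endogenous dummy D and outcome Y,

\[
\hat{\beta}_{IV} = \frac{E[Y \mid Z=1] - E[Y \mid Z=0]}{E[D \mid Z=1] - E[D \mid Z=0]},
\]

so a larger first stage, the denominator, shrinks the IV estimate only if the reduced form in the numerator is held fixed. Under a constant treatment effect the numerator scales with the denominator, so a stronger instrument does not mechanically produce a smaller estimate.)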

2. How could I proceed? To circumvent the criticism, should I find an instrument for working part-time whose first-stage effect is about the same size as that of the instrument for working full-time? It is probably not possible to find such an instrument. Should I discard the model altogether? Or is there, by chance, an alternative econometric model I could turn to?

Mata syntax: single line if statements

What is the syntactic rule that underlies the following behaviour (a compile error in the first and fourth if statements, but not the second or third)?


. mata: if (1==1) x=1;

unexpected end of line
<istmt> incomplete
r(3000);

. mata: if (1==1) x=1;;

. mata: if (1==1) x=1; x;
1

. mata: if (1==1) x

unexpected end of line
<istmt> incomplete
r(3000);


Two-way fixed effects model

Hi Statalisters,

I have panel data spanning 2008 through 2015 and covering 181 Italian listed family firms. My main interest is the relation between founding-family ownership and firm performance. The analysis also incorporates variables that identify CEOs as firm founders, descendants of the firm's founder, or outsiders. I would like to use a two-way fixed effects model for my regression analysis.

The paper I have read that does something similar describes the fixed effects as dummy variables for each year of the sample and dummy variables for each two-digit SIC code (I would like to use the ATECO 2007 code instead, since I am studying Italy). The regression they employ is the following:

Firm Performance = δ0 + δ1(Family Firm) + δ2(Control Variables) + δ3-δ54(Two-Digit ATECO Code) + δ'93-'99(Year Dummy Variables) + ε

where
Firm Performance = ROA based on EBITDA and net income, and Tobin's q;
Family Firm = binary variable that equals one when the founding family is present in the firm, and zero otherwise;
Control Variables = officer and director holdings less family holdings, fraction of independent directors serving on the board, research and development expenses divided by total sales, long-term debt divided by total assets, stock return volatility, natural log of total assets, and natural log of firm age;
Two-Digit ATECO Code = 1.0 for each two-digit code in the sample;
Year Dummy Variables = 1.0 for each year of the sample period.

How should I build this model in Stata?
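
A minimal sketch of one way to set this up (all variable names are assumptions about your dataset): enter the industry and year dummies as factor variables and cluster the standard errors by firm.

Code:
regress roa i.family_firm officer_holdings frac_indep_dir rd_to_sales ///
    ltdebt_to_assets volatility ln_assets ln_firm_age ///
    i.ateco2digit i.year, vce(cluster firm_id)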

Thank you a lot!

Mata: confusing error message

When you type:
. mata: rmdir ("vendor")

you get:
could not create directory vendor
rmdir(): 693 could not remove directory
<istmt>: - function returned error

The error seems to be an extra errprintf() in rmdir.mata - note that its message says "create" where it should say "remove":


*! version 1.0.0 15dec2004
version 9.0

mata:

void rmdir(string scalar dirpath)
{
        if (_rmdir(dirpath)) {
                errprintf("could not create directory %s\n", dirpath)
                _error(693, "could not remove directory")
                /*NOTREACHED*/
        }
}

end

Panel data

Dear Sir,

I have an unbalanced panel covering two time periods, identified by a unique person ID. To make it a balanced panel I tried the following commands:

bysort IDPER : gen copies = _N

keep if copies == 2


However, when I browse the data before and after running the commands above, they keep only the observations for the second period.

I have cross-checked this with the -duplicates- command in Stata, which lists the duplicates with respect to person ID. I used the following command:

duplicates list IDPERSON

It reports that the dataset has 4456 groups, meaning we have two periods of information for 4456 persons. Thus my panel should have 8912 observations; however, after using the copies approach I am left with only 4456 observations, i.e. only the second-period observations.

How can I create a balanced panel?
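
A minimal sketch (assuming the identifier is IDPERSON and the time variable is called wave, both assumptions): count the rows per person on sorted data, keep the persons observed twice, and then verify that the panel is balanced.

Code:
bysort IDPERSON : gen copies = _N
keep if copies == 2
xtset IDPERSON wave
xtdescribe                      // should report a single participation pattern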

Estimating weighted logit in multilevel models

My model looks like
gllamm y x, i(year cohort) link(logit) fam(binom)

where y is a dummy variable.

Now I want to add an analytical weight to the logit model (weighted logistic regression). However, I realize that there is no option in gllamm to add analytical weights. How could I do this in Stata?
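
For what it is worth, a minimal sketch (the weight variables are assumptions): gllamm does not take aweights, but it does have a pweight() option for sampling weights, which expects level-specific weight variables named with a numeric suffix (wt1 for level 1, wt2 for level 2, and so on).

Code:
* requires variables wt1, wt2, ... holding the level-specific weights
gllamm y x, i(year cohort) link(logit) fam(binom) pweight(wt)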

Thanks!

Multilevel Analysis Using Complex Survey Data -- Analysis of Subset of Data and gllamm

Hello,

I am new to multilevel analysis, and am learning about weighting data and the Stata program gllamm. My question here is about subsetting my data.

I am planning a multilevel analysis to understand if variation in state-level measures of structural stigma against persons with mental illness (operationalized as state availability of services, state mental health expenditures, etc.) predicts health outcomes in persons with mental illness. The individual-level data I am using come from BRFSS, a complex survey that collects state-specific health data. Within the BRFSS, the mental health data I am using come from an optional module, so I am only analyzing data from the subset of states that selected this module (25 states). Further, I am interested in the subset of individuals with mental illness.

My understanding is that I will need to use the Stata program gllamm in order to appropriately weight my data, but that gllamm does not support the subsetting of data. I would appreciate any advice or references to materials that inform me about best practices in this area.

Thank you.

Exporting from stata to excel

Hi,

I am exporting a .dta file to Excel. I have 506,861 rows and 30 columns. When I choose the export-to-Excel command, I receive "too many or no observations specified" (r(198)). What could have gone wrong?
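
One thing worth ruling out, as a minimal sketch (the file name is an assumption): the older .xls format caps a worksheet at 65,536 rows, so a dataset of this size has to go to .xlsx.

Code:
export excel using "mydata.xlsx", firstrow(variables) replace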

how to sort?

Dear all,

Suppose I have the following data:
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str2 c1 float d1 str2 c2 float d2 str2 c3 float d3
"A" 75 "B" 45 "C" 66
"B" 71 "C" 43 "A" 45
"C" 68 "A" 56 "B" 34
end
and I want the following outcome
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str2 cc float(d1 d2 d3)
"A" 75 56 45
"B" 71 45 34
"C" 68 43 66
end
Any suggestions?
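
A minimal sketch using the example data above: stack the (c, d) pairs into long form, then pivot back wide by letter so that each row lines up the d values for one letter.

Code:
gen row = _n
reshape long c d, i(row) j(col)
rename c cc
drop row
reshape wide d, i(cc) j(col)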

kmatch: New command for multivariate-distance and propensity-score matching

Thanks to Kit Baum, a new package called kmatch is available from the SSC Archive. To install the package, type:
Code:
. ssc install kmatch
kmatch matches treated and untreated observations with respect to covariates and, if outcome variables are provided, estimates treatment effects based on the matched observations, optionally including regression adjustment bias-correction. Multivariate (Mahalanobis) distance matching as well as propensity score matching is supported, either using kernel matching, ridge matching, or nearest-neighbor matching. For kernel and ridge matching, several methods for data-driven bandwidth selection such as cross-validation are offered. The package also includes various commands for evaluating balancing and common-support violations.
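
A minimal usage sketch (the variable names are assumptions): propensity-score matching of a treatment dummy T on two covariates, with treatment effects estimated for the outcome y given in parentheses.

Code:
ssc install kmatch
kmatch ps T x1 x2 (y)
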
ben

Kpss test with auto option

Dear Members,

I am using different commands to test for the presence of a unit root in macroeconomic variables. These include dfgls, dfuller, pperron and kpss.

I read the help file of the kpss command and I was in doubt when the auto option is appropriate. The help of the command indicates that

"The maximum lag order for the test is by default calculated from the sample size using a rule provided by Schwert (1989) using c=12 and d=4 in his terminology. The maximum lag order may also be provided with the maxlag option, and may be zero. If the maximum lag order is at least one, the test is performed for each lag, with the sample size held constant over lags at the maximum available sample.

Alternatively, the maximum lag order (bandwidth) may be derived from an automatic bandwidth selection routine, rendering it unnecessary to evaluate a range of test statistics for various lags. Hobijn et al. (1998) found that the combination of the automatic bandwidth selection option and the Quadratic Spectral kernel yielded the best small sample test performance in Monte Carlo simulations."

It is the possibility described in the second paragraph, i.e. the use of the auto option, that I am particularly interested in. More precisely, I am not quite sure why I should NOT use auto, given its parsimonious result (even though the kpss test seems to reject the null of stationarity too often).
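
For concreteness, a minimal sketch of the combination Hobijn et al. recommend (the variable name is an assumption; the series must be tsset):

Code:
kpss lngdp, auto qs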

I would be very thankful if anyone could provide me with an insight.

Many thanks.

Problem with matching observations

Hi all,

I am fairly new to Stata and I have an issue that I am not able to resolve. In my dataset, I have two variables that I obtained after some preliminary work on the data. The values look like this:


Date     RateGC   RateSpecial

4/1/10   X1       .
4/1/10   .        Y1
5/1/10   .        Y2
5/1/10   X2       .
...


and it goes on like this with no clear pattern. I would like to obtain two series like this:

Date     RateGC   RateSpecial

4/1/10   X1       Y1
5/1/10   X2       Y2

So far I have tried something like this:


egen datecounter = group(date)
foreach x of varlist RateGC RateSpecial {
    bys datecounter: sum `x'
    bys datecounter: replace `x' = r(mean) if `x' == .
}

The reasoning was to replace each day's missing value with that day's average (which equals the single nonmissing value of X or Y on that day), so as to match the observations, and then keep one observation per day. It does not work: with this code, the mean of the last day in the dataset replaces the missing values on every date. I guess I need a nested loop, but I cannot seem to work it out. Could you help, please? Thank you.
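
A minimal sketch of an alternative that keeps the per-day computation inside the by-groups, so no r(mean) from the last group is carried over (it follows the same reasoning, via -egen-):

Code:
foreach x of varlist RateGC RateSpecial {
    egen double m`x' = mean(`x'), by(date)
    replace `x' = m`x' if missing(`x')
    drop m`x'
}
bysort date : keep if _n == 1           // one observation per day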

Extraction of Yearly time series Data out of Monthly panels of multiple companies

Dear Sir,

I first calculated rolling betas for each of my panel companies, and then manually calculated a variable (Ki). What I now need is to extract yearly time-series values of this Ki variable, averaged across all panels within each year. I mean I want to extract data like this:

Year   Ki
1997   (average of all panels' monthly data for 1997)
1998
1999
2000
...

At first I was using cross-sectional code to form cross-sectional averages:

by id_firm, sort: egen AVGKi = mean(ki)

but now I need your expert help on how to aggregate over both id_firm and Month_year (the time variable) so that in the end I obtain the table above. Thank you very much.
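
A minimal sketch (assuming Month_year is a monthly %tm date variable and ki is the variable to average, both assumptions): extract the calendar year and collapse to one mean per year.

Code:
preserve
gen year = year(dofm(Month_year))
collapse (mean) AvgKi = ki, by(year)
list year AvgKi
restore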



To avoid double counting unique ICD codes within the same ID

Hi everyone

I have a dataset which contains 25 ICD-10 diagnostic codes per observation. Some people have multiple visits, so the same ID appears multiple times.

I want to first count the number of unique ICD codes per patient, but somehow my tag double-counts codes that appear more than once for the same ID. Secondly, I need to sum each ICD code (condition, e.g. diabetes) for the whole sample. Please see the sample of data below.


Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input double SubjectId str7(Dx1 Dx2 Dx3 Dx4) str5 Dx5
100 "K528" "E119" " "    " "    " "  
101 "K228" "K20"  "J960" " "    "I251"
101 "I251" "I10"  "E780" "E119" "N998"
101 "J960" "J440" "E112" "I10"  "E780"
101 "K228" "K20"  "K259" "B968" " "  
101 "B964" "E119" "E039" " "    " "  
101 "J440" "B956" "B964" "T814" "D649"
101 "T857" "I251" "E119" "I10"  "E780"
103 "G410" "F29"  "I698" "M625" " "  
104 "C73"  "C770" "J441" " "    " "  
105 "J154" " "    " "    " "    " "  
105 "J188" " "    " "    "J188" " "  
end


I have tried a few examples from the help file but still did not get any to work. Please advise.

gen long id = _n
keep id Dx* SubjectId
reshape long Dx, i(id) j(_j)
egen diabetes = max(Dx == "E119"), by(id)


Also tried

gen long order = _n
by Dx1 (order), sort: gen uniquecodes = _n==1

list SubjectId Dx1 _n uniquecodes

sort order
replace uniquecodes = sum (uniquecodes)

The second option does count unique codes correctly and does not double-count within the same ID. But once I use a loop to apply the same method to Dx1-Dx5, the counts are wrong.
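
A minimal sketch, reusing the dataex variables above: reshape the Dx slots long, drop blanks and within-patient duplicate codes, then count distinct codes per patient and patients per code.

Code:
gen long row = _n
reshape long Dx, i(row) j(slot)
drop if trim(Dx) == ""
bysort SubjectId Dx : keep if _n == 1   // one row per code per patient
bysort SubjectId : gen n_codes = _N     // distinct codes per patient
bysort Dx : gen n_patients = _N         // patients ever coded with each Dx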



Your assistance will be highly appreciated.


Very strange: same Stata versions produce different results?!

$
0
0
Hi,

my colleague and I use the same Stata version (14). My colleague has prepared some estimations (GMM) in a .do file that produce one set of results; when I run the same file, Stata won't let me. The error I get is:

variance-covariance matrix of the two-step estimator is not full rank
Two-step estimator is not available. One-step estimator is available and variance-covariance matrix provides
correct coverage.


I tried to google this issue, but apparently no one has asked about this error before. Essentially, Stata won't let me continue unless I remove the "twostep" option from the GMM command. But then the results are vastly different.

Does anyone have a solution? I am using the same dataset and the same do-file as my colleague. It must be something in the way Stata handles the data.

Thanks!

P.S.: just to rule that one out - we both have original, licensed Stata versions.

Using marginsplot with externally estimated data

Hi all,

Is it possible to import the results of a regression analysis (done with Mplus) into Stata and then use the -marginsplot- command to produce an AME plot for an interaction effect included in this regression? Mplus will let me calculate all the relevant marginal effects and confidence intervals, but it only produces rather ugly graphs of the interaction effect. Stata's -marginsplot-, on the other hand, produces nice-looking graphs but seems to require -margins- to run directly before it. Is there any workaround for this? That is, can I estimate the effects in Mplus, save these estimates, import them into Stata, and use them in -marginsplot-? Or maybe import the regression results and let -margins- run on their basis? I don't want to estimate the regression itself in Stata, because Mplus's FIML estimator is capable of retaining considerably more cases (imputation is out of the question for time reasons).
Sorry if this question has been posted before - my searching has not turned up anything.
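
One possible workaround, as a minimal sketch (all numbers are hypothetical placeholders): -marginsplot- needs the results left behind by -margins-, but externally computed effects and confidence limits can be plotted from a Stata matrix with the user-written -coefplot- (ssc install coefplot).

Code:
matrix M = (.12, .05, .19 \ .25, .16, .34)
matrix rownames M = low high
matrix colnames M = ame ll ul
coefplot matrix(M[,1]), ci((M[,2] M[,3])) vertical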

Thanks in advance for any suggestions!

Interaction effects in a limited size dataset

Dear Statalisters,

I would like to ask a question that has been bugging me for some weeks now and that I cannot figure out on my own. As mentioned in the title, I am trying to find evidence of interaction effects in a dataset with a limited number of observations (N = 2337). The three explanatory variables are on a scale from one to five. All variables are statistically significant (p < .05) in a simple linear model.

What I tried so far
I first made three dummy variables (one for each explanatory variable) indicating whether the value is higher (1) or lower (0) than that variable's average. I did this because of the limited sample size, arguing that using the variables directly would not yield any results due to too little information being available. Then I formed categories based on these dummies, giving me 8 different categories: hhh (high, high, high), hhl (high, high, low), hlh, lhh, lhl, hll, llh and lll. I entered these categories into a linear regression (omitting the lll category due to multicollinearity) and found two categories significant at .05 and one significant at .10. While this appears to give some evidence of interaction effects, I am not quite sure this is the best way.

For example, the category lhl is statistically significant, but I am not sure how to interpret this. Comparing the result to the signs of the individual explanatory variables, it suggests some interaction effects. Compared to the omitted category (lll), the result is consistent.

While I did receive some suggestions on alternative approaches, I am not convinced they are better. One suggestion, for example, was to run 4 separate regressions comparing two opposite categories (for example: hhl vs llh). Alternatively, I ran a regression including an "i.high_alpha##i.high_beta##i.high_gamma" term (the dummies indicating high or low), while excluding the original three explanatory variables. Would this be a good approach? Any help is much appreciated.
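
For reference, a minimal sketch of that factorial specification (the names follow the post; the outcome y is an assumption). -margins- then reports the cell means, which are easier to interpret than the raw category dummies:

Code:
regress y i.high_alpha##i.high_beta##i.high_gamma
margins high_alpha#high_beta#high_gamma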


Regards,

Remco

using the keep option in outreg without defining specific variables, to keep the first variable of each regression output

Hi,

I want to create a table of the following structure.
Variable     Spec 1   Spec 2   Spec 3   Spec 4
Health       XXX      YYY      FFF      ZZZ
Education    XXX      YYY      FFF      ZZZ
Governance   XXX      YYY      FFF      ZZZ
Spec 1-Spec 4 represent four different model specifications. Health, Education and Governance represent estimation results from different models. I would like to append/merge the specific results underneath each other. In the following post a solution is suggested: http://www.stata.com/statalist/archive/2013-02/msg00040.html

However, in that example they specify the names of the variables to keep. Since I have a lot of models to append, and I create the outputs in a loop, I would always like to keep the first variable (and then present it in the table). So far I have used eststo/esttab to present my results, but I haven't found an equivalent solution yet.

I don't think it is useful, but here are parts of my output generation.

Code:
su Sector_cat, meanonly
foreach i of num 1/`r(max)' {
    eststo Sector`i'_2_Tobit:   tobit Sector_share $x i.period if Sector_cat == `i', ll(0)
    eststo Sector`i'_2_1_Tobit: tobit Sector_share $y i.period if Sector_cat == `i', ll(0)
    xtset panel_id period
    eststo Sector`i'_2_RE:   xttobit Sector_share $x i.period if Sector_cat == `i', ll(0)
    eststo Sector`i'_2_1_RE: xttobit Sector_share $y i.period if Sector_cat == `i', ll(0)
    eststo Sector`i'_2_FE:   xttobit Sector_share $x $z i.period if Sector_cat == `i', ll(0)
    eststo Sector`i'_2_1_FE: xttobit Sector_share $y $zz i.period if Sector_cat == `i', ll(0)
    eststo Sector`i'_2_zoib:   zoib Sector_share $x i.period if Sector_cat == `i'
    eststo Sector`i'_2_1_zoib: zoib Sector_share $y i.period if Sector_cat == `i'
    // eststo Sector`i'_2_pantob:   pantob Sector_share $x $period if Sector_cat == `i'
    // eststo Sector`i'_2_1_pantob: pantob Sector_share $y $period if Sector_cat == `i'
}
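
One possible route, as a minimal sketch (hypothetical, building on the stored estimates above): after each estimation, read the name of the first coefficient off e(b) and pass it to esttab's keep() option.

Code:
tobit Sector_share $x i.period if Sector_cat == 1, ll(0)
local first : word 1 of `: colnames e(b)'
esttab, keep(`first') se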