Channel: Statalist

Regression on Panel Data

Hello,

I am having a bit of trouble running the xtreg command on my dataset. I have panel data on 24 countries over a 12-year period (2001-2012) for 13 variables (HRlog, DV, dGDP, GDP, CPI, GINI, HDI, dUP, Dpenalty, PM1, PM2, PM3, UEM). I am analyzing homicide rates in the developing world and want to know whether capital punishment has a deterrent effect on them. My dependent variable is HRlog, the log of the homicide rate per 100,000 people, while the remaining socio-economic indicators, such as the Gini index and the HDI, are my explanatory variables. DV (legislation on domestic violence) and Dpenalty (capital punishment) are dummy variables coded 0 or 1: 0 means the country has no legislation on domestic violence or no death penalty, while 1 means it does. I have also created year dummies y2001-y2012.

Before running xtreg, I set up my data as panel data with the xtset command, and Stata recognizes it as panel data.

So I am running:
Code:
xtreg HRlog dGDP GDP GINI HDI CPI PM1 PM2 PM3 UEM dUP NoDeath Death NoLegis Legis y2001 y2002 y2003 y2004 y2005 y2006 y2007 y2008 y2009 y2010 y2011 y2012, fe

But I get the error "no observations, r(2000)". What am I doing wrong?
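If it helps to diagnose, my understanding is that r(2000) means no observation is nonmissing on every variable in the command at once. A sketch of the check I am planning to run, using the variable list above:
Code:
* count observations that are complete on every regression variable
misstable summarize HRlog DV dGDP GDP CPI GINI HDI dUP Dpenalty PM1 PM2 PM3 UEM
egen nmiss = rowmiss(HRlog DV dGDP GDP CPI GINI HDI dUP Dpenalty PM1 PM2 PM3 UEM)
count if nmiss == 0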

How to check for significant heterogeneity for a categorical variable with 2 interactions

Dear Statalist,

I am estimating whether the elderly age in a healthier way depending on the long-term care (LTC) system they live under. I do this by grouping European countries into different LTCsystem groups. I regress health (grip strength) on factors determining health and an interaction of age with the LTC-system categorical variable, to see whether the age-health slopes differ statistically between the LTC-system groups. I estimate this using random effects. However, I also included age squared, which leaves me with two interactions.

Code:
xtreg maxgrip c.age i.LTCsystem c.age#i.LTCsystem c.age#c.age c.age#c.age#i.LTCsystem bmi if female==1, re vce(cluster mergeid_n)

How can I test whether the age-health slopes differ significantly across the four clusters? (I know I can do testparm i.LTCsystem#c.age or test 2.LTCsystem#c.age = 3.LTCsystem#c.age, but then I can only conclude something about the first polynomial of age.)
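(What I suspect I need is a joint test over both sets of interaction terms, something like the sketch below, though I am not sure it is valid:)
Code:
* jointly test all age and age-squared interactions with LTCsystem,
* i.e. that the age-health profiles coincide across the four groups
testparm i.LTCsystem#c.age i.LTCsystem#c.age#c.age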

Any help would be greatly appreciated.


Output in case you need it (coefficients, standard errors in parentheses; dependent variable: max. of grip strength measure):

VARIABLES                        Coef.      (SE)
Age at interview (in years)       0.380*    (0.204)
LTCsystem = 2, Cluster 2         12.961    (10.910)
LTCsystem = 3, Cluster 3         14.001    (11.303)
LTCsystem = 4, Cluster 4         34.450**  (13.460)
1b.LTCsystem#co.age              (base)
2.LTCsystem#c.age                -0.295     (0.288)
3.LTCsystem#c.age                -0.386     (0.298)
4.LTCsystem#c.age                -0.963***  (0.357)
c.age#c.age                      -0.005***  (0.001)
1b.LTCsystem#co.age#co.age       (base)
2.LTCsystem#c.age#c.age           0.002     (0.002)
3.LTCsystem#c.age#c.age           0.002     (0.002)
4.LTCsystem#c.age#c.age           0.007***  (0.002)
Body mass index                   0.077***

Expanding a time series dataset by one month

Hi,

I have panel data and I would like to expand the dataset by one month. So I would like to expand each "stock" time series to "2020m4". Here is the data I have:
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input double stock float(date exret)
 1 722           .
 1 721  .016571907
 1 720 -.021690037
 2 722  -.08885346
 2 721   .03352593
 2 720 -.013057826
 3 722   -.1733951
 3 721   .05968564
 3 720  -.13112126
 4 722  .025679555
 4 721  -.04374008
 4 720   .05557474
 5 722 -.023603665
 5 721 -.005209302
 5 720   .05193632
 6 722   .01481354
 6 721  -.04118994
 6 720  .033280447
 7 722   .04653623
 7 721  -.04578867
 7 720   .04011091
 8 722   .07355644
 8 721  -.03574601
 8 720   .08482119
 9 722  -.08222993
 9 721   .06095662
 9 720   -.0952904
10 722  -.05354553
10 721 .0007637492
10 720  -.09510595
end
format %tm date
label values stock unit_id
label def unit_id 1 "130062", modify
label def unit_id 2 "130088", modify
label def unit_id 3 "130298", modify
label def unit_id 4 "130502", modify
label def unit_id 5 "130591", modify
label def unit_id 6 "131844", modify
label def unit_id 7 "132808", modify
label def unit_id 8 "134057", modify
label def unit_id 9 "13466Q", modify
label def unit_id 10 "13471D", modify
I have tried "predict" thinking that it might expand the dataset automatically but that is not the right way to proceed. Please help!


Difference-in-Differences Design: Model Specification

Hey everyone,

I have a question concerning a model specification for a difference-in-differences design. I have cross-sectional data over 7 time periods in one country. Treatment kicks in in rounds 5, 6, and 7 and is defined at the individual level. I am running the following estimation, controlling for leads and lags:


y_igt = delta_g + alpha_t + Treatment_igt * T + X_igt * sigma + u_igt


i = individual i living in district g at time t.
delta and alpha are district and time fixed effects.
Treatment is defined at the individual level, taking the value 1 if i was treated and 0 otherwise. Treatment is interacted with survey-round dummies T taking the value 1 for rounds 2, 3, 4, 5, 6, 7, which allows me to estimate leads and lags with round 1 as the baseline.
X are individual covariates, and u is the error term (standard errors are clustered).
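In Stata I would code this roughly as in the sketch below, with placeholder names y, treated, and x1-x2, and round 1 as the omitted baseline; I am not sure this is right:
Code:
* district and round fixed effects, with treatment interacted with round
* dummies; ib1.round omits round 1, so its interaction is the baseline
regress y i.district i.round ib1.round#i.treated x1 x2, vce(cluster district)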

Is this model specification correct? Do I also need to include the interaction effect for round 1?

All the best,

Simeon



Which F stats should I look at with ivlasso?

Hi,

I'm running an IV regression with ivlasso in Stata. It reports several different first-stage F statistics; does anyone know which is the right one to look at? I'm clustering standard errors, so it should be one of the last three F statistics. By the way, I also wonder why LASSO gives such large F statistics: does it calculate them differently than the usual ivreg? Thanks!
[attached screenshot of the ivlasso output]

Dropping multiple missing observations

Currently I'm working on a project in which I use item response theory. I have 8 variables from a large number of cases and would like to recover the latent variable. The problem is that there is quite a lot of missing data. I have already worked out how to impute some of the values, but now I want to fit another model in which I drop all observations that have missing values on more than 4 of the 8 variables, because imputing those can be considered unreliable. I've looked through multiple manuals but can't seem to find what I'm looking for.
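What I have in mind is something like the following sketch, with v1-v8 standing in for my eight items:
Code:
* count missing values across the 8 items, then drop observations
* that are missing on more than 4 of them
egen nmiss = rowmiss(v1-v8)
drop if nmiss > 4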

LaTeX font on eps figures: cannot get writepsfrag package to work

Hello,

I cannot get writepsfrag to work. I am trying to get the same fonts on my Stata-produced figures as in the rest of my LaTeX document.

I am running the example from the writepsfrag help file:

Code:
* ssc install writepsfrag
#delimit;
twoway function y=normalden(x), range(-4 4)
text(0.125 0 "\textbf{\color{blue}{Normal PDF:}}")
text(0.090 0 "\(y=\frac{1}{\sigma\sqrt{2\pi}}e^{\frac{-(x-\mu)^2}{2\sigma^2}}\)")
xlabel(-4 "\(-4\sigma\)" -2 "\(-2\sigma\)" 0 "\(\mu\)" 2 "\(2\sigma\)" 4 "\(4\sigma\)")
xtitle("All text is set in {\LaTeX} font") ytitle("\(y\)");
graph export normal.eps, as(eps);
writepsfrag normal.eps using normal.tex, replace body(figure, caption("Normal Probability Density Function"));
#delimit cr
and adding it to Overleaf using the following code:

Code:
\documentclass[varwidth=true]{standalone}
\usepackage[utf8]{inputenc}
\usepackage{psfrag}

\begin{document}

\begin{figure}[htbp]
\centering
\psfrag{\\textbf{\\color{blue}{Normal PDF:}}}[c][c][1][0]{\normalsize \textbf{\color{blue}{Normal PDF:}}}
\psfrag{\\(y=\\frac{1}{\\sigma\\sqrt{2\\pi}}e^{\\frac{-(x-\\mu)^2}{2\\sigma^2}}\\)}[c][c][1][0]{\normalsize \(y=\frac{1}{\sigma\sqrt{2\pi}}e^{\frac{-(x-\mu)^2}{2\sigma^2}}\)}
\psfrag{0}[c][c][1][0]{\normalsize 0}
\psfrag{.1}[c][c][1][0]{\normalsize .1}
\psfrag{.2}[c][c][1][0]{\normalsize .2}
\psfrag{.3}[c][c][1][0]{\normalsize .3}
\psfrag{.4}[c][c][1][0]{\normalsize .4}
\psfrag{\\(y\\)}[c][c][1][0]{\normalsize \(y\)}
\psfrag{\\(-4\\sigma\\)}[c][c][1][0]{\normalsize \(-4\sigma\)}
\psfrag{\\(-2\\sigma\\)}[c][c][1][0]{\normalsize \(-2\sigma\)}
\psfrag{\\(\\mu\\)}[c][c][1][0]{\normalsize \(\mu\)}
\psfrag{\\(2\\sigma\\)}[c][c][1][0]{\normalsize \(2\sigma\)}
\psfrag{\\(4\\sigma\\)}[c][c][1][0]{\normalsize \(4\sigma\)}
\psfrag{All text is set in {\\LaTeX} font}[c][c][1][0]{\normalsize All text is set in {\LaTeX} font}
\resizebox{1\linewidth}{!}{\includegraphics{normal.eps}}
\caption{Normal Probability Density Function}
\end{figure}

\end{document}
However, the generated figure does not come out the way it should:
[attached screenshot of the rendered figure]
Am I doing something wrong? Is there an alternative way to produce eps files in Stata with a font that matches the LaTeX document? I tried downloading LM Roman 10 and setting it as the Stata graph font. It works within Stata, but when I export graphs as eps, they revert to the standard font.

Estimating asymmetrical confidence intervals for ICC using -nlcom-

Dear all
Out of curiosity I want to reproduce the calculations from:
Code:
cls
use https://www.stata-press.com/data/r16/judges, clear
icc rating target judge, format(%6.3f)
using a mixed regression and possibly -nlcom-:
Code:
mixed rating, reml noheader nolog nofetable ||_all: R.target ||_all: R.judge
nlcom ///
    ( individual: exp(2*_b[lns1_1_1:_cons]) / (exp(2*_b[lns1_1_1:_cons]) + (exp(2*_b[lns1_2_1:_cons]) + exp(2*_b[lnsig_e:_cons]))) ) ///
    ( average: exp(2*_b[lns1_1_1:_cons]) / (exp(2*_b[lns1_1_1:_cons]) + (exp(2*_b[lns1_2_1:_cons]) + exp(2*_b[lnsig_e:_cons])) / 4) )
The point estimates are equal for the two methods. The same is not true for the confidence intervals.

The ICC follows an F distribution, which is asymmetrical. This is clearly the problem for -nlcom-.

Without success, I have tried stepwise estimation with -nlcom-, estimating the log variances in a first step with the post option.
Note that I have to estimate the total variance as the sum of the variances of two independent components before (or at the same time as) estimating the ICC.
Also without success, I have tried building the estimation of the log variances and the ICC into a single -nlcom- call, like
Code:
nlcom (log_a: ...) (log_b: ...) (ICC: exp(log_a - log_b))
I have the following questions:
  1. Is there a way of tricking -nlcom- into getting more exact confidence intervals? Or is it a lost cause?
  2. Are the approaches I've used with -nlcom- valid?
  3. If so, how do I do it right?
Looking forward to hearing from you


Forecasting a variable

Hi,

I am using a panel dataset, and for one of my variables I only have data from 2014 to 2017.
For my dataset to be balanced, I need to forecast the 2018 values of this variable.
Is there a way to do this in Stata?
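What I am picturing is roughly the sketch below (id, year, and x are placeholders for my panel, time, and forecast variables; I am not sure a linear extrapolation is defensible here):
Code:
* add the 2018 row to each panel, then extrapolate x linearly within panel
xtset id year
tsappend, add(1)
bysort id: ipolate x year, gen(x_full) epolate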

Thanks

Linear regression using a time variable

Hi all, I'm having some difficulties in doing a linear regression for my research.

My dataset consists of 361 respondents, each of whom was 'exposed' to two scandals on two different occasions. That means each respondent has four rows: two from before a scandal and two from after. We are working with four types of scandals and three response types in order to check their effects on the dependent variable, "Watchingvids". Each respondent is exposed to two of the 12 possible scenarios (scandals x responses) (see file).

How can I run a linear regression on Watchingvids that takes into account the difference between before and after the scandal?

I know this might sound vague, but please feel free to ask additional questions so that I can make it clearer.

Thanks in advance for your help

Kind regards, Remi Letaief



Computing the consistency ratio and consistency index for analytic hierarchy process with mata

Hey everyone,

I'm completely new to Stata, and to Mata as well. I'm trying to figure out whether it is possible to compute the consistency ratio and consistency index of a matrix using Mata. I guess eigenvalues and eigenvectors play a role here, but that's all I know so far.
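From what I have read, lambda_max is the principal eigenvalue of the pairwise comparison matrix, CI = (lambda_max - n)/(n - 1), and CR = CI/RI with Saaty's random index RI. A sketch of what I am hoping is possible, on a made-up 3x3 matrix (I am not sure this is the right use of eigensystem()):
Code:
mata:
A = (1, 3, 5 \ 1/3, 1, 3 \ 1/5, 1/3, 1)  // example pairwise comparison matrix
n = rows(A)
X = .                                    // container for eigenvectors
L = .                                    // container for eigenvalues
eigensystem(A, X, L)                     // L is a row vector of (complex) eigenvalues
lambda_max = max(Re(L))                  // principal eigenvalue
CI = (lambda_max - n) / (n - 1)          // consistency index
RI = 0.58                                // Saaty's random index for n = 3 (from published tables)
CR = CI / RI                             // consistency ratio
lambda_max, CI, CR
end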

Thanks in advance for any help!
Best wishes

IV using panel data and fixed effects

Hi,

I am using panel data on women's wellbeing and its influencing factors; I have therefore been using a fixed-effects panel regression (xtreg ..., fe cluster(pidp)) so far.
To establish causality I am trying to use an IV regression. I assume that I need to account for the panel nature and fixed effects, so I have so far used the following:
Code:
xtivreg $Y1 $X1 ($Y2 = $X2), fe
where Y1 is the dependent variable (life satisfaction),
Y2 is the endogenous variable (housewife),
X1 are the controls (marital status, children, region, income, year, age), and
X2 is the instrument (gender employment ratio by region and year).

I have seen that there is also an IV regression that accounts for the endogenous variable being binary, which mine is: treatreg $Y1 $X1, treat($Y2 = $X2 $X1), which makes the first stage a probit. However, this does not account for fixed effects. Could someone please provide guidance on which method is best to use?

I also have some other job statuses (unemployed, part-time, full-time); do these need to be included in the IV regressions, or are they accounted for as '0' in the binary variable 'housewife'?

Many thanks,

Ash

Three-year volatility

I have been looking on the forum for a topic about three-year volatility but I didn't find what I wanted.

I have panel data that looks like this:

Firm ID   CFO     Fiscal Year
1         0.042   2011
1         0.057   2012
1         0.032   2013
1         0.045   2014
1         0.031   2015
1         0.030   2016
2         0.041   2011
2         0.048   2012
2         0.051   2013
2         0.050   2014
2         0.034   2015
2         0.043   2016
2         0.041   2016


I would like to calculate the three-year volatility (measured as the standard deviation over years t, t-1, and t-2). I have been trying to compute this, but it didn't work.
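What I have been attempting looks roughly like this sketch (firm_id, fiscal_year, and cfo are placeholders for my variables, and it assumes one observation per firm-year, so the duplicate 2016 row would need to be resolved first):
Code:
* rolling standard deviation of CFO over years t, t-1, and t-2
xtset firm_id fiscal_year
gen cfo_l1 = L.cfo
gen cfo_l2 = L2.cfo
egen cfo_sd3 = rowsd(cfo cfo_l1 cfo_l2)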
Does anyone know how to do this?

Thank you in advance!

Latent class analysis using gsem - Cross validation

Hi,
I'm trying to learn LCA/LPA using the gsem command in Stata by working through Masyn (2013) - cited in SEM example 52 - and trying to replicate the steps in her empirical examples.

In her article, it is recommended to cross-validate the optimal number of classes in large samples.
In particular:
  • Divide the sample into two subsamples, A and B.
  • Obtain the optimal number of classes (say, K) in one of the subsamples, say A, using a long procedure explained in the text.
  • Estimate model (1): a K-class model in subsample B, fixing all parameters at the values obtained from the K-class model in subsample A.
  • Estimate model (2): an unrestricted K-class model in subsample B.
  • Test model (1) against model (2).
My question is: is using the @ sign on each coefficient and equation separately the only way to estimate the restricted model (1)? (That would be time-consuming with a large number of indicators.) Or are there other ways to do it? Moreover, in the LPA case one would need to fix the estimated variances and covariances as well. In particular, how can one restrict the entire e(b) matrix to specific numbers?
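For concreteness, here is a sketch of the kind of restriction I am after, with hypothetical binary indicators x1-x5 and a subsample marker; I am not sure whether supplying from() together with iterate(0) is a legitimate way to hold every parameter fixed:
Code:
* fit the K-class model (here K = 3) in subsample A and save its parameters
gsem (x1 x2 x3 x4 x5 <-, logit) if subsample == 1, lclass(C 3)
matrix b = e(b)

* evaluate the same model in subsample B with all parameters held at the
* subsample-A estimates: from() supplies them, iterate(0) prevents updating
gsem (x1 x2 x3 x4 x5 <-, logit) if subsample == 2, lclass(C 3) from(b) iterate(0)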

Thanks in advance,
Emma


Reference:
Masyn, K. E. (2013). 25 latent class analysis and finite mixture modeling. The Oxford handbook of quantitative methods, 551.

Averaging values from different "sum, detail outputs"

Hi everyone,

I was wondering whether there is a way to average -summarize, detail- output over various months.
First, I sorted my dataset by month and had Stata produce summary statistics for each month via "bysort month: summarize variable_of_interest, detail".
Now I would like the time-series average of the different statistics produced, e.g., the mean, the standard deviation, the 95th percentile, and so on.
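The closest I have come to a sketch of what I want is something like this (I am not sure collapse is the right tool):
Code:
* compute the per-month statistics as a dataset, then average them over months
preserve
collapse (mean) mean_v = variable_of_interest (sd) sd_v = variable_of_interest ///
    (p95) p95_v = variable_of_interest, by(month)
summarize mean_v sd_v p95_v
restore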

Does anybody have an idea how to solve this?

Any help is much appreciated.

Best,
Phil

Xtabond2 Newbie question

Hi,
I am quite new to dynamic panels. For my project, I want to run a simple model of government approval rates on a dataset of quarterly approval data for 20 countries over around 30 years (some countries have less data). Because the approval rate in any given quarter also depends on the approval rate in the previous quarter, I use the following model:
Code:
xtabond2 approval l.approval noparties inflation growth coalition c.wars##c.right, gmm(approval inflation growth, lag(1 2) collapse eq(diff)) iv(since2 noparties coalition wars right, eq(diff)) robust

noparties is the number of parties, coalition indicates a coalition government, wars indicates an ongoing military conflict, and right is the ideological orientation of the government. Inflation and growth are yearly macroeconomic indicators.
So I have two questions:
1- Is the way I use xtabond2 correct?
2- If I change lag(1 2) to something like lag(1 1) or lag(0 3), the coefficients of almost all the variables change quite dramatically. Why is this the case? And how can I choose proper values for the lag intervals?

Compress doesn't work

Hi,
I tried the -compress- command to save space, but it didn't work. Below is a description of my variables; there appears to be a lot of space that could be saved. Is there any way to compress all the variables?
I did not set the formats; the dataset was converted directly from a .csv file.
Any code or link about this problem is much appreciated.

Code:
. compress
  (0 bytes saved)

. de

--------------------------------------------------------------------
              storage   display    value
variable name   type    format     label      variable label
--------------------------------------------------------------------
id              str184  %184s                 id
type            str91   %91s                  type
city            str46   %46s                  city

Help with coefplot command

Hi everyone!

I am trying to make a coefplot that shows the hazard ratios of a categorical variable (PQI, 4 categories) stratified by 3 dichotomous variables: age, physical activity, and protein intake.

I want the graph to show, for each variable (age, physical activity, protein intake), the hazard ratios for each category of the PQI variable, that is:

age:
-PQI4 = 1 if age==0 .....
-PQI4 = 1 if age==1 .....
-PQI4 = 2 if age==0 .....
-PQI4 = 2 if age==1 .....
-PQI4 = 3 if age==0 .....
-PQI4 = 3 if age==1 .....
-PQI4 = 4 if age==0 .....
-PQI4 = 4 if age==1 .....

physical activity:
-PQI4 = 1 if physical activity==0 .....
-PQI4 = 1 if physical activity==1 .....
-PQI4 = 2 if physical activity==0 .....
-PQI4 = 2 if physical activity==1 .....
-PQI4 = 3 if physical activity==0 .....
-PQI4 = 3 if physical activity==1 .....
-PQI4 = 4 if physical activity==0 .....
-PQI4 = 4 if physical activity==1 .....

protein intake:
-PQI4 = 1 if protein intake==0 .....
-PQI4 = 1 if protein intake==1 .....
-PQI4 = 2 if protein intake==0 .....
-PQI4 = 2 if protein intake==1 .....
-PQI4 = 3 if protein intake==0 .....
-PQI4 = 3 if protein intake==1 .....
-PQI4 = 4 if protein intake==0 .....
-PQI4 = 4 if protein intake==1 .....



I have tried to do it in 2 ways, but both go wrong:

1)

Code:
foreach v of varlist edad_r prot_intake PA {
    di "---`v'---"
    stcox i.PQI4 $mod2 if `v'==0 $opt
    estimates store `v'_0
    stcox i.PQI4 $mod2 if `v'==1 $opt
    estimates store `v'_1
}

estimates dir

Code:
coefplot (edad_r_0, asequation(Edad) \, mcolor(blue) ciopts(lcolor(blue)) label(edad ≤40 años)) ///
(prot_intake_0, asequation(Ingesta de proteinas) \, mcolor(blue) ciopts(lcolor(blue)) label(≤25% TEI)) ///
(PA_0, asequation(Actividad física) \, mcolor(blue) ciopts(lcolor(blue)) label(≤20 MET-s/week)) ///
(edad_r_1, asequation(Edad) \, mcolor(green) ciopts(lcolor(green)) label(edad >40 años)) ///
(prot_intake_1, asequation(Ingesta de proteinas) \, mcolor(green) ciopts(lcolor(green)) label(>25% TEI)) ///
(PA_1, asequation(Actividad física) \ , mcolor(green) ciopts(lcolor(green)) label(>20 MET-s/week)) ///
, keep(*PQI4) xline(1) title("Bivariate effects on price by car type") ///
baselevels eform xsize(6) ysize(9) legend(col(3) row(2))



It goes wrong because it groups the estimates by categories of the PQI variable ...

[attached graph]

2)


Code:
coefplot (edad_r_0, asequation(Edad) \ , mcolor(blue) ciopts(lcolor(blue)) label(edad ≤40 años)) ///
(prot_intake_0, asequation(Ingesta de proteinas) \ , mcolor(blue) ciopts(lcolor(blue)) label(≤25% TEI)) ///
(PA_0, asequation(Actividad física) \ , mcolor(blue) ciopts(lcolor(blue)) label(≤20 MET-s/sem)) ///
(edad_r_1, asequation(Edad) \ , mcolor(green) ciopts(lcolor(green)) label(edad >40 años)) ///
(prot_intake_1, asequation(Ingesta de proteinas) \ , mcolor(green) ciopts(lcolor(green)) label(>25% TEI)) ///
( \PA_1, asequation(Actividad física) \, mcolor(green) ciopts(lcolor(green)) label(>20 MET-s/sem)) ///
,keep(*PQI4) xline(1) title("Stratified analysis of PQI categories") ///
baselevels eform xsize(6) ysize(9) legend(col(3) row(2))



I don't know what I am doing wrong, because if I don't remove the slash from the beginning of the last row ("\PA_1..."), the HRs of the PQI variable are not grouped by the 3 dichotomous variables.



[attached graph]


Can someone help me?
Thanks!

Fixed effects regression tests in master's thesis

Hi,

I'm currently writing my master's thesis in economics, analysing the effect of privatization of public housing on criminality in Swedish municipalities (regions). For this I have a strongly balanced panel dataset with data on housing and reported crimes in Swedish regions: 290 panels (regions) and 11 years. So I guess this is a large-N, small-T dataset.

For my regressions I use a fixed-effects model with year fixed effects and a number of regional control variables. I have IHS-transformed the variables (inverse hyperbolic sine, similar to the natural logarithm). I use robust standard errors, clustered at the regional level. Simplified, my preferred Stata commands look like this:

Code:
xtset region year
xtreg crime privatization controlvarlist i.year, fe vce(cluster region)

Now the methodological choices seem reasonable to me given the nature of the data. But I should probably include some tests in the thesis to motivate those choices: tests for heteroskedasticity and serial correlation, a test that fixed effects is really the right approach, and so on.

Which tests and other robustness checks would you advise me to use? If you know a well-written master's thesis with a similar approach that I could use for comparison, I would be thankful if you shared it.
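For reference, the battery I have been considering looks like the sketch below (xtserial and xttest3 are user-written: findit xtserial, ssc install xttest3; I am not sure this set is complete or appropriate):
Code:
* Hausman test of FE vs RE (its classic form assumes homoskedastic errors)
xtreg crime privatization controlvarlist i.year, re
estimates store re
xtreg crime privatization controlvarlist i.year, fe
estimates store fe
hausman fe re

* Wooldridge test for serial correlation in panel data
xtserial crime privatization controlvarlist

* modified Wald test for groupwise heteroskedasticity, after -xtreg, fe-
estimates restore fe
xttest3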

Thank you very much in advance for any help or tips. My supervisor hasn't answered me for over a month, so I really have no one else to turn to.

Best regards,
Mikaela

Merging Files

Hi all,

Currently, I am using IHDS data (both rounds) for my project. To run individual fixed effects, I need to merge the datasets so that they form a panel. However, I am struggling to merge the two files. The basic premise of the data is that 40,000+ households were surveyed in 2005 and 83% of those households were reinterviewed in 2012.

I have tried to follow this guide (https://www.ihds.umd.edu/guide-merging-files) multiple times, but it does not seem to work.

If anybody is familiar with these data and knows how to merge the files, it would be greatly appreciated.
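For what it is worth, the shape of what I am attempting is roughly this (the filenames and the key variable hhid are placeholders; the actual IHDS key variables are described in the guide above):
Code:
* merge the two rounds on the household identifier
use ihds_2005_households, clear
merge 1:1 hhid using ihds_2012_households
keep if _merge == 3   // keep households interviewed in both rounds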

Thank you