Channel: Statalist
Viewing all 72776 articles
Browse latest View live

Identifying observations that meet specific criteria

Hi everyone, I have a dataset that is arranged as in the example table below. There are millions of cases that occurred in several hundred facilities. Each case occurred in a specific year during the study period, 2010 to 2014.

I have two questions:

1) How can one identify facilities that had at least one case during EACH year of the study period? (e.g. facility 4, in red)
2) How can one identify facilities that had at least one case during the first year (2010) AND the last year (2014) of the study period? (e.g. facility 1, in green)




Thank you very much for your help.
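A minimal sketch for both questions, assuming one row per case with variables named facility and year (hypothetical names, since the example table did not come through) and assuming every year in the data falls within 2010-2014. The data below are made up so that facility 4 has cases in every year and facilities 1 and 4 have cases in both 2010 and 2014:

```stata
* Made-up example data: one row per case (facility, year)
clear
input facility year
1 2010
1 2012
1 2014
2 2011
2 2012
3 2010
3 2013
4 2010
4 2011
4 2012
4 2013
4 2014
4 2014
end

* Tag one observation per facility-year, then count distinct years per facility
egen byte fy_tag = tag(facility year)
bysort facility: egen n_years = total(fy_tag)

* Q1: at least one case in each of the 5 years (assumes no years outside 2010-2014)
gen byte every_year = (n_years == 5)

* Q2: at least one case in 2010 and at least one case in 2014
by facility: egen byte any2010 = max(year == 2010)
by facility: egen byte any2014 = max(year == 2014)
gen byte first_and_last = any2010 & any2014

* Show one row per facility
egen byte fac_tag = tag(facility)
list facility n_years every_year first_and_last if fac_tag, noobs
```

Both flags are constant within facility, so either can be used directly in an if condition, e.g. keep if every_year.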

xtivreg and changing the first-stage model

I am using xtivreg with two endogenous variables, both of which are dummy variables. This means that xtivreg estimates two first-stage regressions, and since the endogenous variables are dummies, both are linear probability models. In one of the two first-stage regressions, one of the regressors has a coefficient larger than one (in particular, the coefficient on the first-order 'age' term of a continuous cubic age function). This seems implausible, since in a linear probability model the coefficients should be between zero and one. This seems to suggest that I should change the first-stage linear probability model, perhaps to a probit model. As far as I am aware, Stata does not allow me to do this. Or does it? And if not, is there another way I could deal with the problem?

Meanwhile, I assume I am right that xtivreg estimates a linear probability model in the first stage when the endogenous variable is a dummy variable. Unfortunately, Stata's xt manual does not make it clear to me precisely which model is estimated in the first stage, and this also seems to depend on whether the fe or re option is specified. Could you please give me some pointers on the first-stage model too?
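On the "coefficient larger than one" point, a worked equation may help. Assuming the first stage is linear with a cubic in age as described (the other instruments, exogenous regressors and the panel effect are folded into the trailing dots), the fitted value is the probability, and the slope with respect to age is:

```latex
$$
\Pr(D_{it}=1 \mid \mathrm{age}_{it},\dots) = \pi_0 + \pi_1\,\mathrm{age}_{it} + \pi_2\,\mathrm{age}_{it}^2 + \pi_3\,\mathrm{age}_{it}^3 + \cdots
$$
$$
\frac{\partial \Pr(D_{it}=1 \mid \mathrm{age}_{it},\dots)}{\partial\,\mathrm{age}_{it}} = \pi_1 + 2\pi_2\,\mathrm{age}_{it} + 3\pi_3\,\mathrm{age}_{it}^2
$$
```

So π1 on its own is the slope extrapolated to age 0, not a probability, and its size changes if age is rescaled or centred. In a linear probability model, the quantity meant to lie in [0, 1] is the fitted probability, not each individual coefficient.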

Solved

Posted a question but quickly solved it. Can't figure out a way to delete. Thanks!

Panel IV Regression (xtivreg) and heteroskedasticity tests

Hi,

I have panel data on which I am running an IV random-effects (RE) model using xtivreg.

I am wondering whether there is any way to perform heteroskedasticity tests, e.g. Pagan-Hall or Breusch-Pagan, on the model. I have tried ivhettest, but from what I gather, it only works on non-panel data modelled using ivreg.

Does anyone know how I can go about carrying out heteroskedasticity tests after xtivreg?

Thanks,
Chada

-mvport- new version is updated

Help with infix

Dear Statalist users,

I want to put a correlation table into an Excel file. So (1) I read http://www.statalist.org/forums/foru...x-i-m-confused and (2) I used the following steps:

Code:
local mainDog "height weight age length price shipping salesTax exportTax importTax"

pwcorr `mainDog'
log close

clear

infix 9 first str13 v1 1-13 str6 v2 18-24 str6 v3 27-33 str6 v4 ///
      36-42 str6 v5 45-51 str6 v6 54-60 str6 v7 63-69 str6 v8 72-78  using Correlation.log


export excel using "Correlation", replace
Problem:
All negative correlations become positive (the minus signs are dropped) in the resulting Excel file. Why does this happen, and is there a way to fix it?

Thank you
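One plausible explanation (not checked against this particular log) is that infix reads fixed column positions, so if a range such as 18-24 starts one column to the right of where pwcorr prints the minus sign, the sign is silently cut off. A sketch that avoids parsing the log altogether: correlate leaves the correlation matrix in r(C), which svmat can turn into variables for export excel. The three variables and their values below are made up, and note that correlate uses listwise deletion whereas pwcorr uses pairwise deletion:

```stata
* Made-up example data (stand-ins for the real variables)
clear
set seed 12345
set obs 50
gen height = rnormal()
gen weight = height + rnormal()
gen price  = -height + rnormal()

* correlate stores the correlation matrix in r(C)
correlate height weight price
matrix C = r(C)

* Turn the matrix into a dataset: one column per variable, one row per variable
clear
svmat C, names(col)
gen str32 variable = ""
local names : rownames C
local k : word count `names'
forvalues i = 1/`k' {
    local v : word `i' of `names'
    replace variable = "`v'" in `i'
}
order variable

* Numeric cells keep their signs in the Excel file
export excel using "Correlation", firstrow(variables) replace
```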


Matsave in Stata

I need help figuring out the Matsave command in Stata. Long story short, I am using the following code, written for Stata version 9, which uses a SaveMat command:

svy: mean testever, over(invch)
mat invchns1=e(b)'
svy: mean test12mo, over(invch)
mat invchns2=e(b)'
svy: mean test12m_yes, over(invch)
mat invchns3=e(b)'

svy: mean testever, over(hivriskfactor)
mat hivrisk1=e(b)'
svy: mean test12mo, over(hivriskfactor)
mat hivrisk2=e(b)'
svy: mean test12m_yes, over(hivriskfactor)
mat hivrisk3=e(b)'

nptrend testever, by(invch)
nptrend test12mo, by(invch)
nptrend test12m_yes, by(invch)

mat figure2a=invchns1,invchns2,invchns3
mat figure2b=hivrisk1[2,1],hivrisk2[2,1],hivrisk3[2,1]
mat figure2=figure2b \ figure2a
matname figure2 testever test12mo test12m_yes, c(1...) explicit
matname figure2 riskfactor norisk lowrisk medrisk highrisk, r(1...) explicit
preserve
drop _all
SaveMat figure2 "figure 2"
restore

I am trying to create similar matrices using the Matsave command in Stata version 13 (I have already installed Matsave); however, when doing so, I get the following errors:

svy: mean testever, over(invch)
mat invchns1=e(b)'
svy: mean test12mo, over(invch)
mat invchns2=e(b)'
svy: mean test12m_yes, over(invch)
mat invchns3=e(b)'

svy: mean testever, over(hivriskfactor)
mat hivrisk1=e(b)'
svy: mean test12mo, over(hivriskfactor)
mat hivrisk2=e(b)'
svy: mean test12m_yes, over(hivriskfactor)
mat hivrisk3=e(b)'

nptrend testever, by(invch)
nptrend test12mo, by(invch)
nptrend test12m_yes, by(invch)

mat figure2a=invchns1,invchns2,invchns3
mat figure2b=hivrisk1[2,1],hivrisk2[2,1],hivrisk3[2,1]
mat figure2=figure2b \ figure2a
matname figure2 testever test12mo test12m_yes, c(1...) explicit
matname figure2 riskfactor norisk lowrisk medrisk highrisk, r(1...) explicit
(ERROR: number of names and rows() or columns() ranges do not match conformability error)
preserve
drop _all
mata Matsave figure2 "figure 2"
(ERROR: invalid expression)
restore

I'd like to know why the rows/columns for my matrix are off and how to properly write the expression in order to output Figure 2.
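Two separate things seem to be going on. The conformability error means that the number of names given does not equal the number of rows (or columns): figure2 = figure2b \ figure2a has 1 + rowsof(figure2a) rows, and rowsof(figure2a) is the number of levels of invch, which may not be 4. Separately, prefixing the line with mata makes Stata read it as a Mata statement, which would explain the "invalid expression" error. A sketch using only built-in commands (matrix rownames/colnames in place of matname, svmat and save in place of Matsave); the matrix values below are made up:

```stata
* Made-up stand-ins: figure2b is 1 x 3, figure2a is k x 3 (here k = 2)
matrix figure2b = (0.1, 0.2, 0.3)
matrix figure2a = (0.4, 0.5, 0.6 \ 0.7, 0.8, 0.9)
matrix figure2 = figure2b \ figure2a

* Check the dimensions first: rows = 1 + rowsof(figure2a)
display rowsof(figure2) " x " colsof(figure2)

* The number of names must equal the number of rows/columns
matrix colnames figure2 = testever test12mo test12m_yes
matrix rownames figure2 = riskfactor lowrisk highrisk

* Store the matrix as a dataset, keeping the row names in a string variable
preserve
drop _all
svmat figure2, names(col)
gen str32 row = ""
local rnames : rownames figure2
local k : word count `rnames'
forvalues i = 1/`k' {
    local r : word `i' of `rnames'
    replace row = "`r'" in `i'
}
order row
save "figure 2", replace
restore
```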



Besides suest, is there any other method to test the difference in a coefficient between two regressions?

Hello, everyone!
The most common way to test the difference in a coefficient between two regressions is to run commands like these:
reg y x if z==1
est store m1
reg y x if z==0
est store m2
suest m1 m2, vce(cl firm)
test [m1_mean]x = [m2_mean]x

But in my study there are two problems with this method:
1. I need to control for a lot of dummy variables (firm fixed effects) in both regressions. This works well with -areg-, but exceeds matsize with -reg-. Yet -areg- is not supported by -suest-.
2. Meanwhile, I have 20,000 firms in my sample, and when I try -suest m1 m2, vce(cl firm)-, it exceeds matsize as well.

Could anyone help me?
Many thanks in advance!

Shuo
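One commonly used alternative is to pool the two samples and interact x with z, which -areg- can handle because the fixed effects are absorbed rather than estimated as dummies. A minimal sketch on simulated data (y, x, z and firm follow the post; the data-generating values and firm_z are made up): absorbing one effect per firm-by-z cell gives each group its own firm effects, so the coefficient on 1.z#c.x equals the difference between the two separate slopes. With additional controls, these would also need to be interacted with z for the point estimates to match the separate regressions exactly, and the clustered standard error need not be identical to what -suest- would report.

```stata
* Simulated stand-in data: 200 firms, 10 observations each
clear
set seed 2016
set obs 2000
gen firm = ceil(_n / 10)
gen byte z = runiform() < 0.5
gen x = rnormal()
gen y = 1 + 0.5 * x + 0.3 * x * z + rnormal()

* One absorbed effect per firm-by-z cell = separate firm effects in each group
egen long firm_z = group(firm z)

* Pooled model: the coefficient on x is the z == 0 slope, and the
* coefficient on 1.z#c.x is the (z == 1 slope) minus (z == 0 slope)
areg y c.x 1.z#c.x, absorb(firm_z) vce(cluster firm)
test 1.z#c.x
```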

Importing Excel files using a loop: avoid importing a file multiple times

Hi,
I am trying to create a panel dataset.
The raw data consist of one to three Excel files for each year, and I am using the following loops.
(Please note that I have manually renamed the Excel files to "year_filenumber", where filenumber = 1, 2, 3 numbers the Excel files within a year. Thus, the loop works.)

Code:
*import .xls files into .dta
foreach year in 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015{
    forvalues i=1/3 {
        capture import excel "`year'_`i'", sheet("`year'_`i'") firstrow case(lower) allstring clear
        capture saveold "`year'_`i'", replace
        }
}


*create yearly .dta files
clear all
foreach year in 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015{
    forvalues i=2/3 {
        use "`year'_1", clear
        capture append using "`year'_`i'"
    }
    generate time="`year'"
    saveold "`year'", replace
}


*append yearly .dta files
clear all
use 2001
foreach year in 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015{
    append using `year'
}
The problem is that years with data in only one or two Excel files are imported extra times.
For example, 2015_1 is the only Excel file for 2015. When it is imported into Stata, I get three .dta files (2015_1, 2015_2, 2015_3), which are appended into one 2015.dta file (leading to duplicates).

Is there any way to correct this loop (instead of using the "duplicates drop" command at the end)? Thank you.
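As far as I can tell, two things produce the duplicates. When import excel fails because a file does not exist, capture hides the error, and capture saveold then saves whatever is still in memory (the previous workbook) under the missing file's name. In addition, use "`year'_1" sits inside the inner loop, so file 1 is reloaded on every pass and only file 1 plus the last appended file survive. A minimal sketch that checks for each file with confirm file and loads file 1 once per year; the .xls extension, the demo workbooks for 2014 and 2015, and the name panel.dta are assumptions for illustration (stale 2015_2.dta/2015_3.dta files from earlier runs should be deleted first):

```stata
* --- Demo setup only (made up): 2014 has two workbooks, 2015 has one
clear
set obs 2
gen id = _n
export excel using "2014_1.xls", sheet("2014_1") firstrow(variables) replace
export excel using "2014_2.xls", sheet("2014_2") firstrow(variables) replace
export excel using "2015_1.xls", sheet("2015_1") firstrow(variables) replace

local first 2014
local last  2015

* 1) Import a workbook only if it exists; save only after a successful import
forvalues year = `first'/`last' {
    forvalues i = 1/3 {
        capture confirm file "`year'_`i'.xls"
        if _rc == 0 {
            import excel "`year'_`i'.xls", sheet("`year'_`i'") firstrow case(lower) allstring clear
            saveold "`year'_`i'", replace
        }
    }
}

* 2) Load file 1 once per year, then append only the pieces that exist
forvalues year = `first'/`last' {
    use "`year'_1", clear
    forvalues i = 2/3 {
        capture confirm file "`year'_`i'.dta"
        if _rc == 0 append using "`year'_`i'"
    }
    generate time = "`year'"
    saveold "`year'", replace
}

* 3) Stack the yearly files into one panel
clear
forvalues year = `first'/`last' {
    append using "`year'"
}
saveold "panel", replace
```

With the real files, setting local first 2001 and dropping the demo setup should give the full panel.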

Repetitive but simple programming using foreach

I wrote a program for repetitive but simple work using foreach, but it didn't work.

For the variable fc1, I recoded it, creating a new variable r_fc1 as below:
recode fc1 (1=100)(2=75)(3=50)(4=25)(5=0), gen(r_fc1)

But I wanted to make it simpler because I need to do the same for fc2...fc5. So I wrote the following:
foreach N of numlist 2/5 {

recode fc'N' (1=100)(2=75)(3=50)(4=25)(5=0), gen(r_fc'N')
}

But it resulted in the following:
fc ambiguous abbreviation
r(111);


What is wrong with it?
I would like to do the same thing to the variables comp1...comp4 and coord1...coord3.
Can you give some advice on a correct, smarter way to program this?
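The error presumably comes from the quotes: a local macro is referenced with a left single quote (the backtick) to open and an apostrophe to close, so fc'N' is never expanded and Stata sees the abbreviation fc. A minimal sketch that loops over a varlist rather than a numlist, on made-up data with only three variables:

```stata
* Made-up example data
clear
input fc1 fc2 comp1
1 5 3
2 4 1
end

* Loop over the variables themselves; the macro v opens with a backtick
foreach v of varlist fc1 fc2 comp1 {
    recode `v' (1=100)(2=75)(3=50)(4=25)(5=0), generate(r_`v')
}
list, noobs
```

With the real data the loop could run over fc1-fc5 comp1-comp4 coord1-coord3, provided those variables are stored contiguously in the dataset. recode also accepts several variables at once with its prefix() option, e.g. recode fc1-fc5 (1=100)(2=75)(3=50)(4=25)(5=0), prefix(r_).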

Calculating the percentile rank

How might I go about calculating the percentile rank of scores in stata, grouping by period?
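Percentile rank has several definitions. A minimal sketch of one of them, 100 × (rank - 1) / (n - 1) within each period, using egen's rank() function (tied scores get the average rank); the variable names score and period and the data are made up:

```stata
* Made-up example: scores in two periods
clear
input period score
1 10
1 20
1 30
2 5
2 15
end

* Rank within period and count non-missing scores per period
bysort period: egen rank_in_period = rank(score)
by period: egen n_in_period = count(score)

* 0 for the lowest score in a period, 100 for the highest
* (missing when a period has only one score)
gen pct_rank = 100 * (rank_in_period - 1) / (n_in_period - 1)
list, sepby(period) noobs
```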

Xtologit and interaction terms

I'm using xtologit to run an ordered logistic regression on my ordinal outcome (3 possible values) to see the effect of a 'treatment' diagnosis (which is binary, 0 or 1) over time. I have also included the covariates gender, education and wealth in my model. I have panel data with 6 waves.

In my first model I include the interaction term wave x diagnosis as a covariate, along with education, wealth and gender.

In my second model I am looking at the effect of education, wealth and gender over time, so I include the interaction terms wave x education, wave x wealth and wave x gender.

For model 1, the wave x treatment interaction coefficient is significant and negative. Would I be correct to interpret this as meaning that those receiving treatment (diagnosis) exhibit a negative change in the outcome over time?

Also, I presented my coefficients from xtologit as odds ratios. Is it good practice to interpret the odds ratios numerically, or should I only compare their magnitudes?

Thanks
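A worked equation may help with the interpretation. Assuming wave W enters as a single continuous term and diagnosis D is 0/1, Stata's random-effects ordered logit models the probability of being above cutpoint κ_k as follows, which gives the diagnosis odds ratio at wave w and its change from one wave to the next:

```latex
$$
\Pr(y_{it} > k \mid x_{it}, u_i) = \frac{\exp(x_{it}\beta + u_i - \kappa_k)}{1 + \exp(x_{it}\beta + u_i - \kappa_k)},
\qquad
x_{it}\beta = \beta_D D_{it} + \beta_W W_{it} + \beta_{DW} D_{it} W_{it} + \cdots
$$
$$
\mathrm{OR}_D(w) = \frac{\mathrm{odds}(y_{it} > k \mid D_{it}=1, W_{it}=w)}{\mathrm{odds}(y_{it} > k \mid D_{it}=0, W_{it}=w)} = \exp(\beta_D + \beta_{DW}\, w),
\qquad
\frac{\mathrm{OR}_D(w+1)}{\mathrm{OR}_D(w)} = \exp(\beta_{DW})
$$
```

So a negative β_DW (an odds ratio below 1 on the interaction) says that, holding u_i and the other covariates fixed, the diagnosed-versus-undiagnosed odds ratio for being in a higher outcome category shrinks by a factor of exp(β_DW) with each wave. That describes the gap between the two groups changing over time; the diagnosed group's own trend on the log-odds scale is β_W + β_DW.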

Panel error correction model : asymmetric adjustments

Dear all,
I have run (Stata 14.1) a panel error correction model using -xtpmg- (by Edward Blackburne and Mark Frank). Does anyone know whether it is possible to further test for asymmetries in the adjustment to long-term relationships with panel data, or can this only be done at the country level?
Thanks for your time,
Anat

Estimating elasticity using marginsplot or lpoly?

This is a follow-up to a previous post. I can't figure out how to edit that one, so I apologize in advance for creating two threads. I don't think I was very clear with my question previously.

I have data on whether a person completed primary school or not (0/1). I am interested in knowing how the probability of completion varies with a percentage change in income, holding other variables constant. Essentially, the probability of having completed changes by y% when income changes by x%. I want to do this for 5 different waves of data and 3 groups in each wave (1 million observations in total).

My hypothesis is that for some groups completion is very responsive to income (steep), while for others it is not responsive (flat).


1. I want to be able to use sample weights.
2. I want a grid with 5 graphs- one graph for each wave of data and each graph has 3 plots (or elasticity lines)- one for each group.


The way I am doing this now (simplified version):

Code:
foreach w in 1 2 3 4 5 {
foreach g in 1 2 3 {
logit completed logincome i.rural i.married statedumm1-statedumm10 agedum1-agedum7 [pw=wt] if wave==`w' & group==`g'
predict yhat_`w'`g' if e(sample)
}
}

gen yhat = .
foreach w in 1 2 3 4 5 {
foreach g in 1 2 3 {
replace yhat=yhat_`w'`g' if wave==`w' & group==`g'
}
}

//yhat is predicted probability of completed primary

twoway (lpoly yhat logmpce [aw=wt] if group==1) (lpoly yhat logmpce [aw=wt] if group==2) (lpoly yhat logmpce [aw=wt] if group==3), by(wave)
I suspect I am doing something wrong, and welcome any suggestions on how I can do this better.

Would using marginsplot be more advisable?


Old post is here.
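On the marginsplot question: because the regressor is already log income, the income elasticity of the completion probability is d ln Pr / d ln(income) = (dPr / d logincome) / Pr, which is what the eydx() option of margins reports. A minimal sketch on simulated data (all values made up; the real model's controls such as i.rural and the state and age dummies are left out) fits one fully interacted, weighted logit instead of 15 separate ones, then asks margins for these elasticities by wave and group at three income levels. With non-interacted controls, one pooled model is not identical to 15 separate fits.

```stata
* Simulated stand-in data: 5 waves x 3 groups
clear
set seed 12345
set obs 6000
gen byte wave  = 1 + mod(_n - 1, 5)
gen byte group = 1 + mod(floor((_n - 1) / 5), 3)
gen logincome  = rnormal(9, 1)
gen wt         = 0.5 + 1.5 * runiform()
gen byte completed = runiform() < invlogit(0.5 + 0.4 * group * (logincome - 9))

* One fully interacted, weighted logit (wave x group x log income)
logit completed i.wave##i.group##c.logincome [pw=wt]

* eydx() = d ln Pr / d logincome, i.e. the income elasticity of Pr(completed),
* for each wave-group cell at three income levels
margins wave#group, eydx(logincome) at(logincome = (8 9 10))
```

marginsplot can then graph the stored margins results.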


count by repeated time values

Hello,

I have a dataset that has a date for each patient attendance, plus my variables of interest (what sport they do, etc.).

However, when I try to tsset the data, I get a "repeated time values" error.

The date does not appear to be a string; it is formatted to display as a date:

format %tddd/nn/CCYY date

I can tabulate the data for a single day, e.g.

tab sport if date == 19943

but I need to count many variables by date and compare daily counts by sport and other variables.

I am unsure how to proceed; please help.

-matt


Example:
date (DMY)   Patient   dr   sex   sport
4/06/1986    1         1    1     1
5/06/1986    1         1    1     2
5/06/1986    2         1    1     1
6/06/1986    1         1    1     2
6/06/1986    2         1    2     1
7/06/1986    1         1    2     3
7/06/1986    2         1    1     3
7/06/1986    3         1    2     3
7/06/1986    4         1    2     2
8/06/1986    1         1    1     1
8/06/1986    2         1    1     1
8/06/1986    3         1    1     1
8/06/1986    4         1    2     1
8/06/1986    5         1    2     2
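-tsset- needs at most one observation per time point (per panel), but here several patients share each date. One approach is to reduce the data to one row per date and sport with a count, after which sport and date uniquely identify the observations and can be declared as a panel. A minimal sketch using the example rows above; they are read as strings and converted with daily() only for this demo, since the real date variable is already numeric:

```stata
* Example rows from the post (date read as a string for the demo only)
clear
input str10 datestr patient dr sex sport
"4/06/1986" 1 1 1 1
"5/06/1986" 1 1 1 2
"5/06/1986" 2 1 1 1
"6/06/1986" 1 1 1 2
"6/06/1986" 2 1 2 1
"7/06/1986" 1 1 2 3
"7/06/1986" 2 1 1 3
"7/06/1986" 3 1 2 3
"7/06/1986" 4 1 2 2
"8/06/1986" 1 1 1 1
"8/06/1986" 2 1 1 1
"8/06/1986" 3 1 1 1
"8/06/1986" 4 1 2 1
"8/06/1986" 5 1 2 2
end
gen date = daily(datestr, "DMY")
format date %tdDD/NN/CCYY

* One row per date and sport with the number of attendances;
* zero adds date-sport combinations that had no attendances
contract date sport, freq(n_visits) zero

* Each (sport, date) pair is now unique, so a panel declaration works
xtset sport date
list, sepby(date) noobs
```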

Two-way robust clustering with panel data

I am looking for Stata procedures to handle two-way robust clustering for panel data that address (i) cross-sectional dependence and (ii) serial correlation. Ideally (though this might be a pipe dream), I could apply these to a WLS/FGLS estimator, but I would be happy with one or more procedures that apply this to OLS. Thank you for any advice you can give me.

areg absorb with two variables

I have data with 3 sources of variation (county, year, month) and want to include county-year interactions (1000 counties x 20 years = 20000 dummies), county-month interactions (1000 counties x 12 months = 12000 dummies), and year month interactions (20 years x 12 months = 240 dummies). Is there any way to include both county-year interactions and county-month interactions using “areg, absorb()”? Thank you!

How to generate a norm-diff statistic?

Hi Members, I'm trying to generate a "norm-diff statistic" to include as part of my summary statistics. However, I have no idea how to go about it.
I'm using Stata 13. Any help or guidance is most welcome and will be much appreciated.

So far I have browsed the internet and asked friends, but no luck.

Regards,

Jeje.
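A normalized difference is usually the difference in group means divided by a pooled standard deviation. Definitions differ in the denominator: some use the square root of the average of the two group variances, others the square root of their sum, so it is worth checking which convention your reference uses. A minimal sketch of the first version for one variable, with a made-up 0/1 group indicator treat and made-up values of x:

```stata
* Made-up example: variable x in a treated (treat == 1) and a control group
clear
input byte treat x
0 1
0 2
0 3
1 2
1 4
1 6
end

* Group means and variances
quietly summarize x if treat == 1
scalar m1 = r(mean)
scalar v1 = r(Var)
quietly summarize x if treat == 0
scalar m0 = r(mean)
scalar v0 = r(Var)

* Normalized difference using the average of the two variances
scalar normdiff = (m1 - m0) / sqrt((v1 + v0) / 2)
display "normalized difference = " %6.3f normdiff
```

For a summary-statistics table, the same lines can be wrapped in a foreach loop over the covariates.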

multivariate logistic regression

Dear Stata users,

I have a project with multiple independent variables and one binary outcome, and I am interested in identifying the predictors.
Here is what I have done:
1. Checked and visualized the data.
2. Converted multi-categorical variables into multiple dummy variables.
3. Categorized most continuous variables.
4. Assessed the association between each independent variable and the outcome using logistic regression.
5. Advanced those variables with P < 0.1 (I have reached this point).
6. Assess for collinearity.
7. Form the final multivariate logistic regression using the "mvreg" function.

Question 1: Is this process correct? Is there anything I should do or adjust?
Question 2: mvreg gives coefficients. Is there a way to convert them into odds ratios, apart from doing it manually?
Question 3: While searching for a way to get ORs, I tried the logit and regress functions with the same independent variable and outcome, and I got different coefficients. Am I doing it wrong?

I really appreciate your feedback

Amin

thanks
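On questions 2 and 3: -mvreg- fits a multivariate linear regression (several continuous outcomes at once), so its coefficients are not log odds. For one binary outcome with several predictors (a multivariable model), -logit- with the or option, or -logistic-, reports odds ratios directly. -regress- on a 0/1 outcome fits a linear probability model, so its coefficients are on a different scale from logit's and a difference is expected. A minimal sketch on simulated data (x1, x2 and y are made up):

```stata
* Simulated example: binary outcome y, predictors x1 (0/1) and x2
clear
set seed 2016
set obs 500
gen byte x1 = runiform() < 0.5
gen x2 = rnormal()
gen byte y = runiform() < invlogit(-0.5 + 0.8 * x1 + 0.5 * x2)

* Multivariable logistic regression, reported as odds ratios
logit y i.x1 x2, or

* Same model; logistic reports odds ratios by default
logistic y i.x1 x2

* For comparison: regress on a 0/1 outcome is a linear probability model,
* so its coefficients are on the probability scale, not the log-odds scale
regress y i.x1 x2
```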

