Collapse with Tempfile?

August 22, 2016, 8:30 pm

I am using preserve / collapse / export / restore in a loop. I then append all the exported files to combine them into one file.

That worked fine for a small number of files, but at this point, it is saving out more than 10k files, and that's very slow and somewhat unnerving.

I was hoping to be able to save the results of the collapse command to a tempfile instead of exporting it.

This is my original code:

Code:

foreach v of varlist $varlist {
    foreach c of global names {
        preserve
        collapse (mean) mean`v'=`v' (count) count`v'=`v'
        gen name1 = "`v'"
        gen name2 = `"`c'"'
        export excel name1 name2 mean`v' count`v' using "`v' `c'.xlsx", replace
        restore
    }
}

Instead, I have tried the following:

Code:

tempfile temptest
save `temptest', emptyok

foreach v of varlist $varlist {
    foreach c of global names {
    preserve
    collapse (mean) mean`v'=`v' (count) count`v'=`v'     
    append using `temptest', force
    save using `temptest', replace
    restore
    }
}

I am getting an "invalid file specification" error.

Any suggestions?

Thank you.
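
One possible fix, offered as a sketch rather than a tested answer: -save- takes the filename directly and has no `using` keyword, which is the likely source of the "invalid file specification" error. The first `save ..., emptyok` should also write an empty dataset, so the sketch clears the data inside a preserve/restore pair before saving. The tempfile name `results' and the generic variable names mean and count are assumptions made for illustration; $varlist and $names are the globals from the post, and the collapse ignores `c' just as the original code does.

```stata
* Sketch only: `results' is a hypothetical tempfile name.
tempfile results
preserve
clear
save `results', emptyok      // start from an empty dataset
restore

foreach v of varlist $varlist {
    foreach c of global names {
        preserve
        * Generic names (mean, count) so that appended rows line up in the same columns
        collapse (mean) mean=`v' (count) count=`v'
        gen name1 = "`v'"
        gen name2 = `"`c'"'
        append using `results'       // add everything collected so far
        save `results', replace      // note: no "using" after -save-
        restore
    }
}

use `results', clear                 // replaces the data in memory with the combined results
```

Keeping one pair of variable names for every `v' is a design choice: with mean`v' and count`v' the appended file would gain two new, mostly missing columns per variable.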

↧

Calculating consumer surplus in Stata

August 22, 2016, 8:30 pm

I would like to calculate consumer surplus from the regression results below.
The demand function is P = alpha + beta*ln(q_electric/totex); assume that the equilibrium P is 8.9 and Q is 950.3. Please help me with the code, given that I have combinations of alphas and betas from several regressions.

Many thanks!

              |               Robust
ln_q_electr~y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
    ln_ttotex |   .2848769   .0788145     3.61   0.000     .1304034    .4393504
       tariff |  -.0279971     .00634    -4.42   0.000    -.0404232    -.015571
        fsize |   .0016364   .0006948     2.36   0.019     .0002745    .0029983
         hgc1 |  -.3618684   .0392349    -9.22   0.000    -.4387674   -.2849693
         hgc2 |  -.2137475   .0335509    -6.37   0.000    -.2795061    -.147989
         hgc3 |  -.1073966    .024404    -4.40   0.000    -.1552276   -.0595657
      tenure1 |   .1457273   .0188806     7.72   0.000     .1087219    .1827326
          urb |    .357512    .019396    18.43   0.000     .3194965    .3955275
   aircon_qty |   .1489894   .0217826     6.84   0.000     .1062962    .1916826
       pc_qty |   .0942096   .0189774     4.96   0.000     .0570146    .1314046
      ref_qty |   .6046353   .0290229    20.83   0.000     .5477514    .6615192
       tv_qty |    .205826   .0175987    11.70   0.000     .1713331    .2403188
cellphone_qty |   .0860953    .009818     8.77   0.000     .0668524    .1053383
        _cons |   2.327596   .8431563     2.76   0.006     .6750399    3.980152
-------------------------------------------------------------------------------

↧

Weighting with kdensity and/or rifreg

August 22, 2016, 9:33 pm

I am running into some problems when trying to correct for sample selection using weights with the kdensity and rifreg commands.

Both of these commands allow for analytical weights but when I apply my weights, it does not seem to have much of an effect (which is strange). I've tried running related estimates using regress and logit with the weights specified using aweight or pweight... the weights make a large difference with these commands. Obviously this is not statistically sound but I've tried manually applying the weights and then running kdensity and rifreg... they also had a big effect with this method.

I've seen a few related questions online about applying survey weights with these commands and I'm wondering if anyone has any advice/insight. Thanks in advance!

↧

Missing ticks and labels with graph twoway and yscale

August 22, 2016, 11:51 pm

Kia ora - when I run the following commands the resulting graph does not automatically fill in ticks and labels between 0 and 20 on the y-axis. I can specify ticks and labels using axis scale options within graph twoway, but I have a number of similar graphs I want to produce and I would like Stata to automatically produce ticks and labels on the y-axis. The reason I am using graph twoway instead of graph bar is so I can include an additional plot of confidence intervals overlain over the bars (as shown in http://www.ats.ucla.edu/stat/stata/faq/barcap.htm).

sysuse auto
collapse (mean) mpg, by(foreign)
graph twoway (bar mpg foreign), yscale(range(0))

Has anyone encountered this issue before and has a solution? Thanks for your help, Jonathan

↧

store selected betas and alphas in several regression

August 23, 2016, 12:53 am

Hi, I would like to ask how to store selected betas and alphas.
For example, I have model 1:
Y1 = alpha11 + beta11*X11 + beta12*X12 ...

and model 2:
Y2 = alpha21 + beta21*X21 + beta22*X22 ...

I would like to use alpha11, alpha21 and beta11 after running all the regressions/models.

Many thanks!
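
A minimal sketch of one way to do this, assuming the models are fitted with an estimation command such as -regress-: after each model the coefficients are available as _b[varname] and the intercept as _b[_cons], so they can be copied into scalars before the next model overwrites them. The auto dataset and the variables used here merely stand in for the poster's Y and X variables; the scalar names follow the post.

```stata
sysuse auto, clear

* Model 1: keep the intercept and the first slope
regress price mpg weight
scalar alpha11 = _b[_cons]
scalar beta11  = _b[mpg]
estimates store model1          // optional: keeps the full set of results as well

* Model 2: keep only the intercept
regress mpg weight length
scalar alpha21 = _b[_cons]
estimates store model2

* The stored values remain available after all models have been run
display alpha11 "  " beta11 "  " alpha21
```

With -estimates store-, `estimates restore model1` brings back the first model's full results later if more than a few coefficients are needed.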

↧

Two different endogeneity test results after ivreg2

August 23, 2016, 2:52 am

Hi,

I am using the -ivreg2- command with the -gmm2s r endog()- options set. I would like to export the endogeneity test results to a table. I noticed that the test statistic and p-value reported in my regression output:
"Endogeneity test of endogenous regressors: 4.048
Chi-sq(2) P-val = 0.1322"

are stored in scalars e(estat) and e(estatp) after estimation, although the help file of -ivreg2- does not mention e(estat) among the saved results at all. Instead, it mentions e(archi2) and e(archi2p) as "Anderson-Rubin chi-sq test of significance of endogenous regressors", which are not reported in the regression output and are not further explained in the help file. In my application, however, e(estatp) and e(archi2p) seem to suggest opposing conclusions on endogeneity. Could someone please explain the practical difference between the two tests or suggest which one of them to report/consider?

Thanks,
Peter

↧

Limit of variables (dummy and clustered variables)

August 23, 2016, 2:58 am

Hi,

I was wondering how fixed-effects dummy variables and cluster variables count toward the limit on the number of variables in Stata, because I am choosing between IC and SE.

Best regards,

Viktor Studenyak

↧

margin after ml

August 23, 2016, 3:37 am

Dear Statalist,

I have written a model in Stata that uses 3 equations (multinomial logit). The model is estimated by simulated maximum likelihood (with the ml maximize command) because of the presence of unobserved heterogeneity, so mlogit cannot do the job here. I would like to use the margins command to compute the marginal effects. Is that possible?

Thanks in advance,

Eleni Yitbarek

↧

IVREG2 and bootstrap - > F test of excluded instruments disappears

August 23, 2016, 6:49 am

Dear Statalister,

I've come across a problem concerning the ivreg2 command while bootstrapping. Once the bootstrap is complete, the output does not show me the F test of excluded instruments or the Sanderson-Windmeijer multivariate F test of excluded instruments. Does using the bootstrap command kill the F test somehow?

My command looks the following:

bootstrap, reps(10) seed(1): ivreg2 DEPVAR VAR 1 VAR 2 VAR 2 VARn (Endogenous VAR = Instrument), robust ffirst level(90)

Maybe you can help. Extracting the numbers using display e(first) or return list does not work either.

Thanks,
Chris

↧

Sos: Ec3sls

August 23, 2016, 7:34 am

Dear all,
I have recently been working on spatial panel simultaneous equation models, trying to estimate simultaneous equations with error components (system estimation). I know we can use xtivreg to estimate a single equation (i.e., xtivreg, ec2sls). But how can EC3SLS (Baltagi) be programmed in Stata? I have consulted the literature, for example Baltagi et al. (2006) and his book (Econometric Analysis of Panel Data, p.123), but I could not solve this because of my limited knowledge and programming skill. Could someone who knows this issue help me with suggestions or useful information? I would greatly appreciate it. Many thanks.

Best wishes,
Yantuan,
Aug. 23, 2016

↧

Problem in reshaping data to long and preparing panel data

August 23, 2016, 8:28 am

Please consider: id_pu as unique codes for pupils; id_sc as unique codes for schools; sc_mean_tsmat mean school test scores; year_pb year the tests were taken

My data: I am using a data set that has repeated observations for schools over four years time (2007 2009 2011 and 2013), but that is cross-sectional for the students in each of the years. My variable of interest is the school mean test scores (sc_mean_tsmat). Over these four years time I have a total of 7,845,381 pupils in 38,000 schools. I have attached a print screen of data editor.

My question: I would like to assess the change in mean school test scores (sc_mean_tsmat) over the years.

My problem: I haven't managed to put my data in a long shape and consequently prepare a panel data. One of the problems is that one single school appears multiple times in my data for one given year because of the multiple pupils in each unique school. Moreover, the school id code is not in an ascending order.

CODEs I have tried:

sort id_sc

egen id_sc_long = group(id_sc)

move id_sc_long id_sc

reshape long sc_mean_tsmat, i(id_sc_long id_sc) j(year_pb)

I GOT this ERROR message:

year_pb already defined -- data already long
r(110);

Which is not the case, as the data still looks like what you can see in the attached file.

I am using Stata 12. Please consider that the unique codes for the schools - id_sc - is sorted.

[CODE]
* Example generated by -dataex-. To install: ssc install dataex
clear
input long(id_pu id_sc) float(sc_mean_tsmat year_pb)
5766055 11000260 256.10498 2007
5766033 11000260 256.10498 2007
5766149 11000260 256.10498 2007
5766144 11000260 256.10498 2007
5766153 11000260 256.10498 2007
5766050 11000260 256.10498 2007
2788348 11000260 256.10498 2011
4192243 11000260 256.10498 2011
680873 11000260 256.10498 2011
5766098 11000260 256.10498 2007
1470028 11000260 256.10498 2011
5766123 11000260 256.10498 2007
(...)
3908610 11000317 238.25754 2011
1235942 11000317 238.25754 2011
5312407 11000317 238.25754 2011
10883954 11000317 238.25754 2013
10883879 11000317 238.25754 2013
3343479 11000317 238.25754 2011
10035781 11000317 238.25754 2009
10002566 11000317 238.25754 2009
5766224 11000317 238.25754 2007
5766209 11000317 238.25754 2007
10883952 11000317 238.25754 2013
3343480 11000317 238.25754 2011
10035817 11000317 238.25754 2009
10035768 11000317 238.25754 2009
10883921 11000317 238.25754 2013
10035771 11000317 238.25754 2009
698439 11000317 238.25754 2011
5766210 11000317 238.25754 2007
end
[/CODE]

Any help on this matter would be great. Thanks.
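
A hedged sketch of one way forward: the error message suggests the data are in fact already long, just at pupil level, so what is probably needed is one row per school and year rather than a -reshape-. Assuming sc_mean_tsmat is constant within school and year, -collapse- reduces the data to a school-year panel that -xtset- accepts. The toy values below are invented for illustration and are not the poster's data.

```stata
clear
* Toy pupil-level data with the same variable names as in the post (values are made up)
input long(id_pu id_sc) float sc_mean_tsmat int year_pb
1 11000260 256.1 2007
2 11000260 256.1 2007
3 11000260 258.4 2009
4 11000317 238.3 2007
5 11000317 240.0 2009
end

* One observation per school and year
collapse (mean) sc_mean_tsmat, by(id_sc year_pb)

* Declare the panel; the tests are two years apart, hence delta(2)
xtset id_sc year_pb, delta(2)

* Change in the school mean since the previous test year
generate change = D.sc_mean_tsmat
list, sepby(id_sc)
```

In the -dataex- excerpt the school mean is identical across years within a school, so it may be worth checking whether sc_mean_tsmat was computed by year before looking at changes.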

↧

Outreg2 - Use addstat to display statistics from lstat and fitstat

August 23, 2016, 9:24 am

Greetings,
I want to use outreg2 to report various logit model results including: AIC, BIC, log-likelihood for full model, chi-squared stat, Nagelkerke/C-U R-squared, and the percent predicted correctly. I am able to get most of these (except the percent predicted "correctly") with outreg2 using the following code:

Code:

logit y x1 x2 x3 , r
est store M1, title (Model 1)
fitstat
outreg2 using Table.xls, dec(3) addstat(AIC, `r(stataaic)', BIC, `r(statabic)', Log-Likelihood Full Model, `r(ll)', chi-square test, `e(chi2)', Nagelkerke_R2, `r(r2_cu)') groupvar(x1 x2 x3) ctitle("Model 1") replace

Alternatively, I can get the percent predicted using this code:

Code:

logit y x1 x2 x3 , r
est store M1, title (Model 1)
lstat
outreg2 using Table_IndependencePreferences.xls, dec(3) addstat(Percent Correct, `r(P_corr)') groupvar(x1 x2 x3) ctitle("Model 1") replace

I cannot, however, use fitstat and lstat without the return from one overwriting the other. So I tried the following to save the lstat results as a local macro, then run fitstat, and then report both, but I continue to get a "syntax error" (r(198)).

Code:

logit y x1 x2 x3 , r
est store M1, title (Model 1)
lstat
mat lstats = r(S)
local PctCorr: display %4.1f lstats[1,9]
fitstat
outreg2 using Table_IndependencePreferences.xls, dec(3) addstat(AIC, `AIC', Pseudo R-squared, `e(r2_p)',chi-square test, `e(chi2)', PctCorr, `PctCorr') groupvar(x1 x2 x3) ctitle("Model 1") replace

I suspect that fitstat is still overwriting the local macro that I've written, but I am not sure. Any suggestions? Please give an example using code if possible, as I am a novice with respect to matrix language and manipulation.
Kind regards and thanks.
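
A possible explanation, offered with some caution: in the last block the local macro `AIC' is never defined, so it expands to nothing and addstat() receives "AIC, ," which could by itself produce the syntax error. Local macros are not overwritten by fitstat; only the r() results are. A minimal sketch, assuming r(P_corr), r(stataaic) and r(statabic) are returned exactly as in the poster's own working code above:

```stata
* y, x1, x2, x3 are the poster's placeholder variables
logit y x1 x2 x3, r
est store M1, title(Model 1)

* Copy the percent correctly classified before fitstat replaces r()
lstat
local PctCorr = r(P_corr)

fitstat
outreg2 using Table.xls, dec(3) ///
    addstat(AIC, `r(stataaic)', BIC, `r(statabic)', Percent Correct, `PctCorr') ///
    groupvar(x1 x2 x3) ctitle("Model 1") replace
```

Copying r(P_corr) directly into a local avoids the matrix step altogether.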

↧

MANOVA not significant, margins significant

August 23, 2016, 9:51 am

I am new to Stata and I am stuck with the following:

Code:

 manova rr_rate14  edssprg_14 mri_c = i.nabs_cat i.gender c.age_trt_ini c.treat_drt c.dur_ms_2014 c.EDSS2012 c.rr2yrsbefore
/* and then*/
regress edssprg_14 nabs_cat c.edssprg_14#c.nabs_cat, vce(ols)

The overall MANOVA model is significant, but the individual models and the regressions are not.
The margins are again significant. How do I interpret this?
Also, is the following correct?

Code:

mvreg edssprg_14 i.nabs_cat c.edssprg_14#c.nabs_cat, vce ( ols )

My data contain a level variable (4 levels), which is the predictor for clinical (edssprg_14 and rr_rate14) and imaging progression, along with several other variables such as age, gender, treatment, and disease duration.

↧

Quantile regression with semi-continuous outcome with a right-skewed distribution

August 23, 2016, 10:35 am

Hello, New to StataList but have found it very helpful to read others' posts and responses. Thanks to everyone for creating such a helpful forum.

I'm trying to decide whether it is appropriate to estimate a quantile regression with a semi-continuous outcome variable (e.g., a behavioral development scale ranging from, say, 0-12), especially if this outcome variable is right-skewed such that 30% of observations have an outcome variable of 0, 10% a value of 1, 10% a value of 2, and then roughly 5% each for the remaining values 3 to 12. I know quantile regression and conditional quantile regression are best suited for truly continuous outcome variables, and that ordered quantile regression and other extensions of quantile regression have been developed. However, I am curious whether folks think that (conditional) quantile regression will still generate unbiased estimates as long as I focus on estimating the location of quantile above, say, the 30th percentile, at which point values of the outcome variable become more evenly distributed across values above 0.

Thanks for any insight/suggestions people may have.
Jay

↧

Creating portfolio daily returns

August 23, 2016, 12:15 pm

Hello all,

I am working on analyst recommendations and I'm trying to construct portfolios investing $1 on each buy recommendation and holding it for a period of time. I have to create daily portfolio returns in order to check for alpha. My data look like this:

date              cusip      retx     year   Recommendation Count
02jan2008         11111111   -.000    2008   0
03jan2008         11111111   -.014    2008   1
...............   11111111   -.005    2008   1
...............   11111111   .018     2008   1
31dec2008         11111111   -.007    2008   1
02jan2008         22222222   -.001    2008   1
03jan2008         22222222   -.021    2008   2
...............   22222222   -.007    2008   2
...............   22222222   .010     2008   3
31dec2008         22222222   .003     2008   1
...............   33333333   ......   2009   ..
...............   44444444   .....    2010   ..

To exemplify, the weighted portfolio return on date 03jan2008 would be equal to ((-.014*1)+(-.021*2))/(1+2). As data I have daily returns series for every stock in sample portfolios for each year and the total buy recommendation counts on any date for each stock (count decreases when the stock is dropped after the holding period). I was trying to figure out a way to create daily return series for each year but I was not able to do much. Looking for your recommendations.

Best regards,
Gökalp
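
One way to sketch the calculation, assuming the recommendation count is held in a numeric variable (called rec_count here, a hypothetical name): the portfolio return described in the post is a count-weighted mean of retx within each date, which -collapse- computes with analytic weights. The four rows below are taken from the example in the post.

```stata
clear
input str9 sdate long cusip double retx int rec_count
"02jan2008" 11111111 -.000 0
"03jan2008" 11111111 -.014 1
"02jan2008" 22222222 -.001 1
"03jan2008" 22222222 -.021 2
end

* Convert the string date into a Stata daily date
generate date = daily(sdate, "DMY")
format date %td

* Weighted mean of retx per day, weights = number of buy recommendations
collapse (mean) port_ret = retx [aweight = rec_count], by(date)
list
```

For 03jan2008 this should correspond to the poster's formula ((-.014*1)+(-.021*2))/(1+2); stocks with a count of 0 on a given day receive no weight.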

↧

Parmest: Adding variable label to ANOVAs

August 23, 2016, 12:55 pm

Have to run a series of ANOVAs. Each individual dependent variable "v" is in an ANOVA with IV "ggen." I need to write the results of each separate ANOVA to the same file. The below code accomplishes all of that.

Code:

tempfile model_1
tempfile models_all

preserve
drop _all
save `models_all', emptyok
restore

local myvars1  v*
di "`myvars1'"
foreach v of varlist `myvars1' {
     anova "`v'" ggen
     parmest, saving(`model_1', replace)
        preserve
        use `models_all', clear
        append using `model_1'
        save `"`models_all'"', replace
        restore
     }

use `"`models_all'"', clear

The only thing stumping me is how to add labels to each set of results (`model_1') so that when I use `models_all' I can see that the first set of results is associated with v125, the second set with v128, and so on. Make sense?

Thanks,

Bryan
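
A possible approach, sketched as a replacement for the loop in the post: open each saved result set, add a string variable holding the name of the dependent variable, and save it again before appending. The variable name depvar is an assumption made for illustration; everything else reuses the poster's tempfiles and macros, so the fragment is meant to slot in after the two -tempfile- lines and the initial empty save.

```stata
foreach v of varlist `myvars1' {
    anova `v' ggen
    parmest, saving(`model_1', replace)
    preserve
    * Tag this set of results with the dependent variable's name
    use `model_1', clear
    generate str32 depvar = "`v'"
    save `model_1', replace
    * Append the tagged results to the running collection
    use `models_all', clear
    append using `model_1'
    save `"`models_all'"', replace
    restore
}

use `"`models_all'"', clear
```

After the loop, `list depvar` (or a tabulation of depvar) shows which rows belong to which ANOVA.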

↧

Need to Create A Matched sample based on 3 characteristics

August 23, 2016, 1:03 pm

Hello all,

I'm working on a first year summer paper and am having trouble creating a matched sample. I am using data from the COMPUSTAT database. I want to look at US firms who have recently moved their headquarters, and created an indicator variable for these firms by "gvkey" (a unique firm identifier) called "usmoveind". This value is equal to one for firms who have moved their headquarters during the time period I wish to observe. I have also created an indicator for the 1-3 years prior to the year in which the firm moved its headquarters called "apremoveind". It should be noted that "apremoveind" is a function of the individual firm (captured by the "gvkey" variable) and that firm's specific year of interest (variable name "fyear").

For example, here is how I created an indicator for one of the gvkeys to indicate that I wished to look at the 3 years before and after the move which was completed in the year 2001:

Code:
gen usmoveind=1 if gvkey==5959 & fyear>1997 & fyear<2005

And now, this is how I created a unique identifier for the 3 years before the move for this particular firm:

gen apremoveind=1 if gvkey==5959 & fyear>1997 & fyear<2001

I also created a variable to identify the industry in which the firm operates called "sic2". This variable was applied to the entire dataset, not just the US firms who moved. Finally, I created a measure for GAAP effective tax rates, called "gaap_etr", which is my primary variable of interest.

What I would like to do is identify a matched sample of firms (by "gvkey") with respect to "gaap_etr" in the "apremoveind" period for firms who have the "usmoveind" ==1. It is absolutely critical that the matched sample have exactly the same value for "sic2". Thus, for my US move firms, I would like to find comparable firms (in the same industry) who have a "gaap_etr" which is within a certain margin (say, +/- 1%) of the "gaap_etr" for my US move firms DURING the US firms' "apremoveind" period (which ranges from 1-3 years and is a function of the firm year, variable name "fyear", as well as the unique firm identifier, variable name "gvkey"). Ideally, I would need the 3 year period for the matched firm to be the same 3 year period as the US Move firm; thus, the "fyear" variable should cover the same 3 year period.

I would also ideally like to match on two other variables, total assets (variable name "at") and revenues (variable name "revt"). Trying to match exactly on "sic2" and find a comfortable range for "gaap_etr" (most important matching characteristic), "at", and "revt" all in the same 3 years which cover the "apremoveind" for my US Move firms is giving me a bit of a headache. I would like to get a list of these control groups, create an indicator for them (call it "controlind"), and run some analyses to see how they compare to my "usmoveind" firms during the 3 year period before the headquarters move.

Any help on this matter would be greatly appreciated! I hope this is clear, and would be happy to clarify!

Thanks,

Erik

↧

Specifying multiple criteria for -foreach-

August 23, 2016, 5:39 pm

Hello, I'm working on building a -foreach- statement in hopes of making the search criteria for a -replace- command more efficient. Below is my code, in which I examine the variable plicd for specific values, comparing two methods:
1. Using -foreach- to make variable tagfor ==1 whenever the search criteria are satisfied. And,
2. Using -replace- to make variable tagman==1 whenever the search criteria are satisfied.

For my -foreach- code, I used as a template Nick Cox's FAQ at http://www.stata.com/support/faqs/data-management/try-all-values-with-foreach/.

My questions are:
1. My observation is that in my choice of search criteria, there's no advantage over a manual search. Am I not using -foreach- correctly?
2. If I want to specify a range of values for variable plicd (e.g. I21.0 to I21.3), how would I specify that in each of my two methods?
3. Note that the value of "I25.3" is captured, even though in my -index- function I specify only "I25.3" for the "I25.x" range.

Code:

clear
set obs 10
input mrn tagfor str8 plicd
1 . "I21.0"
2 . "I21.1"
3 . "I22.3"
4 . "I25.2"
5 . "I25.3"
6 . "I26.4"
end
l, noo

egen group = group(plicd)
su group, meanonly
summ group, detail
foreach i of num 1/`r(max)' {
    replace tagfor=1 if index(plicd, "I21.*") | ///
                        index(plicd, "I22.*") | ///
                        index(plicd, "I25.2")
}
gen tagman = 1 if index(plicd, "I21.*") | ///
                  index(plicd, "I22.*") | ///
                  index(plicd, "I25.2")
replace tagman=0 if tagman!=1 & tagman!=.
l mrn plicd group tagfor tagman, noo
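
A short sketch that may help with questions 1 and 2, under the assumption that the goal is simply to flag observations by code: index() looks for a literal substring, so "I21.*" only matches a value that actually contains an asterisk, whereas strmatch() understands * and ? as wildcards. No loop is needed, because -generate- already evaluates the condition for every observation. The variable names tag and tagrange are hypothetical.

```stata
clear
input mrn str8 plicd
1 "I21.0"
2 "I21.1"
3 "I22.3"
4 "I25.2"
5 "I25.3"
6 "I26.4"
end

* Wildcard matching with strmatch(); exact match for the single code I25.2
generate byte tag = strmatch(plicd, "I21.*") | strmatch(plicd, "I22.*") | plicd == "I25.2"

* A range of codes: inrange() also works on strings (alphabetical comparison)
generate byte tagrange = inrange(plicd, "I21.0", "I21.3")

list, noobs
```

Because inrange() compares strings alphabetically, a longer code such as "I21.25" would also fall inside the range I21.0 to I21.3, which may or may not be what is wanted.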

↧

Can GMM apply for a panel data with large T ?

August 23, 2016, 6:39 pm

Dear Statalist Members,

I have panel data with 13 variables (including the Y) for around 1,700 firms from 1991 to 2015, with some missing figures.

Is the T (25 years) too big for applying GMM estimates? What are the potential problems with using a long time period (say over 25, maybe 50 years) for GMM?

Thank you in advance.
Chad

↧

labmask error

August 24, 2016, 12:26 am

Hi,
I am using Stata 14 on Windows. I have tried implementing the labmask command and encountered the following error: "x not constant within groups of y". Is there any workaround for this?

Thanks in advance.

↧