exporting a big dta file to excel

November 3, 2015, 2:10 pm

I am trying to export a .dta file with 4 columns and 290 million rows to an Excel file, but Stata tells me:

. export excel cep_res using "SIH_ceps", firstrow(variables)
too many or no observations specified
r(198);

Is it because I have too many rows in my dta file? How could I solve this problem?
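
A note on the likely cause: an .xlsx worksheet holds at most 1,048,576 rows, so 290 million rows cannot fit on one sheet. The sketch below uses the shipped auto dataset as a stand-in for the poster's data (the output file names are made up for illustration). It shows two possible workarounds: writing a delimited text file, which has no such row limit, or exporting a range of observations at a time with the in qualifier.

```stata
* Sketch only: auto.dta stands in for the real 290-million-row file.
sysuse auto, clear

* Option 1: a delimited text file has no worksheet row limit.
export delimited make price using "auto_subset.csv", replace

* Option 2: export a block of observations that fits on one sheet.
* For a large file, repeat with successive ranges and file names.
export excel make price using "auto_part1" in 1/50, firstrow(variables) replace
```

For the real file, each Excel block would need to stay under the 1,048,576-row limit, including the header row.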

↧

ARIMA and Insufficient Observations

November 3, 2015, 3:49 pm

Hello everyone,
I'm pretty new to Stata, especially its ARIMA functions, and I've spent the past few hours trying to figure out where I'm going wrong.

Problem

I'm trying to fit a basic ARIMA model with my dependent variable, Berkshire Hathaway Class A share returns. There are 427 observations. I've been following the video on the StataCorp LP channel on YouTube, https://www.youtube.com/watch?v=8xt4q7KHfBs, but when I work with my own data set, things do not turn out right.

Specifically, when I put D. in front of the dependent variable (to take the first difference) in a line graph, I get a blank graph. The same happens when I use the natural logarithm of the dependent variable.

And when I use the syntax arima depvar, ar(1)

I get the error: (note: insufficient memory or observations to estimate usual
starting values [2])
insufficient observations
r(2000);

Are 427 observations not enough to estimate an ARIMA model?

Data and Observations

Here is the description of my data

Contains data
  obs:           427
 vars:             4
 size:         9,394

                 storage   display    value
variable name    type      format     label      variable label

date             int       %td..                 Date
b_hathway_cla~r  long      %10.0g                B_HathWay_Class_A_Ind_Var
aaa_corp_bond~d  double    %10.0g                Aaa_Corp_Bond_Yield
_year_tbill_y~y  double    %10.0g                10_Year_T-Bill_Yield_Maturity

Any help or links towards resources for troubleshooting ARIMA would be greatly appreciated.

Sincerely,
Mike Pelosi
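
One possible cause, which the post does not confirm, is gaps in the time variable. If the data are tsset on a daily date and some calendar days are absent (weekends, holidays, or monthly data stored as daily dates), D.x is missing wherever the previous day has no observation. That can leave a blank graph and too few usable observations for arima. The sketch below uses the shipped sp500 dataset as a stand-in and builds a gap-free index; the variable name t is hypothetical.

```stata
* Sketch only: sp500.dta stands in for the poster's data.
sysuse sp500, clear

* Build a consecutive index so lag and difference operators
* step from one observation to the next, not one calendar day.
sort date
generate long t = _n
tsset t

tsline D.close            // first difference is defined from row 2 onward
arima D.close, ar(1)      // AR(1) on the differenced series
```

Whether treating the observations as equally spaced is appropriate depends on the data; for monthly data stored as daily dates, converting with mofd() and tsset with a monthly format may be the better route.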

↧

rolling kernel regression with varying bandwidth

November 3, 2015, 6:52 pm

Hello All,

Glad to become a member of Statalist. I am trying to use kernel regression to predict the variance of A based on the variance of B (measured as squared returns). For the variance of B, at each time t, I first create around 10,000 states (i.e. 10,000 different variances of B), and then for each state I use kernel regression on all the available actual data for the variances of A and B to predict the variance of A. This process then rolls forward by adding one more pair of actual observations each time (like a rolling regression). I have written the code (see below). It works but is far too slow: the estimation took around 40 hours. I know there are many Stata experts on the forum, so could anyone let me know how to speed up the loop, please? I have heard that Mata is much faster. Unfortunately, I know nothing about it, and given the submission deadline for my thesis, I might not have enough time to learn it at the moment.

Any help would be immensely appreciated!!

quietly {
    forvalues j = 300(1)1300 {
        forvalues v = 0(1)9 {
            gen vara`j'`v' = (ln((`j'+`v'/10+0.05)/aprice))^2
            forvalues i = 1/`=_N' {
                gen K2`j'`v'`i' = ((2*_pi)^(-1/2))*exp((-1/2)*((vara`j'`v'[`i']-actualvara[_n-1])/h[`i'])^2) in 1/`i'
                gen sum2`j'`v'`i' = sum(K2`j'`v'`i')
                gen w2`j'`v'`i' = K2`j'`v'`i'/sum2`j'`v'`i'[_N]
                drop K2`j'`v'`i' sum2`j'`v'`i'
                gen expectedvarb`j'`v'`i' = sum(w2`j'`v'`i'*actualvarb[_n-1])
                replace vara`j'`v' = expectedvarb`j'`v'`i'[_N] if _n ==`i'
                drop w2* expectedvarb*
            }
        }
    }
}

Notes:
vara = variance of A
aprice = price of A
actualvara = actual variance of A
actualvarb = actual variance of B
expectedvarb = expected variance of B

↧

How to save output of psmatch2?

November 3, 2015, 7:13 pm

Hi!
I am wondering how to store the results of "psmatch2" in a matrix.
For a t-test, I used the following statements:
mat T[1,1] = r(mu_1)
mat T[1,2] = r(mu_2)
mat T[1,3] = r(mu_1) - r(mu_2)
mat T[1,4] = r(p)

But I cannot find where psmatch2 saves its output (something like r(mu_1)).
Please help me!

Thank you very much for reading !

↧

multilevel models and confidence interval (xtmixed)

November 3, 2015, 7:45 pm

I am running multilevel models using the xtmixed command in Stata. I'm trying to make a graph that shows the mean predicted trajectory along with two confidence-interval trajectories (the upper and lower bounds of the mean). I consulted "Multilevel and Longitudinal Modeling Using Stata" by Rabe-Hesketh and Skrondal but still couldn't quite figure out how to do it. My main confusion is how to calculate the confidence interval in growth curve models. Can anybody help out? Thanks in advance!
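
One common approach, sketched here under assumptions, is to take the fixed-portion prediction and its standard error from predict and build a normal-approximation interval by hand. The example uses the pig dataset from the Stata manuals (loaded with webuse, so it needs an internet connection) as a stand-in for the poster's data; the variable names xb, se, lo and hi are made up.

```stata
* Sketch only: pig.dta stands in for the poster's growth data.
webuse pig, clear
xtmixed weight week || id: week

* Fixed-portion linear prediction and its standard error.
predict double xb, xb
predict double se, stdp

* Approximate 95% confidence band for the mean trajectory.
generate double lo = xb - invnormal(0.975)*se
generate double hi = xb + invnormal(0.975)*se

twoway (rarea lo hi week, sort) (line xb week, sort), ///
    ytitle("Predicted weight") legend(order(1 "95% CI" 2 "Mean trajectory"))
```

This band reflects uncertainty in the fixed effects only; it is not a prediction interval for an individual subject, which would also have to account for the random-effect and residual variances.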

↧

RESET after Tobit

November 3, 2015, 8:55 pm

How do I run a Ramsey RESET test on a Tobit model? I am trying to replicate a procedure I have
already seen in many papers, and I know it is possible, as I have read three papers
that present the results, but I keep running into errors when I try to
run it.

After running my tobit command
. xi: tobit Max_WTP AGE SEX LOCATION MARRIED HHSIZE DISTANCE YEDUCATION EMPLOYMENT i.NNINCOME INSURANCE SATISFACTION, ll
. est store M
. ovtest
last estimates not found
r(301);

. ovtest M
last estimates not found
r(301);

I also tried
xi: tobit Max_WTP AGE SEX LOCATION MARRIED HHSIZE DISTANCE YEDUCATION EMPLOYMENT i.NNINCOME INSURANCE SATISFACTION, ll
ovtest

last estimates not found
r(301);

I also tried estat ovtest, without success.

Is there a line of code I am missing? Should I be running mfx after
the tobit (even though I have tried this and I still get the same
error)?

Any help is greatly appreciated.

Regards
Mohammed
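
For context: estat ovtest (and the older ovtest) is a postestimation command for regress, so it is not available after tobit. A RESET-style check can be built by hand instead: add powers of the fitted index to the model and test them jointly. The sketch below is an assumption about what the cited papers did, not a confirmed replication. It uses the shipped auto dataset with an arbitrary censoring point, and the names xb, xb2 and xb3 are made up.

```stata
* Sketch only: auto.dta and ll(17) stand in for the poster's model.
sysuse auto, clear
tobit mpg weight price, ll(17)

* Fitted linear index and its powers.
predict double xb, xb
generate double xb2 = xb^2
generate double xb3 = xb^3

* Re-estimate with the powers added, then test them jointly.
tobit mpg weight price xb2 xb3, ll(17)
test xb2 xb3
```

A small p-value from test would suggest the linear index is misspecified. Note that this is the same idea as RESET, not the exact statistic reported by estat ovtest after regress.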

↧

How can I replace values within a specific variable, with the labels associated with those values?

November 3, 2015, 9:09 pm

I am trying to replace the values in a variable with the labels associated with those values, but I want to avoid writing multiple replace ... if ... statements. I need to do this for 50 different data files, so it does not make sense to do it that way. Is there a way to write a loop or a simple line of code to serve the purpose?

Context and Example:
My data files are for different countries. One variable is region / province / state. It takes the values 1, 2, 3, ... and each data file has its own labels for these values. For example, the US file has California, New York, etc. and the Canada file has Ontario, British Columbia, etc., but both use the values 1, 2, 3. When I append the two data sets, the labels of the master (US) set are used for the appended data set, and therefore I lose the labels from the using file (Canada).

To solve this problem, I want to replace 1 with California in the US file and 1 with Ontario in the Canada file, and then perform the append. I know how to do this with many, many lines of code, but I want a smarter and more scalable way of doing it.
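
The decode command does this in one line: it creates a string variable holding the value label of each observation. The sketch below uses the shipped auto dataset, where foreign is a labeled numeric variable, as a stand-in for the region variable; the new variable name is made up.

```stata
* Sketch only: foreign in auto.dta stands in for the region variable.
sysuse auto, clear

* Create a string variable containing the label text for each value.
decode foreign, generate(foreign_name)

tabulate foreign_name
```

Running decode in each country file before append would give a string variable that carries its own text, so label clashes between files no longer matter. With 50 files, the same two or three lines could go inside a foreach loop over the file names.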

↧

cross-validation to find optimal bandwidth at each time t

November 3, 2015, 11:55 pm

Hello listers,

My dataset is quite simple, with only 3 columns: date, variance of A and variance of B. I am trying to use Nadaraya-Watson kernel regression to predict the variance of A conditional on the variance of B. This will be performed on a rolling basis, which means the optimal bandwidth may change at each time t. I want to apply a cross-validation procedure to find the optimal bandwidth at time t, i.e. minimise the root mean-squared error using a jackknife-based procedure. I have had a look at -help jackknife-; however, it only returns the RMSE, not the optimal bandwidth. I have also checked -help loocv- and -help crossfold-, and they also return just the RMSE. For the whole sample, I might be able to do it in Excel with the Solver function, but that is not feasible for a time-varying bandwidth.

If anyone could shed some light on this, it will be really appreciated!!

↧

interaction opposite signs

November 4, 2015, 8:08 am

I have the following regression
Leverage = @ + dummy + Cash + cashXdummy
Dummy is 1 for international firms and zero for non-international.

The results are puzzling:
Cash has a significant positive effect.
The variable (cash * dummy) is significant and negative.

Are these two opposite signs okay? What do they mean? I ran the regression using several methods and still get the same results. I also expanded the sample size and still get the same outcome. Are these results fine, or did I maybe do something wrong? I learned that multicollinearity is not an issue of concern with interactions.

many thanks
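
A short illustration of how the two coefficients combine: with an interaction in the model, the coefficient on cash is the slope for firms with dummy = 0, and the slope for firms with dummy = 1 is the sum of the cash and interaction coefficients. Opposite signs therefore just mean the slope is smaller (or reversed) in the second group. The sketch below uses the shipped auto dataset, with price, weight and foreign standing in for leverage, cash and the international dummy.

```stata
* Sketch only: price/weight/foreign stand in for leverage/cash/dummy.
sysuse auto, clear
regress price i.foreign##c.weight

* Slope of weight when foreign == 0 is _b[weight].
* Slope of weight when foreign == 1 is the sum below.
lincom weight + 1.foreign#c.weight
```

lincom reports the combined slope with its own standard error and confidence interval, which is usually the quantity to interpret for the second group.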

↧

Using deviation coding (devcon) with linear regression

November 4, 2015, 8:40 am

Has anyone used the devcon command with linear regression? I've been working with this and could use some advice on whether I've run this correctly.

I have two variables - one a continuous variable (B12) and one a categorical variable. (I'm looking at student enrollment (B12) in about 600 higher education programs across 17 fields of study.) The continuous variable (B12) is the number of students enrolled in each program and the categorical variable (i.NEWTaxonomy) includes the 17 different fields of study. For each field of study within NEWTaxonomy, I generated a new variable, i.e., NEWTaxonomy=1 becomes tax1, NEWTaxonomy=2 becomes tax2, NEWTaxonomy=3 becomes tax3, etc.

I then ran a linear regression: regress B12 i.NEWTaxonomy, vce(robust)

Followed by: regress B12 tax2 tax3 tax4 tax5 tax6 tax7 tax8 tax9 tax10 tax11 tax12 tax13 tax14 tax15 tax16 tax17, vce(robust)

And then: devcon, groups(tax1 tax2 tax3 tax4 tax5 tax6 tax7 tax8 tax9 tax10 tax11 tax12 tax13 tax14 tax15 tax16 tax17)

In the final step, what I'm assuming I'm doing (and what I want to be doing!) is looking at deviations from the grand mean.

The coefficients, robust standard errors, F-tests and R-squared in the results from the first two steps are all identical and p-values in the third step seem logical, but I could use some reassurance that I've run this correctly!

Thanks in advance for any advice.

Nathan Bell

↧

Ways to clean ICD9 codes

November 4, 2015, 8:44 am

I'm a newbie trying to clean up an ICD9 string variable that has some errant codes. An example is 530.809999999999999, which should be 530.81.
I've tried using recode, and this is what I get:

. recode ICD9CODES1 (530.8099999999999 = 530.81)
recode only allows numeric variables

Replace isn't much better:
. replace ICD9CODES1 = "530.81" if ICD9CODES1 = 530.8099999999999
type mismatch
r(109);

I'm not sure if I first need to (or should) change the ICD9 code variable to a double and then try the recode command, or is there a better way?
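
Two things go wrong in the replace attempt above: the condition uses = where Stata expects ==, and a string variable is compared with a numeric literal, which produces the type mismatch. Because the variable is a string, the comparison value needs quotes. The sketch below keeps the variable as a string and repairs only over-long numeric-looking codes by rounding to two decimals. The sample data and the length threshold of 6 characters are assumptions for illustration.

```stata
* Sketch only: three made-up codes, one of them damaged.
clear
input str20 ICD9CODES1
"530.8099999999999"
"250.00"
"V45.81"
end

* Round numeric-looking codes longer than 6 characters to 2 decimals.
* Codes such as V45.81 are left alone because real() returns missing.
replace ICD9CODES1 = trim(string(round(real(ICD9CODES1), 0.01), "%9.2f")) ///
    if strlen(ICD9CODES1) > 6 & !missing(real(ICD9CODES1))

list
```

Keeping the variable as a string preserves leading zeros and the V and E codes, which a conversion to double would lose.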

↧

copying a value within group

November 4, 2015, 9:00 am

Dear all,

I need help about the following issue:

Target: In my data there is a variable, x. Within each group of observations identified by a unique id (call the variable boxnumber), x is missing for all but one observation. However, this value is not always in the same position; sometimes it is in the first row within the group and sometimes it is in row 3 or 4, etc. I want to assign this single nonmissing value of x (within each group) to all missing values of x within the group. That is, when I'm done, x is nonmissing for all observations and constant within boxnumber. The original nonmissing values of x remain unchanged. A very similar question was asked at the following links. However, the solution suggested there works only if the nonmissing value is always in the same position within the group. Thank you for your help. http://www.stata.com/statalist/archi.../msg00921.html http://www.stata.com/statalist/archi.../msg00923.html
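
A minimal sketch of one way to do this, assuming x is numeric: sorting on x within boxnumber puts the nonmissing value first, because numeric missing values sort last, so x[1] is the value to copy regardless of where it started. The data below are made up.

```stata
* Sketch only: made-up data with one nonmissing x per group.
clear
input boxnumber x
1 .
1 5
1 .
2 7
2 .
end

* Within each group, sort by x so the nonmissing value comes first,
* then copy it into the missing rows.
bysort boxnumber (x): replace x = x[1] if missing(x)

list, sepby(boxnumber)
```

For a string x the sort order is reversed (empty strings sort first), so x[_N] would be the value to copy instead. The sort also changes the row order within groups; keep an original row counter if that order matters.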

↧

if command

November 4, 2015, 10:19 am

I do not know why my -if command- below is not working.

foreach a in 03 04 05 06 07 08 09 10 11 12 13 14 {
    foreach p in 01 02 03 04 05 06 07 08 09 10 11 12 {

        use RJ`i'`a'`p'.dta, clear

        if `a'==03 & `p'==02 {
            display "sim"
            destring insc_pn seq_aih5, replace force
        }

        append using base.dta

        save base.dta, replace
    }
}

Stata gives me:
variable insc_pn is str10 in master but double in using data
You could specify append's force option to ignore this string/numeric mismatch. The using variable would then be
treated as if it contained "".
r(106);

I have already tried if "`a'"=="03" & "`p'"=="02" but it still does not work.

↧

Simulate (Monte Carlo) how to estimate the R-Squared, t-ratio, HAC t stats and t/sqrt(T)

November 4, 2015, 11:06 am

Dear all,

I am trying to run a simulation for my MSc thesis, which basically comes down to this:

program define myreg
    drop et ut Vmin1 V V1 R R1 time
    set obs 660
    gen time =_n
    tsset time
    mat sigma = [1,-0.3\-0.3,1]
    mat m = (0,0)
    drawnorm et ut, n(660) means(m) cov(sigma)
    gen Vmin1 = Variance[_n-1]
    gen V=0+0.6*Vmin1+ut
    gen V1=V[_n-1]
    gen R=Vmin1*0+et
    gen R1=R[_n]
    ivregress 2sls R1 V1, vce(hac quadratic opt) small
    more
end

simulate _b _se, reps(10000): myreg

The only problem is that the simulate command only provides _b and _se in the exp_list. However, I also need the R2, the standard t-stat, the HAC t-stat and the t/sqrt(T) ratio. Is there a way to solve this problem?

Kind regards,

Erik JAn Poelen
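
One general way to get extra statistics out of simulate is to make the program rclass, return each quantity as a scalar, and name those r() results in the exp_list. The sketch below is a simplified stand-in, not the model above: it regresses one white-noise series on the lag of another and uses newey with 4 lags for the HAC t-statistic in place of ivregress with vce(hac ...). The program name, the lag choice and the returned names are all assumptions.

```stata
* Sketch only: simplified stand-in for the poster's program.
capture program drop myreg2
program define myreg2, rclass
    drop _all
    set obs 660
    generate time = _n
    tsset time
    generate x = rnormal()
    generate y = rnormal()

    * Conventional OLS: R-squared, t-stat and t/sqrt(T).
    regress y L.x
    return scalar r2      = e(r2)
    return scalar t       = _b[L.x]/_se[L.x]
    return scalar t_sqrtT = (_b[L.x]/_se[L.x])/sqrt(e(N))

    * HAC (Newey-West) standard errors; 4 lags is an arbitrary choice.
    newey y L.x, lag(4)
    return scalar t_hac   = _b[L.x]/_se[L.x]
end

simulate r2=r(r2) t=r(t) t_hac=r(t_hac) t_sqrtT=r(t_sqrtT), ///
    reps(100) seed(12345): myreg2
summarize
```

After simulate finishes, the dataset in memory has one row per replication and one variable per returned statistic.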

↧

How can I create a country-specific time trend in a panel model?

November 4, 2015, 11:29 am

Hi Statalist-users,
I am trying to run a panel data model with 19 countries across 10 years to study investment flows into each country.
For that I have run a fixed-effects model with time dummies. In order to get a trend coefficient for each country, I now want to create a time trend for every country (where the time trend increases from 1, 2, ..., T and T = the number of observations per country). Each country has its own time trend (0 for all the other countries and 1, 2, ..., T for that country).
Can anyone help? Every answer is appreciated!
Thanks!
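
A minimal sketch of one way to do this: create a single within-country counter and let factor-variable notation interact it with the country identifier, which gives one trend coefficient per country without building 19 separate variables. The example uses the Grunfeld panel from the Stata manuals (loaded with webuse, so it needs an internet connection), with company standing in for country; the variable name trend is made up.

```stata
* Sketch only: company in grunfeld.dta stands in for country.
webuse grunfeld, clear
xtset company year

* Trend runs 1, 2, ..., T within each panel unit.
bysort company (year): generate trend = _n

* One trend slope per unit via the interaction with the unit indicator.
xtreg invest mvalue kstock i.company#c.trend, fe
```

If a full set of year dummies is also included, expect Stata to drop one term for collinearity, since the year dummies already absorb a common trend.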

↧

Map Letters to positions in alphabet

November 4, 2015, 11:36 am

Hi. I'd like to generate a variable that contains the relative positions of letters (in another var) in the alphabet. For example:

var1    new_var
A       1
B       2
C       3
C       3
X       24

What's an elegant way to do this?
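
One compact option is strpos(), which returns the position of a substring within a string; searching for the letter inside the alphabet gives its rank directly. The sketch below assumes var1 holds a single letter per observation and recreates the example data.

```stata
* Sketch only: recreates the example data from the post.
clear
input str1 var1
"A"
"B"
"C"
"C"
"X"
end

* Position of the (upper-cased) letter within the alphabet string.
generate byte new_var = strpos("ABCDEFGHIJKLMNOPQRSTUVWXYZ", upper(var1))

list
```

Values that are not a single letter A to Z come back as 0 or as the position of a matching substring, so it is worth checking for those before relying on the result.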

↧

Add labels in ttdendro command

November 4, 2015, 2:37 pm

I tried to add labels for the "tree leaves" in ttdendro graphs using the labels(varname) option, but the error message said this option is not allowed in ttdendro. Since the documentation says all dendro options are allowed in ttdendro, I wonder why I cannot use it. Can anyone help? Thanks!

↧

Collapsing every group of 4 observations

November 4, 2015, 2:42 pm

Hello,

I have some data which looks like this (although there are many more variables)

Mar-1960	10	11
Jun-1960	22	21
Sep-1960	913	713
Dec-1960	1982	8311

This pattern repeats for a number of years. The data is already organized in this format (i.e. the months and years are in order). I would like to sum these four observations to create a new observation,

1960	2927	9056

I've looked into the collapse command and some loops without any success. Thanks for any help.
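
A minimal sketch with collapse, assuming the date is stored as a string such as "Mar-1960" and using made-up variable names period, a and b: extract the year, then sum within year.

```stata
* Sketch only: recreates the four example rows from the post.
clear
input str8 period a b
"Mar-1960" 10 11
"Jun-1960" 22 21
"Sep-1960" 913 713
"Dec-1960" 1982 8311
end

* Take the last four characters as the year, then sum by year.
generate year = real(substr(period, -4, 4))
collapse (sum) a b, by(year)

list
```

If the date is not usable, grouping by position works too: generate grp = ceil(_n/4) creates one group per block of four rows, and collapse (sum) a b, by(grp) sums within each block.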

↧

Putexcel command works on one computer but does not work on another computer (same OS Server 2008 R2 and same version of Stata (14))

November 4, 2015, 4:13 pm

I have a user who gets the following error on one server. On another server the same command works correctly.

. putexcel A1=("Poverty and SN Table") using allpovtest, replace //TITLE
using not allowed
r(101);

We updated the server with the issue to 14.1, but that did not help. Both servers run Windows 2008 R2 and Stata 14. Any ideas?

Thanks very much!

Mary Anne

↧

putexcel error: "using not allowed"

November 4, 2015, 4:47 pm

I read this forum all the time, but I'm new to posting, so please forgive any mistakes.

I just installed the October 29th Stata update, and now all my code involving the "putexcel" command is returning the "using not allowed r(101)" error.

Here's an example of my code that used to work fine:

fre c1checkweight if today_date - c1wgtdate < 8, nomissing include(0 1)
matrix r2 = r(valid)
if r2[1,1] == . {
    putexcel B10=("N/A") C10=("N/A") D10=("N/A") using "QC Template Weekly Progress Slide 2015-06-22", modify keepcellformat
}
else if r2[1,1] != . {
    putexcel B10=(r(N)) C10=matrix(r2[2,1]) D10=matrix( (r2[2,1]/r(N))*100 ) using "QC Template Weekly Progress Slide 2015-06-22", modify keepcellformat
}

However, even trying to recreate the basic examples in the manual for the command produces this error. For example:

 putexcel A1=(2+2) using file

I realized that this is related to the recent updates to "putexcel", where you must now use "putexcel set" to specify the file for subsequent putexcel commands. This makes things easier in the long run (i.e., now you don't have to constantly re-specify where you want your cells to go), but it also has the (perhaps unintended) effect of making old code no longer work.

The options ", modify" and ", keepcellformat" also seem to produce errors when used with just the "putexcel" command.

I've updated all my code so it now works, but I thought this might confuse others at first as well, so I thought I'd mention it here in case it helps!

Thanks!

-Evan
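
For anyone updating old code, here is a minimal sketch of the two-step pattern described above: name the workbook once with putexcel set, then write cells without using. The file name is made up for illustration.

```stata
* Sketch only: the workbook name is hypothetical.
* Step 1: name the workbook once; replace starts a fresh file,
* modify would edit an existing one instead.
putexcel set "putexcel_demo.xlsx", replace

* Step 2: write cells without repeating the file name.
putexcel A1 = ("Poverty and SN Table")
putexcel B1 = (2+2)
```

The file-level options (replace, modify, sheet()) now belong to putexcel set, which is why they are rejected on the cell-writing command.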

↧