Can Stata 12 read the HRF dates in my syntax as SIF dates?

November 4, 2015, 6:04 pm

I have a command that includes a large number of dates I'd like to use in recoding a date variable as a dummy variable. The dates in my syntax are in DMY format (01jan2011, for example). Although I have changed the display format of the dates in my data to this very format, Stata appears to be unable to read any command that uses dates not in SIF (Stata internal form). I have followed a host of tutorials, forum posts, etc. on SIF-to-HRF (human-readable form) and HRF-to-SIF conversion for dates, but these seem to speak only to changing the way dates are displayed in my data, whereas I want Stata to read the HRF dates I enter in syntax as SIF dates.

Is there a simple way to either a) get Stata to read my command with DMY format dates or b) convert my long list of DMY format dates to SIF dates (to use that as my syntax instead) without manually converting them one by one?
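In Stata, the td() pseudofunction turns a date typed in HRF, such as td(01jan2011), into its SIF value, so dates can be written directly in syntax (for a string, date("01jan2011", "DMY") does the same). A minimal sketch, assuming a daily date variable called mydate (a hypothetical name) and keeping the long list of dates in a local macro so that none has to be converted by hand:

```stata
* Toy data (hypothetical variable name mydate): three consecutive daily dates
clear
set obs 3
gen mydate = td(01jan2011) + _n - 1
format mydate %td

* Dates typed in DMY (HRF) form; td() converts each one to its SIF value
local hrfdates 01jan2011 03jan2011
gen byte special = 0
foreach d of local hrfdates {
    replace special = 1 if mydate == td(`d')
}
list
```

For a short list, inlist(mydate, td(01jan2011), td(03jan2011)) does the same in one line; inlist() caps the number of arguments it accepts, which is why a loop over a macro may suit a long list better.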

How to deal with unbalanced and missing observations in estimating panel VAR in stata

November 5, 2015, 1:12 am

Hi members, can someone help me? I am estimating a panel VAR; however, my data are unbalanced and have missing observations. I have attached a sample to illustrate my data's characteristics. Kindly help me with how to handle this in Stata.

How to use a logit model that accounts for age (or time effect) of ventures

November 5, 2015, 1:22 am

Synopsis of issue: How can I account for age (or time) effects in a pooled logit without using fixed effects or conditional logit?

I have a panel dataset of firms covering 8 consecutive years; a few firms fail each year, and those that survive all 8 years are censored. I want to use a special technique (the Blinder-Oaxaca (1973) decomposition) to figure out how differences in the levels of the same variables between, say, male and female owners affect the survival of ventures. This technique can handle a logit but not a fixed-effects logit (at least as far as I understand. There is a specific user-written command for this analysis in Stata called 'fairlie', which first runs a logit and then does the decomposition, all behind the scenes, hence I do not think I can run a fixed-effects logit). Thus, could I run one of the following pooled logits?

Option 1:
g(F(x)) = b0 + b1*X1 + b2*X2 + ... + T1 + T2 + ... + T7 + u

where the dependent variable is the log of the odds, b0, b1, ... are the coefficients, X1, X2, ... are variables, T1, T2, ... are time dummies (8th dummy omitted), and u is the error term?

The reason I want to include the time dummies is that I want to account for the fact that firms that survive longer have a higher chance of survival. If I run a simple pooled logit without the time dummies, it will treat a firm failing in the 5th year the same as a firm failing in the 2nd year...

Option 2:
g(F(x)) = b0 + b1*X1 + b2*X2 + ... + bn*age_of_venture + u

Here, I believe, the age_of_venture variable should account for the different ages of a venture. Thus, in a pooled logit of the panel data, say for firm id 23, which survived for 4 years, we would have 4 rows of observations, with age_of_venture incrementing by 1 in each row (1, 2, 3 and 4), and the dummy for the venture surviving or failing (the dependent variable) would be 0, 0, 0 and 1 for the 4 rows (if 0 = survival, 1 = failure)?
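The firm-year layout described in Option 2 can be built from one row per firm with expand; this is the usual discrete-time hazard setup. A sketch, assuming hypothetical firm-level variables survived (years observed) and died (1 if the firm failed in its last observed year). In the expanded data, i.age gives a set of age dummies in the spirit of Option 1 and a plain age term gives Option 2; whether fairlie accepts factor-variable notation is not checked here.

```stata
* Hypothetical firm-level records: firm 23 fails in year 4; firm 24 is censored after 8 years
clear
input id survived died
23 4 1
24 8 0
end

* One row per firm-year at risk
expand survived
bysort id: gen age = _n
* The outcome is 1 only in the final year of a firm that failed
gen byte fail = died & age == survived
list id age fail, sepby(id)

* With covariates x1 x2 (hypothetical) merged in, either model could then be fit:
*   logit fail x1 x2 i.age    // age dummies, in the spirit of Option 1
*   logit fail x1 x2 age      // linear age term, as in Option 2
```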

Option 3:
Or, if there is a better approach to control for the age of ventures in a pooled logit without running a conditional or fixed-effects logit, I would appreciate hearing about it...

Thanks for any help.

Custom prediction equations using mi impute chained

November 5, 2015, 9:33 am

Hi,

I am wondering if anyone can offer some advice with writing customized prediction equations.

mi impute chained ///
(regress, include(var1 var2) omit(var3 var4 outcome2)) outcome1 ///
(regress, include(var1) omit(var3 outcome1)) outcome2 ///

etc. etc.

However, I have lots of missing data in many variables, and it gets really fiddly with all the omit() and include() options.

It would be a lot easier to specify the regression model I am actually interested in, rather than specifying all the things I am not.

something along the lines of ...

mi impute chained ///
(regress, custom(outcome1 var1 var2)) ///
(regress, custom(outcome2 var1)) ///

a bit more like the ice eq() option
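custom() above is the wished-for syntax rather than an existing option, but the omit() lists do not have to be typed by hand: Stata's macro list operations can take the complement of the predictors you want. A sketch with hypothetical variable names, building the omit() list for the outcome1 equation:

```stata
* Hypothetical names: every variable being imputed
local imputed outcome1 outcome2 var3 var4

* Equation for outcome1: the imputed variables you DO want as predictors
local y1 outcome1
local keep1 outcome2

* omit() list = all imputed variables, minus outcome1 itself, minus those kept
local rest : list imputed - y1
local omit1 : list rest - keep1
display "`omit1'"
* The list could then be used as, e.g.:
*   mi impute chained (regress, omit(`omit1')) outcome1 ...
```

Repeating the last three local lines for each equation (or wrapping them in a loop) keeps the specification close to the model you actually want.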

many thanks

A

How to open a bank using insheet and "if"

November 5, 2015, 10:05 am

I have a huge dataset to open, and it uses ";" as the delimiter. I know the infile command allows "if", but my data are not in a format that infile can read.
I'd like to know whether it is possible to open my data using some kind of "if". I have a file with the salaries of all the citizens of my country, but I'm only looking for those who work as teachers, so I would lose a lot of time opening the whole dataset before I could apply my condition. I don't know if I explained it well; English is not my main language, and it's a bit hard for me to explain such a difficult case, but I can explain it in other words if needed.
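insheet has no if qualifier, but one workaround is to stream the raw file line by line with Stata's file commands, copy only the matching lines to a smaller file, and read that. A rough sketch, assuming Stata 13 or later (for long string expressions), a header row, and hypothetical file names and a hypothetical occupation value "teacher"; matching the text anywhere in the line is crude and could pick up the word in another column:

```stata
* Toy input (hypothetical file and column names): a small ";"-delimited file
file open fout using "salaries_raw.csv", write text replace
file write fout "name;job;salary" _n "Ana;teacher;3000" _n
file write fout "Bo;nurse;2800" _n "Cy;teacher;3100" _n
file close fout

* Copy the header line, then only the lines that contain "teacher"
file open fin using "salaries_raw.csv", read text
file open fout using "teachers.csv", write text replace
file read fin line
file write fout `"`macval(line)'"' _n
file read fin line
while r(eof) == 0 {
    if strpos(`"`macval(line)'"', "teacher") file write fout `"`macval(line)'"' _n
    file read fin line
}
file close fin
file close fout

* Only the filtered subset is loaded into memory
insheet using "teachers.csv", delimiter(";") names clear
list
```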

Thanks for your attention.

Checking through observations to satisfy a condition

November 5, 2015, 11:07 am

Hi there,

I have one specific effective date per employee, which I need to compare with a series of observations of beginning dates for the same employee. Per employee, the beginning dates can be sorted from earliest to latest. I should get the latest beginning date that is less than or equal to the effective date. Often the latest beginning date does not satisfy the condition begdate <= effectdate, so the program should move to the next most recent beginning date and check again until the condition is satisfied, and then pick up that beginning date. How do I do this identification across the observations of beginning dates?
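No loop over observations should be needed here: the latest beginning date that does not exceed the effective date is simply the maximum among the qualifying beginning dates. A sketch using the variable names from the post (begdate, effectdate) plus a hypothetical id, with made-up dates:

```stata
* Made-up example: two employees, several beginning dates each
clear
input id str9 effstr str9 begstr
1 "15mar2010" "01jan2010"
1 "15mar2010" "01mar2010"
1 "15mar2010" "01jun2010"
2 "01feb2011" "01jan2011"
2 "01feb2011" "05feb2011"
end
gen effectdate = date(effstr, "DMY")
gen begdate = date(begstr, "DMY")
format effectdate begdate %td

* Keep only beginning dates on or before the effective date, then take the latest
gen okbeg = begdate if begdate <= effectdate
bysort id: egen chosen = max(okbeg)
format chosen %td
list id effectdate begdate chosen, sepby(id)
```

If no beginning date qualifies for an employee, chosen is missing for that employee.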

Thanks for any help.

Export correlation matrix to Word

November 5, 2015, 11:12 am

Dear all,

Finally, all my regressions have run, and I have exported them all to Word using the following code:

qui reg CARR5 Completed CashPayment MajorityStake TargetPublicStatus LargeTransactionValue DomesticDeals DummyYear R_AssetGrowth R_CashAssets R_DebtAssets R_ROA R_BEME
est sto one
outreg2 [one] using "M:\test.doc", replace dec(3)

The Stata-code to get the correlation matrix is:
corr CARR7 BCARA7 Completed CashPayment MajorityStake TargetPublicStatus LargeTransactionValue DomesticDeals Horizontal DummyYear HFI R_AssetGrowth R_CashAssets

How can I adjust the code I use for my regressions to export the correlation matrix to word? outreg2, estout, tabout and egenmore are installed.
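Since estout is installed, one route is estpost correlate followed by esttab, writing an .rtf file that Word can open. A sketch on the auto data, so the variable list is only a stand-in for yours; check help estpost and help esttab if any option behaves differently in your version:

```stata
* Requires the user-written estout package (ssc install estout)
sysuse auto, clear
* matrix: store the full correlation matrix; listwise: use one common sample
estpost correlate price mpg weight length, matrix listwise
* unstack: lay the correlations out as a matrix; not: no t statistics; noobs: no N row
esttab using "corr.rtf", unstack not noobs compress replace
```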

Kind regards,
Emiel Brak

"Not sorted" error in panel data

November 5, 2015, 11:20 am

Hi,

I have a problem using the generate command with the by prefix in panel data.

My code looks like this:

xtset id reportyear
gen nfi = a - b
gen dnv = nfi + c + d +e
bysort somevariable reportyear: gen nvai = (dnv - L.dnv) / L.dnv

When I execute the do-file, it returns the error "not sorted".

Can someone explain what the problem is? I think Stata is confused because I am working with a panel and want to compute a growth rate by a non-id variable.
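After xtset id reportyear, L.dnv already refers to the previous year of the same id, so the by: prefix is not needed for a firm-level growth rate; sorting by somevariable reportyear instead of id reportyear appears to be what triggers the "not sorted" error. If the goal is instead the growth of the total for each somevariable group, one option is to collapse to one row per group-year first. A sketch with made-up values:

```stata
* Made-up panel: two firms in one somevariable group
clear
input id reportyear somevariable dnv
1 2010 1 100
1 2011 1 110
2 2010 1 50
2 2011 1 40
end
xtset id reportyear

* Firm-level growth: L. works within each id, no by: prefix needed
gen nvai = (dnv - L.dnv) / L.dnv

* Group-level growth: sum dnv within somevariable-year, then lag within group
preserve
collapse (sum) dnv, by(somevariable reportyear)
xtset somevariable reportyear
gen growth = (dnv - L.dnv) / L.dnv
list
restore
```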

istdize command is not storing matrices

November 5, 2015, 1:00 pm

Hi guys. I have a problem with the stored results when using the istdize command. According to the Stata manual, istdize should store a couple of matrices. When I run the command, it only stores scalars. The return list result is:

scalars:
r(se) = 8.888194417315589
r(mean) = 79
r(N) = 1
r(ub) = 98.45757114376829
r(lb) = 62.54504395974459

Does anyone know why it is not storing what the manual says? I have tested it using Stata 12 and Stata 13.

Thanks

P.S.: I attached a picture of the manual.

Best method for doing Mixed ANOVA in STATA?

November 5, 2015, 1:50 pm

Hello everyone, I'm trying to analyze a 2x2 factorial design in which one factor is between-subjects and the other is within-subjects. In SPSS it's straightforward: just do a repeated-measures ANOVA and add your between-subjects factor. But I'm trying to learn Stata, so here I am. Should I do this using the split-plot ANOVA commands, or should I use a mixed model with the mixed command?
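Both routes can be run side by side. A sketch on simulated data with hypothetical variable names (subject, group for the between-subjects factor, cond for the within-subjects factor): the split-plot anova tests group against subjects nested in group, and mixed fits a random intercept per subject.

```stata
* Simulated 2x2 split-plot data (hypothetical): 40 subjects, 2 conditions each
clear
set seed 12345
set obs 40
gen subject = _n
gen group = _n > 20                 // between-subjects factor (0/1)
gen u = rnormal()                   // subject-level random effect
expand 2
bysort subject: gen cond = _n - 1   // within-subjects factor (0/1)
gen y = 10 + 2*group + 3*cond + u + rnormal()

* Split-plot ANOVA: subject nested in group is the error term for group
anova y group / subject|group cond group#cond, repeated(cond)

* Mixed-model alternative with a random intercept per subject
* (mixed is Stata 13+; in Stata 12 the command is xtmixed)
mixed y i.group##i.cond || subject:
```

With a balanced design like this, the two approaches should give closely matching tests for the fixed effects; the mixed model is more forgiving if some subjects have missing conditions.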

Thank you in advance

Anomaly detection

November 5, 2015, 2:54 pm

I am trying to find out whether there are any functions or add-ons for Stata that support anomaly detection on time-series data.

The data I have are daily frequency counts, and their anomaly pattern has both a collective and a contextual nature.

Essentially, simple distance measures won't suffice, since sometimes a zero is anomalous and sometimes it is not, depending on the overall pattern of the data. Likewise, a value of 50 is not anomalous when it is preceded by 20 and followed by 70, but otherwise it is. From my reading, I am looking to use collective and contextual anomaly detection methods.

Anyone have any experience with this in either stata or elsewhere?

Calculating and appending the check digit of a SEDOL number

November 5, 2015, 4:31 pm

Hello everyone, I am trying to calculate the check digit (the 7th character) of a SEDOL number from an existing 6-character (alphanumeric or numeric) SEDOL. Does anyone have a do-file for this in Stata? I found many in other languages, but not in Stata: http://rosettacode.org/wiki/SEDOLs or https://en.wikipedia.org/wiki/SEDOL
Examples of codes (all securities listed on LSE)
769663
B00CRV
B00DF1
B00FPT
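The algorithm on the linked pages weights the six characters by 1, 3, 1, 7, 3, 9, counts digits at face value and letters as A = 10 through Z = 35, and takes (10 - sum mod 10) mod 10 as the check digit. A sketch in Stata, assuming every code is exactly six characters; 710889 is an extra test value, whose full code by this formula is 7108899:

```stata
* Six-character SEDOLs (the examples from the post plus one extra)
clear
input str6 sedol
769663
B00CRV
B00DF1
B00FPT
710889
end

* Weighted sum: digits keep their value, letters A-Z count as 10-35
local w 1 3 1 7 3 9
gen wsum = 0
forvalues i = 1/6 {
    local wi : word `i' of `w'
    gen str1 ch = upper(substr(sedol, `i', 1))
    replace wsum = wsum + `wi' * cond(real(ch) < ., real(ch), ///
        strpos("ABCDEFGHIJKLMNOPQRSTUVWXYZ", ch) + 9)
    drop ch
}

* Check digit = (10 - wsum mod 10) mod 10, appended as the 7th character
gen str7 sedol7 = sedol + string(mod(10 - mod(wsum, 10), 10))
list sedol sedol7
```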

Thank you in advance,
A.

Regression: Variance of the error term

November 5, 2015, 4:32 pm

Dear all,

I have a dataset containing roughly 200 companies with daily stock data for 10 years.
The variables are: date, companyid, Ri_Rft, B_Ret, SMB, HML
I need to run reg Ri_Rft B_Ret SMB HML for every company in the sample, for each month.
After this, I need to save the variance of the error term as a new variable.

I have the following code set up:

gen resid=.
levelsof id, local(groups)
foreach a of local groups {
    quietly reg Ri_Rft B_Ret SMB HML if id==`a'
    tempvar d
    predict `d', stdp
    replace resid=`d' if id==`a'
}

However, I have two problems with this setup.
First, I am not sure if the "predict, stdp" command achieves my goal of saving the variance of the error term.
Will the new variable 'resid' contain the variance of the error term?
Second, this code only works if I reduce my sample to roughly half the companies, or else it gives an error: no room to add more variables.
Is this solved simply by using set maxvar and how does this work? Where should I place it in my code?
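On the first question: predict, stdp gives the standard error of the linear prediction, not the error variance. After regress, the estimated error variance is e(rmse)^2, i.e. the residual sum of squares divided by the residual degrees of freedom. On the second: every tempvar created inside the loop stays in the data until the do-file ends, so the variables pile up; writing into one existing variable avoids this, and set maxvar should then be unnecessary. A sketch on simulated data, assuming hypothetical variables companyid and date and one regression per company-month:

```stata
* Simulated stand-in data (hypothetical): 2 companies, 59 calendar days each
clear
set seed 2015
set obs 118
gen companyid = ceil(_n / 59)
bysort companyid: gen date = td(01jan2010) + _n - 1
format date %td
gen B_Ret = rnormal()
gen SMB = rnormal()
gen HML = rnormal()
gen Ri_Rft = 0.5*B_Ret + 0.2*SMB - 0.1*HML + rnormal(0, 0.3)

* One group per company-month
gen ym = mofd(date)
format ym %tm
egen cm = group(companyid ym)

* Regress within each group and store the residual variance in a single variable
gen double resvar = .
quietly summarize cm
local G = r(max)
forvalues g = 1/`G' {
    capture quietly regress Ri_Rft B_Ret SMB HML if cm == `g'
    if _rc == 0 {
        quietly replace resvar = e(rmse)^2 if cm == `g'
    }
}
```

The capture guard leaves resvar missing for any company-month with too few observations to fit the model, instead of stopping the loop.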

Kind Regards,
Bram van Vorstenbosch

Dropping observations based on a set difference of days in panel data

November 5, 2015, 5:24 pm

Hi Lister.

This is my first post on Statalist so forgive me if my query comes off elementary. I have gone through some of the previous posts based on certain filters, and haven't found anything relevant.

I am conducting an event study of actual share repurchases, and I have event data for each company in my sample. My dataset contains two columns: company ID and event date. In some cases a company carries out several repurchase events within the same month, in any year, so company IDs recur. However, I am only interested in events that fall outside a ten-day window. I will try to illustrate my point with an example below:

ID Event Date
100 1/03/2000
100 5/03/2000
100 11/03/2000
100 18/03/2000
100 25/03/2000
100 31/03/2000

The output I would like here is the events that fall beyond the ten-day window (in the example above, 1/03/2000, 11/03/2000 and 25/03/2000). However, I am struggling to see how to derive this in Stata. I am not sure how to code it so that Stata considers the difference between the first and second date observations and, if that is less than 10, drops the second observation. Next, it should look at the difference between the third and first observations, and since that difference is now 10 days or more, Stata should keep that observation. Similarly, it should next look at the difference between the fourth and third observations, and if the difference in number of days is less than ten, drop the fourth observation, and so on. I am guessing this would require some kind of loop, but since my Stata level is still rather elementary, I would greatly appreciate it if any of the Stata gurus here could help me out. Thanks a ton!
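Because each comparison is against the last event that was kept, not simply the previous row, the rule is recursive. Stata's replace works through the observations in order, so a running "last kept date" can carry that state without an explicit loop. A sketch using the example above, assuming an event is kept when it is at least 10 days after the last kept event (which reproduces 1/03, 11/03 and 25/03):

```stata
* The example from the post, dates as day/month/year strings
clear
input id str10 datestr
100 "1/03/2000"
100 "5/03/2000"
100 "11/03/2000"
100 "18/03/2000"
100 "25/03/2000"
100 "31/03/2000"
end
gen eventdate = date(datestr, "DMY")
format eventdate %td

sort id eventdate
* lastkept = date of the most recent event kept so far within the company
by id: gen lastkept = eventdate if _n == 1
by id: replace lastkept = cond(eventdate - lastkept[_n-1] >= 10, ///
    eventdate, lastkept[_n-1]) if _n > 1
* Keep the first event and any event at least 10 days after the last kept one
by id: gen byte keepit = _n == 1 | eventdate - lastkept[_n-1] >= 10
keep if keepit
list id eventdate
```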

Combining SUR and Heckman for 6 equations

November 6, 2015, 12:33 am

My dataset covers 6 sub-sectors in 49 countries over 16 years. I would like to analyse the determinants of investment in each sub-sector. I suspect that I can increase the efficiency of my estimates by applying SUR to the 6 sector equations. Unfortunately, 3/4 of all observations have an investment value of zero, and many more for some sectors, so I have to use a Heckman model (which I usually estimate 'manually', as shown on slides 50/51 here: http://goo.gl/5C1Xep). This is where I get a little confused about how to proceed.

Is there a way to (i) run a probit SUR for 6 equations (biprobit only allows 2 equations) to cover the first step of the Heckman procedure and (ii) extract the inverse Mills ratio (ratios?) from it, so I can use it in sureg? If not, is there a way to use the heckman command in a SUR fashion?

Save "Constant" + t-stat from a regression

November 6, 2015, 1:07 am

Hi,

I am running N regressions by group (with many groups...). I would like to report the intercept + t-stat (+ # obs + R^2, maybe) by directly creating an Excel file with the following format:

Gp	Sub_gp	var1	var2	…	varN
1	1	Intercept	Intercept		Intercept
		t-stat	t-stat		t-stat
1	2	Intercept
		t-stat
1	3	Intercept
		t-stat
10	1	Intercept
		t-stat
10	2	Intercept
		t-stat
10	3	Intercept
		t-stat

Here is the regression that I'm running, using a dataset that is accessible to all:

clear all

sysuse auto

foreach VAR in price weight length {
    bysort foreign rep78: regress `VAR' mpg headroom gear_ratio, robust
}

Can anyone help? Thanks a lot in advance!
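One possible route is statsby, which runs the regression for each by-group and keeps the requested results as variables; the t statistic of the intercept is then _b[_cons]/_se[_cons], and export excel (Stata 12 or later) writes the file. This produces one row per dependent variable and group rather than the exact wide layout above, which would still need a reshape. A sketch on the auto data, with a hypothetical output file name:

```stata
sysuse auto, clear
tempfile results
local first = 1
foreach VAR in price weight length {
    preserve
    * One row per foreign-rep78 group: intercept, its SE, N and R-squared
    statsby cons=_b[_cons] se=_se[_cons] N=e(N) r2=e(r2), ///
        by(foreign rep78) clear: regress `VAR' mpg headroom gear_ratio, robust
    gen t = cons / se
    gen str8 depvar = "`VAR'"
    if `first' == 0 append using "`results'"
    save "`results'", replace
    local first = 0
    restore
}
use "`results'", clear
sort depvar foreign rep78
export excel using "intercepts.xlsx", firstrow(variables) replace
```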

stcurve

November 6, 2015, 1:22 am

Hi there,

I need help. It's a really easy question, but it drives me crazy. I googled, but I did not find an answer.

I entered this code:

mi set mlong

mi stset dauer [pweight = gewicht_post2] , failure(einstieg==1) id(PERSONID)

xi: mi estimate, hr: stcox i.SIBE_l09b Ausbildung kohorte SEX verz i.bildung

mi stset dauer [pweight = gewicht_post2] , failure(einstieg==1) id(PERSONID)

mi estimate, hr: stcurve, survival at1(SIBE_l09b=1) at4(SIBE_l09b=2) outfile(stcurve5.dta)

The cox regression works fine, but not the stcurve. I get the following message: varlist specification required

How can I do this?

Thank you very, very much. And I'm sorry if this question is boring for you guys, but it drives me crazy.

using already imputed variables to impute other variables with mi

November 6, 2015, 1:56 am

Dear Stata Users,

Does anyone know if there is a way to use variables already imputed outside of Stata's mi routines to help impute other variables when using mi? The data I have look like what you get by running the code below. Imagine that socst has already been imputed and I want to use those imputed values in "mi impute chained" to help impute read, write, math, and science without adding more than the existing 5 imputations. Stata's mi procedure does not seem to like using already imputed variables as independent variables in the prediction equations for variables that have not been imputed. Is there a way to do this?

Thanks,

Jeremy

use http://www.ats.ucla.edu/stat/stata/s...ice_imputation, clear
mi import flong, m(_mj) id(_mi) imputed(female read write math science prog) clear
drop _Iprog_2 - m_schtyp
sort id _mi_m
replace read =. if _mi_m>0
replace write =. if _mi_m>0
replace math =. if _mi_m>0
replace science =. if _mi_m>0

converting string date in MM/DD/YYYY format to Stata date

November 6, 2015, 7:51 am

Hello,
I wish to convert these string dates, in which MM and DD may have either one or two digits. The following two commands have been unsuccessful:

todate mydate, gen(statadate) p(ddmmyyyy)
returns error "length does not match pattern"

gen statadate = date(mydate, "DDMMYYYY")
generates missing values

Any advice/corrections would be most appreciated.
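date() takes a mask describing the order of the components, so for month/day/year strings the mask is "MDY"; one- and two-digit months and days are both accepted. A sketch with made-up strings, using the variable name mydate from the post:

```stata
* Made-up MM/DD/YYYY strings with one- and two-digit parts
clear
input str10 mydate
"3/7/2015"
"12/25/2014"
"01/9/2013"
end

* The mask gives the order of the parts: month, day, year
gen statadate = date(mydate, "MDY")
format statadate %td
list
```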

Working with dates - assigning day and month to year of birth

November 6, 2015, 8:08 am

Hi All

I have a variable birthyear that specifies the year of birth for all subjects in my dataset (any year between 1981 and 2006). I would like to add the same day and month (1st Oct) to the year of birth for all subjects. I need to do this so that I have a full date variable (i.e. day, month and year) to be able to calculate, for example, age. My remaining date variables, such as date of diagnosis or date of visit, have day, month and year.

Currently, birthyear is storage type 'int' with display format '%8.0g'.

tab birthyear in 1/8

1981
1983
1987
1981
1994
2006
2001
1996

What I would like to create is:

tab birthyear_new in 1/5

1oct1981
1oct1983
1oct1987
1oct1981
1oct1994

I started with generating two variables to specify day and month:

gen bd=01
tostring bd, gen(bday)

gen bm=10
tostring bm, gen(bmon)

I now have to add the above bday and bmon to the birthyear variable. I'm not sure how to do this. Any advice most welcome.
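mdy() builds a daily date from numeric month, day and year arguments, so the string variables bday and bmon are not needed. A sketch, assuming birthyear is numeric as described and using a hypothetical new variable name birthdate:

```stata
* A few of the years from the post
clear
input int birthyear
1981
1983
2006
end

* 1 October of each birth year, as a daily date displayed like 01oct1981
gen birthdate = mdy(10, 1, birthyear)
format birthdate %td
list
```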

Thanks

/Amal
