Channel: Statalist

how to define a date and time as a timeseries?

Hi guys, I've done quite a bit of research on this online and found nothing that helps.
I am working on a project in which I look at changes in stocks at 15-minute intervals, as shown below.

Date Time Index Futures
31/10/2012 13:30 1415.48 1410.7
31/10/2012 13:45 1416.54 1411.7
31/10/2012 14:00 1414.13 1409.5

I need to set up a time series that includes the date and exact time of my values. I am well aware that I need to create a new date variable for this and specify the mask "DMYhm".

Is there a way I can use the existing date and time values, or do I have to create a new variable, which is what I have been trying to do?
All I have managed to do is generate dates that do not take into account the time, which I need; or, using %tc, Stata generates a new variable in which every date is 1 Jan 1960 and the time is wrong too.

Any help would be great!

Thanks.
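
A minimal sketch of one way to do this, assuming Date and Time are string variables laid out as in the listing above (if they were imported differently, the first step would change). It joins the two strings, converts them with clock() and the "DMYhm" mask, and stores the result as a double, since a float cannot hold millisecond date-times exactly. The variable name datetime is made up for illustration.

```stata
* Sketch only: the three rows from the post, typed in as strings
clear
input str10 Date str5 Time Index Futures
"31/10/2012" "13:30" 1415.48 1410.7
"31/10/2012" "13:45" 1416.54 1411.7
"31/10/2012" "14:00" 1414.13 1409.5
end

* clock() returns milliseconds since 01jan1960 00:00:00, so use double
generate double datetime = clock(Date + " " + Time, "DMYhm")
format datetime %tc

* declare the series with a 15-minute step between observations
tsset datetime, delta(15 minutes)
list datetime Index Futures
```

With real trading data there will be gaps overnight and at weekends, so lag operators would return missing across those gaps.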

Error when using reclink and loop to match records

Dear all,

I'm working on matching birth records to hospital discharge records using reclink. The files are huge, so I've broken them down into monthly files - each month of births contains 9-10 thousand records, while each month of discharges contains about 25k records. To speed things up, I'm looping through months and years - the code starts like this, where i indexes years and w indexes months:

forvalues i=1997/2011 {
foreach w in 01 02 03 04 05 06 07 08 09 10 11 12 {

...then runs through a few rounds of record linkages that look like this:

//Round 1
use "W:\Work\EBCwork\ebcmonths\ebc`w'`i'", clear
reclink var1 var2 var3 var4 var5 using ///
"W:\Work\EBCwork\ebcmonths\disc`w'`i'", gen(myscore) idm(birthid) idu(discid)

replace myscore=. if myscore<.75
drop if myscore==.
save "W:\Work\EBCwork\ebcmonths\matches\match`w'`i' "

//Round 2
use "W:\Work\EBCwork\ebcmonths\ebc`w'`i'", clear
reclink var1 var2 var3 var4 var5 var6 using ///
"W:\Work\EBCwork\ebcmonths\disc`w'`i'", gen(myscore) idm(birthid) idu(discid) ///
exclude ("W:\Work\EBCwork\ebcmonths\matches\match`w'`i' ")

replace myscore=. if myscore<.75
drop if myscore==.
save "W:\Work\EBCwork\ebcmonths\matches\match`w'`i' 2"

use "W:\Work\EBCwork\ebcmonths\matches\match`w'`i' ", clear
append using "W:\Work\EBCwork\ebcmonths\matches\match`w'`i' 2"
save "W:\Work\EBCwork\ebcmonths\matches\match`w'`i' ", replace

...and so on.

The code seems to be working fine and runs for some years without a hitch, but for some months, 02/1998 and 04/2001 for instance, Stata seizes up in the middle of a round of matching and gives me an error that looks like this:

Going through 7701 observations to assess fuzzy matches, each .=5% complete
...) required
r(100);

The point at which I get the error seems to be different every time this happens. Sometimes it makes it through 2 rounds of matching (within a loop), and sometimes it makes it through 3 or 4. Also, the percentage (number of dots) at which it stops is not consistent between errors. Does anyone have any insight into what is going on here?

Thank you for any help!






puzzling out for num line of code

hi guys!

I recently received a do-file dating back to 2010, and there's a line of code I'm struggling with:

for num 1/6: by xwaveid: replace hhfxid=hhfxid[X] if hhfxid==""

where xwaveid is the cross-wave ID (7-digit) and hhfxid is the father's xwaveid (text, 7-digit).

Specifically, I'm unable to figure out what change the line requests or how to check what it changed.

grateful for your suggestions
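
For what it's worth, `for` is an old looping command in which X stands for each listed value in turn, so the line runs the `replace` six times with X = 1, ..., 6. Under `by xwaveid:`, the subscript [X] refers to the X-th observation within each xwaveid group, so empty hhfxid values are filled from the first non-empty value found among the first six rows of the same ID. A sketch of the modern equivalent on made-up data (the IDs and the copy hhfxid_old are hypothetical, added only so the change can be counted):

```stata
* Made-up data; xwaveid and hhfxid are strings as described in the post
clear
input str7 xwaveid str7 hhfxid
"0100001" ""
"0100001" "0100009"
"0100002" ""
"0100002" ""
end
sort xwaveid, stable          // by: needs sorted data; stable keeps row order

clonevar hhfxid_old = hhfxid  // copy kept to inspect what changes

* Same effect as: for num 1/6: by xwaveid: replace hhfxid=hhfxid[X] if hhfxid==""
forvalues X = 1/6 {
    by xwaveid: replace hhfxid = hhfxid[`X'] if hhfxid == ""
}

count if hhfxid != hhfxid_old // number of values that were filled in
list, sepby(xwaveid)
```

A subscript beyond the end of a group returns an empty string, so groups with fewer than six rows are left alone on those passes.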

Merging more than two datasets in Stata

Hello: I am a beginner in Stata, currently working with NHANES data. My question is: can you combine more than two datasets in Stata? I tried the merge command and the "combine data" tab, but they seem to merge only two datasets.

Any help would be appreciated.
Thanks
Hadeel
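
A sketch of the usual pattern, assuming each file has one row per respondent and shares the NHANES respondent ID SEQN: merge combines two datasets per call, so the calls are chained. The three tiny files and their variables (age, bmi, sbp) are made up for illustration.

```stata
* Three made-up files keyed on SEQN, saved as temporary datasets
clear
input SEQN age
1 34
2 51
end
tempfile demo
save `demo'

clear
input SEQN bmi
1 22.5
2 30.1
end
tempfile body
save `body'

clear
input SEQN sbp
1 118
2 135
end
tempfile bp
save `bp'

* Chain the merges: each call adds one more file to the data in memory
use `demo', clear
merge 1:1 SEQN using `body', nogenerate
merge 1:1 SEQN using `bp', nogenerate
describe
```

Dropping nogenerate (or using generate() with a different name per call) keeps a record of which rows matched at each step.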

Using stsplit for time-varying covariates in stcox

I am doing a large registry-based study in which I use Cox regression to compare mortality rates in hyperthyroid individuals with those in euthyroid individuals. Furthermore, I am splitting the hyperthyroid individuals into treated and untreated individuals, so as to ascertain the effect of treatment on mortality. I have the euthyroid controls in one group (gruppe=1), the treated hyperthyroid individuals as gruppe=2, and the untreated hyperthyroid individuals as gruppe=3. Being in the treated group is defined as having redeemed a prescription for antithyroid medication at any point during the follow-up period.

My stset looks like this: [code not preserved in the source]

My Cox regression takes into consideration age, sex, and Charlson comorbidity score: [code not preserved in the source]


Similar calculations have been done in another study.

Recently one of my supervisors pointed out that we might want to take into consideration the time between the start of follow-up and the start of treatment, as the time spent untreated can have an effect on mortality. We thus decided to have the treated hyperthyroid individuals count as untreated hyperthyroid individuals until the day treatment started. We have considered doing this with stsplit: [code not preserved in the source]

"eksd" being the date of treatment start (redeemed prescription). People who have not received treatment thus have eksd=. and have not been split.

I changed the group variable "gruppe" to 0 for the euthyroid individuals (controls), and 1 for the hyperthyroid individuals.

I now want to run stcox on the data and compare the treated hyperthyroid individuals with the euthyroid individuals, the untreated hyperthyroid individuals with the euthyroid individuals, and the treated with the untreated.

I am unsure how to go about this, specifically which variables to include, how to separate the treated and untreated hyperthyroid individuals, and if this even is a statistically valid approach. If not, suggestions for other approaches are much appreciated.

Multinomial logit with different sets of alternative specific variables

Hi.

I have a question regarding multinomial logit models in Stata. I'm trying to estimate a trip mode choice model. The alternatives have different alternative-specific variables. For instance, the transit mode has regressors such as waiting time or interchange time, while these are not applicable to the walking, bicycle, or car modes.

I have looked at the commands "asclogit" and "nlogit", and it seems to me that the alternative-specific variables need to be the same for all alternatives (in my case, modes), while my model does not allow that.



I have also tried setting the variables to zero when they are not applicable to a specific mode, which results in the error message: "variable xxx is not alternative specific: it has not within-case variability"
Am I doing something wrong here?
I also have the same type of question in the case of mlogit.
If I fit a multinomial logit and some of the regressors turn out to be insignificant for some alternatives, I would like to exclude them from the final model. Is there a way of doing this, or am I making a mistake here?

Replace Sample vs Population

Hello, I need help. I have 4 variables, i.e.
VRegS VRegP VregPmake and VRegSmake.
Capital S stands for sample while P stands for population

VRegS=vehicle registration for sample

VRegP= vehicle registration for population

VregPmake=...make of a vehicle from the population

VRegSmake=...make of a vehicle from sample
Sample size is 100
Population is 1000

So, I have 100 observations VRegS, 1000 observations of VRegP, and 1000 observations VRegPmake.

And
100 missing values of VRegSmake

So, I want to replace VRegSmake
How?

Here is the catch:
The 100 makes for S can be found in the 1000 makes for P.

I can't use

. replace VRegSmake = VRegPmake if VRegS == VRegP

Because S is smaller than P and the values are not matched one to one, even if I sort. Do you get the point?

So, what should I do?
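
One possible approach, sketched on made-up data: treat the population columns as a lookup table and merge it back onto the sample registrations. This assumes the registrations are strings, that each registration appears only once in VRegP, and that VRegS is empty on the rows that are not part of the sample; the registration numbers and makes below are invented.

```stata
* Made-up example in the layout described: VRegS filled for sample rows only
clear
input str6 VRegS str6 VRegP str10 VRegPmake
"KAA002" "KAA001" "Toyota"
"KAA004" "KAA002" "Nissan"
""       "KAA003" "Mazda"
""       "KAA004" "Honda"
end

* Build a lookup table from the population columns, keyed like the sample
preserve
keep VRegP VRegPmake
rename (VRegP VRegPmake) (VRegS VRegSmake)
tempfile lookup
save `lookup'
restore

* Attach the make to every sample registration found in the lookup
capture drop VRegSmake      // drop the all-missing placeholder if present
merge m:1 VRegS using `lookup', keep(master match) nogenerate
list
```

The merge re-sorts the data by VRegS, so the original row order is not preserved.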

Subsetting data

Hello,
I'd appreciate it if you could help me with the Stata syntax for the following analysis.

I've got an HCV treatment dataset. I need to run an analysis on the subset of patients who have received approval to start treatment. The treatment committee may have issued authorizations more than once for the same patient during the time period, so there are multiple records for the same patient ID.
For this purpose: a) I need to calculate the number of unique patients who have received approval to initiate the treatment course at least once;
b) if approval was received more than once, only the records corresponding to the most recent approval date should be included in the subset analysis.

NOTE: the database is not time-ordered (this is a bummer).

Here are the variables:
patient_ID;
Approval_date ( The date when the approval to start treatment was issued);
Approval ( code "0" for YES and code "1" for NO)

For part a) I'm doing the following:
bysort patient_ID: egen cmt_approved = total(Approval == 0)
replace cmt_approved = 1 if cmt_approved > 0

Then, to figure out the number of unique patients with at least one approval, I run this command:
bysort patient_ID Approval : gen ncommittee = _n == 1
tab Approval ncommittee , miss
and I generate the sample in which those with "Approval==0 & ncommittee==1" are included.
But I need to ensure that the record with the date of the last approval is the one included in the sample (something like: if April > March, then keep that record within the same ID),
and this is the point where I am bogged down.

Thank you in advance for your assistance,
Regards,
Lia Gvinjilia
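
A minimal sketch of one way to get both parts, assuming Approval_date is (or can be converted to) a numeric Stata date; the five records and the helper variable datestr are made up. Sorting within patient by date and keeping the last row handles the fact that the file is not time-ordered.

```stata
* Made-up records, deliberately out of date order; Approval: 0 = YES, 1 = NO
clear
input patient_ID str10 datestr Approval
1 "2014-03-10" 0
1 "2014-04-02" 0
2 "2014-01-15" 1
3 "2014-02-20" 0
3 "2014-01-05" 1
end
generate Approval_date = date(datestr, "YMD")
format Approval_date %td

* keep approved records only, then the most recent one per patient
keep if Approval == 0
bysort patient_ID (Approval_date): keep if _n == _N

* one row per patient remains, so the count is the number of unique approved patients
count
list patient_ID Approval_date
```

If a patient has two approvals on the same date, this keeps one of them arbitrarily, so a tie-breaking variable would need to be added to the sort.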


Print variable mean in graph export?

Dear All,
consider the following minimal example:

Code:
clear
webuse grunfeld, clear
g ret=invest/1500
bys company (year): g cumul=ret if _n==1
replace cumul=ret + l.cumul if missing(cumul)
keep if company <8

levelsof company, local(localcompany)

foreach i of local localcompany {
    twoway line cumul year if company == `i'
    graph export `"company_`i'_meaninvest.png"'   , replace

}
The code works just fine; however, instead of printing "meaninvest", I want it to print the mean of the variable invest (for each company). Based on an explanation by Nick Cox on how to access summary statistics, I tried to define a local macro and use it:


Code:
clear
webuse grunfeld, clear
g ret=invest/1500
bys company (year): g cumul=ret if _n==1
replace cumul=ret + l.cumul if missing(cumul)
keep if company <8


levelsof company, local(localcompany)



foreach i of local localcompany {
    twoway line cumul year    if company == `i'
    su invest                 if company == `i'
    local meaninvest = r(mean)
    graph export `"company_`i'_`meaninvest'.png"'   , replace
}

A weird error occurs:
Code:
output-file suffix "9596429824829.png" not
    recognized
    specify correct suffix or specify as()
    option

Could anyone tell me how my task can be accomplished?

Thanks in advance!
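
The error text suggests that the decimal point in the mean is being read as the start of the file suffix. A sketch of one workaround, under that assumption: round the mean to a whole number before putting it in the file name. The setup lines repeat the example above; only the two lines after `summarize` differ.

```stata
webuse grunfeld, clear
generate ret = invest/1500
bysort company (year): generate cumul = ret if _n == 1
replace cumul = ret + l.cumul if missing(cumul)
keep if company < 8

levelsof company, local(localcompany)

foreach i of local localcompany {
    twoway line cumul year if company == `i'
    quietly summarize invest if company == `i'
    * round to a whole number so the only period in the name is the one before png
    local meaninvest = string(round(r(mean)))
    graph export "company_`i'_`meaninvest'.png", replace
}
```

If the decimals matter, another option would be to format the mean and swap the period for an underscore, for example with subinstr(string(r(mean), "%9.2f"), ".", "_", .).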

Nested for loops

I would like some help with writing a nested for loop. So far this is what I have:
foreach x of varlist A- J{
gen Share_`x' = 0
foreach y of varlist 1- 10{
replace Share_`x' = ( `y'/ `x')
}
}

What I want is to get 10 variables, each being '1/A', '2/B', '3/C' and so on. Right now this is giving me '1/A', '2/A', '3/A'...
Is there a way to write the loop so I can get my desired result? Thank you!
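
Nesting pairs every numerator with every variable, whereas the desired result pairs them one to one, so a single loop with a counter may be enough. A sketch, assuming the numerators are the constants 1 to 10 and A through J are adjacent numeric variables (the toy values are made up):

```stata
* Toy data: ten variables A..J holding 1, 2, 3
clear
set obs 3
foreach v in A B C D E F G H I J {
    generate `v' = _n
}

* Walk the variables once, advancing a counter alongside them
local k = 0
foreach x of varlist A-J {
    local ++k
    generate Share_`x' = `k' / `x'
}
list Share_A Share_B Share_J
```

If the numerators are themselves variables, the same idea works by storing their names in a second local and picking the k-th one with the `: word k of` macro function.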

understanding matsize

So I'm getting a "matsize too small" error.
The first problem is that I have matsize maxed out at 800.
The second problem: how do I calculate the number of variables in my matrix?

The model contains a total of 4,964 observations:
anova DependentVariable(5 unique) CaseIdentifier(1415 unique) IndependentVariable(5 unique), repeated(IndependentVariable)
If someone could explain how it's calculated, mostly for my understanding, I would appreciate it.
And if someone has a way around the matsize limit, I would love to hear it. As of now I have more or less given up on doing an ANOVA, since it's creating problems even at the simplest level.

Cheers

Generating lagged variables in mi data

I'm having trouble generating lagged variables in mi data. Here is an example where students are tested in reading and math at 3 different times, and I try to generate a variable that lags the scores by one period. I'm trying to follow the advice posted by Stata's Yulia Marchenko in http://www.stata.com/statalist/archi.../msg00213.html . If you look at the output, though, you'll see that the lagged variables are calculated from the observed data, not from the imputed data as I intended.

/* First I impute the data in wide format to account for the correlations among tests. Then I reshape the imputed data into long format again.
This works fine. I'm just including it so that we have some data to work with. */
use "http://www.ats.ucla.edu/stat/stata/faq/mi_longi.dta", clear
reshape wide read math, i(id) j(time)
order *, sequential
mi set wide
mi register imputed math1-math3 read1-read3
mi impute mvn math1-math3 read1-read3, add(2)
mi reshape long math read, i(id) j(time)

/* Now I try to calculate the lagged variables. This is what isn't working for me. */
mi tsset id time
mi xeq: sort id time; gen math_lag = L1.math; gen read_lag = L1.read
list in 1/3



Using prediction results from interval regression to compute Oaxaca decomposition.

I have an ordinal wage variable and I wish to use interval regression to obtain a predicted wage in interval form. I am using Stata's `intreg' command to predict the wage from a set of explanatory variables. Interval regression fits a model of y = [depvar1, depvar2] on a set of independent variables, where depvar1 and depvar2 are the lower and upper bounds of the class interval, e.g. [$0 - $1000], [$1000 - $2000], and so on. The `predict' command after `intreg' will predict the wage in interval form based on the fitted model.

I intend to use the predicted wage with the Oaxaca decomposition command `oaxaca' to study the wage differential by gender. Alternatively, I could keep the original wage in ordinal form and, instead of `oaxaca', use `nldecompose' for nonlinear models. However, `nldecompose' is very limited in terms of decomposing differences by each contributing variable.

The problem is that the predicted values do not fall into the same category as the original values. For example, for wages between $0 and $500, the minimum and maximum predicted Y are reported as $669.95 and $2871.99. How would you suggest I approach this problem? Many thanks.

Degrees-of-freedom adjustment for FEs in complex survey data

Hello,

I would highly appreciate it if you could help me with the following problem:

Since the svy command does not support areg, I want to demean my variables manually (I have more than 35,000 groups, so I cannot use individual dummy variables). I found this previous thread, which discusses the appropriate degrees-of-freedom adjustment for the standard errors: http://www.stata.com/statalist/archi.../msg00652.html

While this adjustment also works if I include sample weights, I get wrong standard errors if I try to adjust for clusters and stratification. Does anybody know of a similar degrees-of-freedom adjustment for complex survey data?

Thanks,


Patrick

Using data not sorted (nearmrg)

Hello,

I am trying to use nearmrg on my data files and I keep getting the same error: "using data not sorted". To break down the problem, I used very simple test data instead of my real data, and the error message still shows up. Now I have the following:

Master.dta
Group Date
A 15.01.2012
A 15.02.2012
B 15.01.2012
B 15.02.2012
C 15.01.2012
C 15.02.2012

Using.dta
Group Date SVarOfInterest1 SVarOfInterest2
A 01.01.2012 1 201
A 15.01.2012 2 202
A 03.02.2012 3 203
A 23.02.2012 4 204
B 03.01.2012 11 211
B 19.01.2012 12 212
B 03.02.2012 13 213
C 20.01.2012 21 221
C 25.01.2012 22 222
C 04.02.2012 23 223
C 03.01.2012 24 224

This is the code:

nearmrg Group using Using.dta, nearvar(Date) genmatch(SourceDate) lower
using data not sorted
r(5);

Thanks in advance!
Maryna

cdsimeq regression storing and exporting

Hello,
I am using the cdsimeq command to estimate a simultaneous-equations model with a probit endogenous variable, and I am not able to store and export the output with estout, esttab, or outreg2. I would appreciate help with exporting the estimation tables to LaTeX or Excel.
Thanks

Roc

probit model: partial effects of a one-SD increase on dep. variable

Dear Statalist community,

I'm using a probit model to analyze the determinants of free trade agreements (FTA). The dependent variable FTA is binary: 1, if an FTA between a country pair exists, 0 otherwise.
So far I'm using:
Code:
margins, dydx(*) atmeans
which gives me marginal effects on the response probability of FTA.

However, I need the partial effect of a one-standard-deviation change (rather than a marginal change) in an independent variable on the response probability of FTA.

Can you help me here?
Thank you.
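
A sketch of one way to get such a discrete change, using the auto data shipped with Stata as a stand-in (foreign plays the role of FTA and mpg the role of the regressor of interest; neither comes from the post). It evaluates the predicted probability at the mean of the regressor and at the mean plus one standard deviation, holding the other covariates at their means, and then contrasts the two.

```stata
* Stand-in model: foreign ~ FTA, mpg ~ regressor of interest
sysuse auto, clear
probit foreign mpg weight

* mean and mean + 1 SD of the regressor over the estimation sample
quietly summarize mpg if e(sample)
local lo = r(mean)
local hi = r(mean) + r(sd)

* predicted probabilities at the two values, other covariates at their means
margins, at(mpg = (`lo' `hi')) atmeans

* difference between the two predictions, with a standard error
margins, at(mpg = (`lo' `hi')) atmeans contrast(atcontrast(r))
```

The starting point (here the mean) is a choice: because the probit is nonlinear, a one-SD step from a different value gives a different change in probability.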

Decomposing differences using nldecompose for nonlinear model

I am using nldecompose to decompose differences in the outcome variable for a nonlinear model. Has anyone worked with this command? If so, may I ask how you compute the contribution of each variable to the difference in the outcome?

For linear models, the oaxaca command produces detailed results on the contribution of each variable, for example how much of the gender wage difference is attributable to educational attainment. But I couldn't find out how this can be done for my nonlinear model.

Thank you.