
IPTW Cox regression following MI


Dear Statalist,

I am trying to run an IPTW propensity-score Cox regression following multiple imputation, using an MIte approach in which the propensity score model is fit within each imputed dataset and the results are ultimately combined using Rubin's rules [1]. I have referred to the following post, which discusses this, but I am having trouble with the implementation.

https://www.statalist.org/forums/for...opensity-score

That post suggests using the following code:

Code:
mi xeq saving(miest) : logit treat_var covariates     // However this gives me the error message: invalid numlist r(121)

mi predict xb_mi using miest   // xb is the default
mi xeq: gen preprob =invlogit(xb_mi)
Using instead:

Code:
mi estimate, saving(miest, replace): logit treat_var covariates
mi predict xb using miest, xb
mi xeq: gen ps = invlogit(xb)
mi xeq: propwt treat_var, ipt
mi stset death_date [pweight=ipt_wt], failure(death) origin(diagnosis_date) scale(30.4)
mi xeq: stcox treat_var, vce(robust)

works. However, there are two issues:

1) My dataset contains 12,834 observations, with missing data for 5,348 subjects. After running the code above I obtain an estimate of the treatment effect from the Cox model, but it uses only 7,486 observations (i.e., only those with complete data in m=0).

2) Using mi estimate rather than mi xeq for the PS estimation model does not run the model within each imputed dataset, and is therefore not MIte. (A sketch of the pattern I am aiming for is below.)
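For clarity, the within-imputation (MIte) pattern I am aiming for is roughly the following untested sketch, using the multi-command form of mi xeq (the xb_m name is mine, and I assume the generated ps would still need to be registered with mi afterwards):

Code:
* Untested sketch: fit the PS model and form the score inside each
* imputation in turn, instead of pooling the PS model with mi estimate
mi xeq: logit treat_var covariates ; predict double xb_m, xb ; ///
    gen ps = invlogit(xb_m) ; drop xb_m
mi xeq: propwt treat_var, ipt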

Does anybody have any suggestions?

Thank you


1. Leyrat C, Seaman S, White I, Douglas I, Smeeth L, Kim J, et al. Propensity score analysis with partially observed covariates: how should multiple imputation be used? Stat Methods Med Res. 2017;0(0):1-17.

Descriptive statistics generated with "misum": Creating an Excel table for imputed data

Hi list,

I am looking to generate a table that will allow easy comparison of changes in means and standard deviations across observed and imputed data for a list of variables.

Creating the first (left-hand) side of the table seems pretty straightforward:

Code:
tabstat `survey_items', stat(n mean sd) save
return list
matrix desc_stats = r(StatTotal)'
matlist desc_stats
putexcel set "${table_location}/Descriptive_statistics.xlsx", sheet("Sheet1") replace
putexcel A1 = matrix(desc_stats), names overwritefmt

But I can't figure out how to generate a matrix with the same descriptive statistics for the imputed data. misum generates the statistics, but I am having difficulty making the matrix command work: the matrix comes out empty.

Code:
mi convert flong, clear
misum `survey_items'
matrix mi_desc_stats = r(StatTotal)'
matlist mi_desc_stats

This is what the last three commands output in the Results window:

Code:
. misum `farmimp_items'

m=1/5 data

    Variable |     Mean        SD        min       max      N
-------------+-------------------------------------------------
        var1 | 2.719593  1.086147     .45611  5.109824    881
        var2 | 2.949962  1.040069   .3450429  5.469518    881
        var3 | 2.568165  1.140307  -.7103254   5.74613    881
        var4 | 2.737817  .9577713   .6009374   4.83914    881
        var5 | 3.478526  .7357405          1  5.017073    881
(etc. etc.)

. matrix mi_desc_stats = r(StatTotal)'

. matlist mi_desc_stats

             |        c1
-------------+-----------
          r1 |         .

Can anyone suggest what I might be doing wrong?
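One workaround I have been considering, in case misum simply does not leave r(StatTotal) behind: run tabstat directly on the stacked imputations in flong style. This is only a sketch; it assumes that pooling descriptives by stacking m=1..M is acceptable, and it reuses my macro and file names from above:

Code:
* Sketch: tabstat (unlike misum, apparently) does store r(StatTotal),
* so run it on the imputed rows only (_mi_m > 0 in flong style)
mi convert flong, clear
tabstat `survey_items' if _mi_m > 0, stat(n mean sd) save
matrix mi_desc_stats = r(StatTotal)'
matlist mi_desc_stats
putexcel set "${table_location}/Descriptive_statistics.xlsx", sheet("Sheet1") modify
putexcel E1 = matrix(mi_desc_stats), names overwritefmt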

Best,
Ethan

Regular expressions - extracting ICD-9 codes from a string variable

Dear Forum Members,

I am dealing with a huge dataset whose main string variable contains the ICD-9 code plus a description, but I just need to extract the ICD codes. The "extra" part of this string variable contains signs like "-" and "/", and sometimes a row presents only a description of a given illness, such as "AIDS", before the ICD codes are specified in the following rows.


Below, an excerpt of the data:


Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str69 code int count
"AIDS / HIV"                                                  200
"042 - HUMAN IMMUNODEFICIENCY VIRUS [HIV] DISEASE"      200
"Alcohol abuse a general description"                                        700
"291.9 - UNSPECIFIED ALCOHOL-INDUCED MENTAL DISORDERS"   10
"303.90 - ALCOH DEP NEC/NOS-UNSPEC"                     190
"303.91 - ALCOH DEP NEC/NOS-CONTIN"                     200
"303.92 - ALCOH DEP NEC/NOS-EPISOD"                      100
"303.93 - ALCOH DEP NEC/NOS-REMISS"                     125
"305.00 - ALCOHOL ABUSE-UNSPEC"                        50
"305.01 - ALCOHOL ABUSE-CONTINUOUS"                      25
end
I want to get, say, a blank in the first row, then 042, then another blank, then 291.9, then 303.90, etc.

In short, sometimes just the first 3 digits, sometimes 3 digits plus a dot plus 1 (or 2) more.

Fiddling with -regexr()-, I tried this (rather inelegant) code, and it worked fine; this way I could get rid of all the words, leaving just the ICD codes.

Code:
split code, parse(-) gen(myCODE)
gen MYCODE = myCODE1
gen MYCODEICD  = regexr(MYCODE,     "[a-zA-Z]+", " ")
gen MYCODEICD2 = regexr(MYCODEICD,  "[a-zA-Z]+", " ")
gen MYCODEICD3 = regexr(MYCODEICD2, "[a-zA-Z]+", " ")
gen MYCODEICD4 = regexr(MYCODEICD3, "[a-zA-Z]+", " ")
gen MYCODEICD5 = regexr(MYCODEICD4, "[a-zA-Z]+", " ")
* Taking out the "/"
gen MYFINALCODE = regexr(MYCODEICD5, "/", " ")
* Test to guarantee all ICD codes are numbers
destring MYFINALCODE, gen(test)
The original dataset has more than five thousand rows. I gather the user-written -moss- could provide a solution (OK, probably without the dot for some ICD codes), and this way I could proceed with the analysis.

But I still think there must be a better, faster, and more elegant way to tackle this issue.
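Something along these lines is the kind of one-step solution I imagine (untested; it assumes every code sits at the start of the string as 3 digits, optionally followed by a dot and one or two more digits):

Code:
* Sketch: capture a leading ICD-9 pattern in one pass; rows holding only
* a description (no leading code) are simply left blank
gen icd9 = regexs(1) if regexm(code, "^([0-9][0-9][0-9](\.[0-9][0-9]?)?)")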

Thanks in advance.

Heteroskedasticity Test in Random Effects

Hello!

I have an unbalanced panel with 123 cross-sectional units and 247,904 observations. The observations are daily.

I need to estimate by random effects because I have time-invariant dummy variables, so I can't estimate by fixed effects.

I tried to estimate with xtreghet, but I did not succeed. The message was:

matsize too small to create a [297539,1] matrix
r(908);

I have already run set matsize 8000 and set emptycells drop, but I still did not succeed.

How should I proceed to test for heteroskedasticity?

Thanks for the help!!

Cost system in nlsur command

Hi.

I'm quite a new Stata user, so I don't know exactly how to interpret my results. I'm estimating a nonlinear SUR of a cost equation with its respective share equations. Everything runs fine, but one coefficient shows a value of 0 and its standard error displays (constrained). I don't know what this means, and I cannot find anything wrong in my code.

Here are the results and attached you'll find the code:



Code:
FGNLS regression

Equation          Obs   Parms       RMSE      R-sq   Constant
--------------------------------------------------------------
1  lnvc         1,714       .   .1371535   0.9999*     (none)
2  s1vc         1,714       .   .0499717   0.9927*     (none)
3  s2vc         1,714       .    .029871   0.8944*     (none)
4  s3vc         1,714       .    .041627   0.9285*     (none)

* Uncentered R-sq

             |      Coef.   Std. Err.        z   P>|z|     [95% Conf. Interval]
-------------+------------------------------------------------------------------
         /a0 |   18.41984    .0063245  2912.47   0.000     18.40745    18.43224
         /A2 |  -.0396727    .0267935    -1.48   0.139     -.092187    .0128416
         /A3 |   .0017111    .0043529     0.39   0.694    -.0068204    .0102425
         /A4 |   .0015563    .0131165     0.12   0.906    -.0241516    .0272641
         /A5 |   .0128759    .0143441     0.90   0.369     -.015238    .0409899
         /A6 |   .0008374    .0072367     0.12   0.908    -.0133463    .0150211
         /A7 |   .0106667    .0183763     0.58   0.562    -.0253502    .0466835
         /A8 |  -.0049452    .0187845    -0.26   0.792     -.041762    .0318717
         /A9 |  -.0081854    .0095137    -0.86   0.390    -.0268319    .0104611
        /A10 |  -.0555401    .0366289    -1.52   0.129    -.1273314    .0162513
        /A11 |   -.035149     .026951    -1.30   0.192     -.087972    .0176739
        /A12 |   .0249516    .0212743     1.17   0.241    -.0167454    .0666485
        /A13 |  -.0008874    .0021088    -0.42   0.674    -.0050206    .0032458
        /A14 |  -.0212056    .0229981    -0.92   0.356    -.0662811    .0238699
        /A15 |   .0186707    .0183737     1.02   0.310     -.017341    .0546825
        /A16 |  -.0106454    .0136892    -0.78   0.437    -.0374758     .016185
     /beta1a |   -.222152    .1240797    -1.79   0.073    -.4653438    .0210399
     /beta2a |  -.0891937    .0580806    -1.54   0.125    -.2030296    .0246423
     /beta3a |   .0923995    .0657517     1.41   0.160    -.0364715    .2212704
      /beta1 |   .5564591    .0019501   285.35   0.000      .552637    .5602812
      /beta2 |   .0800931    .0012391    64.64   0.000     .0776646    .0825216
      /beta3 |   .1393816    .0014469    96.33   0.000     .1365457    .1422176
        /gam |   .9885289    .0044281   223.24   0.000       .97985    .9972078
       /gamt |          0  (constrained)
        /b11 |   .1582139    .0046336    34.14   0.000     .1491322    .1672956
        /b12 |  -.0229883    .0027911    -8.24   0.000    -.0284588   -.0175178
        /b13 |  -.0265272    .0024619   -10.78   0.000    -.0313524   -.0217019
        /b22 |   .0386621    .0024685    15.66   0.000     .0338239    .0435003
        /b23 |  -.0074005    .0015034    -4.92   0.000     -.010347   -.0044539
        /b33 |   .0471558    .0020626    22.86   0.000     .0431133    .0511983
        /by1 |   .0101199    .0012073     8.38   0.000     .0077537    .0124862
        /by2 |  -.0053522    .0008809    -6.08   0.000    -.0070788   -.0036256
        /by3 |  -.0104817     .001017   -10.31   0.000     -.012475   -.0084883
         /yy |   .0368355    .0052046     7.08   0.000     .0266347    .0470364

problems with data

Hello,

Currently, I am working on a project for my econometrics course at uni. I am using data from the British Election Study, specifically the BES2017_W13 dataset that was available online. We are trying to see which variables influence the probability of voting for the Conservative party; however, there is a problem with the dataset.
The variable profile_gross_personal, which is an int according to Stata, contains income intervals as its values. To deal with this, I used the following commands:


Code:
tostring(profile_gross_personal), generate(gross)
tabulate gross
tabulate profile_gross_personal

gen income=0
replace income=2500   if gross=="1"
replace income=7500   if gross=="2"
replace income=12500  if gross=="3"
replace income=17500  if gross=="4"
replace income=22500  if gross=="5"
replace income=27500  if gross=="6"
replace income=32500  if gross=="7"
replace income=37500  if gross=="8"
replace income=42500  if gross=="9"
replace income=47500  if gross=="10"
replace income=55000  if gross=="11"
replace income=65000  if gross=="12"
replace income=85000  if gross=="13"
replace income=100000 if gross=="14"

drop if income==0   // gets rid of the missings created by -generate- above
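(As an aside, I realize the block above can probably be collapsed into a single recode; a sketch, assuming the same 14 bands and midpoints:)

Code:
* Sketch: map each income band directly to its midpoint in one command
recode profile_gross_personal (1=2500) (2=7500) (3=12500) (4=17500)   ///
    (5=22500) (6=27500) (7=32500) (8=37500) (9=42500) (10=47500)      ///
    (11=55000) (12=65000) (13=85000) (14=100000) (else=.), gen(income2)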

I continued this way in order to use each interval's midpoint as the outcome value. As a sanity check, I then ran a basic regression: reg income england (england being a dummy variable I created) to see whether the process worked. It did, and yielded a regression output. However, when I run logit income england I get an r(2000) error: outcome does not vary.
Does someone know a way around this problem?

My first guess was that it was because income is a float with unusual values, since normally my floats are only dummies. Hence, I used the following command:

recast int income, force

However, this cuts off all the values of income above 32,500 (Stata's int type cannot hold much larger values). Therefore, this workaround did not work, and it did not solve the r(2000) problem.

So my question is: does someone know a way to overcome the r(2000) error, or a solution to the initial problem of a variable that holds intervals but is classified as an int? I also got some strange classifications in other variables, so this r(2000) appears more often.
Thank you in advance!

Kind Regards

Interpreting Estimators of a Panel Data Regression with Random Effects

Greetings, Stata community. This is my first post. I've been estimating Kaldorian laws for my country (Colombia), which has 32 departments as administrative territorial entities, so I generated an id_dpto variable to identify them over time in the panel structure. This is a short (N>T), unbalanced panel: 16 years, 26 groups, and around 380 observations.

So I ran a single panel regression with random effects (since the Hausman test failed to reject the null hypothesis, P>chi2 = 0.6147). I'm not interested in the random intercept; I'm more interested in the impact of the growth of the industrial sector on the growth of the Colombian economy in general (using each department's GDP against its industrial GDP). The results are:

Code:
. xtreg G_y_dpto G_pib_s_ind_dpto, re

Random-effects GLS regression                   Number of obs     =        389
Group variable: id_dpto                         Number of groups  =         26

R-sq:                                           Obs per group:
     within  = 0.0107                                        min =          1
     between = 0.1929                                        avg =       13.8
     overall = 0.0094                                        max =         16

                                                Wald chi2(1)      =       3.37
corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0064

----------------------------------------------------------------------------------
        G_y_dpto |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-----------------+----------------------------------------------------------------
G_pib_s_ind_dpto |   .0533818   .0290758     1.94   0.006    -.0036058    .1103693
           _cons |   .0389222   .0065268     5.96   0.000       .02613    .0517144
-----------------+----------------------------------------------------------------
         sigma_u |          0
         sigma_e |  .12278606
             rho |          0   (fraction of variance due to u_i)
----------------------------------------------------------------------------------
I'm wondering about the interpretation of the estimator for G_pib_s_ind_dpto, which is the growth of industrial GDP.

I followed the examples about random effects in -xtreg-. They interpret the estimator like OLS, along the lines of "X has an impact of Bi on Y".

I also followed Wooldridge's panel data book, and it offers different interpretations of the Bi coefficients in different places.

So would it be wrong to say this? "Across time, 1% more growth of industrial GDP at the departmental level increases the growth of departmental GDP by 0.053%, ceteris paribus."

If not, what would be the correct way to interpret the estimator on my independent variable?

Creating grouping variable in long format

Hello, I am using Stata 15

I am trying to create a grouping variable that indicates whether participants were seen within <15 days, 15-90 days, or >90 days (days_to_dt1_pt). I am interested only in the days to the first treatment. So I am not interested, for example, in the 16 and 30 for participant 1, since they would already be categorized as <15; I want only one possible category per participant. I cannot figure out how to do this without creating a value for each row of data. Participant 2 would be in the 15-90 day category, but it would be recorded twice, which again is what I don't want.

My data look like:

Code:
id   month   days_to_dt1_pt
 1       1               10
 1       2               16
 1       3               30
 1       4                .
 1       5                .
 2       1                .
 2       2               46
 2       3               50
 2       4                .

What I tried is to create a 1 for each time the patient is in one of the categories. But as you can see below, the second block would return a value for months 2 and 3, which I don't want. I am not sure how to handle this.

Code:
capture drop early_vs_delay
gen early_vs_delay = .
replace early_vs_delay = 1 if days_to_dt1_pt < 15

capture drop early_vs_delay_1
gen early_vs_delay_1 = .
replace early_vs_delay_1 = 1 if days_to_dt1_pt > 14 & days_to_dt1_pt < 91

capture drop early_vs_delay_2
gen early_vs_delay_2 = .
replace early_vs_delay_2 = 1 if days_to_dt1_pt > 90 & days_to_dt1_pt < 1000


Then I tried to combine them:

Code:
capture drop early_vs_delay_3   // create a variable combining the ones above
gen early_vs_delay_3 = .
replace early_vs_delay_3 = 1 if early_vs_delay   == 1   // 2 weeks or less
replace early_vs_delay_3 = 2 if early_vs_delay_1 == 1   // 2 weeks to 90 days
replace early_vs_delay_3 = 0 if early_vs_delay_2 == 1   // more than 90 days
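To make the goal concrete, the behavior I am after is something like this untested sketch (it assumes the earliest nonmissing days_to_dt1_pt per id marks the first treatment):

Code:
* Sketch: pick each participant's earliest treatment time (missings sort
* last, so [1] is the minimum), then categorize once per participant
capture drop early_vs_delay_3
bysort id (days_to_dt1_pt): gen first_days = days_to_dt1_pt[1]
gen early_vs_delay_3 = 1 if first_days < 15
replace early_vs_delay_3 = 2 if inrange(first_days, 15, 90)
replace early_vs_delay_3 = 3 if first_days > 90 & !missing(first_days)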

Please let me know if this information is sufficient or more detail is needed.

Thanks.

Jake

Trouble creating bar graph

Hi all,

I am trying to create a bar graph where the y-axis represents the frequency of a binary (Y/N complication) outcome and the x-axis shows each of the predictor variables. Ideally, each x-axis variable would have a "Yes, had a complication" bar right next to a "No complication" bar. Instead, my code gives me the following chart. Any advice?

Code:
graph bar (count) gender employ_main mechanism_of_injury rti_type smoker alcohol diabetes_yesno comorbidities___2 hiv_yesno, over(primary_reop)

Sibling and head of household identification

Hello, I am trying to generate variables using a cross-sectional household census. Here is an example of the data I have at hand (the last three columns are the ones I want to generate):

family ID   relation to head   gender   age   birth year   twin dummy   birth order   female head dummy
        1   head               female    28         1990            .             .                   1
        1   son/daughter       male       2         2016            1             1                   1
        1   son/daughter       female     2         2016            1             1                   1
        2   head               male      35         1983            .             .                   0
        2   spouse             female    34         1984            .             .                   0
        2   son/daughter       male       6         2012            0             2                   0
        2   son/daughter       female    10         2008            0             1                   0
I want to know whether each child is a singleton or a twin, plus the order of birth (e.g., first child, second child, etc.), and also a dummy indicating whether the family head is female. I have tried to generate this dummy, but it has values only for the "head of household" individuals, and the other family members get missing values.
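For the female-head dummy, what I am trying to achieve is to spread the head's value to every member of the family; something like this sketch is the idea (variable names are assumed from the table above):

Code:
* Sketch: flag whether the head is female, then copy that flag to every
* member of the same family (egen max ignores the missing non-head rows)
gen byte head_female = (gender == "female") if relation == "head"
egen female_head = max(head_female), by(family_id)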
Thanks in advance.

Keeping all dummy variables for reghdfe

I want to run a regression that looks like this:

Code:
eststo: reghdfe y x i.year i.year#x, absorb(q) vce(cluster id)
However, I want to keep all the dummies (i.e., suppress the intercept to 0), so I can plot a coefficient-against-year chart together with the confidence intervals. I tried this:

Code:
fvset base none year
and then added the noconstant option. However, reghdfe keeps dropping one of the years due to collinearity, because I think it is still estimating a coefficient for the intercept/constant. Note: I am also using a lot of fixed effects; while plain reg could keep all the dummies (since it drops the intercept), it couldn't handle that many fixed effects.

Divide by hours from a datetime var

Hi everyone!

I am writing an article on patients admitted to the Intensive Care Unit (ICU), looking at different complications after admission for severe trauma. I need to calculate the average diuresis (ml/kg/hour) for my patients. That is fairly easy for the days when patients are admitted for a full 24 hours, but the last admission day varies depending on when patients are discharged.

I have a datetime variable (hours_admitted_last_day) which contains the time elapsed from the start of the last day in the ICU (0600 hours) until discharge. Another variable is the amount of urine produced per day (dygnsdiures_); for the last admission day, it represents the amount of urine produced until discharge that day.

How do I extract the number of hours from my datetime variable hours_admitted_last_day, so I can use it to calculate ml/hour for the last day in the ICU?
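(To show what I mean: since a %tc value counts milliseconds, I believe the conversion would be something like this sketch, though I am unsure it is right given the negative values further down in my data:)

Code:
* Sketch: a %tc/datetime value is in milliseconds, so divide by
* msofhours(1) (= 3,600,000) to express the elapsed time in hours
gen double hours_last_day = hours_admitted_last_day / msofhours(1)
gen double ml_per_hour = dygnsdiures_ / hours_last_day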

Hope my question makes sense,

This is an attempt to include a dataex (I don't understand why two variables are in parentheses...):

pat_id = patient id
day = day in the ICU (day 0 = first day)
icu_out = time discharged from the ICU
hours_admitted_ = hours admitted on the respective day (at the moment missing for the last day in the ICU)
hours_admitted_last_day = datetime variable which I want to use to calculate urine production on the last day in the ICU

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input int pat_id byte day double icu_out float(hours_admitted_ hours_admitted_last_day) double dygnsdiures_
1 0 1.487457e+12 24  2.322e+08 1415
1 1 1.487457e+12 24  1.458e+08 1690
1 2 1.487457e+12  .   5.94e+07  800
1 3 1.487457e+12  .  -2.70e+07    .
1 4 1.487457e+12  . -1.134e+08    .
end
format %tc icu_out
format %tc hours_admitted_last_day


All the best,

Jesper



Intraclass correlation for multilevel models, gllamm

Hello,

I am trying to figure out how to calculate the intraclass correlation (ICC) for a three-level model fit with gllamm, a user-contributed command for multilevel modeling. gllamm does not support estat icc. From what I am seeing online, I believe there is no simple command for the ICC after gllamm, and I have to write out the formula myself. To start with, I am trying to find the unconditional ICC for the levels in my model.

Here is my unconditional model:

Code:
gllamm Garden_Active_, i(Garden_ID LGardenZip_) family(binomial) link(logit)  nip(30) adapt
And here is my output:

Code:
gllamm Garden_Active_, i(Garden_ID LGardenZip_) family(binomial) link(logit)  nip(30) adapt

Running adaptive quadrature
Iteration 0:    log likelihood = -2718.5586
Iteration 1:    log likelihood =  -2606.836
Iteration 2:    log likelihood = -2600.9528
Iteration 3:    log likelihood = -2600.8307
Iteration 4:    log likelihood = -2600.8307


Adaptive quadrature has converged, running Newton-Raphson
Iteration 0:   log likelihood = -2600.8307  
Iteration 1:   log likelihood = -2600.8307  (backed up)
Iteration 2:   log likelihood = -2600.7843  
Iteration 3:   log likelihood = -2600.7388  
Iteration 4:   log likelihood = -2600.7386  
 
number of level 1 units = 4035
number of level 2 units = 2432
number of level 3 units = 31
 
Condition Number = 2.1411013
 
gllamm model 
 
log likelihood = -2600.7386
 
--------------------------------------------------------------------------------
Garden_Active_ |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
---------------+----------------------------------------------------------------
         _cons |  -.0779572   .0776656    -1.00   0.315     -.230179    .0742645
--------------------------------------------------------------------------------
 
 
Variances and covariances of random effects
------------------------------------------------------------------------------

 
***level 2 (Garden_ID)
 
    var(1): 4.6620217 (.59617179)
 
***level 3 (LGardenZip_)
 
    var(1): .01965498 (.03711558)
------------------------------------------------------------------------------
However, I don't understand the formula and which numbers I need to plug in to calculate the ICC for levels 2 and 3. Could anyone please help?
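From what I have read, the latent-response approach fixes the level-1 residual variance of a logit model at pi^2/3 and plugs in the variance estimates above; this is my (possibly wrong) understanding of the calculation:

Code:
* Sketch of the latent-response ICC calculation for a 3-level logit model,
* using the variance estimates from the gllamm output above
scalar v2 = 4.6620217     // level-2 variance (Garden_ID)
scalar v3 = .01965498     // level-3 variance (LGardenZip_)
scalar v1 = _pi^2 / 3     // level-1 latent residual variance for logit
display "ICC, level 3:        " v3 / (v1 + v2 + v3)
display "ICC, levels 2 and 3: " (v2 + v3) / (v1 + v2 + v3)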

Many thanks,
Alyssa

Display command

Hello,

I've written a display command in my do-file, such as:

di "whatever, does not matter"

When I execute it, the main screen shows both the command and the output, as expected. However, I would like the main screen to display ONLY the output. Is there any way to hide the command on the main screen?

Thank you.

Tests with interaction variable

Dear All,

I have a question, which is not strictly related to the usage of Stata. Suppose I have a model like the following:

y = c + a1*x1 + a2*x1^2 + error term

I want to study whether a non-linear impact of x1 on y exists. This is not a problem, of course. The problem arises because I suspect that the non-linearity emerges through a second variable, which "mediates" the impact of x1 on y. Hence I think of estimating the following:

y = c + a1*x1 + a2*x1^2 + b1*m1 + b2*x1*m1 + b3*x1^2*m1 + error term

If I rearrange the terms above, I get:

y = c + [a1 + b2*m1]*x1 + [a2 + b3*m1]*x1^2 + error term

m1 is a continuous variable.

Suppose that I want to test the significance of [a2 + b3*m1] to see whether the quadratic term remains significant. The problem is that m1 is itself a variable, so the significance of the test changes according to its values.

How can I run the test? I thought of using lincom to get a confidence interval too. But how can I deal with the variable m1 in the test? Should I plug in its mean (I am not sure about this)? Or should I run the test at the min and max values of m1?
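To make the question concrete, this is roughly what I have in mind, evaluating the test at the sample mean of m1 (an untested sketch; it assumes -regress- and factor-variable notation, with y, x1, and m1 as above):

Code:
* Sketch: fit the model with factor variables, then test a2 + b3*m1
* at a chosen value of m1 (here, its sample mean) with -lincom-
summarize m1, meanonly
local mbar = r(mean)
regress y c.x1##c.x1##c.m1
lincom c.x1#c.x1 + `mbar'*c.x1#c.x1#c.m1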

Thanks in advance for your help.

Error in Running -xtcd- command

Dear all,

I have been trying to test for cross-sectional dependence in my variables using the pre-estimation command -xtcd varname-.

However, for one of my variables, running the command gives the following error:

unknown function *sqrt()
r(133)

Can you please tell me what is going wrong here and what I can do to fix it?
Thank you.

Create dummy variable if previous value exists

I have the following dataset, where contract_no is a unique identifier of contracts between a group of firms. I want to create the variable prob with code that can be applied to my whole dataset. For now, I have created prob manually: if a firm in a given contract has no previous contract, prob equals 0; otherwise it equals 1. Since the first two observations are the first contract for firms 1 and 2, prob is zero there. In the next contract (contract_no 2), two of the three firms had previous contracts, so prob is 1 for firms 1 and 2 and zero for firm 3. The same goes for the rest of the dataset. How can I code this?
Code:
clear
input float(contract_no year) byte firm float prob
1 1980 1 0
1 1980 2 0
2 1990 1 1
2 1990 2 1
2 1990 3 0
3 1995 1 1
3 1995 2 1
4 2000 7 0
4 2000 6 0
4 2000 5 0
4 2000 3 1
4 2000 1 1
end
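My own guess is that the rule reduces to "a firm's first contract gets 0, and any later contract gets 1", something like this untested sketch (it assumes contract_no is chronological, as in the example):

Code:
* Untested sketch: within firm, the first contract (ordered by contract_no)
* gets prob = 0 and every subsequent contract gets 1
bysort firm (contract_no): gen byte prob2 = (_n > 1)
sort contract_no firm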

Influence statistics in Stata manual for xtreg postestimation

Dear Statalists,

The Stata manual for xtreg postestimation provides a description for predict as follows:

predictions, residuals, influence statistics, and other diagnostic measures
It mentions "influence statistics". However, I cannot find any information about influence statistics in the xtreg postestimation manual. predict has syntax for fitted values and residuals, but I do not think those can be used for influence statistics, given the absence of information on leverage and standardized residuals. So may I ask what the "influence statistics" in the manual refers to?

After reading a lot of material, I have not found statistical methods for detecting influential points in panel regressions. Cook's distance and DFFITS are designed for cross-sectional regressions, and the lack of an agreed definition of standardized residuals for panel data further complicates the problem.

So can I say that there are no good approaches for detecting influential points in panel regressions?

Many thanks!

Combining strings by groups

Hello! I am seeking your help with the following task. My dataset (example given below) consists of comments (string) identified by listing_id and month. What would be the appropriate way of combining (appending) all comments for a given listing_id and month?

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input long listing_id strL comments byte month
2515 "A"   1
2515 "B"   1
2515 "C"   2
2515 "D"   4
2515 "E"   4
2515 "F"   5
2515 "G"   6
2515 "H"   6
2515 "I"   7
2515 "J"   7
2515 "K"   7
2515 "L"   7
2515 "M"   8
2515 "N"   8
2515 "O"   9
2515 "P"  10
2515 "Q"  11
2515 "R"  11
2539 "S"  12
2595 "T"   3
2595 "U"   4
2595 "V"   5
2595 "W"   9
2595 "X"   9
3330 "Y"   1
3330 "Z"   4
3330 "AA"  4
3330 "BB"  5
3330 "CC"  5
3330 "DD"  9
3330 "EE" 10
3330 "FF" 12
3831 "GG"  1
3831 "HH"  5
3831 "II"  7
3831 "JJ"  8
3831 "KK"  8
3831 "LL"  8
3831 "MM"  8
end
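The kind of result I am after would, I imagine, come from something like this untested sketch (it assumes a plain space as the separator and that one row per listing_id-month pair should remain):

Code:
* Untested sketch: build a running concatenation of comments within each
* listing_id-month group, then keep the last (fully accumulated) row
bysort listing_id month (comments): gen strL all_comments = comments[1]
by listing_id month: replace all_comments = all_comments[_n-1] + " " + comments if _n > 1
by listing_id month: keep if _n == _N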
Thankfully,
Anton

Exporting regression table to Latex with -esttab-: How to stack regressions vertically

Hi everyone,

I am running regressions of multiple y variables against multiple x variables and trying to figure out the best way to automate exporting to LaTeX, so that the end result looks like the table below: the columns are all the dependent variables, and the rows are different specifications stacked vertically.

[attached image: the target table, with dependent variables as columns and stacked specifications as rows]
The way I am doing it right now involves a lot of manual copy-pasting, which is repetitive and error-prone. I run the following code with -esttab- and copy segments of the TeX outputs ("x1.tex", "x2.tex") into a new .tex file to construct the table above. Is there a way to construct the table directly? I think the trick might be the -append- option, but I can't figure out exactly how to use it. Any help would be greatly appreciated!

Code:
sysuse auto, clear

rename (weight length) (x1 x2)
rename (price mpg headroom) (y1 y2 y3)

local xlist x1 x2
local ylist y1 y2 y3

foreach x of local xlist {
    eststo clear
    foreach y of local ylist {
        eststo: reg `y' `x' 
        }
    esttab using "`x'.tex", label booktabs not nonotes nonumber r2 replace
    }
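What I have in mind with -append- is something like the untested sketch below, writing every panel into one file ("stacked.tex" is my made-up name); I suspect the appended fragments would still need some stitching on the LaTeX side, which is exactly what I would like to avoid:

Code:
* Untested sketch: first panel written with -replace-, later panels with
* -append-, reusing the xlist/ylist locals defined above
local first = 1
foreach x of local xlist {
    eststo clear
    foreach y of local ylist {
        eststo: reg `y' `x'
    }
    if `first' {
        esttab using "stacked.tex", label booktabs not nonotes nonumber r2 replace
        local first = 0
    }
    else {
        esttab using "stacked.tex", label booktabs not nonotes nonumber r2 append
    }
}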