
Merging two rows

Hi,

I have a dataset like this:
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float(id credit time)
12 1100 1101
12 1200 1102
13 1800 1303
13 475 1304
13 980 1305
13 600 1306
15 325 1201
15 478 1202
15 645 1203
15 741 1204
end
where the first two digits of time are the year (12 = 2012) and the last two digits are the month (01 = January).
I want to merge each pair of rows into one, something like this:

id credit time
12 (1100+1200=2300) 1101
13 (1800+475) 1303
13 (980+600) 1305
15 (325+478) 1201
15 (645+741) 1203

where 1100+1200 will actually be replaced by 2300,
and the resulting dataset will have half as many observations as the original.
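One possible approach (a sketch, assuming each id's rows are sorted by time and always come in pairs) is to tag consecutive pairs and collapse:

Code:
bysort id (time): gen pair = ceil(_n/2)
collapse (sum) credit (first) time, by(id pair)
drop pair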


thanks a lot


Bivariate probit model using panel data

Dear,

I hope you are well. Does anybody know of a command for fitting a bivariate probit model with a panel data structure?
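One possible starting point (a sketch, not a confirmed answer): the community-contributed cmp package (SSC, David Roodman) can fit bivariate probit models and, I believe, also allows random effects for panel data; the multilevel syntax below is an assumption, so see help cmp.

Code:
ssc install cmp
* y1, y2, x1, x2, and id are placeholder names
cmp (y1 = x1 || id:) (y2 = x2 || id:), indicators($cmp_probit $cmp_probit)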

Best regards

Coding for one relationship (Add Health WIV). Dropping people with the shortest relationship duration.

I am using Wave IV of Add Health (the relationship sections are 16b and 16c), and I am trying to get one partner among all of the relationships reported. The data is in the long format, and I am trying to get it to wide, but with one partner (so not just converting from long to wide because I am deliberately choosing a specific partner). Generally, I am trying to pick the more serious relationship (if they say they are married but also dating someone else, I want to drop the dating relationship in favor of the marriage).

Specifically, I am using slide 33 of this presentation to guide my process https://www.cpc.unc.edu/projects/add...rence%20slides. I coded it so I am only using people that are in a current relationship. Then I coded if people reported that they are married, cohabiting or dating (also if they are with a partner that is pregnant I included them in dating). I also coded how many times they were married/cohabiting with this partner.

Where I am stuck is: if they report more than one dating/cohabiting relationship, how do I choose the relationship with the longest duration (which I coded in years)? I used gsort to sort by participant ID, partner ID, relationship status, and relationship length. I am looking for a way to drop the partners with the shortest duration.
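One pattern that may help (a sketch, with hypothetical variable names pid, relstatus, and reldur standing in for the actual ones, and assuming relstatus is coded so that smaller values are more serious):

Code:
* sort each participant's relationships from most to least serious, breaking
* ties by duration (longest first), then keep only the first row per participant
gsort pid relstatus -reldur
by pid: keep if _n == 1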

Attention: Frank Lobue

By my count, Frank Lobue -- which may be the name of a collective, as one Anthony Santana sent me a related question directly -- is operating in six threads with variants of the same question.

Here they are:

https://www.statalist.org/forums/for...ndom-variables

https://www.statalist.org/forums/for...t-across-group

https://www.statalist.org/forums/for...d-save-results

https://www.statalist.org/forums/for...egression-loop

https://www.statalist.org/forums/for...treatment-arms

https://www.statalist.org/forums/for...in-range/page2


This is an extraordinary dissipation of effort, which probably sets a record in the history of Statalist.

Frank: Pleased that you found your way here, but here are some requests for your faster progress and our peace of mind:

1. Please back up and study the FAQ Advice. The advice on bumping alone (#1 of https://www.statalist.org/forums/help#adviceextras) shows that you're hyper-bumping by posting so much. It's in nobody's best interests, least of all yours. Learning how to present code readably is also a good idea. See https://www.statalist.org/forums/help#stata on that.

2. Please explicitly sign out of 5 of those threads and summarize what you still want to know. I am exhausted just trying to keep track of where you are. 1534585 is a thread you started, so it's the best thread to continue.


psmatch2 and rbounds: estimating 'interaction effect' using Propensity Score Matching, and the sensitivity analysis

Hello,
I'm using the Propensity Score Matching method to estimate a treatment effect (using psmatch2). On top of that, I want to assess the robustness of the treatment effect to potential unobserved factors, so I use rbounds (Rosenbaum bounds analysis) on the estimation results from psmatch2. My question is: if I'm interested in knowing not just the main effect, but also heterogeneous effects across groups of individuals, how would this be implemented in Stata?

So, let me start with the plain model (only looking at the main treatment effect). Below is the code as an example:

Code:
psmatch2 treatment gender age X, out(wage)
gen delta = wage12 - _wage12 if _treatment==1 & _support==1
rbounds delta, gamma(1 (0.1) 2)
Here, treatment indicates whether an individual received treatment or not; gender, age, and X are the demographics and other covariates associated with each individual. I'm interested in the outcome variable wage, and in estimating the impact of treatment on wage in the 12th period (indicated by wage12).

Now, if I want to know how the treatment effect varies across gender, what would the PSM and Rosenbaum bounds analysis look like? That is, if we write a regression model:
Code:
wage ~ a*treatment + b*treatment*female + c*age + d*X + e
where the baseline gender is male, and coefficient b captures the differential effect for females relative to males. I want to estimate b and assess its robustness to unobservables. How should I write the psmatch2 and rbounds implementation? Thank you!!
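One workaround sometimes used (a sketch only, and it sidesteps rather than directly estimates b): match within each gender subsample and compare the two effects, applying the same delta/rbounds steps as above to each.

Code:
* gender == 0 for males and gender == 1 for females is an assumed coding
psmatch2 treatment age X if gender == 0, out(wage12)
psmatch2 treatment age X if gender == 1, out(wage12)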

Multi-Level Multiple Imputation Margins Help

Hello,

I am trying to obtain average marginal effects for my variable of interest, "renstu3". The model is multi-level by the school attended (sch_id). The data are already xtset and have been imputed for missing values. The "program" I am using is one extracted from a non-multi-level model, so I am wondering if the programming is wrong for a multi-level model?

Code:
program myprog1
    xtlogit ftemp renstu3 i.artstotal i.sportstotal i.race i.gender c.ses ///
        c.overstd c.gpa10 i.remedial i.apibcourse c.collres i.colltesta i.stuexp1 ///
        i.parexp1 c.paracad i.famcomp ///
        i.schctrl i.region c.schszln c.stdschool c.sesschool if f10!=1, re
    margins, dydx(renstu3)
end
mi estimate, cmdok: myprog1
When I run the command, Stata responds with

varlist specification required
Any help is appreciated; this is on Stata 15.1.

Mitch

New package kmest on SSC

Thanks as always to Kit Baum, a new package kmest is now available for download from SSC. In Stata, use the ssc command to install it.

The kmest package is described below, as on my website. It estimates a vector of Kaplan-Meier survival probabilities (without covariance estimates) as estimation results, for input to the bootstrap or jackknife prefix. This is especially useful if the user wants to estimate Kaplan-Meier survival probabilities under clustered and/or sampling-probability-weighted sampling schemes. StataCorp did not seem to have written a command that does this. However, I would like to thank them loads for putting in place the foundations for other people to do so.
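For instance, a minimal usage sketch (the times() option name here is an assumption on my part; check kmest.sthlp for the actual syntax):

Code:
ssc install kmest
webuse drugtr, clear
stset studytime, failure(died)
* bootstrap confidence intervals for the Kaplan-Meier survival probabilities
bootstrap _b, reps(200): kmest, times(5 10 15)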

Best wishes

Roger

--------------------------------------------------------------------------------
package kmest from http://www.rogernewsonresources.org.uk/stata10
--------------------------------------------------------------------------------

TITLE
kmest: Compute Kaplan-Meier survival probabilities as estimation results

DESCRIPTION/AUTHOR(S)
kmest is intended for use in a survival time dataset set up by stset.
It computes Kaplan-Meier survival probabilities (as computed by sts
generate) for a list of times (sorted in ascending order), and saves
them as estimation results, without a variance matrix. kmest is
intended for use with the bootstrap prefix, or possibly with the
jackknife prefix, to create confidence intervals for the Kaplan-Meier
survival probabilities, possibly allowing for clustering and
sampling-probability weighting.

Author: Roger Newson
Distribution-Date: 23january2020
Stata-Version: 10

INSTALLATION FILES
kmest.ado
kmest.sthlp
--------------------------------------------------------------------------------


xtreg fe, drop estimated constant in stored results?

Hi,

I have panel data (firmid, timeid) and want a table in LaTeX (using esttab) with the two model specifications described below:

i) "eststo: xtreg dep ind, fe"
ii) "collapse dep ind, by(firmid)" followed by "eststo: reg dep ind".

I want the LaTeX table to include the constant in (ii) but not in (i). Running "esttab, nocon" loses the constant in both.

Is there a way to drop the estimated constant for the first specification so I can simply run "esttab"?

Thanks!


Solution to the small sample size?

Hi All,

I have tried searching for a similar case but haven't found one yet!
I need to run a regression for a university project, and due to the subject I'm analyzing I have a relatively small sample of 70 observations but many independent variables I would like to test.
Do you think it would be a good idea to split my independent variables into groups and test the dependent variable against one group at a time?
They could be split by category, since each variable belongs to a different one (they concern financial factors like profitability, liquidity, and so on).


Do you think I risk heteroskedasticity and misspecification by proceeding in this way?

thank you in advance for your help!

grc1leg alternatives.

Dear All,

I am using Stata MP on my university's Linux-based supercomputer, and it does not have the user-written grc1leg installed. I want to use grc1leg instead of graph combine because I want the combined graphs to share a single legend. Is there an alternative way to achieve that? Or can I "load" a user-written command for the current session only?
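One possibility, sketched under assumptions (that your home directory is writable, and that grc1leg is still hosted at Vince Wiggins's StataCorp user page):

Code:
* point the PLUS ado-directory somewhere writable, then install
sysdir set PLUS "~/ado/plus"
net install grc1leg, from(http://www.stata.com/users/vwiggins)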

I'll appreciate your advice.

Sincerely,
Sumedha.

Negative R squared in IV regression

Good afternoon everyone,

I am running an instrumental variable regression for panel data with the command xtivreg2 and everything looks fine: the Kleibergen-Paap and Hansen statistics indicate that the model is not underidentified and that the instruments are valid. My only concern is that the R squared is negative in some specifications (not in all of them).

I have read that a negative R squared is not necessarily problematic in IV, but it might be a signal that the model is misspecified. My question is, can I just ignore the negative R squared since my Kleibergen-Paap and Hansen tests are OK?

Thanks for your help

GMM with Unit root data

Hello,
I have a question about how I should treat a unit root when analyzing unbalanced panel data and estimating a GMM model.

Preliminary:
I have run the Fisher test on my key variable of interest x:
Code:
xtfisher x, lag(1)
Unfortunately, it appears that x has a unit root, as I could not reject the null hypothesis (H0). Hence I transformed the variable to D.x. The transformed variable passes the Fisher test, and I can reject H0:
Code:
xtfisher D.x, lag(1)

The problem
In order to do some theory testing, I first run a fixed-effects model and then use difference GMM to deal with some endogeneity concerns. Both models should produce comparable results in order to allow conclusions.

I have estimated the FE model like this:
Code:
xtreg y D.l.x l.c i.Year, fe robust
where y is my dependent variable, l.c are the lagged controls, and D.l.x is the differenced lagged x. I have theoretical reasons to enter all right-hand-side variables lagged.

The Question
My question is the following: how should I enter my variable in the difference GMM model? Should I enter it as it is, given that all variables are differenced anyway?

In other words: should I estimate this model

Code:
xtabond2 y l.y l.x l.c yeardum*, gmm(y x c, lag(2 5) collapse) iv(yeardum*) noleveleq small noconstant robust
or this one, to account for the unit root in x?

Code:
xtabond2 y l.y D.l.x l.c yeardum*, gmm(y D.x c, lag(2 5) collapse) iv(yeardum*) noleveleq small noconstant robust

thanks a lot in advance for your help


Best regards

Generating an ID based on multiple variables

Hello Statalisters!

I am working with the Births Recode file of the Demographic and Health Surveys (DHS). It has individual panel data on women's birth histories -- including a unique id for the mother (uid), the mother's year of birth, each child's year of birth (yobchild), the birth order of the child (bidx), the sex of the child (b4), etc. It basically has all of a mother's births recorded as a panel: i.e., mother 1 had child 1 in 1972 (female), mother 1 had child 2 in 1974 (male), and so on.

I want to create a dummy that identifies whether the mother had her first child in or after 1980, and to keep that flag on through the survey in the 90s. I wrote the following code, but it does not take into account children who were born after the first child.

Code:

gen jan180 = 0
replace jan180 = 1 if yobchild >= 1980 & bidx == 1 // first child born in/after Jan 1980; bidx is birth order, yobchild is the child's year of birth
So the variable jan180 is 1 if the first child was born on or after 1980 and zero otherwise.

My question is: how do I set jan180 = 1 for the subsequent children (bidx 2-10), conditional on the mother's first child having been born in or after 1980? This variable would then indicate that the mother was fertile only after January 1980.
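One way to do this (a sketch using only the variables named above):

Code:
* flag every birth of a mother whose first child (bidx == 1) was born in or
* after 1980; the !missing() guard keeps missing years from counting as >= 1980
bysort uid: egen jan180 = max(yobchild >= 1980 & !missing(yobchild) & bidx == 1)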

Thank you!

Lori

Value weighted returns

Dear all,

I am analysing stock returns for approximately 700 companies in Japan. My data is ordered like this:

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float firm_id long date str107 company double(stockprice market_value) float ret
1 200310 "77 BANK (#T) - 77 BANK (#T)" 3030 232266.8            .
1 200311 "77 BANK (#T) - 77 BANK (#T)" 2880 220768.4   -.04950495
1 200312 "77 BANK (#T) - 77 BANK (#T)" 2915 223451.4   .012152778
1 200401 "77 BANK (#T) - 77 BANK (#T)" 3020 231500.2   .036020584
1 200402 "77 BANK (#T) - 77 BANK (#T)" 2875 220385.1   -.04801324
1 200403 "77 BANK (#T) - 77 BANK (#T)" 3000 229967.1    .04347826
1 200404 "77 BANK (#T) - 77 BANK (#T)" 3280 251430.7    .09333333
1 200405 "77 BANK (#T) - 77 BANK (#T)" 3320 254496.9   .012195121
1 200406 "77 BANK (#T) - 77 BANK (#T)" 3340 256030.1   .006024096
1 200407 "77 BANK (#T) - 77 BANK (#T)" 3635 278643.5    .08832335
1 200408 "77 BANK (#T) - 77 BANK (#T)" 3540 271361.2    -.0261348
1 200409 "77 BANK (#T) - 77 BANK (#T)" 3250   249131    -.0819209
1 200410 "77 BANK (#T) - 77 BANK (#T)" 3185 244148.4         -.02
1 200411 "77 BANK (#T) - 77 BANK (#T)" 3165 242615.3  -.006279435
1 200412 "77 BANK (#T) - 77 BANK (#T)" 3310 253730.4    .04581359
1 200501 "77 BANK (#T) - 77 BANK (#T)" 3605 276343.8    .08912387
1 200502 "77 BANK (#T) - 77 BANK (#T)" 3765 288608.7     .0443828
1 200503 "77 BANK (#T) - 77 BANK (#T)" 3745 287075.6  -.005312085
1 200504 "77 BANK (#T) - 77 BANK (#T)" 3895 298573.9     .0400534
1 200505 "77 BANK (#T) - 77 BANK (#T)" 3575 274044.1   -.08215661
1 200506 "77 BANK (#T) - 77 BANK (#T)" 3455 264845.4  -.033566434
1 200507 "77 BANK (#T) - 77 BANK (#T)" 3435 263312.3  -.005788712
1 200508 "77 BANK (#T) - 77 BANK (#T)" 3325 254880.2   -.03202329
1 200509 "77 BANK (#T) - 77 BANK (#T)" 3790 290525.1    .13984962
1 200510 "77 BANK (#T) - 77 BANK (#T)" 4190 321187.4     .1055409
1 200511 "77 BANK (#T) - 77 BANK (#T)" 4810 368713.9    .14797136
1 200512 "77 BANK (#T) - 77 BANK (#T)" 4440 340351.3   -.07692308
1 200601 "77 BANK (#T) - 77 BANK (#T)" 4480 343417.6   .009009009
1 200602 "77 BANK (#T) - 77 BANK (#T)" 4425 339201.5  -.012276785
1 200603 "77 BANK (#T) - 77 BANK (#T)" 4250 325786.7   -.03954802
1 200604 "77 BANK (#T) - 77 BANK (#T)" 4610 353382.8    .08470588
1 200605 "77 BANK (#T) - 77 BANK (#T)" 4515 346100.5  -.020607375
1 200606 "77 BANK (#T) - 77 BANK (#T)" 4100 314288.4   -.09191584
1 200607 "77 BANK (#T) - 77 BANK (#T)" 3955 303173.3  -.035365853
1 200608 "77 BANK (#T) - 77 BANK (#T)" 4100 314288.4   .036662452
1 200609 "77 BANK (#T) - 77 BANK (#T)" 4270 327319.9    .04146342
1 200610 "77 BANK (#T) - 77 BANK (#T)" 4120 315821.5  -.035128806
1 200611 "77 BANK (#T) - 77 BANK (#T)" 3825 293208.1   -.07160194
1 200612 "77 BANK (#T) - 77 BANK (#T)" 3915 300107.1    .02352941
1 200701 "77 BANK (#T) - 77 BANK (#T)" 3775 289375.3  -.035759896
1 200702 "77 BANK (#T) - 77 BANK (#T)" 4065 311605.4     .0768212
1 200703 "77 BANK (#T) - 77 BANK (#T)" 4170 319654.3    .02583026
1 200704 "77 BANK (#T) - 77 BANK (#T)" 3805 291674.9   -.08752998
1 200705 "77 BANK (#T) - 77 BANK (#T)" 3965 303939.8    .04204993
1 200706 "77 BANK (#T) - 77 BANK (#T)" 4100 314288.4    .03404792
1 200707 "77 BANK (#T) - 77 BANK (#T)" 3975 304706.4  -.030487806
1 200708 "77 BANK (#T) - 77 BANK (#T)" 3915 300107.1   -.01509434
1 200709 "77 BANK (#T) - 77 BANK (#T)" 3905 299340.5 -.0025542784
1 200710 "77 BANK (#T) - 77 BANK (#T)" 3835 293974.6  -.017925736
1 200711 "77 BANK (#T) - 77 BANK (#T)" 3925 300873.6   .023468057
1 200712 "77 BANK (#T) - 77 BANK (#T)" 3710 284392.6   -.05477707
1 200801 "77 BANK (#T) - 77 BANK (#T)" 3495 267911.7   -.05795148
1 200802 "77 BANK (#T) - 77 BANK (#T)" 3320 254496.9   -.05007153
1 200803 "77 BANK (#T) - 77 BANK (#T)" 3000 229967.1   -.09638554
1 200804 "77 BANK (#T) - 77 BANK (#T)" 2885 221151.7   -.03833333
1 200805 "77 BANK (#T) - 77 BANK (#T)" 3060 234566.4    .06065858
1 200806 "77 BANK (#T) - 77 BANK (#T)" 3365 257946.4     .0996732
1 200807 "77 BANK (#T) - 77 BANK (#T)" 3390 259862.8    .00742942
1 200808 "77 BANK (#T) - 77 BANK (#T)" 3165 242615.3   -.06637168
1 200809 "77 BANK (#T) - 77 BANK (#T)" 3070   235333    -.0300158
1 200810 "77 BANK (#T) - 77 BANK (#T)" 2730 209270.1   -.11074919
1 200811 "77 BANK (#T) - 77 BANK (#T)" 2265 173625.1   -.17032968
1 200812 "77 BANK (#T) - 77 BANK (#T)" 2240 171708.8  -.011037528
1 200901 "77 BANK (#T) - 77 BANK (#T)" 2420 185506.8    .08035714
1 200902 "77 BANK (#T) - 77 BANK (#T)" 2235 171325.4   -.07644628
1 200903 "77 BANK (#T) - 77 BANK (#T)" 2260 173241.9   .011185682
1 200904 "77 BANK (#T) - 77 BANK (#T)" 2465 188956.4    .09070797
1 200905 "77 BANK (#T) - 77 BANK (#T)" 2475 189722.9   .004056795
1 200906 "77 BANK (#T) - 77 BANK (#T)" 2665 204287.5    .07676768
1 200907 "77 BANK (#T) - 77 BANK (#T)" 2805 215019.3    .05253283
1 200908 "77 BANK (#T) - 77 BANK (#T)" 2795 214252.8  -.003565062
1 200909 "77 BANK (#T) - 77 BANK (#T)" 2800   214636  .0017889087
1 200910 "77 BANK (#T) - 77 BANK (#T)" 2500 191639.3   -.10714286
1 200911 "77 BANK (#T) - 77 BANK (#T)" 2680 205437.4         .072
1 200912 "77 BANK (#T) - 77 BANK (#T)" 2700 206970.4   .007462686
1 201001 "77 BANK (#T) - 77 BANK (#T)" 2465 188956.4   -.08703703
1 201002 "77 BANK (#T) - 77 BANK (#T)" 2385 182823.9   -.03245436
1 201003 "77 BANK (#T) - 77 BANK (#T)" 2410 184740.3    .01048218
1 201004 "77 BANK (#T) - 77 BANK (#T)" 2665 204287.5    .10580913
1 201005 "77 BANK (#T) - 77 BANK (#T)" 2680 205437.4   .005628518
1 201006 "77 BANK (#T) - 77 BANK (#T)" 2355 180524.2   -.12126866
1 201007 "77 BANK (#T) - 77 BANK (#T)" 2365 181290.8  .0042462847
1 201008 "77 BANK (#T) - 77 BANK (#T)" 2290 175541.6  -.031712472
1 201009 "77 BANK (#T) - 77 BANK (#T)" 2130 163276.7     -.069869
1 201010 "77 BANK (#T) - 77 BANK (#T)" 2115 162126.9  -.007042253
1 201011 "77 BANK (#T) - 77 BANK (#T)" 1875 143729.5   -.11347517
1 201012 "77 BANK (#T) - 77 BANK (#T)" 2030 155611.1    .08266667
1 201101 "77 BANK (#T) - 77 BANK (#T)" 2155 165193.1    .06157636
1 201102 "77 BANK (#T) - 77 BANK (#T)" 2225 170558.9   .032482598
1 201103 "77 BANK (#T) - 77 BANK (#T)" 2615 200454.8     .1752809
1 201104 "77 BANK (#T) - 77 BANK (#T)" 2090 160210.4    -.2007648
1 201105 "77 BANK (#T) - 77 BANK (#T)" 1885 144496.1   -.09808613
1 201106 "77 BANK (#T) - 77 BANK (#T)" 1640 125715.4   -.12997347
1 201107 "77 BANK (#T) - 77 BANK (#T)" 1775 136063.9    .08231708
1 201108 "77 BANK (#T) - 77 BANK (#T)" 1730 132614.4  -.025352113
1 201109 "77 BANK (#T) - 77 BANK (#T)" 1570 120349.4   -.09248555
1 201110 "77 BANK (#T) - 77 BANK (#T)" 1655 126865.2    .05414013
1 201111 "77 BANK (#T) - 77 BANK (#T)" 1580   121116   -.04531722
1 201112 "77 BANK (#T) - 77 BANK (#T)" 1470 112683.9   -.06962025
1 201201 "77 BANK (#T) - 77 BANK (#T)" 1660 127248.5     .1292517
end
Based on this dataset, we calculated the 12-1 momentum return for each stock and then created equal-weighted portfolios as follows:

Code:
tsset firm_id date
* xtile() as an egen function comes from egenmore (SSC); note that nq(10)
* creates deciles despite the variable name. cumret121 is the 12-1 cumulative
* momentum return computed earlier.
egen quintiles_momentum = xtile(cumret121), by(date) nq(10)

forvalues i = 1(1)10 {
    egen ew_return_quin_`i' = mean(ret) if quintiles_momentum==`i', by(date)
}

collapse ew_return*, by(date)
For the next step in our analysis, I would like to calculate value-weighted returns based on the market value. Could you help me out with this?
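A possible starting point (a sketch: it weights each stock's return by the previous row's market value, assuming each firm's rows are consecutive months and that quintiles_momentum exists as above):

Code:
bysort firm_id (date): gen mv_lag = market_value[_n-1]
forvalues i = 1/10 {
    egen num_`i' = total(ret * mv_lag) if quintiles_momentum==`i', by(date)
    egen den_`i' = total(mv_lag) if quintiles_momentum==`i', by(date)
    gen vw_return_quin_`i' = num_`i' / den_`i'
}
collapse vw_return*, by(date)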





Create a subsample - Conditional, weighted random draw of observations, with replacement

Dear Statalisters

I want to create a random subsample of my original sample; I am basically drawing geographical areas to which I will apply a treatment.
So, I have a few questions on random drawing:
  1. Is there a way to randomly draw observations from a larger dataset, to create a subsample?
  2. If the answer to 1. is "yes", is it possible to carry out a conditional drawing, that is, observations with certain characteristics cannot be drawn? I guess that the simplest thing here is just to drop unwanted observations from the original sample.
  3. What about weighted drawing, that is, observations with certain characteristics are more or less likely to be drawn?
  4. What about drawing with and without replacement (some of the areas to which I will apply a treatment have a larger population, so I might be applying the treatment to them multiple times)?
I have been browsing through Statalist material (e.g., online manuals) and posts on Statalist, but I cannot find what I am looking for.
This is the first time I have tried to create a subsample this way in Stata.
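One tool that appears to cover all four questions is the community-contributed gsample (SSC, Ben Jann). A sketch, with hypothetical variable names eligible and popsize; I believe gsample draws with replacement by default and supports weights, but check help gsample:

Code:
ssc install gsample
* 2. make ineligible observations undrawable by dropping them first
keep if eligible == 1
* 3. and 4. draw 50 observations with probability proportional to popsize,
* with replacement (add the wor option for sampling without replacement)
gsample 50 [w = popsize]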

"variable __000002 already defined" after regression command

Whether executed from the command line or from within a do-file, I get the error shown below for the following command:

. xi: mepoisson event6 year##age_cat5_1##male if (parish_new != 3 & parish_new != 4 & parish_new != 7 & parish_new != 11) & year > 2014 , exp(pop_size_) || par_comm:
variable __000002 already defined
r(110);


I am using Stata 16 and I have successfully executed the command using other similar outcomes. Can anyone suggest reason(s) for this error and how it may be corrected or circumvented?

Regards

Novie

Spregress and esttab

Hello,

I am working with spatial data and trying to export my estimation results from the spregress command using the community-contributed package esttab (SSC). I am trying (and struggling) to manipulate the esttab output to achieve two things:
  1. Group spatial lags that were estimated using different weighting matrices together.
  2. Rename the "groups" under which the variables are displayed.
Here is an example of the problem:
Code:
copy https://www.stata-press.com/data/r16/homicide1990.dta .
copy https://www.stata-press.com/data/r16/homicide1990_shp.dta .
use homicide1990
spmatrix create contiguity W
spmatrix create idistance W2

eststo model1: spregress hrate ln_population ln_pdensity gini, gs2sls dvarlag(W) ivarlag(W: gini) ivarlag(W2: ln_population ln_pdensity)
esttab model1
The table that I get looks like this:
Code:
----------------------------
                      (1)   
                    hrate   
----------------------------
hrate                       
ln_populat~n        0.894**
                   (2.66)   
ln_pdensity         0.179   
                   (0.54)   
gini                80.10***
                  (13.84)   
_cons              -34.25***
                  (-8.57)   
----------------------------
W                           
gini               -5.941   
                  (-1.58)   
hrate               0.301**
                   (2.58)   
----------------------------
W2                          
ln_populat~n       -0.521   
                  (-1.11)   
ln_pdensity         1.864   
                   (1.84)   
----------------------------
N                    1412   
----------------------------
t statistics in parentheses
* p<0.05, ** p<0.01, *** p<0.001

What I would like to get instead is something like that:
Code:
----------------------------
                      (1)   
                    hrate   
----------------------------
hrate                       
ln_populat~n        0.894**
                   (2.66)   
ln_pdensity         0.179   
                   (0.54)   
gini                80.10***
                  (13.84)   
_cons              -34.25***
                  (-8.57)   
----------------------------
New name                         
gini               -5.941   
                  (-1.58)   
hrate               0.301**
                   (2.58)                         
ln_populat~n       -0.521   
                  (-1.11)   
ln_pdensity         1.864   
                   (1.84)   
----------------------------
N                    1412   
----------------------------
t statistics in parentheses
* p<0.05, ** p<0.01, *** p<0.001
I've noticed that the variable names are stored in a notation that is unfamiliar to me, which may hint at the possible solution:
Code:
. di e(exogr)
hrate:ln_population hrate:ln_pdensity hrate:gini hrate:_cons exog*W:gini exog*W2:ln_population exog*W2:ln_pdensity endog*W:hrate
Using the rename option or the varlabel option does not help. Would anyone happen to have any suggestions on how to approach this?
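One avenue that might be worth trying (a sketch: esttab/estout has an eqlabels() option for relabelling equations, though whether repeating a label merges the W and W2 blocks into one group, I am not sure):

Code:
* relabel the three equations; the order follows e(exogr) above
esttab model1, eqlabels("hrate" "New name" "New name")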

Creating polynomials of unit specific time trends in an unbalanced panel

Dear Statalist-members,

I have a bit of a beginner question: I have an unbalanced panel of several hundred localities, which I observe over several years, and I would like to do 2 things:
1) create a new set of variables containing locality-specific time trends. I know I could just interact localities with the time indicator, like
Code:
c.year##i.locality
and include this in the xtreg command, but I want to create a set of new variables containing the locality-specific time trends because of 2).
2) I then want to create the 2nd, 3rd, 4th, and 5th polynomials of these locality-specific time trends.
Of course, I could create the time trends manually, but the number of localities is just too large. I am sure there is a more "elegant" way...
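A loop-based sketch (assuming locality is numeric with integer codes; note this creates one trend variable per locality plus its powers, i.e. potentially thousands of variables):

Code:
levelsof locality, local(locs)
foreach l of local locs {
    gen trend1_`l' = year * (locality == `l')
    forvalues p = 2/5 {
        gen trend`p'_`l' = trend1_`l'^`p'
    }
}

Alternatively, factor-variable notation such as c.year#c.year#i.locality adds the quadratic locality-specific trend directly in xtreg without creating any new variables.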

Thank you very much

Felix

Issues with getting minimum date value in a data set

Hi there, I'm having issues trying to identify the minimum dispensation date for each drug in the data set. Here's a sample data set; I've also attached a PDF copy to make it easier to read.

Code:
Rcpt_Anon_ID str8 DRUG_DIN double DSPN_AMT_QTY long DSPN_DATE double DSPN_DAY_SUPPLY_QTY str10 Prscb_Anon_ID str8 SUPP_DRUG_ATC_CODE
"000009106" "02261731" 3 18169 84 "390203576" "G03AA12"
"000009106" "02405628" 90 19971 90 "531829076" "C10AA07"
"000009106" "02353377" 180 19031 90 "277246846" "A10BA02"
"000009106" "02282445" 30 19400 30 "277246846" "N05CF01"
"000009106" "02405628" 90 20765 90 "277246846" "C10AA07"
"000009106" "02353377" 180 19838 90 "130332376" "A10BA02"
"000009106" "02405628" 90 21186 90 "385893456" "C10AA07"
"000009106" "02282445" 90 20974 90 "277246846" "N05CF01"
"000009106" "02353377" 180 19115 90 "277246846" "A10BA02"
Here's the code I ran to try to get the minimum date value for this data set:

Code:
g opioid = substr(SUPP_DRUG_ATC_CODE,1,4)=="N02A"
by Rcpt_Anon_ID: egen o_date = min(opioid) if opioid==1
format o_date %tdD_m_Y
display o_date

After running the code, I'm not getting any values in return.
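A hedged correction sketch: the original takes the minimum of the 0/1 flag rather than of the date, and by needs sorted data. (Note also that none of the sample rows shown has an ATC code starting with "N02A", so o_date would be entirely missing in this extract.)

Code:
gen opioid = substr(SUPP_DRUG_ATC_CODE, 1, 4) == "N02A"
* minimum dispensing date across each recipient's opioid rows
bysort Rcpt_Anon_ID: egen o_date = min(cond(opioid, DSPN_DATE, .))
format o_date %tdD_m_Y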

Any help would be great, thank you

Appending Errors

Hello, I am trying to append several datasets together (yes, they have been cleaned and are ready for appending) but I keep getting the following error message:

variable E is str25 in master but byte in using data
You could specify append's force option to ignore this string/numeric mismatch. The using
variable would then be treated as if it contained "".
r(106);

Here is the code I used:

cd "/Users/dinardorodriguez/Desktop/CO"
/Users/dinardorodriguez/Desktop/CO

xls2dta , clear generate (newvar1): append using "/Users/dinardorodriguez/Desktop/CO"


Once I got the error, I tried:

xls2dta , clear force: append using "/Users/dinardorodriguez/Desktop/CO"

but got the following error:

option force not allowed
r(198);


Any ideas how I can overcome this hiccup?
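A hedged guess at a fix (untested): force is an option of append, not of xls2dta, so it may need to go after the colon as part of the append command itself.

Code:
xls2dta, clear generate(newvar1): append using "/Users/dinardorodriguez/Desktop/CO", force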

Thanks,