Channel: Statalist

Merge: Two time-series datasets with different unit of analysis (days vs time periods)

Hello forum,

I have two time-series datasets of students that I need to merge. One contains the number of students’ misbehaviors per day (BehaviorData-daily) and the other contains the time period and location where students were housed (HousedData). The problem with the HousedData is that it only has time periods, not daily data (see below). I need to assign each misbehavior count to the corresponding date period. So, in my example below, if the student misbehaved on April 9, it should go next to the April 7 to April 10 period.

1131600265 4-Apr-16 6-Apr-16
1131600265 7-Apr-16 10-Apr-16
1131600265 11-Apr-16 13-Apr-16

I guess there are two options here. One would be to generate daily observations from my HousedData and then merge. Another would be to work from the behavior data first and assign the time range (e.g., April 7 to April 10) to the specific date of each misbehavior. How can I implement either strategy? Or perhaps there is another, more efficient way to do this? I would appreciate any help.


Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input long ID int(startdt enddt) str2 room byte Ineed
         .     .     . ""   .
1131600265 20548 20550 "a1" 6
1131600265 20551 20554 "a1" 2
1131600265 20555 20557 "a2" .
1131600265 20558 20561 "a2" .
1131600265 20562 20564 "a2" .
1131600473 20583 20585 "a2" .
1131600473 20586 20589 "a1" .
1131600473 20590 20592 "a1" 1
1131600473 20593 20596 "a1" 2
1131600473 20597 20599 "a2" .
1131600265 20548 20550 "a1" .
1131600265 20551 20554 "a2" .
1131600265 20555 20557 "a3" .
1131600265 20558 20561 "a4" .
1131600265 20562 20564 "a5" 1
end
format %tddd-Mon-YY startdt
format %tddd-Mon-YY enddt

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input long ID int behavior byte count
         .     . .
1131600265 20548 3
1131600265 20549 1
1131600265 20550 2
1131600265 20551 1
1131600265 20552 1
1131600473 20590 1
1131600473 20593 1
1131600473 20594 1
1131600265 20563 1
end
format %tdnn/dd/CCYY behavior
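A minimal sketch of the first strategy, expanding each housing spell to one observation per day and then merging on student and day. The file names are hypothetical; the variable names follow the two dataex listings above, with the daily date variable called behavior:

Code:
* expand each housing spell to one row per day, then merge on ID and day
use HousedData, clear
drop if missing(startdt)                 // the dataex above includes an empty first row
expand enddt - startdt + 1
bysort ID startdt room: gen behavior = startdt + _n - 1
format %td behavior
merge m:1 ID behavior using BehaviorDaily, keep(master match)

After the merge, count sits next to the spell (startdt, enddt, room) that contains its date.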








Adding xline or yline with "by" option

I have a twoway plot that I draw for several years. I am trying to draw an xline at the median value for each year. How can I do that?

As an example

Code:
twoway scatter income yrs_education, by (year)
I want each of the 5 years to have an xline at the median income level for THAT year. So this is an xline at a different value for each year: one xline per graph.

I know I could save the median value in a local and call that, but that seems to draw multiple lines in every graph.

I appreciate the help!
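One hedged sketch: build each year's panel separately, each with its own reference line, and then combine them. Note that in the scatter above income is the y variable, so a line at the median income is a yline(); use xline() instead if the line should sit on the yrs_education axis:

Code:
* one graph per year, each with its own median line, then combine
levelsof year, local(years)
local names
foreach y of local years {
    quietly summarize income if year == `y', detail
    twoway scatter income yrs_education if year == `y', ///
        yline(`r(p50)') title("`y'") name(g`y', replace) nodraw
    local names `names' g`y'
}
graph combine `names'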

Hausman Interpretation - Multinomial LR

Hello,

I've got a question about interpreting a Hausman test of the IIA assumption in MLR. My outcome has 3 categories: no arrest, one arrest, dual arrest. This is the output:

Hausman tests of IIA assumption (N=316308)

Ho: Odds(Outcome-J vs Outcome-K) are independent of other alternatives

            |   chi2   df   P>chi2
------------+---------------------
none        | 21.862   18    0.238
one arrest  | 47.549   18    0.000
dual arrest | 90.744   18    0.000

Note: A significant test is evidence against Ho.

I see that both one arrest and dual arrest violate the IIA assumption. I'm curious what would be an appropriate model to use given this test. I was thinking nested logit, but I'm not sure... Basically, what I would assume is that the choice to arrest or not is made first, and then the choice to arrest one or both parties (dual arrest). Any input you all have would be appreciated.

Thanks

Problem with the predict command after estimating an MGARCH model

I have successfully run an MGARCH DCC model (using Stata 14), but when I try to use the predict command (i.e., predict H*, variance) so that I can calculate the conditional correlations, this error message appears: "H_rpgbre_rpgr already defined" (rpgbre and rpgr are two of the four variables modelled). I do not understand why this appears or what to do about it. If I try to go directly to the conditional correlations (from the drop-down menu), I have to enter new variable names, but I don't know how to do that or where to go from there.
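A guess at the immediate cause: an earlier predict run already created variables starting with H_, and the stub H* collides with them. A sketch, using the stub from the post:

Code:
capture drop H_*       // clear any leftovers from an earlier predict
predict H*, variance   // re-create the conditional variances/covariances

Alternatively, a fresh stub (say predict V*, variance) should avoid the collision without dropping anything.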

Bootstrap with overlapping clusters

I am trying to run a fixed-effects regression with 2 clustering variables. These variables overlap, as in the following:

ID   CLUSTERVAR1   CLUSTERVAR2
 1   A             B
 2   A             C
 3   A             D
 4   B             C
 5   B             D
 6   C             D


For example, pairs (1,4) and (1,5) overlap in VAR1 and VAR2 due to the clustering value 'B'.

How do I account for this overlap when clustering?

I use bootstrap in the following Stata code and it works, but I am unsure whether it is actually accounting for the overlapping clusters. The help file doesn't seem to address this.


bootstrap, cluster(var1 var2) idcluster(myclid) group(id) seed(22): xtreg y x1 x2 i.year, fe


Let me know if there are any other commands or documentation that I could reference. Thanks in advance for your help.

AIDS model using nlsur command: R(480)

Dear Statisticians,
I am trying to estimate an AIDS model using the nlsur command, but error 480 keeps coming back, although I have NO missing values.

Calculating NLS estimates...
could not evaluate equation 1
starting values invalid or some RHS variables have missing values
r(480);


Did anyone solve similar issue? Thank you very much for your help.
Hana

Code:
program nlsuraids
    version 13

    syntax varlist(min=14 max=14) if, at(name)
    tokenize `varlist'
    args w1 w2 w3 w4 w5 w6 lnp1 lnp2 lnp3 lnp4 lnp5 lnp6 lnp7 lnm

    tempname a1 a2 a3 a4 a5 a6 a7
    scalar `a1' = `at'[1,1]
    scalar `a2' = `at'[1,2]
    scalar `a3' = `at'[1,3]
    scalar `a4' = `at'[1,4]
    scalar `a5' = `at'[1,5]
    scalar `a6' = `at'[1,6]
    scalar `a7' = 1 - `a1' - `a2' - `a3' - `a4' - `a5' - `a6'

    tempname b1 b2 b3 b4 b5 b6
    scalar `b1' = `at'[1,7]
    scalar `b2' = `at'[1,8]
    scalar `b3' = `at'[1,9]
    scalar `b4' = `at'[1,10]
    scalar `b5' = `at'[1,11]
    scalar `b6' = `at'[1,12]

    tempname g11 g12 g13 g14 g15 g16 g17
    tempname g21 g22 g23 g24 g25 g26 g27
    tempname g31 g32 g33 g34 g35 g36 g37
    tempname g41 g42 g43 g44 g45 g46 g47
    tempname g51 g52 g53 g54 g55 g56 g57
    tempname g61 g62 g63 g64 g65 g66 g67
    tempname g71 g72 g73 g74 g75 g76 g77

    scalar `g11' = `at'[1,13]
    scalar `g12' = `at'[1,14]
    scalar `g13' = `at'[1,15]
    scalar `g14' = `at'[1,16]
    scalar `g13' = `at'[1,17]
    scalar `g15' = `at'[1,18]
    scalar `g16' = `at'[1,19]
    scalar `g17' = -`g11'-`g12'-`g13'-`g14'-`g15'-`g16'

    scalar `g21' = `g12'
    scalar `g22' = `at'[1,20]
    scalar `g23' = `at'[1,21]
    scalar `g24' = `at'[1,22]
    scalar `g25' = `at'[1,23]
    scalar `g26' = `at'[1,24]
    scalar `g27' = -`g21'-`g22'-`g23'-`g24'-`g25'-`g26'

    scalar `g31' = `g13'
    scalar `g32' = `g23'
    scalar `g33' = `at'[1,25]
    scalar `g34' = `at'[1,26]
    scalar `g35' = `at'[1,27]
    scalar `g36' = `at'[1,28]
    scalar `g37' = -`g31'-`g32'-`g33'-`g34'-`g35'-`g36'

    scalar `g41' = `g14'
    scalar `g42' = `g24'
    scalar `g43' = `g34'
    scalar `g44' = `at'[1,29]
    scalar `g45' = `at'[1,30]
    scalar `g46' = `at'[1,31]
    scalar `g47' = -`g41'-`g42'-`g43'-`g44'-`g45'-`g46'

    scalar `g51' = `g15'
    scalar `g52' = `g25'
    scalar `g53' = `g35'
    scalar `g54' = `g45'
    scalar `g55' = `at'[1,32]
    scalar `g56' = `at'[1,33]
    scalar `g57' = -`g51'-`g52'-`g53'-`g54'-`g55'-`g56'

    scalar `g61' = `g16'
    scalar `g62' = `g26'
    scalar `g63' = `g36'
    scalar `g64' = `g46'
    scalar `g65' = `g56'
    scalar `g66' = `at'[1,34]
    scalar `g67' = -`g61'-`g62'-`g63'-`g64'-`g65'-`g66'

    scalar `g71' = `g17'
    scalar `g72' = `g27'
    scalar `g73' = `g37'
    scalar `g74' = `g47'
    scalar `g75' = `g57'
    scalar `g76' = `g67'
    scalar `g77' = -`g71'-`g72'-`g73'-`g74'-`g75'-`g76'

    quietly {
        tempvar lnpindex
        gen double `lnpindex' = 5 + `a1'*`lnp1' + `a2'*`lnp2' + `a3'*`lnp3' + `a4'*`lnp4' + `a5'*`lnp5' + `a6'*`lnp6' + `a7'*`lnp7'

        forvalues i = 1/7 {
            forvalues j = 1/7 {
                replace `lnpindex' = `lnpindex' + 0.5*`g`i'`j''*`lnp`i''*`lnp`j''
            }
        }

        replace `w1' = `a1' + `g11'*`lnp1' + `g12'*`lnp2' + `g13'*`lnp3' + `g14'*`lnp4' + `g15'*`lnp5' + `g16'*`lnp6' + `g17'*`lnp7' + `b1'*(`lnm' - `lnpindex')
        replace `w2' = `a2' + `g21'*`lnp1' + `g22'*`lnp2' + `g23'*`lnp3' + `g24'*`lnp4' + `g25'*`lnp5' + `g26'*`lnp6' + `g27'*`lnp7' + `b2'*(`lnm' - `lnpindex')
        replace `w3' = `a3' + `g31'*`lnp1' + `g32'*`lnp2' + `g33'*`lnp3' + `g34'*`lnp4' + `g35'*`lnp5' + `g36'*`lnp6' + `g37'*`lnp7' + `b3'*(`lnm' - `lnpindex')
        replace `w4' = `a4' + `g41'*`lnp1' + `g42'*`lnp2' + `g43'*`lnp3' + `g44'*`lnp4' + `g45'*`lnp5' + `g46'*`lnp6' + `g47'*`lnp7' + `b4'*(`lnm' - `lnpindex')
        replace `w5' = `a5' + `g51'*`lnp1' + `g52'*`lnp2' + `g53'*`lnp3' + `g54'*`lnp4' + `g55'*`lnp5' + `g56'*`lnp6' + `g57'*`lnp7' + `b5'*(`lnm' - `lnpindex')
        replace `w6' = `a6' + `g61'*`lnp6' + `g62'*`lnp2' + `g63'*`lnp3' + `g64'*`lnp4' + `g65'*`lnp5' + `g66'*`lnp6' + `g67'*`lnp7' + `b6'*(`lnm' - `lnpindex')
    }
end
Code:
nlsur aids @ w1 w2 w3 w4 w5 w6 lnp1 lnp2 lnp3 lnp4 lnp5 lnp6 lnp7 lnm, parameters(a1 a2 a3 a4 a5 a6  b1 b2 b3 b4 b5 b6  g12 g13 g14 g15 g16 g22 g23 g24 g25 g26 g33 g34 g35 g36 g44 g45 g46 g55 g56 g66) neq(6) ifgnls
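An editorial observation that may explain the r(480) (an observation, not a certain diagnosis): parameters() above names 32 parameters, so nlsur hands the program a 1 x 32 row vector, yet the program reads elements up to `at'[1,34], and it also assigns `g13' twice (from [1,15] and again from [1,17]). Out-of-range matrix subscripts evaluate to missing, which makes the starting values invalid. A quick count of the parameters() list:

Code:
di wordcount("a1 a2 a3 a4 a5 a6 b1 b2 b3 b4 b5 b6 g12 g13 g14 g15 g16 g22 g23 g24 g25 g26 g33 g34 g35 g36 g44 g45 g46 g55 g56 g66")
* 32 names, versus a maximum subscript of `at'[1,34] in the program above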

Interpreting contrasts in margins with interaction terms

Dear all,
I am investigating the effects of a continuous#categorical interaction on the binary outcome "aki2". What I am looking to show is how the probability of aki2 changes across different values of the continuous variable (c.log_avl), as a function of the interaction term (i.it_type). Running margins and marginsplot illustrates this relationship nicely.

logistic aki2 c.log_avl#i.it_type i.agecat male race i.bmicat i.cci_cat Auto_CKD_Preop i.renal i.clavien_cat
quietly margins it_type, at(log_avl=(-2(0.1)6))
marginsplot

[marginsplot graph not shown]


So far so good.

Visually, when I inspect the curves for 2.it_type and 1.it_type, the confidence intervals separate at an x-axis (log_avl) value of ~1.7. However, my goal is to attach statistical proof to that visual observation. When I use the contrast term with margins, I don't get the results I expect. The graph of the difference in curves follows the expected shape; however, the confidence intervals are almost non-existent, and the curves develop statistically significant separation from one another at very small values of x. Why is that? I would expect the red curve below to begin to have confidence intervals different from zero at roughly an x-axis value of 1.7. However, that's not what's shown. This happens irrespective of how I specify the contrast option, and irrespective of any correction for multiple comparisons. Any ideas? Many thanks for any help.

margins rb1.it_type, at(log_avl=(-2(0.1)6)) mcompare(bonferroni)
marginsplot, yline(0)

[marginsplot graph not shown]


Best,
Julien

Matching values in corresponding columns

I have a giant data set (~80K rows by ~32,000 columns) on the arrest histories of individuals. What I need to do is determine whether an individual gets rearrested after being released from their first incarceration. To do this, I need to determine the individual's first arrest that led to an incarceration. Ideally, I would look at a date variable and at a corresponding variable that holds a "verdict code". If the verdict code indicated that the person was guilty, then I would know the person was incarcerated and could look for the first rearrest after the end of the incarceration.

However, the structure of the data is a bit irregular and makes this task difficult in Stata. I'm an R user by default, but this has to be done in Stata. The main problem is that it's hard to match a column with a date to the column with the corresponding verdict code ("GY" for "guilty").

If this were R, I would do something like this:
Code:
 
 firstIncarcerationArrestDate <- min(arrestDates[verdictCodes == "GY"], na.rm = T)
where arrestDates is a vector created by unlist-ing all the columns containing arrest dates. I can't figure out an analogous way to do this in Stata.

Below is a small snapshot of the dataset in Stata showing the first five arrest date columns and the first five verdict code columns. For example, the fourth row shows an instance of being arrested on three different charges on 04apr2000. One of the verdict codes corresponding to that date is "GY", so I'd consider 04apr2000 to indicate an incarceration. However, because of multiple arrests and multiple charges per arrest, the arrest and verdict code data are spread out over hundreds of columns. What I need is a way to match, say, arrestDate3 to verdictCode3, in a way similar to the R code above.

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float(unifcrimhist_arrestdat_1 unifcrimhist_arrestdat_2 unifcrimhist_arrestdat_3 unifcrimhist_arrestdat_4) str8(unifcrimhist_verdictcd_1 unifcrimhist_verdictcd_2 unifcrimhist_verdictcd_3 unifcrimhist_verdictcd_4)
18234     .     . 20020 "TM" ""   ""   "NO"
15557 20382 14656 20267 "TM" ""   "43" ""  
17470     . 17470 19107 "TM" ""   "GY" "GC"
14704     . 14880 14704 "TM" ""   "NO" "GY"
17139     . 17139 15884 "TM" ""   "SI" "DM"
16692 15932     .     . "TM" "32" ""   ""  
17704 17704     .     . "TM" "GC" ""   ""  
17000 14988     .     . "TM" "GY" ""   ""  
15456 15456 15456 15456 "TM" "GY" "GY" "SI"
16390 18349     . 18349 "TM" "NO" ""   "SV"
18005 16547 18005 16111 "TM" "NO" "GY" "GY"
17059 17059 16921 17059 "TM" "NO" "GY" "GY"
14913 17347 17123 15022 "TM" "NO" "NO" "GY"
17932 17159     . 17932 "TM" "NP" ""   "NO"
17601 18068 17959 17959 "TM" "SI" "NO" "NO"
16516 16635 17847 17847 "TM" "TM" "GY" "GY"
14986     .     .     . "TR" ""   ""   ""  
16373 16636 16636     . "TR" "GY" "GY" ""  
end
format %td unifcrimhist_arrestdat_1
format %td unifcrimhist_arrestdat_2
format %td unifcrimhist_arrestdat_3
format %td unifcrimhist_arrestdat_4
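A sketch of a Stata analogue of the R one-liner, relying on the fact that min() ignores missing arguments. The column stubs come from the dataex above; the new variable name is made up, and the loop bound should be extended to however many arrest columns the real data has:

Code:
* rowwise minimum of arrest dates whose paired verdict code is "GY"
gen firstIncarcDate = .
forvalues i = 1/4 {
    replace firstIncarcDate = min(firstIncarcDate, unifcrimhist_arrestdat_`i') ///
        if unifcrimhist_verdictcd_`i' == "GY"
}
format %td firstIncarcDate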

power calculation: R-squared in sampsi command

Dear statalisters,
I'm performing power calcs for an education RCT with a continuous outcome variable. Let's assume my final specification will be an OLS regression with the treatment assignment and several covariates, such as pre-randomization absences, grade level, school, LEP, etc.

In the past I have used sampsi/sampclus for my power calcs. However, now I want to account for the predictive power of individual-level covariates (R-squared), for which sampsi has no subcommand. Would it be correct to use the r01(#) subcommand instead, using the square root of the R-squared?

* from help sampsi: "r01(#) specifies the correlation between baseline and follow-up measurements in a repeated-measure study. For a repeated-measure study, either r01(#) or r1(#) must be specified. If r01(#) is not specified, sampsi assumes that r01() = r1()"

The code would look something like this:
reg y c.covar1 i.covar2 i.covar3, cluster(hh_id)
local R = sqrt(`e(r2)')
sampsi 0 0.20, alpha(.05) power(.8) sd(1) r01(`R') pre(1) post(1)

Actually the experiment is a clustered randomization, so the complete power calculation code would include
sampclus, obsclus(`obs') rho(`Rho')
... but I think that is irrelevant for my question.

thanks
Gonzalo

Weighted mean with confidence interval

Hello!

I have calculated different proportion estimates with 95% confidence intervals. Now I want to calculate a mean of these different proportions, with a 95% confidence interval. Also, when doing so, I want to weight the three different proportions according to my own liking.

How do I do this?

I have searched Google, as well as through Stata's search command. I found the ci command, but I can't see how I can use it for my purpose. I am new to Stata and to statistics in general, and I know this question might be very basic to many on this forum. I hope, however, that someone would be willing to help me or point me in the right direction.
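If the three proportions can be treated as independent, one back-of-envelope sketch uses the usual variance formula for a weighted sum. Every number below is a placeholder for your own estimates, standard errors, and weights (the weights should sum to 1):

Code:
* weighted mean of three proportions with an approximate 95% CI
local p1 0.30
local p2 0.45
local p3 0.25
local se1 0.02
local se2 0.03
local se3 0.02
local w1 0.5
local w2 0.3
local w3 0.2
local pbar = `w1'*`p1' + `w2'*`p2' + `w3'*`p3'
local se   = sqrt(`w1'^2*`se1'^2 + `w2'^2*`se2'^2 + `w3'^2*`se3'^2)
display `pbar' " [" `pbar' - 1.96*`se' ", " `pbar' + 1.96*`se' "]"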

Regards Adam

Line graph fitted values from categorical variables

Hi

I'm trying to create a line graph with predicted (fitted) values from a regression where the dependent variable is continuous (log earnings) but the explanatory variable is categorical (educational attainments).

The graph I get looks like a "nest", with every point connected to the others, and is unreadable.

[graph not shown]

I do not understand why this is happening.
While playing around with my data, I managed to have it appear "clean" once, but ever since this has been happening, even though I don't think I changed anything.

I use Stata 12.

Here is my code:

reg logearn i.educrec, r
predict discretefitted
twoway (line discretefitted educyears)
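Two things worth checking: the regression uses educrec while the plot uses educyears, and without the sort option twoway line connects the points in whatever order they sit in the data, which produces exactly that "nest". A sketch:

Code:
* draw the line in x order against the regressor actually used
twoway (line discretefitted educrec, sort)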

Thank you so much!

conditional drop

Hello. I want to drop some rows based on somewhat complicated conditions. Attached is a snapshot of an unbalanced panel of 3 firms. As you can see, Firm 1 (ONT) does not have any values for the years 2005 and 2006, so its DSCD_Matched for these empty years is shown as #N/A. Similarly, Firm 2 (DDD) has missing values for the years 2007 and 2008. Finally, Firm 3 (AIA) has all firm-year observations missing.

My dropping criterion is that rows should be deleted only for firms with just a few missing years. Firms 1 and 2 have data for some years and missing data for others, so Stata should delete only the missing-year rows for these firms. For Firm 3, Stata should keep all the data. More precisely, I want to drop the missing-value rows only of those firms which have data for some years and missing data for others, and I want to keep those firms for which the data are missing for all years. Please help. Thank you.

[data snapshot not shown]
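A sketch of one way to implement this, assuming a firm identifier firm_id and a numeric variable value that is missing in the #N/A rows (both names hypothetical):

Code:
bysort firm_id: egen nonmiss = count(value)  // years with data, per firm
drop if missing(value) & nonmiss > 0         // firms missing in every year are kept
drop nonmiss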

Syntax for converting zip into urban/rural categoricals

Hello,
I am attempting to convert US Zip Codes into categorical variables related to urban vs. rural status. Would anyone be willing and able to share existing syntax for this? The Census categories of "urban cluster, urbanized area, rural" would be great, though syntax reflecting other established systems used in epi/public health research would also be quite helpful. As you might imagine, fleshing this out for each zip code would be quite arduous, so any help is greatly appreciated!
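There is no built-in conversion; the usual route is to obtain a ZIP-to-Census-category crosswalk and merge it onto the data. A sketch, where the file name and both variable names (zip, urbancat) are hypothetical stand-ins for whatever crosswalk you obtain:

Code:
preserve
import delimited using "zip_urban_rural_crosswalk.csv", clear
tempfile xwalk
save `xwalk'
restore
merge m:1 zip using `xwalk', keep(master match) nogenerate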

Multimarket contact counts

Hi everyone,

I have been trying to perform a task, but I have failed miserably, and would like to know if you guys can give me a hand. I have Stata 14 on a PC.

Say, you have a dataset with different markets (msaid) and banks (bankid) like this:
msaid  bankid  b01  b02  b03  b04  b05  b06  b07  b08  b09  b10
    1       1    4    2
    1       2    2    3
    1       3    3
    1       4    2
    2       1    4
    2       3    3
    2       5
    2      10
    2       4
    2       7
    2       6
    3       1
    3       2
    3       4
    4       1
    4       3
    4       5
    4       7
    4       9
    4      10
    5       2
    5       4
    5       6
    5       8

I created 10 extra variables representing the 10 banks (b01-b10). I need to fill these variables as follows: in b01, I want to count the number of multimarket contacts that bank 1 has with bankid. For example:
  • if bankid=1, then b01 is the number of markets bank 1 shows up in;
  • If bankid=2 then b01 would be the number of markets that bank 1 meets bank 2,
  • and so on and so forth...
I completed some cells by hand, so you get the idea. But of course I have hundreds of markets and dozens of banks. The procedure for the other 9 variables is analogous.

Any thoughts?
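One possible route is a within-market self-join via joinby, which counts shared markets directly. The tempfile and the variable partner below are illustrative:

Code:
* pair every bank with every bank in the same market, then count shared markets
preserve
keep msaid bankid
rename bankid partner
tempfile pairs
save `pairs'
restore
keep msaid bankid
joinby msaid using `pairs'
bysort bankid partner: gen contacts = _N   // markets where bankid meets partner
* when partner == bankid this is the bank's own market count; a reshape wide
* on partner can then recover the b01-b10 layout

This matches the hand-filled cells above: bank 2's b01 is 2 (it meets bank 1 in markets 1 and 3) and its b02 is 3 (it appears in markets 1, 3, and 5).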

Assigning variables to observations

Dear all,


I'm using Stata (Stata 12; Windows 8.1) for the first time and therefore have some questions. Possibly this is a very basic one, and I would like to apologize for any inconvenience.


Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str12 Underlying str25 Buyer str9 BuyOrSell double CreationDateTime float NoOfSellOrBuyTRX
"AT0000606306" "DE000LS9DRJ4" "OrderBuy"  1770887377000 1
"AT0000606306" "DE000LS9CUD3" "OrderSell" 1773574656000 3
"AT0000606306" "DE000LS9CUY9" "OrderSell" 1773235216000 3
"AT0000606306" "DE000LS9ETU5" "OrderSell" 1773827720000 3
"AT0000613005" "DE000LS9H697" "OrderBuy"  1773325591000 1
"AT0000613005" "DE000LS9EUF4" "OrderSell" 1767777965000 2
"AT0000613005" "DE000LS9HP25" "OrderSell" 1769937762000 2
"AT0000644505" "DE000LS9EVV9" "OrderBuy"  1774347616000 2
"AT0000644505" "DE000LS9ENA0" "OrderBuy"  1774429842000 2
"AT0000730007" "DE000LS9ENA0" "OrderBuy"  1767520608000 1
"AT0000730007" "DE000LS9BLP8" "OrderSell" 1771431478000 1
"AT0000741053" "DE000LS9CUD3" "OrderBuy"  1774378121000 1
"AT0000741053" "DE000LS9GHP5" "OrderSell" 1772447841000 1
"AT0000743059" "DE000LS9DRN6" "OrderBuy"  1768063251000 2
"AT0000743059" "DE000LS9BHW2" "OrderBuy"  1774426117000 2
"AT0000743059" "DE000LS9BQE1" "OrderSell" 1774854081000 7
"AT0000743059" "DE000LS9BTG0" "OrderSell" 1769008627000 7
"AT0000743059" "DE000LS9BCX1" "OrderSell" 1769008374000 7
"AT0000743059" "DE000LS9HS06" "OrderSell" 1774213156000 7
"AT0000743059" "DE000LS9ELD8" "OrderSell" 1768319746000 7
"AT0000743059" "DE000LS9BPF0" "OrderSell" 1768986824000 7
"AT0000743059" "DE000LS9FZL8" "OrderSell" 1768756118000 7
"AT0000746409" "DE000LS9BHW2" "OrderBuy"  1773138054000 1
"AT0000746409" "DE000LS9FZL8" "OrderSell" 1768756712000 1
"AT0000758305" "DE000LS9EVV9" "OrderBuy"  1770126371000 1
"AT0000758305" "DE000LS9ENA0" "OrderSell" 1770132442000 1
"AT0000809058" "DE000LS9BHW2" "OrderBuy"  1773823572000 3
"AT0000809058" "DE000LS9JCN5" "OrderBuy"  1771247183000 3
"AT0000809058" "DE000LS9BSN8" "OrderBuy"  1770131012000 3
"AT0000809058" "DE000LS9CNZ1" "OrderSell" 1773584915000 1
"AT0000818802" "DE000LS9ENA0" "OrderSell" 1774196577000 2
"AT0000818802" "DE000LS9BWF6" "OrderSell" 1770561761000 2
"AT0000821103" "DE000LS9BHW2" "OrderBuy"  1769939526000 2
"AT0000821103" "DE000LS9BSN8" "OrderBuy"  1770130986000 2
"AT0000837307" "DE000LS9CUY9" "OrderBuy"  1772633356000 2
"AT0000837307" "DE000LS9BHW2" "OrderBuy"  1773051589000 2
"AT0000837307" "DE000LS9GJY3" "OrderSell" 1768723521000 1
"AT0000908504" "DE000LS9ENA0" "OrderBuy"  1773834461000 5
"AT0000908504" "DE000LS9JB86" "OrderBuy"  1771529117000 5
"AT0000908504" "DE000LS9CUY9" "OrderBuy"  1773840643000 5
"AT0000908504" "DE000LS9BHW2" "OrderBuy"  1774444347000 5
"AT0000908504" "DE000LS9HS06" "OrderBuy"  1773163793000 5
"AT0000911805" "DE000LS9BPX3" "OrderSell" 1773159001000 3
"AT0000911805" "DE000LS9H754" "OrderSell" 1769617565000 3
"AT0000911805" "DE000LS9DVW9" "OrderSell" 1770567566000 3
"AT0000922554" "DE000LS9GKF0" "OrderBuy"  1773305691000 3
"AT0000922554" "DE000LS9HZ15" "OrderBuy"  1768729108000 3
"AT0000922554" "DE000LS9EVV9" "OrderBuy"  1771421954000 3
"AT0000922554" "DE000LS9ENA0" "OrderSell" 1772985169000 2
"AT0000922554" "DE000LS9CCQ3" "OrderSell" 1768727968000 2
end
format %tcCCYY-NN-DD_HH:MM:SS.sss CreationDateTime
With the help of

by Underlying BuyOrSell, sort: gen NoOfSellOrBuyTRX = _n
by Underlying BuyOrSell, sort: replace NoOfSellOrBuyTRX = NoOfSellOrBuyTRX[_N]


I was able to generate the number of sell orders and buy orders for each Underlying in one variable (column). Now I want to assign both the number of sell orders and the number of buy orders to each observation (Underlying) as separate variables. The final result should look like this (but for all observations):


Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str12 Underlying str25 Buyer str9 BuyOrSell double CreationDateTime float(NoOfSellOrBuyTRX NoOfBuyOrders NoOfSellOrders)
"AT0000606306" "DE000LS9DRJ4" "OrderBuy"  1770887377000 1 1 3
"AT0000606306" "DE000LS9CUD3" "OrderSell" 1773574656000 3 1 3
"AT0000606306" "DE000LS9CUY9" "OrderSell" 1773235216000 3 1 3
"AT0000606306" "DE000LS9ETU5" "OrderSell" 1773827720000 3 1 3
"AT0000613005" "DE000LS9H697" "OrderBuy"  1773325591000 1 1 2
"AT0000613005" "DE000LS9EUF4" "OrderSell" 1767777965000 2 1 2
"AT0000613005" "DE000LS9HP25" "OrderSell" 1769937762000 2 1 2
"AT0000644505" "DE000LS9EVV9" "OrderBuy"  1774347616000 2 2 0
"AT0000644505" "DE000LS9ENA0" "OrderBuy"  1774429842000 2 2 0
"AT0000730007" "DE000LS9ENA0" "OrderBuy"  1767520608000 1 1 1
"AT0000730007" "DE000LS9BLP8" "OrderSell" 1771431478000 1 1 1
"AT0000741053" "DE000LS9CUD3" "OrderBuy"  1774378121000 1 1 1
"AT0000741053" "DE000LS9GHP5" "OrderSell" 1772447841000 1 1 1
"AT0000743059" "DE000LS9DRN6" "OrderBuy"  1768063251000 2 2 7
"AT0000743059" "DE000LS9BHW2" "OrderBuy"  1774426117000 2 2 7
"AT0000743059" "DE000LS9BQE1" "OrderSell" 1774854081000 7 2 7
"AT0000743059" "DE000LS9BTG0" "OrderSell" 1769008627000 7 2 7
"AT0000743059" "DE000LS9BCX1" "OrderSell" 1769008374000 7 2 7
"AT0000743059" "DE000LS9HS06" "OrderSell" 1774213156000 7 2 7
"AT0000743059" "DE000LS9ELD8" "OrderSell" 1768319746000 7 2 7
"AT0000743059" "DE000LS9BPF0" "OrderSell" 1768986824000 7 2 7
"AT0000743059" "DE000LS9FZL8" "OrderSell" 1768756118000 7 2 7
end
format %tcCCYY-NN-DD_HH:MM:SS.sss CreationDateTime
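As an aside, both target columns shown above can also be attached per Underlying in one step each with egen, since total() of a logical expression counts the matching rows:

Code:
bysort Underlying: egen NoOfBuyOrders  = total(BuyOrSell == "OrderBuy")
bysort Underlying: egen NoOfSellOrders = total(BuyOrSell == "OrderSell")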

Thank you in advance.

Best regards.

blabel and mlabel not allowed in twoway bar graphs?

Hello,

I am trying to create a simple twoway bar graph with confidence intervals over two categories. I want to add a label showing the sample size (N) on top of each bar, but I get an error saying that mlabel or blabel is not allowed. How can I add just the count of individuals in each group as a label on top of the bars?

I first collapse the data as shown below:

collapse (mean) meanretain_year3= retain_year3 (sd) sdretain_year3=retain_year3 (count) n=retain_year3, by(belongtreat disadvgrp)

Then I generate the CI's:

generate hiretain_year3 = meanretain_year3 + invttail(n-1,0.025)*(sdretain_year3 / sqrt(n))
generate loretain_year3 = meanretain_year3 - invttail(n-1,0.025)*(sdretain_year3 / sqrt(n))

Then I combine the two groups (disadvgrp, belongtreat) into one category; each of those has 2 sub-categories:

generate disadvgrptreat = belongtreat if disadvgrp == 0
replace disadvgrptreat = belongtreat+5 if disadvgrp == 1

Graph works as expected below:

twoway (bar meanretain_year3 disadvgrptreat if belongtreat==1) ///
(bar meanretain_year3 disadvgrptreat if belongtreat==0) ///
(rcap hiretain_year3 loretain_year3 disadvgrptreat), ///
legend(row(1) order(1 "Intervention" 2 "Control" ) ) ///
xlabel( 0 "Advantaged Students" 5 "Disadvantaged students", noticks) ///
xtitle("Experimental Condition by Demographic Groups") title("Continuous Enrollment Over 2 years post-intervention")


But, both these do not seem to work:

mlabel(tolabel) mlabpos(12) mlabcolor(white) xla(0 1, tlcolor(none) valuelabel)
blabel(total, position(inside) format(%9.1f) color(white))

I have a variable called n in my collapsed dataset that I want to add as a label on top of the bar graphs.
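In twoway, blabel() belongs to graph bar and mlabel() belongs to marker plots such as scatter, so one hedged workaround is to stack an invisible scatter on top of the bars and let its marker labels carry n (same variables as above):

Code:
twoway (bar meanretain_year3 disadvgrptreat if belongtreat==1) ///
       (bar meanretain_year3 disadvgrptreat if belongtreat==0) ///
       (rcap hiretain_year3 loretain_year3 disadvgrptreat) ///
       (scatter hiretain_year3 disadvgrptreat, msymbol(none) mlabel(n) mlabposition(12)), ///
       legend(row(1) order(1 "Intervention" 2 "Control"))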

I appreciate any help with my question above. I also apologize if this has been answered before - I could not find responses to a similar question posted on statalist before.

Thank you,
Maithreyi

Contrast at the mean

Using Stata MP 14.1 under Win 7E. I typically use margins for jobs like the following, but I'm not exactly sure how to accomplish this one with that approach, so contrasts may be easier. I'd like to run contrasts after logit that compare each level of a categorical variable with the previous level. This is pretty straightforward in the single-IV case:

Code:
logit y a
contrast {a -1 1 0}, effects
contrast {a 0 -1 1}, effects
It gets more complicated if there are two IVs and one wants to run the contrast at the mean of the second IV:

Code:
logit y a b
contrast {a -1 1}, effects   /* something is missing here - need average of b */
contrast {a 0 -1 1}, effects /* something is missing here - need average of b */
What needs to be added?
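One hedged possibility via margins rather than contrast: the reverse-adjacent operator ar. compares each level with the previous one, and atmeans holds the other covariates at their means (treating b as continuous here is an assumption):

Code:
logit y i.a c.b
margins ar.a, atmeans contrast(effects)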

Duplicates Within Groups

I am a new Stata user with minimal experience. Currently I have been tasked with creating a program that will list errors in our data set to facilitate cleaning. One key problem I am hung up on is finding a way to list duplicates within multiple levels of another variable. For example, we have multiple clusters (cluster) randomly numbered between 1 and 300. Within each cluster there are households (household) numbered from 1-20. How would I write code that loops through each cluster and finds duplicate households within it?
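For the listing itself, duplicates can do the within-cluster flagging without an explicit loop. A sketch using the variable names from the post:

Code:
duplicates tag cluster household, generate(dup)
sort cluster household
list cluster household if dup > 0, sepby(cluster)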

Comparing ICC across dependent groups

Hello,

I have run a two-way mixed effects model to estimate ICCs for two different rating methods (new and old). Both methods utilize the same group of students (n=50). I now want to compare the two ICCs to determine whether reliability significantly differs across the new and old rating methods. Although the results provide 95% CIs, I am not confident that this is sufficient for comparison. Any thoughts on how to proceed?

Thank you,

Chris

Creating a Stata data file from a JSON formatted file

Hi,

I've seen other posts on this topic but they haven't helped for the particular file I'm working with. Here is the link to that file:
https://raw.githubusercontent.com/va...PTSD_FY15.json.

I've tried simply importing the file. I saved the text from the weblink and created a .json file in the do-file editor. The code (import delimited using "PTSD FY15.json") created a file with lots of variables and no observations. I didn't specify a delimiter, although commas separate the observations; I thought the commas would be treated as the delimiter automatically.
Stata returns: import delimited using "PTSD FY15.json"
(13,448 vars, 0 obs)

I installed and tried the insheetjson command. It returns this: insheetjson using "https://raw.githubusercontent.com/vacobrydsk/VHA-Files/master/NEPEC_AnnualDataSheet_PTSD_FY15.json"
3362 observations updated/written. When I try to save this file, Stata returns 'no vars defined' r(111), which of course is true, since no variables show up in the Variables window. I'm using Stata 14.1 (Windows).

So, one strategy returns variables and no observations, while the other returns observations and no variables. Sadly, these clues haven't been enough for me to solve the puzzle.

Thanks,

Eric
