Channel: Statalist

Cox model with competing risk error message

Dear all,

I am trying to run a competing risks model in Stata and keep getting the same error; please see below. I don't know if someone could help me.

Thanks in advance

Sophie

My script is

egen facid_merge = group(facid5 patcens), label

gen status=0
replace status=2 if all_died_7==1
replace status=1 if hosptype_new==1

stset eventdt1, origin (start_dt) f(status==1) scale (365.25) id(facid_merge)

stcrreg mage, compete (status==2)

The stset part runs well; it is the stcrreg part that gives the error message:

option compete(): competing risks events must be stset as censored
r(459);
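For diagnosis, it may help to check how -stset- classified the competing events before calling stcrreg (a minimal sketch; _d is the failure indicator created by stset, and status is the variable defined above):

Code:
* status==2 observations should have been left censored by stset (_d==0), not marked as failures
tab status _d, missing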


Interpretation of path analysis [direct, indirect and total effect]

Dear Statalist respected users,

I am doing path analysis using SEM and I estimated 3 different models. In the first model, I have one causal variable X, one mediator M and the outcome variable Y.
In the second model, I have one causal variable X, 2 mediators M1 and M2 [2 different paths], and outcome variable Y.
The third model is non-recursive: I used an instrumental variable (IV) that influences the causal variable X, included a mediator M, and allowed for reverse causality between X and Y.

I am quite confused about interpreting the indirect effect (ab), the product of the coefficient (a) for the effect of X on M and the coefficient (b) for the effect of M on Y.

(a) is negative and significant: this comes in accordance with theory.
(b) is positive and significant: this contradicts the theory.

The product (ab) is negative and statistically significant.

The direct effect (C') is positive and significant.

The total effect (C) is negative and insignificant.

Results are consistent across the three models: the indirect effect is negative and significant, the direct effect is positive and significant, and the total effect is insignificant.

My research question is whether M mediates the relationship between X and Y.

The interpretation of the (b) coefficient is my problem. I am not sure if I can say that X reduces M (given the negative association between X and M), and that the reduced M then improves Y (given the positive association).
According to theory, an increase in M should worsen Y, and vice versa.

Thanks a lot for your interest.

I am looking forward to hearing from you.

Kind regards,
Mohammed

how to make groups of non-missing observations

Dear Stata Users,

Please help me with the following issue. I want to create a variable "group" that identifies, within each firm ("gvkey"), each unbroken run of non-missing values of "cons_year", numbered consecutively across the data. For the data below, what I expect is as follows:

Expected Result:

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str6 gvkey double fyear float(p_v_decile cons_year group)
"001009" 1992 3 . .
"001009" 1994 . . .
"001013" 1992 4 . .
"001013" 1993 3 . .
"001013" 1994 5 1 1
"001013" 1995 5 2 1
"001013" 1996 5 3 1
"001013" 1997 4 . .
"001013" 1998 5 1 2
"001013" 1999 5 2 2
"001013" 2000 5 3 2
"001013" 2001 3 . .
"001013" 2002 1 . .
"001013" 2003 4 . .
"001013" 2004 5 1 3
"001013" 2005 5 2 3
"001013" 2006 2 . .
"001013" 2007 2 . .
"001013" 2008 2 . .
"001013" 2009 2 . .
"001013" 2010 3 . .
"001034" 1992 4 . .
"001034" 1993 4 . .
"001034" 1995 . . .
"001034" 1997 . . .
"001034" 1998 . . .
"001034" 1999 5 1 4
"001034" 2000 5 2 4
end


What I have now:

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str6 gvkey double fyear float(p_v_decile cons_year)
"001009" 1992 3 .
"001009" 1994 . .
"001013" 1992 4 .
"001013" 1993 3 .
"001013" 1994 5 1
"001013" 1995 5 2
"001013" 1996 5 3
"001013" 1997 4 .
"001013" 1998 5 1
"001013" 1999 5 2
"001013" 2000 5 3
"001013" 2001 3 .
"001013" 2002 1 .
"001013" 2003 4 .
"001013" 2004 5 1
"001013" 2005 5 2
"001013" 2006 2 .
"001013" 2007 2 .
"001013" 2008 2 .
"001013" 2009 2 .
"001013" 2010 3 .
"001034" 1992 4 .
"001034" 1993 4 .
"001034" 1995 . .
"001034" 1997 . .
"001034" 1998 . .
"001034" 1999 5 1
"001034" 2000 5 2
end
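One possible construction of the expected "group" variable, sketched from the example above (it assumes the data are sorted by gvkey and fyear, and that every run of non-missing cons_year starts at 1):

Code:
sort gvkey fyear
* a new group starts wherever cons_year restarts at 1; sum() keeps a running count of those starts
gen group = sum(cons_year == 1)
replace group = . if missing(cons_year)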

Loop within a loop?

Hi,

I'm aware this may be quite trivial, so apologies in advance. I am running the commands below and would like them to be more efficient. I'm familiar with setting a global macro of all the sports and then running a loop, but that creates all of the A-Z variables for each sport, which I do not want; I only want A to correspond to Football, B to Cricket, and so on.

gen A=0
replace A=1 if sport=="Football"

gen B=0
replace B=1 if sport=="Cricket"

gen C=0
replace C=1 if sport=="Rugby"

....

gen Z=0
replace Z=1 if sport=="Snooker"
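For reference, one way to avoid repeating these gen/replace pairs is to walk two parallel lists of letters and sports (a sketch; the letter-sport pairings beyond those shown above are assumptions):

Code:
local letters A B C Z
local sports  Football Cricket Rugby Snooker
local n : word count `letters'
forvalues i = 1/`n' {
    local l : word `i' of `letters'
    local s : word `i' of `sports'
    gen byte `l' = sport == "`s'"
}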


Any assistance would be much appreciated! Many thanks in advance.

inrange command, selecting a sample between two dates

Hi,

I want to select a sample between two dates. Below is an example of the data. ISIN identifies each company, and the observations are daily. I wonder if it is possible to keep the data if date1 is in the range of date2 to date3, using the inrange() function.

I would be grateful if someone could help me.

input str12 ISIN int(date1 date2 date3) double(AskPrice BidPrice)
"GB0001771426" 20457 20458 20514 11.414373583684 11.2784881838783
"GB0001771426" 20458 20458 20514 11.4680789882926 11.331554238432
"GB0001771426" 20494 20458 20514 12.347536699046 11.9616761772009
"GB0001771426" 20495 20458 20514 11.8908344792647 11.5072591734819
"GB0001771426" 20496 20458 20514 11.8534304197731 11.5957471497781
"GB0001771426" 20517 20458 20514 11.9996199092448 11.6375624119831
"GB0001771426" 20520 20458 20514 12.0197515368487 11.6570866197886
"GB0001771426" 20521 20458 20514 11.9817036942277 11.7492999587793
"NL0009739416" 20457 20490 20543 3.501 3.5
"NL0009739416" 20458 20490 20543 3.477 3.474
"NL0009739416" 20459 20490 20543 3.301 3.298
"NL0009739416" 20488 20490 20543 3.308 3.301
"NL0009739416" 20489 20490 20543 3.368 3.366
"NL0009739416" 20492 20490 20543 3.172 3.17
"NL0009739416" 20524 20490 20543 3.66 3.659
"NL0009739416" 20527 20490 20543 3.679 3.67
"NL0009739416" 20528 20490 20543 3.69 3.68
"PLTAURN00011" 20457 20456 20529 .656249432992289 .651595181694471
"PLTAURN00011" 20458 20456 20529 .629382265591854 .627059821807382
"PLTAURN00011" 20460 20456 20529 .605224920111718 .60292368467403
"PLTAURN00011" 20493 20456 20529 .61283553384721 .610582462031596
"PLTAURN00011" 20494 20456 20529 .617185481185481 .612647352647353
"PLTAURN00011" 20495 20456 20529 .606199790018182 .603937850503188
"PLTAURN00011" 20528 20456 20529 .640904913534224 .638574350212281
"PLTAURN00011" 20529 20456 20529 .678190736478711 .671199079401611
"PLTAURN00011" 20530 20456 20529 .699251984967182 .692236078763498
"PLTAURN00011" 20531 20456 20529 .711241834386819 .706547168813308
"SE0000103814" 20457 20460 20512 21.8191724941725 21.808300958301
"SE0000103814" 20458 20460 20512 21.4538572309267 21.4322085354364
"SE0000103814" 20460 20460 20512 21.3318790542696 21.3102662082166
"SE0000103814" 20461 20460 20512 21.1457953167493 21.1242509506599
"SE0000103814" 20486 20460 20512 19.5434532298412 19.5220357468496
"SE0000103814" 20487 20460 20512 19.2750535618957 19.2536843229136
"SE0000103814" 20488 20460 20512 19.4563917599428 19.4351395570374
"SE0000103814" 20521 20460 20512 21.8216434448452 21.8002286818964
"SE0000103814" 20522 20460 20512 21.7809211570111 21.75938786867
"SE0000103814" 20523 20460 20512 21.5181356518833 21.5074566763737

Graphing - tline by dummy

Hi all,

I want to add shaded bars to my graph using tline. I notice that the Stata help file only shows how to add bars by entering explicit dates. I have too many to enter manually and would like to add them based on a variable that is a dummy corresponding to each date.

Thanks

Code:
twoway (line SPFrespondentsexpect1519 TIME, ylabel(0(20)60)) (line CCI_EA19 TIME, yaxis(2)), ylabel(90(10)105, axis(2)), tsline SPFrespondentsexpect1519 CCI_EA19, tline(dummy_I_wish_to_add)
dataex

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input double(CCI_EA19 SPFrespondentsexpect1519) int TIME byte recession
101.2044   10 14304 0
100.9661 13.5 14396 0
101.2185 15.6 14488 0
101.5935  7.7 14579 0
101.9143 59.5 14670 0
102.1269 55.1 14762 0
101.8142 30.7 14854 0
101.8627  5.9 14945 0
101.7925 34.5 15035 1
101.2683 16.3 15127 1
100.3191  2.1 15221 1
 100.043  3.4 15312 1
end
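One common workaround is to plot the dummy as a shaded area drawn behind the two lines rather than using tline (a sketch based on the recession dummy in the example above; scaling the shaded area to the left-hand axis maximum is an assumption):

Code:
gen shade = recession * 60
twoway (area shade TIME, color(gs14) base(0)) ///
       (line SPFrespondentsexpect1519 TIME, ylabel(0(20)60)) ///
       (line CCI_EA19 TIME, yaxis(2) ylabel(90(10)105, axis(2)))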

Import JSON file to Stata

Hey!

I've been trying to import a JSON file into Stata but just cannot figure out how to make it work.
At the moment the file is fairly small and I could thus just export it to CSV and import it that way; it then looks the following way in Stata:

[screenshot of the imported data omitted]


However, in the future I will have to import JSON files. I tried both the jsonio package and insheetjson, but cannot make either work. I am also very new to Stata, which is probably part of the reason why.

Any hints/ideas?

Best wishes,
Tom

Egen xtile & portfolio sorting

Hi everybody,
this is the dataset that I have:
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input long permno float date double Return long numboftrades float(MeR RF MEt Exret) double firstbeta float(monthidio dummyendofmonth)
10000  9527 -.014085 .  .0092 .00025     16100 -.014335  2.5143529763022494   .06413759 1
10000  9555        0 .  .0019 .00028     11960  -.00028   .9500515549985641   .03131281 1
10000  9586  .007092 .  .0006  .0003     16330  .006792    .975191955026165   .04484615 1
10000  9616 -.015385 .  -.019 .00024     15172 -.015625   .5980838126958284  .013047387 1
10000  9646  .015306 . -.0013 .00023 11793.878  .015076   .2896041894082146   .03917417 1
10000  9677  .010204 .  .0052 .00025 11734.594  .009954  .38846733224656743  .019607043 1
10000  9708  .096386 . -.0012 .00024 10786.344  .096146   .5768399748387934   .04623256 1
10000  9737        0 .  .0004 .00022 4148.5938  -.00022   .3954379265009734     .096399 1
10000  9769  .015385 .  .0065 .00021  3911.531  .015175   .5341150474454677   .04805862 1
10000  9800        0 .  .0005  .0002  3002.344   -.0002   .6142194073333234   .04291996 1
10000  9828        0 .  .0019 .00021  3182.504  -.00021   .5121709319162139   .03591325 1
10000  9861        0 . -.0037 .00022  1981.566  -.00022   .6205842421862775   .04912256 1
10000  9891        0 . -.0002  .0002 1581.5313   -.0002   .4051215011366245   .03017019 1
10000  9919 -.071429 .  .0037 .00023 1581.5313 -.071659   .3677733912958665  .025082354 1
10000  9951 -.111111 .  .0069 .00021    973.25 -.111321  .01944200731215832   .05910159 1
10000  9981  .153846 .  .0112 .00021  912.4413  .153636  .10909586043253999   .04275477 1
10000 10010        0 . -.0004 .00019  851.5938  -.00019 -.20707865128488628   .01545156 1
10000 10024        . .  .0086 .00022         .        .                   . .0011192125 1
10001  9527  .010309 .  .0092 .00025  6033.125  .010059    .807071924428063  .011264178 1
10001  9555 -.019608 .  .0019 .00028   6156.25 -.019888    .272703714704386   .00933914 1
end
format %td date
My aim is to sort stocks (identified by permno) into 10 portfolios based on the volatility of their residuals (monthidio). I have to repeat the sorting at the end of every month. I call that variable "monthidio" because I have already reduced the complete dataset to a smaller one that contains only the dates at the end of every month (as you can see). Now, to sort stocks into 10 portfolios based on monthidio at every date I run:

egen voladecile= xtile(monthidio), by(date) nq(10)

which gives me the error message "too many values". This is very strange to me because the same command worked on a similar dataset which contained even more values for both date and monthidio. Do you have an idea as to why this happens and how to solve the issue?

I have already read https://www.stata.com/statalist/arch.../msg00365.html and https://www.statalist.org/forums/for...y-values-error but neither seems to exactly fit my case, because I would like to avoid loops and because I actually have missing values in my variable monthidio, so what I have differs from those two cases.
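Although the question asks about avoiding loops, a fallback sketch that uses official -xtile- date by date (and simply leaves missing monthidio values missing) would be:

Code:
gen voladecile = .
levelsof date, local(dates)
foreach d of local dates {
    xtile tmp = monthidio if date == `d', nq(10)
    replace voladecile = tmp if date == `d'
    drop tmp
}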
Thank you in advance

Svy: means generating different results and macros

I'm doing an analysis of the sample adult files from the Behavioral Risk Factor Surveillance System (BRFSS) from the Centers for Disease Control and Prevention, pooling 3 years of data from 2001, 2007, and 2015, yielding 1,084,878 observations. I'm using the complex survey design to complete my analysis in Stata 15.1 SE.

Code:
//WEIGHT VARIABLE GENERATION
    {
        generate weightvar = _finalwt
        assert _finalwt == . if _llcpwt ~= .
        replace weightvar = _llcpwt if _llcpwt ~= .
        
        }
        
//SVYSET FOR ENTIRE MERGED DATASET
    {
        svyset _psu [pw = weightvar], strata(_ststr)
In my analysis, I'm looking at the means of indicator variables in the dataset for various subpopulations, and using these in a lincom calculation (for the difference in means between two subpopulations). However, when I run the code for the means of the indicator variables, I get different estimates depending on whether or not I put all of the indicator variables in the same line of code.

Code:
foreach i in 2007 2015 {

        *rural (mscode == 5)
        svy, subpop(if mscode == 5 & year == `i'): mean bmi //body mass index
        svy, subpop(if mscode == 5 & year == `i'): mean drinkingvol //drinking volume (drinks/per month)
        svy, subpop(if mscode == 5 & year == `i'): mean binarysmoker //smoker or non-smoker
        svy, subpop(if mscode == 5 & year == `i'): mean actlim2 //disability (1= has an activity limitation 0= does not)
        svy, subpop(if mscode == 5 & year == `i'): mean disabilitya //disability (1= has an activity limitation or req. special equip. 0= no)
        
        }
        
***********CODE BELOW YIELDS DIFFERENT MEAN ESTIMATES THAN ABOVE

foreach i in 2007 2015 {

        svy, subpop(if mscode == 5 & year == `i'): mean bmi drinkingvol ///
        binarysmoker actlim2 disabilitya

        }
I'm attempting to write more succinct code, so I prefer to have the variables in one line of code, especially as Stata then formats the results in one easily readable table of output (which is important to retain for this analysis), but the differences in the estimates are something I'd like to address, whether it's a coding or an interpretation issue (as in I haven't coded what I intended to).

Along the lines of writing more succinct code, I've attempted to use local macros to make the variables bmi drinkingvol binarysmoker actlim2 and disabilitya easily callable in the do-file. However, when I run the code below, Stata gives this error message for lincom: [bmidrinkingvolbinarysmokeractlim2disabilitya] equation not recognized. r(303).

Code:
local variables bmi drinkingvol binarysmoker actlim2 disabilitya

foreach i in 2001 2007 2015 {

        svy, subpop(if year == `i'): mean `variables', over(primeage2)
        
        lincom [`variables']oldage - [`variables']primeage
        
        }
This exact Stata error persists even when I define the local macro like so:

Code:
 local variables "bmi drinkingvol binarysmoker actlim2 disabilitya"
Using a nested loop yields individual tables for each variable mean, so it is also not optimal:

Code:
foreach i in 2001 2007 2015 {

    foreach var in bmi drinkingvol binarysmoker actlim2 disabilitya {
    
        svy, subpop(if year == `i'): mean `var', over(primeage2)
        
        }
        
    }
Here is a dataex, which only has an extract from year 2007:

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input byte mscode float binarysmoker int(drinkingvol bmi) float(actlim2 disabilitya year primeage2) byte gender
3 0   . 19 1 1 2007 1 1
3 0   . 36 0 0 2007 1 0
3 0   . 29 0 0 2007 0 1
5 0  12 21 0 0 2007 1 1
2 0   . 31 0 0 2007 0 1
1 1   . 25 1 1 2007 0 0
1 1   . 21 . . 2007 1 0
5 0   1 21 0 0 2007 1 1
5 1  30 32 0 0 2007 1 0
5 0   .  . . . 2007 1 1
5 0   . 30 0 0 2007 0 1
2 0   . 25 . . 2007 1 0
1 1   . 26 0 0 2007 0 0
1 1   . 25 . . 2007 1 0
1 0   . 24 0 0 2007 0 1
2 0   . 30 . . 2007 0 1
1 0   . 38 . . 2007 1 0
5 0   4 33 0 0 2007 1 1
1 0   9 22 0 0 2007 . 0
5 0   . 25 0 0 2007 . 1
5 0   . 28 0 0 2007 1 1
5 0   . 23 0 0 2007 0 1
2 1 105 20 1 1 2007 1 0
2 1   . 20 0 0 2007 1 1
1 1   . 32 0 0 2007 1 1
1 1   . 25 0 0 2007 0 0
1 0   . 50 1 1 2007 1 1
3 0  42 22 0 0 2007 0 1
2 1   . 25 0 0 2007 1 0
2 0  96 28 . . 2007 1 0
1 0   . 22 . . 2007 . 1
1 0   . 28 0 0 2007 0 1
3 0   . 29 1 1 2007 1 1
3 0   . 23 1 1 2007 1 1
3 0   . 31 0 0 2007 1 1
3 0   . 25 0 0 2007 0 1
3 1   . 28 1 1 2007 1 0
3 1   2 55 0 0 2007 . 1
3 1  60 32 0 0 2007 1 0
3 0   . 29 0 0 2007 1 1
5 0   . 23 0 0 2007 0 1
5 1   . 26 0 0 2007 1 1
5 0   . 19 0 0 2007 1 1
3 0   . 25 0 0 2007 0 1
3 0   . 29 1 1 2007 0 1
3 0   . 44 0 0 2007 1 1
3 1   . 27 1 1 2007 0 1
3 0   9 20 0 0 2007 0 1
3 0   . 27 1 1 2007 1 0
3 0   1 27 0 0 2007 1 1
3 0   8 26 0 0 2007 1 1
3 0   5 23 0 0 2007 1 1
3 0   4 26 0 0 2007 1 1
5 1   . 26 0 0 2007 0 0
5 0   . 32 1 1 2007 0 1
3 1   . 29 1 1 2007 0 1
3 0   . 25 0 0 2007 . 0
3 0   . 27 0 0 2007 1 1
3 0   . 22 0 0 2007 0 1
3 0   . 29 1 1 2007 0 0
3 0   . 35 0 0 2007 0 1
3 0   . 29 0 0 2007 1 1
3 0   . 33 0 0 2007 1 1
5 0   . 23 0 0 2007 1 1
5 0   . 28 1 1 2007 0 1
1 0   . 22 0 0 2007 1 0
2 0   . 23 1 1 2007 1 1
3 0   . 24 0 0 2007 0 1
3 0   .  . 0 0 2007 . 1
3 1  60 24 1 1 2007 0 0
3 1   . 25 0 1 2007 0 0
3 1  36 26 0 0 2007 1 0
3 1  32 36 0 0 2007 1 1
3 0  40 25 0 0 2007 0 0
3 1   . 19 1 1 2007 1 1
3 0   4 27 1 1 2007 0 1
3 0 144 24 0 0 2007 1 0
3 0   . 28 1 1 2007 0 1
3 1  42 26 0 0 2007 1 1
3 0   5 24 0 0 2007 1 1
3 0   . 17 0 0 2007 1 1
3 0   . 39 0 1 2007 0 1
3 0   . 24 1 1 2007 0 1
5 0   8 26 0 1 2007 0 1
5 0   . 25 0 0 2007 0 0
1 0   . 23 1 1 2007 0 1
5 1   6 19 0 0 2007 1 0
5 0   . 28 0 0 2007 0 0
5 0   . 28 0 0 2007 0 1
5 1   . 24 0 0 2007 1 1
5 0   . 26 0 0 2007 0 1
5 0   . 32 1 1 2007 1 1
5 1   . 24 0 0 2007 0 0
5 0   8 28 0 0 2007 1 0
5 1   . 22 0 0 2007 0 1
5 .   . 21 0 0 2007 0 1
5 0   . 26 0 0 2007 0 1
5 1 360 23 0 0 2007 1 0
5 0  12 20 0 0 2007 1 1
5 1   . 32 0 0 2007 0 1
5 1   . 22 1 1 2007 1 1
5 0   . 22 0 0 2007 0 1
5 0  24  . 0 0 2007 0 1
5 1   . 23 0 0 2007 1 1
5 0   8 24 0 0 2007 0 1
5 0   . 23 0 0 2007 0 1
5 0   . 29 0 0 2007 0 1
5 0   6 19 1 1 2007 1 1
5 0   . 27 0 0 2007 0 1
5 0  17 32 0 0 2007 0 0
5 0   . 33 0 0 2007 1 1
5 0  32 25 0 0 2007 1 0
5 0   2 21 0 0 2007 1 1
5 0  40 35 0 0 2007 1 0
5 1  90 19 0 0 2007 0 1
5 1   . 20 1 1 2007 0 0
5 0   3 22 1 1 2007 0 1
5 0   . 24 0 0 2007 0 0
5 0   . 23 0 0 2007 1 1
5 0   4 28 0 0 2007 0 1
5 0   . 25 1 1 2007 0 1
5 0   2 18 0 0 2007 0 1
5 0   6 25 0 0 2007 1 1
5 0   . 27 1 1 2007 0 0
5 0   2 23 0 0 2007 1 1
5 0   .  . 1 1 2007 1 1
5 0   2 21 0 0 2007 1 1
5 0   . 24 1 1 2007 0 1
5 0   0 25 1 1 2007 0 1
5 0   . 23 1 1 2007 0 1
5 1   . 21 0 0 2007 0 1
5 0  21 19 0 0 2007 0 1
5 0   4 19 0 0 2007 1 1
5 0   . 25 0 0 2007 1 1
5 0   . 32 0 0 2007 1 0
5 0   . 36 0 0 2007 1 0
5 1  36 28 1 1 2007 1 1
5 0   . 24 0 0 2007 0 1
5 0   4 27 0 0 2007 1 0
5 0   . 28 1 1 2007 0 1
5 0   . 23 0 0 2007 0 0
5 0   . 24 0 0 2007 0 0
5 0   . 21 0 0 2007 0 1
5 0   5 29 1 1 2007 0 1
5 1   . 32 0 0 2007 1 1
5 0   1 20 0 0 2007 1 1
5 0   0 41 1 1 2007 0 1
5 0   . 28 0 0 2007 1 1
5 0  63 27 1 1 2007 0 0
5 0   1 29 1 1 2007 1 1
5 0   1 23 0 0 2007 1 1
5 0   . 38 . . 2007 1 0
5 0   . 27 1 1 2007 0 0
5 0   . 26 1 1 2007 1 0
5 0   8 31 1 1 2007 1 1
5 0   . 35 0 0 2007 0 0
5 0  50 24 1 1 2007 0 0
5 1   4 23 0 0 2007 1 0
5 1   . 31 0 0 2007 0 1
5 0   1 25 0 0 2007 0 1
5 0   . 26 1 1 2007 0 1
5 0   . 28 1 1 2007 0 1
5 1  15 27 0 0 2007 1 1
5 0   . 28 1 1 2007 0 1
5 0   . 45 1 1 2007 1 1
5 1  36 18 0 0 2007 0 1
5 0   . 30 0 0 2007 0 1
5 0   1 19 0 0 2007 1 1
5 0   . 27 0 0 2007 0 1
5 0   . 17 0 0 2007 1 1
5 0   . 27 0 0 2007 0 1
5 0   . 31 1 1 2007 0 1
5 0   .  . 0 0 2007 1 1
5 1   4 21 0 0 2007 1 1
5 0   . 22 0 0 2007 0 1
5 0  16 27 0 0 2007 0 1
5 0   8 39 1 1 2007 1 0
5 0   2 29 0 0 2007 0 0
5 0   . 20 0 0 2007 1 1
5 0   2 21 0 0 2007 0 1
5 0   . 16 1 1 2007 0 1
5 0   8 17 0 0 2007 1 1
5 0   . 25 1 1 2007 0 1
5 0   . 16 0 0 2007 0 1
5 0   . 19 0 0 2007 0 1
5 1  36 23 0 0 2007 1 0
5 0   1 22 . . 2007 1 1
5 0   . 39 0 0 2007 1 1
5 1 360 23 0 0 2007 1 0
5 1  12 29 0 0 2007 . 0
5 0   6 32 0 0 2007 0 1
5 0  21 22 0 0 2007 1 1
5 0   . 25 1 1 2007 0 1
5 1   . 19 1 1 2007 1 0
5 0   . 27 0 0 2007 0 0
5 1   . 34 0 0 2007 0 1
5 0   . 20 0 0 2007 0 1
5 0  24 27 0 0 2007 1 1
5 1   . 28 1 1 2007 0 0
5 0   . 31 0 0 2007 0 1
5 0   . 23 0 0 2007 0 1
5 0 126 27 1 1 2007 0 0
5 0   . 24 1 1 2007 0 0
5 0   1 26 0 0 2007 0 1
5 0   . 22 0 0 2007 0 1
5 0   . 25 0 0 2007 1 1
5 0  48 19 0 0 2007 0 1
5 1   . 22 0 0 2007 0 1
5 0  16 27 0 0 2007 1 1
5 1   3  . 0 0 2007 . 1
5 0  30 25 1 1 2007 0 1
5 0  32 25 0 0 2007 1 1
5 0   . 35 0 0 2007 0 1
5 0   4 23 1 1 2007 0 1
5 0  24 26 0 0 2007 0 1
5 0   8 23 0 0 2007 1 1
5 0   . 29 1 1 2007 0 1
5 0   . 30 0 0 2007 0 1
5 0   . 31 0 0 2007 0 1
5 0   .  . 1 1 2007 1 1
5 1   . 27 0 0 2007 1 0
5 0   . 24 0 0 2007 0 1
5 1   8 25 1 1 2007 0 0
5 1  42 21 0 0 2007 0 0
5 0   . 28 0 0 2007 1 1
5 0  60 25 0 0 2007 0 0
5 0   9 33 0 0 2007 0 0
5 1   . 23 0 0 2007 1 1
5 0   . 25 0 0 2007 0 1
5 1   . 23 1 1 2007 1 1
5 0   . 22 0 0 2007 1 1
5 0  15 35 . 9 2007 1 0
5 0   .  . 1 1 2007 0 1
5 0   . 28 1 1 2007 0 1
5 0   . 28 0 0 2007 0 0
5 0   . 31 1 1 2007 0 1
5 0   . 24 0 0 2007 0 1
5 0   . 22 0 0 2007 0 0
5 0   . 26 0 0 2007 1 1
5 1  20 33 0 0 2007 1 1
5 1  50 23 0 0 2007 1 1
5 1   . 22 0 0 2007 0 1
5 0  90 26 0 0 2007 1 1
5 0   . 31 0 0 2007 1 0
5 0  18 27 0 0 2007 1 1
5 0   . 26 0 0 2007 0 0
5 0   . 24 0 0 2007 1 1
5 0   . 31 1 1 2007 0 0
5 1  72 25 0 0 2007 1 0
5 0   . 19 0 0 2007 . 1
5 0   . 31 0 0 2007 0 0
5 0   . 24 0 0 2007 0 1
5 1   3 22 0 0 2007 0 1
5 1  36 19 0 0 2007 1 1
5 0 420  . 0 0 2007 0 1
5 0   . 31 0 0 2007 0 1
5 1   . 33 0 0 2007 1 0
5 0   . 29 1 1 2007 0 1
5 0   3 44 0 0 2007 1 0
5 0   6 21 0 0 2007 1 1
5 0   8 25 0 0 2007 1 1
5 0   . 24 0 0 2007 1 1
5 0   . 26 0 0 2007 1 1
5 0   1 25 0 1 2007 1 1
5 0   3 22 1 1 2007 0 1
5 0   0 18 0 1 2007 0 1
5 0  42 24 0 0 2007 0 1
5 0   . 30 0 0 2007 1 1
5 0  42 28 0 0 2007 1 0
5 0   . 39 1 1 2007 1 1
5 0   6 25 0 0 2007 1 1
5 0   . 27 0 0 2007 0 0
5 0   . 20 0 0 2007 . 0
5 1 105 19 1 1 2007 0 1
5 0   . 27 0 0 2007 0 1
5 0   . 23 0 0 2007 0 1
5 0  12 23 0 0 2007 1 1
5 0   . 29 0 0 2007 0 1
5 0   . 31 0 0 2007 1 0
5 0   4 27 0 0 2007 0 1
5 0  10 18 0 0 2007 0 1
5 0   8 27 0 0 2007 0 1
5 0  12 25 1 1 2007 0 0
5 0   1 20 0 0 2007 1 1
5 0  60 22 0 0 2007 0 0
5 0  21 26 0 0 2007 1 0
5 1   . 18 0 0 2007 1 1
5 1  18 30 0 0 2007 1 1
5 0   . 35 0 0 2007 1 1
5 0  30  . 0 0 2007 0 1
5 0  40 30 0 0 2007 0 0
5 0   4 25 0 0 2007 0 1
5 0   . 22 0 0 2007 0 1
5 0   . 25 1 1 2007 0 1
5 0  10 24 1 1 2007 1 0
5 0  20 23 0 0 2007 1 0
5 1   4 24 0 0 2007 1 1
5 0   2 23 0 0 2007 0 0
5 0   8 24 0 0 2007 1 1
5 0   . 23 0 0 2007 1 1
5 0   . 31 1 1 2007 0 1
5 1   . 29 1 1 2007 1 1
5 0   9  . 0 0 2007 0 1
5 1   1 29 1 1 2007 1 1
5 0   . 20 0 0 2007 0 1
5 1  40 22 1 1 2007 0 0
5 0   . 37 1 1 2007 0 0
5 0   . 27 0 0 2007 0 1
5 0  60 25 0 0 2007 0 1
5 1   . 40 0 0 2007 1 1
5 0   . 23 0 0 2007 0 1
5 1   .  . 0 0 2007 0 1
5 0   . 35 1 1 2007 0 1
5 1  60 21 0 0 2007 0 1
5 1  60 22 0 0 2007 0 0
5 0   . 26 0 0 2007 0 0
5 0   . 26 0 0 2007 0 1
5 1  40 23 0 0 2007 0 1
5 1   . 35 1 1 2007 1 1
5 0  12 32 0 0 2007 0 0
5 0   6 33 0 0 2007 0 0
5 0   . 21 0 0 2007 0 1
5 0  60 24 0 0 2007 0 1
5 0   . 31 1 1 2007 0 1
5 0   . 23 1 1 2007 0 1
5 0   2 36 1 1 2007 0 1
5 0  16 26 0 0 2007 0 0
5 1   . 19 0 0 2007 . 1
5 0   . 37 0 0 2007 0 1
5 1   4 20 0 0 2007 0 1
5 0   . 22 0 0 2007 1 1
5 0   . 43 1 1 2007 0 1
5 0   . 19 0 0 2007 1 1
5 1  16 24 0 0 2007 0 0
5 0   . 35 1 1 2007 1 0
5 0   . 29 0 0 2007 . 1
5 0   . 31 1 1 2007 . 1
5 0   . 20 1 1 2007 0 1
5 0   . 31 1 1 2007 1 1
5 0   . 35 0 0 2007 1 1
5 0  60 30 0 0 2007 0 0
5 0   . 30 1 1 2007 0 1
5 0   . 24 0 0 2007 0 1
5 0   4 44 0 0 2007 1 0
5 1   . 27 0 0 2007 0 1
5 0   4 25 0 . 2007 0 0
5 0   . 30 . . 2007 0 1
5 0  30 20 0 0 2007 0 1
5 1   . 26 0 0 2007 1 1
5 0   . 30 0 0 2007 1 1
5 0   . 31 0 0 2007 1 1
5 0   . 29 0 0 2007 1 0
5 1   4 30 0 0 2007 0 1
5 0   . 27 0 0 2007 0 1
5 0   . 25 0 0 2007 0 1
5 1   . 25 0 0 2007 0 0
5 0   . 25 1 1 2007 0 1
5 0   . 25 0 1 2007 0 1
5 1   . 25 1 1 2007 0 1
5 0   . 30 0 0 2007 1 1
5 0   . 35 1 1 2007 0 0
5 0   . 41 1 1 2007 1 0
5 0   . 24 0 0 2007 0 1
5 1   . 24 1 1 2007 1 0
5 0   . 29 0 0 2007 1 0
5 0   . 31 0 0 2007 0 1
5 0   . 32 . . 2007 0 1
5 0   . 30 0 0 2007 0 1
5 0   1 24 0 0 2007 1 1
5 1   8 19 1 1 2007 0 0
5 0   . 24 1 1 2007 0 1
5 0   . 34 0 0 2007 . 1
5 0   . 32 0 0 2007 0 1
5 0   . 46 1 1 2007 0 1
5 1   .  . 0 0 2007 1 1
5 0   . 32 0 0 2007 1 1
5 0   . 31 0 0 2007 1 0
5 1   2 25 0 0 2007 1 0
5 0   . 28 1 1 2007 0 1
5 0   . 27 0 0 2007 0 0
5 0   . 25 0 . 2007 1 1
2 0   . 26 1 1 2007 1 1
5 0   . 32 1 1 2007 0 1
5 0   4 31 0 0 2007 1 1
5 0   . 19 0 0 2007 1 1
5 0   8 33 0 0 2007 1 1
5 0   . 27 0 0 2007 0 1
5 0   4 26 1 1 2007 0 1
5 0   . 33 0 0 2007 0 1
5 0   1 32 1 1 2007 0 0
5 0   .  . 0 0 2007 0 1
5 0   . 39 0 0 2007 0 0
3 0   . 26 1 1 2007 1 1
3 0   . 15 1 1 2007 1 1
3 0   .  . 0 0 2007 1 0
3 0   . 30 0 0 2007 1 0
3 1  18 31 1 1 2007 0 0
2 0   . 27 0 0 2007 1 1
2 0   . 32 1 1 2007 1 0
3 0   . 33 0 0 2007 0 1
3 1  45 21 0 0 2007 1 0
3 0   . 20 0 0 2007 1 1
3 0   . 27 0 1 2007 0 1
3 0   . 34 1 1 2007 0 1
3 0   . 26 0 0 2007 0 0
3 0   . 25 0 0 2007 0 0
3 1   . 24 0 0 2007 0 1
3 0   . 22 0 0 2007 1 1
3 1   . 38 0 0 2007 0 0
3 1   . 22 1 1 2007 0 1
3 1   . 19 1 1 2007 1 0
1 1   4 26 0 0 2007 0 0
3 0   . 33 1 1 2007 1 1
3 0   .  . 0 0 2007 0 1
3 0   . 21 0 0 2007 1 1
3 0 180 23 1 1 2007 0 0
3 0   8 25 1 1 2007 1 0
3 0   . 37 0 0 2007 0 1
3 0   3 29 1 1 2007 0 0
3 0   8 30 0 0 2007 0 1
3 0   . 27 0 0 2007 1 1
3 1   . 62 1 1 2007 1 1
3 0   . 41 0 0 2007 0 1
3 0   . 21 0 0 2007 0 1
3 0   4 32 0 0 2007 0 0
3 0   . 40 1 1 2007 0 1
3 0   .  . 0 0 2007 1 1
3 1   . 28 1 1 2007 1 0
3 1   . 24 0 0 2007 1 1
3 0   . 24 1 1 2007 0 1
3 0   . 24 0 0 2007 0 0
3 0   . 25 0 0 2007 1 1
3 0   6 20 0 0 2007 1 1
2 0   . 23 0 0 2007 . 1
3 0   . 22 0 0 2007 1 0
3 0   . 44 1 1 2007 0 1
3 0   . 39 1 1 2007 1 1
3 1   . 22 0 0 2007 0 1
3 1 120 30 0 0 2007 1 1
3 0   . 21 1 1 2007 0 1
3 0   . 29 0 0 2007 1 1
3 0   . 22 0 0 2007 0 0
3 0   1 26 0 0 2007 1 1
3 0   . 34 0 0 2007 1 1
3 0   2 27 0 0 2007 0 1
3 0   5 17 0 0 2007 1 1
3 1   . 27 0 0 2007 1 1
3 0   . 20 0 0 2007 1 1
3 1   . 29 0 0 2007 0 0
3 0   .  . 0 0 2007 1 1
3 0  32 28 0 0 2007 1 1
3 0   4 22 0 0 2007 0 1
3 0   . 23 0 0 2007 1 1
3 1   . 26 1 1 2007 1 1
3 0   0 21 0 0 2007 1 1
3 0   . 23 0 0 2007 1 1
3 0   . 19 1 1 2007 0 1
3 0   . 25 0 0 2007 0 0
3 0   . 39 1 1 2007 1 1
3 1 160 29 0 0 2007 1 0
1 0   . 36 0 0 2007 1 0
3 0   . 23 1 1 2007 0 1
3 0   . 23 0 0 2007 0 1
1 0   9 25 0 0 2007 1 0
1 0   . 26 0 0 2007 0 0
2 0   . 33 1 1 2007 0 1
2 0   . 24 0 0 2007 0 0
2 0   4 27 0 0 2007 0 0
2 0   . 43 0 0 2007 0 1
2 0   . 18 1 1 2007 1 1
2 0   4 36 1 1 2007 0 0
2 0   . 32 1 1 2007 0 1
3 0   . 35 1 1 2007 1 0
3 1   .  . 1 1 2007 1 1
3 1   . 37 0 0 2007 0 1
3 0   2 36 0 0 2007 1 0
3 0   . 22 0 0 2007 1 1
3 1   . 34 0 0 2007 1 1
3 1   . 38 1 1 2007 0 1
2 1   . 27 0 0 2007 1 0
2 0   . 28 0 0 2007 0 1
2 1   . 23 0 0 2007 1 1
2 1   4 33 1 1 2007 0 0
3 1  18 18 0 0 2007 . 1
3 0   . 23 0 0 2007 1 0
3 1   . 40 1 1 2007 1 1
3 0   . 30 1 1 2007 0 1
3 0  10 27 0 0 2007 1 0
3 0   4 23 0 0 2007 1 0
3 0   . 28 0 0 2007 1 1
3 1  50 28 1 1 2007 0 0
2 1   . 24 1 1 2007 1 1
2 0   . 38 0 0 2007 1 0
2 0   . 30 1 1 2007 0 1
2 0   4 22 1 1 2007 0 1
3 0   . 31 0 0 2007 1 0
3 0   1 26 0 0 2007 . 0
3 0   . 22 0 0 2007 0 1
3 0  24 38 0 0 2007 1 0
3 1  25 22 1 1 2007 0 0
end
label values mscode metroarea
label def metroarea 1 "center city of MSA", modify
label def metroarea 2 "outside center city but inside county w/ center city", modify
label def metroarea 3 "inside suburban county of MSA", modify
label def metroarea 5 "not in MSA", modify
label values actlim2 yesnobinary
label values disabilitya yesnobinary
label def yesnobinary 0 "no", modify
label def yesnobinary 1 "yes", modify
label values primeage2 page2
label def page2 0 "oldage", modify
label def page2 1 "primeage", modify
label values gender gender
label def gender 0 "male", modify
label def gender 1 "female", modify
Any assistance in helping me simplify my code and figure out why I'm generating different point estimates would be helpful! Thank you in advance!

Feature request: Using pdfs of graphics with -putdocx- and -putpdf-

Hi all,

The new -putdocx- and -putpdf- are really fantastic additions to Stata's capabilities and fulfill a very important role. However, the default .png format for graphics is really not optimal: graphics are grainy (on Mac Sierra and Stata 15.1, at least), so I end up manually adding hi-res PDFs for some graphics.

If this isn't an inherent limitation of the platforms, can pdfs be added as an option?

cheers-
Andrew

panel data, sig change in proportions over time testing; nonparametric tests vs unit root/stationary ARIMA

Long-time reader, first-time poster.
I received a journal review and am trying to address the comments, but I am running into some issues with which types of tests are most appropriate. Thus, this is more a question of best practices in statistics than an issue with Stata, but I have exhausted my social networks, read through too many postings and books on the subject(s), and am in need of some advice/motivation.

Quick summary: I have 42 years of data where I want to test whether a trend is significantly increasing/decreasing. I have created separate groups from the categorical data (e.g. White Male) and examined the proportion over time to control for changes in rates/counts. I am examining crime over time within the same race (proportion White on White vs not; Black on Black vs not) and need to determine the best type of testing for a bunch of panel models (for each race, gender, and crime type so about 24 different groups' proportions with 42 years/data points each). I am not sure if simple nonparametric tests are best (ologit, etc.) or if I need to assess using time series analyses (unit root then ARIMA).



More detail:
Data are structured for proportion over time for each group (e.g. White male crime perpetrated by another White male). Crime is largely intraracial so White males, for example, experience a crime committed by another White male about 75% of the time. The range of the data is a low of 66 to a high of 80% (or proportion of .66 and .80 respectively). The trend bounces around over the 42 years of data but the question is, has it changed significantly over time? How best do I address this?

Some of the trends seem linear (increasing, decreasing, or flat), others curvilinear (~) when plotted over time (visual inspection). Including linear trend lines shows that most groups have little change over time, but some slopes are greater than others. Initial dfuller tests suggest some of the group trends are stationary and some are not, so I would use the first-difference term (0 1 0).

It does not seem the reviewers want a simple nonparametric test but rather a time series analysis (the review specifically stated I should assess whether the trend is "stable" or "stationary"). While I argued that time series analyses were not appropriate for so few data points, there is precedent for begrudgingly performing these tests with limited data: a similar published article's footnote states they did a unit root test and that "positive time trends were found to be significant with and without the autoregressive error process included". To me, this means they did an augmented Dickey-Fuller test and an ARIMA model with and without the p term (ARIMA 0 0 0 and 1 0 0).
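For concreteness, a rough sketch of those checks on a single group's 42-year series (the variable names here are placeholders, not from the data):

Code:
tsset year
dfuller proportion, trend lags(1)      // augmented Dickey-Fuller test with a trend term
regress proportion year                // time trend without an autoregressive error process
arima proportion year, ar(1)           // time trend with an AR(1) error process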

This is the first time I have worked with time series data, and few of my colleagues have either, so I am running out of options. I am starting to wonder what the point of any and all of these tests is, and I am feeling discouraged about the time spent and lack of progress. I have googled the issue extensively, searched this forum (e.g. here), and have a very long syntax/do-file with all types of tests but no idea which is best/better (e.g. nonparametric: ologit, nptrend, spearman, ktau, jonter; stationarity: dfuller with varsoc, dfgls, lomackinlay, kpss, pperron; ARIMA: twoway, ac, pac to assess model fit). I have also read and replicated the Box-Steffensmeier book (link), and I just need someone to help if possible. I am not even sure why I need to do a full ARIMA model, or whether I just need to assess stationarity, or just do a simpler nonparametric test.

Any and all advice or references would be greatly appreciated because my brain hurts and I am feeling discouraged.

Problems with creating newvar

Hello All,

I have a dyadic panel dataset that has the following variables: country a, country b, year, and alliance between a and b (dichotomous). I want to calculate, by year, for each country a, the sum of alliances that each country b has with all countries that do not have an alliance with that country a. I would like to loop this over all dyads.

Is this possible? Any tips on how I could do that?

Many thanks!

Cross-Classified Model

Hello all,

I am trying to estimate a cross-classified model with the mixed command. I believe I have something at least close to the right code, and I am hoping that someone could help me get more clarity. To sum things up, I have observations at the firm-state level (many observations for each firm and many observations within states) across 4 periods. I believe the observations are cross-classified within firms and states, so I would like to model the intercepts for firms and states as random. I also have interactions that I am trying to estimate. So far my code is:

mixed y c.x1##c.x2 x3 x4 x5 x6 i.year || _all: r.firm||_all: r.state, var ml


However, when I estimate this model it reports results for 1 cluster; is this correct? I was expecting more detailed information on how many firms (50), how many states (50), and how many firm-state combinations (2,500) there are.
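For comparison, the crossed-effects examples in the Stata manual typically enter one factor through _all with R. notation and nest the second factor, which should make Stata report the number of state groups separately (a sketch adapted to the variable names above, keeping the original options):

Code:
mixed y c.x1##c.x2 x3 x4 x5 x6 i.year || _all: R.firm || state:, var ml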

Two more questions:
If x2 is a 'state level' predictor, do I need to put it or the interaction in the 'random' part of the model?
Also, to get the ICCs for firm or state, I understand that I can divide their respective variance estimates by the residual...but how do I get the ICC value for the firm-state dyad?


Thank you in advance.








Simple ICC Question

Hi all: I am running an ICC on my dataset of 2 raters and 149 targets, where the 2 raters rate the netscore of products based on online reviews (using an established training method and algorithm). I have attached my dataset, arranged in long form, along with the output from Stata when I tried to run the command. I need to switch the raters and targets: Stata is incorrectly interpreting this as 149 raters and 2 targets. I've tried switching the columns and switching the order of the variables in the command. Does anyone have any advice? It seems like a simple problem with a certain answer! Thank you all for your contemplation!
netscore is the rating given by the 2 raters
productnumber is the common identifier for the 149 products
var is whether the netscore was given by rater 1 or rater 2
netscore productnumber var
8 1 1
0 2 1
0 3 1
2 4 1
5 5 1
3 6 1
3 7 1
-1 8 1
2 9 1
0 10 1
3 11 1
0 12 1
-6 13 1
7 14 1
10 15 1
1 16 1
4 17 1
2 18 1
-3 19 1
2 20 1
2 21 1
5 22 1
-3 23 1
1 24 1
0 25 1
3 26 1
1 27 1
1 28 1
0 29 1
0 30 1
1 31 1
4 32 1
1 33 1
0 34 1
1 35 1
0 36 1
-2 37 1
5 38 1
0 39 1
0 40 1
5 41 1
0 42 1
0 43 1
-1 44 1
-3 45 1
1 46 1
6 47 1
1 48 1
5 49 1
2 50 1
0 51 1
5 52 1
4 53 1
-1 54 1
0 55 1
-2 56 1
3 57 1
1 58 1
-1 59 1
-3 60 1
8 61 1
5 62 1
1 63 1
2 64 1
2 65 1
-1 66 1
-2 67 1
0 68 1
1 69 1
4 70 1
2 71 1
5 72 1
1 73 1
-2 74 1
3 75 1
0 76 1
1 77 1
-3 78 1
3 79 1
-2 80 1
0 81 1
1 82 1
3 83 1
0 84 1
2 85 1
-1 86 1
0 87 1
1 88 1
-2 89 1
0 90 1
1 91 1
-1 92 1
1 93 1
-2 94 1
0 95 1
1 96 1
3 97 1
-2 98 1
-2 99 1
7 100 1
0 101 1
2 102 1
2 103 1
3 104 1
1 105 1
0 106 1
0 107 1
-1 108 1
0 109 1
-1 110 1
-3 111 1
0 112 1
0 113 1
2 114 1
-1 115 1
0 116 1
-1 117 1
1 118 1
2 119 1
2 120 1
2 121 1
2 122 1
-1 123 1
4 124 1
3 125 1
3 126 1
8 17 1
2 128 1
3 129 1
0 130 1
-2 131 1
-1 132 1
2 133 1
5 134 1
1 135 1
1 136 1
1 137 1
-2 138 1
1 139 1
2 140 1
2 141 1
3 142 1
-1 143 1
0 144 1
-1 145 1
2 146 1
1 147 1
0 148 1
-5 149 1
8 1 2
0 2 2
0 3 2
2 4 2
6 5 2
3 6 2
3 7 2
-1 8 2
0 9 2
0 10 2
3 11 2
0 12 2
-6 13 2
7 14 2
11 15 2
1 16 2
4 17 2
2 18 2
-4 19 2
2 20 2
2 21 2
5 22 2
-3 23 2
1 24 2
0 25 2
3 26 2
1 27 2
1 28 2
0 29 2
0 30 2
1 31 2
4 32 2
1 33 2
0 34 2
1 35 2
0 36 2
-2 37 2
4 38 2
0 39 2
0 40 2
5 41 2
0 42 2
0 43 2
-1 44 2
-3 45 2
2 46 2
6 47 2
1 48 2
5 49 2
2 50 2
0 51 2
5 52 2
4 53 2
-1 54 2
0 55 2
-2 56 2
3 57 2
1 58 2
-1 59 2
-3 60 2
8 61 2
5 62 2
1 63 2
2 64 2
2 65 2
-1 66 2
-2 67 2
0 68 2
1 69 2
4 70 2
2 71 2
5 72 2
1 73 2
-2 74 2
3 75 2
0 76 2
1 77 2
-3 78 2
3 79 2
-2 80 2
0 81 2
1 82 2
3 83 2
-1 84 2
2 85 2
-1 86 2
0 87 2
1 88 2
-2 89 2
0 90 2
1 91 2
-1 92 2
1 93 2
-2 94 2
0 95 2
1 96 2
3 97 2
-2 98 2
-2 99 2
7 100 2
0 101 2
2 102 2
2 103 2
3 104 2
1 105 2
0 106 2
0 107 2
-1 108 2
0 109 2
0 110 2
-3 111 2
0 112 2
0 113 2
2 114 2
-1 115 2
0 116 2
-1 117 2
1 118 2
2 119 2
2 120 2
2 121 2
2 122 2
-1 123 2
4 124 2
3 125 2
3 126 2
8 17 2
2 128 2
3 129 2
0 130 2
-2 131 2
-1 132 2
2 133 2
5 134 2
1 135 2
0 136 2
2 137 2
-2 138 2
1 139 2
2 140 2
2 141 2
2 142 2
-1 143 2
-1 144 2
-1 145 2
2 146 2
1 147 2
0 148 2
-5 149 2
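For reference, in -icc depvar target rater- the target variable comes before the rater variable, so with the variables described above the call would be (a sketch, assuming the default two-way random-effects model is acceptable):

Code:
icc netscore productnumber var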

Replace Not Working for Alpha-Numeric Observation Even After Destring

Hi StataList,

I am working on calculating exclusive breastfeeding rates for Nigeria and have run into some difficulties with the "replace" command. I receive an error message that says "type mismatch r(109)." When I tab my m4 variable, I see both alpha-numeric and numeric values. I used destring on the m4 variable to see if it would work, and I still received an error message.

Then, I tried to change my "exclusive breastfeeding" observation to a number, but this was not allowed.

Below, I have provided the code I am using. The last line of code is where I am receiving the error message.

gen water=0
gen liquids=0
gen milk=0
gen solids=0
gen breast=0

* Water
replace water=1 if (v470a>=1 & v470a<=7)

* Other non-milk liquids
* check for country specific liquids
foreach xvar of varlist v470b v470c v470d v470i v470j v470k v470l* {
replace liquids=1 if `xvar'>=1 & `xvar'<=7
}

* Powdered or tinned milk, formula, fresh milk
foreach xvar of varlist v470e v470f v470g v470h {
replace milk=1 if `xvar'>=1 & `xvar'<=7
}

* Solid food
* check for country specific foods
foreach xvar of varlist v470m v470n v470o v470p v470q v470r v470s v470t v470u v470v v470w v470x v470y v470z v470xx v470xy v470xz* {
replace solids=1 if `xvar'>=1 & `xvar'<=7
}

* Still breastfeeding
tab m4
describe m4
replace breast=1 if m4=="still breastfeeding"

===================================================================================================
The Stata Output after from *Still breastfeeding and below:


. * Still breastfeeding
. tab m4

duration of |
breastfeeding | Freq. Percent Cum.
---------------------+-----------------------------------
1 | 1 0.05 0.05
2 | 2 0.09 0.14
5 | 2 0.09 0.23
6 | 3 0.14 0.37
7 | 2 0.09 0.46
8 | 8 0.37 0.82
9 | 6 0.27 1.10
10 | 10 0.46 1.55
11 | 8 0.37 1.92
12 | 38 1.73 3.65
13 | 11 0.50 4.15
14 | 27 1.23 5.39
15 | 26 1.19 6.57
16 | 24 1.10 7.67
17 | 25 1.14 8.81
18 | 59 2.69 11.50
19 | 24 1.10 12.60
20 | 18 0.82 13.42
21 | 8 0.37 13.78
22 | 11 0.50 14.29
24 | 6 0.27 14.56
48 | 1 0.05 14.61
never breastfed | 6 0.27 14.88
still breastfeeding | 1,862 84.98 99.86
dk | 3 0.14 100.00
---------------------+-----------------------------------
Total | 2,191 100.00

. describe m4

storage display value
variable name type format label variable label
----------------------------------------------------------------------------------------------------------------------------------------
m4 byte %8.0g m4 duration of breastfeeding

. replace breast=1 if m4=="still breastfeeding"
type mismatch
r(109);

===================================================================================================

Any suggestions as to why I cannot treat "still breastfeeding" within m4 as a numeric value, or why my last replace command will not work, would be greatly appreciated.
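Based on the -describe- output above, m4 is a labeled numeric (byte) variable, so comparing it with a string produces the type mismatch. One sketch that avoids hard-coding the underlying numeric code is to use the value-label lookup syntax (this assumes the label text matches the tabulation above exactly):

Code:
replace breast = 1 if m4 == "still breastfeeding":m4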

Beginner Stata User Help - consecutive years of donations variable

Hello,

I am trying to generate a new variable, let's call it "current_year_loyalty", which records the number of consecutive years in which a donor has made at least one donation. So for the first year in which they donate, current_year_loyalty would be assigned a value of 0. If they also made a donation the following year, current_year_loyalty would be assigned a value of 1. However, if they stop donating the year after that and then resume in some later year, current_year_loyalty would be reset to 0.

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input long PPidm double FiscalYearGifts float FISCALYEAR
70004546  88 2007
70004546  45 2010
70004546  45 2011
70004546  45 2012
70004546  45 2013
70004546  45 2014
70004546  75 2017
70004546 175 2018
end
format %ty FISCALYEAR
So in the above example, the donor identified by PPidm 70004546 would be assigned the following values:
  • 2007: current_year_loyalty = 0
  • 2010: current_year_loyalty = 0
  • 2011: current_year_loyalty = 1
  • 2012: current_year_loyalty = 2
  • 2013: current_year_loyalty = 3
  • 2014: current_year_loyalty = 4
  • 2017: current_year_loyalty = 0 (reset to 0 here because they made no donations in 2015 or 2016)
  • 2018: current_year_loyalty = 1
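One sketch of this, assuming one observation per donor-year as in the example: flag the start of each run of consecutive years, then count forward within the run.

Code:
bysort PPidm (FISCALYEAR): gen current_year_loyalty = 0 if _n == 1 | FISCALYEAR != FISCALYEAR[_n-1] + 1
bysort PPidm (FISCALYEAR): replace current_year_loyalty = current_year_loyalty[_n-1] + 1 if missing(current_year_loyalty)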
Thank you

Interpretation of interaction terms in models with standardised variables

Hello All,

I am running a multiple regression that looks like this:
Code:
reg car c.indexA##c.indexB  controls
I am interested mainly in the impact of indexB on car (cumulative abnormal returns) and in whether the interaction with indexA makes any difference.

I have standardised all my variables, as I read it was common practice in M&A research and that it helps to get more stable results. However, since I am using interaction terms, I am struggling with the interpretation of each term and the effect they have on the (also standardised) dependent variable. Can someone help me understand the interpretation of standardised interactions on a standardised dependent variable? And does it add any value to my regression to have standardised variables, i.e. am I more likely to get significant results if I standardise my variables?
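One way to unpack the interaction after the regression above is to evaluate the marginal effect of indexB at a few values of the standardised moderator indexA (a sketch; the values -1, 0, and 1, i.e. one SD below the mean, the mean, and one SD above, are illustrative):

Code:
margins, dydx(indexB) at(indexA = (-1 0 1))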

An excerpt of my data is provided below:

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float(car indexA indexB zlvalue zpremium)
-.04265274 -.031235166  .09289306 -1.3743476          .
 .05864397    .8657029  -2.986656   .5466814  -.7211906
  .9124655   .04350964   .3392571 -1.4847015 -1.1008819
 -.4733196   .04350964   .3392571 -1.1832519  -.6927265
  .3147303  -1.6008778 -2.0012002 -.07337776   .4242368
  2.403794   -2.946285   .3392571 -1.0905318          .
-1.1410446   1.0899378   .3392571 -1.2296888 -1.0978322
  .5169623   -.0760824  .08673403 -.05377782  -.7135663
  .4899784    .4022853   .8135074 -.27336898          .
 .10567318  -.17325082  -3.017451 -1.1589853          .
-.43367136  -2.2735817   .4624389  -.5439201 -.28711247
 -.7486368   1.2394273   .9551666  -.9914123  -.9766055
-.18198207   .26774403 -.03028878 -1.0312611  -.9560198
  .3653311    1.015193  -.2766528  1.1410673 -1.0863957
  .1568007   .41723365   .3392571  -.3966787 -1.0658101
-.13443676   .26774403 -1.1389264  -.9418356 -1.0365835
 .09781788   .19299924  -.2150619 -1.2979195  -.8502958
-.14205901    1.314172  -.2150619  .25849387 -1.0607272
-.02275374   .19299924   .6472117 -1.1530515 -1.0630145
-1.2362403    .7909581  -.2766528   .3465459  -.8231024
end
Thanks in advance for your help.

Best wishes,
Henry

Balanced data for threshold* model

Dear all

I'm working on my PhD thesis and I use a threshold model, but I cannot run the command because of unbalanced data.
I have tried many suggestions, but the result is the same.


One of the problems I get, using Stata 14.2:

[screenshot of the error message omitted]


Please, I need help as soon as possible.

Best regards,
Sedki

using xtmixed for cluster RCT.....include cluster variable as both fixed and random?

I am analyzing an RCT where four sites were randomly assigned to four conditions (one site per condition), with four timepoints (baseline, 6 months, 12 months, and 18 months). I am working on specifying my MLM using xtmixed and am a little torn about the best way to handle the treatment condition. It is clustered, but with only 1 site per condition.

I am including all covariates that predict baseline differences between the four groups as fixed effects. The question is: do I include treatment condition as a fixed effect only? Do I include it as a random effect as well? Or use the cluster option? See below for the various options I am considering.

Option 1 (group as fixed effect only)
xtmixed dv i.group time covariate1 covariate2 covariate3 covariate4 covariate5 covariate6 || Subject:, var reml

Option 2 (group as level 2 random effect, along with Subject)
xtmixed dv i.group time covariate1 covariate2 covariate3 covariate4 covariate5 covariate6 || Subject: group,

Option 3 (group as a random effect on its own level)
xtmixed dv i.group time covariate1 covariate2 covariate3 covariate4 covariate5 covariate6 || group: || Subject: ,

Option 4 (use the cluster option - this option yields estimates most dramatically different from the others)
xtmixed dv i.group time covariate1 covariate2 covariate3 covariate4 covariate5 covariate6 || group: || Subject: , vce(cluster group)

Any advice would be greatly appreciated.



Fractional logit or heteroskedastic fractional probit?


Dear all,

I have estimated a fractional logit model and a heteroskedastic fractional probit model with the same specification using Stata 15. In the fractional logit model, the marginal effects of the variables of interest are all significant (at the 1%, 5%, and 10% levels), and their sizes and signs are all as expected according to the hypothesis. On the other hand, in the heteroskedastic fractional probit model, some of the marginal effects are not statistically significant, although their sizes and signs are as expected. My understanding is that, in the heteroskedastic model, the standard errors are adjusted, which can make some effects statistically insignificant.

My dilemma is which results I should report. According to the hypothesis, it would be easier for me to claim the results of the fractional logit model. However, the lnsigma tests associated with the heteroskedastic model indicate that the data are suited to the heteroskedastic model. Should I report both models? In that case, how do I align the statistically insignificant results of the heteroskedastic model with the hypothesis? Or should I report only the fractional logit model and drop the results of the heteroskedastic model? Please give me some advice. Thanks.

Ujjwal Kumar Das