Channel: Statalist

VCE is not positive definite

Hello everyone.

I am running a multiple imputation using data from a longitudinal study with two follow-up points, at 6 and 12 months. Some variables are missing at 6 months and others at 12 months. My dataset is now in wide form (I initially ran the imputation with the dataset in long form, but my advisor and some articles recommend running the imputation in wide format for longitudinal data). After reshaping my data to wide form, declaring "mi set mlong", inspecting my missing-data patterns with "mi misstable patterns", and then running "mi register imputed", I could run my imputation.

This is my code:
mi impute chained (mlogit, augment) raceg1 injtype1 dischdispo1 (ologit, augment) edlev1 comorb1 (logit, augment) pain11 pain12 work1 work2 func1 func2 anx1 anx2 dep1 dep2 ptsd1 ptsd2 (regress) iss1 AGG_PHYS1 AGG_PHYS2 AGG_MENT1 AGG_MENT2 = age1 sex1 sevheadinj1 icu1 vent1 loscat1 threegroups1, add(30) rseed(5)

After running the code Stata produced this:
convergence not achieved
convergence not achieved
logit failed to converge on observed data
error occurred during imputation of raceg1 injtype1 dischdispo1 edlev1 comorb1 pain11 pain12 work1 work2 func1 func2 anx1 anx2 dep1 dep2 ptsd1 ptsd2 iss1 AGG_PHYS1 AGG_PHYS2 AGG_MENT1 AGG_MENT2 on m=1
r(430);


The "augment" option was added because perfect predictor(s) had been detected in a previous run.

I then restarted the imputation, specifying the "noisily" option:
mi impute chained (mlogit, augment) raceg1 injtype1 dischdispo1 (ologit, augment) edlev1 comorb1 (logit, augment) pain11 pain12 work1 work2 func1 func2 anx1 anx2 dep1 dep2 ptsd1 ptsd2 (regress) iss1 AGG_PHYS1 AGG_PHYS2 AGG_MENT1 AGG_MENT2 = age1 sex1 sevheadinj1 icu1 vent1 loscat1 threegroups1, add(30) noisily

and I got this below:

mi impute: VCE is not positive definite
The posterior distribution from which mi impute drew the imputations for ptsd2 is not proper when the VCE estimated from the observed data is not positive definite. This may happen, for example, when the number of parameters exceeds the number of observations. Choose an alternate imputation model.
error occurred during imputation of raceg1 injtype1 dischdispo1 edlev1 comorb1 pain11 pain12 work1 work2 func1 func2 anx1 anx2 dep1 dep2 ptsd1 ptsd2 iss1 AGG_PHYS1 AGG_PHYS2 AGG_MENT1 AGG_MENT2 on m = 1

r(498);

end of do-file

r(498);


Is there anyone who can help me?

What is my best option at this moment?

By the way, I ran my imputation in long format before and the code worked perfectly, but since this is a dataset from a longitudinal study, I decided to run this imputation in wide form.
If you know whether it is possible to run multiple imputation with the dataset in long form, please let me know.
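As a hedged sketch only (not a verified fix for this dataset): a common workaround when a logit imputation model fails on the observed data is to impute the problematic binary variables with predictive mean matching (pmm) instead, which avoids fitting a logistic model entirely. Whether pmm is appropriate for these variables is an assumption, and knn(5) is an arbitrary choice:

Code:
* Sketch: swap (logit, augment) for (pmm, knn(5)) on the binary variables.
mi impute chained ///
    (mlogit, augment) raceg1 injtype1 dischdispo1 ///
    (ologit, augment) edlev1 comorb1 ///
    (pmm, knn(5)) pain11 pain12 work1 work2 func1 func2 ///
                  anx1 anx2 dep1 dep2 ptsd1 ptsd2 ///
    (regress) iss1 AGG_PHYS1 AGG_PHYS2 AGG_MENT1 AGG_MENT2 ///
    = age1 sex1 sevheadinj1 icu1 vent1 loscat1 threegroups1, ///
    add(30) rseed(5)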

Thank you very much.

C



rangestat (sd) over many variables with missing values

Hello everyone

I am trying to calculate the rolling standard deviation over the past 11 observations plus the current one (i.e., from t-11 to t). A necessary condition is that the variables (Nr_xxx) contain a value and are not missing in any of the past 11 observations.

This is my data:
* Example generated by -dataex-. To install: ssc install dataex
clear
input int Jahr byte Monat double(Nr_922924 Nr_982102 Nr_288950) float seqnum
1999 1 -3.42 2.5 . 1
1999 2 13.83 5.69 . 2
1999 3 10.62 7.69 . 3
1999 4 13.38 -1.79 . 4
1999 5 -2.7 -6.91 . 5
1999 6 11.29 2.54 . 6
1999 7 -1.19 -4.76 . 7
1999 8 6.74 0 . 8
1999 9 .32 -2 . 9
1999 10 -.97 .41 . 10
1999 11 2.44 12.8 . 11
1999 12 23.85 -.9 . 12
2000 1 -6.29 -3.27 . 13
2000 2 -2.74 1.5 . 14
2000 3 9.35 -7.04 . 15
2000 4 .26 0 . 16
2000 5 9.4 0 15.38 17
2000 6 -6.8 9.56 10.83 18
2000 7 1.92 0 36.84 19
2000 8 -2.01 0 7.69 20
2000 9 -13.85 9.09 3.95 21
2000 10 -4.91 -14.67 -2.45 22
2000 11 -1.1 7.42 -6.42 23
2000 12 9.34 0 -1.08 24
2001 1 -5.64 -3.82 -7.61 25
2001 2 -13.19 3.97 -8.82 26
2001 3 -8.75 -5.45 -13.71 27
2001 4 -.79 -1.92 -60 28
2001 5 3.2 16.67 13.08 29
2001 6 -15.66 -7.56 -3.47 30
2001 7 -29.6 -2.73 -18.66 31
2001 8 -9.66 1.12 1.05 32
2001 9 -32.95 -2.03 -14.58 33
2001 10 19.83 0 41.46 34
2001 11 23.74 -3.77 7.59 35
2001 12 -6.98 2.94 25 36
2002 1 -7.81 -19.05 -13.46 37
2002 2 -17.63 0 14.07 38
2002 3 9.88 0 3.77 39
2002 4 8.61 12.94 -4.88 40
2002 5 -2.76 0 -11.18 41
2002 6 -6.03 -16.67 -20 42
2002 7 -40.23 0 -5.65 43
2002 8 3.54 0 9.52 44
2002 9 -40.61 -20 -7.97 45
2002 10 -59.55 12.5 10.03 46
2002 11 152.28 -16.67 7.96 47
2002 12 -20.93 0 0 48
2003 1 1.78 .33 -6.39 49
2003 2 -29.75 -.33 9.46 50
2003 3 12.81 0 2 51
2003 4 31.55 0 36.24 52
2003 5 -1.2 0 10.13 53
2003 6 8.01 .83 -5.7 54
2003 7 34.38 -17.36 4.99 55
2003 8 33.61 -11.6 5.6 56
2003 9 -8.64 -9.5 13.25 57
2003 10 7.53 26 12.14 58
2003 11 1.8 17.26 -2.56 59
2003 12 -1.1 -1.86 7.88 60
2004 1 15.15 -3.45 3.93 61
2004 2 6.51 0 6.67 62
2004 3 -2.86 6.96 -7.43 63
2004 4 -2.28 -4.84 3.83 64
2004 5 -4.11 -6.93 -3.34 65
2004 6 -2.14 -27.78 4.73 66
2004 7 .88 0 -15.1 67
2004 8 3.76 -21.54 -3.48 68
2004 9 6.28 7.84 8.47 69
2004 10 -9.19 -7.09 7.03 70
2004 11 1.73 5.19 -23.36 71
2004 12 -9.8 11.63 11.24 72
2005 1 2.99 33.17 -3.42 73
2005 2 7.95 -9.26 14.36 74
2005 3 5.24 6.9 -6.2 75
2005 4 -.67 12.9 5.37 76
2005 5 10.43 -2.86 -.78 77
2005 6 3.07 0 5.22 78
2005 7 4.52 2.94 3.83 79
2005 8 2.96 2.86 5.5 80
2005 9 4.65 2.78 -4.32 81
2005 10 4.23 -3.24 3.94 82
2005 11 17.14 .56 -11.38 83
2005 12 10.39 5.56 -15.41 84
2006 1 9.02 5.26 5.15 85
2006 2 12.95 9 -1.66 86
2006 3 4.78 3.21 15.21 87
2006 4 7.6 2.22 8.26 88
2006 5 -12.4 0 1.21 89
2006 6 3.25 -4.35 -13.18 90
2006 7 0 -12.27 25.89 91
2006 8 2.83 3.63 6.25 92
2006 9 .61 0 8.86 93
2006 10 12.46 4.35 16.78 94
2006 11 4.59 -6.09 2 95
2006 12 12.92 -2.86 25.41 96
2007 1 .69 2.42 12.87 97
2007 2 -7.05 -1.03 -12.07 98
2007 3 1.71 16.58 6.48 99
2007 4 17.79 -4.44 1.68 100
end

I first tried to calculate it only for one variable, Nr_288950, using the rangestat command:
rangestat (sd) test_1 = Nr_288950 if Nr_288950[_n-11] ~= ., interval(seqnum -11 0)

This does generate a new variable, test_1, but somehow the first 10 nonmissing entries are wrong (I checked them using Excel); after that the calculations are correct.
Besides the fact that the first 10 observations seem to be calculated wrongly (i.e., not how I expect the standard deviation to be calculated), I would also expect the first result of the rangestat command to appear one seqnum earlier, since the condition of 11 past observations plus the current one is already satisfied there.

* Example generated by -dataex-. To install: ssc install dataex
clear
input int Jahr byte Monat double(Nr_922924 Nr_982102 Nr_288950) float seqnum double test_1
1999 1 -3.42 2.5 . 1 .
1999 2 13.83 5.69 . 2 .
1999 3 10.62 7.69 . 3 .
1999 4 13.38 -1.79 . 4 .
1999 5 -2.7 -6.91 . 5 .
1999 6 11.29 2.54 . 6 .
1999 7 -1.19 -4.76 . 7 .
1999 8 6.74 0 . 8 .
1999 9 .32 -2 . 9 .
1999 10 -.97 .41 . 10 .
1999 11 2.44 12.8 . 11 .
1999 12 23.85 -.9 . 12 .
2000 1 -6.29 -3.27 . 13 .
2000 2 -2.74 1.5 . 14 .
2000 3 9.35 -7.04 . 15 .
2000 4 .26 0 . 16 .
2000 5 9.4 0 15.38 17 .
2000 6 -6.8 9.56 10.83 18 .
2000 7 1.92 0 36.84 19 .
2000 8 -2.01 0 7.69 20 .
2000 9 -13.85 9.09 3.95 21 .
2000 10 -4.91 -14.67 -2.45 22 .
2000 11 -1.1 7.42 -6.42 23 .
2000 12 9.34 0 -1.08 24 .
2001 1 -5.64 -3.82 -7.61 25 .
2001 2 -13.19 3.97 -8.82 26 .
2001 3 -8.75 -5.45 -13.71 27 .
2001 4 -.79 -1.92 -60 28 .
2001 5 3.2 16.67 13.08 29 51.67536356911289
2001 6 -15.66 -7.56 -3.47 30 38.319337589960156
2001 7 -29.6 -2.73 -18.66 31 31.301476402879143
2001 8 -9.66 1.12 1.05 32 28.31795102050994
2001 9 -32.95 -2.03 -14.58 33 25.331505021744498
2001 10 19.83 0 41.46 34 31.15119786736023
2001 11 23.74 -3.77 7.59 35 29.230631896546964
2001 12 -6.98 2.94 25 36 29.02252779212115
2002 1 -7.81 -19.05 -13.46 37 27.647296612869763
2002 2 -17.63 0 14.07 38 26.683282406780467
2002 3 9.88 0 3.77 39 25.474507206700633
2002 4 8.61 12.94 -4.88 40 17.443962813431
2002 5 -2.76 0 -11.18 41 17.730711137014286
2002 6 -6.03 -16.67 -20 42 18.821975470339467
2002 7 -40.23 0 -5.65 43 17.95064603800705
2002 8 3.54 0 9.52 44 18.078856476977506
2002 9 -40.61 -20 -7.97 45 17.600930480741035
2002 10 -59.55 12.5 10.03 46 13.168627318538006
2002 11 152.28 -16.67 7.96 47 13.186978741672913
2002 12 -20.93 0 0 48 10.727275452619068
2003 1 1.78 .33 -6.39 49 10.190354024837172
2003 2 -29.75 -.33 9.46 50 9.647400144362964
2003 3 12.81 0 2 51 9.576483032550481
2003 4 31.55 0 36.24 52 14.379941732928692
2003 5 -1.2 0 10.13 53 13.912324245398068
2003 6 8.01 .83 -5.7 54 12.197216864614004
2003 7 34.38 -17.36 4.99 55 11.73298648979436
2003 8 33.61 -11.6 5.6 56 11.676127516770224
2003 9 -8.64 -9.5 13.25 57 11.035114305129456
2003 10 7.53 26 12.14 58 11.09923611895172
2003 11 1.8 17.26 -2.56 59 11.466678470747977
2003 12 -1.1 -1.86 7.88 60 11.278653476476004
2004 1 15.15 -3.45 3.93 61 10.51110696820129
2004 2 6.51 0 6.67 62 10.509467989591217
2004 3 -2.86 6.96 -7.43 63 11.311157404005208
2004 4 -2.28 -4.84 3.83 64 6.613074458062512
2004 5 -4.11 -6.93 -3.34 65 6.693672498348834
2004 6 -2.14 -27.78 4.73 66 6.071006294202175
2004 7 .88 0 -15.1 67 8.209101748373662
2004 8 3.76 -21.54 -3.48 68 8.311861185296369
2004 9 6.28 7.84 8.47 69 7.807801265789275
2004 10 -9.19 -7.09 7.03 70 7.285414393072496
2004 11 1.73 5.19 -23.36 71 10.107527460444338
2004 12 -9.8 11.63 11.24 72 10.413194536478489
2005 1 2.99 33.17 -3.42 73 10.340512559829905
2005 2 7.95 -9.26 14.36 74 11.082866759375058
2005 3 5.24 6.9 -6.2 75 11.018846114587937
2005 4 -.67 12.9 5.37 76 11.08184414693555
2005 5 10.43 -2.86 -.78 77 11.042748752009166
2005 6 3.07 0 5.22 78 11.063099564303712
2005 7 4.52 2.94 3.83 79 10.023433755077361
2005 8 2.96 2.86 5.5 80 9.95088834164462
2005 9 4.65 2.78 -4.32 81 9.911520806464047
2005 10 4.23 -3.24 3.94 82 9.785795673817075
2005 11 17.14 .56 -11.38 83 7.4009364681816185
2005 12 10.39 5.56 -15.41 84 8.302327667405672
2006 1 9.02 5.26 5.15 85 8.37539686983684
2006 2 12.95 9 -1.66 86 7.140644865200138
2006 3 4.78 3.21 15.21 87 8.278701543262944
2006 4 7.6 2.22 8.26 88 8.460932680158956
2006 5 -12.4 0 1.21 89 8.43956855675943
2006 6 3.25 -4.35 -13.18 90 9.290399859287788
2006 7 0 -12.27 25.89 91 11.965765149210501
2006 8 2.83 3.63 6.25 92 11.98991722871827
2006 9 .61 0 8.86 93 11.995661968420338
2006 10 12.46 4.35 16.78 94 12.66434280915827
2006 11 4.59 -6.09 2 95 11.760035817302702
2006 12 12.92 -2.86 25.41 96 11.22813417134374
2007 1 .69 2.42 12.87 97 11.249365908056033
2007 2 -7.05 -1.03 -12.07 98 12.479567634070234
2007 3 1.71 16.58 6.48 99 12.281837538957
2007 4 17.79 -4.44 1.68 100 12.38623050248162


I hope someone knows how to resolve this error in my code.

Especially because, in a second step, I would like to calculate the rolling standard deviation for all variables. However, the command

foreach `i' of varlist(Nr_922924 - Nr_288950){
rangestat(sd) rsd_`i'= `i' if `i'[_n-11]~=., interval (seqnum -11 0)
}

leaves me with the error message "invalid syntax", but it is not obvious to me where the error in the command lies.
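For reference, a hedged sketch of the standard foreach-over-varlist syntax: the loop name is declared without backticks, and varlist takes no parentheses. Whether the if condition does what is intended is a separate question:

Code:
* Sketch: loop a rangestat call over the variables Nr_922924 through Nr_288950.
foreach v of varlist Nr_922924 - Nr_288950 {
    rangestat (sd) rsd_`v' = `v' if `v'[_n-11] < ., interval(seqnum -11 0)
}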

Thank you in advance.

indirect, direct and total effects in spivreg

I want to estimate the indirect, direct and total effects in spivreg.

I did estimate the spivreg equation, but I don't know how to use the margins command to recover the indirect and direct effects.

Can you help me?

xsmle model not converging

Dear statalist,

I estimated a spatial Durbin model with the following code (the data are available here):




Code:
cd "C:\Users\hulenyi\Disk Google\Dokumenty\ISA\natura2000\calculations\distance matrices\distance"
use "dm_cut1c.dta"
spmat dta dm_st V*, replace
spmat summarize dm_st

clear
import excel "C:\Users\hulenyi\Disk Google\Dokumenty\ISA\natura2000\calculations\Data\esifdata94_new.xlsx", sheet("Sheet 1") firstrow

destring year, replace
destring regid36, replace

xtset regid36 year


gen wgilefp = wgipca*lefpayr

xsmle gdppcgr laglgdppc linvr lpopgr wgipca lefpayr wgilefp, wmat(dm_st) model(sdm) robust fe type(both) effects
estimates store `var', title(GDPPCGR)
test [Wx]laglgdppc = [Wx]linvr = [Wx]lpopgr = [Wx]wgipca = [Wx]wgilefp = [Wx]lefpayr = 0
testnl ([Wx]laglgdppc = - [Spatial]rho*[Main]laglgdppc) ([Wx]linvr = - [Spatial]rho*[Main]linvr) ([Wx]lpopgr = - [Spatial]rho*[Main]lpopgr) ([Wx]wgipca = - [Spatial]rho*[Main]wgipca) ([Wx]lefpayr = - [Spatial]rho*[Main]lefpayr) ([Wx]wgilefp = - [Spatial]rho*[Main]wgilefp)


esttab dm_cut1c using "C:\Users\hulenyi\Disk Google\Dokumenty\ISA\natura2000\calculations\results2401a.tex", se bic aic
With the model not converging:
Code:
Iteration 0:   Log-pseudolikelihood =  10689.627  
Iteration 1:   Log-pseudolikelihood =  10753.541  
Iteration 2:   Log-pseudolikelihood =  10815.604  
Iteration 3:   Log-pseudolikelihood =   10815.85  
Iteration 4:   Log-pseudolikelihood =   10815.85  
Iteration 5:   Log-pseudolikelihood =   10815.85  (backed up)
Iteration 6:   Log-pseudolikelihood =   10815.85  (backed up)
...

Iteration 100: Log-pseudolikelihood =   10815.85  (backed up)
convergence not achieved

Why does my model not converge? What would be a possible solution here?

Best,

Martin Hulenyi

Displaying number of observations underneath twoway line plots

Hello,

I wonder if I could please get advice on how to display the number of observations underneath twoway line plots, much like the option
Code:
 sts graph, by() risktable
For instance, the sts graph manual (https://www.stata.com/manuals13/ststsgraph.pdf) shows the "number at risk" beneath the graph. [graph not shown]

I would like to produce the same thing for a twoway plot for instance for this code

Code:
use http://www.stata-press.com/data/r16/nlswork.dta, clear

gen ct_wks_work = 1 if wks_work !=.
gen ct_wks_ue = 1 if wks_ue !=.

collapse (mean) wks_work wks_ue (count) ct_wks_work ct_wks_ue,  by (year)

twoway (line wks_work year )   ///
       (line wks_ue  year)

This produces the twoway graph below, but I would like to include two rows underneath it with the values of ct_wks_work and ct_wks_ue.
[graph not shown]

Counting different variables associated with one variable

I have a list of patients who had pulmonary nodule treatment with three different types of embolic agents (MVP, AVP, or Coil). My data are stored as shown below. I want to know the following counts:

How many patients (ID) have only MVP?
How many patients (ID) have only Coil?
How many patients (ID) have MVP and Coil?
How many patients (ID) have AVP and Coil?

Also, I want to count how many coils each patient (ID) received.






ID Embolic1 Embolic2 Embolic3 Embolic4 Embolic5 Embolic6 Embolic7 Embolic8 Embolic9 Embolic10 Embolic11

1 MVP

2 Coil Coil Coil Coil Coil Coil Coil Coil Coil Coil Coil

3 Coil Coil

3 AVP AVP

3 Coil Coil

3 Coil Coil Coil Coil Coil Coil Coil

3 Coil Coil

3 Coil Coil

3 Coil Coil Coil Coil Coil Coil

3 Coil Coil Coil Coil Coil Coil Coil Coil

3 Coil Coil Coil Coil Coil Coil Coil

3 Coil Coil

3 MVP Coil Coil Coil Coil Coil Coil Coil Coil Coil

3 Coil Coil

3 Coil Coil Coil Coil Coil Coil Coil Coil Coil Coil Coil

4 Coil Coil

4 Coil Coil

5 Coil Coil

5 Coil Coil

5 Coil

6 Coil Coil Coil Coil Coil

7 MVP

7 MVP Coil

8 MVP Coil Coil Coil Coil

9 Coil Coil Coil Coil

10 AVP

11 AVP

11 Coil Coil Coil Coil Coil Coil

11 AVP

11 Coil Coil Coil Coil Coil Coil

12 Coil Coil Coil

12 AVP

13 Coil Coil Coil Coil

14 Coil Coil Coil

15 Coil Coil Coil Coil Coil

15 Coil Coil Coil Coil Coil Coil Coil Coil Coil Coil Coil

15 AVP

15 Coil

15 Coil Coil Coil Coil Coil Coil Coil Coil Coil Coil

15 Coil Coil Coil Coil Coil Coil Coil Coil

15 AVP
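The per-patient tallies described above can be sketched as follows. This assumes the data are laid out exactly as shown, with string variables Embolic1-Embolic11 and possibly several rows per ID; the new variable names are illustrative:

Code:
* Flag, per row, whether each agent appears, and count coils per row.
gen byte has_mvp  = 0
gen byte has_avp  = 0
gen byte has_coil = 0
gen ncoil = 0
forvalues j = 1/11 {
    replace has_mvp  = 1 if Embolic`j' == "MVP"
    replace has_avp  = 1 if Embolic`j' == "AVP"
    replace has_coil = 1 if Embolic`j' == "Coil"
    replace ncoil = ncoil + (Embolic`j' == "Coil")
}
* Collapse the row-level flags and counts to the patient level.
bysort ID: egen p_mvp  = max(has_mvp)
bysort ID: egen p_avp  = max(has_avp)
bysort ID: egen p_coil = max(has_coil)
bysort ID: egen coils_per_patient = total(ncoil)
* Count each patient once when tabulating the combinations.
egen byte first = tag(ID)
count if first & p_mvp  & !p_coil & !p_avp     // only MVP
count if first & p_coil & !p_mvp  & !p_avp     // only Coil
count if first & p_mvp  & p_coil               // MVP and Coil
count if first & p_avp  & p_coil               // AVP and Coil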


Means at a time interval

Good evening. I have a dataset with age as the independent variable. Age is given as a decimal number, e.g. 38.82 for a person who is 38 years and roughly ten months old. I am supposed to compute averages three times a year, with 1/3 of a year between them. What would be a suitable way of doing so? The examples I've seen in the archives here are unfortunately all for cases where months or days are given as variables.
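A hedged sketch of one way to do this: bin the decimal ages into thirds of a year with floor(), then average within bins. The outcome variable y is hypothetical, since the post does not name the variable to be averaged:

Code:
* Bin decimal ages into thirds of a year: [38, 38.333), [38.333, 38.667), ...
gen age_third = floor(age * 3) / 3
* Average a (hypothetical) outcome y within each third-of-a-year bin.
collapse (mean) y, by(age_third)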

GSEM and Survey Weights Issues

Hello,

Thanks for giving this a look. First, I am working with the restricted-use ELS:2002 data, so it is a bit difficult to copy/paste the code. I took some screenshots and have typed in the code and the response I am receiving.

My survey set command:

svyset sch_id, weight(byschwt) || stu_id, weight(f1xpnlwt) strata(strat_id)
GSEM Command:
svy: gsem (i.sport2 <- c.schsize c.stdschool c.sesschool i.bysctrl i.byurban M1[sch_id@1]) if f1!=1, mlogit
When I run the GSEM command, I keep receiving the lines:
initial values not feasible
an error occurred when svy executed gsem
I have tried other student-level weights that conceptually make sense for my analyses. I have also tried running the model with only one independent variable and still get the same result. For anyone who has worked with the ELS:2002: am I doing something wrong in the command line, or am I just out of luck getting this to run?

Thanks for any help!

Market share of different firms in different regions

Hi,
I have a dataset of firms that provide services to clients in 10 different regions. How can I calculate the market share of each firm in each region, based on the number of their clients?
My variables are: firm_id, firm_name, client_id, region.

for example:
firm_id, firm_name, client_id, region
10,firm1,client1,region1
10,firm1,client2,region1
10,firm1,client3,region1
20,firm2,client4,region1
20,firm2,client5,region1
30,firm3,client6,region2
30,firm3,client7,region2
40,firm4,client8,region2
10,firm1,client9,region2
so on
In region 1 the market share of firm1 is 60%, but in region 2 its market share is 25%.
Thank you
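A sketch of the computation described above, assuming each row of the dataset is exactly one client (so _N within a group counts clients):

Code:
* Clients per firm within each region, and total clients per region.
bysort region firm_id: gen firm_clients = _N
bysort region: gen region_clients = _N
gen share = 100 * firm_clients / region_clients
* Keep one row per firm-region pair for reporting.
bysort region firm_id: keep if _n == 1
list firm_name region share, sepby(region)

With the example data, this yields 60% for firm1 in region1 and 25% for firm1 in region2.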

Dark Mode Stata 16 on macOS

Hi,

I'm using Stata 16 on macOS Catalina with Dark Mode enabled in the macOS System Preferences. However, I cannot figure out how to enable a dark color scheme for the Do-file Editor, so I can't see the cursor there. Dark Mode works fine for all the other Stata windows (Results, Command, etc.).

Any advice? Is a dark theme not possible in the Do-file Editor?

Thanks

Making line plot visible over bars

The following code plots what I want.
Code:
twoway line listed_domestic year || bar mcap_bil_usd year, yaxis(2) barwidth(.75)
How do I make the line plot visible over the bars?

In the plot, blank space is shown from 1970-1974 even though data are only available from 1975. How do I remove that blank space, i.e. start the graph at 1975?

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input long year double(listed_domestic mcap_bil_usd)
1975 3422   194.100171148777
1976 3428 157.19968158006668
1977 2877 139.13697683811188
1978 3288 236.03695690631866
1979 3171 234.20436120033264
1980 3241    231.95478951931
1981 3244  196.1764224767685
1982 3170 185.80165040493011
1983 3111  224.0923033952713
1984 3218   270.100231051445
1985 3193  461.6768672466278
1986 3324   807.946816444397
1987 3475  816.0708112716675
1988 3249  1026.451008796692
1989 3275 1441.8436336517334
1990 3316 1409.0109004974365
1991 4278 1597.3958539962769
1992 4096 1395.0458869934082
1993 4379 1781.5791530963033
1994 3800  2103.305383119732
1995 5085 2634.7207982726395
1996 5322 3227.5260013584048
1997 5824 3991.5663554668427
1998 5639 3499.5871432870626
1999 6363  5302.511682383716
2000 6855  5538.939034000039
2001 7626  4547.681827552617
2002 7442 3682.3636319358193
2003 7284  5292.589614300814
2004 3883  6059.987166102976
2005 3499  6323.672760486603
2006 7036  8596.499108314514
2007 7550 10511.337663650513
2008 7584  5191.032473564148
2009 7341 6900.2222266197205
2010 7190  6705.177469730377
2011 7308  5601.655090332031
2012 7144  6332.388786554337
2013 7107  7978.493138551712
2014 6810  7227.137099504471
2015 6655  6313.031332492828
2016 5653  2956.075694322586
2017 5167 3861.1491498947144
2018 4989 3040.0976524353027
end
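A hedged sketch addressing both questions: twoway draws plots in order, so listing the bar plot first and the line second puts the line on top. Restricting the x labels is one way to avoid the empty 1970-1974 band (that the band comes from the default axis labels is an assumption):

Code:
* Bars first, line second, so the line is drawn on top of the bars.
twoway (bar mcap_bil_usd year, yaxis(2) barwidth(.75)) ///
       (line listed_domestic year), ///
       xlabel(1975(5)2015)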

Take values of previous year if values for current period are not available

Dear forum,

I have a dataset which looks like this

Company code year .....
123 2000
123 2001
244 2000
244 2001
355 2000
588 2000
588 2001

Now I want to run a regression. The regression should use all company data from 2001. If a company only has data for 2000, the regression should use 2000 instead (like company 355 in my dataset). I do not have "missing data"; I already cut those observations out, so replacing missing values is not an option for me.
How can I do that?

My regression at the moment is like this:
reg car size if year==2001, robust
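A sketch of one way to keep, per company, the 2001 row when it exists and the 2000 row otherwise; company_code is an assumed variable name for the "Company code" column:

Code:
* Sort each company's rows by year and flag the latest available year
* (2001 if present, otherwise 2000).
bysort company_code (year): gen byte latest = (_n == _N)
reg car size if latest, robust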

regress to rangestat or rangerun

Hi all,

I am trying to run the regression below, by intervals and years, on the whole sample with rangestat & rangerun; however, whatever I do, I get different results than from the code below.

Code:
webuse grunfeld
bysort year (invest): gen id_invest = _n
regress mvalue kstock if company[8] & year == 1954 & inrange(id_invest, id_invest[8]-3, id_invest[8] +3)
Can anyone please advise how to write the above code for the whole sample?


Thank you.

metaprop meta-analysis: variance of the synthesized proportion

Hi All,

I am using metaprop to meta-analyse proportions from 8 studies using a random-effects approach. metaprop is a user-written Stata package.
It uses an exact binomial method to estimate the included studies' proportions of interest and their standard errors, along with 95% confidence intervals.
However, for the overall synthesized proportion, the estimated proportion is given along with its 95% CI but not its standard error. As I am using a random-effects approach, metaprop also gives a measure of heterogeneity and the between-study variance.

Does anyone know how to extract this from metaprop easily? I can calculate it by summing the product of the study-specific weights and their estimated standard errors, but I am hoping there may be a more direct method using metaprop.
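One rough, hedged possibility: back out an approximate standard error from the pooled 95% CI under a normal approximation. Note this ignores any transformation (e.g. Freeman-Tukey) that metaprop may apply before pooling, so treat it strictly as an approximation; lb and ub below are hypothetical placeholders for the reported CI bounds:

Code:
* Hypothetical pooled 95% CI bounds copied from the metaprop output.
local lb = 0.12
local ub = 0.30
* Normal-approximation SE: half-width of the CI divided by z(0.975).
display "approximate SE = " (`ub' - `lb') / (2 * invnormal(0.975))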

Thanks for taking the time to consider this,

Don

Predict residual error after oprobit regression

Hello,
We are trying to generate the residuals after running an oprobit regression. Our dataset consists of 202 towns, and our dependent variable is the number of tire dealers, which can be 0, 1, 2, 3, 4, or 5.
The regression we are running is the following:
oprobit N_tire ln_Sm eld pinc lnhdd ffrac landv, robust

Where N_tire is the number of tire dealers. ln_Sm is the logarithm of the market size. eld is the fraction of old people in the population. pinc is the per capita income. lnhdd is the logarithm of heating degree days. ffrac is the fraction of land in farms and landv is the value per acre of land.

The output of the regression is:
Ordered probit regression
N_tire Coef. St.Err. t-value p-value [95% Conf Interval] Sig
ln_Sm 1.271 0.122 10.44 0.000 1.033 1.510 ***
eld -2.957 1.823 -1.62 0.105 -6.530 0.617
pinc 0.049 0.076 0.64 0.524 -0.101 0.198
lnhdd 0.041 0.202 0.20 0.840 -0.355 0.437
ffrac 0.089 0.261 0.34 0.733 -0.422 0.600
landv -0.129 0.470 -0.28 0.783 -1.051 0.792
cut1 0.259 1.854 .b .b -3.374 3.893
cut2 1.142 1.873 .b .b -2.530 4.814
cut3 1.957 1.880 .b .b -1.728 5.641
cut4 2.535 1.879 .b .b -1.147 6.217
cut5 2.938 1.875 .b .b -0.737 6.613
Mean dependent var 2.233 SD dependent var 1.815
Pseudo r-squared 0.255 Number of obs 202.000
Chi-square 143.725 Prob > chi2 0.000
Akaike crit. (AIC) 541.263 Bayesian crit. (BIC) 577.654
*** p<0.01, ** p<0.05, * p<0.1
To generate the residuals of this regression we use the following command right after this regression:
predict uhat, resid

However, if we do this, we get an error message: option resid not allowed r(198);

We need the residuals of this regression in order to obtain the standard deviation of the error term. We don't know how to fix this error. Please let us know if you know what we are doing wrong. Thank you very much.
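For context, a hedged sketch of what predict does allow after oprobit: the model has no observed residual in the usual sense (the latent error is never observed), but the linear index and per-outcome probabilities are available, from which a crude observed-minus-expected "residual" can be built. Whether that quantity suits the intended use is an assumption:

Code:
* After: oprobit N_tire ln_Sm eld pinc lnhdd ffrac landv, robust
predict xbhat, xb                   // linear index
predict p0 p1 p2 p3 p4 p5, pr       // probability of each outcome 0..5
* A crude residual: observed count minus its model-expected value.
gen expected = 0*p0 + 1*p1 + 2*p2 + 3*p3 + 4*p4 + 5*p5
gen uhat = N_tire - expected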

Estpost for summary statistics with complex design survey data

Hi ,

I am working with a quarterly Labor Force Survey from 2008 to 2010.
I would like to create a table of summary statistics for 4 subpopulations of my data (Male before, Male after, Female before, and Female after), where "after" is given by the variable post.
My dataset is already svyset, but estpost tabstat doesn't seem to support svy; estpost tabulate does, but tabstat does not.

I use Stata 16 and the estout package.

The code I used is below:

Code:
* generate the column identifiers by flag
gen flag=1 if sex==1 & post==0
replace flag=2 if sex==1 & post==1
replace flag=3 if sex==0 & post==0
replace flag=4 if sex==0 & post==1
estpost tabstat age mar_st reg* edu2 edu3 edu4 edu6 edu8 lfs unemployed [aw=int_wt], by(flag) statistics(mean sd N) columns(statistics) nototal
esttab using table1.rtf, replace main(mean) aux(sd) unstack nomtitle nonumber label nogaps title("Descriptive statistics for sample")

Descriptive statistics for sample
1 2 3 4
age 27.10 26.42 27.53 26.72
(19.26) (19.21) (19.02) (19.02)
Marital status 2.049 2.048 2.423 2.381
(0.986) (0.984) (1.403) (1.409)
region 4.142 4.155 4.127 4.137
(1.813) (1.809) (1.812) (1.803)
region==Greater Cairo 0.157 0.155 0.159 0.156
(0.364) (0.362) (0.365) (0.363)
region==Alex & Suez Canal 0.0769 0.0770 0.0774 0.0773
(0.266) (0.267) (0.267) (0.267)
region==Urban Upper Egypt 0.0996 0.0964 0.102 0.0998
(0.299) (0.295) (0.302) (0.300)
region==Urban Lower Egypt 0.0873 0.0910 0.0873 0.0910
(0.282) (0.288) (0.282) (0.288)
region==Rural Upper Egypt 0.309 0.307 0.307 0.308
(0.462) (0.461) (0.461) (0.462)
region==Rural Lower Egypt 0.250 0.255 0.252 0.252
(0.433) (0.436) (0.434) (0.434)
region==Frontier 0.0194 0.0186 0.0162 0.0156
(0.138) (0.135) (0.126) (0.124)
educ_st==Illitrate 0.161 0.149 0.292 0.267
(0.367) (0.356) (0.455) (0.442)
educ_st==Read and Write 0.139 0.134 0.105 0.0982
(0.346) (0.341) (0.307) (0.298)
educ_st==Primary or Prep 0.172 0.172 0.144 0.148
(0.377) (0.378) (0.351) (0.355)
educ_st==Technical/Vocational Diploma 0.172 0.174 0.139 0.148
(0.377) (0.379) (0.346) (0.355)
educ_st==College Degree 0.0846 0.0880 0.0638 0.0718
(0.278) (0.283) (0.244) (0.258)
Labor Force 0.729 0.752 0.227 0.228
(0.445) (0.432) (0.419) (0.419)
All Unemployed 0.0421 0.0383 0.0521 0.0559
(0.201) (0.192) (0.222) (0.230)
Observations 1008611
mean coefficients; sd in parentheses
* p < 0.05, ** p < 0.01, *** p < 0.001



My problem is that the above gives different results than the code below. How can I incorporate the svy settings into the above code to get descriptive statistics that match?

Code:
svy, subpop(if sex==1 & post==0): mean age lfs unemployed reg*

(For any of the 4 subpopulations above, I get different results.) The output is the following:
Survey: Mean estimation
Number of strata = 154 Number of obs = 852,739
Number of PSUs = 7,427 Population size = 194,685,642
Subpop. no. obs = 273,166
Subpop. size = 62,671,956.6
Design df = 7,273
--------------------------------------------------------------
| Linearized
| Mean Std. Err. [95% Conf. Interval]
-------------+------------------------------------------------
age | 33.70677 .0251904 33.65739 33.75615
lfs | .7654193 .0011021 .7632589 .7675798
unemployed | .0420523 .000682 .0407153 .0433892
region | 4.06441 .0108864 4.04307 4.085751
reg1 | .1657978 .0026677 .1605684 .1710272
reg2 | .0814624 .0011528 .0792026 .0837221
reg3 | .1037393 .0018627 .1000878 .1073908
reg4 | .087979 .0017812 .0844873 .0914708
reg5 | .3117424 .0020266 .3077696 .3157151
reg6 | .2311122 .0018523 .2274812 .2347432
reg7 | .018167 .0015668 .0150955 .0212385
--------------------------------------------------------------
Note: Strata with single sampling unit centered at overall
mean.
.


Any suggestions on how to incorporate the svy settings so as to get svy-like results using estpost?
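One hedged alternative (a sketch, not necessarily equivalent to the intended table): run svy: mean once per subpopulation, store each result, and let esttab assemble the stored estimates. The stored-model names are illustrative:

Code:
* Design-based means per subpopulation, then one esttab call.
svy, subpop(if sex==1 & post==0): mean age lfs unemployed
estimates store m_before
svy, subpop(if sex==1 & post==1): mean age lfs unemployed
estimates store m_after
svy, subpop(if sex==0 & post==0): mean age lfs unemployed
estimates store f_before
svy, subpop(if sex==0 & post==1): mean age lfs unemployed
estimates store f_after
esttab m_before m_after f_before f_after using table1.rtf, ///
    replace se label nonumber ///
    mtitles("Male before" "Male after" "Female before" "Female after")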

Thanks,


How to use foreach with multiple lists

Hello Statalisters,

I'm using Stata 12.1. My dataset has around 50,000 participants; each participant has a village code, the villages are grouped into predetermined clusters, and I have a key to assign villages to their respective clusters. What I'd like to do is assign each participant their respective cluster based on their village code (variable: vill_no).
I've set up two locals, one with each unique village code and the other with the respective cluster ID. The order of the lists is very important, i.e. the 1st village code is assigned the 1st cluster ID.
The idea is to loop with foreach: if the village code (vill_no) matches the local village code, assign that participant the correspondingly ordered cluster ID.
The script runs but is not making any changes or assigning any cluster IDs.

Does anyone have ideas? It seems like a simple concept, but I'm stuck on the code.
Thanks!
-Ben

Code:
gen clustername=""

local clusterID "BNBS    BNBS    BNBS   BNST    BNST    BNST    BNSB    BNSB    BNSB    BNSB ...."
local villagecode "64006    73015    73048    73059    73187    73216    73016    73049    73132    73188    73002    73020 .........."

foreach x in `clusterID' & y in `villagecode'{
    replace clustername="`x'" if vill_no=="`y"
    }
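A hedged sketch of the usual parallel-list pattern: foreach cannot iterate two lists at once, but forvalues can index both by position. This assumes vill_no is a string variable, as the original comparison suggests; if it is numeric, drop the quotes around `v':

Code:
* Walk the two locals in parallel by position.
local n : word count `villagecode'
forvalues k = 1/`n' {
    local c : word `k' of `clusterID'
    local v : word `k' of `villagecode'
    replace clustername = "`c'" if vill_no == "`v'"
}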


How to draw bivariate random vector (v,u) which is bivariate normal, but with a correlation which depends on a third variable x?

Dear all,

I would like to simulate some data: a bivariate random vector (v,u) which is bivariate normal, but with a correlation which depends on a third variable x.

Suppose I have a 10 x 1 vector of x_i. I can define the correlation as

p(x_i) = [exp(x_i) - 1] / [exp(x_i) + 1]

so that each p(x_i) lies in the interval (-1, 1).

Then I would like to draw a bivariate random vector of size 10, (v, u), with correlation matrix (1, p(x_i) \ p(x_i), 1). Essentially, I would like each pair to have a distinct Corr(v_i, u_i) = p(x_i).

Does anyone have any ideas?
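A sketch of one standard construction: for each i, draw two independent standard normals z1 and z2 and set v = z1 and u = p*z1 + sqrt(1-p^2)*z2, which gives Var(u) = 1 and Corr(v_i, u_i) = p(x_i) exactly. Generating x with rnormal() is just a placeholder for the actual x values:

Code:
clear
set obs 10
set seed 12345
gen x = rnormal()                        // placeholder for your x_i
gen p = (exp(x) - 1) / (exp(x) + 1)      // p(x_i) in (-1, 1)
gen z1 = rnormal()
gen z2 = rnormal()
gen v = z1
gen u = p*z1 + sqrt(1 - p^2)*z2          // Corr(v_i, u_i) = p(x_i)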

Thank you!

Riju


Finding a target variable across datasets in the same directory

Hi,
I have several datasets in a directory and want to know which ones contain my variable of interest. I've been doing this manually so far, but opening datasets one by one gets really tedious when many datasets are involved, so I'm wondering if this can be done programmatically. I understand this sounds a bit wild, but some people here seem to know good tricks.
Please let me know if it is feasible.
Thank you!
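A hedged sketch of a mechanical approach: loop over every .dta file in the working directory, load a single observation, and test for the variable with confirm. The variable name myvar is a placeholder:

Code:
* List the files that contain a given variable (here the placeholder myvar).
local target "myvar"
local files : dir "." files "*.dta"
foreach f of local files {
    quietly use "`f'" in 1, clear       // one observation is enough
    capture confirm variable `target'
    if !_rc display "`f' contains `target'"
}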