
Exploratory Factor Analysis

Hi,

I'm using exploratory factor analysis on a 5-point Likert scale. The problem I'm facing is that my questionnaire has only 4 questions, so after conducting the analysis I am left with just 2 of them. To retain factors, I am using cut-offs of 0.3 for factor loadings and 0.4 for communalities. Also, if I calculate Cronbach's alpha for the 2 retained questions it is above 0.7, but if I include all 4, the alpha is between 0.4 and 0.5. What should I do? I have read that construct validity is affected by deleting items.

Also, since I'm doing exploratory research, is it acceptable to use a Cronbach's alpha value between 0.4 and 0.5?
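For reference, a minimal sketch of the relevant commands, assuming four items named q1-q4 (hypothetical names; the communalities appear in factor's output as 1 minus the uniquenesses):

Code:
* minimal sketch with hypothetical item names q1-q4
factor q1-q4, pcf
alpha q1-q4, item   // overall alpha plus alpha-if-item-deleted for each item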

Making marginsplots equal size with graph combine

Hi Statalist,

I'm struggling with a graphing issue that perhaps someone can help me with. I'm trying to get three comparison plots into one panel, and since they all share the same y-axis comparisons, I would like to include the labels only once. The trouble is that if I include the labels on one graph only, they take up space, so that graph is smaller than the other two. I've played around with fxsize but to no avail, and while I've looked up solutions to similar graph-sizing problems on here, I don't know how to apply them to a comparison plot.

Code:
margins relcon4x, pwcompare dydx(ptend1) predict(outcome(0))

marginsplot, horizontal unique xline(0) recast(scatter) yscale(reverse) title("Never affiliated") ylabel(1"Mod_vs_Con" ///
2"Lib_vs_Con" 3"NoID_vs_Con" 4"Lib_vs_Mod" 5"NoID_vs_Mod" 6"NoID_vs_Lib", nogrid) xtitle("") legend(off) name(graph4, replace)

margins relcon4x, pwcompare dydx(ptend1) predict(outcome(1))

marginsplot, horizontal unique xline(0) recast(scatter) yscale(reverse) title("Always affiliated") ylabel(none) ytick(1(1)6) ///
ytitle("") xtitle("") legend(off) name(graph5, replace)

margins relcon4x, pwcompare dydx(ptend1) predict(outcome(2))

marginsplot, horizontal unique xline(0) recast(scatter) yscale(reverse) title("Disaffiliated") ylabel(none) ytick(1(1)6) ytitle("") ///
xtitle("") legend(off) name(graph6, replace)

graph combine graph4 graph5 graph6, ycommon altshrink rows(1) cols(3) ///
title("Group Differences in Effect of Parents' Worship Attendance" "on W4 Child Affiliation") imargin(tiny)
Result: [attached image of the combined graph]
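One workaround, offered as an untested sketch: since fxsize() changes a graph's width relative to the others when combined, shrinking the two unlabeled plots (rather than the labeled one) may line the plot regions up; the percentage is a guess to be tuned by eye.

Code:
* sketch: give the unlabeled plots a smaller relative width so all three
* plot regions match; tune fxsize() visually
marginsplot, horizontal unique xline(0) recast(scatter) yscale(reverse) ///
    title("Always affiliated") ylabel(none) ytick(1(1)6) ytitle("") ///
    xtitle("") legend(off) fxsize(70) name(graph5, replace)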

Any assistance here is greatly appreciated.

Jesse

Error when using merge

Dear All,

I have a master data set, called master, with approximately 100,000 firms. One of the variables is the city where the firm is located, and there are multiple firms within each city in my data. I want to add some city characteristics to the master data set. I have a separate data set, called mergedata, in which the cities are listed uniquely with different variables. I have tried the following:

Code:
use master
merge m:1 city using mergedata

This gives the error: "variable city does not uniquely identify observations in the using data"

What am I doing wrong? I have checked that the variable "city" uniquely identifies observations in mergedata.
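A quick way to double-check, as a sketch: isid errors out unless city really is a unique key, and duplicates shows any offending values (near-duplicates from stray spaces or differing case would also break the merge).

Code:
* sketch: verify that city is a unique key in the using data
use mergedata, clear
isid city                // errors out if city does not uniquely identify obs
duplicates report city   // counts surplus copies, if any
duplicates list city     // lists the duplicated city values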

Thanks for your help!

Best,
Fredrik Bakkemo

Exclude older data

Hello,

I want to create a variable about inheritances. I have data on when these inheritances were received (year), and I want to consider only data newer than 2011.

However, the data track up to 3 separate inheritance transfers, so each observation can have received up to 3 different inheritances at different times. Person 1 could have received an inheritance in 1990, 2010, and 2014 (see attached file).

I want to keep only observations that received either no inheritance, or inheritances only after 2011. How do I do this, given that I have 3 separate year variables?

Code:
mi xeq:sort id survey; gen control = 2 if (expectation == 2 & gift_received_lead == 2) | (expectation_lag ==2 & gift_received == 2) 
mi xeq:sort id survey; replace control = 1 if (expectation == 2 & gift_received_lead == 1) | (expectation_lag ==2 & gift_received == 1)
This is the binary variable I am trying to create. Just for reference: in my data, 2 means zero (so expectation == 2 means the person had no expectation of an inheritance, and gift_received == 2 means the person didn't receive an inheritance).

adding something like
Code:
 mi xeq:sort id survey; replace control = 1 if (expectation == 2 & gift_received_lead == 1) | (expectation_lag ==2 & gift_received == 1) & gift1_year > 2011 & gift2_year > 2011 & gift3_year > 2009
or something similar, but that makes no sense.

I am unsure how to create the variable so that it only considers observations newer than a certain threshold.
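One building block, as a sketch: in Stata, missing numeric values compare greater than any number, so a strict > comparison is also true when no inheritance year was recorded, which matches the "no inheritance, or after 2011 only" rule.

Code:
* sketch: true when every recorded inheritance (if any) is after 2011;
* missing year values pass automatically because missing > 2011 in Stata
gen byte post2011 = gift1_year > 2011 & gift2_year > 2011 & gift3_year > 2011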

Pie charts after margins in cmclogit postestimation

I am doing a multinomial logit estimation using cmclogit in version 16. I then look at the predicted probabilities for different levels of factor variables. For example:
Code:
use data, clear

cmset cid alts
cmclogit choice income, casevars(i.race i.gender)
margins race
Clearly the predicted values are the probabilities of selecting each option for each of the categories in race. My question is: is there any way to get pie charts for each category of race after this estimation? marginsplot presents lines with dots, and I haven't tried recast() because I'm not sure it would work.

Advice please?
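One possible route, offered as an untested sketch: margins can save its results as a dataset, and the pies can then be drawn from that file (the _margin and _m* variable names follow margins' saving() convention and should be confirmed with describe).

Code:
* sketch: save the margins as data, then chart from the saved file
margins race, saving(marg, replace)
preserve
use marg, clear
describe                       // confirm the _margin / _m* variable names
graph pie _margin, over(_m1)   // a starting point; one slice per race level
restore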

Amending the date values on the horizontal axis of a stacked bar chart

$
0
0
Dear All,

I am using Stata 16 to create a stacked bar chart on which I would also like to impose a line graph. A part of the data I have is:

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input int year float(productivity markup demand globall predictedactual)
1952   -.0341   .0123   .0784 -.199 -.143
1953 -.000557 -.00285   .0454 -.178 -.136
1954   -.0287 -.00484     .02 -.122 -.136
1955  -.00819 .000763  -.0297  .104 .0671
1956   -.0296 -.00215  -.0963  .174 .0457
1957     .027  .00684  -.0595  1.96  1.94
1958     .137    .184   .0298  .816  1.17
1959     .161   .0437   .0273  .464  .695
1960      .15  -.0355  -.0321  .494  .576
1961     .136  -.0507   .0021  .379  .466
1962      .16  -.0591 -.00901  .151  .243
1963     .141  -.0756   .0119   .22  .298
1964     .145  -.0768   -.024  .469  .513
1965      .18   -.056  -.0446  .413  .493
1966     .171  -.0533  -.0226  .335   .43
1967       .2  -.0621  .00155  .291  .431
1968     .234  -.0654  .00801  .337  .514
1969      .23  -.0576  -.0203  .298   .45
1970     .226   -.079  -.0133  .508  .641
1971     .262  -.0545  .00744   .26  .475
end
I would like productivity, markup, demand, and globall to be part of the stacked bar chart, while predictedactual is the line plot.

The code attached shows what I have achieved so far:

Code:
graph bar productivity markup demand globall, over(year, gap(*0.25)) stack ///
    bar(1, bcolor(blue)) bar(2, bcolor(red)) bar(3, bcolor(green)) bar(4, bcolor(yellow)) ///
    ytitle("") ylabel(-0.5(0.5)2) yscale(range(-0.2 2)) title("") ///
    legend(row(2)) legend(label(1 "Productivity") label(2 "Price") ///
    label(3 "Domestic demand") label(4 "Foreign demand")) ///
    legend(region(lwidth(none))) legend(size(large)) graphregion(fcolor(white))
I still have the following problems:

1. I have the stacked bar chart; unfortunately, I have not been able to create the date intervals needed to tidy up the horizontal axis (see attached image).
2. Also, I would like to include the predicted value as a line plot.

Can anyone please help with these?
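A sketch of one way to attack both at once (untested): twoway accepts a line overlay and gives direct control over the year axis via xlabel(); stacking is emulated with cumulative sums, with the caveat that mixed-sign components, which this data contains, overplot rather than stack cleanly.

Code:
* sketch: emulate stacking with cumulative sums, overlay the line, and
* thin the year axis with xlabel(); negative components are a known caveat
gen s1 = productivity
gen s2 = s1 + markup
gen s3 = s2 + demand
gen s4 = s3 + globall
twoway (bar s4 year, bcolor(yellow)) (bar s3 year, bcolor(green)) ///
       (bar s2 year, bcolor(red)) (bar s1 year, bcolor(blue)) ///
       (line predictedactual year, lcolor(black)), ///
       xlabel(1952(5)1972) ytitle("") ///
       legend(order(4 "Productivity" 3 "Price" 2 "Domestic demand" ///
       1 "Foreign demand" 5 "Predicted"))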

Thanks in advance,

Olayinka

Dividing the panel data

Dear all,
The data are longitudinal quarterly data sets. I want to change the base year of an index from 2015 to 2010: each quarter's value needs to be divided by the value of the matching quarter of 2010, so 2011q1 is divided by 2010q1, 2013q1 by 2010q1, and 2013q2 by 2010q2. How can I do that? Thanks


Code:
clear
input float(id qdate) double(emp gwage) float indp double(exp_index imp_index indpturnover ex_rate) str154 industry byte _merge
7 200 67.4 32.5 71.53333 93 78.6 90.8 118.86 "B07. (Mining Of Metal Ores)" 3
7 201 82.5 38.5 78.16666 100.7 85.7 102.93333333333334 121.6 "B07. (Mining Of Metal Ores)" 3
7 202 87.9 42 74.13333 97.4 116.8 99.63333333333333 123.2 "B07. (Mining Of Metal Ores)" 3
7 203 87.4 52.4 73.86667 108.9 119 106.7 120.22 "B07. (Mining Of Metal Ores)" 3
7 204 84.4 44.4 72.73333 114.5 130.3 110.03333333333335 111.68 "B07. (Mining Of Metal Ores)" 3
7 205 97 51.5 78 123 139.4 113.40000000000002 109.16 "B07. (Mining Of Metal Ores)" 3
7 206 104.3 56.5 94.8 130.1 166.7 165.6 99.89 "B07. (Mining Of Metal Ores)" 3
7 207 101.8 60.8 99.2 127.6 184.5 178.79999999999998 103.56 "B07. (Mining Of Metal Ores)" 3
7 208 97.8 59.7 84.5 121.8 155.2 177.9 108.3 "B07. (Mining Of Metal Ores)" 3
end
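A sketch of one approach (untested; it assumes qdate is a %tq quarterly date, so yofd(dofq(qdate)) == 2010 picks out the base quarters, and it uses exp_index and imp_index as stand-ins for whichever series need rebasing):

Code:
* sketch: divide each series by its own panel's value in the matching
* quarter of 2010
gen byte qofyear = quarter(dofq(qdate))
foreach v of varlist exp_index imp_index {
    gen double `v'_base = `v' if yofd(dofq(qdate)) == 2010
    bysort id qofyear (`v'_base): replace `v'_base = `v'_base[1]
    gen double `v'_rebased = 100 * `v' / `v'_base
}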

Display st.error under mean in esttab

Hi!
This is my code and my output; the problem is that I don't know how to put standard errors under the mean results (to save space).

Code:
eststo clear
sort race
eststo: quietly estpost summarize ///
    yearsexp honors volunteer military
by race: eststo: quietly estpost summarize ///
    yearsexp honors volunteer military

esttab, cells("mean(fmt(%12.3fc)) sd") label mtitles(Races)
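For what it's worth, a sketch of the relevant esttab behavior: statistics quoted together inside cells() share one row, while statistics listed without the surrounding quotes are stacked, so dropping the quotes puts the second statistic beneath the mean (par wraps it in parentheses). Note that sd here is the standard deviation; if the standard error of the mean is wanted instead, estpost tabstat with statistics(mean semean) is one alternative.

Code:
* sketch: unquoted statistics in cells() are stacked vertically
esttab, cells(mean(fmt(%12.3fc)) sd(par fmt(%12.3fc))) label mtitles(Races)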


"negative binomial model" vs. "power model"

Dear Statalist,

I have a count outcome with some independent variables, and I fitted a negative binomial regression. I want to compare this model with a power model.
In theory I can say:

"ys are counts; I should not simply take logs but consider models suitable for counts (Poisson regression or negative binomial regression); if I take logs and use regression, I don't have constant variance on the log scale."

I'm wondering whether there are any approaches to statistically compare these two models.
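One generic device, as a sketch (reading the "power model" as a GLM with a power link, which is an assumption, and using hypothetical variable names): non-nested models fitted to the same outcome on the same sample can be compared informally via information criteria.

Code:
* sketch: fit both models and compare AIC/BIC; this is only meaningful
* when the outcome variable and estimation sample are identical
nbreg y x1 x2
estimates store nb
glm y x1 x2, family(nbinomial ml) link(power 0.5)
estimates store pw
estimates stats nb pw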

Regards,

Weighted Average for a Group Excluding the ith member of that group

Hello, I hope you can help me with a question.

I have panel data with the variables year, firm, industry, weight, and indepvar.
For each observation i, I want the weighted average of indepvar for the industry of observation i, by year, but with firm i removed from the calculation. The idea is to obtain the weighted average of indepvar for firm i's industry in a given year as if firm i were not part of that industry.

I know how to create the weighted average of indepvar by year and industry:

Code:
bys year industry: asgen weighted_average_IndepVar = indepvar, w(weight)

But that is exactly what it should not be: this code includes firm i in the calculation of the weighted average by industry and year, and I do not know how to exclude the current observation i from it.

I'm attaching a pet data file that shows what the weighted average should and should not be.
(The original dataset I'm working on has over 50 thousand firms, so creating flag columns for each firm is not really an option).
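A standard trick, as a sketch: build the industry-year totals once and subtract observation i's own contribution, which needs no per-firm flags (it assumes non-missing weight and indepvar; guard with if !missing(...) otherwise).

Code:
* leave-one-out weighted mean: remove firm i's own weight and weighted
* value from the industry-year totals
gen double wx = weight * indepvar
bysort year industry: egen double sum_wx = total(wx)
bysort year industry: egen double sum_w  = total(weight)
gen double loo_wavg = (sum_wx - wx) / (sum_w - weight)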

I really appreciate any help.

Thanks so much!

Lucas B.

Pointer matrices and objects to which they point

I'm still a novice when it comes to using pointer matrices, but I'm finding them increasingly useful in Mata programming exercises. This posting is not so much a query as an alert, prompted by programming errors I've made, to others who may also be relatively new to using pointer matrices.

The basic message can be conveyed by a simple example. Consider this code:
Code:
mata

pmat1=pmat2=J(2,1,NULL)
A=(1,2)\(3,4)
for (j=1;j<=2;j++) {
 A=j:*A
 pmat1[j]=&A
 pmat2[j]=&(A:+0)
}
*pmat1[1]
*pmat1[2]
*pmat2[1]
*pmat2[2]

pmat1
pmat2

end
which results in
Code:
: *pmat1[1]
       1   2
    +---------+
  1 |  2   4  |
  2 |  6   8  |
    +---------+

: *pmat1[2]
       1   2
    +---------+
  1 |  2   4  |
  2 |  6   8  |
    +---------+

: *pmat2[1]
       1   2
    +---------+
  1 |  1   2  |
  2 |  3   4  |
    +---------+

: *pmat2[2]
       1   2
    +---------+
  1 |  2   4  |
  2 |  6   8  |
    +---------+

:
: pmat1
                    1
    +------------------+
  1 |  0x7f8c07804aa8  |
  2 |  0x7f8c07804aa8  |
    +------------------+

: pmat2
                    1
    +------------------+
  1 |  0x7f8c199b9d78  |
  2 |  0x7f8c0945be08  |
    +------------------+
The reasoning is (I believe) straightforward: pmat1[j]=&A stores the address of the named matrix A itself, so both entries of pmat1 point at the same object and display its final contents, whereas &(A:+0) takes the address of a new, unnamed temporary matrix created at each iteration, so pmat2 preserves the intermediate values. Pointing to the matrix A is not the same thing as pointing to the (unnamed) matrix A:+0.

I suspect this behavior is well known to the pointer experts among you, but hopefully those newer to pointers can learn from the errors I've made.

xtgee how to create a linear graph with confidence interval for the adjusted model

I am running xtgee to assess the association between change in heart dimension (pch_rveda) and time from enrollment (tfe), expecting that the heart will get bigger as time goes by. I would like to plot this linear association (a test for nonlinearity was not significant) between rveda and tfe using the numbers from the adjusted model. I cannot find any command that would do that, and my search here was not successful. I apologize if this is a very simple question, but I have not done this before. I use Stata/SE 16.0.

Thank you

. dataex newid date pch_rveda rveda blrveda blage gender racecat site tfe

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input byte newid int date float pch_rveda double rveda float(blrveda blage) byte gender float(racecat site tfe)
1 17195          . 18.89 18.89 31.712526 1 0 1         0
1 17391          .     . 18.89 31.712526 1 0 1  .5366192
1 17643          .     . 18.89 31.712526 1 0 1 1.2265568
1 18077          .     . 18.89 31.712526 1 0 1 2.4147854
1 18281          .     . 18.89 31.712526 1 0 1  2.973307
1 18465  1.9057735 19.25 18.89 31.712526 1 0 1  3.477068
1 18742          .     . 18.89 31.712526 1 0 1 4.2354527
1 18854   23.55744 23.34 18.89 31.712526 1 0 1 4.5420933
1 18938          .     . 18.89 31.712526 1 0 1  4.772074
1 19026          .     . 18.89 31.712526 1 0 1  5.013002
1 19246          .     . 18.89 31.712526 1 0 1  5.615332
1 19260          .     . 18.89 31.712526 1 0 1  5.653662
1 19285  17.151936 22.13 18.89 31.712526 1 0 1  5.722109
1 19296          .     . 18.89 31.712526 1 0 1  5.752222
1 19297          .     . 18.89 31.712526 1 0 1  5.754961
1 19428          .     . 18.89 31.712526 1 0 1  6.113619
1 19456          .     . 18.89 31.712526 1 0 1  6.190279
1 19457  11.964005 21.15 18.89 31.712526 1 0 1  6.193018
1 19486          .     . 18.89 31.712526 1 0 1  6.272417
1 19501          .     . 18.89 31.712526 1 0 1  6.313482
1 19590          .     . 18.89 31.712526 1 0 1  6.557154
1 20177   23.92801 23.41 18.89 31.712526 1 0 1   8.16427
1 21108   3.229225  19.5 18.89 31.712526 1 0 1  10.71321
2 18018          . 22.46 22.46  23.22245 1 0 1         0
2 18037          .     . 22.46  23.22245 1 0 1 .05201912
2 18038          .     . 22.46  23.22245 1 0 1 .05475807
2 18455          .     . 22.46  23.22245 1 0 1 1.1964417
2 18770          .     . 22.46  23.22245 1 0 1 2.0588646
2 19303  13.357084 25.46 22.46  23.22245 1 0 1  3.518139
2 19470          .     . 22.46  23.22245 1 0 1  3.975359
2 20079   19.41229 26.82 22.46  23.22245 1 0 1   5.64271
2 20478  23.152275 27.66 22.46  23.22245 1 0 1  6.735113
2 20541          .     . 22.46  23.22245 1 0 1  6.907598
2 21073          .     . 22.46  23.22245 1 0 1  8.364134
2 21227  28.361536 28.83 22.46  23.22245 1 0 1  8.785763
2 21738   20.21372    27 22.46  23.22245 1 0 1 10.184807
3 15985          . 46.81 46.81   57.1499 2 0 1         0
3 17867          .     . 46.81   57.1499 2 0 1  5.152634
3 18044  -7.199319 43.44 46.81   57.1499 2 0 1  5.637234
3 18045          .     . 46.81   57.1499 2 0 1  5.639973
3 18047          .     . 46.81   57.1499 2 0 1  5.645447
3 18051          .     . 46.81   57.1499 2 0 1  5.656399
3 18154          .     . 46.81   57.1499 2 0 1  5.938396
3 19170          .     . 46.81   57.1499 2 0 1  8.720051
3 19288          .     . 46.81   57.1499 2 0 1  9.043118
3 19928  16.513561 54.54 46.81   57.1499 2 0 1 10.795345
4 18324          . 30.96 30.96  44.72279 2 0 1         0
4 18358          .     . 30.96  44.72279 2 0 1 .09308624
4 18368          .     . 30.96  44.72279 2 0 1 .12046432
4 18784 -18.023252 25.38 30.96  44.72279 2 0 1  1.259411
4 19421  -4.877258 29.45 30.96  44.72279 2 0 1  3.003422
4 19960 -12.855294 26.98 30.96  44.72279 2 0 1  4.479122
4 20576          .     . 30.96  44.72279 2 0 1  6.165638
4 20583  1.1627936 31.32 30.96  44.72279 2 0 1  6.184803
4 20703          .     . 30.96  44.72279 2 0 1  6.513348
4 20898          .     . 30.96  44.72279 2 0 1  7.047226
4 21112          .     . 30.96  44.72279 2 0 1  7.633125
4 21143          .     . 30.96  44.72279 2 0 1  7.717999
4 21507          .     . 30.96  44.72279 2 0 1  8.714577
4 21627          .     . 30.96  44.72279 2 0 1  9.043121
4 21636  -7.073641 28.77 30.96  44.72279 2 0 1   9.06776
4 21693          .     . 30.96  44.72279 2 0 1   9.22382
4 21747          .     . 30.96  44.72279 2 0 1  9.371662
5 17801          . 33.87 33.87  26.38193 1 0 1         0
5 18007          .     . 33.87  26.38193 1 0 1 .56399727
5 18350          .     . 33.87  26.38193 1 0 1 1.5030804
5 18508   -7.82403 31.22 33.87  26.38193 1 0 1 1.9356613
5 18700          .     . 33.87  26.38193 1 0 1 2.4613285
5 18921          .     . 33.87  26.38193 1 0 1  3.066393
5 19015          .     . 33.87  26.38193 1 0 1 3.3237514
5 19106          .     . 33.87  26.38193 1 0 1  3.572897
5 19227          .     . 33.87  26.38193 1 0 1  3.904177
5 19316          .     . 33.87  26.38193 1 0 1 4.1478443
5 19417          .     . 33.87  26.38193 1 0 1  4.424368
5 19498          .     . 33.87  26.38193 1 0 1 4.6461334
5 19499          .     . 33.87  26.38193 1 0 1 4.6488724
5 19589          .     . 33.87  26.38193 1 0 1  4.895279
5 19666          .     . 33.87  26.38193 1 0 1  5.106092
5 19750          .     . 33.87  26.38193 1 0 1  5.336073
5 19757 -4.3105965 32.41 33.87  26.38193 1 0 1  5.355236
5 19933          .     . 33.87  26.38193 1 0 1  5.837099
5 20128          .     . 33.87  26.38193 1 0 1  6.370981
5 20139          .     . 33.87  26.38193 1 0 1  6.401094
5 20310          .     . 33.87  26.38193 1 0 1  6.869268
5 20412          .     . 33.87  26.38193 1 0 1  7.148531
5 20506          .     . 33.87  26.38193 1 0 1  7.405886
5 20524 -14.083257  29.1 33.87  26.38193 1 0 1  7.455168
5 20614          .     . 33.87  26.38193 1 0 1  7.701574
5 20699          .     . 33.87  26.38193 1 0 1  7.934294
5 20716          .     . 33.87  26.38193 1 0 1  7.980837
5 20717          .     . 33.87  26.38193 1 0 1  7.983572
5 20730          .     . 33.87  26.38193 1 0 1  8.019167
5 20779          .     . 33.87  26.38193 1 0 1  8.153322
5 20828          .     . 33.87  26.38193 1 0 1  8.287474
5 20865          .     . 33.87  26.38193 1 0 1  8.388777
5 20888          .     . 33.87  26.38193 1 0 1  8.451746
5 20912          .     . 33.87  26.38193 1 0 1  8.517454
5 21038          .     . 33.87  26.38193 1 0 1  8.862425
5 21115          .     . 33.87  26.38193 1 0 1  9.073236
5 21159          .     . 33.87  26.38193 1 0 1  9.193705
end
format %tdnn/dd/CCYY date

Listed 100 out of 816 observations
Use the count() option to list more


Here are the results from the full dataset:
Code:
. xtgee pch_rveda blrveda blage gender racecat site tfe, vce(robust)

Iteration 1: tolerance = 3.4858818
Iteration 2: tolerance = .03297166
Iteration 3: tolerance = .00098127
Iteration 4: tolerance = .00002973
Iteration 5: tolerance = 9.010e-07

GEE population-averaged model                   Number of obs     =        236
Group variable:                     newid       Number of groups  =         64
Link:                            identity       Obs per group:
Family:                          Gaussian                     min =          1
Correlation:                 exchangeable                     avg =        3.7
                                                              max =         16
                                                Wald chi2(6)      =      20.58
Scale parameter:                  300.659       Prob > chi2       =     0.0022

                                  (Std. Err. adjusted for clustering on newid)
------------------------------------------------------------------------------
             |               Robust
   pch_rveda |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     blrveda |    .020357   .2438373     0.08   0.933    -.4575553    .4982694
       blage |  -.2007481   .1445924    -1.39   0.165    -.4841439    .0826477
      gender |   10.51604   3.931522     2.67   0.007     2.810397    18.22168
     racecat |   -3.87816   6.476657    -0.60   0.549    -16.57217    8.815854
        site |   2.771269   4.190216     0.66   0.508    -5.441404    10.98394
         tfe |   1.091036   .3858063     2.83   0.005     .3348693    1.847202
       _cons |  -10.64515   13.29898    -0.80   0.423    -36.71067    15.42037
------------------------------------------------------------------------------
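One way to produce the requested plot, offered as an untested sketch (the tfe grid 0(1)11 is an assumption based on the range in the data shown): margins and marginsplot work after xtgee.

Code:
* sketch: adjusted prediction of pch_rveda across follow-up time with 95%
* CIs, holding the other covariates at their means
xtgee pch_rveda blrveda blage gender racecat site tfe, vce(robust)
margins, at(tfe = (0(1)11)) atmeans
marginsplot, recast(line) recastci(rarea) ///
    xtitle("Time from enrollment (years)") ytitle("Predicted pch_rveda")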

-mi estimate- used for analysis of imputed variables only?

I have used multiple imputation on 2 variables, say a and b (both are independent variables).

Specifically, I have a simplified model that includes neither a nor b.

I ran the estimation with and without mi estimate and found that there is minimal difference between the two sets of estimates, which are shown below:

Code:
 mi estimate: clogit Response $As $Ps, group(N_ID) vce(cluster UniqueID)
Multiple-imputation estimates
Conditional (fixed-effects) logistic regression

                                                Imputations       =          5
                                                Number of obs     =     29,292
                                                Average RVI       =     0.0000
                                                Largest FMI       =     0.0000
DF adjustment:   Large sample                   DF:     min       =   3.37e+59
                                                        avg       =   3.37e+59
                                                        max       =          .
Model F test:       Equal FMI                   F(  11, 5.8e+60)  =     113.38
Within VCE type:       Robust                   Prob > F          =     0.0000

                             (Within VCE adjusted for 549 clusters in UniqueID)
-------------------------------------------------------------------------------
     Response |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
  d_Softdrink |  -2.623978   .2845676    -9.22   0.000     -3.18172   -2.066236
    d_Juice25 |  -1.231593   .1890147    -6.52   0.000    -1.602055    -.861131
   d_FlavMilk |   .4261079   .2758369     1.54   0.122    -.1145225    .9667383
   d_Juice100 |    .058627   .2359539     0.25   0.804    -.4038341     .521088
 d_LowFatMilk |  -.7098718   .2183783    -3.25   0.001    -1.137885   -.2818582
  p_Softdrink |  -.8919193   .1083205    -8.23   0.000    -1.104223   -.6796151
    p_Juice25 |  -.7150817   .0572729   -12.49   0.000    -.8273345    -.602829
   p_FlavMilk |  -.9742113   .0732781   -13.29   0.000    -1.117834   -.8305888
p_BottleWater |  -.3403531   .0318025   -10.70   0.000    -.4026847   -.2780214
   p_Juice100 |  -.7869841   .0553303   -14.22   0.000    -.8954296   -.6785386
 p_LowFatMilk |   -.631621   .0536054   -11.78   0.000    -.7366856   -.5265565
-------------------------------------------------------------------------------
Code:
 clogit Response $As $Ps, group(N_ID) vce(cluster UniqueID)
Iteration 0:   log pseudolikelihood = -13386.653  
Iteration 1:   log pseudolikelihood = -13164.407  
Iteration 2:   log pseudolikelihood = -13163.939  
Iteration 3:   log pseudolikelihood = -13163.939  

Conditional (fixed-effects) logistic regression

                                                Number of obs     =     56,682
                                                Wald chi2(11)     =     796.80
                                                Prob > chi2       =     0.0000
Log pseudolikelihood = -13163.939               Pseudo R2         =     0.2223

                              (Std. Err. adjusted for 549 clusters in UniqueID)
-------------------------------------------------------------------------------
              |               Robust
     Response |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
  d_Softdrink |  -2.831667   .3159078    -8.96   0.000    -3.450835   -2.212499
    d_Juice25 |  -1.307012   .2572536    -5.08   0.000    -1.811219   -.8028038
   d_FlavMilk |   .4484063   .3388518     1.32   0.186     -.215731    1.112543
   d_Juice100 |  -.2139086   .3353719    -0.64   0.524    -.8712255    .4434084
 d_LowFatMilk |   -.971099   .2941376    -3.30   0.001    -1.547598   -.3945998
  p_Softdrink |  -.8375092   .0893627    -9.37   0.000    -1.012657   -.6623616
    p_Juice25 |  -.6484222   .0764521    -8.48   0.000    -.7982655   -.4985789
   p_FlavMilk |  -.9374873   .0865078   -10.84   0.000    -1.107039   -.7679352
p_BottleWater |  -.3049194   .0426047    -7.16   0.000    -.3884231   -.2214156
   p_Juice100 |  -.6535157   .0789469    -8.28   0.000    -.8082488   -.4987827
 p_LowFatMilk |  -.5586918   .0708329    -7.89   0.000    -.6975217   -.4198619
-------------------------------------------------------------------------------
Is it true that mi estimate should be used only when the analysis includes a and/or b? For estimations excluding a and b, should I just use estimation commands such as clogit, clogithet, logit, etc. directly, without the mi estimate: prefix? The reason I am confused is that after multiple imputation I see that datasets were imputed for all variables, not just a and b, yet the proportion of missing values in the other variables is unchanged from the original dataset.

Thanks!

Local macro in "forvalues" function

This is my first post here, but I have used this forum extensively in the past to troubleshoot. However, I can't find a solution to my Stata problem, if there is one.

I have a series of variables from DX1, DX2, DX3 ... DX25.

Variables DX1 - DX25 have a diagnostic code, which is a string value.

I'm trying to create a new variable that searches DX1-DX25 and pulls out the diagnostic codes I am interested in.

Right now, I have this code:

Code:
gen AKI=0
forvalues j=1/25{
replace AKI=1 if inlist(DX`j', "5845", "5846", "5847", "5848", "5849")
}

But the problem is that the inlist() function accepts no more than 10 string arguments. Some of the new variables I am trying to generate will be based on hundreds of diagnostic codes, so I thought that if I predefined the strings in a macro, I could avoid that limit.

I've tried creating the following macro and incorporating it into my loop:

Code:
local aki "5845" "5846" "5847" "5848" "5849"

gen AKI=0

forvalues j=1/25{
replace AKI=1 if inlist(DX`j', "`x'")
}

And it doesn't work; the result is "0 real changes made".
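For what it's worth, a sketch of one pattern that sidesteps the inlist() limit entirely (note also that the loop above references `x' while the macro is named aki, so inlist() compares against an empty string): loop over the codes as well as the variables.

Code:
* sketch: an inner loop over the code list scales to hundreds of codes
local aki 5845 5846 5847 5848 5849
gen byte AKI = 0
forvalues j = 1/25 {
    foreach c of local aki {
        replace AKI = 1 if DX`j' == "`c'"
    }
}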

Any help you might have would be tremendously appreciated. Thank you.

Identify excluded observations from mixed

I'm working with longitudinal panel data and running models using the mixed command. In prepping the data, I excluded individuals who did not complete at least two time points of data collection (each person should have done 2 or 3 time points, not 0 or 1). Further, I thought I had cleaned the data so that subjects missing the covariates of interest were also dropped.

However, when I run my mixed model I get:
Code:
Number of observations = 7,499 
Number of groups = 3,290 

Obs per group: 
min = 1 
avg = 2.3 
max = 3
I would have expected 3,290 "groups" (i.e., subjects), but with a minimum of 2 observations per group (per person) and a maximum of 3.

Is there a simple way to identify (in this relatively large dataset) the subjects for whom only 1 time point was included in the model?
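A sketch of one way (with hypothetical model and variable names): e(sample) marks exactly which rows the fitted model used, so counting it within subject isolates the single-time-point cases.

Code:
* sketch: flag the estimation sample, then count used time points per subject
mixed outcome i.wave covariate1 covariate2 || subject_id:   // your model here
gen byte insample = e(sample)
bysort subject_id: egen byte n_used = total(insample)
list subject_id if n_used == 1 & insample   // subjects contributing 1 time point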


reshape or stack?

Dear All, I have this data set:
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input int year long(cn au ca de)
2001  151741 31187  43187 33906
2002  183919 33752  49119 33979
2003  163873 26612  34369 28577
2004  190235 41981  50518 37242
2005  198897 46394  54464 39821
2006  271657 42549  53821 38764
2007  268654 51762  57179 39529
2008  329204 58199  60236 40309
2009  972123 57147  60138 39533
2010 1630735 62254  64739 42446
2011 1784185 60067  67545 44644
2012 2586428 63597  70614 45054
2013 2874702 65777  72693 46533
2014 3987152 78674  88601 52507
2015 4184102 76122  90666 54954
2016 3511734 82361 106197 59798
2017 2732549 90892 117687 65983
end
I wish to reshape this dataset into the long (standard) format of panel data. I can use the -xpose- and then -reshape- commands to this end (please see below), but I wonder if there is a simpler way to do so.
Code:
xpose, clear v

drop in 1
ren _varname code

reshape long v, i(code) j(year)
replace year = year+2000
ren v tourist
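A sketch of one alternative that skips -xpose- entirely: give each country column a common stub and let a single -reshape- do the job.

Code:
* sketch: prefix the country columns, then reshape once
foreach v of varlist cn au ca de {
    rename `v' tourist`v'
}
reshape long tourist, i(year) j(code) string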

How to evaluate the appropriateness of instruments with xtivreg?

I need to estimate an instrumental-variables model with the GLS random-effects (RE) estimator: xtivreg, re

The problem: this command does not provide the Kleibergen-Paap and Hansen J statistics that xtivreg2 provides.

PS: xtivreg2 does not estimate RE models.

Therefore, is there some way to obtain these statistics?

What are the postestimation options to derive useful evaluation statistics for xtivreg?
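One avenue worth checking, as a sketch (xtoverid is a community-contributed command; verify that it covers your exact specification): it reports an overidentification test after xtivreg, including the RE estimator.

Code:
* sketch: overidentification test after xtivreg, re (hypothetical names)
ssc install xtoverid
xtivreg y x1 (x2 = z1 z2), re
xtoverid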

Thank you

How to build a logic to test the validity of observations in Stata?

Hi Statalisters,

I am new to Stata and have some questions about data management.

In the following sample, I have data with three variables: CONM, DATE, and SALES. What I want to do is build a logic that can check the validity of the observations: for each company in each year, its smallest sales figure must make up at least 10% of the total sales.
For example, for company 2U INC in 2012, total sales are 43.6+8.4=52.0; since the smallest sales figure, 8.4, is at least 10% of 52.0, the result is "valid" (which can be denoted by a dummy variable equal to 1, or anything else). For the same company in 2015, total sales are 65.2+23.8+17.6=106.6, and the smallest figure, 17.6, is still more than 10% of the total, so the result is again "valid".
The check should be grouped by both company and date.

Can Stata work this logic out properly? Are there any good suggestions on this issue? (A sketch follows the listing below.)

Many thanks in advance.


Code:
CONM DATE SALES
2U INC 12/31/2012 43.586
2U INC 12/31/2012 8.382
2U INC 12/31/2013 13.3
2U INC 12/31/2013 57.358
2U INC 12/31/2014 60.631
2U INC 12/31/2014 15.433
2U INC 12/31/2015 65.2
2U INC 12/31/2015 23.8
2U INC 12/31/2015 17.6
2U INC 12/31/2016 22.1
2U INC 12/31/2016 36.7
2U INC 12/31/2016 71
2U INC 12/31/2017 28.3
2U INC 12/31/2017 77.6
2U INC 12/31/2017 48.2
2U INC 12/31/2018 54.2
2U INC 12/31/2018 42.7
2U INC 12/31/2018 86.9
360 SECURITY TECHNOLOGY INC 12/31/2010 12.167
4LICENSING CORP 12/31/2010 5.646
4LICENSING CORP 12/31/2011 5.556
4LICENSING CORP 12/31/2012 2.494
4LICENSING CORP 12/31/2012 0.333
6D GLOBAL TECHNOLOGIES 12/31/2010 6.91
6D GLOBAL TECHNOLOGIES 12/31/2010 6.687
6D GLOBAL TECHNOLOGIES 12/31/2010 3.344
6D GLOBAL TECHNOLOGIES 12/31/2010 2.675
6D GLOBAL TECHNOLOGIES 12/31/2011 3.369
6D GLOBAL TECHNOLOGIES 12/31/2011 2.305
6D GLOBAL TECHNOLOGIES 12/31/2011 3.724
6D GLOBAL TECHNOLOGIES 12/31/2011 3.724
6D GLOBAL TECHNOLOGIES 12/31/2012 2.073
6D GLOBAL TECHNOLOGIES 12/31/2013 2.122
6D GLOBAL TECHNOLOGIES 12/31/2013 1.061
6D GLOBAL TECHNOLOGIES 12/31/2013 0.299
6D GLOBAL TECHNOLOGIES 12/31/2013 1.061

Grouping values within a variable?

In a prior post (https://www.statalist.org/forums/for...hin-a-variable) I tried to derive an answer but got no response, so I thought I would create a new post. I am working with panel data based on observations at the county level. I want to run a regression analysis but need to create control and treatment groups based on whether states expanded Medicaid during the 2010 health care reform: 37 states adopted the Medicaid expansion and 14 did not.

I have a variable statenameabr (state name abbreviation), which I changed from string to numeric. In the prior post it was recommended to use the -inlist()- function. I thought I would use that as well, since inlist() accepts between 2 and 255 arguments for reals but is more limited for strings.

I came up with the following, but I am not sure it would work.

Code:
encode statenameabr, gen (statenameabr1)
I am proposing the following to generate a 0/1 dummy, statemedicaid, where 1 marks the 37 states that expanded Medicaid and 0 the 14 states that did not.

Code:
gen statemedicaid = 1 if inlist (statenameabr1, "AL", "AS", "AZ", "CA",....)
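A sketch of a version that should work (untested): string arguments to inlist() must be compared against the original string variable rather than the encoded numeric one, and string inlist() takes at most 10 arguments per call, so several calls are chained; the abbreviations below are placeholders to be replaced with the actual 37 expansion states.

Code:
* sketch: build the dummy from the string variable and chain inlist()
* calls (placeholder abbreviations; substitute the real expansion list)
gen byte statemedicaid = 0
replace statemedicaid = 1 if inlist(statenameabr, "AZ", "CA", "CO", "CT", "DE") ///
                           | inlist(statenameabr, "HI", "IL", "IA", "KY", "MD")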

My end goals are:
1. To look at the distribution of education, income, and insurance status of people in states that expanded Medicaid compared with those that did not. I am thinking that once I create the statemedicaid variable, I should be able to do that.
2. To further stratify the states in each group into a control and a treatment group within each of the groups created in #1.

Here are some of the variables I have in my data set. For goal #2, I would use the variables -ever_had_fqhc- and -wanted-:
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input byte year float(n_county wanted ever_had_fqhc) str2 statenameabr
15 1 1 1 "AL"
18 1 1 1 "AL"
17 1 1 1 "AL"
16 1 1 1 "AL"
 8 1 1 1 "AL"
13 1 1 1 "AL"
 6 1 1 1 "AL"
14 1 1 1 "AL"
12 1 1 1 "AL"
10 1 1 1 "AL"
 4 1 1 1 "AL"
11 1 1 1 "AL"
17 2 1 1 "AL"
18 2 1 1 "AL"
16 2 1 1 "AL"
15 2 1 1 "AL"
 6 2 1 1 "AL"
12 2 1 1 "AL"
 4 2 1 1 "AL"
11 2 1 1 "AL"
14 2 1 1 "AL"
 8 2 1 1 "AL"
10 2 1 1 "AL"
13 2 1 1 "AL"
17 3 1 1 "AL"
15 3 1 1 "AL"
18 3 1 1 "AL"
16 3 1 1 "AL"
13 3 1 1 "AL"
 8 3 1 1 "AL"
 6 3 1 1 "AL"
14 3 1 1 "AL"
11 3 1 1 "AL"
12 3 1 1 "AL"
10 3 1 1 "AL"
 4 3 1 1 "AL"
17 4 1 1 "AL"
18 4 1 1 "AL"
15 4 1 1 "AL"
16 4 1 1 "AL"
 4 4 1 1 "AL"
 8 4 1 1 "AL"
14 4 1 1 "AL"
 6 4 1 1 "AL"
10 4 1 1 "AL"
12 4 1 1 "AL"
13 4 1 1 "AL"
11 4 1 1 "AL"
18 5 1 1 "AL"
17 5 1 1 "AL"
15 5 1 1 "AL"
16 5 1 1 "AL"
10 5 1 1 "AL"
12 5 1 1 "AL"
13 5 1 1 "AL"
 4 5 1 1 "AL"
11 5 1 1 "AL"
14 5 1 1 "AL"
 6 5 1 1 "AL"
 8 5 1 1 "AL"
15 6 1 1 "AL"
17 6 1 1 "AL"
16 6 1 1 "AL"
18 6 1 1 "AL"
13 6 1 1 "AL"
 6 6 1 1 "AL"
 8 6 1 1 "AL"
14 6 1 1 "AL"
11 6 1 1 "AL"
10 6 1 1 "AL"
 4 6 1 1 "AL"
12 6 1 1 "AL"
17 7 1 1 "AL"
16 7 1 1 "AL"
18 7 1 1 "AL"
15 7 1 1 "AL"
12 7 1 1 "AL"
14 7 1 1 "AL"
13 7 1 1 "AL"
10 7 1 1 "AL"
 8 7 1 1 "AL"
 4 7 1 1 "AL"
 6 7 1 1 "AL"
11 7 1 1 "AL"
17 8 1 1 "AL"
18 8 1 1 "AL"
15 8 1 1 "AL"
16 8 1 1 "AL"
 8 8 1 1 "AL"
11 8 1 1 "AL"
14 8 1 1 "AL"
12 8 1 1 "AL"
13 8 1 1 "AL"
 6 8 1 1 "AL"
10 8 1 1 "AL"
 4 8 1 1 "AL"
15 9 1 1 "AL"
16 9 1 1 "AL"
17 9 1 1 "AL"
18 9 1 1 "AL"
end
label values n_county n_county
label def n_county 1 "Alabama Autauga", modify
label def n_county 2 "Alabama Baldwin", modify
label def n_county 3 "Alabama Barbour", modify
label def n_county 4 "Alabama Bibb", modify
label def n_county 5 "Alabama Blount", modify
label def n_county 6 "Alabama Bullock", modify
label def n_county 7 "Alabama Butler", modify
label def n_county 8 "Alabama Calhoun", modify
label def n_county 9 "Alabama Chambers", modify
label var year "Reshaped Year variable panel data"
label var n_county "group(statename countyname)"
label var wanted "1 if n_county started out with 0 FQHCs and then got 1 at some point in time and "
label var ever_had_fqhc "1= for all obs of any n_county that has had 1 or more FQHC's at any point in tim"
label var statenameabr "State Name Abbreviation"

New to Statalist

Hi All,

This is my first post on Statalist. Currently I am using a panel data set for my study: I have district-level data for five years (3 districts, each with data from 2014 to 2018). I set up my xtset command as follows,

xtset District Year

But I got this error:

repeated time values within panel
r(451);

Please be kind enough to give your comments on how to solve this problem.
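A sketch of the usual first diagnostic: r(451) means some District has more than one observation for the same Year, and duplicates will show exactly where.

Code:
* sketch: locate the District-Year pairs that appear more than once
duplicates report District Year
duplicates list District Year, sepby(District)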

Thank you