Channel: Statalist

Formatting table using estout

I am using the estout command to prepare my tables in LaTeX. Since I have a lot of tables and change them often, I would like to automate everything so that I minimize manual editing. Here is an example of the command I am using:

estout model* using "C:\Users\asmobari\Box Sync\Projects\MP_Project\WriteUp\tabfig_MP\model1.tex", ///
keep(t post post_T _cons) varlabels(post Post t Treatment post_T DiD _cons Constant) ///
cells(b(star fmt(2)) se(par fmt(2))) stats(r2 N, labels("R-squared""Obs")) ///
numbers collabels(none) mlabels(,titles) ///
style(tex) replace

This produces the following LaTeX file:

& (1) & (2) & (3) & (4) & (5) & (6) & (7) & (8) & (9) \\
& Male & Female &All Students & Male & Female &All Students & Male & Female &All Students \\
Treatment & 0.35 & -5.11 & 0.05 & 21.10* & 19.09 & 26.07* & -3.20 & 1.64 & -1.75 \\
& (8.16) & (14.72) & (7.33) & (9.87) & (16.02) & (9.62) & (8.37) & (4.46) & (6.68) \\
Post & -8.29 & -10.94 & -7.35 & 19.15 & 21.87 & 22.84 & -10.82 & 4.94 & -5.94 \\
& (7.66) & (14.41) & (6.51) & (12.62) & (16.14) & (11.98) & (5.83) & (4.27) & (5.49) \\
DiD & 7.81 & 0.95 & 3.68 & -14.75 & -29.20 & -27.56 & 6.88 & 5.31 & 0.20 \\
& (11.41) & (18.35) & (10.45) & (13.89) & (22.02) & (14.02) & (9.41) & (8.85) & (10.01) \\
Constant & 195.25***& 187.05***& 190.88***& 67.82***& 60.15***& 62.49***& 40.92***& 22.63** & 52.96***\\
& (5.92) & (12.15) & (5.48) & (8.98) & (10.84) & (8.55) & (6.51) & (6.21) & (5.98) \\
R-squared & 0.02 & 0.04 & 0.02 & 0.16 & 0.06 & 0.14 & 0.03 & 0.04 & 0.02 \\
Obs & 59.00 & 51.00 & 71.00 & 59.00 & 51.00 & 71.00 & 59.00 & 51.00 & 71.00 \\

However, I want a horizontal line between Constant and R-squared and a line space between R-squared and Obs. Manually I can do it as below:
& (1) & (2) & (3) & (4) & (5) & (6) & (7) & (8) & (9) \\
& Male & Female &All Students & Male & Female &All Students & Male & Female &All Students \\
Treatment & 0.35 & -5.11 & 0.05 & 21.10* & 19.09 & 26.07* & -3.20 & 1.64 & -1.75 \\
& (8.16) & (14.72) & (7.33) & (9.87) & (16.02) & (9.62) & (8.37) & (4.46) & (6.68) \\
Post & -8.29 & -10.94 & -7.35 & 19.15 & 21.87 & 22.84 & -10.82 & 4.94 & -5.94 \\
& (7.66) & (14.41) & (6.51) & (12.62) & (16.14) & (11.98) & (5.83) & (4.27) & (5.49) \\
DiD & 7.81 & 0.95 & 3.68 & -14.75 & -29.20 & -27.56 & 6.88 & 5.31 & 0.20 \\
& (11.41) & (18.35) & (10.45) & (13.89) & (22.02) & (14.02) & (9.41) & (8.85) & (10.01) \\
Constant & 195.25***& 187.05***& 190.88***& 67.82***& 60.15***& 62.49***& 40.92***& 22.63** & 52.96***\\
& (5.92) & (12.15) & (5.48) & (8.98) & (10.84) & (8.55) & (6.51) & (6.21) & (5.98) \\
\midrule
R-squared & 0.02 & 0.04 & 0.02 & 0.16 & 0.06 & 0.14 & 0.03 & 0.04 & 0.02 \\
\addlinespace
Obs & 59.00 & 51.00 & 71.00 & 59.00 & 51.00 & 71.00 & 59.00 & 51.00 & 71.00 \\
But I wonder if I can give this instruction in my Stata code so that I don't have to open my LaTeX file each time I change my regressions. I would be glad if somebody could help me with it. Thanks in advance!
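For what it's worth, the direction I have been exploring is estout's prefoot() option (which injects raw LaTeX between the coefficient block and the statistics block) together with a filefilter post-processing step for the \addlinespace. This is only a sketch; the from()/to() pattern is an assumption and would need checking against the actual file contents:

Code:
local dir "C:\Users\asmobari\Box Sync\Projects\MP_Project\WriteUp\tabfig_MP"
estout model* using "`dir'\model1.tex", ///
    keep(t post post_T _cons) varlabels(post Post t Treatment post_T DiD _cons Constant) ///
    cells(b(star fmt(2)) se(par fmt(2))) stats(r2 N, labels("R-squared" "Obs")) ///
    numbers collabels(none) mlabels(,titles) ///
    prefoot(\midrule) ///
    style(tex) replace

* rewrite the file to put \addlinespace in front of the Obs row
* (\BS is filefilter's escape for a backslash; "Obs" must be unique in the file)
filefilter "`dir'\model1.tex" "`dir'\model1_final.tex", ///
    from("Obs") to("\BSaddlinespace Obs") replace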


Sensitivity and specificity

Dear Statalister,

Currently, I am working with a simple dataset but I am not sure whether the basic coding is correct. If I run the commands below, will the values listed as sensitivity and specificity be the actual sensitivity and specificity values? Or do I have to interchange the 0 and 1 values in the foreign variable?

Best
Ian


* Dataset
Code:
  
webuse auto.dta, clear
* Logistic regression command
Code:
  
logit foreign price, vce(cluster make)
HTML Code:
. * Logistic regression
. logit foreign price, vce(cluster make)

Iteration 0:   log pseudolikelihood =  -45.03321  
Iteration 1:   log pseudolikelihood = -44.947363  
Iteration 2:   log pseudolikelihood =  -44.94724  
Iteration 3:   log pseudolikelihood =  -44.94724  

Logistic regression                             Number of obs     =         74
                                                Wald chi2(1)      =       0.20
                                                Prob > chi2       =     0.6558
Log pseudolikelihood =  -44.94724               Pseudo R2         =     0.0019

                                  (Std. Err. adjusted for 74 clusters in make)
------------------------------------------------------------------------------
             |               Robust
     foreign |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       price |   .0000353   .0000791     0.45   0.656    -.0001198    .0001904
       _cons |  -1.079792   .5537941    -1.95   0.051    -2.165209    .0056241
------------------------------------------------------------------------------

* Determination of Youden's index
Code:
       * Sensitivity and specificity graphs
            lsens , gensens(sens_price) genspec(spec_price)
       * Variable for Youden Index
            gen float youden_price = sens_price + spec_price-1
       * Youden index determination
            egen float max_price = max(youden_price)
       * Cut-off value, sensitivity and specificity
            list sens_price spec_price price ///
            if youden_price == max_price
HTML Code:
     +-----------------------------+
     | sens_p~e   spec_p~e   price |
     |-----------------------------|
 32. | 0.590909   0.653846   5,379 |
     +-----------------------------+
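My understanding is that logit (and therefore lsens) treats foreign==1 as the positive outcome, so no recoding should be needed; one cross-check I thought of is to compare against estat classification at the probability cut-off implied by the price value listed above:

Code:
* predicted probabilities from the logit above
predict double phat, pr
* probability cut-off corresponding to price = 5,379
summarize phat if price == 5379
* estat classification reports sensitivity Pr(+|D) and specificity Pr(-|~D),
* with foreign == 1 treated as the positive outcome
estat classification, cutoff(`r(mean)')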




savespss (from SSC, by Sergiy Radyakin)

I'm thinking of using Sergiy Radyakin's -savespss- program as my workhorse for a project I do with some other people who use only SPSS. I do a lot of work with this group and it's important that the transfer of my output to SPSS be seamless for them.

In the past I have used StatTransfer for this purpose, but currently they will not allow a single-user license to be used over a VPN connected to a server that is connected to a large network.

I've tested -savespss- out briefly on a couple of typical data sets, and the only problem I have encountered is that Stata dates are automatically converted to datetime variables when exported to SPSS. That's something I can work around and live with. But I'm wondering if any Forum members have experience with this program and can advise me of any quirks or problems I should be aware of. I'd appreciate any advice.

(I've read the thread at https://www.statalist.org/forums/for...ata-convertion which pointed out one problem that got resolved quickly. But that thread is a couple of years old, and so I'm wondering what else has turned up.)

Added: I'm also aware that SPSS does not support strLs, but we don't use those in this work anyway.

xtpoisson with fixed effects

Hello all,

I have a panel dataset of 3 million observations, with each observation detailing the annual number of prescriptions and patients for a physician for a given year for certain classes of medications. The years of available data are 2013-2017. There is also some physician demographic information including gender, specialty, years in practice, state, and number of group practice members.
For example, an observation may say "Dr. John Smith, year 2013, M, orthopedic surgery, 100 opioid prescriptions, 200 patients". The next row may say "Dr. John Smith, year 2014, M, orthopedic surgery, 80 opioid prescriptions, 170 patients". The code I used to define the panel data is

Code:
xtset npi year
(where npi is the provider id)

I am interested in using the xtpoisson command to look at the within provider change in opioid prescribing over time, with number of patients (bene_count) seen per provider per year as the exposure. The code that I have used is:

Code:
xtpoisson opioid_claim_count i.year i.provider_sex i.provider_state i.specialty i.years_experience i.group_members, exposure(bene_count) fe vce(bootstrap, reps(200)) irr 

Code:
opioid_claim_count |        IRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------------+----------------------------------------------------------------
              year |
             2014  |   .9545114   .0002171  -204.70   0.000      .954086     .954937
             2015  |   .8897576   .0002035  -510.75   0.000     .8893589    .8901565
             2016  |   .8511887   .0001957  -700.96   0.000     .8508053    .8515723
             2017  |   .7981606   .0001862  -966.50   0.000     .7977958    .7985256

    ln(bene_count) |          1  (exposure)
------------------------------------------------------------------------------------
For the purpose of brevity, I did not copy and paste the results from the categorical variables, although all were omitted from the regression (except for provider_state) as they are time invariant (i.e. gender, specialty, etc).

I just wanted to confirm a few things:

1) Is it OK to use year as the main independent variable with total patients per year seen (bene_count) as the exposure to look at within provider change in opioid prescribing (dependent variable) each year?

2) If so, is the correct interpretation to say something like "opioid prescriptions relative to total number of patients decreased by ~ 5% each year per provider"

3) Is there a good way to use the margins command to get an accurate number of opioid prescriptions per provider? I've tried using the following code after the xtpoisson:

Code:
 
margins year, atmeans

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        year |
       2013  |   4.801408   .0134106   358.03   0.000     4.775124    4.827692
       2014  |   4.754852    .013413   354.50   0.000     4.728563    4.781141
       2015  |   4.684602    .013413   349.26   0.000     4.658313    4.710891
       2016  |   4.640287    .013413   345.95   0.000     4.613998    4.666576
       2017  |   4.575963   .0134132   341.15   0.000     4.549673    4.602252
------------------------------------------------------------------------------
But I'm not sure what to do with these margin numbers nor how to interpret them. Are they ln transformed? I've read some other threads on this forum that using margins after fixed effects to calculate a dependent count variable can be somewhat inaccurate. I'm happy to just use the IRR if this is the case.
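In case it is relevant to question 3: my (possibly wrong) understanding is that the default prediction after xtpoisson, fe is the linear index xb, so the margins above would be on the log scale. One way to look at them on the count scale instead would be margins' expression() option, e.g.:

Code:
* exponentiate the linear prediction inside expression(); whether the exposure
* offset and the fixed effect are part of predict(xb) here is something I still
* need to check in -help xtpoisson postestimation-
margins year, expression(exp(predict(xb))) atmeans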

Thank you!

Venkat

Goodness of fit indices unavailable when using SEM with vce(robust), what's the reason?

Hello All,

I was able to find several posts asking why we cannot get goodness-of-fit indices (RMSEA, CFI, TLI, ...) when using vce(robust) in SEM models. However, I have not been able to find conclusive information on this topic, and I am wondering whether there are any recent developments or a workaround to get those fit indices. I have to use vce(robust) because my variables are not normally distributed, but I also need the fit indices to justify why I am using certain models.
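One possible workaround I have come across but not yet tried (I believe it requires Stata 15 or newer, so please correct me if I am wrong) is the Satorra-Bentler adjustment, which is intended for non-normal data and still allows estat gof:

Code:
* sketch only; the model specification is a placeholder
sem (y <- x1 x2 x3), vce(sbentler)
estat gof, stats(all)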

Here are some posts that I found on the topic:

https://www.stata.com/statalist/arch.../msg00800.html
https://www.statalist.org/forums/for...of-fit-indices
https://www.statalist.org/forums/for...uster-clustvar

Thank you for your time and assistance with this,

Patrick

Minimum distance function / Stochastic Frontier Analysis

Hi there: Does anyone know how to estimate a minimum distance function in Stata? I have the parameters and the covariance matrix from an unrestricted stochastic frontier production function (one output and two inputs) and would like to estimate the following:
Br = argmin (Br - Bu)' IM (Br - Bu) subject to fi(xi, Br) >= 0 for i = 1, 2; where IM is the inverse of the covariance matrix, Bu the betas from the unrestricted model, Br the betas from the restricted model, and fi the derivative of the (frontier) production function with respect to each of my two inputs. How can I do this in Stata?
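The direction I was considering is Mata's optimize(); below is a rough sketch in which bu and IM come from the unrestricted frontier results, and the inequality constraints are only indicated as a placeholder (they would need a penalty term or a reparameterisation):

Code:
* after estimating the unrestricted frontier, e.g. -frontier lnq lnx1 lnx2-
mata:
// d0-type evaluator returning the quadratic form (Br - Bu) * IM * (Br - Bu)'
void mdobj(real scalar todo, real rowvector br,
           real rowvector bu, real matrix IM,
           real scalar crit, real rowvector g, real matrix H)
{
    real rowvector d
    d = br - bu
    crit = d * IM * d'
    // the constraints f_i(x_i, Br) >= 0 would have to enter here,
    // e.g. as a penalty added to crit (placeholder only)
}

bu = st_matrix("e(b)")            // unrestricted estimates
IM = invsym(st_matrix("e(V)"))    // inverse of their covariance matrix

S = optimize_init()
optimize_init_which(S, "min")
optimize_init_evaluator(S, &mdobj())
optimize_init_evaluatortype(S, "d0")
optimize_init_argument(S, 1, bu)
optimize_init_argument(S, 2, IM)
optimize_init_params(S, bu)
br = optimize(S)
br
end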
Thank you all,
Juan

Calculating months between two dates

Dear All,
My data set contains a variable named month1 for the first month, year1 for the first year, month2 for the second month, and year2 for the second year. There is no specific day within the month. In this case, how do I generate a new variable that contains both month and year, and how can I calculate the difference in months between the two dates?
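So far the closest I have come is the sketch below (assuming month1, year1, month2, and year2 are numeric), but I am not sure it is the right approach:

Code:
* combine month and year into Stata monthly dates, then take the difference
gen mdate1 = ym(year1, month1)         // elapsed months since January 1960
gen mdate2 = ym(year2, month2)
format mdate1 mdate2 %tm
gen months_between = mdate2 - mdate1   // difference in months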

Thank you for your help

Modelling longitudinal data

Hello members,
I have data from a follow-up study: the outcome variable BMI and the covariates were collected at baseline, follow-up one, and follow-up two. What is the best way to model the change in BMI with these data?
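What I had in mind so far is something like the sketch below, a mixed model on the long form of the data (variable names are placeholders, assuming the data are currently wide with bmi0, bmi1, bmi2), but I am not sure it is the best choice:

Code:
* reshape to one row per person per visit, then model BMI over time
reshape long bmi, i(id) j(visit)
mixed bmi i.visit covariate1 covariate2 || id: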
Thanks in anticipation for your response

Modelling dose with multiple balancing score. Imbens (2000)

Good day Statalisters. I am interested in applying propensity score methods for modelling the dose of treatment following Imbens (2000). My treatment variable is education, with four categories: No Edu, Primary Edu, Secondary Edu, and Higher Edu. I have tried using teffects ipw, but I am not convinced I have done the right thing.
I would appreciate a step-by-step procedure for executing this in Stata, and I also need to know which parts of the output I should report and how to report them.
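What I have tried so far is roughly the sketch below; teffects ipw handles a multivalued treatment and reports each education level against the base category (variable names are placeholders):

Code:
* outcome y, four-category treatment educ (base = No Edu), covariates x1 x2
teffects ipw (y) (educ x1 x2), ate
* check the overlap assumption afterwards
teffects overlap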
Thanks in anticipation for your kind response.

Counting unique observations

id   sales   profit   year   size_group
a    36      9        1991   1
a    48      17       1992   1
a    25      7        1993   2
b    65      18       1991   1
b    30      8        1992   2
b    45      20       1993   1
Dear all
I have the above panel dataset for demonstration purposes, and I would like to count the unique/distinct ids within size_group==1 during the year 1991. For example, in the year 1991 there are 2 unique ids (a, b) that come under size_group==1.
I tried
Code:
count if size_group==1
But it is not what I really want. What should be the command?
Any help in this regard will be highly appreciated.
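One approach I came across but have not fully understood is tagging one observation per id within the condition and then counting the tags (or using the user-written -distinct- command):

Code:
* tag one observation per id among those meeting the condition, then count the tags
egen byte tagged = tag(id) if size_group == 1 & year == 1991
count if tagged == 1
* alternatively, -distinct- (install with: ssc install distinct)
distinct id if size_group == 1 & year == 1991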

Create consecutive days with unequal number of obs per day

Dear Stata folks,

I have a dataset with patient_ids, days, and a measurement (outcome). Patients may have more than one measurement per day, and the number is unequal between patients and days. Let's say these are BP measurements.
The days are in %td format but not consecutive, since some patients may not have measured their BP on a given day and then measured it on the following days. Patients are advised to do these measurements for 28 days, but some did so for fewer days and a few others for more.
I want to create a consecutive-day variable so that days on which they did not measure their BP show 0 measurements, instead of having only the days on which they did measure.
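The closest I have gotten is the sketch below, which collapses the data to one row per patient-day with a count of measurements and then fills the calendar gaps, but I am not sure it is right:

Code:
* assuming variables patient_id, day (%td) and the BP measurement (names are placeholders)
* 1) one row per patient-day, keeping the number of measurements taken that day
bysort patient_id day: gen n_meas = _N
bysort patient_id day: keep if _n == 1
* 2) fill in the missing calendar days within each patient
tsset patient_id day
tsfill
* 3) days without any measurement get a count of zero
replace n_meas = 0 if missing(n_meas)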
Any help please? Whatever I saw in different posts doesn't work for my case.
Thank you in advance.

Issue with xtologit (non concave routine)

Hello everybody,
I am asking for your help with the issue described in the subject of this post.
I am using the xtologit command to perform some estimations on a really wide dataset. I have a dummy that splits the sample into two periods (before and after the crisis). When I run the command on the whole sample, everything works fine. When I use only the crisis sample, everything works fine too. However, when I use only the pre-crisis period, the "not concave" iterations start; yesterday I did not manage to reach a final iteration and ended up stopping the program. I also tried using just a couple of covariates, and the issue remains. Why can I complete the task with the huge (complete) dataset but not with just half of it?
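For reference, the kind of thing I have been trying on the pre-crisis subsample looks like the line below (variable names are placeholders; -difficult- and a larger number of integration points are just guesses at a fix):

Code:
xtologit y x1 x2 if pre_crisis == 1, intpoints(12) difficult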
Thanks

Panel specification- choosing the best model

Hi,
I am trying to model panel data with N=28 and T=15 (tsset number year). I have one binary variable among my explanatory variables. Moreover, I want to add time dependency.
1) What is better: using time dummies (i.year) or a time trend? If a time trend, is it enough to add only the variable year, treated as a trend variable?
2) I am trying: a) pooled regression, b) xtreg, fe, c) xtreg, re, d) xtgls. Is it good to add time dummies or a trend in xtgls?
3) My data have heteroskedasticity, serial correlation, and cross-sectional correlation. Which method is better: xtreg, vce(robust) or xtgls? In xtgls, how can I know which option to choose, corr(ar1) or corr(psar1)? (A sketch of the candidate specifications is below.)
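For concreteness, the specifications I have in mind look roughly like the sketch below (y, x1, x2, and the binary regressor bin are placeholders):

Code:
tsset number year
reg   y x1 x2 i.bin i.year, vce(cluster number)        // pooled, time dummies
xtreg y x1 x2 i.bin i.year, fe vce(robust)             // fixed effects
xtreg y x1 x2 i.bin i.year, re vce(robust)             // random effects
xtgls y x1 x2 i.bin i.year, panels(heteroskedastic) corr(ar1)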

Thanks!

How do I interpret a log level and log log model when my independent variable is already a percentage?

Hi everyone

I have a regression in which I would like to estimate the effect of migrant share (from 0% to 100%) on years of schooling. Because of skewness, I log-transformed migrant share. As my explanatory variable is already a percentage, I do not know how to interpret the coefficient, and I would be glad if someone could help me with that.

Regression: reg yearsofschooling ln(migrantshare)
The beta (coefficient) that I get is 6.8.

Finally, how would it change if I use a log-log model? reg ln(yearsofschooling) ln(migrantshare)
The coefficient, in this case, is 1.38
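My current reading, which I would like to have confirmed: in the log-level model the coefficient applies to a one-unit change in ln(migrantshare), so a 1% (relative) increase in migrant share, e.g. from 10.0 to 10.1 percentage points, changes ln(migrantshare) by about 0.01 and is therefore associated with roughly 6.8/100 ≈ 0.068 additional years of schooling. In the log-log model the coefficient is an elasticity, so 1.38 would mean that a 1% increase in migrant share is associated with about a 1.38% increase in years of schooling. Does the fact that migrant share is itself measured in percent change any of this?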


Thanks a lot!!!

Estimate dynamic discrete choice (2 periods)

Hello,

I need to estimate a two-period choice model. I understand the basic idea of estimating a choice model over two periods after reading the book "Discrete Choice Methods with Simulation" by Kenneth E. Train, but I have no idea how to proceed in Stata.
I tried searching on YouTube and Google but could not find anything showing how to estimate a dynamic discrete choice model in Stata.

Does anyone know where I could find good instructions on estimating dynamic discrete choice models in Stata?

Thanks

Plotting marginal effects, confidence intervals after bootstrap

Dear listers,

I have searched the manual and dug through the list but have come up empty. So apologies if this is either easy or obvious.

Here is the problem. I am estimating a model using bootstrap resampling. After estimation I want to graph the marginal effect and the bias-corrected confidence interval across values of my variable of interest, diff, in the model below.

Here is the code



Code:
capture program drop strap
program strap, rclass

    ** Step 1: estimate the difference in predictions
    reg lnsmooth i.year##c.(lndist contig smct comlang_off colony) L10.joint_migration lntotalpop
    predict yhat2
    reg lnsmooth i.year##c.(lndist contig smct comlang_off colony) lntotalpop
    predict yhat1

    gen diff = yhat2 - yhat1

    ** diff is my variable of interest

    areg prio l.(jointdem2 capratio1 joint_migration diff lntotalpop ideal_diff lngdp_diff) ///
        if politically_relevant == 1, abs(dyadnum)

    ** return the quantity that -bootstrap- collects below
    ** (presumably the coefficient on L.diff)
    return scalar PPdiff = _b[L.diff]

    drop diff yhat1 yhat2

    exit
end

gen newdyad = dyadnum

tsset newdyad year

#delimit ;
bootstrap coeff = r(PPdiff), reps(25) seed(12345) cluster(dyadnum)
    idcluster(newdyad) nodrop saving(ci, replace): strap ;
#delimit cr

estat bootstrap

**********************************************
estat bootstrap gives me the estimated coefficient of diff, its standard error, and the bias-corrected confidence interval. What I want to do now is graph the estimate (yes, it is linear) and the confidence interval (which is not) across values of diff. Everything I have tried has failed. Any ideas?
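The most recent thing I tried is the sketch below; it uses simple percentile bounds computed from the replicate file rather than the bias-corrected ones, and the plotted range of diff is a placeholder:

Code:
* point estimate of the coefficient from the bootstrap results above
matrix b = e(b)
local beta = el(b, 1, 1)
* percentile bounds from the saved replicates (not bias-corrected);
* the replicate variable keeps the name given in the bootstrap expression (coeff)
preserve
use ci, clear
_pctile coeff, p(2.5 97.5)
local lo = r(r1)
local hi = r(r2)
restore
* effect of diff and the band across values of diff
twoway (function y = `beta'*x, range(-1 1))                 ///
       (function y = `lo'*x,   range(-1 1) lpattern(dash))  ///
       (function y = `hi'*x,   range(-1 1) lpattern(dash)), ///
       xtitle("diff") ytitle("Estimated effect on prio") legend(off)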
Thanks,

David



Sampling in a cross-sectional stepped-wedge

Hi all,

I am aiming to simulate stepped-wedge data for a cross-sectional design (a new set of individuals is sampled from each cluster at each step or cross-over point) according to the following model with a continuous outcome:

Y_ijk = b0 + b_j + d*X_ij + u_i + e_ijk

Let Y_ijk denote the continuous outcome for individual k in cluster i = 1,...,I at time j = 1,...,J. Define X_ij as the intervention indicator, with X_ij = 1 if cluster i is receiving the intervention at time j and X_ij = 0 if it is in the control condition. Here b0 is the fixed intercept, b_j the fixed effect of time as a covariate, and d the fixed intervention effect. Cluster-level random intercepts are assumed to follow a normal distribution with mean zero and known variance, u_i ~ N(0, sigma_u^2). Finally, the residuals are also normally distributed, e_ijk ~ N(0, sigma_e^2).

How would I sample a new set of individuals from each cluster at each timepoint?
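Here is the skeleton I have so far (all parameter values are placeholders); the part I am unsure about is whether the final expand plus a fresh rnormal() draw is the right way to represent sampling a new set of individuals at each step:

Code:
clear
set seed 12345
local I = 6          // clusters
local T = 4          // time points (steps)
local m = 20         // individuals sampled per cluster at each time point
set obs `I'
gen cluster = _n
gen u = rnormal(0, 1)                    // cluster random intercept (sd assumed 1)
gen step = 1 + mod(_n - 1, `T' - 1)      // cluster starts treatment at time step + 1
expand `T'
bysort cluster: gen time = _n
gen x = time > step                      // intervention indicator
expand `m'                               // a fresh set of individuals per cluster-time
gen y = 0 + 0.5*time + 1*x + u + rnormal(0, 2)   // b0 + bj*time + d*x + ui + eijk
mixed y i.time x || cluster: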

Any help is much appreciated.

Jack.

xtgls options in Stata

Hi,

Does someone know when we should use corr(ar1) versus corr(psar1)? Any example from the literature? I have tried to find something and have read a lot of papers, but I do not see any explanation of when to use ar1 or psar1. The Stata help is not much help on this point.
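To be clear about what I mean: as I read it, corr(ar1) imposes a single common AR(1) coefficient across all panels, while corr(psar1) allows a panel-specific AR(1) coefficient, e.g.:

Code:
* one common autocorrelation parameter for all panels
xtgls y x1 x2, panels(heteroskedastic) corr(ar1)
* a separate autocorrelation parameter estimated for each panel
xtgls y x1 x2, panels(heteroskedastic) corr(psar1)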

Thanks!

Keep if or Drop If

Hello,

I am trying to select a couple of cases in a dataset and perform a set of operations/changes on those cases. In SPSS, I can use SELECT IF to temporarily restrict operations to cases fitting that criterion. After saving the operations, I can remove the restriction to work on the full dataset.

In Stata, I think I can use KEEP IF to temporarily restrict operations to data satisfying the IF condition. But once I am done with the operations on those temporarily selected cases, how can I remove the restriction and save the changes to the full dataset?
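To make the question concrete, the pattern I think I am looking for is sketched below (variable names are placeholders); am I right that in Stata one would normally just put the restriction on each command instead of dropping cases?

Code:
* restrict individual commands with -if-, so the full dataset stays in memory
replace income = income/1000 if country == "KE"
tab region if country == "KE"
* or define a marker once and reuse it
gen byte subset = (country == "KE")
summarize income if subset
drop subset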


Thanks for any advice - cY

VIF test of multi-collinearity

Dear list members,

After running some -ivregress2 2sls- and -ivprobit- regressions, I am unsure which specifications suffer from excessive multicollinearity. I understand that running the -vif- command after the regressions should tell me, but I am still unsure about the following:

(1) Am I right in assuming that any regressor with a VIF > 10 should be dropped, even if its value is only, say, 10.50?

(2) Unlike after -reg-, after both -ivregress2 2sls- and -ivprobit- I get an error saying that I cannot use -vif- but only -vif, uncentered-, which seems to always yield higher values than the simple -vif- after -reg-. Am I right in assuming that the uncentered version is then indeed the appropriate one, and that I need to drop any regressor with a VIF > 10 there as well?

(3) I tried to export the VIF results with -outreg2- using the option -addstat(VIF, e(vif))- but got a syntax error, and I did not find a more suitable syntax in the help-file examples. How do I instead get the VIF results to be displayed directly in my Excel regression table? (A sketch of the workaround I am considering follows below.)
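The workaround I am considering for (3), in case it clarifies what I am after (x1, x2, x3, z1, and the file name are placeholders): compute the VIF by hand as 1/(1 - R-squared) from an auxiliary regression and pass it to -outreg2- as a scalar via addstat():

Code:
* auxiliary regression of one regressor on the others gives its VIF
regress x1 x2 x3
scalar vif_x1 = 1/(1 - e(r2))
* main regression, then export with the VIF added as a statistic
ivregress 2sls y x2 x3 (x1 = z1)
outreg2 using results.xls, addstat("VIF x1", vif_x1) replace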

Thank you so much!
PM

