Channel: Statalist
Viewing all 72832 articles
Browse latest View live

Missing Wald Chi Squared in multilevel model

Dear Statalist,

I am encountering a missing Wald chi-squared statistic in a multilevel logit model in Stata 14. The data are cross-sectional, with respondents (500-1003 obs per group) nested in 9 countries. The dependent variable is binary, and I am attempting to run a melogit command of the form:
Code:
melogit y x1-x18 [pweight=weightvar] || country: , or
I was wondering if the problem had to do with my degrees of freedom, given my 18 variables and only 9 country clusters, something like what Clyde Schechter pointed out for clustered standard errors here: https://www.statalist.org/forums/for...ssing-wald-chi

However, the problem appears to come more directly from the use of sampling weights, which are calculated based upon national demographic data from the 9 countries. When I run the model without the weights, I get the Wald chi2 & prob>chi2 test statistics with the 18 variables and 9 country clusters. Once the weights are included, these statistics do not display, but the model converges and I get odds ratios in the resulting table. Also, the LR test vs. logistic model is significant, indicating that a multilevel model is warranted for the unweighted model.

Any diagnostic thoughts would be great and advice is obviously welcome.

Sincerely,

Eric

Compile / Export all table values from -estat phtest, detail-

Dear all,

I am running several Cox PH survival models and use Schoenfeld residuals as a first layer of tests for the proportional-hazards assumption. In the tests, I am not only interested in the global test results, but would also like to report the values for each individual variable. The problem: as I am running at least a dozen tests for several models, I need to compile and export all tables into *.tex files automatically to stay efficient (just like compiling regression tables via -estout-, say). However, it seems that I cannot access these values from Stata's stored results (see -ereturn list- in the code below).

Question: Is there a way to fetch all values reported in -estat phtest, detail- and compile / export them into a (neat) table, ideally in .tex format?

Here is sample code and a data set for my issue:

Code:
version 14

webuse drugtr

stcox drug age

estat phtest, detail

ereturn list // see for info

// compile entire table to *.tex
Thanks a lot for your help in advance,
Michael

Estimate choice model - Data issue

Hello,

I am trying to estimate the schooling decision in a 2-period model. In t=1, students decide whether to go to high school. In t=2, they decide whether to go to college.
In the data set, there is one variable, "school": in period 1 it equals 1 if the student goes to high school and 0 if not; in period 2 it equals 1 if the student goes to college and 0 if not. Another variable is "period", which equals 1 for period 1 and 2 for period 2.
There are also other variables describing the "qualities" of students.
I need to estimate the utility from going to high school in period 1. The problem is that the data set is not separated by period.
So if I run the regression "logit school .....", it will mix up both periods.

How can I estimate each period separately?

If anyone has an idea, please let me know. Thanks a lot!

Using Suest to test coefficients across ARIMA models

Hello,
I am running several ARIMA models and I want to test the equivalence of the coefficients with the test command. However, I get the following error:

first was estimated with a nonstandard vce (opg)

where "first" is the name of my first model. To be clear, here is the code I am running.


quietly arima lmp d1.lp if fp==1, arima(1,0,2)
estimates store first

quietly arima lmp d1.lp if sp==1, arima(2,0,0)
estimates store second

quietly arima lmp d1.lp if tp==1, arima(1,0,0)
estimates store third

quietly arima lmp d1.lp, arima(1,0,2)
estimates store whole

suest first second third whole

Thanks....Any substitutes if it doesn't work?
Joshua

Determining lag length in a panel dataset.

Hi all,

I'm using the -xtscc- command to estimate a macroeconomic model. The dataset has 133 countries and 33 years (unbalanced panel).
My question pertains to the lag length used in a fixed-effects model.

Variables: dlpccarb: log of per capita carbon emissions, dlrgdp: log of per capita GDP, dlpopden: log of population density, frleg: institutional integrity, fr_lrgdp: interaction between institutional integrity and GDP, lfoss: log of fossil fuels, renew: renewables (% of energy consumption)

Code:
xtscc dlpccarb L.dlpccarb dlrgdp dlrgdp2 D.frleg D.frleg2 dlpopden  D.fr_lrgdp D.fr2_lrgdp2 D.renew D.lfoss period*, fe lag(7)
xtscc dlpccarb L.dlpccarb dlrgdp dlrgdp2 D.frleg D.frleg2 dlpopden  D.fr_lrgdp D.fr2_lrgdp2 D.renew D.lfoss period*, fe lag(6)
My results from both these models are pretty similar. However, the standard errors with lag 6 are greater than with lag 7.

Generally, we use AIC or BIC to decide on lag length. But among the postestimation commands for xtscc I couldn't find -estat ic-, which we would use otherwise. Question 1: How do I decide which lag length is more appropriate?

I also thought of determining lag length based on -xtunitroot fisher-
Code:
. xtunitroot fisher lrgdp, dfuller trend lags(7)
(551 missing values generated)

Fisher-type unit-root test for lrgdp
Based on augmented Dickey-Fuller tests
--------------------------------------
Ho: All panels contain unit roots           Number of panels       =    133
Ha: At least one panel is stationary        Avg. number of periods =  32.37

AR parameter: Panel-specific                Asymptotics: T -> Infinity
Panel means:  Included
Time trend:   Included
Drift term:   Not included                  ADF regressions: 7 lags
------------------------------------------------------------------------------
                                  Statistic      p-value
------------------------------------------------------------------------------
 Inverse chi-squared(266)  P       349.1656       0.0005
 Inverse normal            Z         2.5666       0.9949
 Inverse logit t(659)      L*        1.5204       0.9356
 Modified inv. chi-squared Pm        3.6057       0.0002
------------------------------------------------------------------------------
 P statistic requires number of panels to be finite.
 Other statistics are suitable for finite or infinite number of panels.
------------------------------------------------------------------------------
Code:
. xtunitroot fisher lrgdp, dfuller trend lags(6)
(551 missing values generated)

Fisher-type unit-root test for lrgdp
Based on augmented Dickey-Fuller tests
--------------------------------------
Ho: All panels contain unit roots           Number of panels       =    133
Ha: At least one panel is stationary        Avg. number of periods =  32.37

AR parameter: Panel-specific                Asymptotics: T -> Infinity
Panel means:  Included
Time trend:   Included
Drift term:   Not included                  ADF regressions: 6 lags
------------------------------------------------------------------------------
                                  Statistic      p-value
------------------------------------------------------------------------------
 Inverse chi-squared(266)  P       313.2160       0.0247
 Inverse normal            Z         2.7447       0.9970
 Inverse logit t(654)      L*        1.9702       0.9754
 Modified inv. chi-squared Pm        2.0471       0.0203
------------------------------------------------------------------------------
 P statistic requires number of panels to be finite.
 Other statistics are suitable for finite or infinite number of panels.
------------------------------------------------------------------------------
In this case, two tests suggest rejecting the unit root and two suggest not rejecting it. Question 2: Could you please tell me which of the 4 tests (inverse chi-squared, inverse normal, inverse logit and modified inverse chi-squared) is the most appropriate for deciding on lag length?

How to Create a Twoway Graph with a few lines of quantile Plot

Good morning everyone,

I am a newbie.
Could anyone please help me create a twoway graph with a few quantile plot lines?

It should look like this:

Thank you,
David

Changing data based on condition

Hi,

I am using Stata 15 on Windows 10 OS. I have data collected in several rounds as demonstrated in the example below

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input byte(id round var1 var2) str3 var3 int var4 str3 var5
1 1 1 25 "Yes" 2500 "Yes"
1 2 1 23 "Yes" 2500 "Yes"
2 1 1 60 "Yes" 1000 "No" 
2 2 2 60 "No"  1000 "No" 
3 1 2 75 "Yes" 3500 "Yes"
3 2 2 75 "Yes" 3500 "No" 
end
The goal is to overwrite the value in the earlier round wherever the two rounds disagree. For instance, for id 1, change var2 in round 1 to 23; for id 2, change var1 and var3 in round 1 to 2 and "No" respectively. I would appreciate a simple way to achieve this.

Thanks in advance!

Best,
Stephen.

About odds ratios: why do the two commands give different results?

I have the following table of counts:

               drug
           used   unused
case         55      128
control      19      164

When I use the command cci 55 128 19 164, I get an odds ratio of 3.708882 with 95% CI (2.039667, 6.941302).
But when I use the command logit with the or option, the results look like this:

logit,or

Logistic regression                             Number of obs     =        366
                                                LR chi2(1)        =      22.73
                                                Prob > chi2       =     0.0000
Log likelihood = -242.32729                     Pseudo R2         =     0.0448

------------------------------------------------------------------------------
           y | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   3.708882   1.079555     4.50   0.000     2.096434    6.561524
       _cons |   .7804878   .0920515    -2.10   0.036     .6194049    .9834621
------------------------------------------------------------------------------
Note: _cons estimates baseline odds.

The odds ratio is the same, but the 95% CI is different. Why?
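
The logit interval above can be reproduced by hand, which suggests where the difference comes from. The sketch below assumes the logit interval is the usual Wald interval built on the log odds ratio (sometimes called the Woolf interval); as far as I know, cci reports an exact interval by default, which would explain the wider limits. The variable names are introduced here for illustration only.

```python
import math

# Cell counts from the post: cases (used, unused), controls (used, unused)
a, b, c, d = 55, 128, 19, 164

# Cross-product odds ratio
odds_ratio = (a * d) / (b * c)

# Standard error of log(OR) for a 2x2 table
se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)

# Wald interval computed on the log scale, then exponentiated
z = 1.959964  # 97.5th percentile of the standard normal
lower = math.exp(math.log(odds_ratio) - z * se_log_or)
upper = math.exp(math.log(odds_ratio) + z * se_log_or)

# Delta-method standard error of the odds ratio itself
se_or = odds_ratio * se_log_or

print(f"OR = {odds_ratio:.6f}, SE = {se_or:.6f}, 95% CI = ({lower:.6f}, {upper:.6f})")
```

The printed values agree with the x row of the logit table (odds ratio 3.708882, interval roughly 2.0964 to 6.5615), not with the cci interval. If the assumption above is right, adding the woolf option to cci should give the same limits as logit.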

Estimate utility from going to school - Discrete choice model.

Hello,

I need to estimate utility from going to school in a discrete choice model.

In the dataset, I have some variables: "school" = 1 if a student goes to school, = 0 if not; “dist” is the distance to the closest higher education institution in kilometers; “parentcollege” is a dummy = 1 if at least one of the child’s parents went to college; and "ability in math".

I know that for such a binary choice model, I could do logit estimation. But that estimates the probability that a student goes to school, right? I do not know how to estimate the utility from going to school.

If someone has an idea how to estimate the utility, please help me.

Thank you!!!
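
For reference, the textbook random-utility reading of a binary logit links the two quantities in the question. The notation below ($U_i$, $V_i$, $\beta$, and $\mathrm{math}_i$ for the math ability variable) is introduced here and does not come from the post; utility is measured relative to not going to school, which is normalized to zero.

```latex
\begin{align}
U_i &= V_i + \varepsilon_i, \qquad
V_i = \beta_0 + \beta_1\,\mathrm{dist}_i + \beta_2\,\mathrm{parentcollege}_i
      + \beta_3\,\mathrm{math}_i, \\
\mathrm{school}_i &= \mathbf{1}[\,U_i > 0\,], \\
\Pr(\mathrm{school}_i = 1) &= \frac{\exp(V_i)}{1 + \exp(V_i)}
      \quad \text{if } \varepsilon_i \text{ is standard logistic.}
\end{align}
```

Under these assumptions the logit coefficients are estimates of the systematic utility $V_i$, identified only up to the scale fixed by the logistic error. In Stata, the linear prediction after logit (predict with the xb option) would then be the estimate of $V_i$.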


Alternative to tobit regression for left-censored biomarker variable

Hi,

I'm a novice when it comes to stats so apologies if the question is a bit stupid!

I'm currently trying to do multivariate analysis on a biomarker variable, which is my DV. This variable is left-censored due to the limit of detection of the assay. The variable is non-normal, even after log-transformation. For these reasons, linear regression is inappropriate. I have explored tobit regression, but due to a high degree of heteroscedasticity the model is very inconsistent. I've also tried quartile regression with ordinal regression; however, the test for parallel lines is significant, so I don't feel this method is appropriate either. Could anyone advise me on an alternative method? I'm currently using SPSS but also have access to Stata.

Thanks,

Claire

pscore or pscore 2

Can anyone help me with the pscore2 command, or provide details on the authors who wrote it?

weighted Kappa

Hi, I was wondering if anyone could help?
I have tried the help files and looked online for how to get Stata to compute a weighted kappa score.

A colleague and I rated guidelines on a 7-point scale (1 being poorest and 7 being best) across 23 questions. When I computed the unweighted kappa, there was significant disagreement, but in many cases the difference is only 1 or 2 points, e.g. I scored Q14 a 4 and he scored it a 5. My question: using quadratic weights of 1.00, 0.89, 1.00, 0.56, 0.89, 1.00, 0.00, 0.56, 0.89, 1.00, should I be able to generate a weighted kappa? I can't seem to get it to work. I'm not a statistician, so apologies if this seems very easy.

Problem with plotting a decision tree using Stata 16's Stata/Python integration

Dear all,

I have a problem with the Stata/Python integration. I would like to plot a tree after using "DecisionTreeClassifier()" from the Scikit-Learn Python library. When I run the code (see below) in Python it works perfectly, but when I run the same code in Stata it comes up with this error:

Traceback (most recent call last):

File "<stdin>", line 1, in <module>
File "/anaconda3/lib/python3.7/site-packages/pydotplus/graphviz.py", line 1797, in <lambda>
lambda f=frmt, prog=self.prog: self.create(format=f, prog=prog)
File "/anaconda3/lib/python3.7/site-packages/pydotplus/graphviz.py", line 1960, in create
'GraphViz\'s executables not found')
pydotplus.graphviz.InvocationException: GraphViz's executables not found
(1 line skipped)


It seems Stata does not find an executable file when using graphviz.

The Stata code I run is this one (the Python code is the same except for the line "import sfi").

-------------------------------------------------------------------------
python:
# Load libraries
import sfi
from sklearn.tree import DecisionTreeClassifier
from sklearn import datasets
from IPython.display import Image
from sklearn import tree
import pydotplus
# Load data
iris = datasets.load_iris()
X = iris.data
y = iris.target
# Create decision tree classifier object
clf = DecisionTreeClassifier(random_state=0)
# Train model
model = clf.fit(X, y)
# Create DOT data
dot_data = tree.export_graphviz(clf, out_file=None,
                                feature_names=iris.feature_names,
                                class_names=iris.target_names)
# Draw graph
graph = pydotplus.graph_from_dot_data(dot_data)
# Show graph
Image(graph.create_png())
end

-------------------------------------------------------------------------


Any help?

Thanks in advance.

Best,

Giovanni

--

Dr. Giovanni Cerulli

IRcRES-CNR

Phone: 003949937846

Mobile: 00393475283966
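
The InvocationException in the traceback is raised when pydotplus cannot locate the Graphviz executables (for example dot). One possible cause, offered here as an assumption rather than a diagnosis, is that the Python session started by Stata sees a different PATH than the terminal session where the code works. A minimal sketch for checking and extending PATH from inside Python follows; `graphviz_bin` is a hypothetical directory and `ensure_on_path` is a helper introduced here for illustration.

```python
import os
import shutil

# Hypothetical location of the Graphviz binaries; replace with the real one
graphviz_bin = "/usr/local/bin"


def ensure_on_path(directory):
    """Append directory to PATH for this process if it is not already listed."""
    parts = [p for p in os.environ.get("PATH", "").split(os.pathsep) if p]
    if directory not in parts:
        parts.append(directory)
        os.environ["PATH"] = os.pathsep.join(parts)


ensure_on_path(graphviz_bin)

# shutil.which returns the full path to "dot" if it can be found, else None
print(shutil.which("dot"))
```

Inside Stata, the same lines could go at the top of the python: block, before graph.create_png() is called. If shutil.which("dot") still prints None, the directory is probably not where Graphviz is installed.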

Survival analysis with interval censoring

Hi experts,

I am working on survival analysis with interval censoring. In our dataset, each subject is required to attend regular checkups, and if it fails to show up at a particular checkup, we know that it has died. However, we don't know exactly when it died, only that it happened in the interval between the last attended checkup and the first unattended checkup.

I am aware of stintreg, which is specifically designed to handle this kind of data, but it requires a distributional assumption.

I am wondering whether it is possible to use a semi-parametric or non-parametric method in this case.

What happens if I use stcox and record the death at the first unattended checkup? How does stcox deal with interval censoring?

Thanks for your help!

Mi predict basesurv

I am having problems trying to get the baseline survivor function after mi predict.
I am using the following code:
mi predict basesurv using miest, basesurv

and get the following error:
option basesurv not allowed r(198)

Any suggestions?

Thanks!

Display variables with specific number

Hi there,
I want to find all variables that contain a specific number. For example, I want to find all the variables in the dataset that contain -222. How should I do this in Stata?

How to report a constant term in a probit model?

Hi all,

I have a probit regression and I present the results based on marginal effects. I was told that I needed to report the constant term as well.

Is there a way to do that? Stata provides me with a number when I use

Code:
margins, dydx(_cons)
but I highly doubt that it has a meaning. Should I report the constant term as it is or is there another way to follow to make it meaningful?

Thanks!

Difference-in-difference with same group but three time periods

Hello Statalist community.

I have a question regarding a difference-in-difference regression I want to run. I assume that it is rather trivial; however, I have not yet been able to confidently solve my problem.

In my research I am investigating the impact of a policy which came into effect in 2005. The diff-in-diff regression I developed for the overall assessment is the following:

(1) P_it = alpha ETS_i + beta post + gamma ETS_i * post + delta_i + epsilon_t + zeta,

where P_it is the patent output of firm i in year t; ETS_i is a dummy equal to one for a firm that becomes regulated in 2005; post is a dummy equal to one for the post-treatment period; ETS_i * post is the interaction effect; delta_i captures firm fixed effects; epsilon_t captures common shocks to firms; and zeta is the error term. The main coefficient of interest is gamma, which measures the policy effect on the patent output of firms.

Now I want to extend this formula to assess a policy refinement which came into action in 2008 and am wondering how to extend the model. I am interested in particular in assessing:
  1. The impact of phase 1 of the policy (2005-2007)
  2. The impact of phase 2 of the policy (2008-2012)
  3. The phase difference, i.e. is there a significant difference in the impact of phase 1 versus phase 2
I have read quite some articles and posts now, but in each of them the extension of the diff-in-diff always considered multiple time periods (>2) and multiple groups (>2). In my case I study the same groups (=2) over multiple periods (=3; period 1 is the pre-phase, period 2 is phase 1 and period 3 is phase 2).

My question is now: How do I extend this model? Can I just add a dummy and another interaction term (e.g. phase1 and phase2 for the first and second period as depicted below)?

(2) P_it = alpha ETS_i + beta phase1 + eta phase2 + gamma ETS_i * phase1 + theta ETS_i * phase2 + iota ETS_i * phase1 * phase2 + delta_i + epsilon_t + zeta

I assume that this is not possible but also do not know how to further continue. Any help would be appreciated!

Thank you
Lennart
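
One common way to write the three-period version is sketched below, under assumptions that are not stated in the post: $\mathrm{phase1}_t$ equals one in 2005-2007 and $\mathrm{phase2}_t$ equals one in 2008-2012, so the two dummies are never one in the same year. Their product is then zero in every year and the triple interaction in (2) drops out. With firm fixed effects and year effects in the model, the main effects of $\mathrm{ETS}_i$ and of the phase dummies are absorbed as well.

```latex
\begin{align}
P_{it} &= \gamma\,(\mathrm{ETS}_i \times \mathrm{phase1}_t)
        + \theta\,(\mathrm{ETS}_i \times \mathrm{phase2}_t)
        + \delta_i + \epsilon_t + \zeta_{it}, \\
H_0 &: \gamma = \theta .
\end{align}
```

In this form $\gamma$ is the phase 1 effect and $\theta$ the phase 2 effect, each relative to the pre-2005 period, and a test of $H_0$ addresses whether the two phases differ.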

Expression too long error while finding synthetic control

Dear all,

I am using Stata 15.0 to find a synthetic control group using synth command. No matter how short my expression is, I receive the message "Expression too long" r(130).

The dataset consists of 540 time periods and 83 units (44,820 observations overall).

My shortest try was:

Code:
. synth ln_births urbanization, trunit(36) trperiod(510) counit(1 7)

------------------------------------------------------------------------------------------------------------------
Synthetic Control Method for Comparative Case Studies
------------------------------------------------------------------------------------------------------------------

First Step: Data Setup
------------------------------------------------------------------------------------------------------------------
expression too long
r(130);
What can be the problem?

Thank you!

Conditional Logistic Regression

Would anybody know if a software add-on similar to the Conditional Logistic Regression functionality in Stata with Dependent, Independent and Group Variable inputs, can be purchased for Excel?
Thanks