Missing Wald Chi Squared in multilevel model

October 23, 2019, 2:05 pm

Dear Statalist,

I am running into a missing Wald chi-squared statistic in a multilevel logit model in Stata 14. The data are cross-sectional, with respondents (500-1003 obs per group) nested in 9 countries. The dependent variable is binary, and I am attempting to run a melogit command of the form:

Code:

melogit y x1-x18 [pweight=weightvar] || country:, or

I wondered whether the problem had to do with my degrees of freedom, given my 18 variables and only 9 country clusters, something like what Clyde Schechter pointed out about clustered standard errors here: https://www.statalist.org/forums/for...ssing-wald-chi

However, the problem appears to come more directly from the use of sampling weights, which are calculated based upon national demographic data from the 9 countries. When I run the model without the weights, I get the Wald chi2 & prob>chi2 test statistics with the 18 variables and 9 country clusters. Once the weights are included, these statistics do not display, but the model converges and I get odds ratios in the resulting table. Also, the LR test vs. logistic model is significant, indicating that a multilevel model is warranted for the unweighted model.

Any diagnostic thoughts would be great and advice is obviously welcome.

Sincerely,

Eric

↧

Compile / Export all table values from -estat phtest, detail-

October 23, 2019, 2:06 pm

Dear all,

I am running several Cox PH survival models and use Schoenfeld residuals as a first layer of tests for the proportional-hazards assumption. I am interested not only in the global test results but would also like to report the values for each individual variable. The problem: as I am running at least a dozen tests for several models, I need to compile and export all tables into *.tex files automatically to stay efficient (just as one compiles regression tables via -estout-, say). However, it seems I cannot access these values from Stata's stored results (see -ereturn list- in the code below).

Question: Is there a way to fetch all values reported by -estat phtest, detail- and compile / export them into a (neat) table, ideally in .tex format?

Here is sample code and a data set for my issue:

Code:

version 14

webuse drugtr

stcox drug age

estat phtest, detail

ereturn list // see for info

// compile entire table to *.tex

Thanks a lot for your help in advance,
Michael

↧

Estimate choice model - Data issue

October 23, 2019, 2:28 pm

Hello,

I am trying to estimate the schooling decision in a 2-period model. In t=1, students decide whether to go to high school. In t=2, they decide whether to go to college.
In the data set, there is one variable "school": in period 1 it equals 1 if the student goes to high school and 0 if not; in period 2 it equals 1 if the student goes to college and 0 if not. Another variable, "period", equals 1 for period 1 and 2 for period 2.
There are also other variables describing the "qualities" of students.
I need to estimate the utility from going to high school in period 1. The problem is that the data set is not split by period,
so if I run a regression such as "logit school .....", it will mix up both periods.

How can I estimate each period separately?

If anyone has an idea, please let me know. Thanks a lot!

↧

Using Suest to test coefficients across ARIMA models

October 23, 2019, 3:28 pm

Hello,
I am running several ARIMA models and I want to test the equivalence of the coefficients with the test command. However, I get the following error:
first was estimated with a nonstandard vce (opg)
where "first" is the name of my first model. To be clear, here is the code I am running.

quietly arima lmp d1.lp if fp==1, arima(1,0,2)
estimates store first

quietly arima lmp d1.lp if sp==1, arima(2,0,0)
estimates store second

quietly arima lmp d1.lp if tp==1, arima(1,0,0)
estimates store third

quietly arima lmp d1.lp, arima(1,0,2)
estimates store whole

suest first second third whole

Thanks. Are there any substitutes if it doesn't work?
Joshua

↧

Determining lag length in a panel dataset.

October 23, 2019, 4:52 pm

Hi all,

I'm using the -xtscc- command to estimate a macroeconomic model. The dataset has 133 countries and 33 years (unbalanced panel).
My question pertains to the lag length used in a fixed-effects model.

Variables: dlpccarb: log of per capita carbon emissions, dlrgdp: log of per capita GDP, dlpopden: log of population density, frleg: institutional integrity, fr_lrgdp: interaction between institutional integrity and GDP, lfoss: log of fossil fuels, renew: renewables (% of energy consumption)

Code:

xtscc dlpccarb L.dlpccarb dlrgdp dlrgdp2 D.frleg D.frleg2 dlpopden  D.fr_lrgdp D.fr2_lrgdp2 D.renew D.lfoss period*, fe lag(7)
xtscc dlpccarb L.dlpccarb dlrgdp dlrgdp2 D.frleg D.frleg2 dlpopden  D.fr_lrgdp D.fr2_lrgdp2 D.renew D.lfoss period*, fe lag(6)

My results from both these models are pretty similar. However, the standard errors with lag 6 are greater than with lag 7.

Generally, we use AIC/BIC to decide on lag length. But among the postestimation commands for -xtscc- I couldn't find -estat ic-, which we would use otherwise. (Question 1) How do I decide which lag length is more appropriate?
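One possible starting point, offered as a sketch rather than a recommendation: in -xtscc- the lag() option sets the truncation lag of the autocorrelation correction in the standard errors, not a lag in the model itself, so information criteria do not apply to it directly. A plug-in rule often quoted for this kind of truncation lag is m(T) = floor(4 (T/100)^(2/9)), and as far as I recall this is what the -xtscc- documentation describes as its default when lag() is omitted; please verify against the help file. The function name below is hypothetical, and T = 33 is taken from the post.

```python
import math


def plug_in_lag(n_periods):
    """Plug-in truncation lag m(T) = floor(4 * (T / 100) ** (2 / 9))."""
    return math.floor(4 * (n_periods / 100) ** (2 / 9))


# T = 33 years, as stated in the post (the panel is unbalanced, so this is the maximum T).
suggested_lag = plug_in_lag(33)
print(suggested_lag)
```

Under this rule a panel of this length would point to a much shorter lag than 6 or 7, which may be worth comparing against the two specifications above.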

I also thought of determining lag length based on -xtunitroot fisher-

Code:

. xtunitroot fisher lrgdp, dfuller trend lags(7)
(551 missing values generated)

Fisher-type unit-root test for lrgdp
Based on augmented Dickey-Fuller tests
--------------------------------------
Ho: All panels contain unit roots           Number of panels       =    133
Ha: At least one panel is stationary        Avg. number of periods =  32.37

AR parameter: Panel-specific                Asymptotics: T -> Infinity
Panel means:  Included
Time trend:   Included
Drift term:   Not included                  ADF regressions: 7 lags
------------------------------------------------------------------------------
                                  Statistic      p-value
------------------------------------------------------------------------------
 Inverse chi-squared(266)  P       349.1656       0.0005
 Inverse normal            Z         2.5666       0.9949
 Inverse logit t(659)      L*        1.5204       0.9356
 Modified inv. chi-squared Pm        3.6057       0.0002
------------------------------------------------------------------------------
 P statistic requires number of panels to be finite.
 Other statistics are suitable for finite or infinite number of panels.
------------------------------------------------------------------------------

Code:

. xtunitroot fisher lrgdp, dfuller trend lags(6)
(551 missing values generated)

Fisher-type unit-root test for lrgdp
Based on augmented Dickey-Fuller tests
--------------------------------------
Ho: All panels contain unit roots           Number of panels       =    133
Ha: At least one panel is stationary        Avg. number of periods =  32.37

AR parameter: Panel-specific                Asymptotics: T -> Infinity
Panel means:  Included
Time trend:   Included
Drift term:   Not included                  ADF regressions: 6 lags
------------------------------------------------------------------------------
                                  Statistic      p-value
------------------------------------------------------------------------------
 Inverse chi-squared(266)  P       313.2160       0.0247
 Inverse normal            Z         2.7447       0.9970
 Inverse logit t(654)      L*        1.9702       0.9754
 Modified inv. chi-squared Pm        2.0471       0.0203
------------------------------------------------------------------------------
 P statistic requires number of panels to be finite.
 Other statistics are suitable for finite or infinite number of panels.
------------------------------------------------------------------------------

In this case, two of the statistics reject the unit-root null and two do not. (Question 2) Could you please tell me which of the four statistics (inverse chi-squared, inverse normal, inverse logit, and modified inverse chi-squared) is the most appropriate for deciding on lag length?

↧

How to Create a Twoway Graph with a few lines of quantile Plot

October 23, 2019, 7:50 pm

Good morning everyone,

I am a newbie.
Could anyone please help me create a twoway graph with a few lines of quantile plots?

It should look like this

Thank you,
David

↧

Changing data based on condition

October 23, 2019, 11:15 pm

Hi,

I am using Stata 15 on Windows 10 OS. I have data collected in several rounds as demonstrated in the example below

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input byte(id round var1 var2) str3 var3 int var4 str3 var5
1 1 1 25 "Yes" 2500 "Yes"
1 2 1 23 "Yes" 2500 "Yes"
2 1 1 60 "Yes" 1000 "No" 
2 2 2 60 "No"  1000 "No" 
3 1 2 75 "Yes" 3500 "Yes"
3 2 2 75 "Yes" 3500 "No" 
end

The goal is to overwrite the earlier round wherever the rounds disagree. For instance, for id 1 change var2 in round 1 to 23; for id 2 change var1 and var3 in round 1 to 2 and "No" respectively. I would appreciate a simple way of achieving this.
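To make the intended logic concrete, here is a minimal sketch in plain Python (not Stata) of the rule described above: within each id, every earlier round is overwritten with the values from the latest round. It assumes the latest round is always the one to trust; the function name overwrite_with_latest is hypothetical, and the rows are the -dataex- example from the post.

```python
rows = [
    {"id": 1, "round": 1, "var1": 1, "var2": 25, "var3": "Yes", "var4": 2500, "var5": "Yes"},
    {"id": 1, "round": 2, "var1": 1, "var2": 23, "var3": "Yes", "var4": 2500, "var5": "Yes"},
    {"id": 2, "round": 1, "var1": 1, "var2": 60, "var3": "Yes", "var4": 1000, "var5": "No"},
    {"id": 2, "round": 2, "var1": 2, "var2": 60, "var3": "No", "var4": 1000, "var5": "No"},
    {"id": 3, "round": 1, "var1": 2, "var2": 75, "var3": "Yes", "var4": 3500, "var5": "Yes"},
    {"id": 3, "round": 2, "var1": 2, "var2": 75, "var3": "Yes", "var4": 3500, "var5": "No"},
]


def overwrite_with_latest(records, key="id", order="round"):
    """Copy each id's latest-round values onto all of its rounds."""
    latest = {}
    for rec in records:
        if rec[key] not in latest or rec[order] > latest[rec[key]][order]:
            latest[rec[key]] = rec
    fixed = []
    for rec in records:
        new_rec = dict(latest[rec[key]])  # start from the latest round's values
        new_rec[order] = rec[order]       # but keep this row's own round number
        fixed.append(new_rec)
    return fixed


fixed_rows = overwrite_with_latest(rows)
for rec in fixed_rows:
    print(rec)
```

Note that under this rule id 3 also changes: var5 in round 1 becomes "No", since the two rounds disagree there as well.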

Thanks in advance!

Best,
Stephen.

↧

About odds ratios: why do the two commands give different results?

October 23, 2019, 11:39 pm

I have a table of counts, as follows:

             drug
           used   unused
case         55      128
control      19      164

When I use the command cci 55 128 19 164, I get an odds ratio of 3.708882 with 95% CI (2.039667, 6.941302),
but when I use the command logit, or
the results look like this:

logit,or

Logistic regression                             Number of obs     =        366
                                                LR chi2(1)        =      22.73
                                                Prob > chi2       =     0.0000
Log likelihood = -242.32729                     Pseudo R2         =     0.0448

------------------------------------------------------------------------------
           y | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   3.708882   1.079555     4.50   0.000     2.096434    6.561524
       _cons |   .7804878   .0920515    -2.10   0.036     .6194049    .9834621
------------------------------------------------------------------------------
Note: _cons estimates baseline odds.

The odds ratio is the same, but the 95% CI is different. Why?
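A likely explanation, sketched here as a check rather than a definitive answer: the logit interval is a Wald interval computed on the log-odds scale, exp(ln(OR) ± z·SE), with SE = sqrt(1/a + 1/b + 1/c + 1/d) for a 2x2 table (often called the Woolf interval), whereas cci, as far as I know, reports an exact interval by default. The short Python calculation below rebuilds the Wald interval from the four counts in the post; the function name woolf_ci is made up for this illustration.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def woolf_ci(a, b, c, d, level=0.95):
    """Odds ratio and Wald (Woolf) interval for the 2x2 table [[a, b], [c, d]]."""
    odds_ratio = (a * d) / (b * c)
    se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    lower = exp(log(odds_ratio) - z * se_log_or)
    upper = exp(log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Counts from the post: cases 55 used / 128 unused, controls 19 used / 164 unused.
odds_ratio, lower, upper = woolf_ci(55, 128, 19, 164)
print(round(odds_ratio, 6), round(lower, 4), round(upper, 4))
```

The result agrees with the logit output above to about three decimals, which suggests the two commands differ only in the method used for the interval, not in the point estimate.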

↧

Estimate utility from going to school - Discrete choice model.

October 24, 2019, 12:58 am

Hello,

I need to estimate utility from going to school in a discrete choice model.

In the dataset, I have some variables: "school" = 1 if a student goes to school, = 0 if not; “dist” is the distance to the closest higher education institution in kilometers; “parentcollege” is a dummy = 1 if at least one of the child’s parents went to college; and "ability in math".

I know that for such a binary choice model I could use logit estimation. But that estimates the probability that a student goes to school, right? I do not know how to estimate the utility from going to school.
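One way to read this, offered as a sketch under the usual random-utility interpretation of the logit model: the fitted linear index x'b is the estimated deterministic part of the utility difference between going and not going (identified only up to scale), and the predicted probability is just the logistic transform of that index. The coefficients and covariate values below are entirely hypothetical and only illustrate the mapping in both directions.

```python
from math import exp, log


def logistic(v):
    """Probability implied by a utility index v under the logit model."""
    return 1 / (1 + exp(-v))


def logit(p):
    """Utility index implied by a probability p (inverse of logistic)."""
    return log(p / (1 - p))


# Hypothetical coefficients: constant, dist (km), parentcollege, ability in math.
b_cons, b_dist, b_parent, b_math = 0.5, -0.02, 0.8, 0.3

# Hypothetical student: 10 km away, a parent went to college, math ability 1.5.
utility_index = b_cons + b_dist * 10 + b_parent * 1 + b_math * 1.5
probability = logistic(utility_index)
print(round(utility_index, 3), round(probability, 3))
```

In other words, the linear prediction after a logit fit is already the estimated utility index; the probability is a by-product of it.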

If someone has an idea how to estimate the utility, please help me.

Thank you!!!

↧

Alternative to tobit regression for left-censored biomarker variable

October 24, 2019, 1:53 am

Hi,

I'm a novice when it comes to stats so apologies if the question is a bit stupid!

I'm currently trying to do multivariate analysis on a biomarker variable, which is my DV. This variable is left-censored due to the limit of detection of the assay. The variable is non-normal, even after log-transformation. For these reasons, linear regression is inappropriate. I have explored tobit regression, but due to a high degree of heteroscedasticity the model is very inconsistent. I've also tried quartile regression with ordinal regression; however, the test for parallel lines is significant, so I don't feel this method is appropriate either. Could anyone advise me on an alternative method? I'm currently using SPSS but also have access to Stata.

Thanks,

Claire

↧

pscore or pscore 2

October 24, 2019, 2:08 am

Can anyone help me with the pscore2 command, or give me details on the authors who wrote pscore2?

↧

weighted Kappa

October 24, 2019, 3:08 am

Hi, I was wondering if anyone could help.
I have tried the help files and looked online for how to get Stata to compute a weighted kappa.

A colleague and I rated guidelines on a 7-point scale (1 being poorest and 7 being best) across 23 questions. When I computed the (unweighted) kappa there was significant disagreement, but in many cases the difference is only 1 or 2, e.g. me scoring Q14 a 4 and him scoring it a 5. My question: using quadratic weights of 1.00, 0.89, 1.00, 0.56, 0.89, 1.00, 0.00, 0.56, 0.89, 1.00, should it be possible to generate a weighted kappa? I can't seem to get it to work. I'm not a statistician, so apologies if this seems very easy.
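A note on the weights, with a small sketch: the ten numbers quoted above are the lower triangle of the quadratic weight matrix for a 4-category scale, w_ij = 1 - (i - j)^2 / (k - 1)^2 with k = 4. For a 7-point scale the same formula with k = 7 gives a 7x7 matrix, which may be why those weights do not fit this data. The Python below (function names are my own, not Stata's) builds the weights for any k and computes a weighted kappa from two lists of ratings coded 1..k.

```python
def quadratic_weights(k):
    """Agreement weights w[i][j] = 1 - (i - j)^2 / (k - 1)^2 for k ordered categories."""
    return [[1 - (i - j) ** 2 / (k - 1) ** 2 for j in range(k)] for i in range(k)]


def weighted_kappa(rater_a, rater_b, k):
    """Quadratic-weighted kappa for two equal-length lists of ratings coded 1..k."""
    n = len(rater_a)
    w = quadratic_weights(k)
    # Joint proportions of (rater A category, rater B category).
    p = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        p[a - 1][b - 1] += 1 / n
    row = [sum(p[i]) for i in range(k)]
    col = [sum(p[i][j] for i in range(k)) for j in range(k)]
    observed = sum(w[i][j] * p[i][j] for i in range(k) for j in range(k))
    expected = sum(w[i][j] * row[i] * col[j] for i in range(k) for j in range(k))
    return (observed - expected) / (1 - expected)


# Last row of the 4-category matrix, rounded as in the post.
print([round(x, 2) for x in quadratic_weights(4)[3]])
# Weight given to a one-point disagreement on a 7-point scale.
print(round(quadratic_weights(7)[3][4], 4))
```

In Stata itself, I believe kap accepts a wgt(w2) option that applies these quadratic weights for whatever number of categories is present, but please check the kap help file before relying on that.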

↧

Problem with plotting a decision tree using Stata 16's Stata/Python integration

October 24, 2019, 3:39 am

Dear all,

I have a problem with the Stata/Python integration. I would like to plot a tree after using "DecisionTreeClassifier()" from the scikit-learn Python library. When I run the code (see below) in Python it works perfectly, but when I run the same code in Stata it produces this error:

Traceback (most recent call last):

File "<stdin>", line 1, in <module>
File "/anaconda3/lib/python3.7/site-packages/pydotplus/graphviz.py", line 1797, in <lambda>
lambda f=frmt, prog=self.prog: self.create(format=f, prog=prog)
File "/anaconda3/lib/python3.7/site-packages/pydotplus/graphviz.py", line 1960, in create
'GraphViz\'s executables not found')
pydotplus.graphviz.InvocationException: GraphViz's executables not found
(1 line skipped)

It seems Stata does not find an executable file when using graphviz.

The Stata code I run is this one (the Python code is the same except for the line "import sfi").

-------------------------------------------------------------------------
python:
# Load libraries
import sfi
from sklearn.tree import DecisionTreeClassifier
from sklearn import datasets
from IPython.display import Image
from sklearn import tree
import pydotplus
# Load data
iris = datasets.load_iris()
X = iris.data
y = iris.target
# Create decision tree classifier object
clf = DecisionTreeClassifier(random_state=0)
# Train model
model = clf.fit(X, y)
# Create DOT data
dot_data = tree.export_graphviz(clf, out_file=None,
                                feature_names=iris.feature_names,
                                class_names=iris.target_names)
# Draw graph
graph = pydotplus.graph_from_dot_data(dot_data)
# Show graph
Image(graph.create_png())
end

-------------------------------------------------------------------------
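A possible diagnostic, assuming the cause is that the Python session started by Stata does not inherit the same PATH as a terminal session: pydotplus shells out to Graphviz's dot program, so dot must be findable on PATH. The sketch below checks for it and, if it is missing, prepends a directory to PATH for the current process. The directory "/anaconda3/bin" is only a guess based on the traceback path, and the helper name ensure_on_path is hypothetical.

```python
import os
import shutil


def ensure_on_path(directory, program="dot"):
    """Prepend `directory` to PATH if `program` is not found, then look it up again."""
    if shutil.which(program) is None:
        os.environ["PATH"] = directory + os.pathsep + os.environ.get("PATH", "")
    return shutil.which(program)


# "/anaconda3/bin" is an assumption; replace it with the folder that contains dot.
dot_location = ensure_on_path("/anaconda3/bin")
print(dot_location)  # None means dot is still not visible to this process
```

Placed at the top of the python: block in Stata, before graph.create_png() is called, this would show whether the executable is visible from inside Stata.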

Any help?

Thanks in advance.

Best,

Giovanni

--

Dr. Giovanni Cerulli

IRcRES-CNR

Phone: 003949937846

Mobile: 00393475283966

↧

Survival analysis with interval censoring

October 24, 2019, 4:25 am

Hi experts,

I am working on survival analysis with interval censoring. In our dataset, each object is required to attend regular checkups, and if it fails to show up at a particular checkup, we know that it has died. However, we do not know exactly when it died, only that death occurred in the interval between the last attended checkup and the most recent unattended checkup.

I am aware of stintreg, which is specifically designed to handle this kind of data, but it requires a distributional assumption.

I am just wondering whether it is possible to use a semi-parametric or non-parametric method in this case.

What happens if I use stcox and record the death at the most recent unattended checkup? How does stcox deal with interval censoring?

Thanks for your help!

↧

Mi predict basesurv

October 24, 2019, 5:50 am

I am having problems trying to get the baseline survivor function after mi predict.
I am using the following code:
mi predict basesurv using miest, basesurv

and get the following error:
option basesurv not allowed r(198)

Any suggestions?

Thanks!

↧

Display variables with specific number

October 24, 2019, 5:52 am

Hi there,
I want to find all variables that contain a specific value. For example, I want to list all the variables in the dataset that contain -222. How can I do this in Stata?

↧

How to report a constant term in a probit model?

October 24, 2019, 5:57 am

Hi all,

I have a probit regression and I present the results based on marginal effects. I was told that I needed to report the constant term as well.

Is there a way to do that? Stata provides me with a number when I use

Code:

margins, dydx(_cons)

but I highly doubt that it has a meaning. Should I report the constant term as it is, or is there another way to make it meaningful?

Thanks!

↧

Difference-in-difference with same group but three time periods

October 24, 2019, 8:04 am

Hello Statalist community.

I have a question regarding a difference-in-differences regression I want to run. I assume that it is rather trivial; however, I have not yet been able to solve my problem with confidence.

In my research I am investigating the impact of a policy that came into effect in 2005. The diff-in-diff regression I developed for the overall assessment is the following:

(1) $P_{it} = \alpha\,\mathrm{ETS}_i + \beta\,\mathrm{post} + \gamma\,\mathrm{ETS}_i \times \mathrm{post} + \delta_i + \epsilon_t + \zeta$,

where $P_{it}$ is the patent output of firm $i$ in year $t$; $\mathrm{ETS}_i$ is a dummy equal to one for a firm that becomes regulated in 2005; $\mathrm{post}$ is a dummy equal to one for the post-treatment period; $\mathrm{ETS}_i \times \mathrm{post}$ is the interaction effect; $\delta_i$ captures firm fixed effects; $\epsilon_t$ captures shocks common to all firms; and $\zeta$ is the error term. The main coefficient of interest is $\gamma$, which measures the effect of the policy on firms' patent output.

Now I want to extend this formula to assess a policy refinement which came into action in 2008 and am wondering how to extend the model. I am interested in particular in assessing:

The impact of phase 1 of the policy (2005-2007)
The impact of phase 2 of the policy (2008-2012)
The phase difference, i.e. is there a significant difference in the impact of phase 1 versus phase 2

I have read quite a few articles and posts now, but in each of them the extension of the diff-in-diff considered multiple time periods (>2) and multiple groups (>2). In my case I study the same groups (=2) over multiple periods (=3; period 1 is the pre-phase, period 2 is phase 1 and period 3 is phase 2).

My question is now: How do I extend this model? Can I just add a dummy and another interaction term (e.g. phase1 and phase2 for the first and second period as depicted below)?

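(2) $P_{it} = \alpha\,\mathrm{ETS}_i + \beta\,\mathrm{phase1} + \eta\,\mathrm{phase2} + \gamma\,\mathrm{ETS}_i \times \mathrm{phase1} + \theta\,\mathrm{ETS}_i \times \mathrm{phase2} + \iota\,\mathrm{ETS}_i \times \mathrm{phase1} \times \mathrm{phase2} + \delta_i + \epsilon_t + \zeta$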

I assume that this is not possible but also do not know how to further continue. Any help would be appreciated!
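One observation that may help, sketched under the assumption that phase1 and phase2 are defined as mutually exclusive period dummies (2005-2007 and 2008-2012, as listed above): no year belongs to both phases, so phase1 * phase2 is zero in every observation and the triple interaction with coefficient iota cannot be estimated. The two ETS-by-phase interactions already carry the phase-specific effects, and the phase difference would then be a test of gamma = theta. The small Python check below builds the dummies for an illustrative range of years; the starting year 2000 is an assumption.

```python
# Years 2000-2012; the pre-period start (2000) is assumed for illustration only.
rows = []
for year in range(2000, 2013):
    phase1 = int(2005 <= year <= 2007)  # phase 1 of the policy
    phase2 = int(2008 <= year <= 2012)  # phase 2 of the policy
    rows.append((year, phase1, phase2, phase1 * phase2))

for row in rows:
    print(row)

# True if the product term is zero in every year.
print(all(product == 0 for _, _, _, product in rows))
```

With firm and year fixed effects in the model, the ETS dummy and the phase dummies on their own are also absorbed, which leaves the two interaction terms as the estimable quantities.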

Thank you
Lennart

↧

Expression too long error while finding synthetic control

October 24, 2019, 8:14 am

Dear all,

I am using Stata 15.0 to find a synthetic control group with the synth command. No matter how short my expression is, I receive the message "Expression too long" r(130).

The dataset consists of 540 time periods and 83 units (44,820 observations overall).

My shortest try was:

Code:

. synth ln_births urbanization, trunit(36) trperiod(510) counit(1 7)

------------------------------------------------------------------------------------------------------------------
Synthetic Control Method for Comparative Case Studies
------------------------------------------------------------------------------------------------------------------

First Step: Data Setup
------------------------------------------------------------------------------------------------------------------
expression too long
r(130);

What could be the problem?

Thank you!

↧

Conditional Logistic Regression

October 24, 2019, 8:38 am

Would anybody know if a software add-on similar to the Conditional Logistic Regression functionality in Stata with Dependent, Independent and Group Variable inputs, can be purchased for Excel?
Thanks

↧