Marginal effects for hurdle negative binomial

December 14, 2016, 6:46 am

Dear Statalisters,

I am running a hurdle negative binomial model on a count-based deprivation index. I want to calculate the marginal effects (of, say, log absolute income, relative income and/or social class) on the deprivation counts. I tried margins [margins, dydx(*) predict(equation(#2))] after the user-written command hnblogit, but that does not work: I just get the same regression results back.

What would you recommend as an efficient way of calculating marginal effects (ideally for all the predictors) for a negative binomial hurdle model?

Many thanks in advance,

Selcuk
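
One possible route, sketched under assumptions rather than tested with hnblogit itself: the two parts of a hurdle model can be fit separately with official commands, and margins can be run after each part. The names depcount, lninc, relinc and class are hypothetical stand-ins for the variables in the question.

```stata
* Hypothetical variable names: depcount, lninc, relinc, class

* Part 1: the hurdle (any deprivation vs. none), fit by logit
generate byte anydep = depcount > 0 if !missing(depcount)
logit anydep lninc relinc i.class
margins, dydx(*)                  // average marginal effects on Pr(count > 0)

* Part 2: positive counts, fit by zero-truncated negative binomial
tnbreg depcount lninc relinc i.class if depcount > 0, ll(0)
margins, dydx(*) predict(cm)      // effects on E(count | count > 0)
```

These are effects on each part separately. An effect on the unconditional mean would have to combine both parts, which this sketch does not attempt.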

↧

Regression graph quadratic relationship

December 14, 2016, 7:48 am

I am fitting a regression of the form:
y = b1 + b2*x + b3*x^2 + ... + bk*xk + e

I then want to plot the quadratic relationship between y and x, holding all other variables at their average values. What command should I use for this?
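
A minimal sketch of one common way to do this, using the auto dataset shipped with Stata as a stand-in (price, mpg, weight and length are not the poster's variables). Entering the squared term with factor-variable notation lets margins know that x^2 moves with x.

```stata
sysuse auto, clear

* Quadratic in mpg; c.mpg##c.mpg expands to mpg and mpg squared
regress price c.mpg##c.mpg weight length

* Predicted price over a grid of mpg, other covariates held at their means
margins, at(mpg = (12(2)40)) atmeans

* Plot the predictions with confidence intervals
marginsplot
```

If the square is instead generated by hand as a separate variable, margins treats it as unrelated to x, and the plotted curve will not be the quadratic.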

↧

Creating Dates in Months between Two Dates

December 14, 2016, 7:52 am

Hello everybody,

I have a rather tricky question which I can't seem to answer myself, and therefore I would like to have your input.
I have a list of credit loans, each identified by one specific ID (Var1). Furthermore, I have one Startdate (Var2) and one Enddate (Var3) for each loan in question.
Now, what I need in order to eventually calculate the CF for each specific loan is to GENERATE a variable "Time" (Var4) in which Stata fills in the missing dates in MONTHS for every loan in question.
For example:

ID	Start Date	End Date	Time (Var4)
1	01/2016	05/2016	01/2016
1			02/2016
1			03/2016
1			04/2016
1			05/2016
2	01/2016	04/2016	01/2016
2			02/2016
2			03/2016
2			04/2016
3	01/2016	02/2016	01/2016
3			02/2016

... and so forth, for every loan with a UNIQUE ID.

Thank you very much in advance!
Sincerely,
Nicolas
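
A sketch of one way to build such a variable, assuming the start and end dates arrive as strings like "01/2016" (the names start_s, end_s, startm and endm are made up for the illustration). Each loan is expanded to one row per month and the month counter is added to the start date.

```stata
clear
input byte id str7(start_s end_s)
1 "01/2016" "05/2016"
2 "01/2016" "04/2016"
3 "01/2016" "02/2016"
end

* Convert the strings to Stata monthly dates
generate startm = monthly(start_s, "MY")
generate endm   = monthly(end_s, "MY")

* One observation per month of each loan's life
expand endm - startm + 1
bysort id: generate time = startm + _n - 1
format startm endm time %tm

list id startm endm time, sepby(id)
```

If the dates are already numeric daily dates, mofd() would replace the monthly() step.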

↧

Coding several variables at one time

December 14, 2016, 8:37 am

I have a dataset in the following format:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input long ID byte(CSEDPD CSEDPN CSEDNA CSEDPL CSEDML CSEDDS CSEDAD CSRVLR CSRVRR CSRVRI CSRVDR CSRVDI CSRVCR CSCPLT CSCPRT CSCPM CSCPCR CSCPNO CSDSCG CSDSCR CSDSDA CSDSDD CSDSDL CSDSDS CSBCMA CSBCPTV CSBCCV CSBCPV)
10001 1 1 1 1 1 3 3 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 2 1 1 1 1
10005 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
10006 1 2 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 1 2 2 1 1 1 1 1 1 1 1
10007 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
10010 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
10011 1 1 1 1 1 1 1 1 1 1 1 4 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
10019 1 3 1 3 1 1 1 1 1 1 1 1 1 3 3 3 3 3 1 1 1 1 1 1 1 1 1 1
10021 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 5 5 5 5
10022 1 1 1 1 1 1 1 1 1 5 1 1 5 1 1 1 1 1 1 1 1 1 1 1 1 1 6 1
end

The first variable, ID, holds unique hospital IDs. Then there are 28 variables that represent IT functionalities of each hospital. Each hospital should get 2 points if a particular IT functionality was coded as 1; 1 point if it was coded as 2; and 0 points if it was coded as 3, 4, 5, or 6.
Is there a quick and efficient way to do this? All I can think of is generating 28 new variables and computing a summed score.

Thank you!
Soumya
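
One way to avoid 28 new variables is to accumulate the score in a loop. The sketch below uses three made-up items (f1 to f3) so that it runs on its own; with the posted data the varlist would presumably be CSEDPD-CSBCPV, assuming the variables sit in that order in the dataset.

```stata
clear
input long id byte(f1 f2 f3)
1 1 1 2
2 3 4 1
3 2 2 6
end

* 2 points for a code of 1, 1 point for a code of 2, 0 otherwise
generate score = 0
foreach v of varlist f1-f3 {
    replace score = score + 2*(`v' == 1) + (`v' == 2)
}

list
```

Missing values contribute 0 points here, because both comparisons are false for a missing code; whether that is the desired treatment is a separate decision.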

↧

Difference in differences analysis with fixed effects

December 14, 2016, 9:03 am

Hi everybody,

I'm currently trying to conduct a special diff-in-diff analysis with Stata. My research is about sovereign credit ratings and their impact on cross-border M&A volumes. One of the goals of my study is to test whether the impact of sovereign credit ratings on M&A changed as a result of the crisis. So I don't want to test whether the volume or the ratings changed because of the crisis, but whether the impact of the ratings on the M&A volume did. I think that this cannot be tested with a common diff-in-diff analysis, right? I'm really struggling with this and hope to find some answers here.

Thanks in advance for your help!
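
A change in a slope is usually tested with an interaction term rather than a classic two-group diff-in-diff. The following is only a sketch with hypothetical names (country, year, ma_volume, rating, and a 0/1 indicator crisis for the crisis and post-crisis years), not a specification taken from the post.

```stata
* Hypothetical names: country (numeric id), year, ma_volume, rating, crisis (0/1)
xtset country year

* Country fixed effects, year dummies, and a rating slope that may shift with the crisis
xtreg ma_volume c.rating##i.crisis i.year, fe vce(cluster country)

* 1.crisis#c.rating is the change in the rating slope in the crisis period
test 1.crisis#c.rating
```

With year dummies in the model, the main effect of crisis is collinear with them and Stata will omit one term; the interaction coefficient is unaffected by that.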

↧

How to interpret the DID result (beginner)

December 14, 2016, 9:51 am

Hello!

I have run the DID analysis, but I don't understand how to read the result. I would like to find out the effect of a volcanic eruption on tourism in city 5 (the treatment) compared with 4 other cities (cities 6-9) as the control. I have data from 2007-2014, and the mountain erupted at the end of 2010.

I created: gen time = (year>=2011) & !missing(year) and gen treated = (city>5) & !missing(city)

I have attached the result. I don't know how to interpret it, and your help would mean a lot. Thank you
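
One thing that may matter for the interpretation: as written, (city>5) flags cities 6-9, which the description calls the controls, so the sign of the interaction would be flipped relative to the intended treatment. A sketch of the usual setup, with tourists as a hypothetical outcome name:

```stata
* Hypothetical outcome name: tourists
generate byte treated = (city == 5) if !missing(city)
generate byte post    = (year >= 2011) if !missing(year)

regress tourists i.treated##i.post, vce(robust)

* Reading the output:
*   1.treated          pre-eruption gap between city 5 and the controls
*   1.post             change over time in the control cities
*   1.treated#1.post   the DID estimate: extra change in city 5 after the eruption
```

With a single treated city, the standard errors from this regression should be read with caution.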

↧

coefplot

December 14, 2016, 10:55 am

Hello, I would like to make a graph where the Y axis shows the dependent variables of 22 models and the X axis shows the confidence interval ONLY of the variable POBRELOGIT (an independent variable that takes the value 1 when the person is poor and 0 otherwise). The models are logits.

I have tried the command “coefplot” after logit, but it places all the independent variables on the Y axis, and I don't want that. Furthermore, I believe that this command only works when the models share the same dependent variable. In my case, I have 22 different dependent variables but the same independent variables. Each dependent variable is 1 when the person has received a social program and 0 when not. For example:

logit BPG pobrelogit income edad
logit BVD pobrelogit income edad
logit BCP pobrelogit income edad
logit BTP pobrelogit income edad

I need to graph the behavior of the variable POBRELOGIT across the different logistic models in a single graph. Is it possible to do that?
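
A sketch of how this might be done with the user-written coefplot (from SSC), based on my reading of its help file rather than on a run with these data: store each model, keep only pobrelogit, and let the model names become the row labels.

```stata
* Fit and store one logit per dependent variable (four shown; extend the list to all 22)
foreach y in BPG BVD BCP BTP {
    quietly logit `y' pobrelogit income edad
    estimates store `y'
}

* aseq uses each model's name as its equation name; swapnames then shows
* those names on the axis in place of the repeated coefficient name
coefplot (BPG \ BVD \ BCP \ BTP), keep(pobrelogit) aseq swapnames xline(0)
```

The backslash appends the models into one series, so all intervals share a single marker style and no legend is needed.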

↧

[nl] fit a regression with interval inequality constraints

December 14, 2016, 11:04 am

Hello everyone

I am trying to fit a linear regression with an inequality as in http://www.stata.com/support/faqs/st...l-constraints/

However, I want to impose that a < 0.
Mathematically it would make sense to say that (-a) > 0.

In my case, a = alpha + 0.0025

This would mean that alpha + 0.0025 < 0 and, therefore, (-alpha) - 0.0025 > 0. How do I fit this?

Thank you in advance
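
The exponential trick from the linked FAQ can be mirrored for a negative bound. If alpha + 0.0025 must be below zero, writing alpha = -0.0025 - exp(theta) satisfies the constraint for every real theta. The sketch below assumes a simple hypothetical model y = b0 + (alpha + 0.0025)*x; y, x, b0 and theta are illustrative names, and the real model will differ.

```stata
* Hypothetical model: y = b0 + (alpha + 0.0025)*x, with alpha + 0.0025 < 0
* Substitute alpha = -0.0025 - exp(theta), so the slope equals -exp(theta)
nl (y = {b0} + ((-0.0025 - exp({theta=0})) + 0.0025)*x)
```

The estimate of alpha is then recovered as -0.0025 - exp(theta) from the reported theta, and nlcom can supply a delta-method standard error for it. The starting value of 0 for theta is arbitrary.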

↧

Fine and Gray stratified

December 14, 2016, 11:37 am

Is it possible to fit a Fine and Gray subhazard model stratified by one or more variables?
for example: stcrreg x1 x2 x3, compete(status==1) strata(........)

↧

3 categories not present in output despite creating dummies

December 14, 2016, 12:46 pm

I have the following dataset:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input long MCRNUM double(READM_30_AMI pos_overall) float(fin_ehr_scr ownership bedcode teach_status) byte mhsmemb float(rural_urban_codes Sigma_mkt_shr_sq) double percapita_inc_2013 float(mcd_pct mcr_pct) double siteid
30014 16.7 .6236115758388308    . 0 1 1 1 0 .06709579 40030  21.34023 49.66715 341330
30038 17.6 .6049491002900073    . 0 2 0 1 0 .26497364 40030 22.755203  46.3112    183
30087   16 .6614742692441345    . 0 2 0 1 0 .26497364 40030 23.122984 45.58777    184
30092 16.2 .6025510664834064    . 0 1 1 1 0 .06709579 40030 21.340525 49.66676 403730
30123 16.2 .7163663401606037    . 0 0 1 1 0 .26497364 40030 25.094004 41.75737    185
43300    .  .570111087839279 .875 0 2 0 0 0 .12786885 47854  60.80886 .9391912 340755
50026 15.9 .6929756214657257    1 0 2 0 1 0  .7438314 51384 37.781467 45.57914 401624
50090    . .8017101010624738    . 1 0 1 0 0  .9580735 50312 19.672886 61.59871 403759
50100 16.6 .7503125383685314    1 0 2 0 1 0 .10615902 51384 18.643398 48.38553 401627
50100 16.6 .6895736756822302    1 0 2 0 1 0 .10615902 51384 18.643398 48.38553 401625
end

In my original dataset, I have created propensity weights, and I run the following commands:
svyset [pw = pwt_new]
svy: regress READM_30_AMI pos_overall fin_ehr_scr i.ownership i.bedcode teach_status mhsmemb i.rural_urban_codes Sigma_mkt_shr_sq percapita_inc_2013 mcd_pct mcr_pct

However, in the output, rural_urban_codes appears only once. It has 3 categories in the data, and I expect all three categories to show in the results. As you can see, I have created dummy variables by including an i. before the variable name. Despite that, the output does not show results for all three categories. Please respond, and thank you!
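
Two things are worth checking here. With i. notation one category always serves as the base and gets no coefficient, so three categories yield at most two rows. If only one row appears, the estimation sample may contain just two of the categories, for example because observations with missing covariates (fin_ehr_scr is missing in several of the posted rows) are dropped. A sketch with the auto dataset as a stand-in:

```stata
sysuse auto, clear

* allbaselevels displays the omitted base category explicitly
regress price mpg i.rep78, allbaselevels

* Which categories actually made it into the estimation sample?
tabulate rep78 if e(sample)
```

The same display option and the same e(sample) tabulation should be usable after svy: regress.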

↧

Gravity model using Paneldata from CEPii

December 14, 2016, 1:21 pm

Hi,

This is my first time using Stata and I have a hard time getting the hang of it. I want to estimate a gravity model using panel data from the CEPII website.
My data look like this:
[attached image not available]

What I am wondering is the following: how do I organize my data so I can tell Stata it is panel data and add fixed effects for time, exporter and importer? I want iso_o to be the panelvar and Year the time variable. Also, how can I test for heteroskedasticity?

Thank you in advance.

Kind regards,
two incompetent bachelor students
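
A sketch of a common setup, with assumptions flagged: in bilateral trade data iso_o alone does not identify a panel unit, because each exporter appears once per importer in every year, so xtset iso_o Year would complain about repeated time values. The unit is usually the country pair. The names iso_d, flow, gdp_o and gdp_d are guesses at the CEPII variable names, and year is assumed to be lowercase.

```stata
* Assumed variable names: iso_o, iso_d, year, flow, gdp_o, gdp_d
egen pair = group(iso_o iso_d)        // one id per exporter-importer pair
xtset pair year

generate lnflow  = ln(flow)
generate lngdp_o = ln(gdp_o)
generate lngdp_d = ln(gdp_d)

* Pair fixed effects plus year dummies; clustered SEs are robust to heteroskedasticity
xtreg lnflow lngdp_o lngdp_d i.year, fe vce(cluster pair)
```

Separate exporter and importer effects, rather than pair effects, could be added by creating numeric ids with egen group() for iso_o and iso_d and entering them as i. terms in regress. Taking logs drops zero trade flows, which is a known limitation of this form.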

↧

Question about global/local macro

December 14, 2016, 1:37 pm

Hi guys,

I am trying to use a loop to assign values from the variables b0 to b30 to a new variable, fcast_value; lag_date is a variable too.

foreach i = 2(1)_N {
global lag_temp = lag_date[`i']
replace fcast_value = b$lag_temp if id == `i'
}

I got the error message below:

lag_date not found

However if I run :

disp lag_date[`i']

the result will be :

3

It means that Stata knows what value is in lag_date[`i']. Why can't I assign this value to the global macro lag_temp?

Thanks!
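
Two remarks, offered as a sketch. First, foreach i = 2(1)_N is not valid loop syntax; a numeric range needs forvalues with literal bounds, for example forvalues i = 2/`=_N'. Second, the task may not need a loop over observations at all: looping over the 31 possible values of lag_date and using an if qualifier does the lookup for every row at once. The toy data below use b0 to b2 only.

```stata
clear
input byte(id lag_date) float(b0 b1 b2)
1 0 10 20 30
2 2 11 21 31
3 1 12 22 32
end

* For each possible lag, copy the matching b variable into fcast_value
generate fcast_value = .
forvalues k = 0/2 {
    replace fcast_value = b`k' if lag_date == `k'
}

list
```

With the real data the range would be 0/30.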

↧

reshaping data

December 14, 2016, 2:22 pm

Hi,

I have the following daily dataset covering different locations, departments and two questions (below is a small sample):

Code:

clear
input str10 date int location int department int question str10 answer
2016-01-04   1   1   1   Awesome
2016-01-04   1   1   1   Awesome
2016-01-04   1   1   1   Good
2016-01-04   1   1   1   Good
2016-01-04   1   1   1   Good
2016-01-04   1   1   1   SoSo
2016-01-04   1   1   1   SoSo
2016-01-04   1   1   2   1
2016-01-04   1   1   2   2
2016-01-04   1   1   2   3
2016-01-04   1   2   1   Good
2016-01-04   1   2   1   Good
2016-01-04   1   2   1   Good
2016-01-04   1   2   1   SoSo
2016-01-04   1   2   1   SoSo
2016-01-04   1   2   2   1
2016-01-04   1   2   2   5
end

People in these departments answered two questions every day. The second question has numerical values, but the first question has qualitative responses (Awesome, Good, SoSo, etc.).

I need to collapse over (day location department) for question 2 to find the daily average for each department. But I don't know how to approach the first question. I was thinking maybe I can make the data wide for that question, so that one column holds the share of people in department 1 in location 1 who answered "Awesome", another column the share who answered "SoSo", etc. In other words, I want my data columns to become day location department question2 awesome good soso, so that I end up with only one observation for each department. Something like:

Code:

clear
input str10 date int location int department int question2 float awesome float good float soso
2016-01-04   1   1   1    0.3 0.4 0.2
2016-01-04   1   2   3   0  0.67 0.33
end

Is it possible at all?

I really appreciate your help.
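
This looks feasible without reshape. A sketch using the posted sample: build a 0/1 indicator for each answer that is defined only on question 1 rows, build a numeric copy of the answer on question 2 rows, and take means. Because collapse ignores missing values, each mean is computed over the right rows.

```stata
clear
input str10 date int(location department question) str10 answer
"2016-01-04" 1 1 1 "Awesome"
"2016-01-04" 1 1 1 "Awesome"
"2016-01-04" 1 1 1 "Good"
"2016-01-04" 1 1 1 "Good"
"2016-01-04" 1 1 1 "Good"
"2016-01-04" 1 1 1 "SoSo"
"2016-01-04" 1 1 1 "SoSo"
"2016-01-04" 1 1 2 "1"
"2016-01-04" 1 1 2 "2"
"2016-01-04" 1 1 2 "3"
"2016-01-04" 1 2 1 "Good"
"2016-01-04" 1 2 1 "Good"
"2016-01-04" 1 2 1 "Good"
"2016-01-04" 1 2 1 "SoSo"
"2016-01-04" 1 2 1 "SoSo"
"2016-01-04" 1 2 2 "1"
"2016-01-04" 1 2 2 "5"
end

* Indicators for question 1 answers (missing on question 2 rows)
foreach a in Awesome Good SoSo {
    local v = lower("`a'")
    generate `v' = (answer == "`a'") if question == 1
}

* Numeric answer for question 2 (missing on question 1 rows)
generate question2 = real(answer) if question == 2

* Means of indicators are shares; one row per day, location and department
collapse (mean) question2 awesome good soso, by(date location department)
list
```

The mean of a 0/1 indicator is the share of respondents giving that answer, which is why no reshape is needed.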

↧

Variables order in mixlogit

December 15, 2016, 6:17 am

Hi,
I am using the mixlogit command to analyze the data from a choice experiment, because the IIA hypothesis does not hold and because I want to explore the heterogeneity of preferences. I used the following syntax:
mixlogit choice, group(idchoice) id(id) rand(BAU objectif seuil payment)
This estimation works well. However, I realized that if I run the following command, in which I just change the order of the variables in the rand option:
mixlogit choice, group(idchoice) id(id) rand(objectif BAU seuil payment)
the estimation works as well, but there are quite large numerical differences, although the conclusions remain the same in terms of preferences.

Why are the results different? There is no recommendation on the order of variables in the help file for mixlogit. Is there a best practice on the order of variables in the rand option? Does it reveal another problem in my analysis?
Thank you
Best regards
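
A likely explanation, stated with some caution: mixlogit estimates by maximum simulated likelihood using Halton draws, and as far as I know each random coefficient is assigned its own Halton sequence according to its position in rand(). Reordering the variables therefore changes the draws, and with the default of 50 draws the simulation noise can be visible in the estimates. Raising the number of draws with nrep() should bring the two orderings closer together; 500 is an arbitrary choice for illustration.

```stata
* Same two models as in the post, with more Halton draws per individual
mixlogit choice, group(idchoice) id(id) rand(BAU objectif seuil payment) nrep(500)
mixlogit choice, group(idchoice) id(id) rand(objectif BAU seuil payment) nrep(500)
```

If the results still differ markedly with many draws, that would point to something other than simulation noise, such as a flat or multimodal likelihood.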

↧

Graph of individual growth curves and mean population trajectory, xtline?

December 15, 2016, 6:40 am

Dear all,

I am fitting growth models using Stata's mixed command. Each individual in my data has 4 repeated observations, each one year apart. Not all individuals have the same age at study entry, however. What I want is a graph showing individual trajectories AND the average trajectory of the sample.

I have already tried the following approaches :

(1) Using twoway graph :

xtmixed y age age_2 || id: age, mle cov(un)
predict y_fitted, fitted
sort id age
twoway (connected y_fitted age, connect(ascending) msymbol(i) lpattern(dash)) || (qfit y age, lwidth(medthick))

At first glance, this code seemed to work just fine, the 4 estimated data points within a person were correctly connected. And there is a curve for the average population. However, some trajectories were connected even though these observations did not belong to the same person. This happened to some cases, but not all of them.

(2) Then I tried the xtline command :

xtline y_fitted, overlay t(age) i(id) legend(off) scheme(s2mono)

It works perfectly for the individual trajectories. BUT, how do I add to this graph the average population curve?

Thanks a lot!

Valérie
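
On approach (1): connect(ascending) joins consecutive points whenever age increases, so the last point of one person is linked to the first point of the next whenever the next person starts at a higher age, which would explain why only some cases are affected. On approach (2): xtline with overlay accepts an addplot() option, so the average curve can be layered on top. A sketch reusing the names from the post:

```stata
* Individual fitted trajectories plus a quadratic fit through the raw data
xtline y_fitted, overlay t(age) i(id) legend(off) scheme(s2mono) ///
    addplot(qfit y age, lwidth(thick))
```

The qfit curve is a simple quadratic fit to y on age, as in approach (1); a curve built from the fixed part of the mixed model (predict with the xb option) could be added the same way with a line plot.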

↧

coefplot: Variable labels with time series operators and a macro

December 15, 2016, 7:20 am

Say I run a model with time series operators lagging each of two independent variables, listed in a macro, and store the estimates to generate a coefficient plot of just one of the variables:

Code:

clear all
input float(year panelid yvar xvar1 xvar2)
1999 1 3 2 3
2000 1 5 2 7
2001 1 6 3 9
2002 1 5 3 1
1999 2 4 8 2
2000 2 8 8 4
2001 2 9 8 8
2002 2 9 8 9
1999 3 2 9 1
2000 3 1 9 3
2001 3 3 7 4
2002 3 4 6 5
end
lab var xvar1 "Ind. Var. 1"
lab var xvar2 "Ind. Var. 2"
local ivars xvar1 xvar2
xtset panelid year
qui eststo: xtreg yvar l.(`ivars')
coefplot, keep(l.(`ivars'))
coefplot, keep(l.xvar1)
coefplot, keep(*`ivars')

You should see that the first two coefplot commands cannot find any coefficients, while the final command only plots the first coefficient even though (I think) the syntax should produce plots for both (when the coefplot command includes a macro with multiple selected variables, it still plots only the first variable). Also, Stata includes "L." with the variable label.

I have two questions:

How should I keep selected variables in the coefplot command when I've used a combination of macros and time series operators?
How should I remove "L." from the coefficient labels?

As for question 2, for example, esttab includes a sub() option, where for regression output I use sub(L. ""), and that removes the "L." from all variable labels. Does coefplot have such an option?
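
A tentative sketch, based on my understanding that coefplot matches keep() against the coefficient names exactly as they are stored in e(b), where the lag operator is an uppercase "L." and each variable carries its own prefix (L.xvar1, L.xvar2). Spelling the names out that way, and relabeling them with coeflabels(), might address both questions.

```stata
* Run after the xtreg in the post; names follow e(b): L.xvar1 and L.xvar2
coefplot, keep(L.xvar1 L.xvar2) ///
    coeflabels(L.xvar1 = "Ind. Var. 1" L.xvar2 = "Ind. Var. 2")
```

Typing matrix list e(b) after the model shows the stored names, which is a quick way to check what keep() needs to match.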

↧

Collect rolling correlations using statsby

December 15, 2016, 7:21 am

Dear all,

I have an unbalanced panel dataset of stock and market returns where permno is the panel variable and date is the time variable. The data has been tsset. The sample of interest covers the period from January 1996 through October 2012.

Now I'm trying to estimate 60-months rolling correlations between the stock and market returns. Rolling over 60 months is done by looping. I would like to use the -statsby- command to collect correlation coefficients for the different stocks (permno).

So far, my code looks like this (later I will append all files):

Code:

    * 3. Correlations
    // Daily sample: months from 01Jan1996-31Oct2012
    local k = 1996*12+1     // k=Jan1996
    local l = 2012*12+10    // l=Oct2012
    forvalues i=`k'(1)`l'{
        disp `i'
        quietly use date permno stock_ret market_ret if year(date)*12+month(date)<=`i'&year(date)*12+month(date)>`i'-60 using "C:\...", clear
        if (r(N)>=200){
            quietly statsby rho_s_m = r(rho), by(permno) saving(C:/.../corr`i', replace): corr stock_ret market_ret
        }
    }

However, two issues arose when running the code.
1. It takes a lot of time. After 30 minutes, the very first file "corr23953" hasn't been finished, yet.

2. When choosing the "noisily" option, every now and then there is the message

"no observations
captured error running (correlate stock_ret market_ret), posting missing values"

popping up.

My questions are: What did I do wrong? Is there a more efficient way to collect rolling correlations in this case?

Help is highly appreciated.

Best,
Christopher
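
Some observations and a sketch. As far as I can tell, use does not leave r(N) behind, so the check if (r(N)>=200) may not be testing what was intended. The "no observations" messages would then come from stocks with no usable pairs of returns inside a window. The version below keeps only stocks with at least 200 complete observations in the window before calling statsby. The file name returns.dta is a placeholder for the elided path, and reading the 200 threshold as per stock is my assumption.

```stata
local k = ym(1996, 1)      // Jan 1996 as a Stata monthly date
local l = ym(2012, 10)     // Oct 2012

forvalues i = `k'/`l' {
    * Load only the 60-month window ending in month i
    quietly use date permno stock_ret market_ret ///
        if inrange(mofd(date), `i' - 59, `i') using "returns.dta", clear
    quietly drop if missing(stock_ret, market_ret)

    * Keep stocks with enough daily observations in the window
    quietly bysort permno: keep if _N >= 200

    if _N > 0 {
        quietly statsby rho_s_m = r(rho), by(permno) clear: correlate stock_ret market_ret
        generate mdate = `i'
        quietly save "corr`i'", replace
    }
}
```

This still rereads the file once per month, so it will not be fast on a large daily panel; it mainly removes the failed correlations and the ineffective check.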

↧

order of independent variables in -esttab- command

December 15, 2016, 8:40 am

The following code shows two "interesting" independent variables (mpg and displacement) whose impact on price I want to investigate, with two control variables, length and weight.

Code:

sysuse auto.dta, clear

estimates clear
reg price mpg weight length
estimates store reg1
reg price displacement weight length
estimates store reg2

esttab *

The output of this esttab is, however:

Code:

--------------------------------------------
                      (1)             (2)  
                    price           price  
--------------------------------------------
mpg                -86.79                  
                  (-1.03)                  

weight              4.365***        4.613**
                   (3.74)          (3.30)  

length             -104.9*         -97.63*  
                  (-2.64)         (-2.47)  

displacement                        0.727  
                                   (0.10)  

_cons             14542.4*        10440.6*  
                   (2.47)          (2.39)  
--------------------------------------------
N                      74              74  
--------------------------------------------
t statistics in parentheses
* p<0.05, ** p<0.01, *** p<0.001

As you can see, the variables mpg and displacement are visually separated, counterintuitive to the content and interpretation (this is a valid concern, no?). What can I do to have these models

in the same table
with the variables that I identify as "interesting", but that are not in each of the models, at the top

i.e. like this:

Code:

--------------------------------------------
                      (1)             (2)  
                    price           price  
--------------------------------------------
mpg                -86.79                  
                  (-1.03)                  

displacement                        0.727  
                                   (0.10)  

weight              4.365***        4.613**
                   (3.74)          (3.30)  

length             -104.9*         -97.63*  
                  (-2.64)         (-2.47)  

_cons             14542.4*        10440.6*  
                   (2.47)          (2.39)  
--------------------------------------------
N                      74              74  
--------------------------------------------
t statistics in parentheses
* p<0.05, ** p<0.01, *** p<0.001

I am aware that I can put them next to each other at the bottom of the table, by typing

Code:

sysuse auto.dta, clear

estimates clear
reg price weight length mpg
estimates store reg1
reg price weight length displacement
estimates store reg2

esttab *

which gives

Code:

--------------------------------------------
                      (1)             (2)  
                    price           price  
--------------------------------------------
weight              4.365***        4.613**
                   (3.74)          (3.30)  

length             -104.9*         -97.63*  
                  (-2.64)         (-2.47)  

mpg                -86.79                  
                  (-1.03)                  

displacement                        0.727  
                                   (0.10)  

_cons             14542.4*        10440.6*  
                   (2.47)          (2.39)  
--------------------------------------------
N                      74              74  
--------------------------------------------
t statistics in parentheses
* p<0.05, ** p<0.01, *** p<0.001

However, I would find it very odd to find these interesting variables wedged between the control variables and the number of observations, r^2, etc. (no?), so what can I do?
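
esttab has an order() option that moves the listed coefficients to the top of the table and leaves the rest in their original order. A sketch with the same models (esttab is part of the user-written estout package from SSC):

```stata
sysuse auto.dta, clear

estimates clear
reg price mpg weight length
estimates store reg1
reg price displacement weight length
estimates store reg2

* List the variables of interest first; the others follow in model order
esttab reg1 reg2, order(mpg displacement)
```

The models themselves do not need to be respecified for this.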

↧

Are there any hidden consequences of using/not using variable prefixes?

December 15, 2016, 9:56 am

Hello,

I couldn't find this topic elsewhere, so I thought that this may be a good place to ask. When I was learning to use Stata, I was under the impression that Stata had a time-saving feature - specifying variable types with "i." or "c.". These things didn't change the model, but they were helpful in a variety of situations. But when I was working today, I noticed that while the coefficients didn't change, the intercept for the model changed (see below).

. regress srh4 age sex

      Source |       SS       df       MS              Number of obs =    1347
-------------+------------------------------           F(  2,  1344) =   30.39
       Model |  39.2533833     2  19.6266917           Prob > F      =  0.0000
    Residual |  868.127463  1344  .645928172           R-squared     =  0.0433
-------------+------------------------------           Adj R-squared =  0.0418
       Total |  907.380846  1346  .674131387           Root MSE      =   .8037

------------------------------------------------------------------------------
        srh4 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |  -.0095944   .0012341    -7.77   0.000    -.0120154   -.0071733
         sex |  -.0261193   .0439863    -0.59   0.553    -.1124086      .06017
       _cons |   3.436525   .0929329    36.98   0.000     3.254215    3.618834
------------------------------------------------------------------------------

. regress srh4 c.age i.sex

      Source |       SS       df       MS              Number of obs =    1347
-------------+------------------------------           F(  2,  1344) =   30.39
       Model |  39.2533833     2  19.6266917           Prob > F      =  0.0000
    Residual |  868.127463  1344  .645928172           R-squared     =  0.0433
-------------+------------------------------           Adj R-squared =  0.0418
       Total |  907.380846  1346  .674131387           Root MSE      =   .8037

------------------------------------------------------------------------------
        srh4 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |  -.0095944   .0012341    -7.77   0.000    -.0120154   -.0071733
             |
         sex |
     female  |  -.0261193   .0439863    -0.59   0.553    -.1124086      .06017
       _cons |   3.410405   .0676436    50.42   0.000     3.277707    3.543104
------------------------------------------------------------------------------

In this case age was a continuous variable (18-89), sex was binary (Male/Female), and srh4 was a four-point continuous scale. After some playing around, I realized that the i. prefix made the sex variable dummy (0/1) which is why the constant changed. I had never noticed the issue before because I had routinely dummy coded variables, but just happened to not do so in this case.

But this got me curious, are there other variable prefixes that produce substantive changes in a Stata model? Is there a "best practice" in using variable prefixes?

Cheers,

David.
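
The two outputs are consistent with sex being stored as 1/2: the first constant plus the sex coefficient, 3.436525 - 0.0261193, gives 3.410406, which matches the second constant up to rounding. With i.sex the lowest level becomes the base category, so the constant is the prediction at sex = 1 rather than at sex = 0. A sketch reproducing the effect with the auto dataset, where grp is a made-up 1/2 recoding of foreign:

```stata
sysuse auto, clear
generate byte grp = foreign + 1          // 1/2 coding, like the sex variable

* Entered as a plain number: the constant refers to grp == 0 (outside the data)
regress price mpg grp
scalar cons_raw = _b[_cons]
scalar b_grp    = _b[grp]

* Entered as a factor: the constant refers to the base level, grp == 1
regress price mpg i.grp
display _b[_cons] - (cons_raw + b_grp)   // zero up to rounding
```

The fit and the slope are identical in both runs; only the reference point of the constant moves.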

↧

Data Formatting (reshape?)

December 15, 2016, 10:01 am

Hello, All.

I have been trying to format my data in a particular way but have not been able to do so. Your help will be highly appreciated. The problem is basically that I have data that look something like this:

[attached image not available]

I want my data to look something like this:
[attached image not available]

This could be done using the channel code, where each channel code represents different type of aid. How can this be done? Anyone?

Thanks,
BA

↧
