Channel: Statalist

confidence intervals for hazard / survival functions obtained by logistic regression

Hello there,

It's a bit related to my older thread: I am now using the approach to discrete-time survival analysis described, for instance, here and here. This means I use an expanded dataset with one observation for each period in which an individual is at risk of experiencing the event, and then run logit/cloglog regressions.

I can reproduce the graphs and tables provided by Stephen Jenkins for my dataset. However, I'd like to add confidence bands around my hazard and survivor rates, similar to those produced by the built-in sts graph, ci command.

1) How do I get confidence intervals out of the logit and cloglog regressions so that I can include them in the graphs? Is this even possible, or does it require quantile regression?

2) Here it says about late entry: "To fit models without frailty, you must drop all intervals prior to each subject’s entry to the study. For example, if entry is in period ei, you drop it if t < ei." I am not sure I understand this correctly: aren't there no observations with intervals for individuals before they enter the study anyway?

3) How do I address left-censoring in this setup? I know how "old" individuals were before they became at risk, but, as mentioned before, I want to restrict the analysis to the last 12 quarters in which the event could occur.
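A minimal sketch for points 1) and 2), run on simulated data. The variables `id`, `entry`, `t` and `event`, the four possible entry quarters and the 10% hazard are all assumptions made for illustration. Pre-entry intervals are dropped as in the quoted advice; this step only matters if the expanded dataset was built from time zero rather than from entry, which is why it may look redundant in your case. The hazard is then fitted with one logit intercept per quarter. `margins` returns delta-method confidence intervals for the predicted hazard, and `nlcom` does the same for a survivor probability, which is a nonlinear function of the coefficients, so quantile regression should not be needed.

```stata
* Simulated person-period data (illustrative assumptions throughout)
clear
set seed 2017
set obs 400
generate id = _n
generate entry = 1 + floor(4*runiform())   // hypothetical entry quarter, 1-4
expand 12
bysort id: generate t = _n

* Late entry: drop the intervals before each subject's entry
drop if t < entry

* Event indicator with a constant hazard of about 10% per quarter,
* keeping rows up to and including each subject's first event
generate event = runiform() < 0.10
bysort id (t): generate cumev = sum(event)
drop if cumev > 1 | (cumev == 1 & event == 0)

* Discrete-time hazard model: one intercept per quarter, no constant
logit event ibn.t, noconstant

* Hazard per quarter with delta-method 95% confidence intervals
margins t
marginsplot, recast(line) recastci(rarea)

* Survivor probability to the end of quarter 3 for a subject at risk
* from quarter 1, with a delta-method confidence interval
nlcom (1 - invlogit(_b[1.t])) * (1 - invlogit(_b[2.t])) * (1 - invlogit(_b[3.t]))
```

Delta-method intervals on the probability scale can stray outside [0, 1] when hazards are small; computing the interval on the logit scale and transforming it back is a common alternative. After `cloglog`, the `nlcom` expression would need the complementary log-log inverse instead of `invlogit()`.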

thank you!


Odd diagonal line emerges when I draw a twoway line graph

Hi, I have run into a strange situation when drawing a twoway line graph in Stata. Besides the two lines I want, there is also a third, diagonal line across the plot region, which I thought was due to the second line. After carefully checking my code, I still could not find the source of the problem. Could someone have a look and give me some guidance? Thanks in advance. I have put the original data and code in the attachment.

Code:
 # delimit ;
twoway 
       (line ownersh_ownership2013        xlabel if xlabel>=20&xlabel<=58 ,lwidth(medthick))
       (line ownersh_ownership2011        xlabel if xlabel>=20&xlabel<=58 ,lwidth(medthick))  
    ,     
   xlabel(20 "20" 22 "22" 24 "24" 26 "26" 28 "28" 30 "30" 32 "32" 34 "34" 36 "36" 
          38 "38" 40 "40" 42 "42" 44 "44" 46 "46" 48 "48" 50 "50" 52 "52" 54 "54"
          56 "56" 58 "58"  ,
           angle(0))
   ylabel(.2 "20" .3 "30" .4 "40" .5 "50" .6 "60" .7 "70" .8 "80" .9 "90",grid)           
   title("")                           
   ytitle("Homeownership rate (%)",size(small))
   xtitle("Age",size(small))              // cohorts 
   legend(order(1 "2013 survey" 2 "2011 survey" ) 
           size(small) region(lcolor(white)) col(1) ring(0) pos(5) )           
  scheme(s1color)
  saving(ownersh_by_wave,replace);  
# delimit cr
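A common cause (an assumption here, since the attached data are not available) is that `line` connects points in the order in which they are stored, not in order of the x variable; if `xlabel` is not sorted, the line doubles back and shows up as a stray diagonal. A toy illustration with made-up variables `age` and `share`: adding the `sort` option, or running `sort xlabel` before `twoway`, connects the points in x order.

```stata
* Toy data whose x values are deliberately stored out of order
clear
set obs 39
generate age = 20 + mod(_n*7, 39)   // visits 20..58 in a scrambled order
generate share = 0.2 + 0.01*age

* Without sort, -line- joins points in data order and can draw diagonals;
* the sort option connects them in order of age instead
twoway line share age, sort
```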

Stata implementation for Cragg / Blundell's (true!) double hurdle models or bernoulli/lognormal mixture models

Hi everyone,

This is my first post on Statalist; if there are rules, customs or traditions I am not adhering to, just let me know and I will edit my post accordingly.

I am currently working with a dataset/regression where my desired DV is the amount of funding (in USD) obtained by startups. Naturally, the DV contains a lot of zeros (~55%) and has a continuous distribution for y>0. Finding a suitable regression model for this type of data has been quite challenging.

Having read the relevant literature for some time now, I think I have identified two criteria I need to take into account:
  • Source of zeros: The zero values of my DV are of two types: "unobserved" zeros for startups that simply do not want to participate in the market for funding (irrespective of the "price"), and "observed" or "true" zeros for startups that are looking for funding but do not receive any. My current understanding is that each type requires a different stochastic model.
  • Correlation of participation and consumption decisions: Should the model assume independence or dependence between the decision to participate in the market and the amount each startup receives (if any)? My current hypothesis is that the model should assume dependence.
(You can find further details about this in a question I posted on CrossValidated: http://stats.stackexchange.com/quest...-type-2-models)

In addition to estimating Tobit and Tobit type 2 (Heckman) models, neither of which accounts for both types of zeros, I identified three potential "perfect models" but haven't found any Stata implementations of them (and before coding them myself, I thought asking here might help):
  • Cragg's double hurdle model (independent), specifically equations (5) and (6) from his original paper (not his two-part single hurdle model, which is often referred to as "double hurdle"). To my knowledge, the available craggit and churdle commands unfortunately "only" fit the single hurdle alternative with 1. a probit and 2. a truncated normal (Cragg, J.G., 1971, Some statistical models for limited dependent variables with applications to the demand for durable goods, Econometrica 39, 829-844.)
  • Blundell's double hurdle model, which is essentially Cragg's double hurdle model but assumes dependence between the two hurdles (http://sites.psu.edu/scottcolby/wp-c...obit-model.pdf, with an application in Blundell, R.W., J. Ham and C. Meghir, 1986, Unemployment and female labour supply, UCL economics discussion paper, forthcoming in the Economic Journal.)
  • The Bernoulli/lognormal mixture model for censored data described in Moulton, Lawrence H., and Neal A. Halsey. “A Mixture Model with Detection Limits for Regression Analyses of Antibody Response to Vaccine.” Biometrics, vol. 51, no. 4, 1995, pp. 1570–1578. (www.jstor.org/stable/2533289)
I am grateful for any information that could point me in the right direction regarding the implementation of the three models in Stata.
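Until a packaged command turns up, the independent double hurdle can be fitted with `ml`. Below is a minimal sketch, assuming the common textbook form of the likelihood (a probit participation hurdle times a Tobit-type amount equation): zeros have probability 1 - Φ(zγ)Φ(xβ/σ), and positive amounts contribute Φ(zγ)·φ((y - xβ)/σ)/σ. I have not checked it against Cragg's equations (5) and (6), and the program name `dhurdle_lf`, the equation names and the simulated data are hypothetical. For Blundell's dependent version, the zero term would involve a bivariate normal probability (for example via `binormal()`), the positive term a conditional probit, and there would be an extra correlation parameter.

```stata
* Independent double-hurdle log likelihood for -ml- method lf (sketch)
capture program drop dhurdle_lf
program define dhurdle_lf
    args lnf zg xb lnsig
    tempvar sig
    quietly generate double `sig' = exp(`lnsig')
    * zeros: fail the participation hurdle, or latent amount <= 0
    quietly replace `lnf' = ln(1 - normal(`zg')*normal(`xb'/`sig')) if $ML_y1 == 0
    * positives: pass the hurdle and observe the amount y
    quietly replace `lnf' = lnnormal(`zg') + lnnormalden($ML_y1, `xb', `sig') if $ML_y1 > 0
end

* Simulated data, for illustration only
clear
set seed 1971
set obs 3000
generate z = rnormal()
generate x = rnormal()
generate part  = (0.3 + z + rnormal()) > 0      // participation hurdle
generate ystar = 1 + x + 1.5*rnormal()          // latent amount
generate y = part*max(ystar, 0)

* Equation 1: participation; equation 2: amount; equation 3: log sigma
ml model lf dhurdle_lf (participate: y = z) (amount: x) (lnsigma:)
ml maximize
```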

Greetings from Hamburg,
Jan

Editable graph from stata to excel format

Hi everyone,

I'm trying to export a graph from Stata to Excel so that it can be modified or edited in Excel as a native Excel chart (i.e., like the charts you create directly in Excel).
I already tried putexcel with the .png and .wmf formats: the former is not editable, and the latter (after ungrouping) is not a proper Excel chart.

Any thoughts on how to get what I'm looking for?
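As far as I know, neither `graph export` nor `putexcel` creates a native Excel chart object, so one workaround is to export the data behind the graph and build the chart in Excel, where it is then fully editable. A minimal sketch using the shipped auto data as a stand-in; the file name `graph_data.xlsx` is hypothetical.

```stata
* Export the plotted variables so the chart can be rebuilt natively in Excel
sysuse auto, clear
keep make mpg weight
export excel using "graph_data.xlsx", firstrow(variables) replace

* Read the file back to confirm it was written
import excel using "graph_data.xlsx", firstrow clear
```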


Thanks in advance

Is there an in-script command to save .do files?

Hello:

I'm trying to figure out a way to archive local copies of my .do scripts as I run them. At the end of every one of my do files, I want to run a line of code that kicks out a version of the script stamped with the day I was working on it. My current code looks like this:


Code:
local today: di %tdCYND date(c(current_date),"DMY")

<body of script>

<dosavecommand> "`archive'\file name `today'.do", replace

Is there a command built into Stata that allows me to save the do-file?
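As far as I know, Stata has no command that saves the do-file currently running, and it does not expose that file's name, but the standard `copy` command can archive the file from disk. A minimal sketch; the file name `myscript.do` and the folder `do_archive` are hypothetical, and the lines that write `myscript.do` exist only so the example runs on its own. Note that `copy` archives the version saved on disk, so unsaved edits in the Do-file Editor are not included.

```stata
* Hypothetical names: adjust the do-file path and archive folder
local dofile  "myscript.do"
local archive "do_archive"
local today : display %tdCYND date(c(current_date), "DMY")

* (demo only) create a small do-file so the example is self-contained
tempname fh
file open `fh' using "`dofile'", write replace text
file write `fh' `"display "hello""' _n
file close `fh'

* Copy the do-file into the archive folder with a date stamp
capture mkdir "`archive'"
copy "`dofile'" "`archive'/myscript_`today'.do", replace
```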

Thank you!


Textbooks with good walkthrough examples?


Hello.

I am looking for textbooks that contain both Stata code and datasets. Do you have any suggestions?

Thank you for input on this.

Critical values in Stata

Hello.

For different tests, is it possible to obtain the critical value of each test from a function in Stata?
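Stata's inverse distribution functions return critical values directly. A minimal sketch for 5% tests; the degrees of freedom are arbitrary examples.

```stata
* 5% critical values from Stata's inverse distribution functions
display invnormal(0.975)          // two-sided z test
display invttail(30, 0.025)       // two-sided t test, 30 df
display invchi2tail(2, 0.05)      // chi-squared test, 2 df
display invFtail(3, 100, 0.05)    // F test, (3, 100) df
```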

Thanks in advance.

Collapse by group and whole sample

How can I use collapse to get the mean for each group and for the whole sample?

This code only gives me the mean wage for every category, but I also need it for the whole sample:

collapse (mean) wage, by(category)

Can it be done with a single command, or do I have to do it in several steps and then merge the datasets?
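One common approach is to compute the group means under `preserve`, save them to a temporary file, then collapse the full sample and append the group rows. A minimal sketch using the shipped nlsw88 data, with `race` standing in for `category`.

```stata
sysuse nlsw88, clear

* Means by group, kept in a temporary file
tempfile bygroup
preserve
collapse (mean) wage, by(race)
save `bygroup'
restore

* Mean for the whole sample, then stack the group means underneath;
* the whole-sample row has race missing
collapse (mean) wage
append using `bygroup'
list
```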

Margins and marginsplot vs marginscontplot

Hi! I run a linear mixed model with splines for age and. The model contains a 3-way interaction, c.agespline*##c.cov1##c.cov2. I want to plot age against cov1 for different values of cov2. Questions: Is it possible to use margins when an interaction contains splines? I thought about using marginscontplot (user-written, P. Royston). However, this command does not seem to handle 3-way interactions, or does it? And finally, is margins, dydx() equivalent to marginscontplot, margopts(dydx())?

Panel data analysis: Random effects model - controlling for year and industry possible?

Hello everyone!

I am new to the forum and have been going through many past posts, but I haven't found answers to the questions I am facing in my master's thesis project. I will be very grateful for any help and hints!

I have a dataset with roughly 1,200 observations across 99 companies and 13 years, so panel data. My two main IVs are "task-related diversity" and "relations-oriented diversity" (of the TMT), and my DV is "short-term orientation". After intensive research and consultation with my supervisor, we concluded that a random- or fixed-effects model (chosen by the Hausman test) would be the most suitable analysis. The Hausman test for my data indicated that random effects is the right model. In addition, my supervisor suggested using the option vce(robust) to account for heteroskedasticity and autocorrelation.

So far so good: if I don't control for industry differences by including industry dummies (11 industries - 1, so 10 dummies), everything works "fine" (the results are insignificant but seem reasonable). However, some IVs (ROA or number of employees) as well as the DV (short-termism) may differ across industries (the means certainly differ; I checked that with an ANOVA), so I want to control for that. I tried two options:

1) Including industry dummies in xtreg, re vce(robust): This gives no test statistic (Wald chi2), which may be because a company's industry doesn't change over time. So this isn't really an option.

2) My supervisor suggested industry-adjusting the performance IVs, mainly ROA, and then not including any dummies. This does not seem very logical to me, because then I still don't account for industry differences in my DV or in any other variables in the model. In addition, industry-adjusted ROA at t-3 turns out to be highly insignificant, while unadjusted ROA at t-3 is highly significant.

What options do I have to stick with the random-effects model (if possible, as it seems the right choice) while controlling for year and industry clusters?

Please check my attached screenshots, I think this will make things a lot clearer. I am very lost and would highly appreciate any help! I hope I have explained things sufficiently.


1) Random-effects model including industry dummies; no test statistic available (Wald chi2 missing)

2) Random-effects model with industry dummies but without the option vce(robust)

3) Random-effects model without industry dummies, using industry-adjusted ROA instead (as suggested by my supervisor)
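A minimal sketch on simulated data; all variable names and effect sizes are hypothetical. Time-invariant dummies such as industry can be estimated in a random-effects model, unlike in a fixed-effects one, and `testparm` gives a joint test of the industry dummies even when the overall model Wald statistic is not reported. As I understand it, Stata leaves that statistic blank when the cluster-robust variance matrix lacks the rank to test all coefficients jointly (for example, when a dummy is nonzero in only one cluster); the individual coefficients and standard errors remain usable.

```stata
* Simulated firm panel (hypothetical): 99 firms, 13 years, 11 industries
clear
set seed 169
set obs 99
generate firm = _n
generate industry = 1 + mod(_n, 11)    // time-invariant industry code
generate u = rnormal()                   // firm-level random effect
expand 13
bysort firm: generate year = 2000 + _n
generate x = rnormal()
generate y = 0.5*x + 0.2*industry + u + rnormal()

xtset firm year

* Industry and year dummies in a random-effects model with standard
* errors clustered on firm (what vce(robust) does after xtreg)
xtreg y x i.industry i.year, re vce(cluster firm)

* Joint Wald test that all industry dummies are zero
testparm i.industry
```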

Rename variable with its own label

Dear all,

I am struggling with the following problem:
I have several datasets I want to append, but the variable names differ from one dataset to the next.
My variables have the very uninformative names v1 to v300, and the information is held in the labels, e.g. v1 ~ "Interviewdate". I would like to rename all variables after their labels, but cannot find the right command.

I have found the following loop, but I am not sure how to adapt the variable names:

foreach v of varlist _all{
local x= variable label `v'
rename `v' `x'
}

But Stata tells me "variable not found". If I drop the word "variable", it tells me "label not found".

Could anybody tell me how to fix it so that it works in my case?
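The loop needs the extended macro function `: variable label`, assigned with a colon rather than an equals sign, and the label has to be turned into a legal variable name before `rename`. A minimal sketch using Stata's `strtoname()` on toy data; labels that map to the same name would still need separate handling.

```stata
* Toy data: uninformative names, informative labels
clear
set obs 2
generate v1 = 1
label variable v1 "Interviewdate"
generate v2 = 3
label variable v2 "Household size"

foreach v of varlist _all {
    * extended macro function (note the colon): fetch the variable label
    local lbl : variable label `v'
    if `"`lbl'"' != "" {
        * strtoname() converts the label into a legal name
        * (spaces become underscores; at most 32 characters)
        local newname = strtoname(`"`lbl'"')
        rename `v' `newname'
    }
}
describe
```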

I would be really happy!

Best
Nadine

Esttab or Estout to output (in quality tables) the results of several factor analysis from a loop

Hello Everyone,

I am using Stata 14.2/IC and running a factor analysis with the same variables on 15 cycles of data collection. I know that the user-written program estout by Ben Jann (available from SSC) can output results from factor analysis in several formats. However, I haven't been able to find examples of how to output the results of several factor analyses into Excel sheets.
I use the following loop to run the factor analyses:

Code:
levelsof Cycle, local(Cycles)

foreach Cycle of local Cycles {
factor m1r m13r m16r m19r m23r m2r m7r ///
m11r m18r m22r m26r m29r m3r m4r m8r m9r m12r m14r m25r m5r m10r m17r m20r ///
m24r m27r m6r m15r m21r m28r m30r if Cycle == "`Cycle'",factors(5)
  }

and I am trying to use the following code to output the results to Excel:
Code:
esttab using "excel.csv", ///
cells("L[Factor1](t) L[Factor2](t) L[Factor3](t) L[Factor4](t) L[Factor5](t) Psi[Uniqueness]") ///
nogap noobs nonumber nomtitle
However, this code only outputs the results of one data collection cycle; I want the results of every cycle written to different sheets of the same Excel workbook.

Do you folks have any recommendations on where I can find an example code as a guide?
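`esttab` writes one file per call, and a .csv file has no sheets, so one alternative is to write each cycle's loading matrix to its own sheet with `putexcel` (available from Stata 14). A minimal sketch using the auto data, with `foreign` standing in for `Cycle` and five arbitrary variables; the workbook and sheet names are hypothetical. With a string Cycle, the `if` condition needs quotes around the macro, as in your own loop.

```stata
sysuse auto, clear
* Stand-in for Cycle: one factor analysis per group
levelsof foreign, local(groups)

local fileopt replace            // first pass creates a fresh workbook
foreach g of local groups {
    quietly factor price mpg weight length displacement if foreign == `g', factors(2)
    * loadings e(L) with the uniquenesses e(Psi), transposed, alongside
    matrix L   = e(L)
    matrix Psi = e(Psi)
    matrix Psi = Psi'
    matrix out = L, Psi
    putexcel set "factor_results.xlsx", sheet("group_`g'") `fileopt'
    putexcel A1 = matrix(out), names
    local fileopt modify         // later groups add sheets to the same file
}
```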

Thank you
Patrick

Help with Reshaping Data: Long to Wide ?

Hello, Good Morning. I need some help in converting this data from long to wide format
so that I can have one record for each respondent.

The dataset has the following types of variables:

Resp: is the ID for each respondent

Tie: is the ID for each of the respondents contacts. Note each respondent can have a different number of non-overlapping contacts.

TieType: an indicator of the type of relationship the respondent has with the contact
(1=sex; 2=social; 3=family; 4=boss). Respondent 1, for example, has two types of relationship with Contact_1:
they have both a sex and a social relationship.

Gender_Resp & Gender_Tie: Indicates the gender of both respondent and their contact (tie)

Resp Tie TieType Gender_Resp Gender_Tie
1 1 1 1 (Male) 1 (Male)
1 1 2 1 (Male) 2 (Female)
1 3 3 1 (Male) 1 (Male)
1 4 4 1 (Male) 2 (Female)
2 6 1 1 (Male) 1 (Male)
2 8 2 1 (Male) 2 (Female)
3 34 1 2 (Female) 1 (Male)
3 15 1 2 (Female) 2 (Female)
3 100 2 2 (Female) 1 (Male)
3 100 2 2 (Female) 1 (Male)

Now, I want to create the following types of variable per each respondent.

-- N_Contacts: The absolute number of contacts they have. So Respondent_1
will have 3 contacts (#1, 3, 4), as will Respondent_3 with 3 distinct contacts.

-- Mixed_Rel: The extent to which the respondent engages in a mixed relationship
(defined as sex + social) with his/her contacts. So, Resp_1 has a sex+social relationship
with 1 of his 3 contacts, while Resp_2 and Resp_3 do not have a mixed relationship with
any contact.

-- Prop_SameGender: The proportion of a respondent's contacts who have the same gender as he/she does.
So, Resp_1 has a same-gender relationship with 2 of his 3 contacts (contacts 1 & 3).


I would be very grateful for some pointers to create this summary file per respondent.
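No reshape is needed for these summaries: they can be built by flagging each respondent-contact pair and then collapsing to one row per respondent. A minimal sketch using the example data from the post. Contact 1 of respondent 1 appears once as male and once as female; taking the maximum over a pair's rows counts it as same-gender, which reproduces the 2 of 3 described above. Dividing Mixed_Rel by N_Contacts would give a proportion instead of a count.

```stata
* Example data from the post (1 = male, 2 = female)
clear
input Resp Tie TieType Gender_Resp Gender_Tie
1   1 1 1 1
1   1 2 1 2
1   3 3 1 1
1   4 4 1 2
2   6 1 1 1
2   8 2 1 2
3  34 1 2 1
3  15 1 2 2
3 100 2 2 1
3 100 2 2 1
end

* Flags at the respondent-contact level (max over each pair's rows)
bysort Resp Tie: egen has_sex     = max(TieType == 1)
bysort Resp Tie: egen has_social  = max(TieType == 2)
bysort Resp Tie: egen same_gender = max(Gender_Resp == Gender_Tie)

* Keep one row per respondent-contact pair
bysort Resp Tie: keep if _n == 1
generate mixed = has_sex & has_social

* One row per respondent
collapse (count) N_Contacts = Tie (sum) Mixed_Rel = mixed ///
    (mean) Prop_SameGender = same_gender, by(Resp)
list
```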

Thanks in advance. Yy


Multi-level mixed-effects ordered logit model - not concave

Hi everyone,

This is my first post, so please forgive me if I make any formal mistakes.

I'm using Stata 13 and I want to fit a random coefficient ordered logit model with two levels, being country and individual. The dependent variable is a scale from 1 to 3.

Code:
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
membership~w |     29204    2.411485     .726094          1          3
The random effects are at the country level (28 countries), and there are no independent variables, since I just want to see the variance components. The command I run is:
Code:
meologit membership_view || country:
However, Stata cannot find a solution; even with the difficult option, it keeps reporting "(not concave)".

Code:
Fitting fixed-effects model:

Iteration 0:   log likelihood = -28232.469  
Iteration 1:   log likelihood = -28232.469  

Refining starting values:

Grid node 0:   log likelihood = -27965.498

Fitting full model:

Iteration 0:   log likelihood = -27965.498  (not concave)
Iteration 1:   log likelihood = -27633.807  (not concave)
Iteration 2:   log likelihood = -27515.899  
Iteration 3:   log likelihood = -27482.163  
Iteration 4:   log likelihood = -27478.088  
Iteration 5:   log likelihood = -27477.729  (not concave)
numerical derivatives are approximate
flat or discontinuous region encountered
numerical derivatives are approximate
flat or discontinuous region encountered
numerical derivatives are approximate
flat or discontinuous region encountered
Iteration 6:   log likelihood = -27477.728  (backed up)
Iteration 7:   log likelihood =  -27477.63  (backed up)
Iteration 8:   log likelihood =  -27476.52  
Iteration 9:   log likelihood = -27476.469  (backed up)
Iteration 10:  log likelihood = -27476.443  (backed up)
This would be no problem in itself, but anova shows that there is significant variance at the country level:

Code:
. anova membership_view country

                           Number of obs =   29204     R-squared     =  0.0563
                           Root MSE      = .705677     Adj R-squared =  0.0554

                  Source |  Partial SS    df       MS           F     Prob > F
              -----------+----------------------------------------------------
                   Model |  867.130113    27  32.1159301      64.49     0.0000
                         |
                 country |  867.130113    27  32.1159301      64.49     0.0000
                         |
                Residual |  14529.0579 29176  .497979775   
              -----------+----------------------------------------------------
                   Total |   15396.188 29203  .527212547
Does anyone know why Stata is not able to fit the model? I have tried robust standard errors as well, but that doesn't make any difference.

Thanks in advance.

Two waves of data in one file: how do I split them up?

I have 2 waves of data (year 1 and year 2) and I want to use the year 2 dependent variable (DV), with the year 1 independent variables (IV's).

How do I separate these out?

Currently, I have run the following:

encode started, generate (wave)
replace wave = 1 in 1/411
replace wave = 2 in 412/764
tab wave, gen(wave_)

So now I have 2 waves, but I don't know how to pick the wave 1 IVs and the wave 2 DV.
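If each person has an identifier that links the two waves (assumed here to be `id`; the `in 1/411` approach suggests the file may not have one yet, and without it the waves cannot be linked), `reshape wide` puts both waves on one row, so the wave 2 DV and the wave 1 IV sit side by side. A minimal sketch on simulated data; `iv` and `dv` are hypothetical names.

```stata
* Simulated long data: one row per person and wave (hypothetical names)
clear
set seed 378
set obs 300
generate id = _n
expand 2
bysort id: generate wave = _n
generate iv = rnormal()
generate dv = rnormal()

* One row per person: iv1/dv1 from wave 1, iv2/dv2 from wave 2
reshape wide iv dv, i(id) j(wave)

* Wave 2 outcome regressed on the wave 1 predictor
regress dv2 iv1
```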

Thank you for your input

Kind Regards.
Thomas

Help with mediation analysis

Hi,

I am new to the forum. I have a dataset from a clinical trial in which I am examining the effects of a new blood pressure treatment versus an old one. I am using artery thickness as the outcome measure. We have also measured blood pressure at regular intervals throughout the trial.

Could someone suggest a way for me to measure the degree to which the effect of the drug on artery thickness is mediated through reduction in blood pressure, and not through another unmeasured mechanism?

I have examined the effects on artery thickness across tertiles of blood pressure, however I would like to quantify this.
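One common way to quantify this is the product-of-coefficients approach: fit the mediator (blood pressure) and outcome (artery thickness) equations jointly with `sem` and ask `estat teffects` for the direct, indirect and total effects. A minimal sketch on simulated data; the variable names and effect sizes are assumptions, and blood pressure is summarized as a single measurement rather than the repeated measures you have. The indirect effect only has a causal reading if there is no unmeasured confounding of the blood pressure to artery thickness relationship, which randomization of the treatment does not guarantee.

```stata
* Simulated trial data (assumed effect sizes, illustration only)
clear
set seed 417
set obs 500
generate treat = _n > 250                          // 1 = new drug, 0 = old drug
generate bp    = 140 - 8*treat + rnormal(0, 10)    // mediator: blood pressure
generate thick = 0.5 + 0.01*bp - 0.02*treat + rnormal(0, 0.1)   // outcome

* Mediator and outcome equations fitted jointly
sem (bp <- treat) (thick <- bp treat)

* Direct, indirect (through bp) and total effects of treat
estat teffects
```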

Installing the command transcolorplot

Hi, Stata users!
I would like to use the command transcolorplot introduced by Van Kerm in this paper http://medim.ceps.lu/stata/transcolorplot03.pdf . Following his advice, I tried to install the command by typing in Stata
ssc install transcolorplot
but Stata replied
"ssc install: "transcolorplot" not found at SSC, type -findit transcolorplot-
(To find all packages at SSC that start with t, type -ssc describe t-)".
Could anyone tell me whether there is another way to install this command?

Matching two variables

Hello everyone, I hope there is anybody who can help me.

I have got the following problem:

I have a panel dataset consisting of 18 waves. Every individual has an identifier called "pid". The variable "mpid" contains the "pid" of the individual's mother. I want to link the two. In the end, I want a dataset consisting only of the individuals and their mothers (with the mothers marked, for example, by a dummy that is one if the individual is a mother and zero otherwise). All other individuals should be dropped.
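A minimal sketch on a toy panel, assuming `mpid` is missing when the mother is unknown (if your data use negative codes for that, recode them to missing first). It flags children whose mother appears in the data, flags the mothers themselves via two `merge m:1` steps, and keeps only those rows.

```stata
* Toy panel: pid 2's mother is pid 1; pid 4's mother (pid 5) is not observed
clear
input pid mpid wave
1 . 1
2 1 1
3 . 1
4 5 1
1 . 2
2 1 2
end

tempfile present mothers

* 1) Flag children whose mother appears somewhere in the data
preserve
keep pid
duplicates drop
rename pid mpid
generate mother_in_data = 1
save `present'
restore
merge m:1 mpid using `present', keep(master match) nogenerate

* 2) Flag individuals who are the mother of someone in the data
preserve
keep if mother_in_data == 1
keep mpid
duplicates drop
rename mpid pid
generate is_mother = 1
save `mothers'
restore
merge m:1 pid using `mothers', keep(master match) nogenerate
replace is_mother = 0 if missing(is_mother)

* 3) Keep only mothers and the children linked to them
keep if is_mother == 1 | mother_in_data == 1
sort pid wave
list
```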

Stata question: how to use -margins- with -xtnbreg-

I'm now working on my final project, in which I would like to study the interaction effect between the level of urbanization (percentage of urban population) and the level of democracy (polity2 score: -10 to 10) on strike incidence (country-year count). After sorting the panel data, I run Stata like this:

xtnbreg strike c.urban##c.polity2 (other control variables), fe

Then I want the marginal effect of urban at different levels of polity2, so I run:

margins, dydx(urban)

(I actually tried all kinds of variations here, but nothing works.)

But Stata just keeps saying "default prediction is a function of possibly stochastic quantities other than e(b)". I've been tortured by this for two days and searched for lots of solutions, but I still have no clue. I'm posting my question here for suggestions and advice. Thank you all!
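The message suggests that the default prediction after `xtnbreg` depends on the fixed effects, which are not part of e(b), so one way forward is to name a prediction that depends on e(b) only. A minimal sketch on simulated data (the variable names follow the post; everything else is hypothetical) using `predict(xb)`, which gives the effect of urban on the linear index at several polity2 values. `help xtnbreg postestimation` lists the other predictions (such as `nu0`, which, as I understand it, sets the fixed effect to zero), and `marginsplot` can be run afterwards to plot the results.

```stata
* Simulated country-year panel (all numbers are illustrative assumptions)
clear
set seed 452
set obs 50
generate country = _n
expand 20
bysort country: generate year = 1990 + _n
generate urban   = 20 + 60*runiform()
generate polity2 = floor(21*runiform()) - 10
generate mu      = exp(-1 + 0.01*urban + 0.02*polity2 + 0.001*urban*polity2)
generate strike  = rpoisson(mu*rgamma(2, 0.5))   // overdispersed counts

xtset country year
xtnbreg strike c.urban##c.polity2, fe

* Effect of urban on the linear index at several polity2 values;
* predict(xb) depends only on e(b)
margins, dydx(urban) at(polity2 = (-10(5)10)) predict(xb)
```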

Problems with statsby and logit

Hello,

I am new to Stata and I have a question for which I cannot find a solution online. Any help would be highly appreciated.

I have to replicate a regression table where the regression coefficients are obtained by averaging the coefficients from annual logit regressions over a given sample period. My sample consists of 21,534 firm-year observations (panel data).

I tried to compute the values by using the statsby-command. Unfortunately, Stata freezes when I enter the following code:

use "UK_reg", clear

statsby _b, by(fyear) verbose nodots: logit payers pcrank mb growth prof1 ecem

list

sum _b_pcrank
sum _b_mb
sum _b_growth
sum _b_prof1
sum _b_ecem

However, if I replace "logit" by "reg", I obtain the desired results. How can I get my code working for "logit"?
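Two plausible causes, both assumptions since the data are not available: with `verbose`, the logit iteration logs can fill the Results window so that Stata waits at a --more-- prompt, which looks like a freeze (`set more off` avoids this), or the logit for some year fails to converge and keeps iterating. A minimal sketch on simulated data with hypothetical variables `x1` and `x2`: `iterate()` caps the iterations, `e(converged)` is collected so that non-converged years can be inspected or dropped, and the annual coefficients are then averaged (Fama-MacBeth style).

```stata
set more off

* Simulated firm-year data (hypothetical variables x1 and x2)
clear
set seed 471
set obs 2200
generate fyear  = 2000 + mod(_n, 11)
generate x1     = rnormal()
generate x2     = rnormal()
generate payers = runiform() < invlogit(0.5*x1 - 0.3*x2)

* One logit per year; iterate(50) stops a non-converging year early,
* and e(converged) records which years converged
statsby _b converged=e(converged), by(fyear) clear nodots: ///
    logit payers x1 x2, iterate(50)

* Average the annual coefficients
summarize _b_x1 _b_x2 converged
```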

Thank you very much in advance for your help.

Regards,

Adriana