Channel: Statalist
Viewing all 72807 articles

Suest after xtivreg2

Dear All,

I need to compare coefficients from four different regressions estimated with xtivreg2. I am running:

quietly eststo: xtivreg2 cog x (vol=iv1) if (popfin==1), i(id) fe first
quietly eststo: xtivreg2 cog x (vol=iv2) if (popfin==1), i(id) fe first
quietly eststo: xtivreg2 cog x (vol=iv3) if (popfin==1), i(id) fe first
quietly eststo: xtivreg2 cog x (vol=iv4) if (popfin==1), i(id) fe first

I wish to compare the coefficients on vol from the four regressions above. I tried the following:

suest est1 est2 est3 est4, vce (cluster id)

but got the following error:
unable to generate scores for model est1
suest requires that predict allow the score option
r(322);


From researching further, it seems (I am not 100% sure about this) that suest requires predicted scores and that xtivreg2 does not provide them. Is there an alternative way to compare the coefficient on vol across the four regressions above? Many thanks.

Sincerely,
Sumedha Gupta.

topic test

Simulating multivariate normal distribution with mean of e(B) and variance/covariance e(V)

Hi everyone,

I am working on implementing a parametric bootstrap simulation where I have e(b), a vector with several parameter estimates from a regression model, and e(V), the corresponding variance/covariance matrix. The book I'm using says that "each entry in the simulated sample is a random draw from a multivariate normal distribution with mean [e(b)] and variance/covariance [e(V)]." I've substituted Stata matrix names where appropriate.

How would I implement this type of analysis in Stata? The bootstrap command seems inappropriate in this scenario since it is non-parametric.
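
A minimal sketch of one way to take such draws, assuming the built-in drawnorm command fits the book's description. The auto dataset, the regression, and the variable names b1-b3 are illustrative assumptions, not part of the original question.

```stata
* Sketch: parametric draws of a coefficient vector after a regression.
* The auto dataset and the names b1-b3 are illustrative assumptions.
sysuse auto, clear
regress price mpg weight

matrix b = e(b)          // 1 x 3 row vector of point estimates
matrix V = e(V)          // 3 x 3 variance/covariance matrix

* Replace the data in memory with 1,000 draws from N(b, V).
* One new variable per coefficient, in the column order of e(b).
drawnorm b1 b2 b3, n(1000) means(b) cov(V) seed(12345) clear

summarize b1 b2 b3
```

Each row of the resulting dataset is one simulated coefficient vector, so any quantity of interest can be computed row by row and then summarized.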

Trouble determining which variables have significant interactions for an imputation model

Dear All,

I would be very grateful for your help. I am developing a chained imputation model using 14 categorical and continuous variables. I am imputing 6 variables with 2-20% missing data. I have read the Stata 13 manual and understand the Stata script for specifying the model. I am having trouble working out which variables have significant interactions between them. In the Stata imputation manual and recommended papers on multiple imputation there are many sections on how to put interactions into models but I can't find any advice on how to decide which variables have significant interactions in the first place and I am getting rather frustrated with it!

Many thanks for your time

Andrew Rosser

insignificant interaction term - should we still look at the marginal effects?

Dear Statalist,

I have a question that is maybe not so much about Stata as it is about statistics in general.


I run a multivariate regression model (prais) and include an interaction between two continuous variables. Let's call them X and Z (c.X#c.Z).
The output tells me the interaction term c.X#c.Z is not significant: P>|t| = 0.470.

Can I/should I proclaim that there is no interaction, or should I still look at the marginal effects?

When I look at the margins, I get the following:
margins, dydx(X) at( Z=(0(1)10)) vsquish


------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
X            |
         _at |
          1  |  -.0272084   .0156325    -1.74   0.082    -.0578475    .0034307
          2  |  -.0252041   .0131056    -1.92   0.054    -.0508906    .0004824
          3  |  -.0231998   .0106948    -2.17   0.030    -.0441613   -.0022383
          4  |  -.0211955   .0084996    -2.49   0.013    -.0378544   -.0045366
          5  |  -.0191912   .0067341    -2.85   0.004    -.0323898   -.0059926
          6  |  -.0171869   .0058046    -2.96   0.003    -.0285638     -.00581
          7  |  -.0151826   .0061058    -2.49   0.013    -.0271497   -.0032156
          8  |  -.0131783   .0074905    -1.76   0.079    -.0278595    .0015028
          9  |  -.0111741   .0094961    -1.18   0.239    -.0297861     .007438
         10  |  -.0091698   .0118104    -0.78   0.438    -.0323177    .0139782
         11  |  -.0071655   .0142841    -0.50   0.616    -.0351618    .0208308
------------------------------------------------------------------------------

Or should I say, based on this, that the two variables actually interact; that at some values of Z (3-8), the effect of X on Y is actually moderated/influenced by Z.
I thought that after looking at the (insignificant) interaction term in the output, there is no need to investigate further.

Would the conclusion/interpretation change if _all 11_ values (not just 3-8 but 1-11) were significant, while the main interaction term in the multivariate regression remains insignificant?

Thank you,

Alex

Stepwise ZIP and ZINB models

Greetings everyone,

I wanted to ask if there is a way to perform stepwise selection (or, for that matter, any other variable selection method) for zero-inflated Poisson or negative binomial regressions, as the stepwise command does not support ZIP and ZINB.

Thank you in advance for any information and advice,
Thanos

Combining 5 dataset into one

Hi there!
I am having trouble trying to xtset my .dta file in Stata. I have five home countries, and each country has data for 70 host countries for the years 2001-2012. After I stacked the five home-country datasets and ran xtset, I got the error "repeated time values within panel", r(451). An example of the data:

year   home country   host country   gdp   export   import
2001              1              1    44       55       45
2002              1              2    45       56       34
2003              1              3    45       66       43
2004              1              4    54       56       66
2001              2              1    67       98       76
2002              2              2    55       76       53
2003              2              3    67       35       13
2004              2              4    89       47       66
2001              3              1    45       56       77
2002              3              2    34       56       87
2003              3              3     7       44       67
2004              3              4    78       67       88

How can I arrange the data so that I can xtset country year?

Thank you so much
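
A minimal sketch of one possible arrangement, assuming the panel unit is the home-host pair rather than a single country. The toy data and the names home, host, and pair are hypothetical stand-ins for the real variables.

```stata
* Sketch: in stacked bilateral data each year repeats once per
* home-host pair, so the pair (not the country) is the panel unit.
* Toy data; variable names are illustrative assumptions.
clear
input year home host gdp
2001 1 1 44
2002 1 1 45
2001 1 2 54
2002 1 2 56
2001 2 1 67
2002 2 1 55
end

egen pair = group(home host)   // one id per home-host combination
xtset pair year
```

With a single country identifier the same year appears several times within a panel, which is what triggers r(451); the pair id makes each pair-year combination unique.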

Matching Pairs and Indicating Common Characters

Dear All,

I am encountering a problem that I find really hard to solve, and I am writing to seek some advice.
There are two datasets in my analysis. The first one contains interstate conflict data. The data look like this (for example):
Conflict ID   Involved Countries   Side A or B
1             USA                  A
1             Canada               B
1             Australia            A
1             China                B
1             Russia               B
2             India                A
2             Thailand             B
2             Bangladesh           B
(A: conflict starter. B: the other side of the conflict)
My aim is to pair all the A and B sides within each Conflict ID. Is there a quick way to pair them?

The other dataset contains certain information about each country.
Country     Characteristics
USA         HAHAHA
USA         LALALA
USA         WAWAWA
Canada      LALALA
Canada      NONONO
China       HAHAHA
China       YAYAYA
Australia   HAHAHA
Russia      YAYAYA
My aim for this dataset is to identify whether two countries share the same characteristics.

Finally, I would like to combine these two datasets to create a new dataset of country pairs with
a dummy indicating whether they share the same characteristic. The ideal data structure looks like this:
Conflict ID   Pair ID   Country   Side   Character Dummy
1             1         USA       A      1
1             1         Canada    B      1
1             2         USA       A      1
1             2         China     B      1
1             3         USA       A      0
1             3         Russia    B      0
I understand this is a rather long and somewhat confusing question. I really appreciate any help!
If anything is unclear, I will be more than happy to explain.

Thank you in advance for your kind help! I look forward to hearing from you!

Best regards
Long
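
A minimal sketch of the pairing step only, assuming joinby is acceptable for forming every A-B combination within a conflict. The variable names (conflict, country, side, country_a, country_b, pair_id) are hypothetical; the shared-characteristic dummy would need a further merge against the second dataset.

```stata
* Sketch: form all A x B country pairs within each conflict.
* Data are the example rows from the post; names are assumptions.
clear
input conflict str12 country str1 side
1 "USA"        "A"
1 "Canada"     "B"
1 "Australia"  "A"
1 "China"      "B"
1 "Russia"     "B"
2 "India"      "A"
2 "Thailand"   "B"
2 "Bangladesh" "B"
end

* Save the B side separately.
preserve
keep if side == "B"
rename country country_b
drop side
tempfile sideb
save `sideb'
restore

* Keep the A side and join every B country from the same conflict.
keep if side == "A"
rename country country_a
drop side
joinby conflict using `sideb'

sort conflict country_a country_b
egen pair_id = group(conflict country_a country_b)
list, sepby(conflict)
```

The result is in wide form, one row per pair; reshape long could turn it into the two-rows-per-pair layout shown in the post.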







Counting number of times an observation appears within a group

Hi all, I am new to Stata and am using a dataset with ~9000 observations and two main variables of interest: ir_no and ir_no_1. I am trying to count the number of times a given value of ir_no_1 appears for each ir_no. Here is a mock table of what the data look like, with the variable I am trying to make ("new_var").
ir_no   ir_no_1   new_var
111     abc22     2
111     abc22     2
111     abc11     1
222     abc33     2
222     abc22     1
222     abc33     2
I tried the following code, but something must be incorrect since it gives me the total number of observations per group:
gen calc = .
foreach i in ir_no_1 {
    by ir_no: egen calc2 = count(ir_no_1) if ir_no_1 == `i'
    replace calc = calc2 if ir_no_1 == `i'
    drop calc2
    }
Is there an easy way to do this? I looked around the forums for a quick solution, but have been unsuccessful. Thanks in advance for any advice.
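
A minimal sketch of one way to get such a count without a loop, using the mock rows above. It assumes ir_no is numeric and ir_no_1 is a string, which the post does not state.

```stata
* Sketch: count how often each ir_no_1 value occurs within each ir_no.
* Storage types are assumptions; the rows are the mock data from the post.
clear
input ir_no str5 ir_no_1
111 "abc22"
111 "abc22"
111 "abc11"
222 "abc33"
222 "abc22"
222 "abc33"
end

* Within each (ir_no, ir_no_1) group, _N is the number of rows.
bysort ir_no ir_no_1: generate new_var = _N
list, sepby(ir_no)
```

Note that bysort reorders the data by ir_no and ir_no_1, so the rows will not stay in their original order.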

help fixing my code

sort visitlink daystoevent
by visitlink: egen transfer_admit=1 if pt_int=1 & daystoevent>min(daystoevent)


visitlink is the patient ID
daystoevent is the number of days, which helps identify the chronological order of admissions
los is the length of hospital stay

I have noticed that some patients were admitted to one hospital and transferred to another hospital (the two have different IDs), and I have identified the patients of interest (pt_int=1) based on the presence of some variables that were positive in both admissions. I am trying to figure out code that would help identify the observation with the least daystoevent.


daystoevent   los   visitlink   pt_int
15146           4        1888        1   <-- this is the index admission
15150          17        1888        1   <-- this is the transfer admission for the same patient; note that daystoevent + los in the previous observation equals daystoevent here
13275           7        6334        1
18034           8        6676        1
13187           1        6687        0   <-- (previous admission to the index admission)
13244           4        6687        1
17238          30        9647        1
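
A minimal sketch of one way to flag the transfer admissions, assuming the index admission is the earliest admission with pt_int == 1 for each patient. The name first_day is hypothetical, and the rule is an interpretation of the post rather than a confirmed definition.

```stata
* Sketch: flag admissions of interest that come after the patient's
* first admission of interest. Rows are the example data from the post.
clear
input daystoevent los visitlink pt_int
15146  4 1888 1
15150 17 1888 1
13275  7 6334 1
18034  8 6676 1
13187  1 6687 0
13244  4 6687 1
17238 30 9647 1
end

* Earliest daystoevent among the pt_int == 1 admissions of each patient.
bysort visitlink: egen first_day = min(cond(pt_int == 1, daystoevent, .))

* egen takes no constant such as "=1", and equality tests need "==".
generate byte transfer_admit = pt_int == 1 & daystoevent > first_day
list, sepby(visitlink)
```

In the example only the second admission of patient 1888 is flagged; patient 6687 is not, because the earlier admission has pt_int equal to 0.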

non-parametric propensity score matching

Can propensity scores generated through non-parametric methods be used for matching methods such as radius matching or nearest-neighbour matching? I have seen non-parametric propensity scores used for reweighting but not for matching.

Delete each 10th observation:panel data

Dear Statalist member,

I have already asked this question but have not received a response. Probably it was lost...

Let me ask again, hoping for some help, as I have been struggling to implement this for several weeks.

I have panel data for 100 stocks. I have generated the variable Trade_1. It takes the value 1 for a buy signal and -1 for a sell signal.

My next step is to generate Trade_2. It is almost the same as Trade_1. The only difference is that I need to replace any signal with missing (.) for the following 10 days.
In other words, after a first buy/sell signal one should wait for the next 10 days and ignore any signals within this period.

I tried the following, but it is wrong and does not work:

bysort _j (day_1): replace trade_2=1 if [f1.trade_1 +f2.trade_1 +f3.trade_1 +f4.trade_1 +f5.trade_1 +f6.trade_1 +f7.trade_1 +f8.trade_1 +f9.trade_1 +f10.trade_1 ]>0
bysort _j (day_1): replace trade_2=-1 if [f1.trade_1 +f2.trade_1 +f3.trade_1 +f4.trade_1 +f5.trade_1 +f6.trade_1 +f7.trade_1 +f8.trade_1 +f9.trade_1 +f10.trade_1 ]<0

Could you please suggest how to delete each subsequent observation after the first signal?

Thank you.
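
A minimal sketch of one way to impose such a waiting period, assuming a numeric day counter and that a signal is ignored when it falls within 10 days of the last accepted signal. The names stock, day, last_day, and the toy data are hypothetical.

```stata
* Sketch: keep a signal only if more than 10 days have passed since
* the last accepted signal of the same stock. Toy data; names assumed.
clear
input stock day trade_1
1  1  1
1  2  .
1  5 -1
1 12  1
1 13 -1
2  1  .
2  3 -1
2  9  1
2 14  1
end

* last_day carries forward the day of the most recently accepted signal.
* replace works through observations in order, so last_day[_n-1] is
* already updated when the current row is evaluated.
bysort stock (day): generate last_day = day if _n == 1 & !missing(trade_1)
by stock: replace last_day = cond(!missing(trade_1) & ///
    (missing(last_day[_n-1]) | day - last_day[_n-1] > 10), ///
    day, last_day[_n-1]) if _n > 1

* A signal survives only on the day it was accepted.
generate trade_2 = trade_1 if last_day == day
list, sepby(stock)
```

If the waiting period should be counted in trading observations rather than calendar days, a within-stock observation counter could replace day.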



Country variables and fixed effects

Hi, members! I am using cross-sectional data from a survey to study the effects of regime types on work ethic. My variables of interest are the regime type and individual controls (such as sex, age, etc.). When I create "country" dummies to include as fixed effects, Stata omits some countries because of collinearity (which I believe is collinearity with the regime type). Can anyone explain to me what I should do in order to control for unobserved heterogeneity?

Placebo tests

Hi everyone,

I have a question on how to run a placebo test in Stata.
I have read the following entry from the Stata Forum, but I am unsure whether the same framework applies to my case.

I am running a model to explain the role of experience on adoption of a strategy.
I have several control variables on the RHS, FE, and I included past decisions on the strategy to proxy the importance of DIRECT experience.
I also include past decisions on the strategy of neighbors to proxy the importance of INDIRECT experience
So, the basic model I am estimating is as follows :

Y_it = Y_it-1 + Y_jt-1 + Controls + Fixed Effects with i different from j

I would like to run a PLACEBO test to show that INDIRECT experience is really influencing adoption, but it is unclear if I am proceeding in a correct way.
I ran the following regression:

Y_it = Y_it-1 + Y_jt-1 + Controls + Fixed Effects + Y_it+1 + Y_jt+1 with i different from j

If INDIRECT experience at time t-1 matters, I should find that Y_jt+1 is not significant. Correct?

However, I believe the same cannot be done for DIRECT experience (Y_it+1).
Because adoption is an endogenous choice, I found that Y_it+1 is significant.
But this seems reasonable: if I adopt the strategy at time t (because I did so in t-1, so that experience matters), I am also likely to adopt in t+1.

Am I correct in this?

Suggestions/comments would be extremely appreciated.

Many thanks

Fabio

How to use spmat command on panel data?

Hello Statalist users,

I am trying to create a spatial weighting-matrix based on the inverse distance with spmat command:

"spmat idistance W x y, id(id) dfunction(dhaversine) normalize(spectral)"

As a result I get the message "Two or more observations have the same coordinates", r(498). The reason is probably that my dataset contains data on the same country for various years.
Is there any way around this problem? Or any other command that is working on panel data?

Thank you for your help!
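
A minimal sketch of one possible workaround, assuming the coordinates do not change over time so that a single cross-section is enough to build the matrix. The toy data are hypothetical, and the spmat call is copied from the post and left commented out because spmat is a user-written command that has to be installed separately.

```stata
* Sketch: build the weighting matrix from one row per country.
* Toy data; the coordinates are made up for illustration.
clear
input id year x y
1 2001 10.0 50.0
1 2002 10.0 50.0
2 2001 20.0 48.0
2 2002 20.0 48.0
3 2001 15.0 52.0
3 2002 15.0 52.0
end

preserve
bysort id (year): keep if _n == 1    // one row per country
isid id                              // errors out if ids still repeat
* With one row per unit, the call from the post could be run here:
* spmat idistance W x y, id(id) dfunction(dhaversine) normalize(spectral)
restore
```

Whether an N x N matrix built this way can be used directly by a given panel estimator depends on that estimator, so this only addresses the duplicate-coordinates error.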


generate histograms with descriptive statistics

Hi,

I have a question about the histogram command in Stata which I can't solve by googling or by reading the Stata help file: is it possible to create a histogram with a box inside it showing statistics about the variable of interest, such as mean, median, st. dev., skewness, and kurtosis? If there is such an option, what would the code look like?

Thank you very much!

Cheers,

Kurt
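
A minimal sketch of one way to do this, assuming a text box placed with the text() option is acceptable. The auto dataset, the variable mpg, and the box coordinates (25 on the y axis, 30 on the x axis) are illustrative assumptions that would need adjusting for other data.

```stata
* Sketch: histogram with a box of summary statistics.
* Dataset, variable, and box position are illustrative assumptions.
sysuse auto, clear
quietly summarize mpg, detail

* Format each statistic as a short string.
local m  = trim(string(r(mean),     "%9.2f"))
local md = trim(string(r(p50),      "%9.2f"))
local sd = trim(string(r(sd),       "%9.2f"))
local sk = trim(string(r(skewness), "%9.2f"))
local ku = trim(string(r(kurtosis), "%9.2f"))

* text(y x "line 1" "line 2" ...) stacks the strings as separate lines.
histogram mpg, frequency ///
    text(25 30 "Mean = `m'" "Median = `md'" "SD = `sd'" ///
         "Skewness = `sk'" "Kurtosis = `ku'", ///
         placement(e) justification(left) box margin(small))
```

The coordinates are in the units of the plot axes, so with the frequency option the first number is a count.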

Panel data cointegration with xtwest

Hi all.

When I run the xtwest command on my panel data, only some of the tests indicate cointegration.
When I use the "bootstrap" option, the significance of the cointegration tests is stronger. Should I use this option?

Thank you!

Comparing regression coefficients from separate logistic regression models

I am conducting an analysis using annual cross-sectional data of doctor office visits to assess trends in prescriptions over time. I have modeled the trends in prescriptions by using logistic regression where the outcome is whether or not the medication was prescribed at the visit and the predictor is time. I am essentially examining trends in continuing and new prescriptions for the medication, and have an estimate of trends for each over time (e.g., I have two odds ratios for time each from separate models). I would like to compare the trends for the new and continuing prescriptions but I can't figure out a way to compare these trends because the odds ratios for time are each in separate models. Is there a way I can compare the odds ratio for time in the two models to see if there are differences in the trends?
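
A minimal sketch of one way to test equality of a coefficient across two separately fitted logit models, assuming suest is appropriate for the data at hand. The auto dataset and the outcomes foreign and heavy are stand-ins for the two prescription outcomes, and mpg plays the role of time; complex survey designs would need additional handling not shown here.

```stata
* Sketch: compare one coefficient across two logit models with suest.
* Outcomes and predictor are illustrative stand-ins.
sysuse auto, clear
generate byte heavy = weight > 3000

logit foreign mpg
estimates store m1

logit heavy mpg
estimates store m2

* Combine the two sets of estimates with a joint covariance matrix,
* then test whether the two mpg coefficients are equal.
suest m1 m2
test [m1_foreign]mpg = [m2_heavy]mpg
```

The test is on the log-odds scale; equal coefficients imply equal odds ratios.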

Interpretation of Multinomial Logit Model

Dear Folks,

I am running a multinomial logit model for my research. I am creating categorical variables (dummies) for industries and for advisors.

First of all, how do we calculate the probabilities? Most textbooks calculate them by hand, but do newer versions of Stata give us the probabilities directly so that we can interpret them? If so, how do we interpret them? I used this command for the marginal effects:
margins, dydx(*) atmeans predict(pr outcome(2))


My dependent variable is the choice of performance measure to be used: (Outcome 1) ROA exclusively, (Outcome 2) ROE exclusively, (Outcome 3) ROE and ROA jointly, and (Outcome 4) neither ROE nor ROA.


The X variables consist of the control variables: market capitalization, volatility in ROA, volatility in ROE, industry dummies, and some advisor dummies.

margins, dydx(*) atmeans predict(pr outcome(2))

Variable                                  dy/dx   Std. Err.       z     P>z    [95% Conf. Interval]
Return on asset volatility (ROAV)      -0.04423    0.026128   -1.69    0.09    -0.09544    0.006978
Return on equity volatility (ROEV)     -0.32427    0.19455    -1.67    0.096   -0.70558    0.057037
Board committee lb                      0.097636   0.071001    1.38    0.169   -0.04152    0.236794
Nomination committee % lnc             -0.00397    0.001804   -2.2     0.028   -0.00751   -0.00043
Leverage lev                           -0.00052    0.000323   -1.6     0.109   -0.00115    0.000116
Price to book ptb                       0.00661    0.003876    1.71    0.088   -0.00099    0.014206
Market Capitalization lmc              -0.04825    0.011329   -4.26    0       -0.07046   -0.02605
Bain (Dummy Advisor)                    0.095134   0.070328    1.35    0.176   -0.04271    0.232973
Mckinsey (Dummy Advisor)                0.250295   0.058871    4.25    0        0.13491    0.36568
BG (Dummy Advisor)                      0.121386   0.074217    1.64    0.102   -0.02408    0.266848
Towers (Dummy Advisor)                  0.118271   0.059905    1.97    0.048    0.000861   0.235682
Mercer (Dummy Advisor)                  0.591537   0.135875    4.35    0        0.325226   0.857848
Pwc (Dummy Advisor)                     0.119648   0.064528    1.85    0.064   -0.00683    0.246121
Food Service Industry (Dummy)           0.046763   0.03126     1.5     0.135   -0.01451    0.108032
Customer Service Industry (Dummy)       0.047415   0.017949    2.64    0.008    0.012236   0.082594
Car Industry (Dummy)                   -0.20241    0.046227   -4.38    0       -0.29301   -0.11181
General Retailers (Dummy)               0.117296   0.02383     4.92    0        0.070591   0.164001
Aerospace Industry (Dummy)              0.084182   0.018928    4.45    0        0.047084   0.12128
Mining Industry (Dummy)                -0.16032    0.027304   -5.87    0       -0.21383   -0.1068
Agriculture Industry (Dummy)           -0.14285    0.022954   -6.22    0       -0.18784   -0.09786
Food court Industry (Dummy)            -0.13902    0.021214   -6.55    0       -0.1806    -0.09744


global ylist
global xlist roev roav lb lnc lev ptb lmc Bain (dummy advisor) Mckinsey (dummy advisor) BG (dummy advisor) (Industries dummy)..... etc
* Multinomial logit model with base outcome the most frequent alternative
mlogit $ylist $xlist

margins, dydx(*) atmeans predict(pr outcome(1))
margins, dydx(*) atmeans predict(pr outcome(2))
margins, dydx(*) atmeans predict(pr outcome(3))
margins, dydx(*) atmeans predict(pr outcome(4))



How does it work with the interpretation of dummy Bain advisor? Is it relative to all other advisors ? Do we have to find probabilities or Stata calculates for us?
I also think we can run industry and time effects together?

Here goes the code


Thanks,
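
A minimal sketch of how predicted probabilities and marginal effects can be obtained after mlogit, on illustrative data. The auto dataset, the three-level outcome tier, and the cutoffs used to build it are assumptions made only so the example runs; they are not the model from the post.

```stata
* Sketch: predicted probabilities and marginal effects after mlogit.
* Dataset, outcome, and cutoffs are illustrative assumptions.
sysuse auto, clear
generate byte tier = 1 + (price > 4500) + (price > 6000)

mlogit tier mpg weight, baseoutcome(1)

* One predicted probability per outcome; they sum to 1 for each row.
predict p1 p2 p3, pr
generate double psum = p1 + p2 + p3
summarize p1 p2 p3 psum

* Marginal effects on the probability of outcome 2, at the means.
margins, dydx(*) atmeans predict(pr outcome(2))
```

If an indicator is entered with factor-variable notation (for example i.bain, a hypothetical name), margins reports for it the discrete change in the probability from the base level rather than a derivative.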

Paired Regression (Possibly with IV)

Dear All,

I am writing to ask an econometrics question. I am running a regression on paired observations
with some paired covariates (characteristics shared by the pair)
and individual-specific covariates (characteristics not shared).

The data looks like this:
(very similar to my previous posting on matching in case you happened to see it)
Pair ID   Country   GINI   Same Character Dummy   GDP   Pop   Dep. Var
1         USA       0.2    1                      100   100   5
1         China     0.5    1                       80   500   6
2         USA       0.2    0                      100   100   3
2         Russia    0.3    0                       60   200   2
3         China     0.5    0                       80   500   5
3         India     0.5    0                       50   300   2
...
(GINI as my key independent variable)

With the unit of observation being a country pair, the econometrics model in my mind is:
DEPVARp = a + GINIp + DUMMYp + GINIp*DUMMYp + GDPp + POPp + errorp
(p stands for pairs)

But I do not know:
1) Is the model econometrically right?
2) How can I implement it in Stata?
Can it just be a normal OLS regression with robust standard errors clustered at the pair level?
e.g. reg DEPVAR GINI DUMMY GINI_DUMMY GDP POP, cluster(PAIR_ID)
Or is there some special command for this particular econometric model?

Intuitively, the paired regression should be quite different from the simple OLS at country level,
but I really do not know what the proper model should be.


The follow-up question relates to the instrumental variables approach.
If I have a valid instrumental variable for GINI, how can I implement it in the paired regression?
Is it the same as the usual procedure,
i.e. predict in the first stage and use the predicted value in the second stage?

If there is anything that is unclear, I am very happy to clarify!
Thank you very much in advance for your kind help! I look forward to hearing from you!

Best regards
Long
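
A minimal sketch of how the two specifications could be written, on simulated data with two rows per pair as in the table above. Everything here (variable names, the instrument z, the data-generating process) is a hypothetical illustration, and it does not settle whether the model itself is the right one. The IV step uses ivregress rather than a by-hand second stage, since plugging first-stage predictions into regress gives incorrect standard errors.

```stata
* Sketch: interaction model with pair-clustered errors, then 2SLS.
* Simulated data; all names and parameters are assumptions.
clear
set seed 12345
set obs 200
generate pair_id = ceil(_n/2)            // two countries per pair
generate z    = rnormal()                // hypothetical instrument
generate gini = 0.4 + 0.05*z + 0.05*rnormal()

* The dummy is shared within a pair: draw once, copy to both rows.
bysort pair_id: generate byte same = runiform() < 0.5 if _n == 1
by pair_id: replace same = same[1]

generate gdp = rnormal(100, 10)
generate pop = rnormal(200, 50)
generate depvar = 1 + 2*gini + 0.5*same + gini*same + 0.01*gdp + rnormal()

* OLS with the interaction, errors clustered on the pair.
regress depvar c.gini##i.same gdp pop, vce(cluster pair_id)

* 2SLS: gini and its interaction are both treated as endogenous,
* instrumented by z and z interacted with the dummy.
generate gini_same = gini*same
generate z_same    = z*same
ivregress 2sls depvar same gdp pop (gini gini_same = z z_same), ///
    vce(cluster pair_id)
```

The ## operator adds both main effects and the interaction, so no hand-made product term is needed in the OLS step.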


