Channel: Statalist
Viewing all 72862 articles

Problem with append

Hi

I am trying to append multiple files using the code below:
Code:
 use "C:\Users\abc\file1.dta", clear

append using "C:\Users\abc\file2.dta", force
However, some of the value labels of a specific variable of interest go missing in the resulting appended file.

Why is this?

Relative Risk Ratio Calculation Help

Hello

I would like to compare two relative risks to see if they are different from one another.

I have data on stunting (yes/no) and income status (high/low) for different rounds (R1 - R4) of a survey.
I would like to see if the gradient of stunting by income status changes in different rounds of the survey. This could be between R1 and R2, or R1 and R4 etc.


Example data:

Round 1

                Stunted   Not stunted   Total
Low income         50          30          80
High income        70          50         120

RR of stunting, low vs high income = 0.625 / 0.583 = 1.07

Round 2

                Stunted   Not stunted   Total
Low income         40          40          80
High income        30          60          90

RR of stunting, low vs high income = 0.500 / 0.333 = 1.50

RRR between R1 and R2 = 1.07 / 1.50 ≈ 0.71

I have been using the cs command to get the RR with confidence intervals for each round, but I need help comparing the RRs between rounds, i.e. calculating a CI and p-value for the ratio.
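This isn't the Stata cs machinery, but the comparison done by hand above can be sketched with the standard log method for comparing two relative risks (test whether ln(RR1) − ln(RR2) differs from zero, treating the two rounds as independent samples). A minimal Python sketch using the Round 1 and Round 2 counts from the example data:

```python
import math
from statistics import NormalDist

def rr_and_se(a, n1, c, n2):
    """Relative risk of group 1 (a/n1) vs group 2 (c/n2), plus the SE of ln(RR)."""
    rr = (a / n1) / (c / n2)
    se = math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n2)
    return rr, se

rr1, se1 = rr_and_se(50, 80, 70, 120)   # Round 1: 50/80 vs 70/120 stunted
rr2, se2 = rr_and_se(40, 80, 30, 90)    # Round 2: 40/80 vs 30/90 stunted

rrr = rr1 / rr2                          # ratio of relative risks
se = math.sqrt(se1 ** 2 + se2 ** 2)      # SE of ln(RRR), rounds assumed independent
z = math.log(rrr) / se
p = 2 * (1 - NormalDist().cdf(abs(z)))
lo, hi = (math.exp(math.log(rrr) + s * 1.96 * se) for s in (-1, 1))
print(f"RRR={rrr:.3f}  95% CI {lo:.3f}-{hi:.3f}  p={p:.3f}")
```

With these counts the ratio is about 0.71 with a CI spanning 1, so the gradients would not differ significantly at the 5% level.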

I've even done it by hand, but it took ages! Sorry for my ignorance; there must be a simple way to do this!

Many Thanks

Joe

Using different currencies in panel dataset

Is it correct to use data denominated in each country's national currency in a country panel dataset?

Cap on options allowed in a Stata program: how to incorporate [if] and [pweight/] when using * to get around the limit

Dear Statalisters,

When I try to add one more option to a Stata program, I get an "invalid syntax" error from Stata (14.1). I searched online and found a post in the Stata FAQ: http://www.stata.com/support/faqs/pr.../option-limit/. When I follow the guidance in that post, the extra option is accepted, but the [if] and [pweight/] I specify no longer work. How can I solve this? Below is a test code. Thanks much in advance!

Code:
clear
set obs 10
gen i = mod(_n,2) // 0 for half of obs, 1 for other half
gen w = runiform() // random weights

capture program drop mytest
program define mytest
    #delimit ;
    syntax [if] [in] [pweight/] [, opt1 opt2 opt3 opt4 opt5 opt6 opt7 opt8 opt9 opt10 opt11
     opt12 opt13 opt14 opt15 opt16 opt17 opt18 opt19 opt20 opt21 opt22 opt23
     opt24 opt25 opt26 opt27 opt28 opt29 opt30 opt31 opt32 opt33 opt34 opt35
     opt36 opt37 opt38 opt39 opt40 opt41 opt42 opt43 opt44 opt45 opt46 opt47
     opt48 opt49 opt50 opt51 opt52 opt53 opt54 opt55 opt56 opt57 opt58 opt59
     opt60 opt61 opt62 opt63 opt64 opt65 opt66 opt67 opt68 opt69 *]
    ;
    #delimit cr
    if "`exp'"!="" local pw "[`weight'=`exp']"  // rebuild the weight specification
    local 0 `"`if' `in' `pw', `options'"'       // reassemble the arguments for a second parse
    syntax [if] [in] [pweight/] [, opt70 opt71] // parse the remaining options
    di "`if'"
    di "`weight' `exp'"
    summ i
    
end

mytest if i==1 [pw=w], opt1 opt72

Opening graphs (.gph) generated in Stata 14.1 on Stata 12

Hello!

I am having some trouble with using Stata generated .gph files across different versions of Stata. The Stata at my University computer was just updated to version 14.1. However, on my home computer I have Stata 12. Often, I need to open graphs that I generate at University on my home computer, to edit the headers, etc.

I am having trouble opening, in version 12, a .gph generated by version 14.1. This is reminiscent of a similar issue with datasets (.dta): one couldn't open in version 12 a .dta file generated by version 13.1's save command.

To deal with the problem with using datasets across versions there is a "saveold" command. Is there something similar for graphs? How can I ensure graphs created in Stata version 14 can be opened in version 12?

Thanks for your help!
Best,
Sanket

Power analysis on factorial ANOVA

So conducting a power analysis on two groups is simple:

Code:
power twomeans m1 m2, sd1(s1) sd2(s2)
But how about a 2x2 factorial design?

What could I do if I need more variables exceeding Stata limitation?

Hello!
I need to do a regression analysis controlling for analyst and firm fixed effects. But the total number of analysts and firms exceeds 32,767 (Stata's variable limit), so I cannot run a regression as follows:
xi: reg y x i.analyst i.firm

Does anyone know how I can do this?

Many thanks in advance!

Labels when using append

Hi,

I have a number of data sets of the same survey in different years. In each of these data sets I create a combination variable xy using the code below:
Code:
egen xy = group(x  y), label
so the new variable takes its labels from x and y. However, the underlying values differ in each year depending on which combinations are present. For example, Year 1 may have 350 combinations whereas Year 2 has only 340, so values 341-350 won't exist in Year 2.
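The reason the codes drift can be mimicked outside Stata: group() numbers the combinations that actually occur, in sort order, so a combination missing from one year shifts the codes assigned after it. A toy Python sketch (hypothetical combinations, not your data):

```python
def group_ids(combos):
    """Mimic egen group(): consecutive integers in sort order of observed combos."""
    return {c: i + 1 for i, c in enumerate(sorted(set(combos)))}

year1 = group_ids([("a", 1), ("a", 2), ("b", 1)])
year2 = group_ids([("a", 1), ("b", 1)])          # ("a", 2) never occurs in year 2

# The same combination gets different codes in the two years
print(year1[("b", 1)], year2[("b", 1)])
```

This is why appending the yearly files mixes up the labels: the same integer value refers to different combinations in different years.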

I then need to append all the data sets together but when I do so my labels get mixed up and I am even missing some labels in each year. How can I solve this problem?

Thanks in advance.

S2S methodology for VECM using JMulti

Good morning,

I tried to use the two-stage procedure for the VECM proposed by Lutkepohl, which seems to have better asymptotic properties in small samples compared to the Johansen approach. However, this methodology is not implemented in Stata, so I need to use the free software JMulTi. My first impression is that the software is not very stable; compared with Stata, using the same procedure (Johansen), I get different coefficient values and signs.
Has anyone tried JMulTi?

Regards,
Francesca

How to display 4 digits in the summary ES

When I perform a meta-analysis, the summary effect size (ES, 95% CI) displays only 3 decimal places. Since my results are so small, they display as 0.00. Can someone help me display 4-5 decimal places (0.0000) in the summary? Many thanks.

Propensity score matching

Dear Experts,

Can you please help me with the following issue.

I have a sample in which some observations experience a shock; call the treatment indicator t. I want to create a control group, based on certain characteristics (x1, x2, x3, etc.), to match the treated sample. I run a probit model:

probit t x1 x2 x3 x4 i.industry i.year, vce(cluster id)

then "predict prop_score"

What I get: for each firm in the treated sample, a control firm with more or less similar characteristics. Finally, I want to run a diff-in-diff on the treated and control samples, say:

y = t post t*post x1 x2
where post is 1 if the observation falls in the year after the treatment (zero otherwise)
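As a sanity check on the mechanics (not on the specification): with one treated/control indicator and one pre/post indicator and no covariates, the coefficient on t*post equals the double difference of the four group means. A sketch with made-up means:

```python
# Hypothetical group means of y (made-up numbers, just to show the arithmetic)
means = {
    ("treated", "pre"): 2.0, ("treated", "post"): 3.5,
    ("control", "pre"): 1.8, ("control", "post"): 2.4,
}

# Change among treated minus change among controls = DiD estimate
did = ((means[("treated", "post")] - means[("treated", "pre")])
       - (means[("control", "post")] - means[("control", "pre")]))
print(did)
```

With covariates (x1, x2) the interaction coefficient is the regression-adjusted version of this same quantity.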

QUESTIONS:
1. Is what I am doing right?
2. How can I create a control sample with more matches, say 2-3 control firms per treated firm?

Please help me with this issue.

Prediction interval following logistic regression

There seems to be quite a lot of debate over this issue but I thought I'd try to get some comments specific to my situation. I have posted a similar question on StackOverflow (http://stackoverflow.com/questions/3...ssion-in-stata) but was recommended to come to Statalist.

I'm using Stata 12 on a Mac. Basically, I have a dataset that is collected from subjects every day (excluding weekends). The data is a simple binary response. Therefore, on each day there are positive (1) and negative (0) responses. My workflow usually includes data manipulation in Python and Pandas and then exporting to CSV ready to import into Stata. Before creating the CSV I calculated the odds (and ln odds) and probability of success on each date. The structure of the data then looks like:

Code:
      date  resp  freq  total        prob        odds       lnodds
2015-01-02     0    14     16   0.125       0.1428571    -1.94591
2015-01-02     1     2     16   0.125       0.1428571    -1.94591
2015-01-05     0    14     15   0.0666667   0.0714286    -2.639057
2015-01-05     1     1     15   0.0666667   0.0714286    -2.639057
The whole dataset covers about 18 months.
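For what it's worth, the derived columns in that listing can be reproduced directly (taking prob as the daily fraction of positive responses and odds = p/(1-p)). A quick check of the 2015-01-02 row:

```python
import math

successes, total = 2, 16          # 2015-01-02: 2 positive responses out of 16
prob = successes / total          # daily probability of success
odds = prob / (1 - prob)          # equivalently successes / failures
lnodds = math.log(odds)

print(round(prob, 3), round(odds, 7), round(lnodds, 5))
```

This reproduces the 0.125 / 0.1428571 / -1.94591 values in the listing.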

The data show a clear annual seasonality, so I calculated sin(2*pi*date/365) and cos(2*pi*date/365) variables and ran the following command:

Code:
logit resp c.date c.sin c.cos [fw=freq]
After the logistic regression, I calculated the linear prediction using:

Code:
predict lrhat, xb
...and calculated the raw residuals as the ln odds minus the linear prediction.


I plotted the linear prediction against date and overlaid the ln odds for each day. There might be a bit of tweaking of the variables required but generally the prediction seemed to fit the data pretty well. The raw residuals also showed a normal distribution.

[Attached graph: linear prediction over date with the daily ln odds overlaid]


To me, it seems reasonable that I should be able to calculate a prediction interval (not a confidence interval) around the linear prediction to estimate the ln odds of success on any given date (and hence convert to probability). However, Stata doesn't seem to allow this. The discussions I've read suggest that this is because any individual outcome can only be binary, 0 or 1. Which, undoubtedly is true. But the logit link function allows us to convert binary outcomes to odds in the first place so we can model the data with logistic regression. Why can't the same reasoning be applied to the post-estimation situation?

I've attempted to calculate a prediction interval (PI) by calculating the standard deviation (SD) of the raw residuals and plotting the linear prediction ± 1.96*SD. This produces a plot where the ln odds of success falls outside the PI on only a small number of occasions (as expected). However, the PI is constant over time, despite the variation in the first half of the data being greater than in the second half, which is a bit of an issue.
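One small point regardless of the deeper debate: a band built as xb ± 1.96·SD lives on the ln-odds scale, and mapping its endpoints through the inverse logit gives a probability-scale interval that stays inside (0, 1). A sketch with an assumed linear prediction and residual SD (hypothetical numbers, not from this dataset):

```python
import math

def inv_logit(x):
    """Map ln odds back to a probability."""
    return 1 / (1 + math.exp(-x))

xb, sd = -1.9, 0.6               # assumed linear prediction and residual SD
lo, hi = (inv_logit(xb + s * 1.96 * sd) for s in (-1, 1))
point = inv_logit(xb)

print(round(lo, 3), round(point, 3), round(hi, 3))
```

Because the inverse logit is monotone, the endpoints of the ln-odds band map directly to the endpoints of the probability band.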

[Attached graph: linear prediction with the ± 1.96·SD prediction interval]

So, my three related questions are:
1. Why is this wrong?
2. Why doesn't Stata allow me to calculate PIs automatically?
3. How can I improve this plot?

First Difference Random Effects Model

Hello Statalist-Community,

I have a question regarding non-stationarity. I'm running a random-effects model with my DV (ecological sustainability score) and several independent variables (GDP growth rate, natural disasters per year, colonial heritage, debt history). The model includes time-invariant variables which only vary between countries (e.g. colonial heritage), time-variant variables which vary between countries, and time-variant variables which don't vary between countries.

My panel is unbalanced and has gaps:

tsset GeLae year
panel variable: GeLae (unbalanced)
time variable: year, 1973 to 2013, but with gaps
delta: 1 unit

I filled the gaps with the command tsfill.



To check regression diagnostics I first checked for stationarity. To do so, I performed the xtfisher test for every variable (excluding the time-invariant ones):

Here is an example:


Fisher Test for panel unit root using an augmented Dickey-Fuller test (0 lags)

Ho: unit root

chi2(48) = 16.6102
Prob > chi2 = 1.0000

This means there is a unit root, because I cannot reject H0. To achieve stationarity I transformed this variable using first differences (gen debt_histd = d.debt_hist):

Fisher Test for panel unit root using an augmented Dickey-Fuller test (0 lags)

Ho: unit root

chi2(46) = 204.2429
Prob > chi2 = 0.0000

This p-value means the first-difference transformation makes this variable stationary. However, do I also have to transform my dependent variable, which is already stationary, when I first-difference some non-stationary independent variables?

Also, I have included variables in the model which rarely vary over time but differ greatly between countries; if I transform these variables into first differences, I lose their country-specific attributes.
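That loss is exactly what first differencing does to anything (near-)constant within a panel: the differences collapse to zero. A toy sketch (made-up series, not your data):

```python
def first_diff(series):
    """d.x in Stata terms: x_t - x_(t-1); the first value is undefined (None)."""
    return [None] + [b - a for a, b in zip(series, series[1:])]

colonial = [1, 1, 1, 1]            # time-invariant within a country
gdp_growth = [2.0, 2.5, 1.0, 3.0]  # time-varying

print(first_diff(colonial))        # all cross-country information is lost
print(first_diff(gdp_growth))
```

A truly time-invariant variable becomes all zeros (all missing in the Stata case for the first year), which is why treating such variables as time-invariant rather than differencing them is the usual approach.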

           |     constid
     const |     0       1 |  Total
-----------+---------------+-------
         0 |    33       0 |     33
         1 |    52       1 |     53
         2 |   161       5 |    166
         3 |   130       0 |    130
         4 |    46       1 |     47
         5 |    90       0 |     90
-----------+---------------+-------
     Total |   512       7 |    519

Here you can see that the variation in const has been greatly reduced, because it hardly varies over time. See the xtline output I appended. My solution would be to treat such variables as time-invariant and not transform them.


In addition, if I add the drift option the test also says that there is no unit root:

xtfisher debt_hist, drift


Fisher Test for panel unit root using an augmented Dickey-Fuller test (0 lags)

Ho: unit root

chi2(46) = 66.9268
Prob > chi2 = 0.0236



What does that mean? Is it that if I control for a drift, the unit root is no longer significant, which means I have to control for a drift in my regression? If that is correct, how is this possible in Stata? By adding a time trend (c.year)?


Another question I have is about natural disasters, which are recorded yearly on a global scale. The frequency of natural disasters is growing over time, while the ecological sustainability of energy projects in Africa is also rising over time. Do I have to add a time trend to correct for that?

Thanks in advance for your answers. I'm currently writing my master's thesis and I have to hand it in soon. I would ask my professor, but unfortunately he is on vacation. It would really be great if you could help me clarify things.

Best regards

Sebastian





error after running lclogitml

Dear forum members:

I am trying to run the lclogit command by Pacifico and Yoo (2012) and am running into a problem.
I started with a basic conditional logit model, which produced significant coefficients. Then I tried the lclogit command on the same model specification for 2 and 3 classes, both of which worked fine. But when I then used lclogitml, I got the following error:

"(error occurred in ML computation)
(use trace option and check correctness of initial model)
equation p2_1 not found"

I tried using a lot of different model specifications and none worked past the error message.

Then I used the following:

Code:
use http://fmwww.bc.edu/repec/bocode/t/traindata.dta, clear
lclogit y price contract local wknown tod seasonal, id(pid) gr(gid) ncl(3)
lclogitml, iterate(10)
And surprisingly I got the same error message as before. Since I know the code works with this specific dataset, I am confused about what is causing the error. (I have already installed lclogitml and gllamm.) Any suggestion/help would be greatly appreciated.

Thanks
Anwesha

Applying legend from value labels of different categorical variables in a loop

Dear all,

Stata version: 14.1, updated.


I have done some google search with this problem and could not find anything but I am sure it has been discussed before. If anyone can direct me to a relevant link or provide me a solution for the problem below, it will be great.

Problem: I am trying to create Kaplan-Meier graphs for several binary categorical variables (coded 0/1) in a loop. How can I make the legend automatically pick up the value labels of each categorical variable within the loop? The code is below; the legend() line is the part that needs attention. Thanks in advance.

Code:
gr drop _all
loc m=1

foreach var of varlist base help drug rand gender past trt {
    sts graph,by(`var')  ///
    ylab(0.5 (.1) 1, nogrid ang(hor)) xlab(0(10)80) xtitle(" " "Survival weeks") ///
    legend(order(1 "Legen group-1 of 1st var" 2 "Legend group-2 of 1st var ") region(col(white))) ///
    plot1opts(lpatt(solid) col(black)) ///
    plot2opts(lpatt(shortdash) col(black)) ///
    graphregion(col(white)) title("")
gr copy x`m', replace
loc ++m
}

Stata's sem builder, gsem option, control variables

Hi, everyone,

I'm new to SEM and I wonder whether it is advisable to include several control variables (such as sex, age, education, size of the organisation, activity sector, and so on), as can be done in logit or ologit regressions, where you can easily fit fairly large numbers of both main independent and control variables as long as you have a good, large sample?

As it is now, my model already contains 30 observed and 8 latent variables (5 of which are mediators, 2 are moderating/intermediate dependent variables, and 1 is the final dependent variable) and 39 paths, so it looks quite complex already. Besides, when I estimate different parts of the model separately, it takes quite a long time to produce the outcomes (several hours in some cases). Therefore, I wonder whether I should focus exclusively on the paths in my model or whether I may as well consider the effects of the control variables simultaneously.

If so, could anyone recommend a good, reliable source with details on how to approach the inclusion of control variables when using the SEM Builder in its gsem mode to draw path diagrams?

Thank you in advance.

Multi level modelling using Gllamm, meprobit and xtmelogit/meqrlogit

Dear Stata users,

I am working on a research project where I want to apply a multi-level model. I need some help from you regarding the gllamm, xtmelogit, and meprobit commands in Stata.

My dependent variable is binary and the dataset has two levels: individual and country. I want to see how individual-level as well as country-level variables affect the dependent variable. The data were collected over 5 years.

Please tell me if the following command is correct for analysing whether a country's unemployment rate (UnEmp), GDP per capita (GDPpc), and net immigration (Net_Imm) affect the dependent variable, using a multi-level approach to analyse contextual effects.

gllamm depvar indiv_level_vars GDPpc UnEmp Net_Imm, i(country) link(probit) family(binomial) adapt nip(20)

I also want to know whether using xtmelogit/meqrlogit or meprobit instead of gllamm would make sense in terms of interpreting the results for the same data.

xtmelogit dependent variable Independent variables Svy_Year1 Svy_Year2 Svy_Year3 Svy_Year4||Country:UnEmp, var

Or

meprobit dependent variable Independent variables Svy_Year1 Svy_Year2 Svy_Year3 Svy_Year4||Country:UnEmp , covar(unstructured)


Thank you so much.




outreg2 regression output

Hi there

I need some help with outreg2 command.

I have 10 models and I want all of them on one A4 page (landscape layout).

I will sincerely appreciate it if anyone can provide me with the code.

I use the following code as of now:

outreg2 [model33 model34 model35 model36 model37 model38 model39 model40 model41 model42] using "Stata files/ERRORTWO regression (MAIN MODEL).doc", word replace label title("REGRESSIONS for ERRORTWO (MAIN MODEL)") symbol(***,**,*) alpha(0.01,0.05,0.1) keep(var1 var2 var3 var4 var5 var6 var7 var8 var9 var10 var11)

Thank you

Yahya

How can I install dropmiss in Stata?

I am very new to Stata, so may I know how to install the user-written dropmiss command?

writing ado file with panel data estimator

Hi! Does anybody have an example of how to write an ado-file implementing an estimator for panel data (for example, a fixed-effects estimator)? I am new to writing ado-files and I am wondering how to make Stata understand the time and panel dimensions from tsset or xtset. I am interested in both Stata and Mata code. All comments are welcome!