Channel: Statalist
Viewing all 73176 articles

Why am I getting an r(199) error? Lines starting with // are comments, not commands

. //ed03= are you indigenous? yes (1) or no (3)
/ is not a valid command name
r(199);

.
. // no will be our reference group
/ is not a valid command name
r(199);

.
. tab ed03

    PART OF |
INDIGENOUS/ |
     ETHNIC |
     GROUP? |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |      3,006       12.81       12.81
          3 |     20,458       87.19      100.00
------------+-----------------------------------
      Total |     23,464      100.00
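
A minimal sketch of the same session with the comments rewritten, assuming the lines were typed or pasted into the Command window. Stata recognizes `//` comments only inside do-files and ado-files, whereas a line that starts with `*` is also treated as a comment interactively:

```stata
* ed03 = are you indigenous? yes (1) or no (3)
* "no" will be our reference group
tab ed03
```

Running the original `//` lines from a do-file instead of the Command window should also avoid the r(199) error.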

Calculate significant difference between betas

I have a null hypothesis that Beta 1 = Beta 2 = Beta 3 = Beta 4 = Beta 5 = 0.

I want to test whether there are significant risk differences in the portfolios of the Big 5 audit firms. In my regression, Risk is a categorical dependent variable (1 to 3), the Big 5 audit firms enter as five independent dummy variables, and there are some control variables such as total assets, leverage, age, loss, etc.

To test the null hypothesis I think I have to use an F-test. But if I want to test for a significant difference between, for example, Beta 1 and Beta 2, how can I do this?

Here is my regression result from Stata. Our dataset contains both companies audited by Big 5 auditors and companies audited by non-Big 5 auditors.
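
One way to approach both tests is with Stata's `test` and `lincom` postestimation commands. The following is only a sketch: the variable names (`risk`, `big1` to `big5`, `totalassets`, `leverage`, `age`) are hypothetical stand-ins, since the original regression output is not shown.

```stata
* Hypothetical variable names; replace with those in the actual dataset
regress risk big1 big2 big3 big4 big5 totalassets leverage age

* Joint null hypothesis: all five Big 5 coefficients equal zero
test big1 big2 big3 big4 big5

* Pairwise null hypothesis: the coefficients on big1 and big2 are equal
test big1 = big2

* Point estimate and confidence interval for the difference
lincom big1 - big2
```

Because the dependent variable takes the ordered values 1 to 3, an ordered model such as `ologit` may be worth considering; the same `test` and `lincom` lines can be run after it, where they report Wald chi-squared tests rather than F-tests.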



Graph fractions for only some values of a categorical variable conditioned on X with proprcspline?

Dear users,

I want to show a discontinuity in my dependent categorical variable for two values of the variable, but I don't know if I am using the right command. I tried proprcspline, cmogram and catplot, but the graph didn't look the way I want it to.

The variables I have are
Y: educend (1=15 years; 2=16 years, 3=other age) Variable is: Age left full time education
X: yearb: Year of birth

The graph I want should look like the one I attached, just with my data.

My most promising Stata command so far was:
proprcspline educend yearb if yearb>=1947 & yearb<=1967, xlab(1947(5)1967)

But I want to have an xline in it, which is not possible using proprcspline; it didn't work when I tried it. And I don't want to have the label of the 3rd category.

Any help would be really great!

I like this forum and I will contribute to you guys

Hi,

My name is Tom and I'm a healthcare system pharmacist. I am also a computer technology guy who is interested in statistics. Have a good day.

Tom

Quick panel data question

Hello,

How can I take the first difference of SwedenP by dates?

Without using the xtset command.
I've tried:
by date, sort: generate SwedenD = SwedenP - SwedenP[_n-1]
But I understand that this code only takes the difference within each date.
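
A minimal sketch of a first difference over time without `xtset`, assuming the data hold exactly one observation per date (the layout of the dataset is not visible in the post, so this is an assumption):

```stata
* Assumes one observation per value of date
sort date
generate SwedenD = SwedenP - SwedenP[_n-1]
```

The first observation has no predecessor, so `SwedenD` is missing there. With the `by date` prefix, `_n-1` refers to the previous observation within the same date, which is why the original line does not difference across dates.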


Best regards

Anton

piecewise growth curve

Hi everybody,
I am trying to model a growth curve for my data.
I want to examine differences in the trajectory of depression according to the genetic status of participants (0-carrier or 1-not carrier). Depression was measured at baseline (1), 2 months after the result (2), 6 months (3), 1 year (4) and 2 years (5) after the result. I have read all I could on LGC and, given the non-linearity of my data, a piecewise model seems the best solution.
I have tried to run such a model but I am not at all sure about the syntax I used. I am interested in the slope change between time intervals, and particularly in whether the slope changes differ between carriers and non-carriers of the gene.
Unfortunately, I get a troubling result.
The variables are:
depression: continuous
gene status: carrier (1), non-carrier (0): categorical
Time: 1, 2, 3, 4 & 5: categorical

First I use mkspline to create 4 spline variables with knots at 2, 3 and 4:
mkspline time1 2 time2 3 time3 4 time4 = Time, marginal
Second, I run the model with xtmixed:
xtmixed dep porteur##time1 porteur##time2 porteur##time3 porteur##time4 || ID: Time, mle
I obtain the following estimates, with warning messages about collinearity.

. xtmixed dep porteur##time1 porteur##time2 porteur##time3 porteur##time4 || ID: Time, mle
note: 1.time2 omitted because of collinearity
note: 2.time2 omitted because of collinearity
note: 3.time2 omitted because of collinearity
note: 1.porteur#1.time2 omitted because of collinearity
note: 1.porteur#2.time2 omitted because of collinearity
note: 1.porteur#3.time2 omitted because of collinearity
note: 1.time3 omitted because of collinearity
note: 2.time3 omitted because of collinearity
note: 1.porteur#1.time3 omitted because of collinearity
note: 1.porteur#2.time3 omitted because of collinearity
note: 1.time4 omitted because of collinearity
note: 1.porteur#1.time4 omitted because of collinearity

Performing EM optimization:

Performing gradient-based optimization:

Iteration 0: log likelihood = -2278.8898
Iteration 1: log likelihood = -2277.0752
Iteration 2: log likelihood = -2277.0463
Iteration 3: log likelihood = -2277.0456
Iteration 4: log likelihood = -2277.0456

Computing standard errors:

Mixed-effects ML regression                     Number of obs      =       631
Group variable: ID                              Number of groups   =       193

                                                Obs per group: min =         1
                                                               avg =       3.3
                                                               max =         5

                                                Wald chi2(9)       =     51.69
Log likelihood = -2277.0456                     Prob > chi2        =    0.0000

-------------------------------------------------------------------------------
          dep |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
    1.porteur |  -1.000888   1.544808    -0.65   0.517    -4.028657    2.026881
              |
        time1 |
           2  |  -2.529787   1.084861    -2.33   0.020    -4.656075   -.4034987
           3  |   1.761583   1.164203     1.51   0.130    -.5202119    4.043378
           4  |   4.017521   1.201632     3.34   0.001     1.662365    6.372676
           5  |  -.4507877   1.278628    -0.35   0.724    -2.956852    2.055276
              |
porteur#time1 |
         1 2  |   4.780729   1.634244     2.93   0.003     1.577671    7.983788
         1 3  |  -3.651737   1.792905    -2.04   0.042    -7.165766    -.137707
         1 4  |   .4716759   1.868767     0.25   0.801     -3.19104    4.134392
         1 5  |    1.41006   2.004701     0.70   0.482    -2.519083    5.339202
              |
        time2 |
           1  |          0  (omitted)
           2  |          0  (omitted)
           3  |          0  (omitted)
              |
porteur#time2 |
         1 1  |          0  (omitted)
         1 2  |          0  (omitted)
         1 3  |          0  (omitted)
              |
        time3 |
           1  |          0  (omitted)
           2  |          0  (omitted)
              |
porteur#time3 |
         1 1  |          0  (omitted)
         1 2  |          0  (omitted)
              |
      1.time4 |          0  (omitted)
              |
porteur#time4 |
         1 1  |          0  (omitted)
              |
        _cons |    12.6734   1.016518    12.47   0.000     10.68106    14.66574
-------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
ID: Independent              |
                    sd(Time) |   .2596632   1.007456      .0001294    521.1367
                   sd(_cons) |   7.785805   .5415579      6.793544    8.922994
-----------------------------+------------------------------------------------
                sd(Residual) |   7.059303   .2545035      6.577701    7.576166
------------------------------------------------------------------------------
LR test vs. linear model: chi2(2) = 214.80                Prob > chi2 = 0.0000

I find the time values in the variable column curious. For example, there is a 5 under the time1 section; I don't understand this labelling.
Second, I suppose that the omitted terms are just redundant with porteur#time2, for example, as time1-2 represents the slope for non-carriers and porteur#time2 represents the slope for carriers.
What we see here is that the slope changes relative to the previous knots are sometimes significant. For example, time1-2 is significant, as is porteur#time1-12. But that does not tell me whether the two slope changes differ significantly from each other.
Further, graphing this is very challenging and I have not succeeded in plotting the two growth curves.
Any help and feedback would be very welcome. I also give you some data...
Thanks a lot
carole
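
A possible explanation for the level labels (2, 3, 4, 5 under time1) is that, inside a `##` interaction, Stata treats a variable as categorical unless it carries the `c.` prefix. The sketch below marks the spline terms as continuous and then tests whether the slope changes differ by carrier status. It reuses the variable names from the post and is an assumption about the intended model, not a verified fix:

```stata
* Spline variables created as in the post (marginal: each term is a change in slope)
mkspline time1 2 time2 3 time3 4 time4 = Time, marginal

* c. marks the spline terms as continuous; i. marks porteur as categorical
xtmixed dep i.porteur##c.time1 i.porteur##c.time2 ///
    i.porteur##c.time3 i.porteur##c.time4 || ID: Time, mle

* Joint test: do the slope changes at the knots differ between the two groups?
testparm i.porteur#c.time2 i.porteur#c.time3 i.porteur#c.time4
```

With the `marginal` option, the coefficient on `1.porteur#c.time2` is the between-group difference in the slope change at the first knot, so each interaction row can also be read as an individual test.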

Meta-analysis, how to combine effect size of hazard function?

Hi,

The hazard function is a parameter defined by risk, time, and the survival function. Because risk is one of the components of the definition of the hazard function, we might regard the hazard function as a "mean" (similar to the idea of risk as a mean). If the Cox PH model assumption is met, the hazard ratio does not change over time and we might estimate the overall effect size across individual studies. However, if we want to combine the effect sizes (hazard rate ratio / HR), we must know the variance of LN(HR) for each individual study. This is what I have not figured out. I would like to hear how to go about figuring it out.

Tom
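
When a study reports the HR with a 95% confidence interval, a common approach is to back out the standard error of LN(HR) from the width of the interval on the log scale. A minimal sketch, assuming one row per study and hypothetical variables `hr`, `ll` and `ul` for the hazard ratio and its lower and upper 95% limits:

```stata
* Hypothetical variables: hr (hazard ratio), ll and ul (95% CI limits)
generate lnhr     = ln(hr)
generate se_lnhr  = (ln(ul) - ln(ll)) / (2 * invnormal(0.975))
generate var_lnhr = se_lnhr^2
```

This relies on the interval being symmetric around LN(HR), which holds for the usual Wald-type intervals from a Cox model. The resulting `lnhr` and `se_lnhr` can then be passed to a meta-analysis command that accepts an effect size and its standard error.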

DID on panel data

Hi everyone,

I'm fairly new to Stata and I am trying to find out how to run a DID with my panel data, but I ran into some problems as all the examples I found differ from my problem (or at least I think so).

To explain my setup simply: my dependent variable is post-merger accounting quality (of the newly formed entity) and my independent variables are pre-merger accounting quality of the target and pre-merger accounting quality of the acquirer.
I also have a dummy variable, commonauditor, and I want to separate the groups to determine what the influence on post-merger accounting quality is when both parties in the acquisition have the same auditor. So there is a control group without common auditors, and a treatment group with common auditors.
As my variables already depict time, time is irrelevant in my sample because it is deal focused; that is why I am using panel data.

My current setup is this, but I can't work out how to include the commonauditor variable so as to get separate outcomes for the control group and the treatment group.
Code:
xtset commonauditor

xtreg AQpostmerger DQacquirer DQtarget controls, robust

// in other words
xtreg depvar indepvar1 indepvar2 controls
I do not really know how and where to put in the treatment effect.
If anyone knows how to do it effectively, or has a link to a good example, I'd be very thankful!


Cheers
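
Since each deal appears to contribute a single observation with pre- and post-merger measures as separate variables, one option is an ordinary regression in which commonauditor interacts with the pre-merger quality measures. The sketch below is an assumption about the intended design, not a standard DID setup; `controls` is the placeholder from the post.

```stata
* "controls" stands in for the control variables, as in the original post
regress AQpostmerger i.commonauditor##c.DQacquirer ///
    i.commonauditor##c.DQtarget controls, vce(robust)

* Effect of each pre-merger measure, separately for the two auditor groups
margins commonauditor, dydx(DQacquirer DQtarget)
```

The interaction coefficients show how the association between pre-merger and post-merger quality differs for deals with a common auditor. Note that `xtset commonauditor` declares commonauditor as the panel identifier, which is probably not what is wanted here.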

Possible multicollinearity problem in variable coding - logit

Hi all,

I am building a logit model with binary dependent variable =1 for lost virginity at 15 or younger and =0 for lost virginity at 16 or older. My independent variable is relative age which is coded between 0 and 1 to indicate where the individual's birth month sits relative to the academic year (so are they old for their school year, or young...). I am using a dataset with many countries, over which the academic year runs differently, which my relative age variable takes account of - an individual born in August in England (and thus the youngest in the year) has value 1 and an individual born in December in Denmark (and thus the youngest in the year there) has value 1 also. There are 12 possible values for relative age (1 for each month).
I have other control variables in my logit model including gender, material deprivation indicators etc but also country (categorical variable, 28 possible values, 1 for each country) which I want to include to take account of cultural differences in age of virginity loss.

I am confused because when I add country to my regression the coefficient on relative age reduces massively and the p value for it becomes very large (jumping to 0.8 when it was 0.02 before). Why could this be?

Is the fact that I coded the relative age variable using country i.e. "replace relage=1/12 if monthbirth==1 & countryno==56002| monthbirth==1 & countryno==100000 | monthbirth==9 & countryno==442000| monthbirth==1 & countryno==56001| ......." causing a problem...maybe with multicollinearity??

Please help if you can! Thank you in advance!

Fuzzy match - names only

Hi,

I am trying to fuzzy match two datasets by name only. I do not have a numeric ID to match the two databases. I have been trying to use "matchit". The results I'm currently getting are not convincing. Is there any way to use this SSC command without "ID1", which is the numeric ID?

Here is the code I have been running:
I have created one unique number per name in each dataset. They do not have anything in common, which is why I do not want to use them.
Code:
matchit mgrno mgrname using HFnames_MorningStar1.dta, idu(obs1) txtu(Name) sim(token) t(0) override
Here is what the two datasets look like:



I am just trying to fuzzy match the two datasets by mgrname and Name. Can anyone help? Thanks!

Testing Rationality

This is a general question about using Stata to test rationality with experimental data. Microeconomists will be familiar with the Axioms of Revealed Preference and how this tool is used to describe consumer choice behavior. Recent literature by economists such as Hal Varian has widely used these axioms to help understand choice behavior.

Now I have experimental data from a dictator game for which I want to test subjects' rationality in altruism. The analysis is basically to see whether subjects adhere to the axioms of revealed preference.

As a start, I thought there could be someone out there with an idea or a direction to a code that can handle this multidimensional analysis. I can prepare a MWE just in case there is someone with an idea of how to go about it in Stata.

Random effects panel model with serial correlation

Does clustering robust standard errors in a random effects panel data model automatically solve the problem of serial correlation and heteroskedasticity?

Frmttable: preventing notes added by "note" option from spanning width of page in a LaTeX table, and left justifying notes.

Dear all,
I am using John Luke Gallup's "frmttable" command to create tex tables. An issue I'm having is that no matter the width of the table, any notes added below the table span the width of the page. I'm wondering if there is a way to specify the frmttable command so that added notes are the width of the table. I can use \ to break the note into multiple lines, but this is not optimal. Also, if possible, I would also like to left justify the note. I cannot figure out how to do so.
Here is some sample code:
Code:
frmttable using "$dir1/stata files/impact analysis revision 1/results/girls6_60.tex", ///
statmat(obs) substat(1) sdec(0) ///
sfmt(fc) tex fragment merge ct(""\"Observations"\"") ///
note("Notes: * p \textless 0.10, ** p \textless 0.05, *** p \textless 0.01. Standard errors in brackets estimated as in \citet{Young2016}." \ "All regressions include an intercept and indicators for fourteen strata.")

Combining results from separate t-tests into one table

Dear Sirs,

I am wondering if you can help me with the following problem; I have panel data grouped into 5 different groups and I want to combine the results (mean, p-values) from several one sample t-tests (ttest variablename == 0) into the same table. For regressions I have used the command "estimates store" after each regression and then the command "estout" in order to get all the results into one table, however I cannot use the command "estimates store" for t-tests. Does anyone have an idea how I can do this?

Thank you in advance!

Best regards,

Anna
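
One way to collect the results is to read the values `ttest` leaves in `r()` and store them in a matrix. This is only a sketch: the variable names `y` and `group` (coded 1 to 5) are hypothetical.

```stata
* Hypothetical names: outcome y, grouping variable group coded 1 to 5
matrix results = J(5, 3, .)
matrix colnames results = mean t p
matrix rownames results = g1 g2 g3 g4 g5

forvalues g = 1/5 {
    quietly ttest y == 0 if group == `g'
    matrix results[`g', 1] = r(mu_1)
    matrix results[`g', 2] = r(t)
    matrix results[`g', 3] = r(p)
}

matrix list results
```

Here `r(p)` is the two-sided p-value. The matrix can then be exported with whatever table command is already in use for the regressions.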

Two models in one marginsplot

Dear Community,

This question has been asked similarly before, but I have not been able to use any of the answers for my purpose, so I will pose my question here and I hope that most of you would agree that there is a sufficient degree of novelty in my request.

I have a dataset where I am essentially regressing three different models twice. This means that I first run three regressions for independent variable one (direct effect and two interaction effects), then change the independent variable and run the models again (see below). I would now like to create three different boxplots, which each contain one of the models for both independent variables. Thus, each boxplot would contain the marginal effect of l_drought and l_sevdrought for one of the models. An example (if a bit different) is the graphic used by von Uexkull, Croicu, Fjelde and Buhaug (2016), http://www.pnas.org/content/113/44/12391

Only in my case each of the plots should contain "Drought, Severe Drought", instead of "onset, incidence, onset, incidence". Of course, I do not expect a one to one recreation of the graph above. Any help that goes in the direction of what I described would be appreciated and I will then see how far I get.


Code:
*Model 1*
reg attacks l_drought l_excl l_crop l_nlight l_gdp
margins, dydx(l_drought) at ((means) l_nlight l_crop)


*Model 2*
reg attacks i.l_drought##c.l_crop l_excl l_nlight l_gdp
margins, dydx(l_drought) at ((means) c.l_crop l_nlight)


*Model 3*
reg attacks i.l_drought##c.l_nlight l_excl l_crop l_gdp
margins, dydx(l_drought) at ((means) l_crop c.l_nlight)


***SEVERE DROUGHT*****

*Model 1*
reg attacks l_sevdrought l_excl l_crop l_nlight l_gdp
margins, dydx(l_sevdrought) at ((means) l_nlight l_crop)


*Model 2*
reg attacks i.l_sevdrought##c.l_crop l_excl l_nlight l_gdp
margins, dydx(1.l_sevdrought) at ((means) l_nlight c.l_crop)


*Model 3*
reg attacks i.l_sevdrought##c.l_nlight l_excl l_crop 
margins, dydx(1.l_sevdrought) at ((means) c.l_nlight l_crop)
Thank you guys in advance!
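
A possible route is to post each set of margins as estimation results, store them, and draw them together. The sketch below does this for Model 1 only and assumes the user-written coefplot package from SSC is installed; the stored names `m1_drought` and `m1_sevdrought` are made up for illustration.

```stata
* Assumes the user-written package is installed: ssc install coefplot
reg attacks l_drought l_excl l_crop l_nlight l_gdp
margins, dydx(l_drought) at((means) l_nlight l_crop) post
estimates store m1_drought

reg attacks l_sevdrought l_excl l_crop l_nlight l_gdp
margins, dydx(l_sevdrought) at((means) l_nlight l_crop) post
estimates store m1_sevdrought

* Point estimates with confidence intervals, both variables in one graph
coefplot m1_drought m1_sevdrought, vertical yline(0)
```

The `post` option replaces the regression results in memory with the margins results, so each regression has to be re-run before any further postestimation. The graph shows point estimates with confidence intervals rather than boxplots, and Models 2 and 3 would follow the same pattern.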

Converting ISCO88 codes into the International Socioeconomic Index of occupational status

Hi,

I am using individual-level data and I currently have a variable coded with the ISCO88 4-digit codes (the International Standard Classification of Occupations). I will be using this variable as a proxy for socioeconomic status, so I would like to assign ISEI (International Socio-economic Index of Occupational Status) scores to the ISCO88 codes. I can't seem to find the Stata syntax to do this; I have checked the Harry Ganzeboom website and it appears to have only the SPSS syntax. Any help with this would be greatly appreciated.

Thank you.
Sherine

Import CSV File - Text Data with uneven quotation marks and delimiter symbols

Dear Statalist Users,

I want to import a ".csv" dataset which contains both numeric and string variables.
First I tried using
Code:
import delimited mydata.csv, delimiter(";") varnames(1)
The command works, but for some observations all variable values are stored within a single variable. I browsed through the raw data with a text editor and I think it has to do with quotation marks ( " ) and the delimiter symbol ( ; ), both of which can occur inside the values of a certain string variable. Some string values look like this
"bet"yes".
Hence, I think that because of the uneven number of quotation marks, Stata searches further along the line for the closing quotation mark, which results in my problem.


Next, I tried
Code:
import delimited mydata.csv, delimiter(";") varnames(1) stripquotes(yes)
as I thought this might solve the issue, but it does not and the same problem occurs. Is this the case because I have not specified the -bindquotes- option?


My next try was
Code:
import delimited mydata.csv, delimiter(";")  varnames(1) bindquote(nobind) stripquotes(yes)
This solves the first problem, but gives birth to another. As previously mentioned, I also have string values that contain the delimiter symbol itself, for instance "a;b".
Thus, Stata now treats the semicolon, which is meant to be part of a string, as a delimiter. As a result, for observations with semicolons inside a string, the variable values are shifted to the right, producing one extra variable for each semicolon that was meant to be part of a string value.


Is there a way to tackle both problems at once? I haven't found a solution yet and my next move would be to stick with the last command and reshift the affected variable values, after the import.


Best regards,
Ali

How to estimate marginal effects after running multivariate probit in Stata?

I ran mvprobit for three outcome variables and want to present my results as average partial effects. I need support on how to do this in Stata. I used the command mvppred, but I couldn't manage to get them. I would appreciate it if anyone could help me with the procedure.

Interactions in logistic regression (moderation analysis)

Dear all,

Am I right in thinking that you would only test for interaction effects in the presence of main effects? So even if you decided in advance that you wanted to test for an interaction (between two IVs), if you found that there weren't two independent main effects when both were included in the model (they had significant effects on the outcome univariately but only one remained significant when together), then you wouldn't go on to test for an interaction? I am doing a logistic regression, by the way, on the effect of life stress on depression, and I am considering including anxiety as a potential moderator of that relationship. But I have found that when life events and anxiety are predictors together, only the main effect of anxiety remains, hence my question above.

Many thanks in advance for all your help!!

How do you sum specific countries in panel data?

-collapse- lets you sum all countries, but I only want to combine a few countries by year in a panel dataset. For instance, I want to only sum West Germany and East Germany into "Germany," but not to sum the others. How do I do that? A sample dataset I copied from another thread:

Code:
input str32 country str26 category long year long ImportTons sum
"West Germany" 1 1988 . 0
"France" 24352354 1988 . 0
"East Germany" 14312412 1990 . 0
"Aaland Islands" 4123414 1990 . 0
end
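
One way to combine only selected countries is to give them a common name first and then collapse by country and year, so that every other country keeps its own rows. A minimal sketch using the variable names from the sample data above:

```stata
* Relabel only the countries to be merged; all others are left untouched
replace country = "Germany" if inlist(country, "West Germany", "East Germany")

* Sum within country-year; countries that were not relabeled keep one row per year
collapse (sum) ImportTons, by(country year)
```

Two caveats: `collapse` keeps only the variables named in the command and in `by()`, and `(sum)` returns 0 for a country-year in which all values are missing.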