Channel: Statalist

Marginal Effects multinomial logit model (Dummy Interpretation)

Dear All,

I am running a multinomial logit model for my research. I am creating categorical variables (dummies) for industries and for institutional investors. My dependent variable is the choice of performance measure: (Outcome 1) Sales exclusively, (Outcome 2) EBIT exclusively, (Outcome 3) EBIT and Sales jointly, and (Outcome 4) neither Sales nor EBIT.

I am looking at how volatility in a performance measure influences the firm's choice of that measure. The base category for institutional investors is G (fund managers), while the base category for the industry dummies is Aerospace.


I used this command for the marginal effects (Outcome 2 is EBIT exclusively):

Code:
margins, dydx(*) atmeans predict(outcome(2))
Code:
EBIT as a performance measure (Outcome 2); marginal effects at means, delta-method std. errors

                                      dy/dx   Std. Err.      z    P>|z|   [95% Conf. Interval]
Sales volatility (ratio)           .0130772    .0156573    0.84   0.404   -.0176106   .0437649
EBIT volatility (ratio)            .3955645    .2199599    1.80   0.072   -.0355491   .8266781
Board size (log)                  -.1486419     .165472   -0.90   0.369   -.4729611   .1756773
% of non-executive (percentage)    .0014928     .002255    0.66   0.508   -.0029268   .0059125
Log market cap (log)               .0188508    .0152096    1.24   0.215   -.0109594    .048661
Debt ratio (log)                   .0010341    .0101053    0.10   0.918   -.0187719   .0208401

Industry categorical variable (base: Aerospace)
Basic Materials                   -.2858876    .0524736   -5.45   0.000   -.3887339  -.1830412
Consumer Goods                    -.1871963    .0456612   -4.10   0.000   -.2766907  -.0977019
Consumer Services                 -.2985905    .0350081   -8.53   0.000   -.3672051   -.229976
Financials                        -.5662188    .0308298  -18.37   0.000   -.6266441  -.5057935
Healthcare                        -.3843539     .063049   -6.10   0.000   -.5079276  -.2607802
Oil & Gas                         -.2774503    .0687093   -4.04   0.000   -.4121181  -.1427826
Real Estate                       -.1150948    .0668729   -1.72   0.085   -.2461633   .0159736
Utilities                         -.2752968    .0739901   -3.72   0.000   -.4203148  -.1302788

Institutional investor categorical dummy (base: G, fund managers)
A                                  .0503691    .0828992    0.61   0.543   -.1121103   .2128484
B                                   .032212    .1358006    0.24   0.813   -.2339523   .2983763
C                                 -.0577504    .1781715   -0.32   0.746   -.4069601   .2914594
D                                 -.3119061    .1081034   -2.89   0.004   -.5237849  -.1000273
E                                   .090909        .042    2.12   0.034      .08892     .17851
F                                     .0712        .044    1.79   0.074       -.077     .16712
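For reference, a minimal sketch of the full sequence I have in mind (variable and category names below are placeholders, not my actual data):

Code:
* multinomial logit with factor-variable dummies, then marginal effects
* at the means for outcome 2 (EBIT exclusively); names are hypothetical
mlogit perf_choice salesvol ebitvol ln_board pct_nonexec ln_mcap debt_ratio ///
    i.industry i.investor, baseoutcome(4)
margins, dydx(*) atmeans predict(outcome(2))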

Can anyone please help me interpret the coefficients for the DUMMIES (in terms of percentages), specifically as they stand relative to the base category? And how do we interpret RATIOS OR PERCENTAGES in marginal effects? I think marginal effects at the means are a good way to go.

For the interpretation of a dummy, can we say that for an F investor the probability of choosing EBIT as a performance measure is about 7 percentage points higher relative to the base category G?

But do we also interpret it relative to the other outcome categories?



Thanks,
Michael

Points in Multiple Regions

I have data of the form:

point  region1  region2  region3  ...

1      0        1        0
2      1        0        1
3      0        1        0
4      1        0        0
5      1        0        1
...

There are 13 regions, and I want to assign each point an ID corresponding to its region (1-13).
The problem is that some points are in multiple regions, so I will need copies of those points, each with a distinct region ID.
How do I do this?

EDIT: I have expanded the observations by the row total (i.e., the number of regions each point lies in). How do I generate the list of IDs of the regions a point is in and match it to the duplicates of that point? Any help would be very much appreciated; I'm really stuck on this!
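One possible approach that sidesteps expand entirely, sketched under the assumption that the indicator variables are named region1-region13:

Code:
* go long: one row per point-region pair, then keep only the pairs that apply;
* region_id is then exactly the ID of a region the point lies in
reshape long region, i(point) j(region_id)
keep if region == 1
drop region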


Regards

Thomas

BMI calculation

Hello,

I have a question regarding the calculation of BMI. My data set has the variables weight and height, but when I calculate BMI I get the wrong outcomes (a large percentage of underweight people, for example). My question is: what am I doing wrong? Do I need to generate new variables for weight or height, or something like that?

Thank you very much in advance!

The variable height (var452; "Lengte in cm" is Dutch for "length in cm", and "onbekend" means "unknown") is distributed as follows:

Code:
  Lengte in cm |      Freq.     Percent        Cum.
---------------+-----------------------------------
             5 |          1        0.01        0.01
             6 |          1        0.01        0.02
     tot 52 cm |          1        0.01        0.02
      58-62 cm |          1        0.01        0.03
      63-67 cm |          1        0.01        0.04
      68-72 cm |         10        0.08        0.12
      73-77 cm |          7        0.05        0.17
      78-82 cm |          1        0.01        0.18
      83-87 cm |          1        0.01        0.19
      88-92 cm |          3        0.02        0.21
      93-97 cm |          1        0.01        0.22
     98-102 cm |          6        0.05        0.27
    103-107 cm |         43        0.34        0.60
    108-112 cm |        187        1.47        2.07
    113-117 cm |        550        4.32        6.39
    118-122 cm |        940        7.38       13.77
    123-127 cm |      1,274       10.00       23.77
    128-132 cm |      1,035        8.12       31.89
    133-137 cm |      1,048        8.23       40.11
    138-142 cm |        757        5.94       46.06
    143-147 cm |        313        2.46       48.51
    148-152 cm |        201        1.58       50.09
    153-157 cm |        212        1.66       51.75
    158-162 cm |        595        4.67       56.42
    163-167 cm |        903        7.09       63.51
    168-172 cm |      1,361       10.68       74.19
    173-177 cm |      1,001        7.86       82.05
    178-182 cm |      1,014        7.96       90.01
    183-187 cm |        725        5.69       95.70
    188-192 cm |        309        2.43       98.12
    193-197 cm |        129        1.01       99.14
    198-202 cm |         30        0.24       99.37
203 cm of meer |          9        0.07       99.44
      onbekend |         71        0.56      100.00
---------------+-----------------------------------
         Total |     12,741      100.00


And the variable weight (var453; "Gewicht in kg" means "weight in kg") is distributed as follows:

Code:
 Gewicht in kg |      Freq.     Percent        Cum.
---------------+-----------------------------------
        3-7 kg |          4        0.03        0.03
       8-12 kg |          3        0.02        0.05
      13-17 kg |          4        0.03        0.09
      18-22 kg |          3        0.02        0.11
      23-27 kg |          2        0.02        0.13
      28-32 kg |          1        0.01        0.13
      33-37 kg |          3        0.02        0.16
      38-42 kg |          9        0.07        0.23
      43-47 kg |         48        0.38        0.60
      48-52 kg |        249        1.95        2.56
      53-57 kg |        542        4.25        6.81
      58-62 kg |      1,211        9.50       16.32
      63-67 kg |      1,411       11.07       27.39
      68-72 kg |      1,756       13.78       41.17
      73-77 kg |      1,590       12.48       53.65
      78-82 kg |      1,710       13.42       67.07
      83-87 kg |      1,279       10.04       77.11
      88-92 kg |      1,048        8.23       85.34
      93-97 kg |        633        4.97       90.31
     98-102 kg |        488        3.83       94.14
    103-107 kg |        234        1.84       95.97
    108-112 kg |        143        1.12       97.10
    113-117 kg |         73        0.57       97.67
    118-122 kg |         50        0.39       98.06
123 kg of meer |         67        0.53       98.59
      onbekend |        180        1.41      100.00
---------------+-----------------------------------
         Total |     12,741      100.00

The code I used to calculate BMI is the following:

Code:
* note: this assumes var452 and var453 hold numeric cm and kg values, not interval codes
* BMI = kg / m^2; with height in cm the factor is 10000, not 1000
gen bmi = 10000*var453/(var452*var452)
sum bmi

* band conditions need & (and), not | (or), plus a guard against missing bmi
gen underweight = bmi <= 18.5 if !missing(bmi)
gen normal_weight = bmi > 18.5 & bmi < 25 if !missing(bmi)
gen overweight = bmi >= 25 & bmi < 30 if !missing(bmi)
gen obese = bmi >= 30 if !missing(bmi)

gen bmi_categories = 0
replace bmi_categories = 1 if underweight == 1
replace bmi_categories = 2 if normal_weight == 1
replace bmi_categories = 3 if overweight == 1
replace bmi_categories = 4 if obese == 1

label define bmi 0 "Missing" 1 "Underweight" 2 "Normal Weight" 3 "Overweight" 4 "Obese"
label values bmi_categories bmi

tab bmi_categories

Most efficient way of handling these two files?

Hi All,
If anyone could suggest a code or a different way to look at this problem I'm having:

Using a simplified version of my sample:


I have one main file with rainfall amounts by cluster (c1-c4):

year c1 c2 c3 c4
1930 5 10 15 20
1931 10 20 30 40
1932 20 40 60 80
1933 1 2 3 4
1934 2 3 4 5
1935 7 8 9 10
1936 6 4 2 1



Then I have another file:
year c
1932 2
1936 3
1931 3
1932 4
1930 1
1934 2
1930 1


I'm trying to merge the rainfall values from the first file onto the appropriate clusters and years in the second.
So I want the merged file to look like this:

year c Rainfall
1932 2 40
1936 3 2
1931 3 30
1932 4 80
1930 1 5
1934 2 3
1930 1 5

Is there a way to accomplish this?
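One possible approach, sketched under the assumption that the wide rainfall file is saved as rainfall.dta and the cluster-level file is in memory (file names are placeholders):

Code:
* reshape the rainfall file so each (year, cluster) pair is one row,
* then merge it onto the cluster-level file on year and c
preserve
use rainfall, clear
reshape long c, i(year) j(cluster)
rename c Rainfall
rename cluster c
save rainfall_long, replace
restore

merge m:1 year c using rainfall_long, keep(master match) nogenerate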

LASSO (lars) analysis with first variable fixed.

Hi,

I have a quick question regarding the LASSO technique using the lars command in Stata.
I am interested in the effect of education on health-related quality of life. Education_level is the first variable I want to include in my regression.
When I use the lars method:

Code:
lars HRQoL Education_level smoking_behaviour extensive_drinker blood_pressure bmi_categories cancer COPD smoking_COPD diabetes blood_diabetes muskulo diabetes_muskulo age age2 gender marital_status, algorithm(lars)
I get output showing that Education_level enters at the 8th step (before muskulo and age, for example).
Is there a way to use the LARS method with Education_level fixed as the first variable added to the regression? I could not find anything like this on the web.

Thanks in advance!

Florian

Model misspecification - very large t-statistics

Hello,

I am running a fixed-effects model with 4 variables. The variables show very significant results (p < 0.001) and standard errors below 0.5. However, when I control for autocorrelation and heteroskedasticity, the t-statistics drop sharply, e.g. from 10 to 2. I was told that when using cluster-robust standard errors, a large difference between the clustered and non-clustered t-statistics can be a serious sign of model misspecification. Is this correct? And what can I do to solve this problem?
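For concreteness, a sketch of the comparison I am making (variable names are hypothetical; esttab is from the estout package on SSC):

Code:
* fixed-effects estimates with default and with cluster-robust standard errors
xtset firm year
xtreg y x1 x2 x3 x4, fe
estimates store fe_default
xtreg y x1 x2 x3 x4, fe vce(cluster firm)
estimates store fe_clustered
esttab fe_default fe_clustered, se   // compare the two sets of standard errors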

Zero inflated models

I have a question concerning zero-inflated negative binomial models. In the model I'm running, a plain negative binomial gives significance on several key variables. I then tried a zero-inflated model: every goodness-of-fit measure shows a slightly better fit, yet absolutely nothing is significant. If I check the marginal probabilities (Long's SPost package), I get the same significances as under the regular negative binomial model. Can these marginal probabilities be trusted?
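A sketch of the comparison in question, with hypothetical variable names:

Code:
* negative binomial versus zero-inflated negative binomial, compared on AIC/BIC
nbreg y x1 x2 x3
estat ic
zinb y x1 x2 x3, inflate(x1 x2)
estat ic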

merge datasets; r(459); variable does not uniquely identify observations in the using data

Hello everyone,

I just started to use Stata and have a problem merging 3 different datasets (with Stata 12).

I used the merge (m:1) command and got the error message r(459), "variable does not uniquely identify observations in the using data".

Code:
merge m:1 company_ID using "Set 2.dta"
merge m:1 company_ID using "Set 3.dta"

The Datasets look like this:

Set 1 (master):

company_ID job_request_date graduate_id
123 12.11.2014 57878
123 12.10.2014 78878
123 16.11.2014 99899
121 14.11.2014 55744
345 12.10.2014 55879
876 12.09.2014 55879
876 19.09.2014 14787
1000 19.09.2014 14787 (--> not available in Set 2)
. (missing) . (missing) 68994 (--> no job offer --> no company_ID)
... .....

Set 1 contains multiple observations per company.


Set 2:

company_ID number_employees
123 100
121 50
345 600
876 800
... .....

Set 2 contains one observation for each company.


Set 3:

company_ID export_number
123 1
121 1
345 5
876 6
1000 1
... .....

Set 3 contains one observation for each company. Not every Company_ID of Set 3 is included in Set 2.


I want to add the information of Set 2 and Set 3 for each observation in Set 1:

Set merged:

company_ID job_request_date graduate_id number_employees export_number
123 12.11.2014 57878 100 1
123 12.10.2014 78878 100 1
123 16.11.2014 99899 100 1
121 14.11.2014 55744 50 1
345 12.10.2014 55879 600 5
876 12.09.2014 55879 800 6
876 19.09.2014 14787 800 6
1000 19.09.2014 14787 . (missing) 1
... .....

It is possible that a company_ID and a graduate_id take the same value, but since I define company_ID as the key variable, that should not be a problem?

I think the problem might be that the different sets contain company_IDs that are not present in all of the other datasets. I only know how to merge data with Excel's "vlookup": add the information when a matching pair is found, e.g. company_ID = 123 in both files. Does the merge command work the same way?
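To make the check concrete, a sketch of how to verify the key in a using file (my understanding of what r(459) implies; file name as above):

Code:
* with m:1, r(459) means the using data has duplicate keys;
* these commands report and then assert uniqueness of company_ID
use "Set 2.dta", clear
duplicates report company_ID
isid company_ID   // errors out if company_ID does not uniquely identify observations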


I hope you can help me with this problem.
Thank you!

xtabond2 doesn't return an R2. How can I add an R2 I calculate myself to the e-class results as e(r2)?

I'm running a dynamic panel model using xtabond2 (from SSC). It doesn't return an R2 (r-squared), so I need to calculate it myself.
The problem is that I need to store this R2 in the e-class results so that I can later print the results using esttab.
How can I do this? I've been searching for a solution for days. Please help.
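For concreteness, a sketch of what I have in mind: compute the squared correlation between actual and fitted values as an R2-type measure, then attach it with estadd (from the estout package). The model specification is hypothetical, and I am assuming matrix score can build the linear prediction from e(b) here:

Code:
* fit the model, build fitted values from the stored coefficients,
* attach the scalar to e(), and print it with esttab
xtabond2 y L.y x1 x2, gmm(L.y, lag(2 .)) iv(x1 x2) robust
matrix b = e(b)
matrix score double yhat = b        // linear prediction from e(b)
quietly correlate y yhat
estadd scalar r2 = r(rho)^2
esttab, scalars("r2 R-squared")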
Thank you very much.

Help with mixlogit

I am new to Stata and specifically to the mixlogit command. I am trying to work out a model and have two main questions:

1. How can I derive the marginal effect of an independent variable after estimating the model?
2. Is there a test I can run to show statistically that the model obtained is better than a simple conditional logit model? I tried to run lrtest, but it does not work when comparing clogit and mixlogit models.
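A sketch of the comparison I am attempting (variable names hypothetical). My understanding is that lrtest refuses results from different estimation commands unless given the force option, which is defensible only if the models are truly nested:

Code:
* conditional logit versus mixed logit on the same choice data
clogit choice price quality, group(choice_set)
estimates store cl
mixlogit choice price, group(choice_set) id(person) rand(quality)
estimates store mxl
lrtest cl mxl, force   // clogit is mixlogit with the random-coefficient SDs fixed at zero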

Thanks in advance,

Carlos

stcompet

I'm trying to run a competing-risks survival model to analyse time to school dropout and completion. I'm using Stata 14.1 and have installed stcompet, but I get an r(198) error every time I run the stcompet command, which says "end_date> invalid name". The preceding stset command ran without any errors or issues, so I am not sure why it stumbles here. I do have a few people with multiple events (recurring dropouts), but since they exit once they experience the first event, I don't see this as the issue (or is it?).

Can you please help me understand what I am doing wrong?

Here is the syntax so far:

Code:
stset end_date, failure(ever_drop==1) origin(time birth_date) enter(start_date) ///
    time0(start_date) exit(ever_drop==1) id(ident) scale(365.25)
stcompet cuminc=ci serror=se cihi=hi cilow=lo, compet1(2) by(sex)

end_date> invalid name
r(198);

Thank you!


How does Stata calculate "predict varname, u" after xtreg with random effects?

$
0
0
What "predict varname, u" after xtreg with random-effects really do in Stata? How it works?
I mean, how the ("individual") random-error component u_i is extracted from the overall e_it error component? Any reference?
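My current understanding, sketched so it can be checked: the u_i appear to be BLUPs, i.e. the within-panel mean of the combined residual shrunk toward zero. The formula and variable names below are my assumption, not confirmed:

Code:
* attempt to replicate -predict, u- by hand after a random-effects fit
xtreg y x1 x2, re
predict u_hat, u
predict nu, ue                        // combined residual u_i + e_it
bysort id: egen nubar = mean(nu)
bysort id: gen Ti = _N                // panel size (assumes no missing values)
gen u_check = (e(sigma_u)^2 / (e(sigma_u)^2 + e(sigma_e)^2/Ti)) * nubar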

Export excel makes dropdown boxes disappear

Hello,

I'm using Stata to export a list of around 5,000 health facilities to a pre-formatted Excel workbook. The Stata export serves as a sort of 'data' tab for the rest of the workbook. Four other tabs generate different calculations based on what is in the data tab.

On the final sheet, where the results are displayed, I have some dropdown boxes for the user to manipulate, created with Excel's 'data validation' functionality. Strangely, after executing export excel in Stata, the dropdown boxes disappear from the resulting Excel file.

My command is:

Code:
export excel id country siteid sitename sitelevel activationyr avg y2011-y2015 ///
    disc yearssince using "global ue caseload model v6.xlsx", ///
    sheet("STEP 1_UE caseload data") firstrow(variables) sheetmodify

I've also tried sheetreplace and sheet("STEP 1_UE caseload data", modify).

Any help is most appreciated.

I'm using Stata/SE 14.0 for Windows (64-bit) and Excel 2013.

Double IF-function in ttest

Dear all,

I want to run a ttest on the variable CARA3 for observations where HFI > 0.64195 and HFI < 0.119814.
However, this command is not working:

Code:
ttest CARA3 == 0 if HFI < 0.64195 | HFI > 0.118914
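For reference, a sketch of how the two ways of combining the conditions read (my understanding of -if- logic; cutoffs copied from above):

Code:
* & keeps observations satisfying both conditions; | keeps those satisfying either.
* HFI > 0.64195 & HFI < 0.119814 can never both hold, so it selects no observations:
ttest CARA3 == 0 if HFI > 0.64195 & HFI < 0.119814   // empty sample
ttest CARA3 == 0 if HFI > 0.64195 | HFI < 0.119814   // the two tails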

Kind regards for the help,
Emiel Brak

Panel Data: Within-firm variation using xtreg. Split variable into two time periods. "No observations"

Hi!

I have panel data (firmID and year). I want to use within-firm variation to analyse different measures.

It should look like this:

Var1 = size in period 1990-2000
Var2 = size in period 2000-2010
Var3 = cash
i.year = year dummies to control for time fixed effects
Y = growth

Code:
xtset firmID year
xtreg Y Var1 Var2 Var3 i.year, fe vce(cluster firmID)

I get the error: "no observations".

Is that a result of how Var1 and Var2 are defined?
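In case it helps to make the question concrete, here is what I suspect, sketched with hypothetical names: if Var1 is missing outside 1990-2000 and Var2 is missing outside 2000-2010, then every observation has at least one missing regressor and the estimation sample is empty. Interacting one size variable with a period dummy avoids that:

Code:
* one size variable interacted with a period indicator instead of two
* period-specific variables that are missing outside their own period
gen byte period2 = year >= 2000
xtset firmID year
xtreg Y c.size#i.period2 Var3 i.year, fe vce(cluster firmID)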

Explaining xtnbreg dispersion statistics in layman's terms

Hello Everyone,

For a paper, I have estimated a series of longitudinal negative binomial regression models using Stata's xtnbreg command. However, for reporting purposes, I have the following questions:

1) Unlike the output provided by nbreg, the xtnbreg model provides "/ln_r" and "/ln_s" as well as "r" and "s". Which of these statistics would you report for a journal publication?

2) Also, can someone please provide a layman's interpretation of the xtnbreg dispersion statistics? Unfortunately, the Stata manual simply reads: "/ln_r and /ln_s for longitudinal negative binomial regression (the Stata command: xtnbreg) refer to ln(r) and ln(s), where the inverse of one plus the dispersion is assumed to follow a Beta(r, s) distribution."

If anyone can provide answers to the above questions, or point me in the right direction, it would be much appreciated.

Thanks much!
James




Data management - looking for command (seems to be somewhere between reshape and fillin)

I apologize for the janky subject line, but I'm not sure how to express this in general terms. I've looked through the Data Management manual and other sources trying to find a command that will do this, but no luck. I've got a dataset organized like this:
id item quantity
001 abc 1
001 def 3
001 acf 7
001 kgi 4
002 qrs 1
002 tts 7
002 bhy 4
002 acf 3
003 lsq 6
004 def 5

And I would like to reorganize it so it looks like this:

id abc def acf kgi qrs tts bhy lsq
001 1 3 7 4 0 0 0 0
002 0 0 3 0 1 7 4 0
003 0 0 0 0 0 0 0 6
004 0 5 0 0 0 0 0 0

Initially I thought -reshape wide- would help, but I think that would just give me something like this instead (and I'm not even sure how to get quantity in there):
id item1 item2 item3 item4
001 abc def acf kgi
002 qrs tts bhy acf
003 lsq
004 def

Then I checked out -fillin-, which appealed at first but I don't think is the right answer either. Not sure where to go next; perhaps a combination of commands, but the last time I had a data-management question I worked up a complicated series of commands only to find that -contract- did exactly what I needed in a single line. So I thought I'd ask here first. Is there a single command I could use to transform my dataset as desired? Or has someone faced a similar problem and found a series of commands that work efficiently together to effect this sort of manipulation?
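For what it's worth, a sketch of one candidate approach, assuming the variables are named id, item, and quantity as above:

Code:
* reshape wide on the quantity values, keyed by the string item codes,
* then fill the resulting missings with zeros and strip the stub
reshape wide quantity, i(id) j(item) string
foreach v of varlist quantity* {
    replace `v' = 0 if missing(`v')
}
rename quantity* *   // quantityabc -> abc, etc.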

Thanks very much,
Robert

Pairwise correlation

Hiya

I have a problem regarding pairwise correlation. The following is the output format:


Code:
        Var1    Var2    Var3    Var4
VAR1    1
VAR2    0.98*   1
VAR3    0.09    0.02    1
VAR4    .       .       .       1


My query here is: why is there a "." in the VAR4 row?


Thank you

Nested regressions - does the order matter?

Hi everyone,

I would like to run a nested regression using the nestreg command:

Code:
nestreg: reg riskperc (income sex age) (...)
I have seven blocks in total, each representing a different theoretical dimension that aims to explain variance in the dependent variable (risk perception).

Now I am wondering: does the order in which the blocks enter the nested regression matter?

Thank you for your time.
Andreas