Channel: Statalist

Is the Durbin–Wu–Hausman test valid when using a generated regressor without declaring any IVs?

Dear All,

I recently ran into exactly the question addressed in an old Stata FAQ about the Durbin–Wu–Hausman test.
https://www.stata.com/support/faqs/s...-hausman-test/

I copy and paste the FAQ here:
------------------------------------------------------------------------
Before estimating the following simultaneous equations,
z = a0 + a1*x1 + a2*x2 + epsilon1    (1)
y = b0 + b1*z + b2*x3 + epsilon2     (2)
one should decide whether it is necessary to use an instrumental variable, i.e., whether a set of estimates obtained by least squares is consistent or not.

Davidson and MacKinnon (1993) suggest an augmented regression test (DWH test), which can easily be formed by including the residuals of each endogenous right-hand-side variable, as a function of all exogenous variables, in a regression of the original model. Back to our example, we would first perform a regression
z = c0 + c1*x1 + c2*x2 + c3*x3 + epsilon3 (3)
get residuals z_res, then perform an augmented regression:
y = d0 + d1*z + d2*x3 + d3*z_res + epsilon4 (4)
If d3 is significantly different from zero, then OLS is not consistent.
------------------------------------------------------------------------
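For concreteness, here is a minimal Stata sketch of the augmented-regression procedure the FAQ describes, assuming variables named y, z, x1, x2, and x3 as in equations (1)-(4):

Code:
* regress the suspect regressor on all exogenous variables and keep the residuals
regress z x1 x2 x3
predict z_res, residuals

* augmented regression: a significant coefficient on z_res signals that OLS is inconsistent
regress y z x3 z_res
test z_res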

My question is: the normal Durbin–Wu–Hausman test needs a declaration of an IV for z; in this case, it must be x1 and x2. However, in my case z is a generated regressor, and x1 and x2 form a long list of variables, including many dummies, as in equation (1).

When I try to test the endogeneity of z in equation (2), do I need to show that x1 and x2 are all uncorrelated with epsilon2 (the defining property of an IV), or can I simply proceed as the posted FAQ suggests?



best,
Zhaohui

Bar Graph with dates

Dear Statalist,

I have a list of several countries with dates of COVID-19 lockdown. I would like to make a horizontal bar chart similar to the example below (please ignore other information in the graph).

I would appreciate any help.
Thank you!

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str3 countrycode str11 country float lockdown
"AUS" "Australia"   20200324
"MYS" "Malaysia"    20200318
"NZL" "New Zealand" 20200326
"SGP" "Singapore"   20200408
"THA" "Thailand"    20200324
"VNM" "Vietnam"     20200401
end
[image: the example bar chart referenced above]
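Not an answer, but a minimal sketch of one way to draw such a chart, assuming lockdown stores the date as a yyyymmdd number as in the example:

Code:
* convert the yyyymmdd number to a Stata daily date
gen lockdown_date = daily(string(lockdown, "%8.0f"), "YMD")
format lockdown_date %tdDD_Mon_CCYY

* horizontal bars of lockdown dates by country; since bars run from the axis
* origin, a dot chart via -graph dot- may read better
graph hbar (asis) lockdown_date, over(country, sort(lockdown_date)) ///
    ytitle("Lockdown date")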

Replicating spatial regression (with spatially correlated disturbances) in Mata

Hey guys,

So I am trying to solve the following model in Mata:

y = x*b + u

u = rho*W*u + e     (W is the spatial weighting matrix)

This model is implemented by the official Stata command spregress, which implements the methodology first developed by Kelejian and Prucha (1999).

As per Kelejian and Prucha (1999), the rho parameter is estimated by the moment conditions:

[image: the moment conditions, shown as an attachment in the original post]

As per Kelejian and Prucha (1999), reworking the moment conditions leads to the following system of equations to be solved:

[images: the resulting system of equations, shown as attachments in the original post]

where G is a 3x3 matrix, the second term on the right-hand side is a vector built from the residuals, and alpha is the column vector of parameters to be estimated: (rho, rho squared, and the variance of the error term e above).

These estimators are derived from the minimization of:

[image: the minimization criterion, shown as an attachment in the original post]


Implementing the above framework, both in Mata and via the Stata gmm command, by estimating the above system of equations with respect to the two unknowns described, nevertheless yields slightly different results from the official spregress routine, specifically for the parameter rho.

My question, then, for the spatial econometrics experts out there: is my estimation methodology wrong (in that I am jointly estimating the three equations above with respect to the parameters described)?
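For reference, a minimal Mata sketch of the kind of minimization described, assuming a 3x3 moment matrix G and a 3x1 vector g have already been built from the residuals; this only illustrates the mechanics and is not a statement of what spregress does internally:

Code:
mata:
// squared norm of G*(rho, rho^2, sigma2)' - g, minimized over (rho, sigma2)
void kp_obj(todo, p, G, g, v, grad, H)
{
    real colvector a, r
    a = (p[1] \ p[1]^2 \ p[2])
    r = G*a - g
    v = -cross(r, r)              // optimize() maximizes, so return minus the criterion
}

S = optimize_init()
optimize_init_evaluator(S, &kp_obj())
optimize_init_evaluatortype(S, "d0")
optimize_init_argument(S, 1, G)   // G and g assumed already computed
optimize_init_argument(S, 2, g)
optimize_init_params(S, (0, 1))   // starting values for (rho, sigma2)
p = optimize(S)
end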

I suppose something is wrong with my code

I am running a multiple regression with one categorical independent variable and six other variables. The dataset covers 1999 to 2017 (panel data).
The results I obtain make me doubt my code. Can the shape of the data (wide or long) or the storage type of the variables affect the regression results?

Interpreting Dummy variable and constant

Hi everyone,

I want to study the effect of private university systems with high tuition fees on income inequality in Europe.
Income inequality is measured by the Gini coefficient of equivalised disposable income for the period 2008-2015 for all European countries except Croatia, which I dropped due to a lack of data.
I use a dummy variable for the university system: free is 0 and private is 1.

I found these results



I have two questions :


1) Is it correct to interpret the coefficient of the dummy "free uni" as follows: high tuition fees increase GinieqDI by 0.679?
2) I have trouble interpreting the constant: if all my independent variables are equal to 0, then B0 = 154.90.
Which would mean that on average the Gini coefficient for the observed countries is 157.90?
But that doesn't make sense! Gini is between 0 and 100.


Thank you for your help
Nour.

using egen's max function with a local

Hello!

I am Nithya. This is my first post, and I've tried to follow the advice on posting, but please excuse me if I've missed something.

I have a string variable childgender with values male and female. I am trying to create two separate variables, countmale and countfemale, with the number of male and female children in each household (hhid). The code I'm using is:

Code:
levelsof childgender, local(gender)
foreach sex of local gender {
    bysort hhid: egen count`sex' = count(childgender) if childgender == "`sex'"
    ta count`sex'
    egen n`sex' = max(count`sex'), by(hhid)
    replace n`sex' = 0 == mi(n`sex')
}
The problem I'm having is with the line
Code:
egen n`sex'=max(count`sex'),by(hhid)
It works fine if I use a literal variable name, but when I use max(count`sex'), instead of replacing the empty cells of n`sex' with the maximum value, it replaces them with 1. Could someone please help?
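For reference, a compact alternative sketch for the counting task described, assuming childgender contains exactly the strings "male" and "female":

Code:
* count male and female children per household directly with egen, total()
bysort hhid: egen countmale   = total(childgender == "male")
bysort hhid: egen countfemale = total(childgender == "female")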

Statistically significant coefficient of zero in Poisson panel regression

Hello Statalist members,

I am analysing a firm panel where I regress the number of high-skilled employees on three dummies and some control variables. One of them is investment (in euros). I am using a Poisson fixed-effects regression and Stata 13. The coefficient for investment is 9.60e-11, which is basically zero (0.000 when I round the coefficient to three decimals, and the same for the confidence interval and standard error), and I am wondering what this means and how to interpret it. If there were no relationship between the variables, it should not be statistically significant, or am I mistaken?

Code:
 xtpoisson highskill investict product_inno process_inno lnturnover lnavwages collective lnexportshare investment i.year, fe vce(robust)
note: 224 groups (224 obs) dropped because of only one obs per group
note: 10180 groups (49161 obs) dropped because of all zero outcomes

Iteration 0:   log pseudolikelihood =  -120765.5  
Iteration 1:   log pseudolikelihood = -108313.47  
Iteration 2:   log pseudolikelihood = -108273.22  
Iteration 3:   log pseudolikelihood = -108273.22  

Conditional fixed-effects Poisson regression    Number of obs     =     53,130
Group variable: idnum                           Number of groups  =      9,405

                                                Obs per group:
                                                              min =          2
                                                              avg =        5.6
                                                              max =         12

                                                Wald chi2(18)     =     161.63
Log pseudolikelihood  = -108273.22              Prob > chi2       =     0.0000

                                   (Std. Err. adjusted for clustering on idnum)
-------------------------------------------------------------------------------
              |               Robust
    highskill |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
    investict |   .0256531    .010968     2.34   0.019     .0041561      .04715
 product_inno |  -.0059202   .0283482    -0.21   0.835    -.0614816    .0496412
 process_inno |    .035324   .0168719     2.09   0.036     .0022557    .0683923
   lnturnover |   .2225503    .038659     5.76   0.000     .1467802    .2983205
    lnavwages |   .0048915   .0269775     0.18   0.856    -.0479835    .0577664
   collective |  -.0064905   .0217641    -0.30   0.766    -.0491475    .0361664
lnexportshare |   .0113623   .0076189     1.49   0.136    -.0035705    .0262951
   investment |   9.60e-11   2.14e-11     4.50   0.000     5.42e-11    1.38e-10
              |
         year |
        2008  |   .0124656   .0145073     0.86   0.390    -.0159682    .0408995
        2009  |   .0502652   .0342328     1.47   0.142    -.0168299    .1173603
        2010  |   .0917918   .0202746     4.53   0.000     .0520543    .1315292
        2011  |    .156842   .0540877     2.90   0.004     .0508321    .2628518
        2012  |   .2284362   .0607472     3.76   0.000     .1093738    .3474985
        2013  |   .2258726   .0640781     3.52   0.000     .1002819    .3514634
        2014  |    .271107    .067284     4.03   0.000     .1392328    .4029813
        2015  |   .2600914   .0546444     4.76   0.000     .1529904    .3671923
        2016  |   .2684858   .0528786     5.08   0.000     .1648456    .3721261
        2017  |   .3187303   .0638673     4.99   0.000     .1935527     .443908
        2018  |   .3404339   .0668107     5.10   0.000     .2094874    .4713804
-------------------------------------------------------------------------------
If anyone can tell me how to interpret the zero coefficient, this would be of great help. I have searched online as well as in econometrics books, but I have not found an example like this with a coefficient of essentially zero.
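One purely mechanical illustration (a hedged sketch, not a diagnosis): because investment is measured in euros, a per-euro coefficient can be tiny even when precisely estimated; rescaling the regressor changes only the units of the coefficient, not the model.

Code:
* express investment in millions of euros so the coefficient is per million euros
gen investment_mn = investment / 1e6
xtpoisson highskill investict product_inno process_inno lnturnover lnavwages ///
    collective lnexportshare investment_mn i.year, fe vce(robust)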

Thanks,

Helen

Why is the R2 in my random-effects model so low?

Hi everyone.

I have two questions, which I would be very thankful if you could help me with. I have limited experience with panel data.

I am trying to analyze how loneliness affects political trust (simplified). My main variables are drawn from Likert scales, and I am using panel data with a set of controls. The panel data consist of 3 panels with a total of around 12,000 respondents (distributed differently depending on the IV used).
I have run a Hausman test, which indicated that an RE model is preferable.
I have also run tsset, which reported that my data are "strongly balanced", although I still have some missing data for some respondents.

Issue 1: Very low R-squared.

Why is my R-squared so low? Is it low because I have done something wrong, or simply because my model does not explain much of the variation? (The thing is, when using regular OLS the R-squared is around 0.40, although that is from another dataset.)

I hope the table below is fairly readable. For some reason I couldn't paste the layout shown in Stata.

xtreg state_trust subj_lonli i.gender i.age i.edu i.employment i.pol_pos log_org, robust

Random-effects GLS regression
Group variable: idpers

Number of obs = 6,388
Number of groups = 3,948

R-sq:
within = 0.0184
between = 0.0520
overall = 0.0555

Obs per group:
min = 1
avg = 1.6
max = 3

corr(u_i, X) = 0 (assumed)

Wald chi2(99) = .
Prob > chi2 = .

(Std. Err. adjusted for 3,948 clusters in idpers)
------------------------------------------------------------------------------
state_trust | Coef. Robust Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
subj_lonli | -.044875 .0118406 -3.79 0.000 -.0680822 -.0216677

2.gender | -.0299601 .0690845 -0.43 0.665 -.1653633 .105443


age |
15 | -.0242228 .4280639 -0.06 0.955 -.8632127 .8147671
16 | -.2432496 .3877489 -0.63 0.530 -1.003223 .5167242
17 | -.3892199 .4286725 -0.91 0.364 -1.229403 .4509628
18 | -.4196856 .4319819 -0.97 0.331 -1.266355 .4269833
19 | -.7100844 .4445975 -1.60 0.110 -1.58148 .1613107
20 | -.9919223 .4505656 -2.20 0.028 -1.875015 -.1088299
21 | -.9234081 .462765 -2.00 0.046 -1.830411 -.0164054
22 | -1.126776 .4601459 -2.45 0.014 -2.028646 -.2249067
23 | -1.225502 .4786806 -2.56 0.010 -2.163699 -.2873056
24 | -1.014845 .4787934 -2.12 0.034 -1.953263 -.0764273
25 | -.6578916 .4919066 -1.34 0.181 -1.622011 .3062276
26 | -.6735368 .5457528 -1.23 0.217 -1.743193 .3961191
27 | -.9135611 .5148083 -1.77 0.076 -1.922567 .0954446
28 | -.8806842 .4818212 -1.83 0.068 -1.825036 .063668
29 | -1.440289 .4917325 -2.93 0.003 -2.404067 -.4765108
30 | -1.279424 .5016661 -2.55 0.011 -2.262671 -.296176
31 | -1.243147 .4835396 -2.57 0.010 -2.190867 -.2954264
32 | -1.244969 .4722995 -2.64 0.008 -2.170658 -.3192786
33 | -1.391668 .4706714 -2.96 0.003 -2.314167 -.4691686
34 | -1.40175 .4667951 -3.00 0.003 -2.316652 -.4868485
35 | -1.222395 .4689766 -2.61 0.009 -2.141573 -.303218
36 | -1.209475 .457466 -2.64 0.008 -2.106092 -.3128578
37 | -.9726333 .4583795 -2.12 0.034 -1.871041 -.074226
38 | -1.084053 .4624226 -2.34 0.019 -1.990384 -.1777212
39 | -1.490873 .4564992 -3.27 0.001 -2.385595 -.5961511
40 | -1.20474 .4562663 -2.64 0.008 -2.099005 -.3104745
41 | -1.465782 .4610184 -3.18 0.001 -2.369361 -.5622025
42 | -1.181186 .4605393 -2.56 0.010 -2.083826 -.278545
43 | -1.147428 .4589952 -2.50 0.012 -2.047042 -.2478139
44 | -.9302757 .471138 -1.97 0.048 -1.853689 -.0068621
45 | -.9146044 .4630601 -1.98 0.048 -1.822185 -.0070233
46 | -1.046955 .4640126 -2.26 0.024 -1.956403 -.1375074
47 | -1.194849 .4693776 -2.55 0.011 -2.114813 -.2748862
48 | -1.458196 .4637315 -3.14 0.002 -2.367093 -.5492989
49 | -1.531758 .4588825 -3.34 0.001 -2.431151 -.6323648
50 | -1.212892 .4702986 -2.58 0.010 -2.13466 -.2911234
51 | -1.114101 .4551873 -2.45 0.014 -2.006252 -.2219502
52 | -1.201248 .4562561 -2.63 0.008 -2.095493 -.3070023
53 | -.8778564 .4590725 -1.91 0.056 -1.777622 .0219091
54 | -1.329905 .4682386 -2.84 0.005 -2.247636 -.4121743
55 | -1.353538 .4730581 -2.86 0.004 -2.280715 -.4263611
56 | -1.194882 .4638121 -2.58 0.010 -2.103937 -.285827
57 | -1.307294 .4688561 -2.79 0.005 -2.226235 -.3883531
58 | -1.186869 .4610377 -2.57 0.010 -2.090487 -.2832523
59 | -.9172123 .4672325 -1.96 0.050 -1.832971 -.0014535
60 | -.9843392 .462074 -2.13 0.033 -1.889988 -.0786908
61 | -.9555716 .4617807 -2.07 0.039 -1.860645 -.050498
62 | -1.106859 .4638763 -2.39 0.017 -2.01604 -.1976782
63 | -.977484 .4601274 -2.12 0.034 -1.879317 -.0756509
64 | -1.057566 .4593934 -2.30 0.021 -1.957961 -.1571717
65 | -.9569004 .4579375 -2.09 0.037 -1.854441 -.0593593
66 | -.9439535 .4577537 -2.06 0.039 -1.841134 -.0467726
67 | -1.182253 .4678739 -2.53 0.012 -2.09927 -.2652374
68 | -1.111331 .4620249 -2.41 0.016 -2.016883 -.2057786
69 | -1.26849 .4678471 -2.71 0.007 -2.185453 -.3515264
70 | -.6979144 .4573333 -1.53 0.127 -1.594271 .1984425
71 | -1.014777 .4735079 -2.14 0.032 -1.942835 -.0867186
72 | -.8364775 .4674897 -1.79 0.074 -1.752741 .0797855
73 | -1.100835 .4770821 -2.31 0.021 -2.035899 -.1657714
74 | -1.080575 .4966258 -2.18 0.030 -2.053944 -.1072063
75 | -.8730281 .5033653 -1.73 0.083 -1.859606 .1135498
76 | -.6902913 .4945775 -1.40 0.163 -1.659645 .2790629
77 | -1.103201 .5094065 -2.17 0.030 -2.101619 -.1047823
78 | -.4252277 .5027349 -0.85 0.398 -1.41057 .5601146
79 | -.5360097 .5036819 -1.06 0.287 -1.523208 .4511888
80 | -.6822742 .5222598 -1.31 0.191 -1.705885 .3413362
81 | -.6859098 .5459123 -1.26 0.209 -1.755878 .3840587
82 | -.5849168 .5741404 -1.02 0.308 -1.710211 .5403777
83 | -.8567249 .6325746 -1.35 0.176 -2.096548 .3830986
84 | -1.015225 .5695311 -1.78 0.075 -2.131485 .1010359
85 | -.817436 .8447003 -0.97 0.333 -2.473018 .8381461
86 | -1.716431 .8791466 -1.95 0.051 -3.439526 .006665
87 | -.9582347 .7482772 -1.28 0.200 -2.424831 .5083618
88 | -.7534905 .6502242 -1.16 0.247 -2.027906 .5209255
89 | -2.111355 1.342654 -1.57 0.116 -4.742909 .5201986
90 | -.9557962 .8015679 -1.19 0.233 -2.52684 .6152479
91 | 1.123961 .8423056 1.33 0.182 -.5269275 2.77485


edu |
8 | .2083036 .3093472 0.67 0.501 -.3980057 .8146129
9 | .0051441 .2757922 0.02 0.985 -.5353986 .5456867
10 | .3364264 .3195026 1.05 0.292 -.2897872 .9626401
12 | .2482652 .3081384 0.81 0.420 -.3556751 .8522054
13 | .8433266 .3123009 2.70 0.007 .2312281 1.455425
14 | .6817005 .4025953 1.69 0.090 -.1073717 1.470773
16 | .4819251 .3152663 1.53 0.126 -.1359855 1.099836
18 | .6971742 .3200453 2.18 0.029 .0698969 1.324451
21 | .8120132 .3573241 2.27 0.023 .1116708 1.512356


2.employment | .3522081 .057271 6.15 0.000 .239959 .4644571


pol_pos |
1 | .1049498 .2242916 0.47 0.640 -.3346535 .5445532
2 | .258274 .148335 1.74 0.082 -.0324571 .5490052
3 | .3113382 .1406105 2.21 0.027 .0357467 .5869297
4 | .3090364 .1390905 2.22 0.026 .0364239 .5816488
5 | .3087768 .1371765 2.25 0.024 .0399157 .5776378
6 | .3459819 .1514121 2.29 0.022 .0492196 .6427443
7 | .3215975 .1545897 2.08 0.037 .0186072 .6245877
8 | .0815976 .1642886 0.50 0.619 -.2404022 .4035974
9 | -.1982149 .3213479 -0.62 0.537 -.8280452 .4316154
10 | .0554127 .2384647 0.23 0.816 -.4119694 .5227949


log_org | -.1888445 .0507535 -3.72 0.000 -.2883197 -.0893694
_cons | 6.133003 .461184 13.30 0.000 5.229099 7.036907
-------------+----------------------------------------------------------------
sigma_u | 1.5854354
sigma_e | 1.3066094
rho | .59552347 (fraction of variance due to u_i)
------------------------------------------------------------------------------




Issue 2:
I can't seem to find what assumptions a random-effects model relies on. Do my regressors and DV need to be linearly related? Do my variables need to be normally distributed? And so on. Simply: what requirements does my data need to meet (beyond "passing" the Hausman test)?

Many thanks in advance.

Equal slope t-test

Hello,

I ran two separate regressions on two portfolios. The regressors are the Fama/French factors. Because the portfolio excess returns are already time-series averages, I ran the regressions with Newey-West adjusted standard errors.

The code looks like this:
Code:
 tsset ym
newey portfolio1 mktrf smb hml, lag(6)
newey portfolio2 mktrf smb hml, lag(6)
Here is an example of my dataset:
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float(ym portfolio1 portfolio2 mktrf smb hml)
210 -1.0934919 -1.2953682  -1.69  2.12  -.59
211 -1.5076194  -.9989501  -1.75  1.52 -2.79
212   1.846868  1.8286937   -.27  1.45  -.49
213  -2.503698 -2.2496798  -4.38  1.27  1.72
214   9.566316   7.653448      4  3.72   .31
215  1.5677294   .4746514    .27  1.35  -.37
216  -3.239614  -4.017523  -6.01  2.22  3.31
217    2.35551  1.5107986  -1.38  3.59   .76
218    6.92542   4.650113   2.85  3.48   1.2
219   9.188552   7.253043   7.88    .4 -3.54
220    5.65227   5.163275   1.76  4.56  -.62
221 -1.0532163 -1.5494046  -1.69  1.69   .59
222    6.21275    5.09622   5.11   .26 -1.11
223   7.366338   7.396804   3.75  5.06  -.46
end
format %tm ym
Now I want to run a t-test of whether the slope coefficients are equal across the two regressions.
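A hedged sketch of one way to do this: because both portfolios are regressed on the same factors over the same months, regressing the return difference on the factors with the same Newey-West standard errors and testing a slope against zero tests whether that slope is equal across the two portfolios.

Code:
gen dportfolio = portfolio1 - portfolio2
newey dportfolio mktrf smb hml, lag(6)
test mktrf              // H0: the mktrf slopes are equal across the two portfolios
test mktrf smb hml      // joint test that all three slopes are equal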

Any help is highly appreciated.

Best regards
Steven

Multiple fixed effects for Binary outcomes model with Cross Sectional Data

Hello Statalist,

I'm currently analyzing cross-sectional data on individual loans originated by two banks in the country during 2017-2018 (each individual loan appears only once). My goal is to estimate the effect of a relief lending program on loan outcomes (overdue/default).
Dependent variable: outcome, binary, equal to 1 if the loan is overdue/in default within 2 years of the origination date, and 0 otherwise.
I would like to add bank fixed effects, time fixed effects, and zip-code fixed effects (note that the time variable is the quarter in which the loan was issued, so it is repeated within bank and zip code). Because my data are not a time series, xtset is not applicable. Please advise me:
1. Is there any way to add fixed effects into a logit/probit model when xtset cannot be used? I tried:
Code:
xtlogit outcome i.relief loan_control borrower_control i.bank i.zip i.date, vce(cluster bank)
where relief is a dummy variable indicating whether the loan qualifies for the relief lending program. But because there are more than 900 zip codes in my data, Stata returned a matsize-too-small error.
I also tried:
Code:
egen zip_date = group(zip date)
areg outcome i.relief loan_control borrower_control, absorb(gse zip_date)
however, absorb() does not accept two variables (a sketch of one workaround appears at the end of this post).
2. One of the two banks in my sample issues an overwhelming number of loans compared to the other. Do I need to do anything about this imbalance?
Thank you in advance.
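Regarding question 1, a hedged sketch of one common workaround is a linear probability model estimated with the community-contributed reghdfe (ssc install reghdfe), which can absorb several sets of fixed effects at once; it is not a logit/probit, so treat it only as an illustration:

Code:
* linear probability model with bank, zip-code, and quarter fixed effects
reghdfe outcome i.relief loan_control borrower_control, ///
    absorb(bank zip date) vce(cluster bank)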

Methodological advice - obtaining panel data from census data: is it possible and how?

I'd like to ask members of Statalist for methodological advice.

I have found data on spending on government ads by the Argentine state of Santa Fé. That is the data for my intended dependent variable. The spending is available from 2007 to 2018, disaggregated by municipality. That's great.

The problem is the socio-economic data available to work with. The state of Santa Fé released the census data of 2001 and 2010, also disaggregated by municipality, but there are no data for the years 2002 to 2009 or beyond 2010.

I need data for the years 2011 and 2015.

In this case, what can be done? What would you suggest? Linear interpolation and extrapolation of data? Some alternative technique?
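A hedged sketch of the interpolation route, assuming the data are arranged as a municipality-year panel with the census variable (here called pop, an illustrative name) filled only in 2001 and 2010:

Code:
* linear interpolation within municipality; epolate also extrapolates outside 2001-2010
bysort municipality: ipolate pop year, gen(pop_filled) epolate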

Thank you

Problem with interpreting coefficients interaction terms

Dear all,

I am currently stuck on a theoretical question and was wondering whether any of you may know the answer. It is a general question, for which I give an example as illustration. Say I have data for the years 1990 until 2010 for 20 countries. I run the regression y = a + b*x + e, where a is a constant, b is the coefficient on the independent variable, and e is the error term. In this panel setting, I run an OLS regression first without and then with interaction terms (the x variable interacted with decade dummies). Is it possible that in the former case the b coefficient is statistically significant (at, e.g., the 5% level), but when I interact x with the decade dummies (1990, 2000, 2010) and then run the -margins- command, I fail to find a statistically significant coefficient on x at the 5% level for any of the three decades?
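For concreteness, a minimal sketch of the two specifications described, with illustrative variable names y, x, decade, and country:

Code:
* pooled slope over the whole period
regress y x, vce(cluster country)

* decade-specific slopes via an interaction, then -margins-
regress y c.x##i.decade, vce(cluster country)
margins decade, dydx(x)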

I guess what I would like to know is whether a significant coefficient over the complete period guarantees that it would be significant in at least one of the individual decades.

This is more of a curiosity/theoretical question, where I am trying to improve my understanding of interaction terms coefficients interpretation and the -margins- command results. Any help on this nevertheless would be greatly appreciated.

Best,

Satya



Event analysis using percentiles in Stata

Hi all, I have a dataset for an event study, and I am trying to normalize the series by setting the value at t = -1 to 1 so I can make figures with the 25th, 50th, and 75th percentiles, but I am not sure how to code this in Stata. Below is example data showing how my data look for two of my variables, income and GDP. I have already computed the percentiles of my variables, but I am not sure how to normalize at t = -1 or how to interpret that. Thanks!
time  income_50  income_75  income_25  gdp_50  gdp_75  gdp_25
  -2         98        103         72      98     104      79
  -1         95        100         75      98     105      79
   0         94         99         73     100     105      80
   1         97        101         76     101     106      82
   2         98        102         79     102     107      84
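A small hedged sketch of one way to normalize, assuming the goal is to rescale each series so that its value at time == -1 equals 1 (values then read as proportions of the t = -1 level):

Code:
foreach v of varlist income_50 income_75 income_25 gdp_50 gdp_75 gdp_25 {
    summarize `v' if time == -1, meanonly
    gen `v'_n = `v' / r(mean)     // equals 1 at time == -1 by construction
}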

event study looping figures

Hi all, I have a panel dataset and I am trying to generate percentiles for variables and then graph them. This is how I did it, but I feel there could be a more efficient way to go about it using loops for the graphs. I tried looping it but was not successful. Here is how I went about it:

global controls gdp income
preserve
foreach var of varlist $controls {
collapse (p50) $controls, by(time)
}
foreach var of varlist $controls{
rename `var' `var'_50
}


*********Then I save it and then do it for the 75th percentile:

restore
preserve
foreach var of varlist $controls{
collapse (p75) $controls, by(time)
}
foreach var of varlist $controls {
rename `var' `var'_75
}

*********Then I save it and then do it for the 25th percentile:

restore
preserve
foreach var of varlist $controls{
collapse (p25) $controls, by(time)
}
foreach var of varlist $controls {
rename `var' `var'_25
}

*****Then I save it and merge all the three files to get this example table below:
time  income_50  income_75  income_25  gdp_50  gdp_75  gdp_25
  -2         98        103         72      98     104      79
  -1         95        100         75      98     105      79
   0         94         99         73     100     105      80
   1         97        101         76     101     106      82
   2         98        102         79     102     107      84
*****Then I make my figures:

*for gdp*

twoway rarea gdp_75 gdp_25 time, fcolor(gs12%70) lpattern(dot)|| line gdp_50 time, lc(black) lw(medium) || line gdp_25 time, lc(gs14) lw(thin)|| line gdp_75 time,lc(gs10) lw(thin) subtitle(GDP, size(small))legend( label (1 "75th Percentile/25th Percentile") label (2 "Median"))

*for income*
twoway rarea income_75 income_25 time, fcolor(gs12%70) lpattern(dot)|| line income_50 time, lc(black) lw(medium) || line income_25 time, lc(gs14) lw(thin) || line income_75 time,lc(gs10) lw(thin) subtitle(income, size(small))legend( label (1 "75th Percentile/25th Percentile") label (2 "Median"))


I think there may be a more efficient way of doing this? Thanks!
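A hedged sketch of a more compact route (graph names are illustrative): one collapse call can compute all three percentiles at once, so the separate saves and merges are unnecessary and only the graphing is looped.

Code:
global controls gdp income
preserve
collapse (p25) p25_gdp=gdp p25_income=income ///
         (p50) p50_gdp=gdp p50_income=income ///
         (p75) p75_gdp=gdp p75_income=income, by(time)
foreach v in gdp income {
    twoway (rarea p75_`v' p25_`v' time, fcolor(gs12%70) lpattern(dot)) ///
           (line p50_`v' time, lc(black) lw(medium)), ///
        subtitle(`v', size(small)) ///
        legend(label(1 "75th/25th Percentile") label(2 "Median")) ///
        name(g_`v', replace)
}
restore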

Help with Multidimensional IRT

Good day StataList.

I am having a bit of trouble with multidimensional IRT. I have already done a confirmatory factor analysis and extracted the latent abilities for the model shown below.

[attached: path diagram of the three-factor model]

So I would like to ask the community the following:
  1. Is the code I used to extract the latent abilities correct?
  2. Is it possible to create a model with another latent ability that depends on those 3 latent abilities? An overall latent ability, if I may say so.
  3. How can I estimate item parameters (item difficulty and discrimination) in a multidimensional IRT model like this one? The items are dichotomous.
Code I used is as follows:
Code:
gsem (L1 -> q1 q2 q3 q4 q6 q7 q9 q10)                  ///
     (L2 -> q11 q12 q13 q15 q16 q17 q19)               ///
     (L3 -> q21 q22 q23 q24 q25 q26 q27 q28 q29 q30),  ///
     covstruct(_lexogenous, diagonal) latent(L1 L2 L3)  ///
     cov(L2*L1 L3*L1 L3*L2) nocapslatent
predict L*, latent
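On question 3, a hedged sketch (an illustration only, not necessarily the specification you want): with dichotomous items, declaring a logit link and fixing the latent variances at 1 gives a 2PL-style parameterization in which each item's slope is its discrimination and its difficulty can be recovered from the intercept.

Code:
gsem (L1 -> q1 q2 q3 q4 q6 q7 q9 q10, logit)                 ///
     (L2 -> q11 q12 q13 q15 q16 q17 q19, logit)              ///
     (L3 -> q21 q22 q23 q24 q25 q26 q27 q28 q29 q30, logit), ///
     covstruct(_lexogenous, diagonal) latent(L1 L2 L3)       ///
     var(L1@1 L2@1 L3@1) cov(L2*L1 L3*L1 L3*L2) nocapslatent
* for item j loading on latent Lk: discrimination a_j = slope on Lk,
* difficulty b_j = -(_cons of item j)/a_j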
Thank you to anyone who could shed some light to this.

Extracting accurate FE estimators after using PPMLHDFE to estimate gravity equation

Hello all.

I am trying to estimate a granular commuting gravity equation (as described in http://www.jdingel.com/research/DingelTintelnotSEGS.pdf, estimated model on pages 15-16).

I have a cross-sectional commuting matrix that contains roughly 2,600 regions, and I am interested both in the coefficient on commuting time and in the origin and destination fixed effects.

I estimate the following model:
Code:
ppmlhdfe commutes dist, absorb(origin_code dest_code) tolerance(1.0e-10)
I'm concerned about the estimation of the origin and destination fixed effects; this concern arises from the following statement in the ppmlhdfe help file:
Code:
    - To save the estimates specific absvars, write newvar=absvar.
    - However, be aware that estimates for the fixed effects are generally inconsistent and not econometrically identified.
I was wondering if:
1. Is there a way to consistently estimate the fixed effects using ppmlhdfe?
2. I am replicating the methodology in the working paper linked above. The paper uses ppmlhdfe for estimation and reports destination fixed effects (page 19). Am I mis-specifying the equation or missing something else?
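On point 1, a hedged sketch of the newvar=absvar syntax quoted above (the names fe_o and fe_d are illustrative); the help file's caveat still applies, so the recovered fixed effects are identified only up to a normalization (for example, relative to a chosen reference region):

Code:
ppmlhdfe commutes dist, absorb(fe_o=origin_code fe_d=dest_code) tolerance(1.0e-10)
* fe_o and fe_d now hold the estimated origin and destination fixed effects,
* identified only up to a constant shift between the two sets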

Thanks in advance,
Gal

Probit Regression

Hi to everyone in the forum!

I am an MSc student in Financial Economics and I would like to raise a question. Any help would be much appreciated.
I am running an ordered probit model in Stata for my thesis, in which my dependent variable takes three values, 1, 2, and 3, representing the method of payment in mergers and acquisitions: cash-only payments, stock-only payments, and hybrid cash-stock payments.
While the main regression and the marginal effects run perfectly fine, below the marginal effects table I get a footnote that says: dy/dx is for discrete change of dummy variable from 0 to 1. Does this indicate that there is a mistake in my calculations? I do not have a binary model but rather three values for my dependent variable, so I cannot understand why it says 0 to 1. Could anyone please clarify this?
Thanks in advance,

Management of dates (break down the dates)

Dear All,

I am working on a study of sick leave using register data. From the register, I have only the start and end dates of sick-leave spells for each individual, and the dates are not broken down year by year. For instance, for person A there are only a start date (1 May 2016) and an end date (14 Feb 2018).
So I would like to know how I can split the dates year by year (i.e., 1/5/16-14/2/18 would be divided into 1/5/16-31/12/16, 1/1/17-31/12/17, and 1/1/18-14/2/18) in order to calculate the total amount of sick leave for each year.

The example data created for this question are as follows:

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input byte id float(from to)
1 21185 22191
2 20454 22281
3 21186 21337
3 21367 21549
4 21459 21582
4 21914 22281
5 21094 21324
6 22341 22645
7 20454 20610
end
format %tdCCYY-NN-DD from
format %tdCCYY-NN-DD to
The desired data (created manually) are as follows:

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input byte id float(from to f_16 t_16 f_17 t_17 f_18 t_18 f_19 t_19 f_20 t_20 f_21 t_21)
1 21185 22191     .     .     .     . 21185 21550 21550 21915 21915 22191     .     .
2 20454 22281 20454 20820 20820 21185 21185 21550 21550 21915 21915 22281     .     .
3 21186 21337     .     .     .     . 21186 21337     .     .     .     .     .     .
3 21367 21549     .     .     .     . 21367 21549     .     .     .     .     .     .
4 21459 21582     .     .     .     . 21459 21550 21550 21582     .     .     .     .
4 21914 22281     .     .     .     .     .     . 21914 21915 21915 22281     .     .
5 21094 21324     .     . 21094 21185 21185 21324     .     .     .     .     .     .
6 22341 22645     .     .     .     .     .     .     .     .     .     . 22341 22645
7 20454 20610 20454 20610     .     .     .     .     .     .     .     .     .     .
end
format %tdCCYY-NN-DD from
format %tdCCYY-NN-DD to
format %tdCCYY-NN-DD f_16
format %tdCCYY-NN-DD t_16
format %tdCCYY-NN-DD f_17
format %tdCCYY-NN-DD t_17
format %tdCCYY-NN-DD f_18
format %tdCCYY-NN-DD t_18
format %tdCCYY-NN-DD f_19
format %tdCCYY-NN-DD t_19
format %tdCCYY-NN-DD f_20
format %tdCCYY-NN-DD t_20
format %tdCCYY-NN-DD f_21
format %tdCCYY-NN-DD t_21
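For what it's worth, a hedged sketch of a long-format alternative that avoids the wide f_/t_ pairs: expand each spell into one row per calendar year it touches and clip the dates to the year boundaries.

Code:
gen nyears = year(to) - year(from) + 1
expand nyears
bysort id from to: gen int yr = year(from) + _n - 1
gen f = max(from, mdy(1, 1, yr))
gen t = min(to, mdy(12, 31, yr))
format %tdCCYY-NN-DD f t
gen days = t - f + 1        // sick-leave days falling in each calendar year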
Kind Regards,
Moon Lu

Event Study Hypothesis Testing

Hello Community,


In recent days I have been able to read a lot of questions and helpful answers on Statalist.

However, I have not been able to completely solve my problem, which I briefly introduce in the following:

I want to replicate an event study that shows the effects of a specific event (a regulation) on the business key figure "days sales outstanding" (DSO). I therefore want to analyze yearly panel data providing the DSO of several businesses over a period of time and detect abnormal performance between the control group and the sample group.

I want test the hypotheses by examining whether a firm affected by the treatment (regulation) had abnormal performance in terms of account receivable days.

I have already created a sample group and a control group with the respective treatment variable, using a matching technique built with the help of code snippets provided by Clyde Schechter.

Now I am trying to calculate the abnormal performance between every pair of time periods, e.g., 2000-2001, 2000-2002, ..., 2008-2009, and perform t-tests and WSR tests of whether the change was significant.

The abnormal performance: AP(t+j) = PS(t+j) - EP(t+j)
The expected performance: EP(t+j) = PS(t+i) + Avg_k[PC_k(t+j) - PC_k(t+i)]

AP is the abnormal performance,
EP is the expected performance,
PS is the performance of the sample (treated) group
PC is the performance of the control (untreated) group

t=year of regulation (2008)
i= starting year of comparison (-8,-7,...,+6)
j= ending year of comparison (-7, -6,...,+7)
k is the number of control firms


I want to create a table in the following format

Period (e.g. 2000-2001) | AP Mean | p-Value (t-test) | p-value (WSR test)
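A hedged sketch for a single comparison window (i = 2000, j = 2001), assuming a firm-year panel with variables firmid, year, dso, and a 0/1 treated indicator (all names illustrative); it applies the AP/EP formulas above and runs the two tests:

Code:
preserve
keep if inlist(year, 2000, 2001)
keep firmid treated year dso
reshape wide dso, i(firmid treated) j(year)
gen chg = dso2001 - dso2000
summarize chg if treated == 0, meanonly      // Avg of control-firm changes
gen ap = chg - r(mean) if treated == 1       // AP = [PS(j)-PS(i)] - Avg[PC(j)-PC(i)]
ttest ap == 0                                // t-test of mean abnormal performance
signrank ap = 0                              // Wilcoxon signed-rank test
restore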


It would be awesome if some of you could spend some time helping me. I appreciate any form of comment, code, or further support. Please contact me in case you have further questions!

Thanks a lot in advance!

How to create loops for creating dummies and carrying out regression analysis?

Hi,

I am working on an upward mobility approach, which looks at the likelihood that an individual will surpass their parent's place in the distribution by a given amount, conditional on the parents being at or below a given percentile. The equation is URM_{a,ρ} = Pr(Y1 − Y0 > a | Y0 ≤ ρ), where a is the amount by which the child must exceed the head's place in the distribution.

I have split the head's and the child's earnings into percentiles using:

Code:
xtile head_earnings = head_wage, n(100)
xtile child_earnings = child_wage, n(100)

To generate a, I'm using the following command:
a=0
Code:
gen d0 = (child_earnings - head_earnings > 0)

a=10%
Code:
gen d10 = (child_earnings - head_earnings > 10)

and so on till a=30

In order to carry out the regressions for a = 0, a = 10, a = 20, and a = 30, I use the following commands:

Code:
reg d0 if group==1 & head_wage <=10
reg d0 if group==1 & head_wage <=20
reg d0 if group==1 & head_wage <=30
reg d0 if group==1 & head_wage <=40
reg d0 if group==1 & head_wage <=50

I have 4 groups belonging to the year 2005 and 4 belonging to the year 2012. I want to carry out this regression for different values of a and export the results using outreg2, separately for the two years, but I have no idea how to create a loop (or nested loop) for this; a sketch appears after the example data below.
The groups are as follows:
Group 1 and Group 5 are Hindus
Group 2 and Group 6 are Muslims
Group 3 and Group 7 are Scheduled Castes/Tribes
Group 4 and Group 8 are Other Backward Castes


I have attached an example of data set variables:

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float group byte(head_earnings child_earnings) float(head_wage d0 d10 d20 d30)
1  2  2 4.3367343 1 0 0 0
1 17 14  9.779411 0 0 0 0
1 28 31 12.339286 0 0 0 0
1 63 68  19.73077 0 0 0 0
1 10 12  7.683594 0 0 0 0
3 93 85  56.95454 0 0 0 0
3 54 49    17.275 0 0 0 0
3  5 13  6.316288 1 0 0 0
4  4  5  5.898693 1 0 0 0
4 95 99  63.96104 1 1 1 0
4 85 29 33.824173 0 0 0 0
4 94 43  61.73684 0 0 0 0
2 40 17 13.716216 0 0 0 0
5 44  3 14.571428 0 0 0 0
5 52 14 16.666666 0 0 0 0
5 86 92      37.5 0 0 0 0
5 26 61        12 1 0 0 0
5 29 32      12.5 0 0 0 0
5 71 77        25 0 0 0 0
5 71 77        25 0 0 0 0
5 93 20        60 0 0 0 0
7 29 32      12.5 0 0 0 0
7 13 15      8.75 0 0 0 0
7 45 61        15 1 0 0 0
8 71 57        25 0 0 0 0
8 29  5      12.5 0 0 0 0
8 11 15     8.125 1 0 0 0
6 88 61        40 0 0 0 0
6 20 10      10.5 0 0 0 0
6 25 20 11.688312 0 0 0 0
end
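A hedged sketch of the double loop described above, assuming groups 1-4 belong to 2005 and groups 5-8 to 2012 and that outreg2 is installed (ssc install outreg2); the cutoff is applied to head_wage as in the commands above (swap in head_earnings if the cutoff is meant as a percentile):

Code:
foreach a in 0 10 20 30 {
    capture drop d`a'
    gen d`a' = (child_earnings - head_earnings > `a') ///
        if !missing(child_earnings, head_earnings)
}
foreach yr in 2005 2012 {
    if `yr' == 2005 local grps 1 2 3 4
    else local grps 5 6 7 8
    foreach g of local grps {
        foreach a in 0 10 20 30 {
            foreach p in 10 20 30 40 50 {
                reg d`a' if group == `g' & head_wage <= `p'
                outreg2 using results_`yr'.xls, append ctitle(g`g' a`a' p`p')
            }
        }
    }
}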