Accidentally creating collinearity with interaction of dummy variables

I am trying to figure out how to correctly set up interactions between dummy variables. I need to show the differences between gender and athlete/non-athlete status.
I feel like my code is correct, yet somehow it omits both of the male variables for collinearity and I cannot figure out why. I must be accidentally creating a duplicate somehow? I'm new to Stata, so any help is much appreciated.



Code:
generate fem_ath=1 if female==1 & athlete==1
replace fem_ath=0 if female==1 & athlete==0
generate fem_nonath=1 if female==1 & athlete==0
replace fem_nonath=0 if fem_nonath==.
generate male_ath=1 if female==0 & athlete==1
replace male_ath=0 if male_ath==.
generate male_nonath=1 if female==0 & athlete==0
replace male_nonath=0 if male_nonath==.
reg colgpa hsize hsizesq hsperc sat fem_ath fem_nonath male_ath male_nonath
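For reference, Stata's factor-variable notation builds the same gender-by-athlete interaction without hand-made dummies, and it drops one category per term automatically as the base. A minimal sketch using the variable names above:

Code:
* female and athlete as 0/1 indicators plus their interaction
reg colgpa hsize hsizesq hsperc sat i.female##i.athlete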

Adding the Values of Non-Trading Days to Trading Days

Dear Statalisters,

I am working on a large dataset in which day_status = 1 marks a trading day and 0 a non-trading day. Observations are unique by permno and date.

A trading day is a day on which the stock market is open; a non-trading day is one on which it is closed.

Required: I need to add the values of uc, ic, be_rf and bu_rf on non-trading days to the subsequent trading day, by writing a foreach loop. For example, the values for Saturday and Sunday must be added to the values for Monday if Monday is a trading day.

Please let me know how I can get the required outcome.

Sample data are here:


Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input double(permno date) byte day_status double uc long ic double(be_rf bu_rf)
10104 18995 1 2 3 0 3
10104 18996 1 1 1 0 1
10104 18997 1 3 5 3 2
10104 18998 1 1 1 0 1
10104 18999 0 5 5 0 5
10104 19000 0 2 2 0 2
10104 19001 1 2 2 0 2
10104 19004 1 4 4 1 3
10104 19005 1 1 2 0 2
10104 19007 0 1 1 0 1
10104 19009 1 3 4 0 4
10104 19010 1 1 1 1 0
10104 19011 1 5 6 1 5
10104 19012 1 2 2 1 1
10104 19013 0 1 1 0 1
10104 19014 0 1 1 0 1
10104 19016 1 2 2 0 2
10104 19017 1 2 2 2 0
10104 19018 1 3 3 0 3
10104 19019 1 1 1 0 1
10104 19022 1 2 2 0 2
10104 19023 1 2 2 1 1
10104 19024 1 2 2 0 2
10104 19025 1 1 1 0 1
10104 19026 1 2 2 2 0
end
format %td date
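One possible approach, sketched below with the variable names from the example (run_* and *_new are hypothetical helper names, and the data are assumed to be unique by permno and date as described): accumulate each variable over consecutive non-trading days, then add the running total to the next trading day.

Code:
sort permno date
foreach v of varlist uc ic be_rf bu_rf {
    * running total over consecutive non-trading days (0 on trading days)
    gen double run_`v' = cond(day_status == 0, `v', 0)
    by permno: replace run_`v' = run_`v' + run_`v'[_n-1] ///
        if day_status == 0 & _n > 1 & day_status[_n-1] == 0
    * trading-day value plus whatever accumulated immediately before it
    gen double `v'_new = `v'
    by permno: replace `v'_new = `v' + run_`v'[_n-1] ///
        if day_status == 1 & _n > 1 & day_status[_n-1] == 0
    drop run_`v'
}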

Generating variable based on two groups

Hello,

I'm working with a dataset from Comtrade, which includes information about imports, exports and their respective trade values for 3 different countries: Italy, Belgium and Israel.
The data looks like this:

tradeflow | reporter | partner | tradevalues

import | belgium | israel | 1891976735
export | belgium | israel | 2663650495
import | belgium | italy | 14696719360
export | belgium | italy | 21122317754
import | italy | belgium | 20047180815
export | italy | belgium | 15239360972
import | italy | israel | 927106045
export | italy | israel | 2942813832


Based on this, I'd like to generate a variable which shows the sum of the import and export trade values for each combination of countries. For example, it would hold the sum of the trade values for Belgium and Israel in the first two rows (1891976735 + 2663650495), then another sum for Belgium and Italy in rows 3 and 4 (14696719360 + 21122317754).

Again, the idea is to generate a variable which shows import+export for each combination of reporter and partner.
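A minimal sketch with the variable names shown above (pair_total is a hypothetical name for the new variable): egen's total() function sums import and export within each reporter-partner pair.

Code:
* total trade (import + export) for each reporter-partner combination
bysort reporter partner: egen double pair_total = total(tradevalues)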

Thanks!

Ranking in descending order

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input int(clientid partnerid year) byte indyear double industry_share_ep    byte    spec
1081    7 2014  7 .0048005590215325356 7
1324   15 2014  7  .005018132273107767 5
628   33 2014  7 .0036202515475451946 9
1103   41 2014  7  .012695363722741604 2
792   41 2014  7  .012695363722741604 2
504   46 2014  7  .006923461798578501 4
617   48 2014  7  .004891650751233101 6
1203   48 2014  7  .004891650751233101 6
1054   54 2014  7  .020323878154158592 1
487   58 2014  7  .004543224349617958 8
74   59 2014  7  .007345101330429316 3
1279    4 2015  8  .004082599189132452 .
1081    7 2015  8  .005235201679170132 .
628   33 2015  8  .003987567033618689 .
792   41 2015  8  .012957603670656681 .
1103   41 2015  8  .012957603670656681 .
1203   48 2015  8  .005978947039693594 .
617   48 2015  8  .005978947039693594 .
627   58 2015  8  .017269672825932503 .
1096   58 2015  8  .017269672825932503 .
504   58 2015  8  .017269672825932503 .
487   58 2015  8  .017269672825932503 .
599   58 2015  8  .017269672825932503 .
74   59 2015  8  .007718401029706001 .
1165   66 2015  8  .005468184128403664 .
190   86 2015  8   .03738570585846901 .
453   86 2015  8   .03738570585846901 .
526   88 2015  8  .005846200976520777 .
490 1089 2016  9   .02829659730195999 .
253 1089 2016  9   .02829659730195999 .
997 1093 2016  9  .008093434385955334 .
1490 1093 2016  9  .008093434385955334 .
1081 1100 2016  9 .0054756589233875275 .
948 1109 2016  9  .020321520045399666 .
893 1110 2016  9  .005612920969724655 .
1124 1148 2016  9                    0 .
1181 1153 2016  9 .0053636180236935616 .
1260 1164 2016  9                    0 .
1457 1192 2016  9   .01999679021537304 .
868 1205 2016  9  .020531661808490753 .
206 1210 2016  9  .005854192189872265 .
1230 1230 2016  9  .009276914410293102 .
806 1230 2016  9  .009276914410293102 .
724   19 2017 10  .005364460404962301 .
628   33 2017 10  .004450527019798756 .
436   43 2017 10  .004119019955396652 .
617   48 2017 10  .005194490309804678 .
1203   48 2017 10  .005194490309804678 .
490   52 2017 10  .030522581189870834 .
821   52 2017 10  .030522581189870834 .
285   54 2017 10  .024143129587173462 .
633   57 2017 10 .0068115307949483395 .
751   58 2017 10   .01648836024105549 .
1096   58 2017 10   .01648836024105549 .
627   58 2017 10   .01648836024105549 .
487   58 2017 10   .01648836024105549 .
1165   66 2017 10  .006617478094995022 .
190   86 2017 10   .03865887597203255 .
453   86 2017 10   .03865887597203255 .
994    5 2014 18  .006885720416903496 .
1112   11 2014 18  .016296733170747757 .
478   12 2014 18  .004438069649040699 .
918   22 2014 18  .005527980625629425 .
337   22 2014 18  .005527980625629425 .
49   59 2014 18  .007213207893073559 .
1301   66 2014 18  .005558209028095007 .
1384   84 2014 18  .012285162694752216 .
1133  171 2015 19  .017234930768609047 .
424  191 2015 19  .020662015303969383 .
1164  211 2015 19                    0 .
1162  235 2015 19  .016075843945145607 .
1449  266 2015 19  .016308801248669624 .
318  278 2015 19  .004750378429889679 .
105  289 2015 19   .01624026708304882 .
1561  293 2015 19  .007419430650770664 .
21  306 2015 19  .006908397190272808 .
113  312 2015 19  .017234930768609047 .
352  315 2015 19  .005803853273391724 .
1010  318 2015 19  .005676507484167814 .
279  319 2015 19  .005354285705834627 .
end


Hi All

I kindly request assistance with the above. I am looking to generate code such that, for each indyear, the partnerid with the highest industry_share_ep gets a rank of 1, the next highest 2, and so on in descending order, as demonstrated in spec.

NB: The rankings in spec were done manually to give an idea of what I am seeking to achieve with the code.
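A sketch of one way to build such a ranking (dense, maxdense and spec2 are hypothetical names): compute a dense rank of industry_share_ep within each indyear, ascending, then flip it so the largest share gets rank 1 and tied shares get the same rank.

Code:
* ascending dense rank of industry_share_ep within indyear
bysort indyear (industry_share_ep): gen int dense = ///
    sum(industry_share_ep != industry_share_ep[_n-1])
* flip it so the highest share is ranked 1
bysort indyear: egen int maxdense = max(dense)
gen int spec2 = maxdense - dense + 1
drop dense maxdense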

Event Study in Corporate Finance

Hi all,

I have two questions about how I should run an event study. First, I have a staggered implementation of a policy but do not know how to set up the event study. What I am doing is as follows:
1. I create an event-time variable equal to -3, -2, -1, 0, 1, 2, 3, ...
2. I run the following regression:
reg output ib0.event , r

where event is my event-time variable from the first step.

My question is whether this approach is correct. I would really appreciate it if someone could tell me how to run an event study with lags and leads.
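For what it is worth, a minimal sketch of the usual leads-and-lags setup (state, year and adoption_year are hypothetical variable names; the period just before adoption, -1, is the customary base, and event time is shifted because factor variables cannot be negative):

Code:
* event time relative to each state's adoption year
gen event = year - adoption_year
* bin the endpoints (optional) and shift so factor-variable notation accepts it
replace event = -3 if event < -3
replace event =  3 if event >  3
gen event_sh = event + 3            // 0..6; original -1 is now 2
* two-way fixed effects with leads and lags, base category at event time -1
areg output ib2.event_sh i.year, absorb(state) vce(cluster state)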

My second question is what happens if some states cancel the policy after a while, so that their treatment status changes. I would really appreciate it if you could tell me how I should deal with that. Thank you very much for your time.

Best,
Mori

Network Meta analysis


Dear All,

I am trying to run a network meta-analysis of weight-loss interventions (for example diet, exercise, or mixed) and would like to identify the best intervention. My data consist of the weight loss/gain (_ES) and its standard error (se_ES) for the intervention-versus-control comparison, plus the total number of participants, all in Stata. Your help would be most appreciated.

Thank you,

Sanjeva

List of institutes

Hi,
I have monthly data on various institutes. There is a column with the institute code and another with the institute name. Because the data are monthly, institutes are repeated. I want a list of institutes with their codes in the next column; could anyone help me out with this? I tried the list command, but it didn't give me what I need.
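A sketch of one way to get a de-duplicated listing (institute_code and institute_name are hypothetical placeholders for the actual variable names):

Code:
preserve
keep institute_code institute_name
duplicates drop
sort institute_code
list institute_code institute_name, clean noobs
restore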
Thank you.

matrix var nov with zeros for base dummy level

Dear friends from the Stata community:
I estimated a model with dummy variables, but when I look at the variance-covariance matrix it has columns/rows containing only zeros, corresponding to the base levels of the dummy variables. Hence the matrix is non-invertible.
For the sake of simplicity, consider a simple regression "reg y x i.gov", with gov a categorical variable taking the values 1, 2 and 3. Of course the regression output does not include the first level of that variable, BUT the variance-covariance matrix still has one column and one row for this base level (all zeros), hence it is not invertible. The same happens with a dummy variable taking the values 0 and 1. I attach the regression output and the matrix. Can anyone lend me a hand? Many thanks.
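If the goal is simply an invertible matrix, one possible workaround (a Mata sketch only, not a statement about why the zero rows appear) is to keep just the rows and columns of e(V) whose diagonal entries are nonzero, i.e. the coefficients that were actually estimated:

Code:
reg y x i.gov
mata:
    V    = st_matrix("e(V)")
    keep = selectindex(diagonal(V) :> 0)   // positions of non-omitted coefficients
    st_matrix("Vsub", V[keep, keep])       // invertible submatrix, saved back to Stata
end
matrix list Vsub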

[attachment: regression output and variance-covariance matrix]




4-year interval and retardation variable

Dear all:

I use a fixed-effects threshold model to study the effect of aid on FDI. The data are yearly, with a one-year lag on the control variables and the variable of interest to avoid possible endogeneity problems.
Recently, I have read some relevant studies in which the authors use 3- or 5-year intervals to estimate the effect, so I would like to try this with my sample.
The original sample covers 2003 to 2016; I drop 2003 and 2004 so that the remaining years can be divided into 4-year intervals: 2005-2008 (t1), 2009-2012 (t2) and 2013-2016 (t3).
My question is: with 4-year intervals, does the endogeneity problem still exist? And if it does, should the lagged variable now refer to the previous interval (t-1) rather than the previous year? That is, should I generate the lagged variable as gen Xr4=l4.X?
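A sketch of what the interval construction and a one-period lag might look like (country, year, fdi and aid are hypothetical variable names; country is assumed to be a numeric panel identifier, and values are averaged within each 4-year window):

Code:
* 4-year periods: 2005-2008 -> 1, 2009-2012 -> 2, 2013-2016 -> 3
drop if year < 2005
gen period = ceil((year - 2004) / 4)
collapse (mean) fdi aid, by(country period)   // add the controls here as well
xtset country period
gen aid_lag = L.aid     // one-period lag, i.e. the previous 4-year interval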
Thanks

Dongni

2SLS with an instrumented variable as an instrument

Hello,

I am studying the effect of Foreign Direct Investment (FDI) presence in an industry on the productivity of domestic firms, and want to estimate the following regression: productivity = b0 + b1*foreign_buyers_share + b2*industry_foreign_share + b*X, where both foreign buyers share (each firm's share of foreign buyers) and industry_foreign_share (share of foreign firms in the industry) are endogenous variables. Some prior literature uses industry_foreign_share as an instrument for foreign_buyers_share (because in industries with higher foreign participation firms are expected to have more foreign buyers as well). However, more recently it has been argued that industry_foreign_share affects the productivity of domestic firms directly as well, and not only through its effect on foreign_buyers_share. I have two other variables, z1 and z2, that I believe are good instruments for industry_foreign_share.

My first, more theoretical question is: can I use the predicted values of industry_foreign_share as an instrument for foreign_buyers_share in a 2SLS regression? Is there any benefit to using 3SLS instead? From the threads I have already read, I understand that 3SLS may be more efficient, but that if a Hausman test shows significant differences between 2SLS and 3SLS, one should go for 2SLS. Is that right?

My second question is how I should go about doing this in Stata. Should I use reg3, use ivregress or ivreg2 twice, or follow the instructions from this Stata FAQ about instruments for recursive systems, but then perform the procedure twice as well: https://www.stata.com/support/faqs/s...rsive-systems/?
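On the Stata side, one possibility (a sketch only; whether it is the right estimator is the substantive question above) is to treat both variables as endogenous in a single ivregress call, with z1 and z2 as the excluded instruments ("controls" is a hypothetical placeholder for the exogenous regressors in X):

Code:
ivregress 2sls productivity controls ///
    (foreign_buyers_share industry_foreign_share = z1 z2), first vce(robust)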

Any help or suggestions are appreciated.

Regards,
Dea

Marginal Effects of a Fractional Response Model

Hi all,

I have a Fractional Response Model where my dependent variable is bounded between 0 and 1 - with a lot of zeros.

Some of my independent variables lie between 0 and 1 as well, but not all of them.

Should I go for
Code:
margins, dydx(*)
or
Code:
margins, dyex(*)
?

Here is the output as well:

Code:
 fracreg logit y x1 x2 x3 x4 if datayearfiscal==2008

Iteration 0:   log pseudolikelihood = -197.08665  
Iteration 1:   log pseudolikelihood = -182.02928  
Iteration 2:   log pseudolikelihood = -181.99607  
Iteration 3:   log pseudolikelihood = -181.99607  

Fractional logistic regression                  Number of obs     =        279
                                                Wald chi2(4)      =      87.36
                                                Prob > chi2       =     0.0000
Log pseudolikelihood = -181.99607               Pseudo R2         =     0.0589

------------------------------------------------------------------------------
             |               Robust
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |   1.254019   .3392668     3.70   0.000     .5890682     1.91897
          x2 |  -2.369368   .9091443    -2.61   0.009    -4.151258   -.5874784
          x3 |   .1100148   .1341461     0.82   0.412    -.1529068    .3729364
          x4 |  -2.741405   .4381923    -6.26   0.000    -3.600246   -1.882564
       _cons |   .2069838   .2960355     0.70   0.484    -.3732351    .7872028
------------------------------------------------------------------------------
x1, x2, and x3 lie between 0 and 1 while x4 does not have bounds.


Code:
 margins, dydx(*)

Average marginal effects                        Number of obs     =        279
Model VCE    : Robust

Expression   : Conditional mean of y, predict()
dy/dx w.r.t. : x1 x2 x3 x4

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |   .2888261   .0765901     3.77   0.000     .1387123    .4389399
          x2 |  -.5457139      .2072    -2.63   0.008    -.9518185   -.1396092
          x3 |   .0253387   .0308573     0.82   0.412    -.0351405    .0858178
          x4 |  -.6314015   .0943868    -6.69   0.000    -.8163962   -.4464069
------------------------------------------------------------------------------
Code:
 margins, dyex(*)

Average marginal effects                        Number of obs     =        279
Model VCE    : Robust

Expression   : Conditional mean of y, predict()
dy/ex w.r.t. : x1 x2 x3 x4

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/ex   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |   .1726102   .0459446     3.76   0.000     .0825605    .2626599
          x2 |   -.061035   .0229026    -2.66   0.008    -.1059233   -.0161466
          x3 |   .0125208   .0152707     0.82   0.412    -.0174091    .0424508
          x4 |  -.1666824   .0238142    -7.00   0.000    -.2133574   -.1200074
------------------------------------------------------------------------------
Is the following interpretation correct for dy/ex:
a 1% increase in x1 increases the value of y by 0.173?

Does it make sense to talk about a 1% change in x1 when x1 is itself already a proportion? Would it be better to go for dydx instead?

Thanks a lot.

How to find the index of the observation that has the maximum value of a variable

Hello, I have two questions:
- Is there an easy way to retrieve the index (observation number) of the observation that has the maximum value of a variable?
- How can I retrieve the value of a variable at the i-th observation?
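A minimal sketch for both questions (myvar is a hypothetical variable name; explicit subscripts index observations):

Code:
gen long obsno = _n                  // current observation number
quietly summarize myvar
list obsno myvar if myvar == r(max)  // index (and value) of the maximum
display myvar[5]                     // value of myvar at the 5th observation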

Thank you,

Finding the values behind encode

Hi All,

I am working with two rounds of survey data that interview individuals across different states (variable v024) in India. I want to append the datasets, but there are a few issues with the encoded state names that I need to sort out first.

For example in data from 2015-16

Code:
tab v024

                      state |      Freq.     Percent        Cum.
----------------------------+-----------------------------------
andaman and nicobar islands |      2,811        0.40        0.40
             andhra pradesh |     10,428        1.49        1.89
          arunachal pradesh |     14,294        2.04        3.94
                      assam |     28,447        4.07        8.00
                      bihar |     45,812        6.55       14.55
                 chandigarh |        746        0.11       14.65
               chhattisgarh |     25,172        3.60       18.25
   --------------------------------------------------------------
Here for example the state andhra pradesh is encoded with value 2.

In the data from 2005-06, however, label names and values change:

Code:
 tab v024

                 state |      Freq.     Percent        Cum.
-----------------------+-----------------------------------
[jm] jammu and kashmir |      3,281        2.64        2.64
 [hp] himachal pradesh |      3,193        2.57        5.20
           [pj] punjab |      3,681        2.96        8.16
      [uc] uttaranchal |      2,953        2.37       10.54
          [hr] haryana |      2,790        2.24       12.78
            [dl] delhi |      3,349        2.69       15.47
        [rj] rajasthan |      3,892        3.13       18.60
    [up] uttar pradesh |     12,183        9.79       28.40
            [bh] bihar |      3,818        3.07       31.47
           [sk] sikkim |      2,127        1.71       33.18
[ar] arunachal pradesh |      1,647        1.32       34.50
         [na] nagaland |      3,896        3.13       37.63
          [mn] manipur |      4,512        3.63       41.26
          [mz] mizoram |      1,791        1.44       42.70
          [tr] tripura |      1,906        1.53       44.23
        [mg] meghalaya |      2,124        1.71       45.94
            [as] assam |      3,840        3.09       49.03
      [wb] west bengal |      6,794        5.46       54.49
        [jh] jharkhand |      2,983        2.40       56.89
           [or] orissa |      4,540        3.65       60.54
     [ch] chhattisgarh |      3,810        3.06       63.60
   [mp] madhya pradesh |      6,427        5.17       68.77
          [gj] gujarat |      3,729        3.00       71.77
      [mh] maharashtra |      9,034        7.26       79.03
   [ap] andhra pradesh |      7,128        5.73       84.76
        [ka] karnataka |      6,008        4.83       89.59
              [go] goa |      3,464        2.78       92.37
           [ke] kerala |      3,566        2.87       95.24
       [tn] tamil nadu |      5,919        4.76      100.00
-----------------------+-----------------------------------
                 Total |    124,385      100.00
And the same state, andhra pradesh, now has the label [ap] andhra pradesh with the value 28.

I thought that to fix this I could generate a new variable called state in the 2005-06 data, replace its values and define labels to match the 2015-16 coding, create the same state variable in the 2015-16 data, and then append the two datasets.

Code:
gen state =.
replace state = 2 if v024 == 28
replace state = 3 if v024 == 12
replace state = 4 if v024 == 18

* a value label needs a name; define it and attach it to the new variable
label define statelbl 2 "andhra pradesh"  3 "arunachal pradesh" 4 "assam"
label values state statelbl
Otherwise, appending without these changes results in the wrong states being matched, based on the encoded values.

My question now is, given the rather large number of observations, how do I find the numeric value behind each label without having to scroll through the data browser, i.e. 1 - andaman and nicobar islands, 2 - andhra pradesh, 3 - arunachal pradesh, etc.? Also, does the aforementioned method seem like the most efficient way to accomplish a correct append?
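A sketch of a few ways to see the code-to-label mapping without the data browser:

Code:
describe v024            // reports the name of the value label attached to v024
label list               // lists every defined value label with its codes
numlabel, add            // prefixes each label with its numeric code
tab v024, nolabel        // the same tabulation, showing the underlying codes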

Thanks a lot!

Best,
Lori

different 'Beta' values in the same Effect coding in regression

I entered a categorical variable (country) into a regression and coded its values with effect coding (0, 1, -1). In other words, I put the coded variables into the regression so that the coefficients correspond to the post-regression contrasts against the grand mean. I ran the analysis twice; each time a different country was assigned the value -1, so that I could also obtain results for the comparison country.

I ended up with two regressions that have the same coefficients, standard errors, and significance levels. Only the beta values came out different between the two runs. I don't understand how that can be! To my understanding, beta is simply the standardized coefficient.

I attach both outputs, with one country marked at random; you can see that in both outputs all the values for all countries are completely identical except the 'beta' values. [attachments: the two regression outputs]

Questions:
A. How can such a situation arise? How is beta calculated in Stata?
B. How can I nonetheless get the correct 'beta' values for an effect-coded categorical variable compared to the grand mean? Is there any way to do this in Stata?

Thanks!

Reshaping multiple variables at once

Hi All,

I have data that resembles the following:

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float id str2 varlist float(yr1995 yr1996 yr1997)
1 "x1" 2 23 1
1 "x2" 3  4 3
1 "x3" 2  2 2
2 "x1" 3  2 2
2 "x2" 2  2 2
2 "x2" 1  2 3
end

In the above, I have information by individual id, for different years, for the variables x1, x2 and x3. I wish to reshape these data to long format, so that each variable (x1, x2 and x3) occupies its own column and each entry in that column is the value of the variable for a particular id and year.

Normally, the reshape command would do the trick, but the fact that multiple variables are stacked in varlist complicates this. Is there a straightforward way to implement it?
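One possible route (a sketch, assuming id and varlist uniquely identify rows): reshape long over the years first, then reshape wide over varlist so x1-x3 become columns.

Code:
reshape long yr, i(id varlist) j(year)
rename yr value
reshape wide value, i(id year) j(varlist) string
rename (valuex1 valuex2 valuex3) (x1 x2 x3)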


Many thanks,
CS


Multiple imputation when there is a certain group that did not answer a certain question

Hello Statalist,

I have a general question about the multiple imputation (MI) method in Stata.
Several variables in my analysis have missing data, and I want to use MI. The problem is that a certain group of participants did not answer a particular question by design, so the variable is missing for that entire group. Specifically, the question was asked only of respondents who had a child/children at the time of the survey, so those without a child did not answer it at all.

I am not sure how to handle this variable in MI. It is an important variable in the model, so I cannot exclude it.
One option might be to assign a specific value to those participants (e.g., -9), treat it as non-missing, and then run MI, but I am not sure whether this would introduce bias.

I would appreciate any advice on this issue.
Thank you.

Collapse by what?

Hi everyone,

I have the following database, which I will describe quickly. Basically, each Year is rescaled according to a certain criterion that is not of interest in this context. As you can see, a single year can be associated with more than one rescaled value (e.g. see the rows 0 -1 -9 . . . . . 2004 and 3 2 -6 . . . . 3.367997 2007, among others):

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float(rescaled1 rescaled2 rescaled3 rescaled4 rescaled5 rescaled6 rescaled7 av_t Year)
-11  .  . . . . .         . 2004
  0  .  . . . . .         . 2015
 -5  .  . . . . .         . 2004
 -4  .  . . . . .         . 2005
 -3  .  . . . . .         . 2006
 -2  .  . . . . .  2.610031 2007
 -1  .  . . . . .         . 2008
  0  .  . . . . .         . 2009
  1  .  . . . . .         . 2010
  2  .  . . . . .         . 2011
  0 -1 -9 . . . .         . 2004
  1  0 -8 . . . .         . 2005
  2  1 -7 . . . .         . 2006
  3  2 -6 . . . .  3.367997 2007
  4  3 -5 . . . . 2.9179316 2008
  5  4 -4 . . . .  3.071436 2009
  6  5 -3 . . . .  3.307787 2010
  7  6 -2 . . . .  2.887359 2011
  8  7 -1 . . . . 2.2057533 2012
  9  8  0 . . . .  3.188269 2013
 10  9  1 . . . .  3.572203 2014
 11 10  2 . . . .         . 2015
 -4  .  . . . . .         . 2005
 -3  .  . . . . .         . 2006
 -2  .  . . . . .         . 2007
 -1  .  . . . . . -.2077017 2008
  0  .  . . . . .  .9114361 2009
  1  .  . . . . . .19138622 2010
  2  .  . . . . .         . 2011
  3  .  . . . . . -.6401815 2012
  4  .  . . . . .  2.249567 2013
  5  .  . . . . . 2.2605314 2014
  6  .  . . . . .  .7698421 2015
 -5  .  . . . . .         . 2004
 -4  .  . . . . .         . 2005
 -3  .  . . . . .         . 2006
 -2  .  . . . . .         . 2007
 -1  .  . . . . .         . 2008
  0  .  . . . . .         . 2009
  1  .  . . . . .         . 2010
  2  .  . . . . .         . 2011
  3  .  . . . . .         . 2012
  4  .  . . . . .         . 2013
  5  .  . . . . .         . 2014
  6  .  . . . . . -.1879759 2015
 -5  .  . . . . .         . 2004
 -4  .  . . . . .         . 2005
 -3  .  . . . . .         . 2006
 -2  .  . . . . .         . 2007
 -1  .  . . . . .         . 2008
  0  .  . . . . .         . 2009
  1  .  . . . . .         . 2010
 -7  .  . . . . .         . 2004
 -6  .  . . . . .         . 2005
 -5  .  . . . . .         . 2006
 -4  .  . . . . .         . 2007
 -3  .  . . . . .         . 2008
 -2  .  . . . . .  .9470272 2009
 -1  .  . . . . .         . 2010
  0  .  . . . . .         . 2011
  1  .  . . . . .         . 2012
  2  .  . . . . .         . 2013
  3  .  . . . . .         . 2014
  4  .  . . . . . -3.317956 2015
 -5 -6  . . . . .         . 2004
 -4 -5  . . . . .         . 2005
 -3 -4  . . . . .         . 2006
 -2 -3  . . . . .         . 2007
 -1 -2  . . . . .         . 2008
  0 -1  . . . . .         . 2009
  1  0  . . . . .         . 2010
  2  1  . . . . .         . 2011
  3  2  . . . . .         . 2012
  4  3  . . . . .         . 2013
  5  4  . . . . .         . 2014
  6  5  . . . . .         . 2015
 -6 -7  . . . . .         . 2004
 -5 -6  . . . . .         . 2005
 -4 -5  . . . . .         . 2006
 -3 -4  . . . . . 2.4716835 2007
 -2 -3  . . . . .  3.514191 2008
 -1 -2  . . . . .  3.426614 2009
  0 -1  . . . . .  3.675565 2010
  1  0  . . . . .  3.166685 2011
  2  1  . . . . .         . 2012
  3  2  . . . . .         . 2013
  4  3  . . . . .         . 2014
  5  4  . . . . .         . 2015
-11  .  . . . . .         . 2004
-10  .  . . . . .         . 2005
 -9  .  . . . . .         . 2006
 -8  .  . . . . .  .6483145 2007
 -7  .  . . . . . -.6747503 2008
 -6  .  . . . . .  .6400528 2009
 -5  .  . . . . .         . 2010
 -4  .  . . . . . -.4608803 2011
 -3  .  . . . . .  -.381546 2012
 -2  .  . . . . .         . 2013
 -1  .  . . . . .         . 2014
  0  .  . . . . .         . 2015
end
I am trying to collapse (mean) the variable av_t; specifically, I would like a mean by rescaled year, ending up with one value per rescaled year (a mean for rescaled year -11, another for -10, ..., and finally a mean for rescaled year 11). The problem is essentially the "by" option. Because the years have been rescaled, a single calendar year can correspond to several rescaled years (e.g. 2011 is rescaled both as 4 and as -1 for idpr 15 in #2). So when computing the means, I would like each av_t value to enter the mean of every rescaled year it corresponds to. For instance, for idpr 15 in #2, if av_t in 2011 = 3.4, then since 2011 is rescaled both as 4 and as -1, 3.4 should be in the numerator (and counted in the denominator) of both the mean for -1 and the mean for rescaled year 4.
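A sketch of one way to get exactly that double counting (obsid is a hypothetical row identifier): reshape the rescaled1-rescaled7 columns long so each observation is duplicated once per non-missing rescaled value, then collapse av_t by the rescaled value.

Code:
gen long obsid = _n
reshape long rescaled, i(obsid) j(k)
drop if missing(rescaled)
collapse (mean) av_t, by(rescaled)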

I am trying to end up with a graphic that looks like the one I attach below.

Can someone please help?

Hausman Test producing different results

I am using panel data and trying to decide between the fixed- and random-effects models.

When I type "hausman fixed random", the result is that the null hypothesis is not rejected. However, I have been advised to instead write the Hausman test with the random-effects results first, "hausman random fixed", and when I do this it produces an error about a negative test statistic, even when I add the option sigmamore.

Why does it make such a difference which set of results comes first in the hausman command? Also, what are my options for dealing with this negative-statistic problem?

Best,
João

Compound double quotes for single quotes

Dear Statalisters,

I'm wondering whether there is something like compound double quotes, but for the single quotes used in local macros. Here is what I am doing, in a pretty crude example:

Code:
loc text=""
forvalues i=1(1)2 {
      loc t1="var_`i'"
      loc text="`text' `t1'[`r']"
}
I would like to get this:
loc text= var_1[`r'] var_2[`r']

So that I can use it in another loop over r.

But no matter what I try, it doesn't seem possible to write `r' into a new local without Stata immediately replacing it with the (not yet existing) local r.
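For what it is worth, a backslash in front of the left quote delays macro expansion until the outer local is itself expanded, which seems to be what is needed here. A minimal sketch:

Code:
local text ""
forvalues i = 1/2 {
    local t1 "var_`i'"
    * \` keeps `r' unevaluated here; it is expanded only when `text' is used later
    local text "`text' `t1'[\`r']"
}
* later, inside the loop over r, `text' expands with the current value of r
forvalues r = 1/3 {
    display `"`text'"'
}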

Thanks a lot for your help,
Anna

Difference in Difference with multiple time periods

Dear All,

I am working with a pooled cross-sectional dataset of 8,586 individual-level observations covering 15 countries between 2004 and 2016 (Stata v14). I am trying to estimate the effect of a policy change on the probability that individuals exhibit a specific labour market outcome.

I have a treated group of individuals, and the policy is implemented in 13 countries. The data capture information for the pre-treatment period (2004-2007), the treatment period (2007-2014) and the post-treatment period (2014-2016). The treatment period starts in 2007 for all 13 countries but ends at different times (i.e. 2009, 2011, 2012 and, at the latest, 2014). I find this latter aspect particularly difficult to reflect in the model.

I also have individual-level, country-level and year-level variables. I would like to capture the effect of the policy on individuals within the same country, but also the variation between countries. What I am trying to do is use a probit diff-in-diff estimator in a multilevel setting that nests individuals in country-years, country-years in countries, and countries in years.

Given all these parameters, does the model below reflect what I am trying to obtain? Is such a model feasible? Am I overlooking something?

Code:
 meprobit outcome i.year i.country pre##treated during##treated post##treated covariates || _all: R.year || country: || country#year:

Thank you very much for your help, it is most appreciated!
Magda