Channel: Statalist
Viewing all 73242 articles
Browse latest View live

Running multiple event studies using the loop command

Dear Statalist users,

I am new to posting in Statalist and was hoping I could find some help on loop and matrix commands.

I am conducting an event study of the impact of central bank announcements on foreign stock market returns. I have reviewed the suggestions made previously on this site and have created a program which effectively produces the abnormal returns (ARs) and cumulative abnormal returns (CARs) for a single announcement. The event window is 40 days around the event day, with an estimation window of roughly half a year (180 days) prior. Here is the code for a single event, which I created following the guides from Princeton (https://dss.princeton.edu/online_hel...ventstudy.html) and Dr. Woochan Kim (http://www.sunwoohwang.com/Event_Study_STATA.pdf):

Code:
gen ret = 100*ln(RealPrice[_n]/RealPrice[_n-1])

gen day_cnt=_n
tsset day_cnt
tssmooth ma mkret = ret, window(100,1)
gen target_day=day_cnt if Change<0
egen target_id = group(target_day)
egen max_target_day=max(target_day)
gen evday=day_cnt-max_target_day

sort evday
gen evt_window=1 if evday>=-20 & evday<=20
gen est_window=1 if evday<=-21 & evday>=-200
drop if evt_window==.&est_window==.

qui reg ret mkret if est_window==1
gen rmse=e(rmse)

predict phat
gen ar=ret-phat if evt_window==1
drop phat
drop if evt_window==0
keep if evday>=-20 & evday<=+20
egen car=sum(ar)
gen tstat=car/(rmse*sqrt(_N))

gen car_ca = sum(ar)
list ar car_ca car tstat
graph twoway line car_ca evday

However, I am having a difficult time with two issues in extending this program to further events:

1. There is a total of 52 events across the time sampled, 1990m1 - 2009m12. I would like to build a loop which can store the ARs and CARs from each announcement. In this manner I can...
2. Calculate the average ARs and CARs for the whole period.

I am using Stata 15.1.

Here is a snapshot of my data. In this example I am looking only at negative surprises (i.e. when actual interest rate changes<expected interest rate changes), so that is why the target_id does not count positive surprises.

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input long Date double RealPrice str6 NewtargetFFrate str5 Change byte Surprise str3 Expected float(ret day_cnt target_day target_id mkret)
12161988 2464.0391 "0"     "0"       0 "0"    -2.8297124  980    . .    .05851697
12191988 2371.7617 "0"     "0"       0 "0"      -3.81689  981    . .   .014021377
12201988 2395.5908 "0"     "0"       0 "0"      .9996869  982    . .   .008627481
12211988 2460.5762 "0"     "0"       0 "0"      2.676567  983    . .    .03253506
12221988 2496.4266 "0"     "0"       0 "0"       1.44648  984    . .    .06023775
12231988 2494.6437 "0"     "0"       0 "0"   -.071443595  985    . .   .072360724
12261988 2477.7128 "0"     "0"       0 "0"     -.6810037  986    . .     .0770777
12271988 2466.8838 "0"     "0"       0 "0"     -.4380142  987    . .    .06843801
12281988 2440.3787 "0"     "0"       0 "0"    -1.0802503  988    . .    .07005557
12291988 2444.1652 "0"     "0"       0 "0"      .1550401  989    . .    .06855935
  121989 2362.7816 "0"     "0"       0 "0"    -3.3864064  990    . .    .03809728
  131989 2327.7026 "0"     "0"       0 "0"    -1.4957796  991    . .   .022140115
  141989 2317.2013 "0"     "0"       0 "0"     -.4521651  992    . .   .024277734
  151989 2342.0382 "9"     "31.25"  -7 "38"    1.0661454  993  993 3   .036884636
  161989 2346.0133 "0"     "0"       0 "0"     .16958435  994    . .    .05970508
  191989 2331.7127 "0"     "0"       0 "0"     -.6114358  995    . .    .06971015
 1101989 2295.6297 "0"     "0"       0 "0"    -1.5595877  996    . .    .05868369
 1111989 2295.3626 "0"     "0"       0 "0"    -.01163583  997    . .    .04518678
 1121989 2335.5932 "0"     "0"       0 "0"      1.737508  998    . .   .034511913
 1131989  2360.314 "0"     "0"       0 "0"     1.0528755  999    . .    .03878065
 1161989 2389.6377 "8.5"   "-25"     4 "-29"   1.2347103 1000    . .    .05540232
 1171989 2444.1239 "0"     "0"       0 "0"      2.254497 1001    . .    .08175389
 1181989 2417.8188 "0"     "0"       0 "0"    -1.0820924 1002    . .    .07725401
 1191989  2378.164 "0"     "0"       0 "0"     -1.653705 1003    . .    .06821929
 1201989 2395.4742 "8.25"  "-25"   -17 "-8"     .7252446 1004 1004 4    .07172761
 1231989 2370.7793 "0"     "0"       0 "0"    -1.0362487 1005    . .    .05284029
 1241989 2360.8008 "0"     "0"       0 "0"     -.4217836 1006    . .    .03849471
 1251989 2361.0746 "0"     "0"       0 "0"    .011597087 1007    . .   .013611864
 1261989 2389.8247 "0"     "0"       0 "0"     1.2103162 1008    . .   .003226084
 1271989 2399.5271 "0"     "0"       0 "0"       .405166 1009    . .   .007926226
 1301989 2378.5798 "0"     "0"       0 "0"     -.8768089 1010    . .  -.003822959
 1311989 2368.4818 "0"     "0"       0 "0"     -.4254428 1011    . .  -.005608775
  211989 2355.9894 "0"     "0"       0 "0"    -.52883923 1012    . .    .02691932
  221989 2365.0959 "0"     "0"       0 "0"      .3857804 1013    . .    .03531341
  231989 2395.7901 "0"     "0"       0 "0"       1.28945 1014    . .    .03545245
  261989 2400.4548 "0"     "0"       0 "0"     .19451474 1015    . .   .016686652
  271989 2399.9275 "0"     "0"       0 "0"   -.021969084 1016    . .     .0103428
  281989 2401.2843 "0"     "0"       0 "0"     .05651907 1017    . .   .018845793
  291989 2390.9525 "9.125" "12.5"    1 "11"    -.4311897 1018    . .   .016037846
 2101989 2382.9063 "0"     "0"       0 "0"     -.3370945 1019    . . -.0025060754
 2131989  2361.808 "0"     "0"       0 "0"      -.889345 1020    . .  -.011229324
end

Below is my code:

Code:
*Creating my AR matrix and running my loop. This is where I run into my problems*
mat mat_ar = J(41,52,.)

foreach x of varlist target_id{
    egen max_target_day=max(target_day) if id==`x'
    gen evday=day_cnt-max_target_day

    sort evday
    gen evt_window=1 if evday>=-20 & evday<=20
    gen est_window=1 if evday<=-21 & evday>=-200
    drop if evt_window==.&est_window==.
    
    qui reg ret mkret if est_window==1
    gen rmse=e(rmse)

    predict phat
    mat mat_ar[`a', `i'+1] = ret-phat if evt_window==1
    drop phat
    drop if evt_window==0
    keep if evday>=-20 & evday<=+20
    
    egen car=sum(ar)
    gen tstat=car/(rmse*sqrt(_N))
    gen car_ca = sum(ar)
}

*Calculating the average ARs over the sample and the overall CAR and significance*
egen AR_mean = rowmean(mat_ar)
egen CAR_mean = sum(AR_mean)
gen tstat = CAR_mean/(rmse*sqrt(41))
gen CAR_sum = sum(AR_mean)
list AR_mean CAR_sum
graph twoway line CAR_sum evday

*Visualizing the announcement impact*
list ar car_ca
graph twoway line car_ca evday

My event window is <-20,1,+20> so I create a matrix with 41 rows and 52 columns comprising the 52 events. Then I attempt to run the event study regression 52 times, storing the 41 AR values in a column of the matrix. In the end I hope to average across the rows to calculate the average AR for each evday. This way I can calculate the eventual CAR for the period.

However, I am new to using loops in Stata and was hoping someone could advise me on how to continue. What I am looking for is rather simple, but I am unfamiliar with the commands. Is there a simpler approach to calculating the mean ARs and the overall CAR for the sample?
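For reference, here is a minimal sketch (untested on your data) of one way the loop could be structured: instead of looping over a varlist, loop over the event ids 1/52 inside preserve/restore, rebuild the windows for each event, and write that event's 41 ARs into one column of the matrix. Variable names follow the single-event code above.

Code:
* Hedged sketch: one column of ARs per event id
mat mat_ar = J(41, 52, .)

forvalues e = 1/52 {
    preserve
    quietly summarize target_day if target_id == `e'
    gen evday = day_cnt - r(mean)                 // days relative to event `e'

    quietly reg ret mkret if evday >= -200 & evday <= -21
    predict phat
    gen ar = ret - phat

    keep if evday >= -20 & evday <= 20            // the 41-day event window
    sort evday
    forvalues r = 1/41 {
        mat mat_ar[`r', `e'] = ar[`r']
    }
    restore
}

* Average AR for each event day across the 52 events, then cumulate
mat avg_ar = mat_ar * J(52, 1, 1/52)
svmat avg_ar
gen CAR_mean = sum(avg_ar1) in 1/41

The key change is `forvalues e = 1/52` rather than `foreach x of varlist`: the latter loops over variable names, not values.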

I am open to your advice, thank you.


question about xtabond2

Dear all,

I am just learning about xtabond2 and gmm methods, so please correct me if I am wrong. I have an IV x that is endogenous and a set of monthly variables to control for seasonality.

My T=52 (weekly data) and N=523

Is the syntax below correct? (The monthly variables are lumped together below but will be specified.)
  • xtabond2 y L.y x feb-dec, gmmstyle(L.y L.x) ivstyle(feb-dec) twostep robust
Since my T is relatively large, should I specify, say, gmmstyle(L.y L.x, lag(2 8))? The other question is: how do I know whether I want to include L2.y and L2.x in addition to L.y and L.x?

Thank you!

Hannah

Piecewise regression with interaction with categorical var - some groups are non-linear after spline

I have used mkspline to make the variables for a piecewise regression
Code:
mkspline preH 3 postH = time
generate jump = 1
replace  jump = 0 if time < 3
Then I fit a piecewise regression with an interaction with a categorical variable with 3 levels:

Code:
mixed hrelsat c.preH#ibn.H_Group i.jump#ibn.H_Group c.postH#ibn.H_Group || CSID: preH jump postH
This works just fine (see attached), but when I plot the raw means by group (see attached) I see that two of the groups appear to have a quadratic trend after the spline while only one is truly linear. Is there a way to incorporate these quadratic trends after the spline for these two groups to at least test if the quadratic is actually significant?
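One possibility (a hedged sketch, not tested on these data) is to add a group-specific quadratic term in postH, which can then be tested jointly:

Code:
* Sketch: allow a group-specific quadratic after the knot
mixed hrelsat c.preH#ibn.H_Group i.jump#ibn.H_Group            ///
      c.postH#ibn.H_Group c.postH#c.postH#ibn.H_Group          ///
      || CSID: preH jump postH
testparm c.postH#c.postH#ibn.H_Group   // joint test of the quadratic terms

If only two groups look curved, the testparm result (or group-specific Wald tests) would indicate whether the quadratic is worth keeping.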



Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input double CSID byte time double(hrelsat wrelsat H_Group W_Group H5_Exposure W5_Exposure) byte(preH postH) float jump
10056 0                50                46 3 3 1  2 1 0 0
10056 1                50                46 3 3 1  2 2 0 0
10056 2                46                50 3 3 1  2 3 0 1
10056 3                50                43 3 3 1  2 3 2 1
10056 4                48                27 3 3 1  2 3 3 1
10056 5                50                41 3 3 1  2 3 4 1
10060 0                47                46 2 2 4  . 1 0 0
10060 1                35                15 2 2 4  . 2 0 0
10060 2                37                46 2 2 4  . 3 0 1
10060 3                 .                 . 2 2 4  . 3 2 1
10060 4                 .                 . 2 2 4  . 3 3 1
10060 5                 .                 . 2 2 4  . 3 4 1
10073 0                48                44 3 3 5  5 1 0 0
10073 1                50                52 3 3 5  5 2 0 0
10073 2                49                52 3 3 5  5 3 0 1
10073 3                47                52 3 3 5  5 3 2 1
10073 4                51                52 3 3 5  5 3 3 1
10073 5                48                52 3 3 5  5 3 4 1
10080 0                51                52 3 3 7  5 1 0 0
10080 1                51                49 3 3 7  5 2 0 0
10080 2                48                50 3 3 7  5 3 0 1
10080 3                46                46 3 3 7  5 3 2 1
10080 4                45                48 3 3 7  5 3 3 1
10080 5                46                50 3 3 7  5 3 4 1
10081 0                17                20 1 1 .  . 1 0 0
10081 1                 .                 . 1 1 .  . 2 0 0
10081 2                 .                 . 1 1 .  . 3 0 1
10081 3                50                41 1 1 .  . 3 2 1
10081 4                 .                 . 1 1 .  . 3 3 1
10081 5                 .                 . 1 1 .  . 3 4 1
10089 0                45                30 3 2 3  6 1 0 0
10089 1                46                32 3 2 3  6 2 0 0

wordcb, a new command for creating codebooks in Microsoft Word format, is available on SSC

wordcb creates a Microsoft Word format codebook of the dataset in memory. Stata 15.1 is required.

My research group and I work with a lot of different datasets from a multitude of sources; we need to document certain aspects of the data we have regularly. We got tired of that being time consuming, so I wrote this. The command is useful for data documentation and archival, or for initial data exploration.

By default, the output Microsoft Word file includes data file metadata, and for each variable specified provides variable information (label, value label, type, notes, etc) and five random examples of values. Users can control how many values are shown, and can optionally specify to show a frequency distribution sorted ascending by value or descending by frequency (similar to the sort option of tabulate oneway).

The number of values shown cannot be specified for each variable; instead users should invoke the command multiple times with the nodta option, which suppresses file metadata, and the append option.

There is another limit... Stata 15's putdocx command, on which this relies, can run out of memory when either a large number of variables (i.e., hundreds) or a large number of values are specified.

I was all set to present this at the Stata Conference, but an existential threat to my employer changed my ability to travel to Chicago.

Thanks as ever to Kit Baum for getting this up to SSC so quickly!

Editing graph

Hi,

I need to edit a graph whose x axis starts at 0 so that it starts at 10. I know how to change the x-axis range to start from 10, but the dots corresponding to x values smaller than 10 are still in my graph.

For example, I want to convert graph 1 below to graph 2, but the problem is that I only have the graph file and do not have the data that created the graph, so I cannot add a condition to the code to create the graph only if x>=10. Is it possible to do it on the graph without having the data?

Code:
sysuse auto, clear
replace length=length-142

***graph 1
twoway (scatter weight length), xlabel(10(10)91)

***graph 2
twoway (scatter weight length if length>=10), xlabel(10(10)91)
Thanks

Gender wage gap, hourly wage or weekly wage

Hello everyone,

I am doing a project on gender wage gap using Oaxaca Decomposition. I am wondering if there is any difference between

1) using the hourly wage as the dependent variable

vs

2) using the weekly wage as the dependent variable, with the number of hours worked per week (excluding overtime) and the number of overtime hours per week as independent variables?

Thank you, any help would be greatly appreciated!

Validity of exclusion restriction in an Heckman IV model

Dear Statalist Users,

I was hoping to get some input on how to evaluate the validity of the exclusion restriction in the Heckman selection model.

My sample suffers from self-selection and simultaneity. To correct for the latter I have instruments, and to correct for the former I use the Heckman model.

My selection equation is the probability of firms entering international markets, and in the outcome equation I regress the intensity of foreign market participation on a set of variables. My exclusion restriction is Age. For the outcome equation I make use of xtivreg2, whereas the selection equation is a probit estimation. I have panel data.

How do I establish the validity of my exclusion restriction?

Time series data set up for time series analysis

Dear altruists,
I'm using Stata 13. My data set is in the following format:
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input byte SpecID long LabID str42 Orgid str10 Patid str1 Sex int(Date Age) byte(Agem Aged) str1(Tetracycline Ampicillin Chloramphenicol Gentamicin Cotrimoxazole NalidixicAcid)
6 1609648 "Staphylococcus haemolyticus" "JAN1800245" "M" 1 50 0  0 "S" ""  "" "R" "R" ""
4 1609605 "Proteus species"             "JAN1800054" "F" 1 40 0  0 ""  "R" "" "R" "R" "R"
1 1609631 "Burkholderia cepacia"        "JAN1800170" "M" 1 66 0  0 ""  ""  "" "R" "S" ""
7 1609351 "Escherichia coli"            "JAN1800170" "M" 1 66 0  0 ""  "R" "" "R" "R" "R"
4 1609670 "Enterobacter cloacae"        "JAN1800282" "F" 1 66 0  0 ""  ""  "" "S" "S" "S"
4 1609625 "Enterococcus faecalis"       "JAN1800135" "F" 1 66 0  0 "R" ""  "" ""  ""  ""
6 1609692 "Acinetobacter species"       "JAN1800306" "F" 1 80 0  0 ""  "R" "" "R" "R" ""
4 1609632 "Providencia rettgeri"        "JAN1800170" "M" 1 66 0  0 ""  "R" "" "R" "R" "R"
3 1609652 "Campylobacter species"       "JAN1800232" "M" 1  9 6 23 "R" "S" "" ""  "R" ""
6 1609626 "Escherichia coli"            "JAN1800159" "M" 1 38 0  0 ""  "R" "" "R" "R" "R"
end
format %td Date
label values SpecID SpecIDl
label def SpecIDl 1 "Blood", modify
label def SpecIDl 3 "Stool", modify
label def SpecIDl 4 "Urine", modify
label def SpecIDl 6 "PUS", modify
label def SpecIDl 7 "Tracheal aspirate", modify
The problem is the Date variable, which was entered as a numerical value (e.g. 1 = 1st January, 2 = 2nd January, 3 = 3rd January, ..., 365 = 31st December) rather than in a Stata date format. So I recoded it in the following way to get the month:
Code:
gen Month= Date
recode Month (min/31=1) (32/59=2) (60/90=3) (91/120=4) (121/151=5) (152/181=6) ///
             (182/212=7) (213/243=8) (244/273=9) (274/304=10) (305/334=11) (335/365=12)
label define Monthl 1 "Jan" 2 "Feb" 3 "Mar" 4 "Apr" 5 "May" 6 "Jun" 7 "Jul" 8 "Aug" 9 "Sep" ///
                    10 "Oct" 11 "Nov" 12 "Dec"
label values Month Monthl
I have searched related posts but didn't find anything like this. I tried to use Month, but it produced an error ("one data point is multiple time"). So my questions are:
1. Is it possible to convert this Date variable to a Stata-formatted date variable, and how?
2. If not, how can I turn the Month variable into a Stata-formatted time-series variable?
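For question 1, a day-of-year count can be turned into a Stata daily date with mdy() plus an offset; a hedged sketch, noting that the calendar year is not shown in the excerpt (2018 below is only a placeholder, and a leap year would need Date to run to 366):

Code:
* Sketch: 1..365 day-of-year -> Stata daily date, year assumed known
gen edate = mdy(1, 1, 2018) + Date - 1
format edate %td

* Monthly date for time-series work; tsset requires one observation per
* period, so the data would first have to be collapsed to monthly counts.
gen mdate = mofd(edate)
format mdate %tm

The "repeated time values" style of error suggests multiple records per period, which is why collapsing (e.g. to monthly resistance counts) before tsset is usually the next step.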

Thanks,
Rayhan

Using global variables within forvalue loops

I am trying to design a for loop that can be easily adapted to different input values but cannot work out a way to use global macros within my loop. I have developed a very simple example to isolate the problem. Essentially, this is the basic code:


Code:
clear all

sysuse auto.dta

gen tax = 0

    forvalues y = 0(1)10 {
    replace tax = `y'
    // other code...
    }
And now I want to make it more flexible by changing the lower bound, step size and upper bound of the forvalues loop:

Code:
sysuse auto.dta

gen tax = 0

global low = 0
global sensitivity = 1
global high = 10

    forvalues y = ${low}(${sensitivity})${high} {
    replace tax = `y'
    }
However, when I run this I get an error saying:
program error: code follows on the same line as open brace
Why am I getting this error and how do I fix it?

Using same variable for weight and control

I would like to use the same variable for weight and control like:

Code:
sysuse auto
ivreghdfe price weight (length=trunk) [aw=weight], absorb(rep78)
But it generates an error that says:

Error: cannot use weight variables as dependent variable, regressor or IV in combination with -partial- option.
Is there any workaround?
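One possible workaround (a sketch; I have not checked it against ivreghdfe's internals) is to clone the variable so that the weight and the regressor are no longer literally the same variable:

Code:
* Sketch: use a clone as the regressor, the original as the weight
sysuse auto, clear
clonevar weight_x = weight
ivreghdfe price weight_x (length = trunk) [aw=weight], absorb(rep78)

clonevar copies values, format, and labels, so the estimates should be identical to using weight itself if the command's check is purely name-based.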

Interaction term between social origin and country

I'm looking at social origin effect on occupational attainment in three different countries and want to run a regression with an interaction between the social origin dummies and country. However, in the dataset (ESS) the country variable is a string variable (no numbers assigned). Is it possible to create an interaction when one of the variables is a string variable?
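Factor-variable notation requires a numeric variable, so the usual approach is to encode the string first; a sketch, where cntry, origin and attainment stand in for your actual variable names:

Code:
* Sketch: string country variable -> labelled numeric, then interact
encode cntry, gen(country_n)
regress attainment i.origin##i.country_n

encode creates a value label from the country names automatically, so the output remains readable.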

How to obtain weighted confidence intervals of the median

Dear Statalist community,

I would like to calculate weighted confidence intervals of the median, which I need to use for a descriptive graph (median wealth and associated confidence intervals graphed by marital status group and gender).

Currently I am using the following command, which unfortunately cannot handle weights:

Code:
statsby median=r(c_1) upper=r(ub_1) lower=r(lb_1), by(mar_di_pro2 female) saving(medianci, replace) : centile wealth
use medianci , clear
I can calculate weighted medians using the following command:
Code:
gen median1 = . 
gen median0 = .

quietly forvalues i = 1/6 { 
    forvalues f=0/1{
        summarize wealth [w=xrwght] if mar_di_pro2 == `i' & female==`f', detail 
        replace median`f' = r(p50) if mar_di_pro2 == `i' & female==`f'
  }
}
But summarize obviously does not provide confidence intervals of the median. Does anyone have an idea of how I can obtain weighted confidence intervals of the median (by different groups) and save them in separate variables?

I am using Stata 15 and my data are the German Socio-Economic Panel.

Thanks,
Nicole

Fixed effect model, small sample size

Dear all, I am regressing the impact of Netflix subscriptions on theatrical admissions in 16 countries from 2012 to 2017.
I have heterogeneity, hence I run

Code:
xtreg logadmissions lognetflixsubscribers loggdp averageTV, fe vce(robust)
xtreg logadmissions lognetflixsubscribers loggdp averageTV, re vce(robust)

and xtoverid test to decide which one is a better fit.

The result of the test shows that the FE model is a better fit (p<0.05). My question is: as you can see, I don't use i.year and i.country variables in my model because of the small sample size and low degrees of freedom. If I don't add time fixed effects to my model because of this constraint, would that be wrong? What can I do as an alternative, or shall I keep my model as is?

(Here are my FE regression results.)
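One way to check whether the omitted time effects matter, sketched with the same command as above, is to fit the model once with year dummies and test them jointly:

Code:
* Sketch: do the year effects matter jointly?
xtreg logadmissions lognetflixsubscribers loggdp averageTV i.year, fe vce(robust)
testparm i.year

If testparm rejects, dropping i.year purely to save degrees of freedom would leave the estimates exposed to common time shocks.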

Thank you.

Export Regressions to Different Sheets in Same Excel Doc.

I have been trying to figure out a way to automate my regression output so that each set of regressions goes to a separate sheet in the same Excel document. In every cell of the final output there is, for example, ="0.067", and the stripquotes(yes) option gets rid of the quotes, but I am still left with the equal sign in every cell. Any ideas on how to get rid of it? I've tried the "plain" option with esttab, and while this gets rid of the = and the "", it also gets rid of a lot of the other formatting.

Code:
// Regression set (1)
foreach v of varlist outcome_1 outcome_2 {

eststo: reg `v' x1 x2
eststo: reg `v' x1 x2 x3
eststo: reg `v' x1 x2 x3 x4

}

esttab using "$output\regressions_set_1.csv", ///
n se nobaselevels noconstant r2 aic bic replace ///
order(x1 x2 x3 x4) ///
keep (x1 x2 x3 x4) ///
label
estimates clear


// Regression set (2)
foreach v of varlist outcome_1b outcome_2b {

eststo: reg `v' z1 z2
eststo: reg `v' z1 z2 z3
eststo: reg `v' z1 z2 z3 z4

}

esttab using "$output\regressions_set_2.csv", ///
n se nobaselevels noconstant r2 aic bic replace ///
order(z1 z2 z3 z4) ///
keep (z1 z2 z3 z4) ///
label
estimates clear

// Read in each .csv and export excel so that I can automate getting
// all output on separate sheets in same document
foreach v in regression_set_1 regression_set_2 {
preserve
import delimited P:\Filepath\Regressions_`v'_final.csv, stripquotes(yes)
export excel using "P:\Filepath\Final_Results.xlsx", missing("") sheetreplace sheet("`v'")
restore
}
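One workaround (a sketch) is to clean the cells after import delimited and before export excel, stripping the leading = and any stray quotes from every string variable in memory:

Code:
* Sketch: remove ="..." artifacts from all string variables
foreach v of varlist * {
    capture confirm string variable `v'
    if !_rc {
        quietly replace `v' = subinstr(`v', "=", "", .)
        quietly replace `v' = subinstr(`v', char(34), "", .)
    }
}

Placed inside the preserve/restore loop, this would run once per .csv before the export excel call.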

xtgcause interpretation

I have run a test for Granger non-causality in heterogeneous panels using the procedure proposed by Dumitrescu & Hurlin (Economic Modelling, 2012), which is available via the xtgcause command. The test gives both the Z-bar and Z-bar tilde statistics, and I was wondering which of these I should report, as they seem to indicate different results. All the p-values for the Z-bar statistic are 0.0000, whereas the p-values for the Z-bar tilde suggest that the null hypothesis should not be rejected. I have data from 8 time periods for 83 countries. I would appreciate some guidance on this issue. Thanks in advance.

Simulation Study: Panel data xtlogit regression

Hi everyone,

Unfortunately I'm not able to download the dataex package, since I'm working on an external server. I hope you can still understand my query and are willing to help me out! I know some questions have been asked about simulation before, but none of the posts really matches what I'm looking for.

I am investigating the relationship between employment and crime on an individual and monthly level. I have data from around 1 million individuals for 96 time periods (8 years, 12 months per year), where I know whether they were employed or not, whether they committed an offence or not, monthly income and some other control variables.

My original dataset looks approximately like this:
id time emp crime income age crimehist
1 1 1 0 2000 19 0
1 2 1 0 1800 19 0
1 3 0 1 0 19 0
1 4 0 0 0 20 1
1 5 1 0 1400 20 1
2 1 1 0 1500 24 3
2 2 1 1 1100 24 3
2 3 1 1 1400 24 4
2 4 0 0 0 24 5
2 5 0 0 0 25 5
Crime = 1 if someone committed a crime in that period, emp = 1 if someone is employed in that period. Crimehist is number of crimes committed in the past year (not including current period)

I want to carry out a logistic regression to see whether there is a relationship in the following way:

Code:
xtlogit crime emp age age2 crimehist, fe
To verify that a fixed effects logistic model is an appropriate model to apply to this data, I have been asked to do a simulation study, mainly to verify that the model provides consistent estimates of the parameters. The values of the independent variables and error terms should be simulated, and parameters should be given a fixed value. The dependent variable can then be calculated for every observation. By simulating the model, I can check whether the estimated parameters are close to the true (chosen) parameter values. By trying different values of T, this simulation can verify the consistency of the parameter estimates as long as T is large enough.

Even though there is some documentation on simulation studies online, I have not been able to find a proper code for this simulation study.

I think it's important for me to first of all know what kind of distribution my variables have. How do I find out? For example, income does not seem to have a perfectly normal distribution (see picture) - should my simulated independent variable then have a similar distribution to the real data, or can I assume normal distribution?


In case I assume normal distribution for all my independent variables, what would be the next steps?

For generating the income variable I first used this code:
Code:
gen sim_inc = 0
replace sim_inc = 2416 + 1226 * invnorm(uniform()) if sim_emp != 0
because income had a mean of 2416 and an S.D. of 1226 in the original dataset. However, this leads to negative values for income as well, and a very different distribution overall (also because I set all values to zero if emp = 0).

Once I have generated all independent variables, how do I create the dependent variable?

And how do I then run the regression, and check whether the logistic model leads to consistent parameter estimates?
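For the dependent variable, a standard simulation approach is to draw a Bernoulli outcome from the logistic probability implied by chosen coefficients; a minimal sketch with made-up "true" parameter values (not taken from your data):

Code:
* Sketch: simulate a binary outcome from fixed true parameters
clear
set obs 5000
set seed 1234
gen emp   = runiform() < 0.7              // simulated employment dummy
gen age   = 18 + floor(42*runiform())     // simulated age, 18-59
gen xb    = -2 + 0.8*emp + 0.03*age       // chosen true coefficients
gen crime = runiform() < invlogit(xb)     // Bernoulli draw of the outcome
logit crime emp age                       // estimates should be near the chosen values

Repeating this over many replications (e.g. with simulate) and increasing T would then let you examine bias and consistency of the xtlogit, fe estimates.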

Thanks a lot in advance for your help!



Test of whether country variance is statistically significant

I'm looking at social origin effect on occupational attainment (in terms of ISEI) in three different countries (Sweden, Germany and the UK). I have run three individual regressions and found variations in the effect but want to test whether this variation is statistically significant.

A study of the same issue in 12 European countries - by Christina Iannelli (2002) - uses a regression model which includes all the countries as dummy variables. Here the effects of the country variables are statistically significant, and the study concludes that there is a 'substantial' difference between the countries, but is this a valid way to do it? I mean, does this test whether the effect of social origin varies significantly between the countries? Or should I include an interaction term between country and social origin?

Moreover, a similar study (Bernardi and Ballarino 2016) compares the same effect in 14 European countries (done by different scholars with different datasets in each country) and compares the results without testing whether the country variance is statistically significant. The study emphasizes the 'common patterns' more than the differences, but can one compare country differences without testing whether these are significant?
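For what it's worth, a pooled model with an interaction would test exactly the question of whether the origin effect differs by country; a sketch, where isei, origin and cntry are placeholders for your variables:

Code:
* Sketch: joint test of origin-by-country differences
regress isei i.origin##i.cntry
testparm i.origin#i.cntry    // does the social-origin effect vary across countries?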

Merging two variables from different questionnaires

Dear all

I am using the panel dataset of South African National Income Dynamics Survey (NIDS) - waves 1 to 5.
I want to create a binary variable for whether a child (restricted to ages 6 to 17) has a disability or not.
As simple as this should be, the problem is that the questionnaires were split up in such a way that the child questionnaire only covers ages 0 to 15.
The adult questionnaire thus covers people older than 15.
I do have a 'disability' variable from each questionnaire, but I now want to make a single one for ages 6 to 17.

My question then is: how do you combine the two variables from the questionnaire into one?
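Assuming the two files are already merged on the person identifier, the indicators can be combined with cond(); child_disab and adult_disab below are placeholder names for the two questionnaire variables:

Code:
* Sketch: one disability indicator for ages 6-17 from two sources
gen disab = cond(age <= 15, child_disab, adult_disab) if inrange(age, 6, 17)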

Kind regards
Sophie

Summary statistics table

Hi, I want to summarize age, education, family_size and voters (mean and standard deviation) by cluster (the columns should be ETR, MTR, WTR and Total) in a single table. How can I do this?
The data is as following
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input long cluster float(age education family_size voters)
1 25  5 5 52
2 30  7 7 12
1 21  8 9 15
2 22  9 4 48
3 23 10 3 49
3 25 13 5 23
2 24 15 6 56
end
label values cluster cluster2
label def cluster2 1 "ETR", modify
label def cluster2 2 "MTR", modify
label def cluster2 3 "WTR", modify
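With the labelled cluster variable, tabstat gets close to this in one line (a sketch; it puts the clusters in rows plus a Total, so transposing them into columns may need table or a community-contributed command):

Code:
tabstat age education family_size voters, by(cluster) ///
    statistics(mean sd) columns(statistics) longstub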