Channel: Statalist

Combining marginsplots for 4 binary independent variables on one line graph (student)

Hello everyone,

Really looking for some help here; I'm doing my final-year Economics dissertation. Sadly, no face-to-face contact.

I am plotting the chance that individuals say "yes" or "no" to a binary question about racism, and how the probability of them saying yes varies over time.

Here is my code:

regress lack_education i.year realinc age sex race educ interaction_black
margins, over(year) saving(file1, replace)
marginsplot, x(year) recast(line)
graph save 1.gph, replace

regress lack_motivation i.year realinc age sex race educ interaction_black
margins, over(year) saving(file2, replace)
marginsplot, x(year) recast(line)
graph save 2.gph, replace

regress lack_ability i.year realinc age sex race educ interaction_black
margins, over(year) saving(file3, replace)
marginsplot, x(year) recast(line)
graph save 3.gph, replace

graph combine 1.gph 2.gph 3.gph, ycommon xcommon commonscheme cols(1)
graph save 1234.gph, replace

Problem:

My marginsplot combinations do not integrate the graphs on one axis. I would like all 3 of the answers over time to be plotted on the same graph.

I have tried:
graph twoway (line discrimination year) ///
    || (line lack_education year) ///
    || (line lack_motivation year) ///
    || (line lack_ability year)

However, this looks very ugly as it just presents 0/1 data in each year, with thick lines every year. I want the probability of somebody saying yes each year, not a plot of all the data.

Shall I continue with marginsplot? Do I need to save my margins results and combine them onto one graph?
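One route I have been considering (a sketch; it assumes the datasets written by margins, saving() contain the variables _margin and _m1, which I believe is the default) is to append the three saved files and plot them together:

Code:
* append the three saved margins datasets, tagging each outcome
use file1, clear
generate byte outcome = 1
append using file2
replace outcome = 2 if missing(outcome)
append using file3
replace outcome = 3 if missing(outcome)

* _m1 holds the year values and _margin the predicted means
twoway (line _margin _m1 if outcome == 1) ///
       (line _margin _m1 if outcome == 2) ///
       (line _margin _m1 if outcome == 3), ///
       xtitle("Year") ytitle("Predicted probability") ///
       legend(order(1 "Lack of education" 2 "Lack of motivation" 3 "Lack of ability"))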

Thank you, Charlie.

Describe command "size"

Hi all. Can someone please show me how the size of 13,244 shown below is calculated? Just a basic calculation.
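If "size" here means the dataset size that describe reports, my understanding (an assumption; check help describe) is that it is simply the number of observations times the width of one observation in bytes:

Code:
describe                 // reports size, number of observations, and width
display _N * c(width)    // should reproduce the reported size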

Same coefficients for Pooled OLS and Panel regressions for some regressions

Dear Stata members

I have the following balanced panel data set.
Code:
hh year x y1 y2 y3 y4 y5 y6 y7 y8 y9
a 2000 93.2 -5.4 43.7 110.4 -3.9 56.5 0 -108.1 0 0
a 2001 11.9 -0.3 23.1 0.1 -2.1 0 1.1 -18.7 8.7 0
a 2002 256.5 204.9 509.3 -82.6 -60 40.8 -38.6 -408.7 91.2 0.2
a 2003 586.2 -5.2 579.8 70.5 -4.9 -27.2 0 -167.3 140.5 0
a 2004 618 16 199 -142 0 0 0 0 545 0
a 2005 784.4 126.3 331.6 329.5 -0.5 -2.5 0 0 0 0
a 2006 100.9 -11.7 82 0 -0.4 0 0 31 0 0
a 2007 103.4 14.4 22.2 0 -0.9 2.8 -0.1 65 0 0
a 2008 42.1 -98.8 362.6 0 -0.8 1.9 0 -222.8 0 0
a 2009 110.6 54.4 20 0 -4.6 0.6 0 40.2 0 0
b 2000 10 2 5.5 0 -5.4 -9.1 0 17 0 0
b 2001 -11 35.6 27.1 0 0 8.4 -31.2 -56.5 0 5.6
b 2002 -110.1 52.9 44.9 225.7 -93 6.6 14.1 -361.3 0 0
b 2003 105.8 -30.1 19 0 -2.3 6.9 -34.8 147.1 0 0
b 2004 570.6 6.4 324.6 70.5 -42.1 0 0 211.2 0 0
b 2005 712.2 -31.1 445.4 -161.9 -70.7 0 0 513.4 17.1 0
b 2006 955.8 -50.4 644 -128.1 0 -30.5 0 522.3 17.1 -18.6
b 2007 1703.3 342.3 275.4 0 -25.9 205.7 0 905.8 0 0
b 2008 1991.2 91.3 521.4 0 -15.5 -197.9 0 1591.9 0 0
b 2009 116.7 -8.6 215 0 -9.3 -3.7 0 -76.7 0 0
c 2000 510.6 340.1 89.6 0 -23 -0.1 0 104 0 0
c 2001 301.4 -25.3 102.5 -0.9 0 0 0 192.4 15 17.7
c 2002 144 31.6 72.5 -0.9 0 0 0 36.4 11.7 -7.3
c 2003 140.8 0.7 140.6 -0.9 0 0 0 15.4 0.2 -15.2
c 2004 61 -57.5 269.2 -118.3 -1.8 -0.6 0 -30 0 0
c 2005 228.6 56.4 139.4 -52 -1.7 0 0 83.1 3.4 0
c 2006 -76.2 -574.7 1166.1 34.2 -174.3 -10.3 -50.8 -496.4 30 0
c 2007 994.6 1041.6 1579.7 99.8 -151 -19 -1527 -65.9 36.4 0
c 2008 2061.2 3939.2 2445.1 1648.9 -215 -150 -1208.3 -4433.5 34.8 0
c 2009 2705.1 -533.2 7865.6 3334.3 -212.6 -2061.2 -1338.3 -4480.8 131.3 0
d 2000 3786.8 -249.1 5897.5 -327.1 91.7 -927.3 -395.3 -446 138 4.4
d 2001 5697.7 369.2 6221.4 1662.6 -280.3 -550.2 -58.4 -1804.2 138.3 -0.7
d 2002 6530.1 1210.3 6281.2 1786.9 407.8 664.8 0 -3960.1 139.2 0
d 2003 3520.2 -1099.7 3747.6 1618.1 1242.8 -397 -147.1 -1587.7 143.2 0
d 2004 6132.1 -8 2641.8 1179.1 2981.8 -2235 0 1229.4 140.1 202.9
d 2005 5662.3 93.5 587.2 -450.5 467.6 -189.4 -5.9 5366 70.1 -276.3
d 2006 -36.9 -25.2 37.9 0 -1.5 0 0 -48.1 0 0
d 2007 2.9 22.6 94.9 55.9 -0.6 0 -500 330.1 0 0
d 2008 502.2 15.7 60.7 58.3 -19.5 50.5 0 259.3 77.2 0
d 2009 305.7 -2 110.8 -13.5 -21 -95.3 0 283.9 42.8 0
e 2000 1339.8 123.3 274.4 60.4 -17.1 350.9 269.1 118.2 81.9 78.7
e 2001 425.4 495.2 924.8 55.6 -22.5 -156.2 0 -1064.9 160.4 33
e 2002 232.3 -19.7 277.7 -173 -2.5 0 0 18.8 131 0
e 2003 222.1 18.7 88.5 -332.7 10.6 334.6 0 65.6 36.8 0
e 2004 628.5 -47.7 441.1 203.3 15.6 0 0 -64.3 80.5 0
e 2005 134.1 -14.9 307.4 -9.4 0.8 0 0 -260.2 110.4 0
e 2006 -35.3 36.6 205 0 0 0 -9.2 -286.8 0 19.1
e 2007 25.4 -4.8 89.9 0 -9.7 10.6 -0.1 -60.5 0 0
e 2008 7555.7 639.4 13189.5 1255 -45.5 -1664.6 52.1 -5940.7 5.7 64.8
e 2009 -4712.9 2727.2 41569.6 -4869.2 -235 1642.6 0 -49558.3 0 4010.2
This data set has 9 dependent variables (y1, y2, ..., y9) and one independent variable x. A feature of my data is that, for each household (hh) and year, the sum of y1 to y9 equals x. Assume that y1, y2, ..., y9 are categories of cash sources/uses and x is the total cash at the end. I ran the following panel regressions.
Code:
encode hh, gen(id)
xtset id year
* for running the regressions independently
xtreg y1 x
xtreg y2 x
xtreg y3 x
xtreg y4 x
xtreg y5 x
xtreg y6 x
xtreg y7 x
xtreg y8 x
xtreg y9 x
I also ran the Pooled regressions for the same data by using the following command
Code:
reg y1 x
reg y2 x
reg y3 x
reg y4 x
reg y5 x
reg y6 x
reg y7 x
reg y8 x
reg y9 x
My results reveal that some coefficients from the panel regressions are the same as the coefficients from the pooled regressions. Why is this so? Is it because of the data structure? I believe that xtreg depvar indepvar defaults to random effects. Can the coefficients of pooled OLS be the same as those of a random-effects regression? Also, does this have anything to do with the balanced nature of the panel?
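One check worth running (a sketch): xtreg without an estimator option fits random effects, and if the panel-level variance component sigma_u is estimated as zero, the GLS estimator reduces to pooled OLS, which would produce identical coefficients:

Code:
xtreg y1 x, re           // the default estimator, made explicit
display e(sigma_u)       // if this is 0, RE reduces to pooled OLS here
reg y1 x                 // compare the coefficient on x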

Create a variable depending on other variables

Hi all,

I am trying to create a variable (A) that takes values of an existing variable (B) if 2 other existing string variables (C) and (D) are equal. Any idea on how to do that?
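A minimal sketch of what I have in mind (A will be missing wherever C and D differ):

Code:
* A takes B's value where the two strings match, and is missing elsewhere
generate A = B if C == D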
Thanks in advance,

Rezart

OLS regression with fixed effects and interaction

Hello everyone!

The model I am using is an OLS regression with time and industry fixed effects. However, I am wondering what the difference is between using robust standard errors and clustering by firm. They seem to give the same coefficients but different standard errors, t-statistics, and p-values. How do I decide which one is a better fit?
[screenshots of the two regression outputs omitted]
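For concreteness, these are the two variants in question (a sketch; the variable names are placeholders):

Code:
reg y x i.year i.industry, vce(robust)        // heteroskedasticity-robust only
reg y x i.year i.industry, vce(cluster firm)  // also allows correlation within firm over time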




Thank you in advance!

egen not preserving sort order

Hi all,

I've run into some unexpected egen behavior that broke a function that I commonly use in my code. I've boiled it down to the following example:

Test #1 Code:
Code:
/* synthesize */
clear
set obs 2
generate str = cond(_n == 1, "one", "two")
generate num = _n

/* collapse */
collapse num, by(str)

/* average */
set obs 3
egen avg = mean(num)
Test #1 Output:
Code:
     +-----------------+
     | str   num   avg |
     |-----------------|
  1. |         .   1.5 |
  2. | one     1   1.5 |
  3. | two     2   1.5 |
     +-----------------+
Test #2 Code:
Code:
/* synthesize */
clear
set obs 2
generate str = cond(_n == 1, "one", "two")
generate num = _n

/* DO NOT collapse */
//collapse num, by(str)

/* average */
set obs 3
egen avg = mean(num)
Test #2 Output:
Code:
     +-----------------+
     | str   num   avg |
     |-----------------|
  1. | one     1   1.5 |
  2. | two     2   1.5 |
  3. |         .   1.5 |
     +-----------------+
As you can see, the only difference between the two code snippets is that the first performs a collapse and the second does not. In the output, however, you'll see that egen preserves the sort order in the second output but not the first.

Maybe I never should have expected egen to preserve sort order, but to me this seems like odd behavior. I'm running the 31 Mar 2020 build of Stata 16.1 MP on MacOS. Please let me know if anyone has any thoughts on this!
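In the meantime, a workaround (a sketch) is to record the sort order before calling egen and restore it afterwards:

Code:
generate long obs_order = _n   // tag the original order
egen avg = mean(num)
sort obs_order                 // restore it
drop obs_order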

Thanks,
Reed

XYZ coordinates of the 'globe'

Dear Statalisters,

My question is not Stata related as such, but because there are packages like geocircles or geo2XY (from SSC by Robert Picard) that deal with geo-mapping, possibly someone can answer my question.

I am looking for XYZ coordinates of positions on earth, i.e. our globe.
For example, of capitals, or (sketchy) the coastline of the continents.

So, it concerns the '3D Cartesian coordinates', Earth centered, Earth fixed (ECEF) coordinates in relation to latitude and longitude (from Wikipedia):
[Wikipedia diagram of latitude/longitude and the ECEF axes omitted]


I have searched many sites and asked around at several organisations, but so far without any result.

So, should you know where to go for such data, your help is much appreciated.
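In case it helps: if latitude/longitude data can be found (and those are widely available), the conversion to ECEF is a direct formula. A sketch on the WGS84 ellipsoid, assuming variables lat and lon in decimal degrees and height zero:

Code:
scalar a  = 6378137            // semi-major axis, metres
scalar f  = 1/298.257223563    // flattening
scalar e2 = f*(2 - f)          // first eccentricity squared

generate double phi = lat*_pi/180
generate double lam = lon*_pi/180
generate double N   = a/sqrt(1 - e2*sin(phi)^2)   // prime vertical radius
generate double X   = N*cos(phi)*cos(lam)
generate double Y   = N*cos(phi)*sin(lam)
generate double Z   = N*(1 - e2)*sin(phi)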

Rangestat

Hello guys,

I have been running OLS regressions with rangestat for quite some time. Right now I am running a number of cross-sectional regressions like this:

Code:
rangestat (reg) Y X1 X2 X3, interval(date 0 0)
Now I am trying to figure out how to run the same regression, but without a constant. Is there a way to do so with rangestat? I tried going through the help file but couldn't find any information about it. Would really appreciate your help.
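A workaround to consider in the meantime (a sketch): since interval(date 0 0) amounts to one cross-sectional regression per date, the same estimates without a constant can be collected with statsby:

Code:
* note: statsby replaces the data in memory with the collected results
statsby _b, by(date) clear: regress Y X1 X2 X3, noconstant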

Thanks

How to keep citycode_s if its gap has data in 2005, 2010, or 2015 (_merge==3), and delete citycode_s if its gap has no data

How do I keep all observations of a citycode_s if its gap has data in 2005, 2010, or 2015 (_merge==3), and delete all observations of a citycode_s if its gap has no data in any of 2005, 2010, or 2015? The answer could look something like the sketch after the data example.
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input long citycode_s int year double gap byte _merge
1100 2001 . 1
1100 2002 . 1
1100 2003 . 1
1100 2004 . 1
1100 2005 1.2883141528559072 3
1100 2006 . 1
1100 2007 . 1
1100 2008 . 1
1100 2009 . 1
1100 2010 1.202911498235209 3
1100 2011 . 1
1100 2012 . 1
1100 2013 . 1
1100 2014 . 1
1100 2015 1.1999310635494223 3
1100 2016 . 1
1100 2017 . 1
1200 2001 . 1
1200 2002 . 1
1200 2003 . 1
1200 2004 . 1
1200 2005 1.2148166136286662 3
1200 2006 . 1
1200 2007 . 1
1200 2008 . 1
1200 2009 . 1
1200 2010 1.1501784795324477 3
1200 2011 . 1
1200 2012 . 1
1200 2013 . 1
1200 2014 . 1
1200 2015 1.140917816398572 3
1200 2016 . 1
1200 2017 . 1
1301 2001 . 1
1301 2002 . 1
1301 2003 . 1
1301 2004 . 1
1301 2005 1.3902476987925623 3
1301 2006 . 1
1301 2007 . 1
1301 2008 . 1
1301 2009 . 1
1301 2010 1.229845917485463 3
1301 2011 . 1
1301 2012 . 1
1301 2013 . 1
1301 2014 . 1
1301 2015 1.2774141841215894 3
1301 2016 . 1
1301 2017 . 1
1302 2001 . 1
1302 2002 . 1
1302 2003 . 1
1302 2004 . 1
1302 2005 . 1
1302 2006 . 1
1302 2007 . 1
1302 2008 . 1
1302 2009 . 1
1302 2010 . 1
1302 2011 . 1
1302 2012 . 1
1302 2013 . 1
1302 2014 . 1
1302 2015 . 1
1302 2016 . 1
1302 2017 . 1
1303 2001 . 1
1303 2002 . 1
1303 2003 . 1
1303 2004 . 1
1303 2005 1.5206963425791122 3
1303 2006 . 1
1303 2007 . 1
1303 2008 . 1
1303 2009 . 1
1303 2010 1.251812201577584 3
1303 2011 . 1
1303 2012 . 1
1303 2013 . 1
1303 2014 . 1
1303 2015 1.2160241283892737 3
1303 2016 . 1
1303 2017 . 1
1304 2001 . 1
1304 2002 . 1
1304 2003 . 1
1304 2004 . 1
1304 2005 1.4515449934959717 3
1304 2006 . 1
1304 2007 . 1
1304 2008 . 1
1304 2009 . 1
1304 2010 1.258469579242039 3
1304 2011 . 1
1304 2012 . 1
1304 2013 . 1
1304 2014 . 1
1304 2015 1.1308638175646053 3
end
label values _merge _merge
label def _merge 1 "master only (1)", modify
label def _merge 3 "matched (3)", modify
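A sketch of one way to do this: flag, within each citycode_s, whether gap is non-missing in any of 2005, 2010, or 2015, then keep only the flagged cities.

Code:
bysort citycode_s: egen byte has_data = ///
    max(inlist(year, 2005, 2010, 2015) & !missing(gap))
keep if has_data
drop has_data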

Panel Hansen test question

I run the xtivreg2 command as follows:
Code:
xtivreg2 y (x1 x2 = l.x1 l.x2), fe endog(x2 x1) rob
At the same time, I run the following (a non-dynamic GMM, just for comparison):
Code:
xtabond2 y x1 x2 x3, gmm(l.(y x1 x2)) iv(x1 x2 l.x1 l.x2) nolevel orthog rob
I get exactly the same results (coefficients), but the Hansen test p-value is 0.000 with xtivreg2 and 1.000 with xtabond2.
I would be grateful if someone could explain when this can occur.
Thanks in advance!!

Export Excel: Specify Column Widths and Freeze Panes in Do-File

Is there a way to specify column widths and freeze panes for an Excel file created by exporting a data file from Stata?

I frequently give Excel files exported from Stata to workers.

I frequently check my work transforming variables in Stata by looking at Excel files exported from Stata.

It would save me a lot of time to set column widths and freeze panes from the do-file that produces such Excel files.
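The closest route I know of is a sketch using Mata's xl() class (available since Stata 14); the method names below reflect my understanding of that class, so check help mata xl() in your release, particularly for set_freeze_panes():

Code:
export excel using "mydata.xlsx", firstrow(variables) replace

mata:
    b = xl()
    b.load_book("mydata.xlsx")
    b.set_mode("open")               // batch the edits, write once on close
    b.set_column_width(1, 5, 20)     // columns 1-5, 20 characters wide
    b.set_freeze_panes(1, 0)         // freeze the header row
    b.close_book()
end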

Thanks,

Carl

Creating a missing variable from a date variable.

Hi guys,

I am interested in creating a missing-data indicator (1 = missing and 0 = not missing) from a date variable, to run a sensitivity analysis. I formatted the original date variable with the following code:

gen r_HypertensionDiagnosisDate3 = date(r_HypertensionDiagnosisDate2,"MD20Y")

format r_HypertensionDiagnosisDate3 %d


I tried to create a new variable for missing with the code below:

gen HypertensionDateMissing=0

recode HypertensionDateMissing = 1 if r_HypertensionDiagnosisDate3=.



I get the error message below:

"transformation rule without "condition" part"

any help please,
Thanks

Pre-post test graph with 95% CIs

Hello,

I am trying to create a graph of the means of pre-post test data with 95% CIs. The code I am using is:

twoway (scatter Encourage prepost, connect(L) lwidth(medium) lcolor(black) sort(ID) ///
ytitle("GPs' mean ratings") xscale(titlegap(*10)) xtitle("") xlabel(2 " " 0.5 "Pre-Trial" 1.5 "Post-Trial" 1.7 " ", notick labsize(small) angle(0)))

The variable Encourage includes two data points only: the mean of pre-values and the mean of post-values
The variable prepost includes two data points only: 0 (associated with the mean of pre-values) and 1 (associated with mean of post-values)

My questions are:
1. How can I move the connected line further to the right so it aligns with the x-axis labels Pre-trial and Post-trial?
2. How can I make the y-axis begin at "2" and remove decimal points?
3. How can I add the 95% confidence intervals for the two mean data points (see the sketch below)? My variables for the pre-post test ratings for each independent ID are pre_encourage and post_encourage
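Something along these lines might work (a sketch: ci_lb and ci_ub are assumed to hold precomputed 95% limits for the two means, e.g. from ci means on pre_encourage and post_encourage; the axis options address questions 1 and 2):

Code:
twoway (rcap ci_ub ci_lb prepost, lcolor(black)) ///
       (scatter Encourage prepost, connect(L) lcolor(black)), ///
       xscale(range(-0.5 1.5)) ///
       xlabel(0 "Pre-Trial" 1 "Post-Trial", notick labsize(small)) ///
       yscale(range(2 .)) ylabel(, format(%9.0f)) ///
       ytitle("GPs' mean ratings") xtitle("")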

I am new to Stata and this forum, so your help is much appreciated. Thank you.

Dynamic panel-data, two step system GMM, instrumental variables

Dear all,

I am trying to answer a question regarding the capital structure of companies. Since my data form a dynamic panel, I use two-step system GMM as used in similar papers. However, my PC runs into problems when I try to estimate the following equation:

xtabond2 LeverageBV L.LeverageBV L.ROA L.Size L.Tangibility L.MTB CrisisGlobal L.GovD M3 t* j* i*, gmm(L.LeverageBV) iv(L.ROA L.Size L.Tangibility L.MTB t* j* i*, equation(level)) nodiffsargan twostep robust

and it runs perfectly when I remove the i*. My question is how I can account for firm fixed effects by including the i* without my PC crashing, or my friend's PC running for 10 hours without even producing results. i* is a dummy created for each observation, so it ranges from i1 to i3970.

Thanks in advance.

Variable being omitted from Fama-MacBeth regression

Hi Statalists,

Currently I am running some regressions, including some Fama-MacBeth regressions using the user-written xtfmb command. With some regressions all is fine, whilst with others I run into an error I can't seem to find a solution for in the documentation. When running the regression, several red crosses appear at the bottom of the results screen, and once it has finished a variable is omitted from the regression. Below is a screenshot of the output I am receiving. If anyone has information on why this is occurring, and a direction I could go in, that would be greatly appreciated. Interestingly, the variable being omitted is a dummy variable.

I look forward to your replies.

Pim
[screenshot of the xtfmb output omitted]

New version of addinby on SSC

Thanks as always to Kit Baum, a new version of the addinby package is now available for download from SSC. In Stata, use the ssc command to do this, or adoupdate if you already have an old version of addinby.

The addinby package is described as below on my website. The new version has been upgraded to Stata Version 16, and adds a new module fraddinby which adds in data using a foreign key for a data frame, instead of a foreign key for a disk dataset.

Users of old versions of Stata can still download the Stata Version 11 and Stata Version 10 versions of addinby by typing, in Stata,

net from "http://www.rogernewsonresources.org.uk/"

and selecting the Stata version and package required.

Best wishes

Roger


--------------------------------------------------------------------------------
package addinby from http://www.rogernewsonresources.org.uk/stata16
--------------------------------------------------------------------------------

TITLE
addinby: Add in data from a disk or frame dataset using a foreign key

DESCRIPTION/AUTHOR(S)
addinby is a "cleaner" alternative version of merge m:1, designed to
reduce the lines of code in Stata do-files. It adds variables and/or
values to existing observations in the dataset currently in memory
(the master dataset) from a Stata-format dataset stored in the file
filename (the using dataset), using a foreign key of variables
specified by the keyvarlist to identify observations in the using
dataset. These foreign key variables must identify observations in
the using dataset uniquely. Unlike merge m:1, addinby always
preserves the observations in the master dataset in their original
sorting order, and never adds any additional observations, and only
generates a matching information variable if requested to do so.
However, addinby may optionally check that there are no unmatched
observations in the master dataset, and/or check that there are no
missing values in the foreign key variables in the master dataset.
fraddinby is similar to addinby, but it adds variables from a dataset
in a data frame, instead of from a dataset in a disk file.

Author: Roger Newson
Distribution-Date: 14april2010
Stata-Version: 16

INSTALLATION FILES
addinby.ado
fraddinby.ado
addinby.sthlp
fraddinby.sthlp
--------------------------------------------------------------------------------
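A minimal usage sketch (dataset and key names are invented for illustration; see the help files for the full syntax and options):

Code:
* the master dataset in memory has one observation per person, with a
* country identifier; add country-level variables from a lookup file,
* matching on the foreign key countrycode
use persons, clear
addinby countrycode using country_lookup.dta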

can omitted variable because of collinearity be interpreted as the reference group?

I will give an example to illustrate my question. Let's say I am going to test how racial composition affects average income at the county level. The following is a subset from the American Community Survey.

Code:
clear all

input fips income white black hisp others racetotal
1001 64733.142 75.02174 18.97645 2.768116 3.233698 100
1003 66053.944 83.01883 9.384115 4.494323 3.102733 100
1005 41710.322 46.14848 47.31596 4.289814 2.245754 100
1007 52292.367 74.58161 22.08017 2.428197 0.9100213 100
1009 55475.87 87.14026 1.4225 9.126551 2.310691 100
1011 49226.233 21.52241 76.24614 0.4926584 1.738791 100
1013 41165.233 51.85518 45.07865 0.3345818 2.731584 100
1015 50336.589 72.47997 20.51991 3.651671 3.348451 100
1017 40392.917 55.7914 39.37504 2.270443 2.563114 100
1019 49201.26 91.78819 4.900785 1.539473 1.771556 100
1021 52296.202 80.13885 9.360346 7.776007 2.724792 100
1023 42668.505 56.36711 42.60038 0.5583174 0.4741876 100
1025 45651.608 52.87653 45.81539 0.1968262 1.111249 100
1027 44303.521 80.28106 14.73314 3.102108 1.883689 100
1029 46340.838 92.64292 2.630874 2.42335 2.302854 100
1031 56398.769 70.62666 17.05272 6.919747 5.400876 100
1033 50478.971 78.65859 15.7097 2.475456 3.156254 100
1035 40997.017 50.13585 47.63465 0.8390602 1.390443 100
1037 48206.928 65.26025 34.20544 0.4606172 0.0736998 100
1039 48407.277 83.38733 12.82429 1.617092 2.171291 100
end

reg income white black hisp others
[regression output omitted]

"white", "black", "hisp", and "others" are representing the percentage of whites, blacks, Hispanics, and other races at the county level, respectively. And they add up to "racetotal". The model randomly dropped "hisp" because of collinearity. My question is can I interpret "hisp" as the reference group and say that the coefficient for "white" is the effect relative to Hispanics? Thanks.

New package on SSC: qrprocess

Thanks to Kit Baum a new package, qrprocess, is now available on SSC for Stata 9.2+. You can install it with
Code:
ssc install qrprocess
This package offers fast estimation and inference procedures for the linear quantile regression model. First, qrprocess implements new algorithms that are much quicker than the built-in Stata commands, especially when a large number of quantile regressions or bootstrap replications must be estimated. Second, the commands provide analytical estimates of the variance-covariance matrix of the coefficients for several quantile regressions allowing for weights, clustering, and stratification. Third, in addition to traditional pointwise confidence intervals, this command also provides functional confidence bands and tests of functional hypotheses. Fourth, predict called after qrprocess can generate monotone estimates of the conditional quantile and distribution functions obtained by rearrangement. Fifth, the new command plotprocess conveniently plots the estimated coefficients with their confidence intervals and uniform bands.

Let's consider an example. We load a data set with 5634 observations:
Code:
use http://www.stata.com/data/jwooldridge/eacsap/cps91
The median regression of lwage on age, age2, education, and indicator variables for black and hispanic can be estimated with
Code:
. qrprocess lwage c.age##c.age educ i.black i.hispanic

Quantile regression
No. of obs.        3286    
Algorithm:         qreg.
Variance:          kernel estimate of the sandwich as proposed by Powell(1990).

------------------------------------------------------------------------------
      lwage  |      Coef.   Std. Err.      t     P>|t|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
Quant. 0.5   |
        age  |     .04578   .0077878    5.88    0.000     .0305106    .0610495
c.age#c.age  |  -.0005031    .000099   -5.08    0.000    -.0006972    -.000309
       educ  |   .1018382   .0041052   24.81    0.000     .0937893    .1098871
    1.black  |   -.021541   .0414319   -0.52    0.603     -.102776    .0596939
 1.hispanic  |   .0484709   .0432474    1.12    0.262    -.0363236    .1332655
      _cons  |  -.1473537   .1501476   -0.98    0.326    -.4417461    .1470388
------------------------------------------------------------------------------
qrprocess is very similar to the official command qreg when a single quantile regression is estimated but qrprocess offers additional algorithms that are faster when the number of observations is very large and it provides standard errors that allow for clustering and stratification.

The main advantages of qrprocess appear when many quantile regressions must be estimated to analyze the conditional distribution of the outcome. For instance, we may estimate 81 quantile regressions for the quantile indexes 0.1, 0.11, 0.12, ..., 0.9 with
Code:
qrprocess lwage c.age##c.age educ i.black i.hispanic, quantile(0.1(0.01)0.9) noprint
We have activated the option noprint because the table of coefficients is huge. Instead, we can easily plot all the coefficients with the command
Code:
plotprocess
and obtain

[plot of all estimated coefficient processes omitted]


Note that qrprocess is significantly faster than calling 81 times qreg. In addition, qrprocess also estimates the covariances between the coefficients estimated at different quantile indexes, which allows testing cross-restrictions.

If this algorithm is still too slow, qrprocess implements a new and even faster estimator, the one-step estimator. This estimator is not numerically identical to the traditional quantile regression estimator but it is asymptotically equivalent to it. We can select this algorithm with the option method(onestep)
Code:
qrprocess lwage c.age##c.age educ i.black i.hispanic, quantile(0.1(0.01)0.9) noprint method(onestep)


Many of the hypotheses of interest to researchers involve the whole quantile regression process, e.g.: (1) Does a variable have any effect at all, i.e. is the coefficient on this variable 0 at all quantile indexes? (2) Does a variable have a positive effect over the whole distribution (stochastic dominance)? (3) Is the effect of a variable homogeneous (constant at all quantile indexes)?
These are functional null hypotheses. A naive approach consisting of estimating many quantile regressions and using pointwise tests will suffer from the multiple testing problem. qrprocess offers tests for functional hypotheses as well as uniform confidence bands that cover the whole function with a prespecified probability. The option functional must be activated. Only the bootstrap can be used for functional inference. Here we use the multiplier bootstrap, which is faster:
Code:
qrprocess lwage c.age##c.age i.black i.hispanic educ, quantile(0.1(0.01)0.9) functional vce(multiplier, reps(500))
At the end of the omitted output the p-values for many functional null hypotheses are provided. We can plot the coefficients, the pointwise confidence intervals as well as the uniform bands with plotprocess. Without any argument, we can see all the coefficients. If we are especially interested in the effect of education, we can type
Code:
plotprocess educ, ytitle("QR coefficient") title("Years of education")
and we obtain

[plot of the education coefficient process with pointwise and uniform bands omitted]

qrprocess and plotprocess offer many additional options that you can discover by reading the help files. We have also written a paper that describes the algorithms, the inference procedures, and the code: "Quantile and distribution regression in Stata: algorithms, pointwise and functional inference". We are still working on it, with the objective of submitting it to the Stata Journal. We have written another paper in which we suggest the new algorithms that are implemented in the package: "Fast algorithms for the quantile regression process".

These codes and papers are the results of joint work by Victor Chernozhukov, Iván Fernández-Val and myself.

Using substr to drop a variable conditionally

Hello all,

I am trying to drop a variable conditional on it taking certain values in some of its observations. In the spirit of the (prohibited) syntax:

drop varname if substr(varname, a, b) == "some string value" (for a string variable)

Searching the forum, I found a way to do this via the following piece of code:

Code:
foreach c of varlist _all {
    if substr(`c',1,3)=="Nov" | substr(`c',1,3)=="Oct" | substr(`c',1,3)=="Dec" | substr(`c',1,3)=="Aug" | substr(`c',1,3)=="Jun" | substr(`c',1,3)=="Jul" | substr(`c',1,3)=="Feb" | substr(`c',1,3)=="Mar" | substr(`c',1,3)=="Jan" | substr(`c',1,3)=="Sep" | substr(`c',1,3)=="Apr" | substr(`c',1,3)=="May" {
        drop `c'
    }
}

note1: I used all months to cover all the possible string values the variable in question can take. (If I had included only 1 or 2, it didn't work: no error, it just didn't drop the variable in question.)
note2: the above variable also contained spaces (as in the one below), but these didn't create a problem when not specified.
note3: I am doing this for all possible variables, as I need to loop this procedure over different files; the name of the variable changes but the string values it can take don't.

The above code worked just fine. However, when I tried, in the same spirit, to do the same for another variable (below), which takes 3 string values: " * ", "-", " " (its type is str78),

Code:

foreach c of varlist _all {
    if substr(`c',1,1)=="*" | substr(`c',1,1)=="-" {
        drop `c'
    }
}

I am returned the error 109: type mismatch
In an expression, you attempted to combine a string and numeric
subexpression in a logically impossible way. For instance, you
attempted to subtract a string from a number or you attempted
to take the substring of a number.

Thus, I believe the reason Stata would execute the first piece of code but not the second may be that Stata recognizes the character " * " as numeric rather than as a string. Because of this, I tried without using substr:

foreach c of varlist _all {
    if `c'=="*" | `c'=="-" {
        drop `c'
    }
}

And this runs without errors, but does not yield the desired result, i.e. it does not drop the variable in question. (I believe because this works over observations rather than variables.)
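For what it's worth, a sketch of a version that sidesteps the type mismatch: the programming if only ever sees the first observation, and substr() fails on numeric variables, so test the storage type first:

Code:
foreach c of varlist _all {
    capture confirm string variable `c'     // skip numeric variables
    if !_rc {
        if inlist(substr(`c'[1], 1, 1), "*", "-") {
            drop `c'
        }
    }
}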

Any assistance is greatly appreciated.

Difference in Difference with multiple treatment periods and variation in treatment status

Hello everyone,

I am working on a difference-in-differences project using an unbalanced panel. The units are countries from 1960-2010. I wish to estimate the effect of fighting a war on a country's GDP growth. The treatment is fighting a war, while the control is a militarized interstate dispute, a sort of low-level conflict that could escalate to a war but does not (I have a theoretical argument that this is true). The before-treatment period is GDP growth one year prior to war/militarized interstate dispute onset, and the after-treatment period is GDP growth one year after war/militarized interstate dispute conclusion.

In my data, for any given year a country can either receive the treatment, be a control, both, or neither. Further, any country can change in treatment status, potentially several times, back and forth from treatment to control to treatment, etc. I'm confident that I should drop observations that are neither treatment nor control, but I don't know what to do about observations that are both treatment and control. Could I code all of these as treatment, since they are technically exposed to the treatment, or does the fact that some units are exposed to both treatment and control simultaneously invalidate the assumptions of the DiD model?

Here is a snapshot of my dataset. ccode is the country code, year is the year, war and mid are dummy variables for the presence of a war and of a militarized interstate dispute, respectively, in that year, and gdp_growthx is GDP growth for that country in that year.

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input int(ccode year) byte(war mid) str21 gdp_growthx
100 1960 0 0 "0.0413917473870093"  
100 1961 0 0 "0.0541918172758031"  
100 1962 0 0 "0.0539754953323991"  
100 1963 0 0 "0.0312177498183623"  
100 1964 0 0 "0.0632414158718799"  
100 1965 0 0 "0.0267577340542174"  
100 1966 0 0 "0.054894557443757"  
100 1967 0 0 "0.041779143413782"  
100 1968 0 0 "0.0625350310606713"  
100 1969 0 0 "0.062720812176032"  
100 1970 0 0 "0.0786489257867446"  
100 1971 0 0 "0.103077366712022"  
100 1972 0 0 "0.0947918763477633"  
100 1973 0 0 "0.102963416787285"  
100 1974 0 0 "0.08873035272064"    
100 1975 0 0 "0.05995747949481"    
100 1976 0 0 "0.0637354781183424"  
100 1977 0 0 "0.050009584829087"  
100 1978 0 0 "0.0724881437835222"  
100 1979 0 0 "0.0371718656908957"  
100 1980 0 1 "0.0551433740229117"  
100 1981 0 0 "0.0044105419570103"  
100 1982 0 1 "0.00919008741430539"
100 1983 0 0 "-0.00426931343788897"
100 1984 0 0 "0.0184030652378583"  
100 1985 0 0 "0.00807798013949018"
100 1986 0 0 "0.064046716687006"  
100 1987 0 1 "0.00702227780084312"
100 1988 0 1 "0.033824936965336"  
100 1989 0 0 "0.0209461675298157"  
100 1990 0 0 "0.0459589983854041"  
100 1991 0 0 "-0.00035585188733463"
100 1992 0 0 "0.0592640996337659"  
100 1993 0 0 "0.0628165763245323"  
100 1994 0 1 "0.0508729967762241"  
100 1995 0 1 "0.04395569336938"    
100 1996 0 0 "-0.00152413299278365"
100 1997 0 1 "0.0185558288583257"  
100 1998 0 0 "-0.0234440913503553"
100 1999 0 0 "-0.0529251184757511"
100 2000 0 1 "0.0318699384784778"  
100 2001 0 1 "0.00590851907224852"
100 2002 0 0 "0.0175135269382603"  
100 2003 0 1 "0.0327180578178813"  
100 2004 0 0 "0.049805987419309"  
100 2005 0 0 "0.0749319957797281"  
100 2006 0 1 "0.0959016023406366"  
100 2007 0 1 "0.101372670380304"  
100 2008 0 1 "0.0905581995946316"  
100 2009 0 1 "0.0200668695254118"  
100 2010 0 1 "0.0914505433276662"  
101 1960 0 0 "0.00886655224042194"
101 1961 0 0 "0.0422846832199143"  
101 1962 0 0 "0.0488220443712535"  
101 1963 0 0 "0.104495016496348"  
101 1964 0 0 "0.099007054624672"  
101 1965 0 0 "0.0378492519893543"  
101 1966 1 1 "0.0617338514862308"  
101 1967 0 1 "0.0438032103384865"  
101 1968 0 1 "0.0905377821685777"  
101 1969 0 1 "0.0533303801364601"  
101 1970 0 1 "0.130290249886797"  
101 1971 0 0 "0.000995262840974914"
101 1972 0 0 "0.0490651871776791"  
101 1973 0 0 "0.150033301136959"  
101 1974 0 0 "0.401070080911717"  
101 1975 0 0 "-0.0246372887197581"
101 1976 0 1 "0.0290491876979522"  
101 1977 0 0 "0.0318858448949207"  
101 1978 0 0 "-0.0228028480467601"
101 1979 0 0 "0.104304539952457"  
101 1980 0 0 "0.0377266243439936"  
101 1981 0 1 "-0.011939187542012"  
101 1982 0 1 "-0.0710660875479929"
101 1983 0 0 "-0.0155044599567291"
101 1984 0 0 "0.0126113590941897"  
101 1985 0 0 "-0.0212528381923865"
101 1986 0 0 "-0.0758934787294459"
101 1987 0 1 "0.0231346985891684"  
101 1988 0 1 "0.0345999745898457"  
101 1989 0 0 "-0.0551616703464218"
101 1990 0 0 "0.0706725010699394"  
101 1991 0 0 "0.0170494434229362"  
101 1992 0 0 "0.130888568375549"  
101 1993 0 0 "-0.077031518464382"  
101 1994 0 1 "-0.055921556835339"  
101 1995 0 1 "0.0459132039195761"  
101 1996 0 1 "-0.0212165734904107"
101 1997 0 1 "0.0277805329612475"  
101 1998 0 0 "-0.0649340821194926"
101 1999 0 1 "0.0378671146831133"  
101 2000 0 1 "0.231174062830601"  
101 2001 0 0 "-0.037283159721536"  
101 2002 0 0 "-0.0339635516351835"
101 2003 0 1 "-0.0374187761187001"
101 2004 0 0 "0.294342473254214"  
101 2005 0 0 "0.270465870442142"  
101 2006 0 1 "0.165990940836077"  
101 2007 0 0 "0.128809456750063"  
101 2008 0 1 "0.162717223691308"  
end
<script src="https://jsc.adskeeper.com/r/s/rssing.com.1596347.js" async> </script>