Channel: Statalist
Viewing all 72769 articles

How to rearrange bars in bar graph with two variables and two over groups

Hi all,

I feel like I should know how to do this - but I'm trying to plot two variables with two sets of over groups. The way the graphs come out, the bars of the two variables are next to each other, within the over groups. But what I would like is for all the bars of each variable to be next to each other within the first over group. In case this sounds vague, I've included code and the resulting graph below.

Code:
clear all
set obs 12
set seed 123456

gen year = .
replace year = 2009 in 1/4
replace year = 2013 in 5/8
replace year = 2017 in 9/12

gen color = ""
replace color = "Red" if inlist(_n, 1, 5, 9)
replace color = "Blue" if inlist(_n, 2, 6, 10)
replace color = "Orange" if inlist(_n, 3, 7, 11)
replace color = "Green" if inlist(_n, 4, 8, 12)

gen percent_favorite = .
replace percent_favorite = rnormal(.29, .05) if color=="Red"
replace percent_favorite = rnormal(.41, .05) if color=="Blue"
replace percent_favorite = rnormal(.13, .05) if color=="Orange"
egen total = total(percent_favorite), by(year)
replace percent_favorite = 1 - total if color=="Green"
drop total

gen percent_least_favorite = .
replace percent_least_favorite = rnormal(.15, .05) if color=="Red"
replace percent_least_favorite = rnormal(.11, .05) if color=="Blue"
replace percent_least_favorite = rnormal(.42, .05) if color=="Orange"
egen total = total(percent_least_favorite), by(year)
replace percent_least_favorite = 1 - total if color=="Green"
drop total

graph bar percent_favorite percent_least_favorite, over(year) over(color) scheme(538) legend(pos(6) rows(1)) 
The resulting graph looks like this:

[bar graph attachment]

Basically, I would like all of the mean-of-percent_favorite bars for each year to sit next to each other within the Blue group, and then the mean-of-percent_least_favorite bars next to those, and likewise for the other color groups. I tried asyvars, but it doesn't work. So I'm stumped. Any help would be greatly appreciated.


MERGE problem!!

Problem with merge. I removed my duplicates, and they are my only identical variables, but I still get "not the only identical variables".

help!!

Reshape problem

Hi!
I want to reshape my data from wide to long, but Stata says that there is a problem: there are observations within i(id2) with the same value of j(varlabel). I used the reshape error command to check, and it reports that 248 observations (out of 15500) have repeated j(varlabel) values. Does anyone know how to fix this problem?
Thank you so much!

What exists in e(first) at what position after ivreg2 ?

After ivreg2, e(first) can be used. Then, I want to use
Code:
[#,#]
right after the saved e(first) to access regression information.

Code:
sysuse auto
ivreghdfe price weight  (length turn=gear_ratio displacement), ffirst
mat FIRST=e(first)
estadd scalar fstat=FIRST[1,1]
But how can I know what is stored at which position? That is, how can I tell whether I need FIRST[2,1] or FIRST[3,1]? The help file doesn't seem to document that.
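For what it's worth, the rows and columns of e(first) carry names, so listing the matrix once shows which statistic sits at which position, and the labels can also be pulled programmatically (the labels themselves are whatever ivreg2 stored; nothing below asserts their content):

Code:
mat FIRST = e(first)
mat list FIRST
* pull the row and column labels programmatically
local rlab : rownames FIRST
local clab : colnames FIRST
display "rows: `rlab'"
display "cols: `clab'"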

How to automate dummy variable adjustment for missing data?

I run a fixed-effects panel regression on survey data with missing values in my regressors (and DV). Since missings account for about 35% of my data, it's time to deal with them. The first option I found is dummy variable adjustment. I am aware of some drawbacks of the method in general; in this post I am interested in the code implementation.

I followed the procedure from this site: https://ies.ed.gov/ncee/pubs/20090049/section_3a.asp

My setup is very similar to this MWE:

Code:
* load data
use http://www.stata-press.com/data/r13/nlswork

* set panel structure
xtset idcode year
* 28534 obs, missing data e.g. union 9296
mdesc

* fixed effects regression (automatically uses 13797 complete cases)
quietly xtreg ln_wage c.wks_ue##c.wks_ue##i.occ_code union age, fe
margins, dydx(wks_ue)

* dummy variable adjustment to deal with missing data in regressors
gen D=0
replace D=1 if wks_ue==.
replace wks_ue=0 if wks_ue==.

* run FE again (now 19156 obs are used)
quietly xtreg ln_wage c.wks_ue##c.wks_ue##i.occ_code##D union age, fe
margins, dydx(wks_ue)
First, my question is whether it is correct to use the dummy D only once in the interaction (since the regressor with missing data enters as a quadratic term). The follow-up question is whether there is a way to automate this procedure, since I have 15 variables in my model and would like to use DVA for 5 of them. Thank you
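On the automation side, a minimal sketch: loop over the regressors that need DVA, building a missing-data dummy and the zero fill for each (replace the varlist with your own five variables; the interaction terms in the model specification still have to be written out to match):

Code:
foreach v of varlist wks_ue {
    gen byte D_`v' = missing(`v')
    replace `v' = 0 if missing(`v')
}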

Hospital beds within distance (geonear)

Dear Statalist,
I have a dataset with patient IDs and lat/lon data about their residence postal code. I have a second data set with clinic departments and their lat/lon data and the number of hospital beds for each department.

I want to generate a variable in the patient data set that gives me the number of hospital beds within a certain distance.

I looked at this solution via geonear https://www.statalist.org/forums/for...using-lat-long and it works perfectly to generate the number of departments within a radius (which I'm also interested in), but how can I sum the hospital bed numbers instead of counting the departments?
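Not a geonear answer as such, but one brute-force sketch using -geodist- (also from SSC) and a full cross join, feasible when both files are of modest size; all variable and file names below are assumptions for illustration:

Code:
use patients, clear
cross using clinics                      // every patient x department pair
geodist plat plon clat clon, generate(km)
gen beds_near = cond(km <= 10, beds, 0)  // count beds only within 10 km
collapse (sum) beds_within10km = beds_near, by(patient_id)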

Compare coefficients after reg3

This might be something simple, but I cannot wrap my head around it. I am trying to compare coefficients from two regressions that use subsamples after reg3 by storing them but I cannot figure out how to refer to the coefficients in the code to call them back and compare. Here is a simplified example. Note: in the actual problem, I am using panel data, so I cannot do an interaction as my dummy variable is endogenous.
Code:
sysuse auto
bootstrap, reps(100):reg3 ( price mpg) ( weight length)
estimates store eq1
bootstrap, reps(100):reg3 ( price mpg) ( weight length) if foreign==1
estimates store eq2
I want to compare the stored mpg coefficient in eq1 with the mpg coefficient in eq2 by using "test". Something like test [eq1_mean]mpg=[eq2_mean]mpg

suest reports that it does not work with reg3.

Thanks

How to test the difference between marginal effects in Stata

I have an ordinary least squares regression with an interaction term:

Code:
regress price c.trunk##c.weight
I want to test the marginal effects of `trunk` at several percentiles of `weight` and test if these marginal effects are statistically different from each other. I can estimate the marginal effects at several percentiles of `weight` with the `margins` command. However, when I `test` that difference between these marginal effects (e.g., 50th versus 5th or 95th versus 5th percentiles), the answers (i.e., the Fs and ps) are the same.

For example:

Code:
sysuse auto, clear

regress price c.trunk##c.weight
margins, dydx(trunk) at((p5) weight) at((p50) weight) at((p95) weight) post

test _b[1._at] = _b[3._at]
test _b[2._at] = _b[3._at]
Which yields:

Code:
. sysuse auto, clear
(1978 Automobile Data)

.
. regress price c.trunk##c.weight

      Source |       SS       df       MS              Number of obs =      74
-------------+------------------------------           F(  3,    70) =   10.01
       Model |   190636755     3  63545585.1           Prob > F      =  0.0000
    Residual |   444428641    70  6348980.58           R-squared     =  0.3002
-------------+------------------------------           Adj R-squared =  0.2702
       Total |   635065396    73  8699525.97           Root MSE      =  2519.7

----------------------------------------------------------------------------------
           price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-----------------+----------------------------------------------------------------
           trunk |  -287.6778   309.9737    -0.93   0.357    -905.9009    330.5452
          weight |   1.172154   1.510518     0.78   0.440    -1.840479    4.184786
                 |
c.trunk#c.weight |   .0754153   .0979483     0.77   0.444    -.1199365     .270767
                 |
           _cons |   3284.654   4248.161     0.77   0.442    -5188.037    11757.34
----------------------------------------------------------------------------------

. margins, dydx(trunk) at((p5) weight) at((p50) weight) at((p95) weight) post

Average marginal effects Number of obs = 74
Model VCE : OLS

Expression : Linear prediction, predict()
dy/dx w.r.t. : trunk

1._at : weight = 1830 (p5)

2._at : weight = 3190 (p50)

3._at : weight = 4290 (p95)

------------------------------------------------------------------------------
| Delta-method
| dy/dx Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
trunk |
_at |
1 | -149.6679 149.0747 -1.00 0.319 -446.9879 147.6521
2 | -47.10315 94.62805 -0.50 0.620 -235.8328 141.6265
3 | 35.85363 155.51 0.23 0.818 -274.3013 346.0086
------------------------------------------------------------------------------

.
. test _b[1._at] = _b[3._at]

( 1) [trunk]1bn._at - [trunk]3._at = 0

F( 1, 70) = 0.59
Prob > F = 0.4439

. test _b[2._at] = _b[3._at]

( 1) [trunk]2._at - [trunk]3._at = 0

F( 1, 70) = 0.59
Prob > F = 0.4439
I do not expect these tests to yield the same Fs and ps. How do I test the differences of these marginal effects?

P.S. I migrated this question from Stack Overflow to Statalist.

Do file beginner question


I am trying to improve my understanding of do-file scripting. For disclosure, I program competently in e.g. R, Python, and C.

So, my eventual aim is to reproduce some of my R wrangling scripts. One thing I do in R is programmatically check the types of variables and raise a flag if they aren't as expected.

So, I have written this code as my first step on the way (I include my baby-programmer comments):

Code:
* loop over the variables specified
foreach v of varlist surname-history {
    * test that variable is numeric, don't stop on fail
    capture confirm numeric variable `v'
    * if the return code doesn't indicate success write out it's a string (if !_rc is if _rc is not true)
    if !_rc {
        disp "this is a string"
    }
    * otherwise write out that it is a number
    else {
        disp "This is a number"
    }
}
The first few lines of my data are like:
1. ALI 2 1 52 46 35
2. BLAKEMORE 2 1 56 38 40
3. RAMANI 1 3 42 43 40
4. ROWLANDS 1 2 47 50 48
5. DRURY 2 2 50 50 49
I have checked the types and they are as expected - only surname is a string

I get 'this is a number' six times in succession as output, with no error message. So, correctly, six of my variables are identified as numeric: the return-code test runs and the message prints. But although - I assume - confirm fails on the first variable, the code doesn't display the string message.

My instinct is that I misunderstand _rc, but I can't see what is wrong.

I would be really grateful for any help!
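In case a worked version helps: capture sets _rc to a nonzero code when the captured command fails, so if _rc (not if !_rc) marks the non-numeric case. A sketch of the loop with the test flipped:

Code:
foreach v of varlist surname-history {
    capture confirm numeric variable `v'
    * a nonzero _rc means confirm failed, i.e. `v' is not numeric
    if _rc {
        display "`v' is a string"
    }
    else {
        display "`v' is a number"
    }
}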

Reverse each axis after -grqreg-

Dear All,

Applying the -qreg- command, I am attempting to reverse the X- and Y-axes in the graphs produced by -grqreg-.

Code:
webuse auto, clear
qreg price mpg headroom
grqreg, cons ci ols olsci title(Fig.1a Fig.1b Fig.1c)

In a previous post, I read that xsc(reverse) and/or ysc(reverse) could work (here: https://www.statalist.org/forums/for...in-stata-graph). But the -grqreg- command does not allow those options; I guess this command works from the estimated values.

Although the strength of quantile regression may come from plotting the estimates against the quantiles on the X-axis, can anyone help me reverse each axis?

Best regards,
Lin-Geon

(Version 14.2)

How to interpret the coefficient of the variable that takes first difference?

Panel data model.
x is first-differenced to become stationary.
How, then, should I interpret the coefficient on D.x?

titles of factor variables in output of a regression

Dear all,

I am running an OLS regression using factor variable as a regressor:
Code:
reg punt_global i.expectativas_salario_bachiller i.expectativas_salario_profesi , robust
The output is


Code:
------------------------------------------------------------------------------------------------
                               |               Robust
                   punt_global |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------------------------+----------------------------------------------------------------
expectativas_salario_bachiller |
            Entre 8 y 10 SMLV  |  -16.16014   2.797607    -5.78   0.000    -21.64347   -10.67681
             Entre 5 y 7 SMLV  |  -10.57995    2.08135    -5.08   0.000    -14.65941   -6.500486
             Entre 3 y 4 SMLV  |  -8.756583   1.775683    -4.93   0.000    -12.23693   -5.276231
             Entre 1 y 2 SMLV  |  -2.378132   1.655555    -1.44   0.151    -5.623032    .8667669
              Menos de 1 SMLV  |  -.2397346   1.652037    -0.15   0.885     -3.47774    2.998271
                               |
  expectativas_salario_profesi |
            Entre 8 y 10 SMLV  |  -2.413842   .9830901    -2.46   0.014    -4.340706   -.4869785
             Entre 5 y 7 SMLV  |  -7.506238   .8937014    -8.40   0.000    -9.257899   -5.754577
             Entre 3 y 4 SMLV  |  -15.21564   .8766769   -17.36   0.000    -16.93393   -13.49734
             Entre 1 y 2 SMLV  |  -26.53327   .9426736   -28.15   0.000    -28.38092   -24.68562
              Menos de 1 SMLV  |  -29.36724   1.056401   -27.80   0.000     -31.4378   -27.29669
                               |
                         _cons |    264.533   1.746067   151.50   0.000     261.1107    267.9553
------------------------------------------------------------------------------------------------
When I use the esttab command to export my result, I get

Code:
------------------------------------
                              (1)   
                     Puntaje gl~l   
------------------------------------
Más de 10 SMLV                  0   
                              (.)   

Entre 8 y 10 SMLV          -16.16***
                          (-5.78)   

Entre 5 y 7 SMLV           -10.58***
                          (-5.08)   

Entre 3 y 4 SMLV           -8.757***
                          (-4.93)   

Entre 1 y 2 SMLV           -2.378   
                          (-1.44)   

Menos de 1 SMLV            -0.240   
                          (-0.15)   

Más de 10 SMLV                  0   
                              (.)   

Entre 8 y 10 SMLV          -2.414*  
                          (-2.46)   

Entre 5 y 7 SMLV           -7.506***
                          (-8.40)   

Entre 3 y 4 SMLV           -15.22***
                         (-17.36)   

Entre 1 y 2 SMLV           -26.53***
                         (-28.15)   

Menos de 1 SMLV            -29.37***
                         (-27.80)   

Constant                    264.5***
                         (151.50)   
------------------------------------
Observations                55009   
------------------------------------
t statistics in parentheses
* p<0.05, ** p<0.01, *** p<0.001
I would like to get the titles of each of the factor variables (which I get in the normal -reg- output) when I export my results. How can I do this?

Regards

interaction terms

Hi,

I'm trying to run a regression using interaction terms including a dummy; could someone help me with the code?
The dependent variable is pov.
The independent variables are rol and wat.
My dummy variable is dev, = 1 if yes and = 0 if no.

I have already run this regression: reg poverty rol wat if dev==1
But I now want to create interaction terms with the dummy, i.e. a single variable that will give the coefficient of rol and wat when dev==1 or dev==0.
I know I need to generate a new variable but am not sure how to do this.

Does anyone know how I can code this?
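Factor-variable notation builds the interactions on the fly, with no need to generate new variables; a sketch using the names from the post:

Code:
reg pov c.rol##i.dev c.wat##i.dev
* the slope of rol when dev==0 is _b[rol];
* the slope when dev==1 adds the interaction term:
lincom rol + 1.dev#c.rol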

Eventstudytools.com/barc/upload - duplicate entry error

Hello reader,

I am a beginner at data analytics and I have been trying to conduct an event study using the tool Eventstudytools.com/barc/upload, as I need any assistance I can get. However, after closely following all instructions and formatting requirements, I am left with the following error: ''Duplicate entry in firm data: 13312;2011-01-03''.
The tool does not allow duplicates in my share price variable, even if the date is different. For the firm with identifier 13312, the stock price was 44.2 on both 2011-03-23 and 2011-03-22. This is a very common occurrence in my large dataset. Do you have any idea how to resolve this? I am desperate.

I apologize for the sloppy formatting,

Best regards,

Ronald

r(132) too few quotes

Hi all,

For the life of me I cannot figure out where the error lies in my code. I was hoping a second set of eyes could help me out.

Code:
qui reghdfe dry_yield num_pest_per_hh, a(villgis year)
outreg2 using coef1.tex, replace title(Dry Yield and Pesticide Use per HH) ctitle(OLS) addtext(Village FE, X, Year FE, X)
which returns

"C:\Users\Daniel's PC\ado\plus/o/outreg2.ado
too few quotes
too many ')' or ']'
r(132);
"
I am guessing it has to do with where the file is attempting to save, but I can't figure it out. Thank you for reading.

Save mi estimate results as dataset

I'd like to run mi estimate over different subsets of data, and save the resulting estimates to another Stata dataset for further processing. I tried this:
Code:
mi estimate, saving("Data\Chile_1000_CIs.dta",replace): mean math_incomplete, over(school_num)
but the saved output is in .ster format, which I hadn't heard of before. I'd like to put the output in .dta format. Any suggestions?

[Beginner] How to do identify the 2 groups after Propensity Score Matching?

Dear All,

apologies, I am really new to statistics and Stata, but I need to conduct a Propensity Score Matching. I followed the steps in the link attached below with the corresponding dta file.


Code:
global treatment TREAT
global ylist RE78
global xlist AGE EDUC MARR
global breps 5

describe $treatment $ylist $xlist
summarize $treatment $ylist $xlist

pscore $treatment $xlist, pscore(myscore) blockid(myblock) comsup


so the question is:
1. I would like to then continue the code with nearest-neighbour matching, and isolate the 2 groups to conduct a paired t test. How should I approach the code?


Thank you!
Link: https://sites.google.com/site/econom...score-matching
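One common route for the nearest-neighbour step is psmatch2 (SSC), which tags the matched sample in generated variables; a sketch, assuming psmatch2 suits your design:

Code:
ssc install psmatch2
psmatch2 $treatment $xlist, outcome($ylist) neighbor(1) common
* _treated, _support and _weight identify the matched groups;
* _n1 holds the observation number of each treated unit's match
gen y_match = $ylist[_n1]
ttest $ylist == y_match if _treated == 1 & _support == 1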

Time Series: how to obtain a date variable collapsing year and month (dm)?

I have time series data of an index of employment in Brazilian industries. Part of the data is reproduced below.

What I have: variable "date_2" measuring year-month (200801, 200802, …, 201911, 201912)

What I need: to create a variable "dm": 2008m1, 2008m2, …, 2019m11, 2019m12

However, I've been unable to do it. There is something simple I'm missing.

Could someone help?

Thank you


Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input byte month int year long date_2 double emprego_ind_indice
 1 2008 200801   106
 2 2008 200802 106.3
 3 2008 200803 107.1
 4 2008 200804 108.4
 5 2008 200805 109.2
 6 2008 200806 109.8
 7 2008 200807 110.3
 8 2008 200808 110.8
 9 2008 200809   112
10 2008 200810   112
11 2008 200811 110.5
12 2008 200812 107.5
 1 2009 200901 105.9
 2 2009 200902 104.9
 3 2009 200903 104.4
 4 2009 200904 104.4
 5 2009 200905 104.7
 6 2009 200906 104.7
 7 2009 200907 104.6
 8 2009 200908 105.7
 9 2009 200909 106.5
10 2009 200910 107.1
11 2009 200911 107.3
12 2009 200912 106.3
 1 2010 201001 106.7
 2 2010 201002 107.8
 3 2010 201003 109.5
 4 2010 201004 110.5
 5 2010 201005 111.6
 6 2010 201006 112.1
 7 2010 201007 112.7
 8 2010 201008 113.9
 9 2010 201009 114.5
10 2010 201010 114.6
11 2010 201011 114.1
12 2010 201012 112.6
 1 2011 201101 112.1
 2 2011 201102 112.7
 3 2011 201103 113.2
 4 2011 201104 113.9
 5 2011 201105 114.7
 6 2011 201106 114.8
 7 2011 201107   115
 8 2011 201108 115.4
 9 2011 201109 115.5
10 2011 201110   115
11 2011 201111 113.9
12 2011 201112 112.4
 1 2012 201201 112.5
 2 2012 201202 112.3
 3 2012 201203 112.7
 4 2012 201204 113.1
 5 2012 201205 113.8
 6 2012 201206 113.6
 7 2012 201207 113.8
 8 2012 201208 113.8
 9 2012 201209 114.3
10 2012 201210 114.6
11 2012 201211 114.1
12 2012 201212 112.3
 1 2013 201301 112.1
 2 2013 201302 112.6
 3 2013 201303 113.3
 4 2013 201304 113.9
 5 2013 201305 114.1
 6 2013 201306 114.2
 7 2013 201307 114.7
 8 2013 201308 114.9
 9 2013 201309 115.7
10 2013 201310 115.8
11 2013 201311 115.2
12 2013 201312 113.5
 1 2014 201401 113.6
 2 2014 201402 114.6
 3 2014 201403 114.5
 4 2014 201404 114.5
 5 2014 201405 114.6
 6 2014 201406 113.9
 7 2014 201407 113.5
 8 2014 201408 113.2
 9 2014 201409 112.7
10 2014 201410 112.3
11 2014 201411 111.8
12 2014 201412 110.3
 1 2015 201501 110.1
 2 2015 201502 109.7
 3 2015 201503 109.4
 4 2015 201504 108.8
 5 2015 201505 108.3
 6 2015 201506 107.3
 7 2015 201507 106.2
 8 2015 201508 105.2
 9 2015 201509 104.6
10 2015 201510 103.6
11 2015 201511 102.3
12 2015 201512 100.4
 1 2016 201601  99.3
 2 2016 201602  99.2
 3 2016 201603  99.2
 4 2016 201604  99.3
end
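The ym() function does this directly from the month and year variables already in the data (or from date_2 by splitting its digits):

Code:
gen dm = ym(year, month)
format dm %tm

* equivalently, from date_2 alone:
gen dm2 = ym(floor(date_2/100), mod(date_2, 100))
format dm2 %tm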

Alternative to inlist() - expression too long

Hi,

Relatively basic question: I am looking for a concise (one-line) alternative to inlist() that would accept more arguments. Is there an ssc install that would fix this inlist() limitation?

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str6 gvkey str2 linktype
"001000" "NU"
"001000" "NU"
"001000" "LU"
"001001" "NU"
"001001" "LU"
"001002" "NR"
"001002" "NR"
"001002" "NR"
"001002" "LC"
"001003" "NU"
"001003" "NU"
"001003" "LU"
"001004" "NU"
"001004" "NU"
"001004" "LU"
"001005" "NU"
"001005" "LU"
"001007" "LU"
"001007" "LU"
"001007" "NU"
"001008" "NR"
"001008" "LC"
"001009" "NR"
"001009" "NR"
"001009" "NR"
"001009" "LC"
"001010" "LU"
"001010" "LU"
"001010" "NU"
"001011" "NR"
"001011" "NR"
"001011" "LC"
"001012" "NU"
"001012" "LU"
"001012" "NU"
"001013" "NU"
"001013" "LU"
"001015" "NU"
"001015" "LU"
"001015" "NU"
"001016" "NR"
"001016" "NR"
"001016" "LC"
"001017" "NR"
"001017" "NR"
"001017" "LC"
"001018" "LU"
"001018" "NU"
"001018" "LU"
"001018" "NU"
"001019" "NR"
"001019" "NR"
"001019" "NU"
"001019" "LC"
"001020" "NR"
"001020" "NR"
"001020" "NR"
"001020" "LC"
"001020" "NR"
"001021" "NU"
"001021" "NU"
"001021" "LU"
"001022" "LU"
"001022" "NU"
"001022" "LU"
"001023" "NU"
"001023" "NU"
"001023" "LU"
"001024" "NU"
"001024" "LU"
"001025" "NU"
"001025" "LU"
"001026" "NU"
"001026" "NU"
"001026" "LU"
"001027" "NU"
"001027" "LU"
"001028" "NU"
"001028" "LU"
"001029" "NU"
"001029" "LU"
"001030" "NU"
"001030" "LU"
"001031" "LU"
"001031" "LU"
"001034" "NR"
"001034" "LC"
"001036" "NU"
"001036" "NU"
"001036" "LC"
"001036" "NR"
"001036" "LX"
"001037" "NU"
"001037" "NU"
"001037" "LU"
"001038" "NU"
"001038" "LU"
"001038" "NU"
"001039" "NR"
"001039" "NR"
end
Code:
keep if inlist(linktype,"LU","LC","LU", "LC", "LD", "LF", "LN", "LO", "LS", "LX")
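inlist() accepts at most 10 string arguments, hence the limit you are hitting. A loop scales to any list; and since every code you want happens to be "L" followed by one of a few letters, regexm() gives a genuine one-liner (check the character class against your full list before relying on it):

Code:
* loop version: works for any number of codes
gen byte keepit = 0
foreach s in LU LC LD LF LN LO LS LX {
    replace keepit = 1 if linktype == "`s'"
}
keep if keepit
drop keepit

* one-line version, valid only because all wanted codes match this pattern
keep if regexm(linktype, "^L[UCDFNOSX]$")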

Merge 2 datasets by range of dates

Hi,

I would like to merge 2 datasets by a range of dates and an identifier (gvkey). So far, I have used rangejoin (ssc install), but I do not see options to manage unmatched data in each dataset (similar to the _merge variable created when using "merge 1:1 var using ...").

I would like to keep data in the using dataset "data3.dta", which includes the date (datadate) that I want to fall within a certain range [linkdt, linkenddt]. I do not wish to keep unmatched observations from the master dataset, but I would like to browse and be able to see what has and has not matched.

Master dataset: lnk.dta:

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str6 gvkey int(linkdt linkenddt)
"001000"  3969  6755
"001001"  8663  9708
"001002"  4731  4904
"001003"  8741 10820
"001004"  4497     .
"001005"  4779  8431
"001007"  5022  6969
"001007"  6970  9037
"001008"  8637 10164
"001009"  8053 13221
"001010" -3532   760
"001010"   761  8945
"001011"  8480 13054
"001012"  6605 10955
"001013"  7014 18627
"001015"  8064  9800
"001016"  6543 10343
"001017"  4731 13208
"001018"  4930  6969
"001018"  6970  7733
"001019"  4731 10266
"001020"  4731 10322
"001021"  7592 14322
"001022"  4930  6086
"001022"  6087  8609
"001023"  8195 10505
"001024"  8602  9037
"001025"  4779  9209
"001026"  4731  6725
"001027"  2222  4165
"001028"  7972 11718
"001029"  8566  9435
"001030"  4779  5402
"001031"    91  2221
"001031"  2222  6299
"001034"  8811 17895
"001036"  8740 15126
"001036"  8740 15126
"001037"  7585 10464
"001038"  8630 16436
"001039"  3771  7273
"001040" -3652   760
"001040"   761  9435
"001042"  8207  9310
"001043" -3289   760
"001043"   761  8152
"001043"  8896 12192
"001043" 12393 14636
"001044"  2952  3833
"001045" -3652   760
"001045"   761 18996
"001045" 19701     .
"001046"  8897  9282
"001047"  7894  8551
"001049"  5873  8181
"001050"  7637     .
"001051"  1958  3287
"001052"  4839  6969
"001052"  6970  8490
"001054"  7944 12691
"001055"  9120 13755
"001056"  6390 17409
"001057"  1461  2586
"001057"  2587  2919
"001058"   913  9799
"001059"  4731  7760
"001061"  4839  6969
"001061"  6970  7609
"001062"  1855     .
"001065"  8494 11522
"001066"  8796  9834
"001067"   913  2221
"001067"  2222  9127
"001069"  8691  9055
"001070"  2922  2951
"001070"  2952  7790
"001072"  4749  4836
"001072"  4837 10975
"001072" 13010     .
"001073"  7738 13389
"001074"  3625  8400
"001075"   761     .
"001076"  8343 12053
"001076" 11995 12053
"001076" 12054 18596
"001076" 12054 18596
"001076" 18597 18606
"001076" 18597     .
"001077"  4930  6969
"001077"  6970  7578
"001078" -3652   760
"001078"   761     .
"001079"   366  3317
"001079"  3318  6117
"001080"  1004  1459
"001080"  1460  5964
"001081" 10058 17465
"001081" 10058 17465
"001082"  8215 18870
"001083"  6573  7335
end
format %td linkdt
format %td linkenddt
Using dataset: data3.dta:
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str6 gvkey double datadate
"001000"  5843
"001000"  6209
"001000"  6574
"001001"  8765
"001001"  9131
"001001"  9496
"001003"  8765
"001003"  9131
"001003"  9527
"001003"  9892
"001003" 10257
"001003" 10623
"001003" 10988
"001004"  5629
"001004"  5995
"001004"  6360
"001004"  6725
"001004"  7090
"001004"  7456
"001004"  7821
"001004"  8186
"001004"  8551
"001004"  8917
"001004"  9282
"001004"  9647
"001004" 10012
"001004" 10378
"001004" 10743
"001004" 11108
"001004" 11473
"001004" 11839
"001004" 12204
"001004" 12569
"001004" 12934
"001004" 13300
"001004" 13665
"001004" 14030
"001004" 14395
"001004" 14761
"001004" 15126
"001004" 15491
"001004" 15856
"001004" 16222
"001004" 16587
"001004" 16952
"001004" 17317
"001004" 17683
"001004" 18048
"001004" 18413
"001004" 18778
"001004" 19144
"001004" 19509
"001004" 19874
"001004" 20239
"001004" 20605
"001004" 20970
"001004" 21335
"001004" 21700
"001004" 22066
"001005"  5782
"001005"  6148
"001005"  6513
"001005"  6878
"001005"  7243
"001005"  7609
"001005"  7974
"001006"  7851
"001006"  8216
"001007"  7212
"001007"  7578
"001007"  7943
"001007"  8308
"001007"  8673
"001007"  9039
"001008"  8917
"001008"  9282
"001008"  9647
"001009"  8339
"001009"  8704
"001009"  9070
"001009"  9435
"001009"  9800
"001009" 10165
"001009" 10531
"001009" 10896
"001009" 11261
"001009" 11626
"001009" 11992
"001009" 12357
"001009" 12722
"001010"  5843
"001010"  6209
"001010"  6574
"001010"  6939
"001010"  7304
"001010"  7670
"001010"  8035
"001010"  8400
"001010"  8765
"001010"  9131
end
format %td datadate
Is there something better suited than rangejoin for this? Thanks.
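For reference, with rangejoin itself the unmatched master rows are recognizable because the using-file variables come back missing, so a flag can stand in for _merge; a sketch (assuming rangejoin's keyvar lowvar highvar using ... syntax, with datadate as the key from the using file):

Code:
use lnk, clear
rangejoin datadate linkdt linkenddt using data3, by(gvkey)
gen byte matched = !missing(datadate)
browse if !matched     // inspect what failed to match
keep if matched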


