Bootstrapping very large sample

February 13, 2020, 6:09 pm


Hi everyone,

I have a dataset of 70 million observations and am running the following model.

First I estimate the probability of belonging to class A with a logit model. Then I calculate fitted probabilities. Call those values x1hat.

Then I run an OLS regression to explain another variable y, as follows.

reg y x1hat x2 c.x2#c.x1hat

Where x2 is another explanatory variable.

I know that the standard errors of the last regression will not reflect the uncertainty in x1hat. So I want to bootstrap the standard errors of the entire procedure: first logit, then OLS.

But my sample is very large, so I am afraid it won't be feasible to do 1000 reps with 70 million observations each time.

Any suggestions on how to do this?

I noticed the fast wild bootstrap command (boottest), but I think it only works after a single estimation command, so I don't know whether I can use boottest to repeat the joint procedure of logit followed by OLS.

Appreciate your guidance,

Laurie
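
One possible way to bootstrap the whole two-step procedure is to wrap both steps in a small program and hand that program to the bootstrap prefix, so the logit is re-estimated in every replication. The sketch below runs on simulated stand-in data: the names classA and z1, the program name twostep, the coefficients, and reps(50) are all assumptions made for illustration, not details from the post.

```stata
* Sketch on simulated data; classA, z1 and all coefficients are hypothetical
clear
set seed 2020
set obs 2000
generate z1 = rnormal()
generate x2 = rnormal()
generate byte classA = runiform() < invlogit(0.5*z1)
generate y = 1 + 2*classA + 0.5*x2 + rnormal()

* One program runs both steps, so each bootstrap draw repeats the logit too
capture program drop twostep
program define twostep, rclass
    capture drop x1hat
    logit classA z1
    predict double x1hat, pr
    regress y c.x1hat##c.x2
    return scalar b_x1hat = _b[x1hat]
    return scalar b_x2    = _b[x2]
    return scalar b_inter = _b[c.x1hat#c.x2]
end

* reps(50) only keeps the sketch fast; the number of reps is a judgment call
bootstrap b_x1hat=r(b_x1hat) b_x2=r(b_x2) b_inter=r(b_inter), reps(50) seed(1): twostep
```

With 70 million observations each replication will still be expensive, so the number of replications is a trade-off between run time and the precision of the standard errors.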

↧

How to combine two observation-varying datasets based on several variables

February 13, 2020, 7:44 pm


Hi Statalists,

I am new to Stata and just encountered this issue in data management.

Suppose I have two datasets with different numbers of observations.

First, given that the two datasets share some variables but contain different observations, I want to combine them, keeping all observations from both and leaving unmatched values missing. For example, both datasets have the variables srcdate, gvkey, and cusip. Each also has its own, potentially different, salecs (call them salecs1 and salecs2). If the three variables (srcdate, gvkey, and cusip) match, I want to compare salecs1 and salecs2 by listing them next to each other. A dummy variable could mark this: 1 if matched, 0 otherwise.

Second, if the three variables (srcdate, gvkey, and cusip) do not match, just create a new row for the observation and leave the other dataset's cells as missing values.

The result should contain all the information from both datasets and allow their observations to be compared side by side.

Here is the example for these two datasets.

Dataset 1

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input long srcdate str6 gvkey str10 cusip str58 conm str4 sic str50 cnms str8 ctype double salecs
19904 "122519" "68243Q106" "1-800-FLOWERS.COM"           "5961" "International" "GEOREG"   15.127
20269 "122519" "68243Q106" "1-800-FLOWERS.COM"           "5961" "International" "GEOREG"   11.215
20635 "122519" "68243Q106" "1-800-FLOWERS.COM"           "5961" "International" "GEOREG"    11.73
21000 "122519" "68243Q106" "1-800-FLOWERS.COM"           "5961" "International" "GEOREG"   11.936
21365 "122519" "68243Q106" "1-800-FLOWERS.COM"           "5961" "International" "GEOREG"   11.519
20819 "034066" "68247Q102" "111 INC -ADR"                "5960" "Not Reported"  "COMPANY"   6.997
21184 "034066" "68247Q102" "111 INC -ADR"                "5960" "3 Customers"   "COMPANY"  50.479
18627 "265008" "90131G107" "21ST CENTURY ONCOLOGY HLDGS" "8090" "Medicare"      "GOVDOM"  239.017
18627 "265008" "90131G107" "21ST CENTURY ONCOLOGY HLDGS" "8090" "Commercial"    "MARKET"   272.78
18627 "265008" "90131G107" "21ST CENTURY ONCOLOGY HLDGS" "8090" "Medicaid"      "GOVDOM"   16.077
18627 "265008" "90131G107" "21ST CENTURY ONCOLOGY HLDGS" "8090" "Self Pay"      "MARKET"    8.039
18992 "265008" "90131G107" "21ST CENTURY ONCOLOGY HLDGS" "8090" "Commercial"    "MARKET"  325.093
18992 "265008" "90131G107" "21ST CENTURY ONCOLOGY HLDGS" "8090" "Self Pay"      "MARKET"    8.942
18992 "265008" "90131G107" "21ST CENTURY ONCOLOGY HLDGS" "8090" "Argentina"     "GEOREG"     43.5
18992 "265008" "90131G107" "21ST CENTURY ONCOLOGY HLDGS" "8090" "Medicaid"      "GOVDOM"   17.883
18992 "265008" "90131G107" "21ST CENTURY ONCOLOGY HLDGS" "8090" "Medicare"      "GOVDOM"  286.772
19358 "265008" "90131G107" "21ST CENTURY ONCOLOGY HLDGS" "8090" "Commercial"    "MARKET"  367.812
19358 "265008" "90131G107" "21ST CENTURY ONCOLOGY HLDGS" "8090" "Medicare"      "GOVDOM"  292.328
19358 "265008" "90131G107" "21ST CENTURY ONCOLOGY HLDGS" "8090" "Self Pay"      "MARKET"    7.548
19358 "265008" "90131G107" "21ST CENTURY ONCOLOGY HLDGS" "8090" "Argentina"     "GEOREG"     62.7
19358 "265008" "90131G107" "21ST CENTURY ONCOLOGY HLDGS" "8090" "Medicaid"      "GOVDOM"   18.528
19723 "265008" "90131G107" "21ST CENTURY ONCOLOGY HLDGS" "8090" "Self Pay"      "MARKET"    7.876
19723 "265008" "90131G107" "21ST CENTURY ONCOLOGY HLDGS" "8090" "Medicaid"      "GOVDOM"   19.332
19723 "265008" "90131G107" "21ST CENTURY ONCOLOGY HLDGS" "8090" "Argentina"     "GEOREG"     72.5
19723 "265008" "90131G107" "21ST CENTURY ONCOLOGY HLDGS" "8090" "Commercial"    "MARKET"  388.787
19723 "265008" "90131G107" "21ST CENTURY ONCOLOGY HLDGS" "8090" "Medicare"      "GOVDOM"  300.004
20088 "265008" "90131G107" "21ST CENTURY ONCOLOGY HLDGS" "8090" "Medicare"      "GOVDOM"  376.865
20088 "265008" "90131G107" "21ST CENTURY ONCOLOGY HLDGS" "8090" "Self Pay"      "MARKET"    8.522
20088 "265008" "90131G107" "21ST CENTURY ONCOLOGY HLDGS" "8090" "Commercial"    "MARKET"  540.678
20088 "265008" "90131G107" "21ST CENTURY ONCOLOGY HLDGS" "8090" "Medicaid"      "GOVDOM"   20.832
20088 "265008" "90131G107" "21ST CENTURY ONCOLOGY HLDGS" "8090" "Argentina"     "GEOREG"     76.8
20453 "265008" "90131G107" "21ST CENTURY ONCOLOGY HLDGS" "8090" "Medicare"      "GOVDOM"  388.049
20453 "265008" "90131G107" "21ST CENTURY ONCOLOGY HLDGS" "8090" "Commercial"    "MARKET"  587.074
20453 "265008" "90131G107" "21ST CENTURY ONCOLOGY HLDGS" "8090" "Self Pay"      "MARKET"   10.001
20453 "265008" "90131G107" "21ST CENTURY ONCOLOGY HLDGS" "8090" "Argentina"     "GEOREG"     90.3
20453 "265008" "90131G107" "21ST CENTURY ONCOLOGY HLDGS" "8090" "Medicaid"      "GOVDOM"   15.002
end
format %d srcdate

Dataset 2

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input long srcdate str6 gvkey str10 cusip str58 conm str100 conml str4 sic double salecs
18627 "179586" "33740N105" "1ST UNITED BANCORP INC"      "1st United Bancorp Inc"             "6020"   50.174
18992 "179586" "33740N105" "1ST UNITED BANCORP INC"      "1st United Bancorp Inc"             "6020"   62.148
19358 "179586" "33740N105" "1ST UNITED BANCORP INC"      "1st United Bancorp Inc"             "6020"   70.183
19723 "179586" "33740N105" "1ST UNITED BANCORP INC"      "1st United Bancorp Inc"             "6020"     71.5
18566 "166222" "90137E106" "20-20 TECHNOLOGIES INC"      "20-20 Technologies Inc"             "7372"   66.531
18931 "166222" "90137E106" "20-20 TECHNOLOGIES INC"      "20-20 Technologies Inc"             "7372"    68.25
20088 "020547" "90214L106" "2050 MOTORS INC"             "2050 Motors Inc"                    "5500"        0
20453 "020547" "90214L106" "2050 MOTORS INC"             "2050 Motors Inc"                    "5500"        0
20819 "020547" "90214L106" "2050 MOTORS INC"             "2050 Motors Inc"                    "5500"        0
21184 "020547" "90214L106" "2050 MOTORS INC"             "2050 Motors Inc"                    "5500"        0
21549 "020547" "90214L106" "2050 MOTORS INC"             "2050 Motors Inc"                    "5500"        0
18627 "265008" "90131G107" "21ST CENTURY ONCOLOGY HLDGS" "21st Century Oncology Holdings Inc" "8090"  543.963
18992 "265008" "90131G107" "21ST CENTURY ONCOLOGY HLDGS" "21st Century Oncology Holdings Inc" "8090"  644.717
19358 "265008" "90131G107" "21ST CENTURY ONCOLOGY HLDGS" "21st Century Oncology Holdings Inc" "8090"  693.951
19723 "265008" "90131G107" "21ST CENTURY ONCOLOGY HLDGS" "21st Century Oncology Holdings Inc" "8090"  736.516
20088 "265008" "90131G107" "21ST CENTURY ONCOLOGY HLDGS" "21st Century Oncology Holdings Inc" "8090" 1018.182
20453 "265008" "90131G107" "21ST CENTURY ONCOLOGY HLDGS" "21st Century Oncology Holdings Inc" "8090" 1079.227
18627 "186876" "90138A103" "21VIANET GROUP INC"          "21Vianet Group Inc"                 "7370"   79.576
18992 "186876" "90138A103" "21VIANET GROUP INC"          "21Vianet Group Inc"                 "7370"  162.209
19358 "186876" "90138A103" "21VIANET GROUP INC"          "21Vianet Group Inc"                 "7370"  244.644
19723 "186876" "90138A103" "21VIANET GROUP INC"          "21Vianet Group Inc"                 "7370"  324.879
20088 "186876" "90138A103" "21VIANET GROUP INC"          "21Vianet Group Inc"                 "7370"  463.599
20453 "186876" "90138A103" "21VIANET GROUP INC"          "21Vianet Group Inc"                 "7370"   561.05
20819 "186876" "90138A103" "21VIANET GROUP INC"          "21Vianet Group Inc"                 "7370"  524.525
21184 "186876" "90138A103" "21VIANET GROUP INC"          "21Vianet Group Inc"                 "7370"  521.449
21549 "186876" "90138A103" "21VIANET GROUP INC"          "21Vianet Group Inc"                 "7370"  494.492
18627 "160938" "90214M104" "2242749 ONT LTD"             "2242749 Ont Ltd"                    "3080"  462.797
18992 "160938" "90214M104" "2242749 ONT LTD"             "2242749 Ont Ltd"                    "3080"  453.605
19358 "160938" "90214M104" "2242749 ONT LTD"             "2242749 Ont Ltd"                    "3080"  457.415
19723 "160938" "90214M104" "2242749 ONT LTD"             "2242749 Ont Ltd"                    "3080"  455.522
20088 "160938" "90214M104" "2242749 ONT LTD"             "2242749 Ont Ltd"                    "3080"  473.866
end
format %d srcdate
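
A minimal sketch of one way to do this with merge, using a few rows copied from the two examples above. In the posted examples Dataset 1 has several rows per (srcdate, gvkey, cusip) while Dataset 2 has one, which is why m:1 is assumed here; the variable name matched and the renaming to salecs1 and salecs2 are illustrative choices.

```stata
* A few rows from Dataset 2, with salecs renamed to salecs2
clear
input long srcdate str6 gvkey str10 cusip double salecs2
18627 "265008" "90131G107" 543.963
18627 "179586" "33740N105"  50.174
end
isid srcdate gvkey cusip          // merge m:1 needs a unique key in the using data
tempfile d2
save `d2'

* A few rows from Dataset 1, with salecs renamed to salecs1
clear
input long srcdate str6 gvkey str10 cusip str50 cnms double salecs1
18627 "265008" "90131G107" "Medicare"      239.017
18627 "265008" "90131G107" "Commercial"    272.78
19904 "122519" "68243Q106" "International"  15.127
end

* Keep every observation from both sides; unmatched cells stay missing
merge m:1 srcdate gvkey cusip using `d2'
generate byte matched = (_merge == 3)
sort gvkey srcdate
list srcdate gvkey cnms salecs1 salecs2 matched, sepby(gvkey)
```

By default merge keeps matched and unmatched observations from both datasets, and _merge records where each row came from.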

↧

merge two categorical variables to make a new categorical variable

February 13, 2020, 11:15 pm


How can I make a new categorical variable from two other categorical variables?
Example: I have a variable named labor force and another variable named not in labor force. Both are categorical variables. I want to make a new variable named lfnlf that includes labor force and not in labor force together.

↧

Recursive*vector autoregressive model

February 14, 2020, 12:39 am


Hi,

I would like to program a basic recursive vector autoregressive model using a Cholesky decomposition with the method-of-moments estimation method. I am trying to reproduce model 1 in section "8. Illustration" of the paper at the following link:
http://www.csam.or.kr/journal/view.h....2017.24.5.421

I have tried the following, but I do not know how to impose the structure on the VAR:

Code:

webuse lutkepohl2
tsset

var dln_inv dln_inc dln_consump
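
A sketch of one way to impose a recursive (Cholesky) ordering in Stata is svar with a lower-triangular A matrix and a diagonal B matrix. Note the assumptions: svar estimates by maximum likelihood, not by the method of moments used in the linked paper, and the default lag choice here is not taken from that paper, so this is not a reproduction of its model 1.

```stata
webuse lutkepohl2, clear
tsset

* Missing values (.) mark free parameters; fixed entries are constraints.
* A lower triangular with a unit diagonal, B diagonal: a recursive ordering
* dln_inv -> dln_inc -> dln_consump
matrix A = (1,0,0 \ .,1,0 \ .,.,1)
matrix B = (.,0,0 \ 0,.,0 \ 0,0,.)

svar dln_inv dln_inc dln_consump, aeq(A) beq(B)
matrix list e(A)
```

After a plain var, the orthogonalized impulse responses (oirf) reported by the irf commands also rest on a Cholesky decomposition in the order the variables are listed.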

↧

Reshaping the multiple variable region-level panel data from the wide to the long-shape format

February 14, 2020, 12:43 am


Dear Statalist,
I have the following three-dimensional, multiple-variable, region-level panel data. The dataset consists of 267 variable series covering 548 regions and 19 years: the variable ID, denoted sid = (var1, var2, var3, …, var267); the region ID, denoted bps_id = (1, 2, 3, …, 548); and the year, denoted year = (2000, 2001, 2002, …, 2018).
The data structure is as follows.

sid bps_id yr2000 yr2001 … yr2018
var1 1 XXX XXX … XXX
var1 2 XXX XXX … XXX
var1 3 XXX XXX … XXX
⸽ ⸽ ⸽ ⸽ ⸽ ⸽
var1 548 XXX XXX … XXX
var2 1 XXX XXX … XXX
var2 2 XXX XXX … XXX
var2 3 XXX XXX … XXX
⸽ ⸽ ⸽ ⸽ ⸽ ⸽

Now, I would like to reshape the data to long format by variable, as follows.

bps_id year var1 var2 … var267
1 2000 XXX XXX … XXX
2 2000 XXX XXX … XXX
⸽ ⸽ ⸽ ⸽ ⸽
548 2000 XXX XXX … XXX
1 2001 XXX XXX … XXX
2 2001 XXX XXX … XXX
⸽ ⸽ ⸽ ⸽ ⸽
548 2001 XXX XXX … XXX

I use the following code.
reshape long yr var, i(sid bps_id) j(year)
However, the result is not what I expected:

Data wide -> long
Number of obs. 146316 -> 2.8e+06
Number of variables 43 -> 27
j variable (19 values) -> year
xij variables:
yr2000 yr2001 ... yr2018 -> yr
var2000 var2001 ... var2018 -> var

I use Stata version 16.

Would you help me with your expertise?
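
One way to get from the first layout to the second is two reshapes: first long over the year stub yr, then wide over sid. The sketch below uses a tiny made-up dataset with two variables, two regions, and two years in place of the real 267 x 548 x 19 data; the values are invented.

```stata
* Toy stand-in for the wide data (values are made up)
clear
input str4 sid bps_id yr2000 yr2001
"var1" 1 10 11
"var1" 2 20 21
"var2" 1 30 31
"var2" 2 40 41
end

* Step 1: years go from columns to rows (only the stub yr is reshaped)
reshape long yr, i(sid bps_id) j(year)

* Step 2: the values of sid become columns yrvar1, yrvar2, ...
reshape wide yr, i(bps_id year) j(sid) string

* Drop the yr prefix so the columns are named var1, var2, ...
rename yr* *
sort year bps_id
list, sepby(year)
```

The original command listed two stubs (yr and var), which is why Stata also looked for var2000, var2001, and so on.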

↧

Twin Fixed effects on panel data

February 14, 2020, 1:18 am


Hi all,

I have a dataset of 7000 employees spread over 600 companies. I want to assess the impact of a certain policy on the pay gap between men and women. I want to do this with twin fixed effects, taking company 1 as twin 1, company 2 as twin 2, and so on. I just have some trouble pairing them up. Is there a command for this? And how do I restrict my dataset to twins?

I know I have to collapse the data, I just can't figure out how.

Example of data: (image not available)

↧

Write a function correctly

February 14, 2020, 1:29 am


Hello everybody

Can anyone tell me how to write this formula in Stata?

(formula image not available)

Thanks so much

↧

longitudinal data

February 14, 2020, 1:33 am


I would be very pleased if someone could provide references on how to clean longitudinal data. Thank you.

↧

How do I get the mean of all the separate regressions?

February 14, 2020, 2:55 am


Hey,

I have multiple regressions for the years 1963 to 2004, and I want the mean of all the coefficients instead of a separate coefficient for every year. So, I would like to have the 41 yearly observations in one regression.
years earnings lagged earnings (-1 year)
1963 xxx xxx
1964 xxx xxx
…
2004 xxx xxx

Now I have the command: bysort Year (Company_Key): regress defl_Earnings defl_Earnings_sd_L1
This gives me 41 separate regressions instead of the aggregated coefficients over all the years. Thanks in advance!
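
If the goal is the average of the yearly coefficients, one option is to collect them with statsby and then summarize them. The sketch below uses simulated data with the variable names from the post; the data-generating values are made up.

```stata
* Simulated stand-in: 42 years (1963-2004), 10 firms per year
clear
set seed 1
set obs 420
generate Year = 1963 + mod(_n - 1, 42)
generate defl_Earnings_sd_L1 = rnormal()
generate defl_Earnings = 0.5*defl_Earnings_sd_L1 + rnormal()

* One regression per year; the data are replaced by one row of coefficients per year
statsby _b, by(Year) clear: regress defl_Earnings defl_Earnings_sd_L1

* Mean (and spread) of the yearly slope and intercept
summarize _b_defl_Earnings_sd_L1 _b_cons
```

If instead a single pooled coefficient is wanted, running regress once without the bysort prefix uses all years in one regression.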

↧

Stata runs very slow when estimating mixed models in a multiple imputed database

February 14, 2020, 3:03 am


Dear Statalisters, I am using mixed in Stata 13 to assess the effect of a treatment on four different measures of physical activity in a randomised trial. The problem is that the models run very slowly, often taking more than 45 minutes to produce output, if they produce any at all.

My database has over 1300 cases and is multiply imputed, with the mi command generating 50 imputed datasets. The dependent variable is Activity (continuous), and the model adjusts for a number of covariates, both continuous and categorical, and for the baseline value of the dependent variable, Activity_b. The model also includes 3 fixed effects: treatment (with values 1,2,3), centre (with values 1,2,3,4 for the participating sites) and frailty (with categories frail, pre-frail and no frail), and one random effect defined by Couple, to account for cluster-randomisation of individuals in a couple to the same treatment.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input double Activity byte treatment double centre float frailty double Activity_b int Couple 
                . 2 1 2 682.0952380952385  1
                . 2 1 1 732.4722222222217  2
739.7619047619044 2 1 1 669.7380952380956  2
853.2222222222217 1 1 1 870.5714285714286  4
622.2619047619044 3 1 1 588.6666666666671  5
                . 1 1 2 699.5476190476186  6
                . 3 1 1  631.305555555555  7
805.8809523809529 1 1 1 737.3095238095242  8
                . 2 1 1 739.2380952380956  9
                . 2 1 1  644.305555555555 10
                . 2 1 1 697.2857142857143 10
788.5714285714286 3 1 1 768.8809523809529 12
766.7857142857143 1 1 0 761.0476190476186 13
640.4047619047615 2 1 0 657.7857142857143 14
                . 3 1 0 695.6666666666671 15
                . 3 1 2 714.7619047619044 16
793.3095238095242 3 1 0 776.2619047619044 17
  746.97619047619 3 1 0 751.0714285714286 17
                . 2 1 2 799.3333333333329 19
                . 2 1 0               845 20
end
label values frailty frailty_lbl
label def frailty_lbl 0 "No frail", modify
label def frailty_lbl 1 "Pre-frail", modify
label def frailty_lbl 2 "Frail", modify

The command is

Code:

mi estimate: mixed Activity i.treatment i.centre i.frailty $cov Activity_b || Couple:, residuals(independent, by(centre))

where cov is the list of covariates defined as global:

Code:

global cov "covariate1 covariate2 covariate3"

I've tried it on two different computers (Stata 13 and Stata 15.1), and the models run extremely slowly on both. What can you suggest to increase the speed of estimation?

Thanks a lot !

Marta

↧


merging two year database company keys with one year observations database

February 14, 2020, 3:13 am


Hi all,
I want to match observations from a database with two years of observations to a database with one year of observations, on the basis of company keys (see below). Can someone suggest how to match the company keys below from the two-year database with the one-year database? I know there is a merge command, but it does not work properly for me.

Regards and thank you for the effort in replying to me.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input long Company_Key
1010
1010
1010
1010
1010
1010
1010
1010
1010
1010
1010
1036
1036
1036
1040
1040
1040
1040
1040
1040
1040
1043
1043
1043
1043
1044
1045
1045
1045
1045
1045
1045
1045
1045
1045
1045
1045
1045
1045
1045
1045
1045
1045
1045
1045
1045
1045
1045
1045
1048
1048
1048
1048
1048
1048
1048
1075
1075
1075
1075
1075
1075
1075
1075
1075
1075
1075
1075
1075
1075
1075
1075
1075
1075
1075
1075
1075
1075
1075
1075
1075
1078
1078
1078
1078
1078
1078
1078
1078
1078
1078
1078
1078
1078
1078
1078
1078
1078
1078
1078
end

Listed 100 out of 24128 observations

↧

Converting globals in to scalars

February 14, 2020, 3:56 am


Hi all

I have a question. In my do-file I created some globals in which I stored the values of some previously defined scalars. I need to convert these globals back into scalars in order to do some arithmetic operations. This is what the code looks like:

* generate the globals

global country = "IT UK FR"
global years = "2018 2019"
global vars = "x y z"

foreach ctr of global country {
    foreach yrs of global years {
        foreach var of global vars {
            scalar `ctr'_`yrs'_name = 0
            ...
            scalar `ctr'_`yrs'_x = scalar(`ctr'_`yrs'_name) + x
            global `ctr'_`yrs'_x = `ctr'_`yrs'_x
            ...
            scalar `ctr'_`yrs'_y = scalar(`ctr'_`yrs'_name) + y
            global `ctr'_`yrs'_y = `ctr'_`yrs'_y
            ...
            clear
        }

        * display the content of the globals

        dis `ctr'_`yrs'_x
        dis `ctr'_`yrs'_y

        * -> here I'd like to sum the two and divide the total by "z"
        * (`ctr'_`yrs'_x + `ctr'_`yrs'_y) / z

    }
}

I converted the scalars to globals because I issue the command clear at the end of the third loop (I use clear since this third loop actually loops over different datasets) and, if I'm right, clear deletes scalars but not globals. Now, after the third loop ends, while running the second loop, I need to do the mathematical operations with my globals.
How can I do this?

Thank you in advance
Nicolò
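
Two points may help here, shown in the sketch below. First, clear on its own drops the data and value labels but leaves scalars in memory (clear all is what removes them), so the detour through globals may not be needed. Second, a global whose name is built from locals can be read back with ${...} and used directly in an expression. The values 3 and 5 and the divisor z = 2 are made up for the illustration.

```stata
clear all
set obs 5
generate x = _n

* Made-up values standing in for the results computed inside the loops
scalar IT_2018_x = 3
scalar IT_2018_y = 5
scalar z = 2

* Copy the scalars into globals, as in the post
global IT_2018_x = IT_2018_x
global IT_2018_y = IT_2018_y

* -clear- by itself drops the data only; scalars and globals both remain
clear

local ctr IT
local yrs 2018

* Read the globals back by name and do the arithmetic in a scalar
scalar total = (${`ctr'_`yrs'_x} + ${`ctr'_`yrs'_y}) / z
display total

* The original scalar is still available after -clear-
display scalar(`ctr'_`yrs'_x)
```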

↧

replace if with no real changes made

February 14, 2020, 4:39 am


Dear Statalist,

I have a variable State with storage type str3. I want to create a dummy variable that is 1 when State equals "CT" and 0 otherwise. So I generate a new dummy as below:

gen New_England = 0
replace New_England = 1 if State == "CT"

But 0 real changes are made.

I am confused about which step is wrong. Forgive me if this is a silly, easy question. And thank you for your help!

Best regards
Lijuan
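
The two commands are fine as written, so a likely cause (an assumption, since the data are not shown) is that the stored strings are not exactly "CT": a str3 variable leaves room for a leading or trailing blank, and the case may differ. The toy data below are invented to illustrate the check.

```stata
* Invented examples of values that look like CT but are not equal to "CT"
clear
input str3 State
"CT "
" CT"
"ct"
"NY"
end

* Exact comparison: none of the rows above match
count if State == "CT"

* Trim blanks and ignore case before comparing
generate byte New_England = (upper(strtrim(State)) == "CT")
list, clean
```

strtrim() needs Stata 14 or later; trim() does the same job in older versions. Running tabulate State, or list State if strpos(State, "CT"), on the real data would show what is actually stored.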

↧

foreach loop with fe (fixed effects) and constraints

February 14, 2020, 4:41 am


Hi

I want to run fixed-effects regressions together with some restrictions (constraints) for the 250 cross-sections that I have. The data are unbalanced quarterly data from 2005 to 2019. I want to run this regression individually for each cross-section, i.e., it should be a time-series regression for each cross-section.

I am not able to specify fe with cnsreg for constrained regression.

Any help would be appreciated.
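
When the regression is run separately for each cross-section, the unit's fixed effect is just that regression's constant, so cnsreg inside a loop may be all that is needed. The sketch below uses simulated data; the variable names id, y, x1, x2 and the constraint x1 + x2 = 1 are placeholders, not taken from the post.

```stata
* Simulated stand-in: 3 cross-sections with 40 periods each
clear
set seed 42
set obs 120
generate id = ceil(_n/40)
bysort id: generate t = _n
generate x1 = rnormal()
generate x2 = rnormal()
generate y = id + 0.3*x1 + 0.7*x2 + rnormal()

* Placeholder constraint; replace with the actual restrictions
constraint define 1 x1 + x2 = 1

* One constrained time-series regression per cross-section
levelsof id, local(ids)
foreach i of local ids {
    quietly cnsreg y x1 x2 if id == `i', constraints(1)
    display "id `i': b[x1] = " %6.3f _b[x1] "  b[x2] = " %6.3f _b[x2] "  cons = " %6.3f _b[_cons]
}
```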

↧

Correlation of the change of the current and the previous variable (autocorrelation)

February 14, 2020, 5:02 am


Hi, I am trying to find the autocorrelation between the current earnings change and the past earnings change, but I have no idea how to compute that using the lagged and current variables. An example of my sample and what I am trying to achieve is:

Year	Earnings	Lagged earnings (-1 year)	Earnings change	Past earnings change
1963	200	250	50	0
1964	300	200	100	50
1965	400	300	100	100
1966	425	400	25	100
1967	475	425	50	25
1968	410	475	65	50
1969	400	410	10	65
1970	500	400	100	10
1971	530	500	30	100

So first, you have to compute the earnings change and the past earnings change and then calculate the overall correlation between them. Thank you in advance!
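
A sketch using the Earnings and lagged-earnings columns from the table above. Two assumptions differ from the table: the change is computed with its sign (for 1963, 200 - 250 = -50, where the table shows 50), and the past change for the first year is left missing instead of 0. The variable names are made up.

```stata
clear
input Year Earnings LagEarnings
1963 200 250
1964 300 200
1965 400 300
1966 425 400
1967 475 425
1968 410 475
1969 400 410
1970 500 400
1971 530 500
end

* Declare the time variable so the lag operator L. can be used
tsset Year

generate d_earn      = Earnings - LagEarnings   // current earnings change
generate d_earn_past = L.d_earn                 // previous year's change

correlate d_earn d_earn_past
```

With several companies in the data, xtset Company_Key Year in place of tsset would keep the lags within each company.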

↧

Multivariate logistic regression

February 14, 2020, 5:05 am


code for Multivariate analysis

↧

descriptive statistics: change the N from days to months

February 14, 2020, 5:23 am


Hi,

I'm trying to replicate the descriptive statistics of a paper. The N of the descriptive statistics is in months. My data are in days. I already have a variable mofd created from the date with the mofd() function. How can I use the summarize command in Stata to produce the descriptive statistics by month rather than by day?

Kind regards
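
One way to make N count months is to collapse the daily data to one row per month and then summarize. The sketch below uses simulated daily data; the variable ret and the choice of the monthly mean are assumptions, since how the paper aggregates days into months (mean, sum, last value) is not stated in the post.

```stata
* Simulated daily data: 1 Jan 2019 to 31 Mar 2019 (90 days)
clear
set seed 7
set obs 90
generate date = mdy(1, 1, 2019) + _n - 1
format date %td
generate ret = rnormal()

* Monthly date built from the daily date
generate month = mofd(date)
format month %tm

* One observation per month (the mean is an assumed aggregation rule)
collapse (mean) ret, by(month)

* N now counts months, not days
summarize ret
```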

↧

loop over nested macro

February 14, 2020, 6:44 am


I want to plot many two-way plots, therefore I have many pairs of X's and Y's.

I want to use a macro and loop over the plot command:

Code:

local pair1 X1 Y1
local pair2 X2 Y2
local pair3 X3 Y3
local pair4 X4 Y4

local pairs pair1 pair2 pair3 pair4

********************** plot two-way scatter graphs **********************

foreach pair in `pairs' {

    graph twoway (scatter `pair')

}

Why does it not work? How can I make it work? Thank you very much!
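
Inside the loop, `pair' expands to the name pair1, not to the contents of the local pair1, so scatter is asked to plot a variable called pair1. Expanding twice, with ``pair'', gives the variable names. The sketch below uses the auto dataset shipped with Stata in place of X1, Y1, and so on; the variable pairs are arbitrary.

```stata
sysuse auto, clear

* Each local holds one pair of variables (scatter plots the first against the second)
local pair1 mpg weight
local pair2 price length

local pairs pair1 pair2

foreach pair of local pairs {
    * `pair' is the macro name (pair1); ``pair'' is its contents (mpg weight)
    twoway scatter ``pair'', name(`pair', replace)
}
```

The name() option keeps each graph in memory under its own name, so later graphs do not overwrite earlier ones.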

↧

how to add the labels of axis for 3D graphs

February 14, 2020, 6:45 am


I want to add labels to the x axis and y axis. How can I do that? Below are my code and data:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input int PY byte Num float id
2019  2 5
2019  2 2
2019 23 3
2019  2 4
2019  2 1
2018  3 1
2018 16 3
2018  3 4
2018  2 2
2017  7 1
2017  3 4
2017 12 3
2017  1 5
2016  2 1
2016 10 3
2016  1 2
2016  4 4
2016  1 5
2015  5 3
2015  3 4
2015  1 1
2014  3 1
2014  2 2
2014  3 3
2014  1 4
2013  2 5
2013  8 3
2013  1 2
2013  4 4
2013  1 1
2012  3 4
2012  6 3
2012  1 1
2011  3 1
2011  2 3
2011  4 4
2010  2 2
2010  1 3
2010  4 4
2010  1 1
2009  1 1
2009  3 3
2009  1 2
2008  1 1
2008  3 3
2008  1 4
2007  2 3
2007  1 4
2006  2 2
2006  1 3
2006  2 4
2006  1 1
2005  1 1
2005  1 2
2005  3 3
2009  1 5
2005  1 5
2004  2 2
2004  1 1
2004  3 5
2003  2 5
2003  2 4
2003  2 1
2003  2 3
2003  1 2
2002  4 3
2002  2 2
2001  1 2
2001  2 5
2001  2 3
2001  1 4
2000  1 1
2000  4 2
2000  3 5
2000  1 3
2000  1 4
end

scat3 id PY  Num,sch(tufte) scale(0.29) titlez(, mlabang(0) mlabpos(5)  mlabs(vhuge)) titlex(,mlabs(vhuge)) titley(,mlabs(vhuge))

↧