Channel: Statalist

Different output from estimates table when using stored estimates

How do I preserve value labels when running estimates table with stored estimates?

If I run estimates table immediately after I fit a model, the output shows the correct value labels; however, if I store the estimate, fit another model, and then try to use estimates table with the stored estimate corresponding to the first model, the value labels are lost.

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input double ped_slight_inj float season byte weather_conditions
0 2 1
0 2 1
0 2 1
0 2 1
0 2 1
0 2 1
0 2 1
0 2 1
0 2 1
0 2 1
0 1 1
1 1 1
0 1 2
0 1 1
0 1 1
0 1 1
0 1 2
0 1 1
0 1 1
0 1 1
end
label values season season
label def season 1 "Winter", modify
label def season 2 "Spring", modify
label values weather_conditions weather_conditions
label def weather_conditions 1 "Fine", modify
label def weather_conditions 2 "Raining", modify

// First regression
qui logit ped_slight_inj i.season

// Store first regression
est store season

// First estimates table
estimates table season

// Second regression
qui logit ped_slight_inj i.weather_conditions

// Second estimates table from stored result of first regression
estimates table season

How to save my y and residual errors from a regression loop


Could you tell me why I cannot save my y or my residuals here? Everything works except the last part, and I don't know how to store y.
In addition, the looping over values is not working; it only does the 10 observations specified by local mc = 10.

Thanks much!
Code:
clear
local mc = 10

set obs `mc'

g data_store_x3 = .
g data_store_x2 = .
g data_store_con = .
g data_store_y = .

quietly {
    forvalues i = 1(1)`mc' {
        if floor((`i'-1)/100) == ((`i'-1)/100) {
            noisily display "Working on `i' out of `mc' at $S_TIME"
        }

        preserve
        clear
        set obs 10
        g x2 = rnormal()
        g x3 = rnormal()
        g e = runiform()
        g y = 1 - 3*x2 + 2*x3 + e
        reg y x2 x3
        local x2coeff = _b[x2]
        local x3coeff = _b[x3]
        local const = _b[_cons]
        restore
        replace data_store_x3 = `x3coeff' in `i'
        replace data_store_x2 = `x2coeff' in `i'
        replace data_store_con = `const' in `i'
    }
}
summ data_store_con data_store_x2 data_store_x3 data_store_y
display e(rmse)
predict res, resid
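
For what it's worth, a minimal sketch of one way to collect the constant, the slopes, and e(rmse) from every replication, assuming a Monte Carlo summary is the goal: write each replication's results to a -postfile- instead of replacing values in the master dataset (the file name mc_results is made up). Per-observation residuals exist only within a replication, so a summary statistic such as e(rmse) is posted instead.

Code:
clear
local mc = 10
tempname sims
postfile `sims' double(b_cons b_x2 b_x3 rmse) using mc_results, replace
forvalues i = 1/`mc' {
    quietly {
        clear
        set obs 10
        gen x2 = rnormal()
        gen x3 = rnormal()
        gen e  = runiform()
        gen y  = 1 - 3*x2 + 2*x3 + e
        regress y x2 x3
    }
    * post the estimates and the fit statistic for this replication
    post `sims' (_b[_cons]) (_b[x2]) (_b[x3]) (e(rmse))
}
postclose `sims'
use mc_results, clear
summarize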

Calculating the Employment-Weighted Mean Differential

I am currently investigating wage differentials by industry in Germany. My dataset gives me cross-sectional data per individual, with information such as industry, wage, etc. To find the uncontrolled wage differentials per industry, I ran the following command:

reg lnwage i.industry

Stata then returns coefficients for each industry (which represent the wage differential of that industry). I now wish to calculate the employment-weighted mean differential. I obtain the number of employed people in each industry with the following command:

tab industry

Now, I want to weight each coefficient obtained from the regression by its frequency, to get the weighted average differential across all industries. For example, if the industry "Farming" has a coefficient of 0.4 and employs 30% of individuals, while "Mining" has a coefficient of -0.2 and employs 70% of the individuals, then the employment-weighted mean differential is (0.4 * 0.3) + (-0.2 * 0.7) = -0.02

Ideally, I then want to present the difference between each industry's coefficient and the employment-weighted mean differential in a table.

Does anybody have an idea how I can execute this?
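
A minimal sketch of the calculation described above, assuming the variables really are named lnwage and industry and that industry is numeric; the weights are employment shares within the estimation sample, and the base industry enters with a coefficient of 0:

Code:
reg lnwage i.industry
levelsof industry if e(sample), local(inds)
local wmean = 0
foreach i of local inds {
    quietly count if industry == `i' & e(sample)
    local wmean = `wmean' + (r(N)/e(N)) * _b[`i'.industry]
}
display "Employment-weighted mean differential = " `wmean'
* difference between each industry's coefficient and the weighted mean
foreach i of local inds {
    display "industry `i': " _b[`i'.industry] - `wmean'
}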

Difference between using l1.var and previously generated lagged variable

Dear Stata users,

I am using Stata 16 on Windows 10 and I'm working on a quarterly dataset of over 10,000 companies.
Code:
xtset
       panel variable:  gvkey (unbalanced)
        time variable:  fyearq_, 1996q2 to 2008q2, but with gaps
                delta:  1 quarter
I was looking at a variable for the average assets of a company in a given quarter and I noticed something strange. For my work the variable has to be created like this:
'Average assets = ((Total assets) + (lagged Total assets)) / 2'. The strange thing that occurred is that the variable "Average assets" differs depending on whether I use l1.[Total assets] or a previously generated variable for the lagged Total assets. I provide sample data and the code I used, and I explain at the end why I didn't create more straightforward variable names.
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input double gvkey float fyearq_ double atq2 float(t2_atq_L1 t2_avg_assets t3_avg_assets)
1004 146 449.645       .        .        .
1004 147  468.55 449.645 459.0975 459.0975
1004 148 523.852  468.55  496.201  496.201
1004 149 529.584 523.852  526.718  526.718
1004 150 542.819 529.584 536.2015 536.2015
1004 151 587.136 542.819 564.9775 564.9775
1004 152 662.345 587.136 624.7405 624.7405
1004 153 670.559 662.345  666.452  666.452
1004 154 707.695 670.559  689.127  689.127
1004 155 737.416 707.695 722.5555 722.5555
1004 156 708.218 737.416  722.817  722.817
1004 157  726.63 708.218  717.424  717.424
1004 158 718.913  726.63 722.7715 722.7715
1004 159 747.043 718.913  732.978  732.978
1004 160 753.755 747.043  750.399  750.399
1004 161 740.998 753.755 747.3765 747.3765
1004 162 747.543 740.998 744.2705 744.2705
1004 163 772.941 747.543  760.242  760.242
1004 164 754.718 772.941 763.8295 763.8295
1004 165 701.854 754.718  728.286  728.286
1004 166 758.503 701.854 730.1785 730.1785
1004 167 714.208 758.503 736.3555 736.3555
1004 168 690.681 714.208 702.4445 702.4445
1004 169 710.199 690.681   700.44   700.44
1004 170 722.944 710.199 716.5715 716.5715
1004 171 727.776 722.944   725.36   725.36
1004 172 723.019 727.776 725.3975 725.3975
1004 173 686.621 723.019   704.82   704.82
1004 174 676.345 686.621  681.483  681.483
1004 175 666.178 676.345 671.2615 671.2615
end
format %tq fyearq_
Now to really explain the issue, here is the code I used and the output. The variable for "Total assets" is atq2
Code:
gen t2_avg_assets=((atq2)+(l1.atq2))/2
(15,545 missing values generated)

. gen t2_atq_L1 = l1.atq2
(14,933 missing values generated)

. gen t3_avg_assets=((atq2)+(t2_atq_L1))/2
(15,545 missing values generated)

. * t2_avg_assets and t3_avg_assets should be same, but they aren't:

. compare t2_avg_assets t3_avg_assets

                                        ---------- difference ----------
                            count       minimum      average     maximum
------------------------------------------------------------------------
t2_avg_~s<t3_avg_~s         14814     -.0078125    -.0000578   -2.33e-10
t2_avg_~s=t3_avg_~s        217381
t2_avg_~s>t3_avg_~s         14735      2.33e-10     .0000563    .0039063
                       ----------
jointly defined            246930     -.0078125    -1.06e-07    .0039063
jointly missing             15545
                       ----------
total                      262475
First I create the 'Average assets' variable using the lag operator L. Then I create a one-period lagged variable of atq2, again using the lag operator L. Then I create the 'Average assets' variable a second time, but instead of applying the lag operator directly I use the lagged variable I just created. To me the two variables should be identical, but the compare command shows that they aren't. So my question is: how can these two 'Average assets' variables not be identical?

In preparation for this post I created variables with easier to understand names. But by doing this another question emerged.
Code:
gen assetstotalqtly = atq2
(741 missing values generated)

. gen assetstotalqtly_L1 = l1.assetstotalqtly
(14,933 missing values generated)

. gen averageassets = ((assetstotalqtly)+(assetstotalqtly_L1))/2
(15,545 missing values generated)

. gen test_averageassets = ((assetstotalqtly)+(l1.assetstotalqtly))/2
(15,545 missing values generated)

. compare averageassets test_averageassets

                                        ---------- difference ----------
                            count       minimum      average     maximum
------------------------------------------------------------------------
average~s=test_av~s        246930
                       ----------
jointly defined            246930             0            0           0
jointly missing             15545
                       ----------
total                      262475

compare assetstotalqtly atq2

                                        ---------- difference ----------
                            count       minimum      average     maximum
------------------------------------------------------------------------
assetst~y<atq2             123741       -.00625    -.0000155   -5.96e-11
assetst~y=atq2              12729
assetst~y>atq2             125264      2.61e-11     .0000156      .00625
                       ----------
jointly defined            261734       -.00625     1.46e-07      .00625
jointly missing               741
                       ----------
total                      262475

How are assetstotalqtly and atq2 not identical when I created the first by telling Stata it equals the latter? And why doesn't the issue described above occur there?

I hope I described everything well enough; if not, feel free to let me know. Thank you in advance!
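
For what it's worth, a minimal sketch of one possible explanation (an assumption, not a confirmed diagnosis): -generate- stores new variables as float by default, so a generated copy of the double variable atq2 (or of its lag) is rounded to float precision, whereas l1.atq2 used directly inside an expression keeps double precision. The same rounding would explain why assetstotalqtly differs from atq2. Forcing double storage should make the comparison come out equal:

Code:
* hypothetical check: same construction as above, but with explicit double types
gen double t2_atq_L1_dbl     = l1.atq2
gen double t2_avg_assets_dbl = (atq2 + l1.atq2)/2
gen double t3_avg_assets_dbl = (atq2 + t2_atq_L1_dbl)/2
compare t2_avg_assets_dbl t3_avg_assets_dbl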

keeping people with the longest duration

Hi all,

I am working with some data and I am trying to keep the people with the longest duration across three groups (dating, cohabiting, and married). I am only trying to keep the people with the longest relationships.
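
A hypothetical sketch, since no variable names are given: assuming one row per relationship with variables personid, group (dating/cohabiting/married), and duration, this keeps each person's longest relationship:

Code:
* longest relationship per person (ties keep an arbitrary one)
bysort personid (duration): keep if _n == _N
* or, longest relationship per person within each group:
* bysort personid group (duration): keep if _n == _N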

Urgent Stata query, please help

I am a complete novice with Stata and am trying to do a subgroup meta-analysis, but it won't let me create the subgroups, or maybe I am doing it wrong. I am also struggling to create a funnel plot for the data. The data are continuous and compare a control group against an intervention. Please help; I have a deadline on Monday morning. Thanks in advance!

Adding the scores of imputed group variable

Hello Statalist,

I have a multiply imputed dataset that looks like this

Country   A   B   C   _mi_m
      1   2   2   1       0
      1   2   2   1       0
      1   2   2   1       0
      3   5   3   5       0
      3   5   3   5       0
      3   5   3   5       0
      1   8   8   8       1
      1   8   8   8       1
      1   8   8   8       1
      3   4   4   4       1
      3   4   4   4       1
      3   4   4   4       1
(Note that the real dataset has more observations per Country and more imputations; the above just shows the format.)


I'd like to generate a new variable X by adding together the value of C for country 1, country 3, and so on. In the example above, I would like X = 1 + 5 for the dataset with _mi_m = 0 and X = 8 + 4 for the dataset with _mi_m = 1. How can I achieve this?
I use Stata 16.0

Best
Ikenna Egwu
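
One possible approach, as a minimal sketch, assuming (as in the example) that C is constant within Country for a given imputation, so only one C value per Country should enter the sum:

Code:
* tag one observation per Country within each imputation, then sum tagged C
egen onepc = tag(_mi_m Country)
egen X = total(C * onepc), by(_mi_m)
drop onepc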

Taking more time in loop (foreach/forvalues)?

Dear All, I notice that if I run 100 regressions in a loop and save some statistics, each regression takes, say, 0.1 seconds. However, when I run 30,000 regressions, the average time per additional regression keeps growing. Does anyone know why this is happening (is it memory?)? Thanks.

How do I use two j variables with the reshape command? Use reshape twice?

Hello,
I have a wide data set that contains the variables:
PUBID (individual ID)
startdate__njob_year

Ideally, I want to have two j variables (year and njob)
so that the data could look like this:
[example layout attached in the original post]

What should I do? My professor suggests that I could do reshape twice.
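
A hypothetical sketch of the double reshape, using made-up wide variable names of the form startdate1_1997, startdate2_1997, ... (job number and year embedded in the name); the real stubs will differ, so adjust accordingly:

Code:
* first reshape: pull the year suffix out of the variable names
reshape long startdate1_ startdate2_ startdate3_, i(PUBID) j(year)
* second reshape: pull the job number out
rename startdate*_ startdate*
reshape long startdate, i(PUBID year) j(njob)
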
Thanks in advance!

How to match two datasets with constraints?

Hello,

This is my first time posting on this forum so thank you to everyone in advance.

The simplest way for me to describe my problem is as follows: I have two datasets. The first dataset consists of a list of suppliers and their capacities. It looks like

Code:
<supplierid> <capacity>
A 20
B 30
C 10
D 15
The second dataset consists of buyers, how much they want to buy, and who they want to buy from (preferences).

Code:
<buyerid> <quantity> <preference> <supplierid>
1 15 1 A
1 15 2 C
2 10 1 A
2 10 2 C
3 20 1 B
3 20 2 A
To explain a little bit further, each buyer wants to buy a fixed quantity. Each buyer also has a preference for who they want to buy from (in general, buyers are not willing to buy from all suppliers). A buyer must buy the entire quantity from one supplier, i.e. it is not possible for a buyer to buy 1 unit from supplier A and 1 unit from supplier C. Suppliers don't have preferences and don't care who they sell to.

What I want to do is make my way down the list of buyers and assign a supplier to each buyer (to the extent that I can). So for example, buyer 1 will buy 15 units from supplier A. Buyer 2 will buy 10 units from supplier C (since supplier A won't have 10 units of capacity left after buyer 1 has purchased 15 units), and buyer 3 will buy 20 units from supplier B. I want my final dataset to look as follows:

Code:
<buyerid> <quantity> <sellerid>
1 15 A
2 10 C
3 20 B
Total capacity is much lower than total quantity demanded so, at the end, most buyers will not be able to buy anything.
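
A rough sketch of the greedy pass described above (an untested outline, assuming the two files are saved as suppliers.dta and buyers.dta with exactly the variable names shown; it will be slow for very large files but shows the idea):

Code:
use suppliers, clear
* load each supplier's remaining capacity into a local macro
forvalues i = 1/`=_N' {
    local cap_`=supplierid[`i']' = capacity[`i']
}
use buyers, clear
sort buyerid preference
gen str10 assigned = ""
local served ""                                  // buyers already matched
forvalues i = 1/`=_N' {
    local b = buyerid[`i']
    if strpos(" `served' ", " `b' ") continue    // this buyer is already served
    local s = supplierid[`i']
    if `cap_`s'' >= quantity[`i'] {
        quietly replace assigned = "`s'" in `i'
        local cap_`s' = `cap_`s'' - quantity[`i']
        local served `served' `b'
    }
}
keep if assigned != ""
keep buyerid quantity assigned
rename assigned sellerid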

Any help will be greatly appreciated.


Thanks!

Formatting Several Variables using a Loop


Hello,
I have several variables (68) consisting of strings, integers, and floats. I am trying to format them using a loop with an if-else condition.
Here's the dataex:

Code:
clear all
input str50 var1 int var2 str50 var3 float var4
Sarita            25    Geeta    23.1
mushkanchan 31          Subha    18.9
Kanchandeviparmer    21 Kamla       81.3
Laxmi  23               Subhaparmar    27.1
Ram      21             Sarita        23
Sita      22            Subha     34
Haru       18           Santosh     22
"hari kana" 23       "Santosh K Dash" 23.5
end
The code I used to do the formatting is

Code:
local vars var2 var4
foreach var of varlist var1-var4 {

   * Formatting strings
   local j : word `vars' of `var'
   if j = `var' {
       format %30s
   }
   * Formatting integers and floats
   else if j = `vars' {
       format %20g
   }
}
That code has an error. I was trying an if-else condition but could not succeed. Any help will be greatly appreciated.
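
For what it's worth, a minimal sketch of one way to get the intended effect (an assumption about the goal: string variables get a string display format, numeric ones a numeric format), testing each variable's type instead of matching names against a local:

Code:
foreach var of varlist var1-var4 {
    capture confirm string variable `var'
    if !_rc {
        format `var' %30s        // string variable
    }
    else {
        format `var' %20.0g      // numeric variable (integer or float)
    }
}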

How to resolve numeric overflow when running xtlogit, fe in Stata?

Dear all,
Apologies if this question seems to be a repetition of a previous one; I had made some errors in the title, hence I am posting the corrected question again.
I am getting error r(1400), "combinations results in numeric overflow; computations cannot proceed", when running xtlogit, fe in Stata with 5,738 observations (about 1,900 individuals x 3 rounds).
Please consider the following sample data set for this purpose
Code:
 input str3 ID byte(round hi acc inf shock)

            ID      round         hi        acc        inf      shock
  1. IN1 1 1 0 1 1
  2. IN1 2 1 1 1 1
  3. IN1 3 0 0 1 1
  4. IN2 1 1 1 0 1
  5. IN2 2 0 0 1 0
  6. IN2 3 1 0 0 0
  7. end

. list

     +--------------------------------------+
     |  ID   round   hi   acc   inf   shock |
     |--------------------------------------|
  1. | IN1       1    1     0     1       1 |
  2. | IN1       2    1     1     1       1 |
  3. | IN1       3    0     0     1       1 |
  4. | IN2       1    1     1     0       1 |
  5. | IN2       2    0     0     1       0 |
     |--------------------------------------|
  6. | IN2       3    1     0     0       0 |
     +--------------------------------------+
I set up the panel as follows:

Code:
encode ID, gen(ID1)
drop ID
rename ID1 ID
xtset round ID
However, when I performed
Code:
xtlogit hi inf shock, fe
I got the following
Code:
1,913 (group size) take 1,640 (# positives) combinations results in numeric overflow; computations cannot proceed r(1400)
from the original data set

The same regression with
Code:
xtlogit acc inf shock, fe
returned the regression results in my original data set.

I am confused as to why I am getting a numeric overflow with only 5,738 observations. Also, please suggest a way to resolve this problem.

Thanks and Regards
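
One thing worth checking (an observation about -xtset- syntax, not a confirmed diagnosis of the overflow): xtset expects the panel variable first and the time variable second, so the code above makes the three rounds the groups, each with roughly 1,900 members. Grouping on individuals may be what was intended:

Code:
xtset ID round
xtlogit hi inf shock, fe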

Identify life events in panel data

Hi Statalist. I would like to be able to identify life events within a panel dataset with multiple waves. A large number of questions are asked about the life and experiences of participants within households on an annual basis, including if they 'married', 'have a baby', 'separate from spouse' or if they experience a 'worsening in finances', such as bankruptcy and so on (coded lemarr, lebth, lesep and lefnw). These variables are based on a [1] No, [2] Yes response.

I would like to be able to search through the waves of data (which I've already merged into a single file) to identify changes in an individual's self-rated life satisfaction over time based on the effect of certain life events or experiences throughout the panel data. While many of these related questions are asked annually, if the respondent experiences a given life event, they are asked for further information, such as the quarter in which the life event occurred so it would be good to factor this in if possible. Finally, I have data on both the respondent and their partner and this is represented in the naming of variables, such as lemarr (for the respondent) and p_lemarr (for the partner).

I have the following code, which someone else helped me with for a different piece of work, but I am not sure whether it is good code or suitable for this task, so I'd appreciate any suggestions or guidance:

Code:
sort xwaveid wave
cap drop droppout 
gen droppout = mi(mrcurr) 

cap drop change_to_* 
gen change_to_married = 0 if droppout == 0 
bys id (wave): replace change_to_married = 1 if marstat == 1 & marstat[_n-1] != 1 & marstat[_n-1] != . & marstat != . // marstat - marital status  
gen change_to_sep = 0 if droppout == 0
bys id (wave): replace change_to_sep = 1 if marstat == 3 & marstat[_n-1] == 1 & marstat[_n-1] != . & marstat != .
bys id (wave): egen change_to_married_N = sum(change_to_married) if droppout == 0
bys id (wave): egen change_to_sep_N = sum(change_to_sep) if droppout == 0
cap drop nwave 
gen nwave = -wave 
gen timeline_married = .     
replace timeline_married = 0 if change_to_married == 1
bys id (wave): replace timeline_married = timeline_married[_n-1] + 1 if timeline_married == . 
bys id (nwave): replace timeline_married = timeline_married[_n-1] - 1 if timeline_married == . 
order timeline_married, after(change_to_married) 
gen timeline_sep = . 
replace timeline_sep = 0 if change_to_sep == 1     
bys id (wave): replace timeline_sep = timeline_sep_div[_n-1] + 1 if timeline_sep == .  
bys id (nwave): replace timeline_sep = timeline_sep[_n-1] - 1 if timeline_sep == . 
order timeline_sep, after(change_to_sep)        
gen time_single_married = timeline_single_married + 100
gen time_defacto_married = timeline_defacto_married + 100
A sample of the data follows (note that this applies to the respondent only, but could be duplicated to imitate the partner data if needed):
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input byte(lemar lebth lesep lefnw)
1 1 1 1
2 1 1 1
1 1 1 1
1 1 1 1
1 2 1 1
1 1 1 1
1 2 1 1
1 2 1 1
2 1 1 1
1 1 1 1
1 1 1 1
1 2 1 1
1 1 1 1
1 2 1 1
1 2 1 1
1 1 1 1
2 1 1 1
1 1 1 1
1 1 1 1
1 2 1 1
1 1 1 1
1 1 1 1
1 1 1 1
1 2 1 1
1 2 1 1
1 1 1 1
1 1 1 2
1 1 1 1
2 1 1 1
1 1 1 1
1 1 1 1
1 2 1 1
1 1 1 1
1 1 1 1
1 1 1 1
1 2 1 1
1 2 1 1
1 1 1 1
1 1 1 1
1 1 1 2
1 1 1 1
end
label values lemar RLEME
label values lesep RLEME
label values lebth RLEME
label values lefnw RLEME
label def RLEME 1 "[1] No", modify
label def RLEME 2 "[2] Yes", modify
Please let me know if you need more information.

I appreciate any support/guidance. Kind regards, Chris

How to remove the " characters?

Dear All, I have this dataset:
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str8 k
`""020130""'
`""020421""'
`""030490""'
`""030569""'
`""030611""'
`""030613""'
`""040700""'
`""060290""'
`""070190""'
`""070990""'
end
How can I remove the leading and trailing " characters from each value of k? Thanks.
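
A minimal sketch of one way to do it: the unwanted characters are literal double quotes, char(34), so subinstr() can strip them:

Code:
replace k = subinstr(k, char(34), "", .)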

Draw figure like this?

Dear All, I have this dataset
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str21 agency float(year marketshare)
"AA" 2014 .016206777
"AA" 2015 .034153644
"AA" 2016   .0821594
"AA" 2017 .017304044
"AA" 2018 .023902394
"BB" 2014 .036147483
"BB" 2015 .007484141
"BB" 2016  .06066402
"BB" 2017 .017952515
"BB" 2018     .01004
"CC" 2012    .516475
"CC" 2013   .9294872
"CC" 2014   .5161464
"CC" 2015   .4879826
"CC" 2016  .48794785
"CC" 2017  .52657396
"CC" 2018   .5150023
"DD" 2012  .12227561
"DD" 2014 .004835254
"DD" 2016 .011968442
"DD" 2017  .01982715
"DD" 2018 .003805023
"EE" 2012   .3612493
"EE" 2013 .070512824
"EE" 2014    .426664
"EE" 2015  .47037965
"EE" 2016   .3572603
"EE" 2017   .4183423
"EE" 2018   .4472503
end
and wish to draw a figure like the one attached to the original post (image not reproduced here).
Any suggestions? Thanks.
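
Since the attached figure is not reproduced here, only a guess: if the goal is a line plot of market share over time with one line per agency, a minimal sketch would be:

Code:
encode agency, gen(ag)
xtset ag year
xtline marketshare, overlay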

Problems with generating group identifier

Dear all,

I'm having a problem with creating a variable that identifies a group. I have a data set with 1,000,000 observations of travel times between 1,000 zones. There are travel times for every origin-destination pair, but I want to check whether the travel time from a to b equals the travel time from b to a.

I think the problem can be represented with the following code:
Code:
clear
input str1(i j)
a b
b a
a b
b b  
end
What I need is code that generates a variable identifying each origin-destination pair while disregarding the order.
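
A minimal sketch of one way to do it: sort the two zone identifiers within each row so that (a, b) and (b, a) collapse to the same pair, then group on the sorted pair:

Code:
gen first  = cond(i <= j, i, j)
gen second = cond(i <= j, j, i)
egen pair  = group(first second)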

Any suggestions are appreciated. Thanks.

Problem with recode and more than one if condition

Dear All,

I have two variables, V1 and V2.

V2 takes the values 1 = Control, 2 = CBE, 3 = Government.

I want to recode some values of V1 with an if condition according to the three different options of V2. It works with separate lines of code, as follows:

recode V1 (5=9)(6=10) if V2==1
recode V1 (5=7)(6=8) if V2==2
recode V1 (5=6)(6=6) if V2==3


But how can I do all three of the above lines in a single command? Does recode accept more than one if condition?
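
For what it's worth, recode takes only a single if qualifier per call, so one way to compress the three lines into one command (an assumption about what "one line" means) is a single replace with nested cond():

Code:
replace V1 = cond(V2==1, cond(V1==5, 9, 10),      ///
             cond(V2==2, cond(V1==5, 7, 8), 6))   ///
             if inlist(V1, 5, 6) & inlist(V2, 1, 2, 3)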

Kindest Regards,
Yousufzai

Comparing means from 6 different samples.

Dear All,

I have 6 different data sets. Say my variable of interest is y, and I want to test whether the mean of y is statistically different across the 6 data sets. What is the best way of doing this? I will be grateful for any direction you may be able to offer.
Sincerely,
Sumedha.
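
A minimal sketch of one common approach (the file names data1.dta ... data6.dta are made up): append the six data sets with a sample identifier and run a one-way ANOVA on y:

Code:
use data1, clear
gen sample = 1
forvalues i = 2/6 {
    append using data`i'
    replace sample = `i' if missing(sample)
}
oneway y sample, tabulate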

Moving 12 - 1 cumulative return

Hey all,

I am trying to replicate a momentum trading strategy for which I'm building portfolios based on the 12-1 cumulative returns. This means that for each observation I calculate the cumulative return over the preceding 12 months, excluding the most recent month. My dataset is ordered like this:
firm_id date company stockprice market_value ret
1 200310 77 BANK (#T) - 77 BANK (#T) 3030 232266.8
1 200311 77 BANK (#T) - 77 BANK (#T) 2880 220768.4 -.049505
1 200312 77 BANK (#T) - 77 BANK (#T) 2915 223451.4 .0121528
1 200401 77 BANK (#T) - 77 BANK (#T) 3020 231500.2 .0360206
1 200402 77 BANK (#T) - 77 BANK (#T) 2875 220385.1 -.0480132
1 200403 77 BANK (#T) - 77 BANK (#T) 3000 229967.1 .0434783
1 200404 77 BANK (#T) - 77 BANK (#T) 3280 251430.7 .0933333
1 200405 77 BANK (#T) - 77 BANK (#T) 3320 254496.9 .0121951
1 200406 77 BANK (#T) - 77 BANK (#T) 3340 256030.1 .0060241
1 200407 77 BANK (#T) - 77 BANK (#T) 3635 278643.5 .0883234
1 200408 77 BANK (#T) - 77 BANK (#T) 3540 271361.2 -.0261348
1 200409 77 BANK (#T) - 77 BANK (#T) 3250 249131 -.0819209
1 200410 77 BANK (#T) - 77 BANK (#T) 3185 244148.4 -.02
1 200411 77 BANK (#T) - 77 BANK (#T) 3165 242615.3 -.0062794
1 200412 77 BANK (#T) - 77 BANK (#T) 3310 253730.4 .0458136
1 200501 77 BANK (#T) - 77 BANK (#T) 3605 276343.8 .0891239
1 200502 77 BANK (#T) - 77 BANK (#T) 3765 288608.7 .0443828
1 200503 77 BANK (#T) - 77 BANK (#T) 3745 287075.6 -.0053121
What would be the best way to calculate the 12-1 cumulative return for each observation (starting at 200411)? I know it has something to do with the rangestat command, but I can't seem to make it work the way I want it to.
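
A minimal sketch using rangestat (SSC), under the assumption that the date column shown is a numeric YYYYMM value named date; the 12-1 window is taken as months t-12 through t-2:

Code:
* convert YYYYMM to a Stata monthly date so the window spans calendar months
gen mdate = ym(floor(date/100), mod(date, 100))
format mdate %tm
gen lnret = ln(1 + ret)
* cumulate log gross returns over months t-12 to t-2 within each firm
rangestat (sum) lnret (count) lnret, interval(mdate -12 -2) by(firm_id)
gen mom_12_1 = exp(lnret_sum) - 1 if lnret_count == 11   // require a full window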

Thanks in advance.




Transform data or use a non-parametric analysis

I have data from an experiment in which five different methods were used to abrade a ceramic and the resulting particle size in the atmosphere measured. Due to the anisotropic nature of the ceramic there is considerable scatter in the data which is right skewed, and robvar shows the data to fail the homogeneity of variance test.
[figure of the raw data attached in the original post]

I used a log transform.
[figure of the log-transformed data attached in the original post]
There is still inhomogeneity in the variance, but the distribution of residuals following anova appears normal when assessed visually using distplot (from SSC):
Code:
. robvar lc, by(method)

            |        Summary of lc
     method |        Mean   Std. Dev.       Freq.
------------+------------------------------------
          1 |  -4.5900172   .18631618         900
          2 |  -2.2221745   1.2543433       1,800
          3 |  -2.2856141      .74755       1,800
          4 |  -2.6684095   .90676613         900
          5 |  -3.2487493   .35297332         900
------------+------------------------------------
      Total |  -2.7889648   1.1870132       6,300

W0  = 572.11260   df(4, 6295)   Pr > F = 0
W50 = 528.26727   df(4, 6295)   Pr > F = 0
W10 = 552.20502   df(4, 6295)   Pr > F = 0


My question: is it acceptable to use this log-transformed data, would another transformation be preferable, or should I consider a nonparametric analysis such as dunntest (from SSC)?

Thank you.
Eddy