Channel: Statalist

Counting sorted by company and year

Hello everyone,

I have a short question about the egen count() function. I want to sort the data by CUSIP code and year and then create a variable that holds the value returned by count() when the classification equals "I". To do so, I used the following code:
Code:
 bysort cusip8 fyear: egen Ind=count(classification) if classification=="I"
Code:
cusipnr fyear classification Ind
2 2009 "I" 11
2 2009 "I" 11
2 2009 "E"  .
2 2009 "E"  .
2 2009 "I" 11
2 2009 "E"  .
2 2009 "I" 11
2 2009 "I" 11
2 2009 "I" 11
2 2009 "I" 11
2 2009 "I" 11
2 2009 "I" 11
2 2009 "I" 11
2 2009 "I" 11
As the example shows, for this company in this year, the output variable should get the value 11. However, if the classification is not "I", the observation does not get the counted value.

I was wondering how the output value can be assigned to every observation of the firm in this year, and not only to the observations that have the classification "I".

Thanks!
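
One possible approach, offered as a sketch rather than a definitive answer: the if qualifier restricts which observations receive a value, so moving the condition inside egen's total() counts the "I" rows while still filling every row of the group. The variable names follow the command in the post (cusip8, fyear, classification).

```stata
* classification == "I" evaluates to 1 or 0 in each row, so total() sums
* the number of "I" rows within each cusip8-fyear group and, with no -if-
* qualifier, stores that count in every row of the group
bysort cusip8 fyear: egen Ind = total(classification == "I")
```

With this form the "E" rows of a company-year receive the same count as its "I" rows instead of a missing value.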


Interpretation of interaction between binary and categorical variables (and margins) after Cox regression

Dear all,

I have a question on the interpretation of an interaction effect between a binary and a categorical variable after Cox regression. I am studying whether having a diagnosis affects the risk of dying differently at different educational levels.

I have read several posts (and the links included in these) related to this topic:
https://www.statalist.org/forums/for...interpretation
https://www.statalist.org/forums/for...ent-categories
https://www.stata.com/statalist/arch.../msg01122.html
https://www.statalist.org/forums/for...vival-analysis
https://www.statalist.org/forums/for...ferent-samples
https://www.statalist.org/forums/for...ns-after-stcox

I have also studied the examples of Maarten L. Buis: http://www.maartenbuis.nl/publications/interactions.html.
I have interpreted my results especially following the "Example of a categorical by continuous interaction in a Cox regression model for survival data" (https://www.stata.com/statalist/arch.../msg01332.html). However, my case is slightly different since I have a binary and a categorical variable.

My questions are:
1. Have I misinterpreted the results on Cox regression's interaction effect between diagnosis and educational level? If so, how?
2. Or have I misinterpreted the results of margins and marginsplot instead?

I'm using Stata/MP 15.1. The information on education and diagnosis is measured in 2010. Individuals are followed from 2011 to 2015.

Results:

Code:
stset time, failure(died) id (id)

stcox i.diag ##i.edu
margins, at(diag=(0 1) edu=(1 2 3))
marginsplot, scheme(s1mono)
 
Cox regression -- Breslow method for ties
 
No. of subjects =       99,760                  Number of obs    =      99,760
No. of failures =       10,287
Time at risk    =  401420.2051
                                                LR chi2(9)       =     2827.01
Log likelihood  =   -112414.25                  Prob > chi2      =      0.0000
 
------------------------------------------------------------------------------------
                _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------------+----------------------------------------------------------------
            1.diag |   3.676446   .4114305    11.63   0.000     2.952367    4.578108
                   |
               edu |
        secondary  |   1.646723   .0612913    13.40   0.000     1.530871    1.771342
            basic  |   2.898637   .0890877    34.63   0.000     2.729183    3.078612
                   |
          diag#edu |
      1#secondary  |   .8658759    .118324    -1.05   0.292     .6624253    1.131812
          1#basic  |    .631256   .0739209    -3.93   0.000     .5017977     .794113
------------------------------------------------------------------------------------
 
. margins, at(diag=(0 1) edu=(1 2 3))
 
Predictive margins                              Number of obs     =     99,760
Model VCE    : OIM
 
Expression   : Predicted hazard ratio, predict()
 
1._at        : diag            =           0
               edu             =           1
 
2._at        : diag            =           0
               edu             =           2
 
3._at        : diag            =           0
               edu             =           3
 
4._at        : diag            =           1
               edu             =           1
 
5._at        : diag            =           1
               edu             =           2
 
6._at        : diag            =           1
               edu             =           3
 
------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         _at |
          1  |   1.008644   .0292157    34.52   0.000     .9513822    1.065906
          2  |   1.660957   .0789297    21.04   0.000     1.506258    1.815657
          3  |   2.923693   .1268029    23.06   0.000     2.675164    3.172222
          4  |   3.708225   .4283425     8.66   0.000     2.868689    4.547761
          5  |   5.287402   .4463702    11.85   0.000     4.412533    6.162272
          6  |   6.785243   .3498909    19.39   0.000      6.09947    7.471017
------------------------------------------------------------------------------
 
. marginsplot
 
Variables that uniquely identify margins: diag edu
[Graph image not preserved]



diag = have diagnosis (1=yes, 0=no), edu = education level (1=tertiary, 2=secondary, 3=basic), event = died between 2010-2017 (yes/no).

Interpretation:
Having a diagnosis increases the hazard by 3.67 times among tertiary educated.

Secondary educated have a 1.64 times higher risk for mortality when compared to highly educated. Those with only basic education are 2.89 times more likely to die than the highly educated.

Those with secondary education and a diagnosis have a 14% (1-0.86) smaller risk of dying compared to the highly educated with a diagnosis. However, the difference is not statistically significant. Those with basic education and a diagnosis have a 37% smaller risk of dying than the tertiary educated (statistically significant). In other words, wouldn't these results suggest that having a diagnosis is more "harmful" for the tertiary educated than for those with only secondary or basic education?

However, looking at the results after margins and marginsplot: here the results do not suggest that the diagnosis would have different effect in different educational levels. These results are more similar to what is produced after running a Cox regression without the main effects:

Code:
stcox i.diag#i.edu

------------------------------------------------------------------------------------
                _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------------+----------------------------------------------------------------
     diag#edu |
    0#secondary  |   1.646723   .0612913    13.40   0.000     1.530871    1.771342
         0#basic  |   2.898637   .0890877    34.63   0.000     2.729183    3.078612
     1#tertiary  |   3.676446   .4114305    11.63   0.000     2.952367    4.578108
    1#secondary  |    5.24209   .4157252    20.89   0.000     4.487451    6.123634
         1#basic  |   6.727095   .2851373    44.97   0.000      6.19082    7.309824
------------------------------------------------------------------------------------
How could I produce a marginsplot that also includes the main effects?

Please let me know if I have missed something, or if my problem could be solved by revisiting some of the links listed above.

Best,
Inge
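
A hedged sketch of one way to read the interaction, not a definitive answer: in a model fitted with i.diag##i.edu, the 1.diag row is the hazard ratio for a diagnosis at the base education level (tertiary), and each diag#edu row is a ratio of hazard ratios. The hazard ratio for a diagnosis at another education level is then the product of the two, which lincom can report with the hr option. Using the figures posted above, 3.676 × 0.866 ≈ 3.18 for secondary education, which agrees with 5.242 / 1.647 from the second table; likewise 3.676 × 0.631 ≈ 2.32 for basic education, which agrees with 6.727 / 2.899. The level codes 2 and 3 are taken from the post's description (2 = secondary, 3 = basic).

```stata
stcox i.diag##i.edu

* hazard ratio for having a diagnosis among the tertiary educated (base level)
lincom 1.diag, hr

* hazard ratio for having a diagnosis among the secondary educated
lincom 1.diag + 1.diag#2.edu, hr

* hazard ratio for having a diagnosis among those with basic education
lincom 1.diag + 1.diag#3.edu, hr
```

The posted margins appear consistent with this reading: dividing the diag=1 margin by the diag=0 margin within an education level (for example 5.287 / 1.661 ≈ 3.18) gives the same ratios, so the two sets of output seem to describe the same model on different scales.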

Rolling regression with Newey-West standard errors in Stata 10

Hi, for my thesis I have returns of multiple portfolios and have to regress them against returns of factor portfolios in a rolling window of 52 observations.
In Stata 10, asreg and rollreg do not seem to work, and as it's the university PC I can't update it.
Is there a way to do it in a loop, something like this:

Code:
* assumes the data are already tsset; window i covers observations i to i+51
local last = _N - 51
forvalues i = 1/`last' {
    local j = `i' + 51
    newey PBQ1 var24 SizeTLS momLS in `i'/`j', lag(2)
}


Graph combine: is there a way to align plot-regions?

Is there a way to align the individual plotregions when using graph combine that isn't just play-around-until-it-looks-ok(-ish)?

Here is a simple example:
Code:
twoway function 1, name(a) nodraw ytit("Y is 1", orientation(hor))
twoway function x, name(b) nodraw ytit("Y is equal to X", orientation(hor))
graph combine a b, cols(1)
This produces the graph below, in which the x scales don't line up.

[Graph image not preserved]

The best I can come up with is adding spaces to the shorter title until the plotregions line up.
To pre-empt the 'don't use graph combine' comments – I do need it here. Using the by() option would align the subplots automatically, but I can't use by() for this specific graph.

Thanks, Tim

generating id within sub-group


Dear Statalist,
I am faced with another challenge. After exploring every possibility to overcome it, to no avail, I have resolved to share it on this forum.
Below is a snapshot of my dataset.
My task is to generate a sequential unique id for each locality. The id starts and ends within a district. For example, district kunin in province Karim has three localities, which will be assigned ids 1, 2 and 3; the other localities will be coded the same way. I tried to use
Code:
 group(district locality)
but ended up with ids running from the first to the last locality instead of restarting within each district. I also thought of using
Code:
fill (locality)
which throws an error because locality is not a
Code:
 numlist
. I have a large dataset that contains over ten thousand localities spread across hundreds of districts. I have run out of options; any help will be highly appreciated.

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str7 province str10 district str12 locality
"Karim"   "kunin"      "kunin"      
"Karim"   "kunin"      "tafida"     
"Karim"   "kunin"      "hamma"      
"Karim"   "abbare"     "yitti"      
"Karim"   "abbare"     "donadda"    
"gassol"  "gassol"     "bajumba"    
"gassol"  "gassol"     "doubeli"    
"gassol"  "tutare"     "tutare"     
"gassol"  "tutare"     "gunduma"    
"gassol"  "mutum biyu" "namnai"     
"gassol"  "mutum biyu" "garin magaji"
"gassol"  "mutum biyu" "nanido"     
"gassol"  "mutum biyu" "agure"      
"jalingo" "kona"       "mayo dasa"  
"jalingo" "kona"       "jauro nokiya"
"jalingo" "kona"       "jekunoho"   
"jalingo" "kona"       "garu"       
"jalingo" "jalingo"    "wuro sambe" 
"jalingo" "jalingo"    "magami"     
"jalingo" "jalingo"    "mayo gwoi"  
"jalingo" "jalingo"    "sintali"    
"jalingo" "jalingo"    "bashin"     
"yorro"   "kpantisawa" "kpantisawa" 
"yorro"   "kpantisawa" "kassa"      
"yorro"   "kpantisawa" "bille"      
"yorro"   "kpantisawa" "zokwa"      
"yorro"   "pupule"     "kwaji"      
"yorro"   "pupule"     "tula"       
"yorro"   "pupule"     "bajumba"    
"yorro"   "pupule"     "mika"       
end
Kind regards
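
A possible approach, sketched under two assumptions: each locality appears once per district, and the ids should follow the current order of the rows. The idea is to record the original row order and then number the rows within each province-district group. The names obs and loc_id are hypothetical. Province is included in the grouping in case a district name occurs in more than one province.

```stata
* keep the original row order so the numbering within a district is reproducible
gen long obs = _n

* under by:, _n restarts at 1 in every province-district group
bysort province district (obs): gen long loc_id = _n

* optional: restore the original order of the dataset
sort obs
```

If a locality can occur on several rows within a district, this would give each row its own number, so the data would first need to be reduced to one row per locality.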

Set up a count variable with a filter


Hi all, I would like to add a new variable with a counter. For the following dataset I would like to add a 4th variable, which will be the counter. The counter should count the number of B observations for a given A observation when the filter equals 1. For example, the first count value should be 2, because the observation "1" in B occurs twice for the observation "1" in A given that the filter equals 1.

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input int(A B filter)
1 1 1
1 2 0
1 3 0
1 2 0
1 3 1
1 1 1
1 4 0
2 4 0
2 2 1
2 3 1
2 3 0
3 1 0
3 2 0
end
I would appreciate any kind of help. Thank you very much.
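
One way this might be done, as a sketch: group the data by A and B and sum an indicator for filter equal to 1. The variable name counter is hypothetical, and the sketch assumes that rows whose own filter is 0 should still receive the count for their A-B combination.

```stata
* within each A-B combination, count the rows whose filter equals 1;
* every row of the combination receives that count
bysort A B: egen counter = total(filter == 1)
```

With the example data, the two rows with A = 1 and B = 1 both have filter equal to 1, so counter is 2 for them, which matches the value expected in the post.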

Likelihood ratio test between glm and gllamm models

Hello, all,

I'm trying to do a likelihood ratio test to compare glm and gllamm models, but get an error message. I've seen others report comparing these models (mostly in textbooks), but they never show code.

Any help is appreciated.

Thanks,
DDT

Renaming Time variables in my dataset

Dear Statalisters

I have a problem renaming the variables in my dataset. My dataset consists of Credit-to-GDP gaps of 44 different countries around the world from 1999Q1 to 2017Q4. The data example shown below is a simplified version, where:
  • BORROWERS_CTY is the abbreviation of a borrowing country
  • BORROWERS_CTYName is the name of a borrowing country
  • Q1 should stand for year 1999, quarter 1 (or 1999Q1; but somehow when I imported the data from the Excel file, the variable was named "Q1")
  • Q2 similarly stands for 1999Q2; i.e. all numerical values under this column are Credit-to-GDP gaps of that country in 1999Q2.
  • Q3 for 1999Q3
  • Q4 for 1999Q4
  • And so on for other letters such as G, H, I ... (which represent 2000Q1, 2000Q2 and 2000Q3 respectively; they aren't displayed here for simplicity)

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str2 BORROWERS_CTY str14 BORROWERS_CTYName str5(Q1 Q2 Q3 Q4)
"RU" "Russia"         ""      ""      ""      ""     
"CO" "Colombia"       ""      ""      ""      ""     
"XM" "Euro area"      ""      ""      ""      ""     
"GB" "United Kingdom" "-0.3"  "1.5"   "1.7"   "3.3"  
"SG" "Singapore"      "6.7"   "6.1"   "3.4"   "2.5"  
"TH" "Thailand"       "4.4"   "-3.2"  "-12.9" "-21.8"
"IT" "Italy"          "0.3"   "3.3"   "3.7"   "6.2"  
"AU" "Australia"      "0.1"   "-0.7"  "0.4"   "2.1"  
"GR" "Greece"         "5.6"   "5.2"   "5.9"   "6.5"  
"NO" "Norway"         "6"     "6.6"   "3"     "1.2"  
"NZ" "New Zealand"    "4.2"   "3.2"   "3.8"   "5"    
"IN" "India"          "-0.5"  "-1.8"  "-1.5"  "-0.2"
"IE" "Ireland"        "17.5"  "24"    "25.4"  "29.6"
"HU" "Hungary"        ""      ""      ""      "7.7"  
"CN" "China"          "6.2"   "6.8"   "7.4"   "8.2"  
"SA" "Saudi Arabia"   ""      ""      ""      ""     
"CA" "Canada"         "0.4"   "0.1"   "-2"    "-5.5"
"CL" "Chile"          "20.9"  "24.3"  "24.5"  "21.5"
"ID" "Indonesia"      "-18.1" "-36.4" "-31.6" "-37.5"
"CH" "Switzerland"    "-9"    "-6.5"  "-5.1"  "-6.7"
"PT" "Portugal"       "23.1"  "27.2"  "28.4"  "30"   
"DE" "Germany"        "3.7"   "4.9"   "5.8"   "7.4"  
"PL" "Poland"         ""      ""      ""      ""     
"IL" "Israel"         ""      ""      ""      ""     
"AR" "Argentina"      "8.8"   "8.8"   "7.7"   "7.1"  
"HK" "Hong Kong SAR"  "-0.5"  "-7.2"  "-8.9"  "-12"  
"TR" "Turkey"         "1.6"   "1"     "0.9"   "0.6"  
"FR" "France"         "-5.5"  "-3.3"  "-2.5"  "-1.1"
"MY" "Malaysia"       "11"    "5.8"   "3.3"   "-5"   
"BR" "Brazil"         ""      ""      ""      ""     
"SE" "Sweden"         "-4"    "-4.5"  "-4.8"  "-1.3"
"NL" "Netherlands"    "6.1"   "6.3"   "7.7"   "8.1"  
"BE" "Belgium"        "14.9"  "15.7"  "17.4"  "17.8"
"CZ" "Czech Republic" ""      ""      ""      ""     
"ES" "Spain"          "12.1"  "17.4"  "17.5"  "19"   
"DK" "Denmark"        "0.9"   "2.1"   "1"     "2.4"  
"JP" "Japan"          "-24.3" "-25.7" "-25.5" "-22.2"
"AT" "Austria"        "-1.3"  "-0.6"  "2.3"   "4.2"  
"ZA" "South Africa"   "7.1"   "6.8"   "5.4"   "4.5"  
"LU" "Luxembourg"     ""      ""      ""      ""     
"KR" "Korea"          "10.4"  "1"     "-3.7"  "-12.8"
"FI" "Finland"        "-21.5" "-18.1" "-18.2" "-18.9"
"MX" "Mexico"         "-10"   "-10.3" "-12"   "-12.3"
"US" "United States"  "1.2"   "1.7"   "3"     "3.5"  
end
------------------ copy up to and including the previous line ------------------

Before reshaping the dataset from wide form to long form, I want to rename the variables, moving horizontally from Q1 to the last time period (say, in this case, Q4), i.e. renaming Q1 as "1999Q1", Q2 as "1999Q2", Q3 as "1999Q3" and so on.

But when I tried the following code,
rename Q1 1999Q1

Stata returned me an error message:

1 new variable name invalid
You attempted to rename Q1 to 1999Q1. That is an invalid
Stata variable name.
r(198);

end of do-file

r(198);


Does anyone have any ideas on how I could rename all of the time variables into the correct names in a faster way? What is wrong with my rename command?

Also, as an additional piece of information, in my original dataset the labels of the variables are right, i.e. the label of Q1 is "1999-Q1" and that of Q2 is "1999-Q2".

Thank you very much.

Many thanks
Keith
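
A hedged sketch of one possible route: Stata variable names must begin with a letter or an underscore, which is why 1999Q1 is rejected, so the period can instead be appended to a letter prefix. The code below builds each new name from the variable labels described in the post ("1999-Q1" and so on). The prefix gap and the names period and qdate are hypothetical, and the varlist Q1-Q4 would need to be extended to the last time variable in the full dataset.

```stata
* build each new name from the variable label, e.g. "1999-Q1" -> gap1999Q1
* (assumes every variable in Q1-Q4 carries a label of that form)
foreach v of varlist Q1-Q4 {
    local lbl : variable label `v'
    local new = "gap" + subinstr("`lbl'", "-", "", .)
    rename `v' `new'
}

* go from wide to long; the period suffix is kept as a string such as "1999Q1"
reshape long gap, i(BORROWERS_CTY) j(period) string

* the gaps were imported as strings, so convert them to numeric
destring gap, replace

* turn "1999Q1" into a Stata quarterly date
gen qdate = quarterly(period, "YQ")
format qdate %tq
```

Doing the rename through the labels avoids typing each name by hand, and the common prefix is what lets reshape long treat the columns as one variable observed over time.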


IV regression -- No Endogeneity Detected. Should I stick to OLS?

Dear Statalisters:

My question is regarding whether I should stick to OLS when no endogeneity is detected in my model.

I'm estimating the effect of some individual personality traits on individual performance. My N is above 300 and the OLS structure is as follows:

reg y x1 x2 x3 x4 x1x2 x1x3 x2x3 x1x2x3

where:
y -- dependent var (performance)
x1, x2, x3 -- explanatory vars (personality traits)
x4 -- vector of control vars
x1x2, x1x3, x2x3 -- 2way interaction terms
x1x2x3 -- 3way interaction term

Now, I'm told that x1 and x2 may be endogenous (there's not enough theoretical reason for that). I found 2 instrumental vars -- z1 and z2 (1 each for x1 and x2) -- and conducted IV regression by using Stata's own module and the extended version proposed by Baum, Schaffer, & Stillman.


1. Stata module:

ivregress 2sls y x4 (x1 x2 x1x2 x1x3 x2x3 x1x2x3 = z1 z2 z1z2 z1x3 z2x3 z1z2x3)
estat endog

The tests of endogeneity suggest that x1 and x2 are not endogenous.
----------------------------------------------------------------------------------------------------------
Durbin (score) chi2(6) = 1.40071 (p = 0.9658)
Wu-Hausman F(6,339) = .222554 (p = 0.9694)
----------------------------------------------------------------------------------------------------------

2. Baum et al. module:

ivreg2 y x4 (x1 x2 x1x2 x1x3 x2x3 x1x2x3 = z1 z2 z1z2 z1x3 z2x3 z1z2x3), endog (x1 x2)

Again, the tests suggest no endogeneity.

----------------------------------------------------------------------------------------------------------
Wu-Hausman F test: 0.22255 F(6,339) P-value = 0.96938
Durbin-Wu-Hausman chi-sq test: 1.40071 Chi-sq(6) P-value = 0.96582
----------------------------------------------------------------------------------------------------------

While the results of the IV regressions are somewhat similar to those of the OLS estimation, the 3-way interaction term is no longer significant.

1. In this case, is it safe to say that the original OLS results are best estimates?
2. Should I inspect further for other issues? What other tests should I run?
3. Am I correct in instrumenting the interactions of exogenous and endogenous vars by using interactions of instruments and exogenous vars?

Since the equation is exactly identified (1 instrument per endogenous var), I do not get the Sargan stats.

Any help is appreciated.

Thanks in advance!

Cronbach's alpha and mixing categorical and continuous dependent variables in regression

Hi there,

I'm studying for my master's dissertation and, being new to Stata, I have a couple of questions I was hoping to get some help with. I've researched them myself but have not found a helpful answer. I'm using Stata 15 on a Mac.

Q1) I have a dependent variable of worry about crime, which for parsimony I have taken as a simple average of the responses on worry about different types of crime, each measured on a 7-point Likert scale. To do this I entered the following in Stata:
Code:
egen WCMean=rowmean(WorryCrime_HomeBroken WorryCrime_Mugged WorryCrime_CarStolen
>WorryCrime_StolenFromCar WorryCrime_Rape WorryCrime_Attacked
>WorryCrime_AttackedEOrigin WorryCrime_Online WorryCrime_IdentityTheft)
However, I know that it's important to have a Cronbach's alpha score. To get this for the worry about crime variables listed above, I typed the following into Stata:

Code:
alpha WorryCrime_HomeBroken WorryCrime_Mugged WorryCrime_CarStolen WorryCrime_StolenFromCar
> WorryCrime_Rape WorryCrime_Attacked WorryCrime_AttackedEOrigin
> WorryCrime_Online WorryCrime_IdentityTheft
My question is - can I use this Cronbach's alpha score in reference to my WCMean variable? By this, I mean would it be correct to write something like this: 'In operationalising worry about crime, a new variable was generated to show the mean score of total worry about crime from all of the different crime indicators (apart from worry about terrorist attacks as this is used as the dependent variable). As the Cronbach's alpha score for WorryCrime_HomeBroken, WorryCrime_Mugged, WorryCrime_CarStolen, WorryCrime_StolenFromCar, WorryCrime_Rape, WorryCrime_Attacked, WorryCrime_AttackedEOrigin, WorryCrime_Online and WorryCrime_IdentityTheft shows a value of 0.8521, good internal consistency has been shown for the scales so utilising the mean score from these scales is acceptable.' ?

Q2) Using the following code, is it okay to mix categorical, continuous and 'effectively' continuous (e.g. WorryCrime_TerroristAttack is a 1-7 Likert scale) variables in multiple regression?

Code:
xi: regress WorryCrime_TerroristAttack Age i.Gender i.ethnicitydummy i.FeelIncome WhereLive
> meanFeelLocalArea PoliticalLeaning  meaninstitutionaltrust TimeMediaUse WCMean
> meanTerrorKnowledge meanlikelyterror i.VictimCrimeAny i.Victim_TerroristAttack
Thanks so much for your help!!

predicting a linear slope, an intercept, and a quadratic slope after a linear mixed model

Hello All,

I am estimating a linear slope, a quadratic slope and an intercept of one specific variable. I want to predict these variables because I want to use them as independent variables in another analysis.
Below is my code for the mixed model. In this code fit is my outcome, cycle is the linear slope, cycle2 is the quadratic slope, and my intercept is a random intercept by ID.

Code:
mixed fit cycle cycle2 ///
> || ID: cycle, covariance(uns) variance

Performing EM optimization: 

Performing gradient-based optimization: 

Iteration 0:   log likelihood = -14436.131  
Iteration 1:   log likelihood = -14436.131  

Computing standard errors:

Mixed-effects ML regression                     Number of obs     =      9,056
Group variable: ID                              Number of groups  =        935

                                                Obs per group:
                                                              min =          1
                                                              avg =        9.7
                                                              max =         15

                                                Wald chi2(2)      =     103.19
Log likelihood = -14436.131                     Prob > chi2       =     0.0000

------------------------------------------------------------------------------
         fit |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       cycle |  -.1170634   .0120247    -9.74   0.000    -.1406314   -.0934953
      cycle2 |   .0054118   .0007144     7.58   0.000     .0040116    .0068119
       _cons |   5.792452   .0551227   105.08   0.000     5.684413     5.90049
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
ID: Unstructured             |
                  var(cycle) |   .0134116    .001063      .0114819    .0156657
                  var(_cons) |   1.373924   .0900777      1.208248    1.562318
            cov(cycle,_cons) |  -.0655636   .0080507     -.0813426   -.0497845
-----------------------------+------------------------------------------------
               var(Residual) |   1.016058    .016739      .9837744    1.049402
------------------------------------------------------------------------------
LR test vs. linear model: chi2(3) = 4571.58               Prob > chi2 = 0.0000

Note: LR test is conservative and provided only for reference.
I tried the below code to predict the linear slope, the quadratic slope, and the intercept. But I get an error message (see below).

Code:
 predict fit1_slope fit_slope2 fit_int1, reffects
too many variables specified
you must specify 2 new variable(s)
r(103);

end of do-file

r(103);
Since predict, reffects is limited to two variables, how can I predict everything I need? Did I misunderstand anything?

Best wishes and thank you for your time.
Patrick
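
A hedged sketch of one reading of the error: the random part || ID: cycle contains two random effects per ID (a random slope on cycle and a random intercept), so predict, reffects expects exactly two new variables. cycle2 enters only as a fixed effect, so there is no person-specific quadratic slope to predict unless cycle2 is also added to the random part. The new variable names below are hypothetical, and the assumed order of the predictions (slope first, then intercept) should be checked against the variable labels that predict attaches.

```stata
mixed fit cycle cycle2 || ID: cycle, covariance(uns) variance

* best linear unbiased predictions of the two random effects
predict re_cycle re_cons, reffects
describe re_cycle re_cons

* person-specific linear slope and intercept = fixed part + predicted random effect
gen slope_i = _b[cycle] + re_cycle
gen intercept_i = _b[_cons] + re_cons

* a random quadratic slope would need cycle2 in the random part, for example:
* mixed fit cycle cycle2 || ID: cycle cycle2, covariance(uns) variance
* predict re_cycle re_cycle2 re_cons, reffects
```

Whether a random quadratic slope is estimable with these data is a separate question; the commented lines only show where it would enter.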

Spatial Durbin using spmlreg and spwmatrix

I'd appreciate any help or suggestions about what I might be doing wrong or what might be causing the problem I'm encountering.

I can see the matrix "W" in the using directory, though my machine doesn't recognize the file extension. I have also tried this without specifying Mata in the weights matrix-generating portion and specifying wfrom(Stata) in the second portion.

cd [my working directory]
spwmatrix gecon latitude longitude, wtype(inv) wname(W) eignvar(eigenW) mataf replace


(At this point I merge the eignvar into my main dataset, I am certain this step proceeds correctly)

spmlreg [varlist], weights(W) wfrom(Mata) eignvar(eigenW) model(durbin) robust

the error I'm receiving is:
st_store(): 3200 conformability error
splagvar_lagmyvar(): - function returned error
<istmt>: - function returned error


One thing that I have noticed is that spwmatrix won't let me generate the weights matrix from within my main dataset. For some reason, it returns the error that "Spherical latitudes must be in [-90,90]", when the exact same operation will work in an identical version of the dataset with the rest of the variables dropped. This is why I've been merging the two together.

Thanks in advance!

Time series box plots with overlaid connected scatter plot of mean data

Hello,

I have some coral rugosity data for 7 locations measured in four years (2008, 2011, 2014, 2017).
I am hoping to present this as a series of box plots with the mean coral rugosity for each location overlaid as a connected scatter graph.

I have been successful in doing both parts of this separately, but I would love some help combining them.

So far I have used the code // graph box praslinsw praslinne mahee anne mahew mahenw cousin , over(year) // to plot the annual data for the seven locations
the variables are the 7 locations

and the code // stripplot praslinsw, over(year) vertical box(bfcolor(gs14) barw(0.2)) ms(none) addplot (scatter mpraslinsw myear, connect(1)) // to create the type of diagram I would like for one location where I plot the mean data on top of the box plot for a single location.

However, as far as I am aware, I cannot use stripplot for multiple variables.

Is there a way / code that would enable me to achieve the output of the second code, but, with all 7 locations on one diagram such as in code one?

I have attached the data in csv format (apologies). Cols A-H are the annual data and cols I-P are the mean data I would like superimposed.

Many thanks in advance.

command vce(robust) cluster at which level

Dear statalist community,

very basic question:

If I do not specifically define the level at which Stata should cluster the standard errors, at which level does it cluster?
I used the vce(robust) option when working with panel data, and I have data at the individual level.

Thank you in advance

Best

Lisa

rolling mgarch dcc with predict

Hi everyone,

I am trying to find the predicted residuals and variances from a rolling-window mgarch dcc model. The command I use is:

rolling , recursive window(1549) clear: mgarch dcc ( INDEX EQUITY =, noconstant) , arch(1) garch(1)

predict R*, residual
predict V*, variance

Unfortunately Stata returns 'last estimates not found'. Can anyone please help? Thank you.

Extracting elements from matrix

Hi,
Is there any way of extracting specific information from a matrix? For example:

Code:
sysuse auto, clear

qui reg price mpg weight
mat m =r(table)'

mat li m

m[3,9]
                 b          se           t      pvalue          ll          ul          df        crit       eform
   mpg  -49.512221   86.156039  -.57468079   .56732373  -221.30248   122.27804          71   1.9939434           0
weight   1.7465592   .64135379   2.7232382   .00812981   .46773602   3.0253823          71   1.9939434           0
 _cons   1946.0687   3597.0496   .54101802   .59018863  -5226.2445   9118.3819          71   1.9939434           0

mat myelement = m[1, 1 .. 3]
mat li myelement

myelement[1,3]
              b          se           t
mpg  -49.512221   86.156039  -.57468079
Up to here this is fine. But is there any way to extract only the 1st, 3rd and 5th elements of the first row, rather than using a range?

Thanks.
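
One possible way, as a minimal sketch: a subscript range only selects contiguous cells, but individual elements can be joined with the comma operator into a new row vector. The matrix name sel is hypothetical; the column names are copied from the listing of m above.

```stata
sysuse auto, clear
quietly regress price mpg weight
matrix m = r(table)'

* join individual elements of row 1 into a new 1 x 3 row vector
matrix sel = (m[1,1], m[1,3], m[1,5])

* label the result with the matching names from m
matrix rownames sel = mpg
matrix colnames sel = b t ll
matrix list sel
```

The row and column names have to be set by hand because joining single elements does not carry the names of m over.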



Reshaping data with 12 time series

Hi,

When I tried the following command:
reshape long Jan2003 Feb2003 Mar2003 Apr2003 May2003 Jun2003 Jul2003 Aug2003 Sep2003 Oct2003 Nov2003 Dec2003 Jan2004 Feb2004 Mar2004 Apr2004 May2004 Jun2004 Jul2004 Aug2004 Sep2004 Oct2004 Nov2004 Dec2004 Jan2005 Feb2005 Mar2005 Apr2005 May2005 Jun2005 Jul2005 Aug2005 Sep2005 Oct2005 Nov2005 Dec2005 Jan2006 Feb2006 Mar2006 Apr2006 May2006 Jun2006 Jul2006 Aug2006 Sep2006 Oct2006 Nov2006 Dec2006 Jan2007 Feb2007 Mar2007 Apr2007 May2007 Jun2007 Jul2007 Aug2007 Sep2007 Oct2007 Nov2007 Dec2007 Jan2008 Feb2008 Mar2008 Apr2008 May2008 Jun2008 Jul2008 Aug2008 Sep2008 Oct2008 Nov2008 Dec2008 Jan2009 Feb2009 Mar2009 Apr2009 May2009 Jun2009 Jul2009 Aug2009 Sep2009 Oct2009 Nov2009 Dec2009 Jan2010 Feb2010 Mar2010 Apr2010 May2010 Jun2010 Jul2010 Aug2010 Sep2010 Oct2010 Nov2010 Dec2010 Jan2011 Feb2011 Mar2011 Apr2011 May2011 Jun2011 Jul2011 Aug2011 Sep2011 Oct2011 Nov2011 Dec2011 Jan2012 Feb2012 Mar2012 Apr2012 May2012 Jun2012 Jul2012 Aug2012 Sep2012 Oct2012 Nov2012 Dec2012 Jan2013 Feb2013 Mar2013 Apr2013 May2013 Jun2013 Jul2013 Aug2013 Sep2013 Oct2013 Nov2013 Dec2013 Jan2014 Feb2014 Mar2014 Apr2014 May2014 Jun2014 Jul2014 Aug2014 Sep2014 Oct2014 Nov2014 Dec2014 Jan2015 Feb2015 Mar2015 Apr2015 May2015 Jun2015 Jul2015 Aug2015 Sep2015 Oct2015 Nov2015 Dec2015 Jan2016 Feb2016 Mar2016 Apr2016 May2016 Jun2016 Jul2016 Aug2016 Sep2016 Oct2016 Nov2016 Dec2016 Jan2017 Feb2017 Mar2017 Apr2017 May2017 Jun2017 Jul2017 Aug2017 Sep2017 Oct2017 Nov2017 Dec2017,
i(CompanyID VariableID) j(Date)

Stata says it doesn't recognize a# and b#. It only works if I ask Stata to:
reshape long Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec, i(CompanyID VariableID) j(Year)

Giving me this output:

Data wide -> long
-----------------------------------------------------------------------------
Number of obs. 10550 -> 158250
Number of variables 198 -> 31
j variable (15 values) -> Year
xij variables:
Jan2003 Jan2004 ... Jan2017 -> Jan
Feb2003 Feb2004 ... Feb2017 -> Feb
Mar2003 Mar2004 ... Mar2017 -> Mar
Apr2003 Apr2004 ... Apr2017 -> Apr
May2003 May2004 ... May2017 -> May
Jun2003 Jun2004 ... Jun2017 -> Jun
Jul2003 Jul2004 ... Jul2017 -> Jul
Aug2003 Aug2004 ... Aug2017 -> Aug
Sep2003 Sep2004 ... Sep2017 -> Sep
Oct2003 Oct2004 ... Oct2017 -> Oct
Nov2003 Nov2004 ... Nov2017 -> Nov
Dec2003 Dec2004 ... Dec2017 -> Dec
-----------------------------------------------------------------------------


This, however, is not what I want, since I need panel data for Stata to work.
Could you please help me?

Kind regards,

Bob
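
A hedged sketch of one way to get a single row per company, variable and month: reshape long needs a common stub in front of the part that varies, so each monthly variable is first given a prefix, and the month-year suffix is then read in as a string. The stub v and the name mdate are hypothetical, and the varlist Jan2003-Dec2017 assumes the monthly variables are stored next to each other in that order.

```stata
* give every monthly variable a common stub, e.g. Jan2003 -> vJan2003
foreach v of varlist Jan2003-Dec2017 {
    rename `v' v`v'
}

* one row per company, variable and month; Date holds strings such as "Jan2003"
reshape long v, i(CompanyID VariableID) j(Date) string

* convert "Jan2003" into a Stata monthly date by way of "Jan 2003"
gen mdate = monthly(substr(Date, 1, 3) + " " + substr(Date, 4, 4), "MY")
format mdate %tm
```

After this step the data could be reshaped wide on VariableID if each variable should become its own column before declaring the panel.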

Square graphs-- placement of axes

Hello,

I'm trying to make a square graph in Stata. When I use the aspectratio(1) option, the graph becomes square but the placement of the y-axis remains the same and so there's a gap between the plot region and the axis. How can I correct this? Also is there a way to change the aspect ratio of the graph region to 1 as well?

Code:
    #d ;
    twoway scatter supplyold_grid_end supply_grid_end || lfit supplyold_grid_end supply_grid_end 
    if !missing(supply_grid_end) & !missing(supplyold_grid_end),
    ti("Admin Supply Data vs. Survey Responses on Hours of Supply", size(med)) legend(off) 
    xtitle("Feeder Supply Data", size(small)) 
    ytitle("Survey Supply Response", size(small)) 
    plotregion(lcolor(black) lwidth(thin)) scheme(s2mono) 
    graphregion(color(white)) aspectratio(1) ;
    #d cr
Thanks,
Mihir


Finding User Written Stata Command "ddrd"

Hi everyone,

While researching ways to combine difference-in-difference and regression discontinuity designs in Stata, I came across this talk from the 2016 Stata conference, mentioning a package called ddrd. However, I was unable to find it using -findit- or via Google. Is there another recommended way to search for packages, or could it be that the program simply is not available anymore?

If anybody knows something about it, I'd be very grateful for your help.
Thank you very much in advance.

Best,
Mark

Question about GLS RE modeling

Hi Anyone,

I intend to estimate a GLS RE model (I have Stata 13.0, so I think I can use either mixed or xtreg, with the same results). I have within-person longitudinal data covering 11 years, with my outcome (average student attendance rate in school), time-invariant control variables (like race and sex) and time-varying control variables (such as family poverty status). My main IV is the number of years that a student was exposed to the intervention, which was introduced in the middle of my 11-year time span. I'd like to see the linear relationship between years of implementation (pre and post) and the outcome, which is individuals' average attendance rate. Thus, I have centered the number_of_years_implementing variable on the year in which the intervention was introduced.

Does this approach seem correct? And if yes, is it also feasible to introduce fixed-effect dummies for each year? My final model command looks like this:

xtreg avg_daily_attend y0405 y0506 y0708 y0809 y0910 y1011 y1112 y1213 y1314 y1415 var1 var2 var3 var4 var5 years_into_intervention

where avg_daily_attend ranges between 1-100
where y0405...y1415 are the year dummy vars (with the start of the intervention year excluded)
where var1, var2, and var3 are time-invariant and
where var4 and var5 are time-varying, and
where years_into_intervention is my primary IV

I just wanted to get some input from someone with more experience doing this. No one around my office seems to be able to serve as my sounding board today!

Thanks in advance!

Jane