Channel: Statalist

New Package on SSC - asrol - module to calculate moving / rolling window statistics

Thanks to Kit Baum, I have shared a new package on SSC. To describe and install it, type:
Code:
ssc desc asrol
ssc install asrol
TITLE
'ASROL': module to generate rolling-window descriptive statistics in time series or panel data

DESCRIPTION

This program calculates descriptive statistics in a user-defined
rolling window. For example, in time series or panel
data, a user might be interested in the standard deviation
or coefficient of variation of a variable over a rolling window of
the last 4 years. asrol provides an efficient and simple way of
computing such statistics. It also offers options to
specify the minimum number of observations required for
calculating the desired statistic in a rolling window.

KW: descriptive statistics
KW: rolling window

Requires: Stata version 11

Distribution-Date: 20150905

Author: Attaullah Shah, Institute of Management Sciences
Support: email attashah15@hotmail.com

Title

asrol - Generates rolling-window descriptive statistics in
time series or panel data


Syntax

asrol varlist [if] [in] , gen(newvar) stat(statistic) window(#) [ nomiss minmum(#)]


Syntax Details

The program has three required options:
1. gen: generates the new variable, whose name is enclosed in parentheses after gen
2. stat: specifies the required statistic. The following statistics are allowed:
        sd : standard deviation
        mean : mean
        total : sum or total
        median : median
        pctile : percentiles
        min : minimum
        max : maximum
3. window: specifies the length of the rolling window for calculating the
required statistic. The window length must be less than or equal to the total number
of time-series observations per panel.


Other Options

1. nomiss

The nomiss option forces asrol to compute the required statistic
with all available observations, which results in no missing values at the
start of the panel. Compare the results of Example 1 below, where nomiss is not
used, with the results of Example 3, where nomiss is used.
In Example 1, asrol computes the mean starting with the
fourth observation of each panel, i.e. the rolling
window does not start working until it
reaches the required length of 4 observations.


2. minmum
The minmum option forces asrol to compute the required statistic
whenever the specified minimum number of observations is available. If a
rolling window does not have that many observations, the values
of the new variable are set to missing.


Example 1: Find Rolling Mean

Code:
    . webuse grunfeld
 
    . asrol invest, stat(mean) win(4) gen(mean_4)
This command calculates the mean of the variable invest using
a four-year rolling window and stores the results in a new variable called
mean_4.


Example 2: Find Rolling Standard Deviation

Code:
    . webuse grunfeld
 
    . asrol invest, stat(sd) win(6) gen(sd_6)

This command calculates the standard deviation of the
variable invest using a six-year rolling window and stores the
results in a new variable called sd_6.

Example 3: For Rolling Mean with no missing values

Code:
    . webuse grunfeld
 
    . asrol invest, stat(mean) win(4) gen(mean_4) nomiss

This command calculates the mean of the variable invest using
a four-year rolling window and stores the results in a new variable
called mean_4. The nomiss option forces asrol to compute the mean with
all available observations, which results in no missing
values at the start of the panel. Compare with Example 1 above, where
nomiss is not used.
In Example 1, asrol computes the mean starting with the
fourth observation of each panel, i.e.
the rolling window does not start working until it
reaches the required length of 4 observations.

Example 4: Rolling mean with minimum number of observations

Code:
    . webuse grunfeld
 
    . asrol invest, stat(mean) win(4) gen(mean_4) min(3)
Example 5: Rolling mean with minimum number of observation including the start of the panel

Code:
    . webuse grunfeld
 
    . asrol invest, stat(mean) win(4) gen(mean_4) min(3) nomiss
This command forces asrol to calculate the mean of the variable
invest using a four-year rolling window and stores the results in a new variable
called mean_4. The nomiss and min(3) options force asrol to compute the mean with at least 3
available observations even at the start of each panel, i.e. asrol will not wait
until 4 observations are available; it will start calculating when at least
three observations are available.




Melogit, where is my log likelihood scalar?

Hi, everybody.
I'm running a series of multilevel models using melogit in Stata 14. I'm collecting various statistics as I loop through the models, and I can't find the null-model log likelihood saved anywhere. According to the help files there should be a comparison model saved in e(ll_c), but that estimate does not exist.
That null-model log likelihood is stored somewhere; if someone could tell me where and how I can dig it out, that would be most helpful. Right now I see three possible solutions:

1 Find where the null model log likelihood is stored after melogit.
2 Estimate a separate null model.
3 Somehow store the log likelihood displayed during iteration 0.

Thanks for your time
Max
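For what it's worth, option 2 can be sketched in a few lines. This is not from the thread, and the variable names y (outcome) and school (grouping level) are hypothetical stand-ins for whatever the looped models use:

```
* Sketch of option 2: fit an intercept-only (null) multilevel model
* and save its log likelihood before looping over the full models.
quietly melogit y || school:
scalar ll_null = e(ll)
display "null-model log likelihood = " ll_null
```

The saved scalar ll_null can then be referenced inside the loop, e.g. to compute likelihood-ratio statistics against each fitted model's e(ll).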

No attachments shown in "Recent Activity" view of a Topic.

In http://www.statalist.org/forums/foru.../1308755-tango, I complained that a poster had shown no links or references and was told that she had. I found that the issue was the view I was using to view the Topic. Here's a repeat of what I wrote there:
1) If I start from a list of Topics http://www.statalist.org/forums/foru...ussion/general

and click on the TANGO topic from there, I get the "Latest Activity" view with URL:

http://www.statalist.org/forums/foru.../1308755-tango

With this URL, I can see all posts, but can't see attachments in any browser; nor can I see photos, if any, or other poster information. Posts are listed from most recent to earliest, except for the original, which is at the top.

2) If I go to the top of the topic in that view and click the POSTS tab, I get the URL

http://www.statalist.org/forums/foru...go?view=thread

Notice the "?view=thread" at the end. In this view, I do see attachments, with poster information and photos, and posts are listed oldest to most recent.

Steve




MDS of variables not obs

In the literature I'm addressing, it is typical to do both a confirmatory factor analysis and a multidimensional scaling on observed vars to examine latent vars/patterns in observed vars. I have 25 items (personal values rated on a 7-point scale) that theory says should constitute 7 latent vars, and N=~500. Apologies if this is in the manual and I missed it. But how do I get an MDS that shows the relationships of the vars to each other, rather than showing me the relationships of the cases? (The literature typically uses smallest space analysis, but I think MDS will suffice.)
Tom D
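One possible route (a sketch, not an answer from the thread): build a dissimilarity matrix from the correlations among the items and feed it to mdsmat, which scales whatever the matrix rows represent — here, the variables rather than the cases. The varlist v1-v25 is a hypothetical stand-in for the 25 items:

```
* Sketch: MDS of variables via a correlation-based dissimilarity matrix.
* Dissimilarity is taken as 1 - |r|; other choices are possible.
quietly correlate v1-v25
matrix C = r(C)
matrix D = C                       // copy keeps row/column names for mdsmat
forvalues i = 1/25 {
    forvalues j = 1/25 {
        matrix D[`i', `j'] = 1 - abs(C[`i', `j'])
    }
}
mdsmat D, method(classical)
```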

Multiple imputation of multilevel data

Hi everyone, I am working with multilevel data in which students are nested within schools. Both the individual level and the school level have missing values. My general question is: how do I impute variables with missing values at different levels? Thank you so much!

HELP!!!!!!! Bounds test

Please help on how to perform a bounds test in Stata: its command(s) and how to interpret the results.

generating a new variable

Dear all

I am trying to create a new variable using generate. The issue is that the expression differs according to the sex variable:

the expression for men is
TFG = [(140 - age) x weight] / (72 x Cr)

the expression for women is
TFG (woman) = 0.85 x [(140 - age) x weight] / (72 x Cr).

When I apply generate for men, I get the correct results, with 62 missing (women). My question is how to combine both results in the same variable.


Code:
. dataex  weight Cr Age Sex

clear
input double weight int Cr float Age long Sex
82 88 47 0 
81.5 85 66 0 
99 84 66 1 
71.8 121 78 1 
61 60 73 0 
77 122 79 1 
98.3 104 45 1 
82.1 67 66 1 
84 92 74 1 
69 74 70 0 
78.4 144 77 1 
105 100 50 1 
88 92 69 1 
72 74 71 1 
86 181 68 0 
99.6 79 63 1 
57.2 65 53 0 
90 59 31 1 
71 75 59 0 
84 73 65 1 
98 92 46 1 
79 93 78 1 
64.8 100 47 1 
80 95 74 1 
69 107 81 0 
96.5 116 71 1 
93 88 51 1 
80 129 82 0 
113 95 50 1 
59 59 70 0 
63 56 81 0 
94 83 49 1 
66 77 59 1 
80 65 82 0 
90 91 61 1 
77 66 64 1 
84 86 52 1 
71 89 65 0 
89 83 72 1 
65 76 69 1 
81 77 66 1 
97 100 71 1 
2 72 61 1 
80 84 66 0 
88 100 76 1 
74 143 81 1 
117 75 34 1 
85.5 112 78 1 
75 81 70 1 
59 105 66 1 
82 155 81 1 
83 83 77 1 
61.4 109 74 0 
53 79 53 1 
64 66 61 0 
67.5 74 83 0 
79.5 98 55 1 
84.5 82 49 1 
53.4 69 64 0 
64 69 87 1 
44.3 87 40 0 
74.2 141 57 1 
93 75 77 1 
83.2 93 67 1 
62 106 73 1 
72 99 69 1 
77 156 67 0 
63 65 73 0 
70 109 74 1 
95 88 41 1 
65 59 43 0 
70 65 52 0 
84.6 71 59 1 
85.5 117 67 1 
72.2 83 73 1 
70.2 98 86 0 
73 162 79 1 
63.8 73 64 0 
66 81 73 0 
69 73 79 1 
72.6 72 52 1 
70 87 67 1 
65 88 82 1 
69 93 77 1 
89 70 67 1 
56 83 76 0 
79 76 68 1 
77 72 66 0 
75.5 98 62 0 
54 74 58 1 
end
label def dSexo 0 "man", modify
label def dSexo 1 "woman", modify
label values Sex dSexo
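A sketch of one way to combine the two formulas in a single variable, using the Sex coding shown by the value labels above (0 = man, 1 = woman); cond() would work equally well:

```
* Compute the male formula for everyone, then scale by 0.85 for women.
generate double TFG = ((140 - Age) * weight) / (72 * Cr)
replace TFG = 0.85 * TFG if Sex == 1
```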

Difference GLS RE and OLS with clustered SEs

Dear all,

I want to explain the difference between the GLS RE estimator, using
Code:
xtreg, re
and OLS regression with clustered standard errors, using
Code:
reg, vce(cluster clustervar)
I found an old post from 2003 in which Mark Schaffer said the following:
With -xtreg- you say that pupils within schools give you observations that are not independent, and you model this explicitly as your "random effect" or "fixed effect". There are some precise distributional assumptions that you are making about the correlation of pupils within schools (e.g., the within-school correlation takes the same form for all schools, and each pupil within a school is correlated equally with any other pupil in the school). With -regress- and -cluster- you don't model this explicitly. Instead, you allow for arbitrary correlation within schools, and the form of this correlation can vary from school to school.
This seems to me like a very good explanation. But I was wondering: which assumption in the GLS model states that the "correlation takes the same form for all schools"?

Can anybody help me with this?
Thanks!
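Not from the thread, but for reference: writing the RE composite error as $v_{it} = u_i + e_{it}$, the assumption Schaffer describes is the equicorrelated (exchangeable) error covariance that xtreg, re imposes within every school $i$:

```latex
\operatorname{Var}(v_{it}) = \sigma_u^2 + \sigma_e^2, \qquad
\operatorname{Cov}(v_{it}, v_{is}) = \sigma_u^2 \quad (t \neq s),
```

so the intraclass correlation $\rho = \sigma_u^2 / (\sigma_u^2 + \sigma_e^2)$ is identical for every pair of pupils in every school, whereas reg with vce(cluster) leaves the within-school covariances unrestricted.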

Household survey data

I have household survey data that I want to analyze, summarizing the information by crop type (crop area, labor, inputs, ...) grown on the farm. The raw data are organized by household id, plot number, crop type, and all other farm characteristics the same way (by plot). All the variables are numbered from 1 to n. I would like to summarize the household information by household and crop type (horizontally) to capture the contribution of each crop on the farm to the family economics. I am trying to use the egen command, but it does not seem to give me the expected answer. Does anyone have a suggestion on how to proceed? I appreciate it.
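A sketch of the egen approach, with hypothetical variable names (hhid, crop, area, labor): the key is to put both the household id and the crop type in the by-group so totals are computed per household-crop combination rather than per household:

```
* Totals within each household-crop cell (variable names are hypothetical).
bysort hhid crop: egen tot_area  = total(area)
bysort hhid crop: egen tot_labor = total(labor)

* Optionally collapse to one summary row per household-crop combination.
bysort hhid crop: keep if _n == 1
```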

xtabond2 with lagged and lead variable

Hey everyone,

I estimate the effect of an advertising ban on tobacco consumption using a rational addiction model that includes a lagged and a lead variable of tobacco consumption.

I would like to use the xtabond2 command by Roodman to estimate the following equation, but I am not sure how the syntax works when a lead variable is included:


xtabond2 logC L.logC D.logcons logP logY logU lim com $t ........

where logC is the current consumption
L.logC is the prior consumption
and D.logcons the future consumption
logP is the price
logY is the income
logU is the unemployment rate
lim and com are the dummies for the strength of an advertising ban (weak, limited (lim) and comprehensive (com) )
and $t are the time dummies.


Could anyone please help me with the syntax when current consumption depends on past and future consumption, treating all other variables as exogenous?

Thanks a lot
Louisa
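Not an authoritative answer, but syntactically the lead can be written with Stata's F. operator. A minimal sketch — the split between gmm() and iv() below is purely illustrative, not a recommendation for this model:

```
* F.logC is next period's consumption; instrument choices are illustrative.
xtabond2 logC L.logC F.logC logP logY logU lim com $t, ///
    gmm(logC, lag(2 .)) iv(logP logY logU lim com $t) robust
```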

A problem with runmplus_load_savedata.

Dear Statalist,

I am trying to use runmplus to run a confirmatory factor analysis and encountered an error with runmplus_load_savedata that I did not understand at all.

The code ran perfectly. But after some respondents were dropped, an error occurred after runmplus_load_savedata.
The code remains the same; the only change was that I deleted some respondents. The variables that I used to drop those respondents are not involved in the runmplus code.

Code:
. use temp1, clear 

. qui runmplus s007   d018 d022 d057 d059 , idvariable(s007)  ///
>   categorical(d018 d022 d057 d059) /// 
>   ANALYSIS(TYPE = general; ) model(f1 by d018 d022 d057 d059) ///
>   savedata(save=fscores; file=c:\trash\trash.dat) savelogfile(c:\trash\trash)

. runmplus_load_savedata , out(c:/trash/trash.out) clear
The case in which some respondents were dropped:

Code:
. use temp1, clear 

. keep if s003==50  | x048==356003 | x048==356004 | x048==356008 | x048==356011 | ///
>  x048==356017 
(1303 observations deleted)

. qui runmplus s007   d018 d022 d057 d059 , idvariable(s007)  ///
>   categorical(d018 d022 d057 d059) /// 
>   ANALYSIS(TYPE = general; ) model(f1 by d018 d022 d057 d059) ///
>   savedata(save=fscores; file=c:\trash\trash.dat) savelogfile(c:\trash\trash)

. runmplus_load_savedata , out(c:/trash/trash.out) clear
invalid syntax
r(198);

end of do-file
I have read the help files many times and still don't have a clue.
I appreciate any suggestions on why the error happened and how to fix it.

PS: I am using Stata 13 MP, and the package runmplus is up to date.

Merge A into B by ID and Year keep +/- 3 years around merged year

Dear Statalisters,

Below I display my two datasets (A, B) to be merged. I need to merge B into A by gvkey and the fyear of B; notice B has only one fyear, 1978. After merging, I want to keep three years' worth of data before and after 1978 in the A dataset. B contains a subset of firms in A that experienced a shock in 1978, and I need to extract several years of data (e.g., 3) before and after 1978.

Question 1: How shall I code that?
Question 2: Is there a way to build in the flexibility to change 3 years to 2 years (or another number, for a sensitivity test)?

data A

clear
input int(gvkey fyear) double xvar
1001 1975 2
1001 1976 5
1001 1977 4
1001 1978 1
1002 1975 2
1002 1976 7
1002 1977 2
1002 1978 2
1003 1975 3
1003 1976 4
1003 1977 1
1003 1978 3
1004 1975 6
1004 1976 5
1004 1977 3
1004 1978 .8
1005 1976 2
1005 1977 3
1005 1978 1.5
1006 1976 3
1006 1977 1
1006 1978 2.3
end

data B

clear
input int(gvkey fyear) byte treated
1001 1978 1
1002 1978 1
1003 1978 1
1004 1978 0
1005 1978 0
1006 1978 0
end

Regards,
Rochelle
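A sketch of one approach, assuming the two datasets are saved as A.dta and B.dta. The local macro `k` answers Question 2: changing it from 3 to 2 changes the window width:

```
local k 3                                  // half-width of the event window
use B, clear
rename fyear event_year                    // keep B's 1978 distinct from A's fyear
merge 1:m gvkey using A, keep(match) nogenerate
keep if inrange(fyear, event_year - `k', event_year + `k')
```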

Using external Instruments in xtabond2 GMM (Arellano Bond FD GMM and System GMM)

Hello,

I have a cross-country panel data set (attached). I am running Arellano-Bond GMM using xtabond2. I am regressing deficit on its lag, fiscal rule strength (fr), and the controls VFI and fed. The code is as follows:

Code:
tsset id year, yearly
xi: xtabond2 deficit l.deficit fr VFI fed , gmm(deficit fr VFI fed, lag(2 6)) robust noleveleq
In the code above, past lags from t-2 to t-6 of all variables are used as internal instruments. The variable of interest is fr. However, it is probably endogenous, so I want to instrument for it using two political variables, pol1 and pol2, as external instruments. How can I use these variables to instrument for fr in xtabond2?


Thank you,
Josh
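A hedged sketch, not a recommendation: in xtabond2, variables listed in iv() enter as standard (external) instruments, so one possibility is to add pol1 and pol2 via iv() while keeping the GMM-style instruments in gmm(); whether this identification strategy is adequate is a separate econometric question:

```
* pol1 and pol2 enter as standard external instruments via iv();
* the gmm()/iv() split here is illustrative only.
xtabond2 deficit l.deficit fr VFI fed, ///
    gmm(deficit, lag(2 6)) iv(pol1 pol2 VFI fed) robust noleveleq
```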

time dummy interpretation

Hi Statalisters, I have run the command below, where une_rt_a is the unemployment rate. Please, how can I interpret the time dummies in this model? Does this make sense? i.year is the dummy variable I created for the year variable.


Code:
 xtreg une_rt_a ubdur replacementrate uegen unionden emp_protn lmp_exp tax_wedge cpi i.year, re

Random-effects GLS regression                   Number of obs      =       107
Group variable: id                              Number of groups   =        10

R-sq:  within  = 0.7466                         Obs per group: min =         7
       between = 0.8479                                        avg =      10.7
       overall = 0.8022                                        max =        12

                                                Wald chi2(19)      =    352.76
corr(u_i, X)   = 0 (assumed)                    Prob > chi2        =    0.0000

---------------------------------------------------------------------------------
       une_rt_a |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
----------------+----------------------------------------------------------------
          ubdur |   .0108012   .0013345     8.09   0.000     .0081856    .0134169
replacementrate |  -1.068234   2.236778    -0.48   0.633    -5.452238     3.31577
          uegen |  -1.356531   .1785141    -7.60   0.000    -1.706413    -1.00665
       unionden |  -.2176253   .0182453   -11.93   0.000    -.2533855   -.1818651
      emp_protn |   .3533558   .3469191     1.02   0.308    -.3265931    1.033305
        lmp_exp |   4.312082   .4348374     9.92   0.000     3.459817    5.164348
      tax_wedge |  -.1908858   .0251507    -7.59   0.000    -.2401802   -.1415914
            cpi |  -.4762973   .0781887    -6.09   0.000    -.6295444   -.3230503
                |
           year |
          2001  |   .6375319   .8291291     0.77   0.442    -.9875312    2.262595
          2002  |   1.922518   .8769189     2.19   0.028     .2037888    3.641248
          2003  |   2.958694   .9392257     3.15   0.002     1.117845    4.799542
          2004  |   4.191314   .9592599     4.37   0.000     2.311199    6.071429
          2005  |   5.354811    1.07438     4.98   0.000     3.249064    7.460557
          2006  |    6.73956   1.236802     5.45   0.000     4.315473    9.163647
          2007  |   7.969629    1.41202     5.64   0.000     5.202122    10.73714
          2008  |   9.174149    1.61597     5.68   0.000     6.006905    12.34139
          2009  |   9.051898   1.522662     5.94   0.000     6.067535    12.03626
          2010  |   10.63673   1.644455     6.47   0.000      7.41366    13.85981
          2011  |   12.52523   1.880512     6.66   0.000     8.839492    16.21097
                |
          _cons |   62.20993   6.854047     9.08   0.000     48.77624    75.64361
----------------+----------------------------------------------------------------
        sigma_u |          0
        sigma_e |  1.0400054
            rho |          0   (fraction of variance due to u_i)
---------------------------------------------------------------------------------
Thank you for your help,

Lamie

bar chart for categorical varibles

I'm using Stata 12.
I successfully used the catplot command to produce a bar chart that compares the percentage of overweight/obese in males & females using the following:

catplot gender obese, percent(gender) asyvars recast(bar)

which yielded the following graph


however, I couldn't figure out a way to show the total. I want the graph to look like the attached graph (with males, females & totals).


any hints on how to do that?

Wafa
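One common trick (a sketch, untested against these data): duplicate every observation into an artificial "Total" category of gender before calling catplot. The category code 3 is arbitrary (it just needs to be unused), and the value-label name gender is an assumption:

```
expand 2, generate(copy)               // each record now appears twice
replace gender = 3 if copy == 1        // send the second copy to a new category
label define gender 3 "Total", modify  // assumes the label set is named gender
catplot gender obese, percent(gender) asyvars recast(bar)
drop if copy == 1                      // undo the duplication afterwards
drop copy
```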


Unbalanced panel data

Dear Stata users,

I am facing the following issue with panel data: I have a dataset for the period 2005-2014. I would like to see the effect of different variables on the cumulative return of the companies. However, some companies have a single observation for the whole period, while others have one observation for almost every year (e.g., 5 observations over the 10 years per company).
Can you please tell me if there is a way in which I can still use all observations and perform a regression? At the moment the dataset is unbalanced because the number of observations per company differs, and I would be very grateful for advice on how this issue can be handled so that all observations are used without losing information.

Thank you!

Calculating RMSE for GLM models (-glmcorr- command)

Hi,

I'm running various regression models and using RMSE to compare them. I am aware that the -glmcorr- command has been developed to produce this. However, -glmcorr- only gives the figure to 3 decimal places; ideally I need at least 5 decimal places to compare between models. Is there a way to get -glmcorr- to produce more decimal places, or another way to get the RMSE for a GLM regression?

Thanks,

Ash
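I don't know whether glmcorr's displayed precision can be changed, but the RMSE can be computed directly from the residuals at any precision. A sketch with hypothetical variable names (y, x1, x2) and an illustrative family():

```
* Sketch: RMSE after -glm-, displayed to 6 decimal places.
glm y x1 x2, family(gaussian)
predict double yhat                    // fitted values on the response scale
generate double sqerr = (y - yhat)^2
summarize sqerr, meanonly
display %12.6f sqrt(r(mean))           // root mean squared error
```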

ARIMA post estimation serial correlation and arch effects

Hi Guys,

Can anyone help me? I am trying to select the best-fitting ARIMA(p,d,q) model using the AIC and BIC, checking whether the residuals are white noise, and checking whether there are ARCH effects.

I am trying to run the code:

arima y, arima(p,d,q)
estat ic
estat bgodfrey
estat archlm

and I keep getting the error message "invalid subcommand".

Apparently the developers never prepared for the possibility that someone fitting an ARIMA model would want to make sure the residuals are white noise, or would try to go on and fit a GARCH model.

Does anyone know a workaround?

Cheers
D
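A common workaround (a sketch; the ARIMA order and lag choices are illustrative): estat bgodfrey and estat archlm are regress postestimation commands, but the same checks can be run on the predicted ARIMA residuals.

```
arima y, arima(1,1,1)
estat ic                               // AIC / BIC do work after -arima-
predict double res, residuals
wntestq res                            // portmanteau test for white noise
regress res                            // constant-only regression of residuals
estat archlm, lags(1 2 3)              // LM test for ARCH effects
```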