Channel: Statalist

Using putexcel to output titles in multiple columns on sample statistics

I'm attempting to generate ~40 sets of sample statistics, and as such would like to have Stata output the name of each series into Excel. My current code looks like this:

Code:
//INPUTS
local Peer1 PEER1_PV_PS PEER1_PV_PE PEER1_PV_EVEBITDA PEER1_PV_EVEBIT
local ncol = 2
local nrow = 2

//LOOP
local col1: word `ncol' of `c(ALPHA)'
local ++ncol
foreach var in `Peer1'{
local name1 `var'
summarize `var', detail separator(0)
local col: word `ncol' of `c(ALPHA)'
putexcel `col'`nrow'=(`name1') `col1'`nrow'=rscalarnames `col'`nrow'=rscalars using"V:\SummaryStats.xlsx", modify
local ++ncol
}
Where PEER1_PV_PS, PEER1_PV_PE, PEER1_PV_EVEBITDA, and PEER1_PV_EVEBIT are series (of which there are ~40), and the sections in green are those I'm having issues with.

The current output looks like this: [screenshot]



The output I want looks like this: [screenshot]



Thanks in advance!
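A possible fix, sketched below and not tested against the original workbook: write the macro as a quoted string so putexcel treats it as text rather than an expression, drop separator(0) (which summarize does not accept), and put a space before using. Note that c(ALPHA) only covers columns A-Z, so ~40 series would need extra handling past column Z.

Code:
local Peer1 PEER1_PV_PS PEER1_PV_PE PEER1_PV_EVEBITDA PEER1_PV_EVEBIT
local ncol = 2
local nrow = 2
local col1 : word `ncol' of `c(ALPHA)'
local ++ncol
foreach var in `Peer1' {
    summarize `var', detail
    local col : word `ncol' of `c(ALPHA)'
    // "`var'" in quotes writes the name as text rather than evaluating it
    putexcel `col'`nrow'=("`var'") `col1'`nrow'=rscalarnames `col'`nrow'=rscalars using "V:\SummaryStats.xlsx", modify
    local ++ncol
}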

Interpretation of R square in Fixed effect model

Respected Members,
I am using Stata to run a fixed-effects model for my regression analysis. The output reports the R-squared in three sections (within, between, and overall). How should I interpret these results, and which R-squared (within, between, or overall) should I report in my thesis? In my case the R-squared results are given below:

R-sq: within  = 0.1577
      between = 0.2765
      overall = 0.2203
Thanks and Regards

how to loop over all levels of an int variable

Hi everyone,

I need to loop over all levels of an int variable. I followed this link http://www.stata.com/support/faqs/da...-with-foreach/ and wrote a test loop to simply display the value of each level:

Code:
levelsof varname if (varname > 0), local(levels)
foreach l of local levels{
    di `l'
}
and the result is a series of empty lines, one for each level. Can someone help me understand why the output looks like that?
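For comparison, here is a self-contained version of the same pattern using the auto data; if this prints values while your loop prints blanks, one guess at a diagnostic is to display the level inside quotes, which reveals what the macro actually holds:

Code:
sysuse auto, clear
levelsof rep78 if rep78 > 0, local(levels)
foreach l of local levels {
    display "level = `l'"
}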

Also, eventually, I'm trying to do something like this:
Code:
local some_local varlist
local some_local_2 varlist
levelsof varname if (varname > 0), local(levels)
foreach l of local levels{
    eststo: areg y_var x_var `some_local' `some_local_2' if (some_var == 1 & varname == `l'), absorb(some_var) cluster(some_var)
    sum y if e(sample)
    estadd scalar ymean = r(mean)
}
And I keep getting error r(198): invalid syntax. Could someone please help me with this? I couldn't figure out what went wrong. I'm also open to other better ways than doing this loop.

Thank you very much for your help in advance!
Youran

ATT or OLS

What is better: ATT, or OLS with a third-variable control? How can I build OLS with a third-variable control? Thank you

interpretation of margins (AME) after mlogit

Dear Statalisters,
I have a question about the correct interpretation of margins (AME) after mlogit.
Here is a sample code and the results it produces.

use http://www.stata-press.com/data/r13/sysdsn1, clear
mlogit insure i.nonwhite age i.site
est sto m
forval i = 1/3 {
est res m
margins, dydx(*) pr(out(`i')) post
est sto m`i'
}
esttab m1 m2 m3 , wide cells(b(star fmt(2)) se(par fmt(2))) star(* 0.05 ) label legend varw(30) nonumbers mtitles("Indemnity" " Prepaid" " Uninsure")

The results are:
                               Indemnity      Prepaid       Uninsure
                               b/se           b/se          b/se
nonwhite=0                      0.00           0.00          0.00
                               (.)            (.)           (.)
nonwhite=1                     -0.19*          0.21*        -0.02
                               (0.05)         (0.05)        (0.02)
NEMC (ISCNRD-IBIRTHD)/365.25    0.00227       -0.00215      -0.000118
                               (0.00)         (0.00)        (0.00)
site=1                          0.00           0.00          0.00
                               (.)            (.)           (.)
site=2                          0.02           0.05         -0.07*
                               (0.05)         (0.05)        (0.02)
site=3                          0.13*         -0.14*         0.00
                               (0.05)         (0.05)        (0.03)
Observations                    615            615           615
* p<0.05

Are these interpretations correct?
1: The probability of Indemnity is on average about 19 percentage points lower for nonwhite than for white.
2: The probability of Prepaid is on average about 21 percentage points higher for nonwhite than for white.
3: The variable nonwhite has no influence on the probability of Uninsure.
4: The probability of Indemnity is on average about 13 percentage points higher for site 3 than for site 1.
5: The probability of Indemnity increases and the probability of Prepaid decreases if NEMC increases by one unit.

Thanks for Help,
Jörg

signtest after teffects psmatch

I am trying to run signtest and signrank following teffects psmatch:

global treatment license08
global ylist ln_price
global xlist ln_dist_pop10k ln_dist_pop50k ln_dist_pop1k ln_dist_coast uv02_density urban_rural

teffects psmatch ($ylist) ($treatment $xlist), atet generate(matcha)

In the example in the Stata documentation, it says "We create two variables called mpg1 and mpg2, representing mileage without and with the treatment,
respectively" and then runs

signrank mpg1=mpg2
signtest mpg1=mpg2

I cannot think how to create variables for the price with and without treatment, which are equivalent to the mpg1 and mpg2 they have created.

I am very confused; any thoughts would be GREATLY appreciated!!

residual saving

Dear Stata users,

I have daily panel data for 225 stocks over 3 years and have trouble doing the following regression:

As a measure of volatility I want to regress each stock's log return (logr) on the 12 lags of that log return together with its day-of-the-week dummy variables, in order to save the absolute values of the residuals afterwards. I created separate variables for the 12 lags (logr_1, logr_2, etc.) and dummies for each trading day of the week (monday, tuesday, etc.)

Below is an illustrative example of my data set with 2 lags. Basically, I want to generate a new variable "resid" that captures the residuals of the described regressions, but I don't know how to implement such looped regressions since I'm new to Stata.


id date logr logr_1 logr_2 monday tuesday
1 19775 . . . 0 0
1 19778 .79372 . . 1 0
1 19779 -.5237573 .79372 . 0 1
1 19780 2.749274 -.5237573 .79372 0 0
1 19781 -.6819873 2.749274 -.5237573 0 0
1 19782 .16978712 -.6819873 2.749274 0 0
2 19775 . . . 0 0
2 19778 .79372 . . 1 0
2 19779 -.5237573 .79372 . 0 1
2 19780 2.749274 -.5237573 .79372 0 0
2 19781 -.6819873 2.749274 -.5237573 0 0
2 19782 .16978712 -.6819873 2.749274 0 0
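A minimal sketch of one way to loop the regressions by stock and keep the absolute residuals (variable names follow the post; only 2 lags and 2 day dummies are shown for brevity, and e(sample) guards against the rows with missing lags). Alternatives to an explicit loop include -statsby-.

Code:
gen double resid = .
levelsof id, local(ids)
foreach i of local ids {
    quietly regress logr logr_1 logr_2 monday tuesday if id == `i'
    tempvar r
    quietly predict double `r' if e(sample), residuals
    quietly replace resid = abs(`r') if id == `i'
}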

Kind regards,

Gianni



how to display the factor matrix in principal component factor analysis?

I am doing a principal-component factor analysis.
I have 215 observations.
My code is:

factor [varlist], pcf
rotate

I need to display/export the 2x215 factor matrix for interpretation purposes.
Is there a way to do it?
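If the 2x215 matrix means the factor scores for each observation, -predict- after -rotate- creates them and they can then be exported; a sketch, where v1-v10 stands in for your varlist and the file name is hypothetical:

Code:
factor v1-v10, pcf
rotate
predict f1 f2                 // rotated factor scores, one pair per observation
export excel f1 f2 using "factor_scores.xlsx", firstrow(variables) replace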

Thanks

Confidence intervals for the means of the predicted probabilities for different groups following a binary logistic regression

How does one compute confidence intervals for the means of the predicted probabilities for different groups following a binary logistic regression? The groups are not covariates in the model.
I have been recommended to use:

Code:
Margins, over(mygroupvariable)
My data cover all relevant individuals in 2010-2015. However, I only estimate my model on data from 2015. Afterwards, I predict the individual probabilities for all individuals in the period 2010-2015 using the coefficients from the model estimated on the 2015 data.

I have looked at -margins, over(region)-, and it seems like a sound approach. However, I have some questions about it:
  1. Is it possible to compare the mean of the predicted probabilities between two groups in the same year? (i.e. is the difference in the means of the predicted probabilities statistically significant). Is it also possible to adjust for multiple comparisons if I want to compare more than two groups at the same time? (e.g. bonferroni). I know I could do these things if I had included the groups as covariates. But I do not wish to include group fixed effects in my model.
  2. Should I use the default delta method with respect to the standard errors, or should I use the vce(unconditional) method? And what kind of statistical uncertainty does the default delta method handle? In principle, I have data on all individuals in a country in each year, but one could also view this as a sample from a superpopulation. And the observations from years other than the newest year have not been part of the estimation sample.
  3. Do the confidence intervals take account of whether the individuals in a given group have a relatively low, medium, or high probability of the outcome? I have read that the statistical uncertainty is higher for an individual whose covariate values are very different from the mean values of the covariates. Individuals with atypical values would probably differ in the predicted probability from individuals with more typical values.
  4. Do I risk out of bounds confidence intervals using margins? I.e. do I risk a lower endpoint under 0 or an upper endpoint over 1?
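On question 1, -margins- itself can test pairwise differences across over() groups with a multiple-comparison adjustment; a sketch with placeholder names (y, x1, x2 stand in for the 2015 model):

Code:
logit y x1 x2
margins, over(mygroupvariable) pwcompare(effects) mcompare(bonferroni)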

State-contingent stochastic frontier

I hope you are all well.

I have panel data on agricultural production in Kenya and would like to analyze it with a state-contingent stochastic frontier. How can I run this model in Stata?

Calculating the Herfindahl-Hirschman Index in Stata

Hello,

I have a dataset on which I want to calculate the HHI, but old posts on this topic don't make me much wiser, so maybe you can help me.
I have patient level declaration data for 1 year where, among other things, I have variables for:
- the "provider" of health services
- the "insurer" that reimbursed
- the "amount" of money that was reimbursed
- "region" where the provider operates
The outcome variable is a transformation of the reimbursed price.
I want to calculate the HHI of the providers for the different regions. Can someone advise me on how best to do this?
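One common recipe, sketched with the variable names from the post (amount, provider, region): the HHI is the sum over providers of squared market shares within a region. -preserve-/-restore- keeps the patient-level data intact:

Code:
preserve
collapse (sum) amount, by(region provider)       // provider revenue per region
bysort region: egen double regtotal = total(amount)
gen double share = amount/regtotal
bysort region: egen double hhi = total(share^2)  // HHI on a 0-1 scale
list region provider hhi
restore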

Kind regards,
Chiara

How to deal with multiple modes

Hello,

I have a large patient-level declaration dataset with a lot of different variables. Among other things: the type of treatment, the provider that delivered the treatment, the insurer that reimbursed the treatment, the amount of money that was reimbursed, region, time spent treating, etc. I calculated the mode of the amount of money reimbursed for different 'treatment' types by 'provider' & 'insurer', using the following command:

by treatment provider insurer, sort: egen mode_price = mode(reimbursed amount)

But I get the following message from Stata:
Warning: at least one group contains all missing values or contains multiple modes. Generating missing values for the mode of these groups. Use the missing, maxmode, minmode, or nummode() options to control this behavior.

Can anyone tell me how I can see which groups have multiple modes? And also how I can fix this?
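To see which groups have multiple modes, one trick is to compute the smallest and largest mode per group and flag where they differ; a sketch, where reimbursed_amount stands in for your variable name:

Code:
by treatment provider insurer, sort: egen modemin = mode(reimbursed_amount), minmode
by treatment provider insurer: egen modemax = mode(reimbursed_amount), maxmode
gen multimode = (modemin != modemax) & !missing(modemin)
tab multimode

How to "fix" it is a substantive choice: minmode, maxmode, or nummode(#) each pick one mode by a different rule.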

Kind regards,
Chiara

New command -colorscatter- available from SSC

Hi all,

Thanks to Kit Baum, a new program -colorscatter- is available for download from SSC. -colorscatter- draws (twoway) scatterplots, allowing the marker color to vary with a third variable.

The package can be installed using:
Code:
ssc install colorscatter
The program works by merging many normal twoway scatter plots with varying parameters.

The following code is an example application using all available options:

Code:
set obs 1000
gen x = rnormal()
gen y = rnormal()
gen c = min(abs(x),abs(y))

colorscatter x y c,                  ///
  scatter_options(msymb(Oh))         /// This is passed to twoway scatter to draw circles as markers
  rgb_low(255 0 0) rgb_high(0 255 0) /// This specifies the colors for low and for high values of c
  cmax(1.5) cmin(0.5)                /// This specifies the lowest and highest values for the color gradient; lower and higher values of c will all yield the same color
  keeplegend                         /// By default colorscatter creates a custom legend; if users want to specify their own legend, this option is needed
  legend(order(2 "c = lowest " 150 "c = highest") pos(2) col(1)) /// This draws a new legend
  title("Twowaytitle")               //  Any option colorscatter does not recognize is passed on to twoway, so twoway options can be specified as usual
[example graph]

Heterogeneity in pooled IVPOIS

Hello respected members,

My name is Fabiha Bushra.
I am using Stata 13.0 to estimate the following command:

bs, reps(100) cluster(panelvar): ivpois y1 x1 x2 x3, endog(y2) exog(z1 z2)

a) Where y1, the dependent variable, is the number of terrorist attacks from the Global Terrorism Database (GTD) which is why an estimation method for count data seemed appropriate (hence IVPOIS). My dependent variable also has zeroes and is overdispersed, but from what I've read here http://www.statalist.org/forums/foru...ative-binomial , IVPOIS can be applied to any exponential model even in the presence of over-dispersion of data (Please let me know if I have misread anything).
b) panelvar is the panel identifier.
c) I should also mention that the endogenous regressor, y2, and both the instruments used z1 and z2 are all continuous variables.
d) The x's are controls

My queries are as follows:

1) How do I account for heterogeneity in this model?
2) Wooldridge (2013) provided a way to use Correlated Random Effects (CRE) for pooled Probit. Can the CRE method be applied to pooled IVPOIS?

I will gladly provide further details if required. Looking forward to your response.
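On query 2, the usual Mundlak/CRE device adds the panel means of the time-varying regressors to the pooled model; a sketch only, since whether this is formally justified for pooled IV Poisson is a question for the literature:

Code:
foreach v of varlist x1 x2 x3 {
    bysort panelvar: egen double mbar_`v' = mean(`v')
}
bs, reps(100) cluster(panelvar): ivpois y1 x1 x2 x3 mbar_x1 mbar_x2 mbar_x3, endog(y2) exog(z1 z2)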

Kind Regards,
Fabiha


References:

Wooldridge, J. M. (2013). Correlated Random Effects Panel Data Models. IZA Summer School in Labor Economics (http://www.iza.org/conference_files/SUMS_2013/viewProgram).

Generate specific dates as a dummy (dummy for trading days around holidays)

Hello Statalist,

I have panel data for 989 firms between 1 Jan 1995 and 31 Dec 2014. I use a business calendar date.
The data are as below:

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float idc str10 dat double(_marketvalue _numbershares _price _returnindex _tradingvalue _tradingvolume _unpaddedprice) str6 company float(caldate bcaldate dow mondaydum)
87 "12_8_2003"               27.01 103900 .26              31.35 .  .   . "T:BEGR" 16047 2331 5 0
87 "12_9_2003"               27.01 103900 .26              31.35 .  .   . "T:BEGR" 16048 2332 6 0
87 "12_10_2003"              27.01 103900 .26              31.35 .  .   . "T:BEGR" 16049 2333 0 0
87 "12_11_2003"              27.01 103900 .26              31.35 .  .   . "T:BEGR" 16050 2334 1 0
87 "12_12_2003"              27.01 103900 .26              31.35 .  .   . "T:BEGR" 16051 2335 2 0
87 ""                            .      .   .                  . .  .   . ""       16052    . . 0
87 ""                            .      .   .                  . .  .   . ""       16053    . . 0
87 "12_15_2003"              27.01 103900 .26              31.35 .  .   . "T:BEGR" 16054 2336 3 0
87 "12_16_2003"              24.42 103900 .23              28.34 .  4 .23 "T:BEGR" 16055 2337 4 0
87 "12_17_2003"              24.42 103900 .23              28.34 .  .   . "T:BEGR" 16056 2338 5 0
87 "12_18_2003"              22.86 103900 .22              26.53 . 35 .22 "T:BEGR" 16057 2339 6 0
87 "12_19_2003"              22.86 103900 .22              26.53 .  .   . "T:BEGR" 16058 2340 0 0
87 ""                            .      .   .                  . .  .   . ""       16059    . . 0
87 ""                            .      .   .                  . .  .   . ""       16060    . . 0
87 "12_22_2003"              22.86 103900 .22              26.53 .  .   . "T:BEGR" 16061 2341 1 0
87 "12_23_2003"              22.86 103900 .22              26.53 .  .   . "T:BEGR" 16062 2342 2 0
87 "12_24_2003"              22.86 103900 .22              26.53 .  .   . "T:BEGR" 16063 2343 3 0
87 "12_25_2003"              22.86 103900 .22              26.53 .  .   . "T:BEGR" 16064 2344 4 0
87 "12_26_2003"              22.86 103900 .22              26.53 .  .   . "T:BEGR" 16065 2345 5 0
87 ""                            .      .   .                  . .  .   . ""       16066    . . 0
87 ""                            .      .   .                  . .  .   . ""       16067    . . 0
87 "12_29_2003"              22.86 103900 .22              26.53 .  .   . "T:BEGR" 16068 2346 6 0
87 "12_30_2003" 23.900000000000002 103900 .23 27.740000000000002 .  2 .23 "T:BEGR" 16069 2347 0 0
87 "12_31_2003" 23.900000000000002 103900 .23 27.740000000000002 .  .   . "T:BEGR" 16070 2348 1 0
87 "_1_1_2004"  23.900000000000002 103900 .23 27.740000000000002 .  .   . "T:BEGR" 16071 2349 2 0
end
format %td caldate
format %tbmybcal bcaldate
In my regression, I have dummies for day-of-week and for trading days around non-weekend holidays.
To generate the day-of-week dummy, I use this code:

Code:
 gen dow = dow(bcaldate)
gen mondaydum=1 if bcaldate ==1
replace mondaydum=0 if mondaydum==.

For the dummy for trading around non-weekend holidays, I would like to mark 25 Dec (Christmas) and 1 Jan (New Year) for every year with a dummy.
I tried this code, but I got an error:

Code:
gen holiday=1 if bcaldate 25dec* and (-1) if dow (sa)
invalid '25dec' 
r(198);
Code:
 gen holiday=1 if bcaldate 25dec* and (+1) if dow (sun)
invalid '25dec' 
r(198);
Can anyone help me extract the specific dates 25 Dec and 1 Jan for every year, so that I can generate a dummy variable?
Thank you in advance.
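Since caldate is a daily %td date, month() and day() can pick the holidays out directly; a sketch (shifting the dummy to an adjacent trading day when the holiday falls on a weekend would need extra logic):

Code:
gen holiday = (month(caldate) == 12 & day(caldate) == 25) | ///
              (month(caldate) == 1  & day(caldate) == 1)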

Regards,
Rozita

Module – respdiff – available from SSC

Thanks to Kit Baum, the module – respdiff – is now available for download from SSC.

Research in survey methodology has provided ample evidence that survey respondents differ in the extent to which they differentiate their answers to survey questions (Krosnick 1991). With respect to this finding, the theory of survey satisficing proposes that under certain conditions respondents might select a somewhat reasonable response option for the first item in a set of survey question items and rate all (or almost all) remaining items with exactly the same response value (Krosnick 1991). In survey methodology, this response pattern is usually referred to as response non-differentiation (e.g., Krosnick and Alwin 1988, Krosnick 1991) or straightlining (e.g., Couper et al. 2013).

The respdiff command enables Stata users to generate several indices of response differentiation for each row r of the data set (e.g. respondents) over the n variables in varlist (e.g. survey questions), ignoring missing values (i.e., system missing values and numeric values that have been changed to missing values using the mvdecode command). While one function creates a binary indicator for non-differentiated responses, the remaining functions (e.g., the standard deviation of responses and the coefficient of variation) provide measures of the extent to which each respondent provided differentiated responses to a user-defined set of survey questions.

The respdiff module should be installed from within Stata by typing “ssc install respdiff”. It requires Stata version 12.1.

References
Couper, M. P., Tourangeau, R., Conrad, F. G., & Zhang, C. (2013). The Design of Grids in Web Surveys. Social Science Computer Review, 31(3), 322-345.

Krosnick, J. A. (1991). Response Strategies for Coping with the Cognitive Demands of Attitude Measures in Surveys. Applied Cognitive Psychology, 5(3), 213-236.

Krosnick, J. A., & Alwin, D. F. (1988). A Test of the Form-Resistant Correlation Hypothesis. Ratings, Rankings, and the Measurement of Values. Public Opinion Quarterly, 52(4), 526-538.

Best wishes
Joss

---
Joss Roßmann
GESIS - Leibniz Institute for the Social Sciences

Change working directory for using multiple machines using - capture do -

Dear Statalist,

In my current job I use multiple computers, in multiple offices, to do my Stata work. Most of the computers are Macs, but one of them is a PC.

As a result, each time I change locations I need to slightly change the directory from which my do-files pull data into Stata.

Here is an example. I start all my do files generally like this on a PC:

Code:
clear 

set more off 

global myfiles "c:\Users\rnb16003\filepath to stata files"

cd "$myfiles" 

import excel using "$myfiles\name_of_file.xls"
But sometimes I am working on this file from the following place (on a Mac):

Code:
clear

set more off

global myfiles "/Users/mac_username/filepath to stata file"

cd "$myfiles" 

import excel using "$myfiles/name_of_file.xls"
Each time I have to change the file path by hand, and that's getting a bit tedious!

I tried this, assuming capture would force the program to keep going until it hit the directory that worked on the machine I was on at the moment:

Code:
clear 

set more off 

capture global myfiles "c:\Users\rnb16003\filepath to stata files"
capture global myfiles "/Users/mac_username/filepath to stata file"

cd "$myfiles" 

capture import excel using "$myfiles\name_of_file.xls" 
capture import excel using "$myfiles/name_of_file.xls"
This isn't working, though.

Does anyone else have a trick I could use? Thanks so much for your help!
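One trick that does work is to capture the cd itself: cd fails (harmlessly, under capture) when the path does not exist on the current machine, whichever cd succeeds sets the working directory, and c(pwd) then records it:

Code:
capture cd "c:\Users\rnb16003\filepath to stata files"
capture cd "/Users/mac_username/filepath to stata file"
global myfiles "`c(pwd)'"
import excel using "$myfiles/name_of_file.xls"   // forward slashes also work on Windows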

Joe Hilbe passes

Joseph Hilbe passed away unexpectedly yesterday, March 12, at his home in Arizona. He was 72 years old.

Joe was a long-standing member of the Stata community. He was the founding editor of the Stata Technical Bulletin, predecessor to the Stata Journal, from 1991 to 1993. These days, even though he was supposedly retired, he was still active. He and James Hardin were working on a new edition of their book on generalized linear models. He had that day just sent the latest promised changes to James.

Those who did not know Joe might want to see his Wikipedia entry.

Those who did will mourn his passing.

Remove missing values

Hello,

I'm sure to many of you this will be an elementary matter, but I'm quite new to Stata and would desperately appreciate your help!

I'm currently working on a dataset involving Happiness, with values ranging between 1-10. However, missing values have been replaced by "-" where the question was not answered. When I type "tab Happy" I get a detailed breakdown showing that 5166 of the 13255 results are "-" plus the breakdown of all other values, but when I type "inspect Happy" I get that all 13255 are missing.

I've tried to work around this, using the code:

keep if Happy==1 & Happy==2 & Happy==3 & Happy==4 & Happy==5 & Happy==6 & Happy==7 & Happy==8 & Happy==9 & Happy==10
however I got a type mismatch, r(109).
I've also tried regress Happy A005 if !missing(Happy), to which I got r(2000), no observations.

I've also tried a few other things but quite frankly I'm out of my depth and would love some help!
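The type mismatch suggests Happy is a string variable because of the "-" entries (which may also be why inspect reports everything as missing). A sketch of one fix: blank out the "-" values and convert the variable to numeric:

Code:
replace Happy = "" if Happy == "-"
destring Happy, replace          // "-" rows become numeric missing (.)
tab Happy, missing
regress Happy A005 if !missing(Happy)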

Thanks, David