Channel: Statalist

Shannon-Wiener index

Does anyone know how to calculate the Shannon-Wiener index using Stata?
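For reference, a minimal sketch of computing it by hand, H = -sum of p_i * ln(p_i) over the category proportions p_i. This uses rep78 from the auto data as a stand-in for the species/category variable; substitute your own:

```stata
sysuse auto, clear
drop if missing(rep78)

bysort rep78: gen n_i = _N        // count of observations per category
by rep78: keep if _n == 1         // one row per category
egen N = total(n_i)               // grand total
gen p_i = n_i / N                 // category proportions
egen H = total(-p_i * ln(p_i))    // Shannon-Wiener index
display "Shannon-Wiener H = " H[1]
```

Community-contributed commands for diversity indices also exist on SSC; this is just the arithmetic spelled out.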

"#delimit ;" combined with "*" produces no output

Hello everyone

I have a very simple but frustrating problem, for which I found a solution I don't understand. I couldn't find a post about it anywhere, so I am making one here so others can google it and hopefully not waste time figuring it out themselves.

Problem: using "*" for comments within a section where ";" is set as the delimiter interferes with some commands, so that they produce no output.
- Below I have extracted just the part of the code that doesn't work. Graphs built from several lines of code work as normal with "*".

1) Code that doesn't create any output:
Code:
#delimit ;
      
    * PREDICT FORECAST
            predict std_f, stdf;
        
#delimit cr
2) Code that works as normal:
Code:
#delimit ;
      
    // PREDICT FORECAST
            predict std_f, stdf;
        
#delimit cr
I have seen posts on how complicated commenting in Stata can get, but not on this issue of "*" creating a problem like this.

I hope this post is not too trivial, but I would like to understand why (1) doesn't work.
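A minimal sketch of the likely mechanism (my reading of the delimiter rules, not something stated in the post): under #delimit ;, a command runs until the next semicolon, and "*" comments out the whole command. The comment therefore swallows the predict line along with its terminating ";". Giving the comment its own semicolon keeps the two apart:

```stata
#delimit ;
* PREDICT FORECAST ;
predict std_f, stdf ;
#delimit cr
```

"//" comments end at the line break regardless of the delimiter, which is presumably why version (2) works.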

Best regards,
Henrik

Loop going through distinct values of variable1 and replacing distinct string in variable2 per each iteration

Hi all,

I have this kind of reproducible code:
Code:
clear

set obs 10
gen str names = ""
gen num = _n

su num, meanonly
di `r(max)'

local abc a b c d e f g h i l
forval i = 1/`r(max)' {
    foreach x of local abc {
        replace names=`"`x'"' if num == `i'
        }
    }
What I would expect is for it to produce a column with "a, b, c, d, ...". Instead, I get a column full of "l".
I also checked Statalist but could not figure out why the code is not working.
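For what it's worth, the inner foreach overwrites names with every element of the local in turn, so every observation ends up holding only the last element. A sketch of one fix (same names as the post) indexes the local directly with the word() string function and needs no loop at all:

```stata
clear
set obs 10
gen num = _n

local abc a b c d e f g h i l
gen str1 names = word("`abc'", num)   // num-th word of the local
list, noobs
```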

I'm sure it's a silly mistake, but I find myself stuck.

Thank you

95% BCa confidence interval for Cramér's V with negative values - why?

Hello,

I tried to get a 95% CI for Cramér's V, which I calculated using the tabulate command. I used bootstrapping; the CI got calculated, but it contained negative values, which are not possible for V. So my questions are:
1. Why does Stata calculate negative values for the CI of V? Might there be something wrong with my data?
2. Is there another way to calculate the CI for V besides bootstrapping?
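For reference, a sketch of the setup being described, with x and y standing in for the two hypothetical categorical variables. The CI that bootstrap reports by default is normal-approximation based, which can stray below 0, while percentile-type intervals stay inside the observed range of the statistic:

```stata
* -tabulate, V- leaves Cramer's V in r(CramersV)
bootstrap V = r(CramersV), reps(1000) seed(2023) bca: tabulate x y, V
estat bootstrap, percentile bca    // range-respecting CIs
```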

Thanks in advance
Laura

Does a dynamic forecast from a VAR model with exogenous variables consider the values of the exogenous variables in its first forecast?

Hi

I have a VAR model with 4 endogenous variables, 2 exogenous variables, and 1 lag.

The code is as follows,

Code:
global y2 fdwti fdcoal strg_dev_fd fd_ng
global exog weathershockhdd storage_sf

var $y2 , lags(1) exog($exog)

where fdwti, fdcoal, strg_dev_fd, fd_ng, weathershockhdd, and storage_sf are all variables.


I am generating a dynamic forecast using this model.


I set up the forecast using this code.

Code:
estimates store m1_1L_f
forecast create m1_1L_model, replace
forecast estimates m1_1L_f
I then run a 4-week dynamic forecast using this code:

Code:
forecast solve, suffix(_forecast) begin(date2[`i']) periods(4)

So my question is as follows: when I generate the forecasts, do they ignore the values of the exogenous variables at the time the forecast is made? It's obvious that they do not consider these values after the first period; I'm just not sure about the first one.

I noticed that with the forecast command you can specify exogenous variables; however, as can be seen, my model already includes them, hence the confusion.

Thank you for any help.

Probit Analysis Error

Dear Stata Community,


I am trying to run the probit command with many DVs for different years, but I keep receiving the r(2000) error message. None of my variables is a string.
The period of the analysis is 5 years, and I have 3 different dummy variables to be applied across these years (DV1 for years 1 and 2, DV2 for years 2 and 3, DV3 for year 5).


My command is:
probit emp_stat urbdum1 age yrseduc Under5 if year == 2001


I was able to run the code for the first two years, but substituting the DV and the year for the third period produces the r(2000) error message saying "no observations".

My data looks like this:

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input byte emp_stat int year float urbdum1 int age byte yrseduc float Under5
0 1995 1 40 10 0
1 1995 1 28 13 1
1 1995 1 23 13 0
0 1995 1 31 10 0
1 1995 1 37 13 0
1 1995 1 45 10 0
1 1995 1 47 12 0
1 1995 1 35 12 1
1 1995 1 35 12 1
0 1995 1 40 10 0
1 1995 1 39  . 0
1 1995 1 42 12 0
1 1995 1 21 12 0
1 1995 1 34 12 1
1 1995 1 41 12 0
1 1995 1 23 12 0
1 1995 1 55 13 0
1 1995 1 31 15 0
1 1995 1 46 12 0
1 1995 1 21 12 0
1 1995 1 34 15 1
0 1995 1 43 12 0
1 1995 1 29 13 1
1 1995 1 43 12 0
1 1995 1 36 12 0
1 1995 1 39 11 0
1 1995 1 45 12 0
1 1995 1 37 12 0
1 1995 1 41 12 0
1 1995 1 38 12 1
1 1995 1 39 12 0
1 1995 1 45 10 0
1 1995 1 36 11 1
1 1995 1 25 10 0
1 1995 1 27 12 0
1 1995 1 34  6 1
1 1995 1 38 10 1
1 1995 1 28  9 0
0 1995 1 31  9 1
1 1995 1 38  6 0
0 1995 1 22  9 1
1 1995 1 41  7 1
1 1995 1 26  8 1
1 1995 1 19 12 0
1 1995 1 20 12 1
1 1995 1 33 10 1
1 1995 1 43  9 0
1 1995 1 19 10 0
1 1995 1 23 12 0
1 1995 1 46  7 0
0 1995 1 23 10 0
1 1995 1 33 10 0
1 1995 1 42 13 0
1 1995 1 27 12 1
1 1995 1 25 12 0
1 1995 1 23 12 0
0 1995 1 20 11 1
1 1995 1 23 12 1
1 1995 1 39 12 0
1 1995 1 50  7 0
1 1995 1 33  8 0
1 1995 1 26 10 1
1 1995 1 19 12 1
1 1995 1 28  9 1
1 1995 1 29 12 0
1 1995 1 34  9 1
0 1995 1 20  7 1
1 1995 1 30  8 1
1 1995 1 43  9 0
1 1995 0 32 15 1
0 1995 0 29  8 0
0 1995 0 21 12 0
1 1995 0 22 12 0
1 1995 0 49  6 0
1 1995 0 50 12 0
1 1995 0 30  8 0
1 1995 0 41  6 1
1 1995 0 22  6 1
1 1995 0 24  6 1
1 1995 0 30  6 1
1 1995 0 52  6 0
1 1995 1 24 10 0
1 1995 1 30 12 0
1 1995 1 24 12 1
0 1995 1 28 12 1
0 1995 1 24 12 0
1 1995 1 43 10 0
0 1995 1 34 15 1
1 1995 1 32 12 0
1 1995 1 39 12 0
1 1995 1 54 10 0
1 1995 1 25 12 0
1 1995 1 28  8 0
1 1995 1 27 10 0
1 1995 1 42 12 0
1 1995 1 21 12 0
1 1995 1 24 12 0
1 1995 1 23 12 0
1 1995 1 26 12 1
1 1995 1 24 13 0
end
label values age age

What are your recommendations for this?

Download financials

I want to download balance sheets and income statements for traded companies. This can be done with the command fetch_statements. For example, fetch_statements F, freq(A) st(BS)
will download the balance sheet of Ford for 2015-2018. Is it possible to download data for earlier periods, e.g. 1990?

** New on Github ** sfv: Dofile backup/versioning program

Hi all

I made a program in Powershell that makes it easy to keep multiple versions of dofiles. It is very straightforward - all it does is repeatedly copy all the changed files in a folder and add the date and time to the name of the file. These files are stored in a separate folder. You can customize what suffix that folder gets and how often the program checks for changes. You can set it up for as many folders as you want. The program takes up essentially no hard drive space and uses almost no cpu or memory. It comes with an "installer" which makes sure the program is always running when you log in.

You can download it on github: https://github.com/Danferno/sfv
All information is in the readme file. It should work on all Windows PCs. Non-Windows PCs may not have Powershell installed.

I've tested this program on a few different setups and it worked fine, but bugs might still exist. If you find one, please report it here or on GitHub.

I use it to keep versions of all my scripts and dofiles. I prefer it over Github or manually saving copies because I don't want to decide when to commit etc. With this thing on, I can just overwrite code without fear, because I can always go back to earlier versions. Also, sometimes I forget which change I made that led to different results. As long as I know roughly when the old result was generated, I can just go back to the dofile from that time and see. Likewise for log files.

FYI: This is essentially the end result of the versions folder.


Nested Logit - sequential Estimation

Dear all,

I'm trying to estimate a nested logit myself. Due to the computational burden of FIML estimation, I'd like to estimate the nested logit model via two sequential logits (LIML), as described in Greene (2002) p. 729 onwards or Train (2002) p. 97. I couldn't find a handy example, either with or without Stata code.

Could someone explain the steps needed to estimate a model similar to the one in the example (restaurant) of Stata's nlogit command?

Code:
webuse restaurant

nlogitgen type = restaurant(fast: Freebirds | MamasPizza, ///
    family: CafeEccell | LosNortenos | WingsNmore, ///
    fancy: Christophers | MadCows)

nlogit chosen cost distance rating || type: income kids, base(family) ///
    || restaurant:, noconst case(family_id)


RUM-consistent nested logit regression Number of obs = 2100
Case variable: family_id Number of cases = 300

Alternative variable: restaurant Alts per case: min = 7
avg = 7.0
max = 7

Wald chi2(7) = 46.71
Log likelihood = -485.47331 Prob > chi2 = 0.0000

------------------------------------------------------------------------------
chosen | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
restaurant |
cost | -.1843847 .0933975 -1.97 0.048 -.3674404 -.0013289
distance | -.3797474 .1003828 -3.78 0.000 -.5764941 -.1830007
rating | .463694 .3264935 1.42 0.156 -.1762215 1.10361
------------------------------------------------------------------------------
type equations
------------------------------------------------------------------------------
fast |
income | -.0266038 .0117306 -2.27 0.023 -.0495952 -.0036123
kids | -.0872584 .1385026 -0.63 0.529 -.3587184 .1842016
-------------+----------------------------------------------------------------
family |
income | 0 (base)
kids | 0 (base)
-------------+----------------------------------------------------------------
fancy |
income | .0461827 .0090936 5.08 0.000 .0283595 .0640059
kids | -.3959413 .1220356 -3.24 0.001 -.6351267 -.1567559
------------------------------------------------------------------------------
dissimilarity parameters
------------------------------------------------------------------------------
type |
/fast_tau | 1.712878 1.48685 -1.201295 4.627051
/family_tau | 2.505113 .9646351 .614463 4.395763
/fancy_tau | 4.099844 2.810123 -1.407896 9.607583
------------------------------------------------------------------------------
LR test for IIA (tau = 1): chi2(3) = 6.87 Prob > chi2 = 0.0762
------------------------------------------------------------------------------


1) How do I set up the two logit specifications?
2) How do I calculate the inclusive values?


Best
Julian

Help with tabout of variables with multiple values and with decimal place?

Hello, I am writing to see if someone could assist me with using tabout to export tabulations of variables with multiple values. I am attempting to tabout a crosstab of hourly rates, which has 24 values (0-23), by state, which has 45 values. I can tabout each state separately with the code tabout hour if state==15 using "StatesTime.xls", cells(col) replace, but is there any way to do all the states at once?

Another question: the code I provided above exports the data to one decimal place. Is there a way to export it to two decimal places?
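One sketch of doing all the states in one file (variable names taken from the post): loop over the state values, using replace on the first call and append thereafter; tabout's format(2) option sets two decimal places:

```stata
levelsof state, local(states)
local first 1
foreach s of local states {
    if `first' {
        tabout hour if state == `s' using "StatesTime.xls", ///
            cells(col) format(2) replace
        local first 0
    }
    else {
        tabout hour if state == `s' using "StatesTime.xls", ///
            cells(col) format(2) append
    }
}
```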

Many thanks.

newey2 with force

Hello,
I am working with a very unbalanced panel data set, where my id variable is an individual, and the time variable is quarters. I am currently trying to run separate regressions for each individual in my panel, using Newey-West standard errors to control for what I know to be serial correlation in my data. When I read in the data, I --xtset-- my data as well as use --tsfill, full--. My code looks like the following:
Code:
levelsof person, local(levels)

foreach l of local levels {
    newey2 depvar mean_depvar if person == `l', force lag(4)
}
where --mean_depvar-- is just the mean, by quarter, of the dependent variable of my regression. I am using force because I get the following error:
Code:
yq is not regularly spaced -- use the force option to override
I understand that the reason for this error is that I have missing observations for some individuals in the panel. However, I just want to ensure that --force-- is not interpreting the correlation structure of the residuals incorrectly: if I have a residual for an individual at time t and time t+4, but nothing for t+2 and t+3, I do not want Stata to think that the residuals at t and t+4 are only one period apart. The documentation for --newey2-- says the following:

newey2 handles missing observations differently for time series and panel data sets. Consider the example of a time series data set containing gaps, which is then recast using tsset as a panel data set with one group. newey and newey2 will not run on the time series version without force; with force they treat available observations as equally spaced. After the set is cast as a panel, newey2 will run without , force, and will assume zero serial correlation with missing observations.

Is the reason I cannot run --newey2-- without --force-- that I am now running separate regressions, which essentially use an unbalanced time series for each individual? Any thoughts or guidance on this would be much appreciated!

Accidentally saved as txt file

Hello
I imported a txt file and then accidentally overwrote it. I did not change any of the file's contents, but the save command has resulted in an apparently corrupt txt file full of 'gibberish' characters.

Here's my code:

import delim using "C:\myfolder\myfile.txt", delim("|")
save "C:\myfolder\myfile.txt", replace

Is there any chance of restoring the original content, which presumably is still in there somewhere amongst all the gibberish characters?
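For reference, a sketch of one likely recovery path, assuming the overwrite came from -save- as in the code above: -save- writes a Stata dataset regardless of the file extension, so the 'gibberish' file is presumably a valid .dta file in disguise. It can be opened and re-exported, though the result may differ cosmetically from the original (variable names as the header row, numeric display formats):

```stata
use "C:\myfolder\myfile.txt", clear
export delimited using "C:\myfolder\myfile.txt", delimiter("|") replace
```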

Any advice greatly appreciated

With thanks

Issue with svyset, getting "command svyset_8 is unrecognized"

Hello all,

I am in the process of analyzing some data from the National Readmissions Database (NRD). I have used the svyset command on other HCUP databases, including NIS and NEDS, with no issue. However, for some reason, now that I am using the NRD, it appears the svyset command is no longer working.

Here is what is going on. Every time I try to define my svyset using the code svyset HOSP_NRD [pweight=DISCWT], strata(NRD_STRATUM), I receive the error command svyset_8 is unrecognized. All of the variables (weight, strata) are correct; it seems that svyset itself is for some reason no longer recognized. Even more concerning, I just tried to clear and re-run my survey set designation on the prior NIS and NEDS data that had worked in the past, and now the svyset command is no longer working there either.

I have tried updating Stata and restarting my computer, all to no avail. Short of uninstalling and reinstalling Stata, does anyone have any other advice?

How to print values in parentheses when using a matrix with esttab?

So I generate a lot of tables where I have to use parentheses (e.g. mean (SD) format). The problem is that I can't find a way to do this out of Stata when I have things stored in a matrix.

For example:

Code:
sysuse auto
gen pw = 1
svyset [pw=pw]
svy: reg mpg c.weight
esttab, cells(b & se(par))

-------------------------
(1)
mpg
b/se
-------------------------
weight -.0060087 (.0005801)
_cons 39.44028 (1.974654)
-------------------------
N 74
-------------------------


This is the desired output.

Let's say I want to store the mean and sd and print that out.

Code:
sysuse auto
gen pw = 1
svyset [pw=pw]
svy: mean mpg
estat sd
mat values = (r(mean),r(sd))'
mat rownames values = "Mean" "SD"
esttab matrix(values), cells(Mean & SD(par))

-------------------------
values
y1
-------------------------
Mean 21.2973
SD 5.785503
-------------------------



How would you go about this? Note that I usually run many iterations (e.g. I have a 2x10 matrix with the first row being means and the second SDs or SEs).
I have been assembling "21.2973 (5.785503)" by hand afterwards, but there has to be a better way.

cmissing having no effect

I am trying to create a line graph that does not connect when there are observations with missing data. The cmissing option does not appear to be functioning as I expect it would. Using an example given in a previous answer on this topic (https://www.statalist.org/forums/for...issing-command), I would think the following code would have a line broken in the middle:
Code:
webuse grunfeld,clear
replace invest = .  if inrange(year, 1940, 1950)
twoway line invest year if com == 1, cmissing(n)
However, I get a single line with the points connected.

Any help is appreciated.

Propensity score matching

Hi everyone, I am a little new to Stata, and the commands are grilling me. I need help: I am trying to add an exclusion restriction to my propensity score matching command in Stata, but I am facing some challenges.
Has anyone in the group tried this before? Maybe you can help me; I am really in need of your help.

import sasxport slow

I'm running Stata 15 SE on a new Windows 2016 terminal server VM. This is a replacement upgrade for a 2008 R2 server VM in the same VMware cluster.

On the new server, reading in an xpt file using the import sasxport command is taking about 8 minutes - much longer than it did on the old server. Both servers have 4 CPUs and 40 GB of ram. The xpt file is only 456 MB. Opening a larger dta file is fast - under 3 seconds to open a 638 MB file.

The data files are located on a SAN connected by a private 10 Gb switch. I thought at first it was a network setting on the VM NICs connected to storage, but then I opened the dta file from there. I copied the xpt file to the C drive, but the import command still took 8 minutes to load the file.

I don't believe I ever had to tweak the Stata installation/configuration on the old server. Is there a setting or reg key I should check that would affect the import process?

Thanks.

F-test interpretation

I am running restricted F-tests on diff-in-diff regressions to evaluate the effect of a government health initiative. I am using the testparm command, and I am trying to understand how the tests should be interpreted when I use
testparm education#post
versus
testparm education##post
Using two hashes tests not only the interaction terms (12.education#1.post, 13.education#1.post, and so on) but also the coefficients on education and post themselves. How should my interpretation of the results change based on whether or not these additional coefficients are included in the F-test, and which is likely the better test to run? Thank you.
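The distinction can be sketched on a hypothetical setup (nlswork data, with grade standing in for the education variable):

```stata
webuse nlswork, clear
gen post = year > 77                 // hypothetical post indicator
regress ln_wage i.grade##i.post

testparm i.grade#i.post    // H0: all interaction coefficients are zero
testparm i.grade##i.post   // adds the grade and post main effects to H0
```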

Effect size with CI for categorical variables.

I am interested in obtaining an effect size with Cohen's method for the difference between two groups in the balance of a categorical variable (race).
For binary variables, I usually use the command
Code:
esize twosample AGE, by(FEMALE) cohen
However, here I am interested in knowing the effect size for the difference on the categorical variable RACE, which can be 1, 2, or 3, between males and females (FEMALE).
I would appreciate any help.

----------------------- copy starting from the next line -----------------------
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input double(AGE RACE FEMALE)
90 1 1
86 1 0
72 2 0
72 2 1
88 1 1
64 1 0
77 1 1
90 2 1
81 1 1
87 1 1
64 1 0
86 3 1
60 1 0
90 2 1
81 1 1
83 1 1
88 1 1
87 1 1
83 3 1
88 2 1
82 1 1
87 1 0
64 1 0
87 1 1
79 1 0
80 1 1
90 1 0
89 1 0
89 1 0
89 1 0
90 1 0
84 1 0
73 1 0
72 1 0
87 1 0
81 1 1
90 1 0
75 1 1
70 1 0
73 1 0
90 1 1
78 1 0
75 1 1
80 1 0
68 1 1
71 1 0
84 1 1
90 1 1
88 1 1
89 3 0
end
------------------ copy up to and including the previous line ------------------

Omitted terms vary error following mi estimate: svy: command

Hello Stata Forum,

I am conducting a study using complex survey weighted NHANES data to look at weight across age categories. Some NHANES participants did not have a body mass index (BMI) measured, so I am employing multiple imputation to address this, but I have run into an estimation error that I believe is unique to running MI on survey data.

After survey setting my data and running multiple imputation with 20 imputations for the missing BMI values, I create a passive variable for the BMI cutoff based on the imputed BMI values:
Code:
mi passive: gen ex_bmi = 0
mi passive: replace ex_bmi = 1 if bmi>45
I then run a survey estimation command to obtain totals of patients with BMI>45 across age deciles; however, I receive the following error:

Code:
mi estimate: svy: total ex_bmi, over(age_cat)
mi estimate: omitted terms vary
    The set of omitted variables or categories is not consistent between m=1 and m=9; this is not allowed.  To identify varying sets, you can use mi
    xeq to run the command on individual imputations or you can reissue the command with mi estimate, noisily
When I follow the error message's suggestion and run mi xeq, I find that some but not all imputations have 0 patients within an age category who meet the cutoff, and as a result the whole subpopulation is excluded; two examples out of the 20 imputations are shown below:

Code:
m=8 data:
-> total ex_bmi, over(age_cat) cformat(%9.0f) noisily

Total estimation                  Number of obs   =      7,581

    _subpop_1: age_cat = 40-49
    _subpop_2: age_cat = 50-59
    _subpop_3: age_cat = 60-69
    _subpop_4: age_cat = 70-79
    _subpop_5: age_cat = 80+

--------------------------------------------------------------
        Over |      Total   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
ex_bmi       |
   _subpop_1 |         53          7            39          67
   _subpop_2 |         33          6            22          44
   _subpop_3 |         32          6            21          43
   _subpop_4 |          8          3             2          14
   _subpop_5 |          0  (omitted)
--------------------------------------------------------------

m=9 data:
-> total ex_bmi, over(age_cat) cformat(%9.0f) noisily

Total estimation                  Number of obs   =      7,581

    _subpop_1: age_cat = 40-49
    _subpop_2: age_cat = 50-59
    _subpop_3: age_cat = 60-69
    _subpop_4: age_cat = 70-79
    _subpop_5: age_cat = 80+

--------------------------------------------------------------
        Over |      Total   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
ex_bmi       |
   _subpop_1 |         51          7            37          65
   _subpop_2 |         33          6            22          44
   _subpop_3 |         33          6            22          44
   _subpop_4 |          8          3             2          14
   _subpop_5 |          1          1            -1           3
--------------------------------------------------------------
Having identified the problem, I am not sure what the solution is, as I believe the imputation is running correctly but it is a matter of how the survey commands interact with MI that causes this error. I would appreciate any suggestions.

Best,
Tim Anderson