Channel: Statalist

Area Under the Curve (AUC) after random-effects models

I would like to know how I can calculate the AUC after fitting a logit model with the runmlwin command in Stata.

Kind regards
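A minimal sketch of the general idea, using Stata's built-in melogit on an example dataset as a stand-in for the runmlwin model (the post does not show the model, so everything below is illustrative): obtain predicted probabilities from the fitted multilevel logit and pass them, together with the observed outcome, to roctab, which reports the area under the ROC curve.

Code:
* sketch only: melogit stands in for the runmlwin model; the same idea applies
* to any fitted logit from which predicted probabilities can be obtained
webuse bangladesh, clear
melogit c_use urban age child* || district:
predict phat, mu                 // predicted probability of the outcome
roctab c_use phat                // reports the area under the ROC curve (AUC)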

State-level time series: tool for calculating requisite number of observations per state/year?

Greetings,

I'm running Stata 15.1 on OS X. My ultimate goal is to examine whether changes in a specific independent variable predict changes in state-level attitudes across time. My dataset consists of pooled individual-level cross-sectional survey data collected in 2008, 2010, 2011, 2012, 2014, 2015, 2016, and 2018. After pooling all of the data, I estimated the state-level means per year for my variable of interest. My concern is that certain states (e.g. Wyoming) in certain years have very few observations, because the survey samples themselves vary in size (for instance, there were 7,636 respondents in the 2016 survey and 41,419 in the 2014 survey). This means that the estimates for some states in certain years will be highly unreliable, and I'm wondering how to handle such a predicament. I think I need to calculate how many observations per state/year are sufficient for estimates with, say, a 5-10 percentage point margin of error. I wasn't sure whether there is a built-in or add-on tool that performs such calculations, so I wanted to ask. Thanks in advance for your help/suggestions!

-Zach
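One back-of-the-envelope option, assuming simple random sampling within each state-year and the conservative p = 0.5: the usual margin-of-error formula gives the number of respondents needed, and a quick tabulation shows how many are actually available (statefip and year below are hypothetical variable names).

Code:
* n needed for a given margin of error (simple random sampling, p = 0.5)
local moe = 0.05
display ceil(invnormal(0.975)^2 * 0.5 * 0.5 / `moe'^2)    // roughly 385

* observations actually available per state-year (variable names are hypothetical)
bysort statefip year: gen n_cell = _N
tabstat n_cell, by(statefip) statistics(min p50 max)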

line of fit for two skewed continuous variables

I am conducting Spearman's correlation on two skewed continuous variables. I do not wish to log transform them in order to perform Pearson's correlation.

I want to display a scatter plot of the two variables with a line of fit, like this:

Code:
twoway scatter x y || lfit x y
However, I'm worried that the line of fit drawn on the scatter graph by the -lfit- command will be a misleading representation, since lfit assumes that the distributional assumption of normality is met.

Does anyone know how I can represent a non-parametric line of fit without log transforming the skewed variables?

Thanks,

Madu
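One option that avoids any transformation is a locally weighted smoother; a minimal sketch (with y on the vertical axis, and the default bandwidth, which may need tuning):

Code:
twoway scatter y x || lowess y x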

Collapse says varlist required

Hi!

I tried to collapse the following data, but Stata says "varlist required". Why?

collapse smi1 smi2 lsmi1 lsmi2 [pw==shares], by (permno date)

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input double(permno shares date smi1 smi2) float(lsmi1 lsmi2)
10001 119324 18628        -69.55078125 -1260000   -736.769 -2210000
10001 117181 18628                   0        0  -51.88709  -940000
10001  16900 18628                   0        0 -126.95777 -2300000
10001   6150 18628   87.21446990966797  1580000          0        0
10001  12366 18628                   0        0   998.0674  5500000
10001  24315 18628  240.66776084899902  4360000   635.3408 11510000
10001    750 18628  165.04510498046875  2990000  -165.0451 -2990000
10001  19986 18628   51.33509826660156   930000  103.22218  1870000
10001 190860 18628   35.87936782836914   650000          0        0
10001  58062 18628  -160.0771942138672 -2900000  -204.2364 -3700000
10001  24315 18629  -2143.792381286621 -9550000   807.5618 14630000
10001  58062 18629   1033.946834564209  4500000  -364.3136 -6600000
10001 190860 18629  -792.1060791015625 -2800000  35.879368   650000
10001   6150 18629  -87.21446990966797 -1580000          0        0
10001  16900 18629  -750.0168380737305  -800000          0        0
10001 117181 18629   620.0230731964111  2570000  -51.88709  -940000
10001    750 18629 -1810.5282592773437 -6400000   165.0451  2990000
10001  19986 18629 -1314.4269104003906 -6900000   205.8924  3730000
10001 119324 18629    736.769063949585  2210000  -763.8166 -2700000
10001  12366 18629   818.6706085205078  2250000   906.9891  3850000
10001 117181 18630                   0        0   594.0795  2100000
10001  19986 18630  -205.8923797607422 -3730000  -902.6422   560000
10001 190860 18630                   0        0  -720.3474 -1500000
10001  16900 18630  126.95777130126953  2300000  -750.0168  -800000
10001  24315 18630 -24.287572860717773  -440000 -1557.0267  1080000
10001  58062 18630 -57.958980560302734 -1050000   771.7514  -250000
10001   6150 18630                   0        0  -87.21447 -1580000
10001 119324 18630         69.55078125  1260000    736.769  2210000
10001    750 18630  221.34811401367187  4010000  -1645.483 -3410000
10001  12366 18630   91.07839965820313  1650000   818.6706  2250000
10001  19986 18631                   0        0  -205.8924 -3730000
10001  24315 18631 -195.40457344055176 -3540000  -24.28757  -440000
10001  16900 18631                   0        0  126.95777  2300000
10001   6150 18631   87.21446990966797  1580000          0        0
10001  12366 18631                   0        0    91.0784  1650000
10001    750 18631                   0        0   221.3481  4010000
10001 117181 18631                   0        0          0        0
10001 119324 18631  -27.04752540588379  -490000   69.55078  1260000
10001  58062 18631  -160.0771942138672 -2900000  -57.95898 -1050000
10001 190860 18631                   0        0          0        0
10001 117181 18632                   0        0          0        0
10001  58062 18632  218.03617477416992  3950000  -160.0772 -2900000
10001  12366 18632  -91.07839965820313 -1650000          0        0
10001 190860 18632  -35.87936782836914  -650000          0        0
10001    750 18632  -193.1966094970703 -3500000          0        0
10001   6150 18632                   0        0   87.21447  1580000
10001  24315 18632 -215.27620697021484 -3900000 -195.40457 -3540000
10001  16900 18632  126.95777130126953  2300000          0        0
10001  19986 18632 -154.55728149414062 -2800000          0        0
10001 119324 18632         69.55078125  1260000 -27.047525  -490000
10001 119324 18633  -27.04752540588379  -490000   69.55078  1260000
10001 117181 18633  25.943544387817383   470000          0        0
10001  16900 18633                   0        0  126.95777  2300000
10001  24315 18633   49.67912673950195   900000  -215.2762 -3900000
10001 190860 18633   35.87936782836914   650000 -35.879368  -650000
10001  19986 18633   51.33509826660156   930000  -154.5573 -2800000
10001   6150 18633  -87.21446990966797 -1580000          0        0
10001    750 18633  28.151504516601563   510000  -193.1966 -3500000
10001  12366 18633  -44.15922546386719  -800000   -91.0784 -1650000
10001  58062 18633                   0        0   218.0362  3950000
10001  16900 18634  126.95777130126953  2300000          0        0
10001  19986 18634 -154.55728149414062 -2800000    51.3351   930000
10001  12366 18634  -91.07839965820313 -1650000  -44.15923  -800000
10001 119324 18634         69.55078125  1260000 -27.047525  -490000
10001  58062 18634  102.11821365356445  1850000          0        0
10001  24315 18634 -176.63690948486328 -3200000   49.67913   900000
10001 190860 18634                   0        0  35.879368   650000
10001   6150 18634                   0        0  -87.21447 -1580000
10001 117181 18634                   0        0  25.943544   470000
10001    750 18634  -193.1966094970703 -3500000  28.151505   510000
48486   157100 18630                   0        0 -179.94884 -3260000
48486   114744 18630                   0        0   828.6754  6020000
48486       30 18630                   0        0  -875.7326 -4480000
48486   206461 18630  126.95777130126953  2300000  -750.0168  -800000
48486   735988 18630   44.71121597290039   810000  121.98987  2210000
48486   141006 18630                   0        0  -74.51869 -1350000
48486  6032435 18630   68.99878692626953  1250000   678.9481  2400000
48486    93037 18630   70.10276794433594  1270000  -508.5901 -1170000
48486   134071 18630                   0        0   560.7532  1290000
48486   224297 18630         69.55078125  1260000    736.769  2210000
48486    96331 18630                   0        0  1007.1063  4220000
48486   340026 18630                   0        0  -176.6369 -3200000
48486  2843432 18630   91.07839965820313  1650000   818.6706  2250000
48486     8832 18630                   0        0  565.79004  2000000
48486  1996036 18630                   0        0  -720.3474 -1500000
48486   471167 18630                   0        0 -17.663689  -320000
48486   107866 18630                   0        0   594.0795  2100000
48486   341814 18630                   0        0  -786.4482 -2780000
48486 37259331 18630 -24.287572860717773  -440000 -1557.0267  1080000
48486   205836 18630                   0        0  -704.4086 -2490000
48486  2538329 18630  221.34811401367187  4010000  -1645.483 -3410000
48486   494176 18630                   0        0  -87.21447 -1580000
48486 21538876 18630  -205.8923797607422 -3730000  -902.6422   560000
48486   603629 18630                   0        0  -672.0482 -3100000
48486    13728 18630                   0        0 -1402.4695 -2720000
48486    78400 18630                   0        0          0        0
48486  1629980 18630                   0        0  -777.9614 -2750000
48486   667007 18630    80.0385971069336  1450000  185.46873  3360000
48486  2407336 18630 -57.958980560302734 -1050000   771.7514  -250000
48486  6905562 18630                   0        0          0        0
48486   341814 18631                   0        0          0        0
48486    96331 18631  40.295291900634766   730000          0        0
48486 37259331 18631 -195.40457344055176 -3540000  -24.28757  -440000
48486 21538876 18631                   0        0  -205.8924 -3730000
48486       30 18631  -71.75873565673828 -1300000          0        0
48486    93037 18631  27.047523498535156   490000   70.10277  1270000
48486  2843432 18631                   0        0    91.0784  1650000
end
format %td date
Thanks!
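For what it's worth, a guess at the intended command: weight specifications use a single equals sign, so [pw==shares] is probably what triggers the "varlist required" error. Something like this may be what was meant (mean is collapse's default statistic):

Code:
collapse (mean) smi1 smi2 lsmi1 lsmi2 [pw=shares], by(permno date)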

Merging datasets based on recorded time

Hello,

I have a data file that contains activities done during certain times (for the purpose of this forum I am making up examples). Let's say individual 1 made dinner from minute 0 to 29 and then watched TV from minute 29 to 78 (start and end times are recorded as hh:mm:ss, and every activity is a new row). There are no gaps in the recording of activities. In a second dataset, I collected other things the individuals did (let's say, discussing politics). These activities were recorded in an identical format, but there are gaps.
I would now like to merge the datasets so that I know whether the individual was preparing dinner or watching TV when discussing politics. I am hoping for a new variable that would say "TV" if the individual was watching TV when discussing politics, or "dinner" if preparing dinner. One issue is that it is possible the person was first preparing dinner and then watching TV while having a long discussion about politics. In other words, the activities from dataset 2 are mostly, but not always, nested within the activities from dataset 1.

I hope this makes sense and any help is appreciated!
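A rough sketch of one way to do this with joinby, under the assumption that both files hold a person identifier and start/end times stored as numeric clock values; all variable and file names below are hypothetical, and the rule here assigns the activity that was under way when the discussion began (overlapping episodes need some such decision). The community-contributed rangejoin command (SSC) is another way to match on intervals.

Code:
* dataset 2 (discussions): id, disc_start, disc_end
use discussions, clear
joinby id using activities      // dataset 1 (activities): id, activity, act_start, act_end
keep if disc_start >= act_start & disc_start < act_end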

Creating tables and graphs on data relating to both the survey respondent and their partner

Hi Statalist.

I would appreciate a hand with how to create a scatterplot presenting data on both the respondent and their partner for categorical variables, such as life satisfaction scores or self-reported health scores. I would like to do this for one wave of data, then present this for several waves of data. Based on my search in Statalist and Stata help, etc, I have not yet found how I can achieve this.

Thank you in advance.
Chris
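With categorical scores, many couples share the same pair of values, so a plain scatter overplots heavily; a small sketch using jittering, with hypothetical variable names (lifesat_r for the respondent, lifesat_p for the partner, and wave identifying the survey wave):

Code:
twoway scatter lifesat_p lifesat_r, jitter(5) by(wave) ///
    xtitle("Respondent life satisfaction") ytitle("Partner life satisfaction")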

Macro string manipulation

Hi

I am trying to produce a one line summary of a categorical variable using Ben Jann's fre command.

Code:
ssc install fre

sysuse nlsw88, clear

keep if industry>8

fre industry

return list

mat M = r(valid)
forval i = 1/`=rowsof(M)' {
      local counts `counts'  `=M[`i',1]'
}
display "`counts'"

local n : word count `counts'

forvalues i = 1/`n' {
    local part1 : word `i' of `r(lab_valid)'
    local part2  : word `i' of `counts'
    if `i'!= `n' {
      local summary `summary' `"`part1' (`part2'); "'
    }
    else {
      local summary `summary' `"`part1' (`part2')"'
    }    
}

macro list _summary
Output:
Code:
. sysuse nlsw88, clear
(NLSW, 1988 extract)

.
. keep if industry>8
(1,118 observations deleted)

.
. fre industry

industry -- industry
------------------------------------------------------------------------------
                                 |      Freq.    Percent      Valid       Cum.
---------------------------------+--------------------------------------------
Valid   9  Personal Services     |         97       8.60       8.71       8.71
        10 Entertainment/Rec Svc |         17       1.51       1.53      10.23
        11 Professional Services |        824      73.05      73.97      84.20
        12 Public Administration |        176      15.60      15.80     100.00
        Total                    |       1114      98.76     100.00           
Missing .                        |         14       1.24                      
Total                            |       1128     100.00                      
------------------------------------------------------------------------------

.
. return list

scalars:
                  r(N) =  1128
            r(N_valid) =  1114
          r(N_missing) =  14
                  r(r) =  5
            r(r_valid) =  4
          r(r_missing) =  1

macros:
             r(depvar) : "industry"
              r(label) : "industry"
          r(lab_valid) : "`"9 Personal Services"' `"10 Entertainment/Rec Svc"' `"11 Professional Services"' `"12 Public Administration"'"
        r(lab_missing) : "`"."'"

matrices:
              r(valid) :  4 x 1
            r(missing) :  1 x 1

.
. mat M = r(valid)

. forval i = 1/`=rowsof(M)' {
  2.       local counts `counts'  `=M[`i',1]'
  3. }

. display "`counts'"
97 17 824 176

.
. local n : word count `counts'

.
. forvalues i = 1/`n' {
  2.     local part1 : word `i' of `r(lab_valid)'
  3.     local part2  : word `i' of `counts'
  4.     if `i'!= `n' {
  5.       local summary `summary' `"`part1' (`part2'); "'
  6.     }
  7.     else {
  8.       local summary `summary' `"`part1' (`part2')"'
  9.     }    
 10. }

.
. macro list _summary
_summary:       9 Personal Services (97); `"10 Entertainment/Rec Svc (17); "' `"11 Professional Services (824); "' `"12 Public Administration (176)"'
If you have scrolled down this far, many thanks!

I would like to strip summary of all types of quotes so that it reads:
Code:
_summary:       9 Personal Services (97); 10 Entertainment/Rec Svc (17); 11 Professional Services (824); 12 Public Administration (176)
Can anybody help?

With best wishes and thanks,

Jane
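A sketch of one possible fix, building on the code in the post: the stray quotes appear because each new piece is appended as a separately quoted token; wrapping the whole right-hand side in one set of compound quotes keeps the growing string intact without embedding literal quote characters.

Code:
local summary
forvalues i = 1/`n' {
    local part1 : word `i' of `r(lab_valid)'
    local part2 : word `i' of `counts'
    if `i' != `n' {
        local summary `"`summary'`part1' (`part2'); "'
    }
    else {
        local summary `"`summary'`part1' (`part2')"'
    }
}
macro list _summary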

Loop to extract coefficients from regressions of increasing horizons and plotting them

Hi

I want to construct a loop that extracts the slope coefficients on two explanatory variables from regressions of increasing horizons. Subsequently, I want to plot these coefficients.

For example:

my first regression is :
reg f.firm_profit firm_sale firm_expense, vce(cluster firm)
the second will be :
reg f2.firm_profit firm_sale firm_expense, vce(cluster firm)
The third will be:
reg F3.firm_profit firm_sale firm_expense, vce(cluster firm)

....and so on for up to F30 (30 forecast horizons)

I need the slope coefficients on firm_sale and firm_expense from each regression and then want to plot them (the Y axis representing the slope coefficient values and the X axis the horizon).

Can anyone please help me to construct this loop and plot the coefficients?

Thank you so much
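A sketch of one way to do this with postfile, assuming the variable names from the post, that the data are xtset by firm and time, and that the coefficients are saved to a new dataset called horizons.dta (a hypothetical file name) before being plotted:

Code:
tempname h
postfile `h' horizon b_sale b_expense using horizons, replace
forvalues i = 1/30 {
    quietly reg F`i'.firm_profit firm_sale firm_expense, vce(cluster firm)
    post `h' (`i') (_b[firm_sale]) (_b[firm_expense])
}
postclose `h'

use horizons, clear
twoway connected b_sale b_expense horizon, ///
    ytitle("Slope coefficient") xtitle("Forecast horizon")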

how to check and address endogeneity issue

Hi..

I have a problem identifying and addressing an endogeneity issue. I want to examine the relationship between sustainability (FS) and ownership structure (GOV).

DV: sustainability
IV: ownership
CV: Firm size and Firm Age


I have tried running some commands and looked at the results, and I have two questions:

1. Am I using the correct commands and steps?
2. How can I check whether or not there is an endogeneity issue?
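One common approach is instrumental variables: find an instrument that affects GOV but affects FS only through GOV, then test after 2SLS whether GOV can be treated as exogenous. A generic sketch, where size and age stand for the control variables in the post and z is a purely hypothetical instrument (this is not a recommendation of a particular instrument):

Code:
ivregress 2sls FS size age (GOV = z), vce(robust)
estat endogenous        // tests whether GOV can be treated as exogenous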

consecutive failures in panel data

Hi,

I was hoping to get some advice on identifying consecutive failures in panel data.

I am looking at outcomes of operations to lower pressure in the eye to treat glaucoma. I am interested in identifying individuals who have intraocular pressure (iop) >18 on two consecutive occasions. Is this possible?

I’ve enclosed a simplified example dataset, showing what I’m trying to achieve (fail column)

Thanks (and Merry Xmas)

Ali

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input byte(id visit) int(date_of_surgery date_postop) byte(iop fail) float td byte(_st _d) int _origin byte(_t _t0)
1 1 21640 21642 15 .  2 1 0 21640  2  0
1 2 21640 21643 19 0  3 1 0 21640  3  2
1 3 21640 21644 17 .  4 1 0 21640  4  3
1 4 21640 21645 19 0  5 1 0 21640  5  4
2 1 21645 21646 16 .  1 1 0 21645  1  0
2 2 21645 21647 20 .  2 1 0 21645  2  1
2 3 21645 21648 20 1  3 1 1 21645  3  2
2 4 21645 21649 12 .  . 0 . 21645  .  .
3 1 21640 21650 13 . 10 1 0 21640 10  0
3 2 21640 21651 13 . 11 1 0 21640 11 10
3 3 21640 21652 15 . 12 1 0 21640 12 11
end
format %tdnn/dd/CCYY date_of_surgery
format %tdnn/dd/CCYY date_postop
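A sketch using the variable names from the example: flag visits with iop above 18, then flag a failure when the flag is also set at the previous visit for the same person; the 0-versus-missing coding in the posted fail column may need extra rules on top of this.

Code:
bysort id (visit): gen byte high = iop > 18 & !missing(iop)
bysort id (visit): gen byte fail2 = high & high[_n-1] & _n > 1
list id visit iop fail fail2, sepby(id)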

Problem with capture confirm file

I have a folder containing some 70,000 systematically named files which I am eventually going to merge into fewer files. I start by running a simple loop using the capture confirm file construct to check which files do not exist, but I always receive code 601 (not found), even for files that I can see exist. Can anyone think of a reason why this happens?

foreach t in 17928 17956 17987 18017{
foreach f in 5560003468 5560004615 5560005331 {
capture confirm file "C:\Users\Nima\Desktop\Directors\Merged`t'-`f'"
dis _rc
use "C:\Users\Nima\Desktop\Directors\Merged`t'-`f'"
clear
}
}

Just to illustrate the problem, I have added a shortened version of the loop I am running above. All 12 files do exist above and Stata opens them one by one in the loop but first displays code 601.
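A guess at what is going on: -use- silently adds the .dta extension when none is given, but -confirm file- checks the literal file name, so if the files on disk are named Merged...-....dta the confirm line needs the extension spelled out, for example:

Code:
foreach t in 17928 17956 17987 18017 {
    foreach f in 5560003468 5560004615 5560005331 {
        capture confirm file "C:\Users\Nima\Desktop\Directors\Merged`t'-`f'.dta"
        display _rc
    }
}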

Advice on interpretation of prtest (two sample test of proportion)

Hello everybody,

As a student assignment, we are conducting research on how different interventions influence the waste separation behaviour of students. Our data consist of binary indicators of whether an item was separated correctly or not. We have three groups for which we have these data: a control group and two groups that each received an intervention.

To see which was the best-performing group, we used, among other things, the command prtest control==Treatment2.
After a long search and a bit of confusion, we are not sure how to interpret the output table and how one would report it in an academically accepted manner.

[attachment: prtest output table]

Thank you in advance for any help,
Nora
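In case a concrete layout helps, a small sketch of the same kind of test using hypothetical names (a 0/1 variable correct and a group identifier): the row labelled Ha: diff != 0 in the output is the two-sided test, and reports usually give the two group proportions, their difference with its confidence interval, the z statistic and that two-sided p-value.

Code:
prtest correct if inlist(group, 0, 2), by(group)   // control vs. Treatment2, hypothetically coded 0 and 2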

First differencing and fixed effects?

Dear all,

I am working with a cross-section, cross-time panel dataset with N = 15 (countries) and T = 23 (years), with percentage of right-wing votes as the independent variable and citizenship law score as the dependent variable, plus four control variables. The dependent variable is lagged (shifted one year ahead) to ensure that X precedes Y in time.
I have non-stationary panels and therefore apply first differencing to my regression model to solve this problem. I also have to deal with heteroskedasticity and serial correlation (although the serial correlation is no longer present when I apply first differencing to the regression model and run the test again).

I have a couple of questions on how to proceed, which I cannot seem to solve by reading, so I hope you can help out.

1) First of all, is it correct to assume there is no longer serial correlation because I applied first differencing to the model, and I therefore do not need to handle it by applying -cluster(country_var)- to my estimation model?

2) Is it possible to apply both first differencing AND fixed effects estimation to the model? I have read in multiple books, e.g. Wooldridge (2014), that either fixed effects estimation or first differencing can be applied to handle the same type of problems/difficulties with panel data, but I can't seem to find anywhere it states that it is impossible to use both in the same model. So, is it actually possible to apply first differencing to handle the stationary problem, but also apply fixed-effects estimation to the same model? Or is this simply just meaningless?

My apologies if the questions seem pointless; I am not great with statistics and this is quite advanced compared to what I have been taught so far academically.

Best regards,
Laura

Panel data: copying missing characteristic data

Hey guys,

I'm using unbalanced panel data and want to fill in missing values for "date of birth". In the first questionnaire they didn't ask for "date of birth"; it was added later on (year x), but the variable appears only once per person. To keep the survey the same size, new persons were added year after year. Each person had to answer the "year of birth" question on their first questionnaire (year y).

I tried the comand:

Code:
bysort id (year) : replace Birth = Birth[_n-1] if Birth == -8|-5
but Stata replaced all observations of "year of birth" with missing values. I'm pretty sure that I have to change [_n-1], but I didn't manage to get it right.

I wish you guys a nice Christmas
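A sketch of one way to do it, assuming -8 and -5 are the only missing codes and that each person has at most one valid year of birth recorded: convert the codes to system missing first, then spread the valid value within person.

Code:
replace Birth = . if inlist(Birth, -8, -5)
bysort id (Birth): replace Birth = Birth[1] if missing(Birth)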

Stochastic Frontier Analysis Loop

Dear Statalist members,

I would like to run an SFA loop to get efficiency estimates for a worldwide sample over 28 years. I group firms based on country and Fama-French classification and then run the following loop:

egen group = group(CountryID FamaFrench)

generate eff=.

forvalue i= 1(1)20 {
sfpanel Sales CostofGoodsSold SellingGeneralAdminExpenses PropertyPlantandEquipment if group==`i'
predict temp`i', bc
replace eff= temp`i' if group==`i'
drop temp`i'
}


The loop runs for a number of groups but then stops and gives the following error:
"initial: Log likelihood = -<inf> (could not be evaluated)
could not find feasible values
r(491);
"
I have dropped observations with missing values. I also do not think it is because of the number of observations, because the loop has already run for a group with fewer observations.
I understand that it has to do with the ML maximization, but I don't know whether and how I can solve it. Any ideas regarding how I could solve it?

Yours sincerely
Periklis Boumparis
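A sketch of one way to keep the loop running when a group fails to converge: capture the error, flag the group, and move on. The model specification is the one from the post; this does not fix the underlying convergence problem, it only skips the offending groups so they can be inspected separately (e.g. with a different distribution or starting values).

Code:
capture drop eff
generate eff = .
generate byte sfa_failed = 0
forvalues i = 1/20 {
    capture noisily sfpanel Sales CostofGoodsSold SellingGeneralAdminExpenses ///
        PropertyPlantandEquipment if group == `i'
    if _rc {
        replace sfa_failed = 1 if group == `i'
        continue
    }
    predict temp`i' if group == `i', bc
    replace eff = temp`i' if group == `i'
    drop temp`i'
}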


Panel Data Hausman test


Hello, I am working with 10 years of data on 46 countries. All was fine until I tried the Hausman test. I realized that the random-effects model estimates all my variables despite collinearity issues, but the fixed-effects model drops 4 of my dummy variables because of collinearity. The Hausman test points to the fixed-effects model as the better model. My question is: should I run the analysis without the variables that get dropped because of collinearity in the fixed-effects model, or should I keep them (since they will be dropped anyway) so that both models have the same set of independent variables?
Thank you

Recode of two Variables

Hello everyone,

I have 20 different variables for the religion of the interviewees. The variables contain the different world religions, such as "Christian", "Islam", "Catholic", etc. I want to combine the different religion variables into one variable for religion. The new variable should contain the different religions as its categories (values).

Example:

Data Now:
v1 = Christian
v2 = Islam
v3 = Catholic

Goal Data:
v4 = Christian, Islam, Catholic


I have already read something about the recode function, but I don't know how recode works with more than one variable.

Best regards,
Fritzi
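It is hard to say without seeing the data, but here is a minimal sketch under the assumption that each of the 20 variables is a 0/1 indicator for one religion (variable names are hypothetical); if the variables instead hold string values, egen's concat() or a chain of replace statements on a string variable would be the analogue.

Code:
generate religion = .
replace religion = 1 if v1 == 1
replace religion = 2 if v2 == 1
replace religion = 3 if v3 == 1
label define rel 1 "Christian" 2 "Islam" 3 "Catholic"
label values religion rel
tabulate religion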

About Random Effects in a Multilevel Survival Model

Good morning everyone. My training is mainly clinical, and I would be grateful for any help. I assessed in a multilevel survival model the contribution of various clusters to the effectiveness of a preventive medicine intervention aimed at intercepting new diagnoses of atrial fibrillation through a screening program. Each cluster is identified by a primary care center and is categorized by the number of nurses working in the center. The "zero cluster" identifies a primary care center without nurses. My intention is to compare the contribution made to the diagnosis of atrial fibrillation by the professional organization provided by nurses with that of cluster zero. The outcome of the analysis is the hazard of atrial fibrillation over the follow-up considered by the study.

The regression model used is a multilevel parametric survival model that assigns an exponential distribution to the baseline hazard; I used the mestreg command for this analysis. I quantified the random component of the model for each cluster (the estimated empirical Bayes residuals with their standard errors) using predict. The exponentiated difference between the point estimate of the residual for a cluster other than cluster zero and that for cluster zero should produce a cluster-specific hazard ratio. At this point I would also like to calculate the confidence intervals for these hazard ratios, and this is why I turn to your attention. Thanks for anything you can suggest.
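One crude possibility, sketched below with hypothetical names and under the strong (and questionable) assumption that the empirical Bayes predictions of two clusters are independent, so that the standard error of their difference is the square root of the sum of the squared standard errors; this is only an approximation and not something the mestreg documentation prescribes.

Code:
* after the -mestreg- model described above has been fit:
predict re*, reffects reses(se*)      // empirical Bayes means and their std. errors

* pick one observation (row) from cluster j and one from cluster 0;
* the row numbers 12 and 3 below are placeholders
scalar diff = re1[12] - re1[3]
scalar sed  = sqrt(se1[12]^2 + se1[3]^2)
display "HR = " exp(diff) ", 95% CI " exp(diff - 1.96*sed) " to " exp(diff + 1.96*sed)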

What's the syntax for panel ARDL model with CCEMG estimator? Should the xtmg command be used?

Dear all,

I am learning Stata and trying to figure out how to use the xtpmg command with the CCE Mean Group (CCEMG) estimator.


Recently I read the following paper:
Benos N., and Karagiannis S., “Inequality and Growth in the United States: Why Physical and Human Capital Matter”, Economic Inquiry, Vol. 56, No. 1, (2018), 572-619.

This paper produces short-run and long-run estimates to investigate the relationship between economic growth, inequality and capital using the CCE Mean Group (CCEMG) estimator.

The model is as below:
y_it = a_1,i * y_i,t-1 + a_2,i * ineq_it + a_3,i * (k_it/h_it) * ineq_it + a_4,i * (k_it/h_it) + u_it,
where y, ineq, k and h are panel data variables. The lag length is 4.


I would like to know all the syntax used to estimate this model, but after reading the Stata help files and some articles regarding how to apply the xtpmg command, I'm still not sure what the syntax should be like.

Could anyone kindly let me know exactly what syntax I could use in this example to get the results, or give some suggestions? I am using Stata 15 on Mac.

Thanks in advance!

P.S. The results (CCEMG GMM long-run and short-run estimates, along with the dynamic CCEMG estimates) in the paper are shown below. The syntax I am after doesn't have to generate the same type of tables; I put them here just to give an idea of what kind of parameters I'd like to obtain.




[attachments: CCEMG GMM long-run and short-run estimates; dynamic CCEMG estimates]
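For what it's worth, a heavily hedged sketch of the kind of call that the community-contributed xtmg command (Eberhardt, from SSC) accepts for a CCEMG-type regression; the variable names are placeholders, the interaction and lag are generated by hand, and whether this matches the paper's GMM and dynamic CCEMG variants, or your installed version of xtmg, should be checked in the command's help file.

Code:
ssc install xtmg
xtset state year                 // hypothetical panel identifiers
generate k_h      = k/h
generate k_h_ineq = k_h*ineq
generate y_lag    = L.y
xtmg y y_lag ineq k_h_ineq k_h, cce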

Group of variables that explain most of the observations

Hello. I am working on a dataset where people were asked whether or not they use certain media outlets, with 50+ yes/no questions, one per outlet. What I am trying to find is the minimum number of outlets that together cover 95% of the population, which I think requires a procedure that stepwise adds the outlet with the most "yes" answers among the not-yet-covered respondents and returns the resulting list. How can I achieve this in Stata?

Best regards.
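There is no built-in command for this (it is a small set-cover problem), but a greedy loop gets close: repeatedly pick the outlet that reaches the most not-yet-covered respondents until 95% are covered. A sketch, assuming the outlets are 0/1 variables named outlet1-outlet50 (hypothetical names):

Code:
gen byte covered = 0
local chosen
local total = _N
local done = 0
while !`done' {
    local best
    local bestgain 0
    foreach v of varlist outlet* {
        quietly count if covered == 0 & `v' == 1
        if r(N) > `bestgain' {
            local bestgain = r(N)
            local best `v'
        }
    }
    if "`best'" == "" local done = 1
    else {
        local chosen `chosen' `best'
        quietly replace covered = 1 if `best' == 1
        quietly count if covered
        if r(N)/`total' >= 0.95 local done = 1
    }
}
display "Outlets covering at least 95% of respondents: `chosen'"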