Channel: Statalist

Can strL be disabled? It is breaking all my pre-16 code.

The strL data type is clearly useful, but the fact that it can't be used as a merge key breaks backward compatibility with Stata 15 and earlier. I don't know whether strL is new to Stata 16, or whether Stata 16 is simply more aggressive about using it as a data type.

My prior code is breaking because variables that were formerly `str##` are now being read by Stata as `strL`, so they cannot be used as merge keys. My research team has written about 200,000 lines of Stata code over the last 10 years, and these breakages are happening left and right. In many cases the efficiency gain is not remotely worth it: today some code broke while running a string replacement using a replacement file that was only 10 strings long.

In this specific case, the 10-line string file was imported with `import delimited`, and Stata
defaulted the key string to strL, breaking the merge.

I know I can recast these to the `str##` format, but I would like to avoid updating our codebase in
thousands of places.

Ideally, I would like to ask Stata to avoid using the `strL` format except when I specifically
request it. Alternately, to avoid using the `strL` in cases when the string length is less than some
character length, like 100.

Can this be done?

Thank you!

-p
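A possible workaround, sketched here under the assumption that the keys fit in a fixed-width type (there is, to my knowledge, no global setting to disable strL): compress demotes a strL to the narrowest str# that holds its contents, so running it immediately after import restores mergeability. The filename below is a placeholder.

Code:
* hedged sketch: demote strL variables right after import
import delimited using keyfile.csv, clear
compress                // strL -> str# wherever the contents fit
ds, has(type strL)      // list any variables that remain strL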

rangestat from SSC in combination with percentile function

Concerning rangestat from SSC

Hi,

I am looking for a way to combine rangestat with a percentile function. In the dataset below, my goal is, for each firm and each date, to get the 94th percentile of all previous observations of the variable "value". I know that it is possible to get the median with rangestat, but is it also possible to combine rangestat with an arbitrary percentile?

Thanks a lot in advance and regards
Marian Appel-Graham


Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input int date str1 firm byte value
21915 "A"  1
21916 "A"  2
21917 "A"  3
21918 "A"  4
21919 "A"  5
21920 "A"  6
21921 "A"  7
21922 "A"  8
21923 "A"  9
21924 "A" 10
21925 "A" 11
21926 "A" 12
21927 "A" 13
21928 "A" 14
21929 "A" 15
21930 "A" 16
21931 "A" 17
21932 "A" 18
21933 "A" 19
21934 "A" 20
21935 "A" 21
21936 "A" 22
21937 "A" 23
21938 "A" 24
21939 "A" 25
21940 "A" 26
21941 "A" 27
21942 "A" 28
21943 "A" 29
21944 "A" 30
21945 "A" 31
21946 "A" 32
21947 "A" 33
21948 "A" 34
21949 "A" 35
21950 "A" 36
21951 "A" 37
21952 "A" 38
21953 "A" 39
21954 "A" 40
21955 "A" 41
21956 "A" 42
21957 "A" 43
21958 "A" 44
21959 "A" 45
21960 "A" 46
21961 "A" 47
21962 "A" 48
21963 "A" 49
21915 "B"  2
21916 "B"  4
21917 "B"  6
21918 "B"  8
21919 "B" 10
21920 "B" 12
21921 "B" 14
21922 "B" 16
21923 "B" 18
21924 "B" 20
21925 "B" 22
21926 "B" 24
21927 "B" 26
21928 "B" 28
21929 "B" 30
21930 "B" 32
21931 "B" 34
21932 "B" 36
21933 "B" 38
21934 "B" 40
21935 "B" 42
21936 "B" 44
21937 "B" 46
21938 "B" 48
21939 "B" 50
21940 "B" 52
21941 "B" 54
21942 "B" 56
21943 "B" 58
21944 "B" 60
21945 "B" 62
21946 "B" 64
21947 "B" 66
21948 "B" 68
21949 "B" 70
21950 "B" 72
21951 "B" 74
21952 "B" 76
21953 "B" 78
21954 "B" 80
21955 "B" 82
21956 "B" 84
21957 "B" 86
21958 "B" 88
21959 "B" 90
21960 "B" 92
21961 "B" 94
21962 "B" 96
21963 "B" 98
end
format %tdnn/dd/CCYY date
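One possible route, sketched under the assumption that rangestat's user-written Mata function interface applies here (the function receives the interval's observations as a real matrix and returns a real rowvector); the function name p94 is made up, and the percentile rule below is a simple nearest-rank definition that you may want to adjust:

Code:
mata:
real rowvector p94(real matrix X)
{
    real colvector s
    s = sort(X[., 1], 1)
    // nearest-rank 94th percentile of the (single) variable
    return(s[max((1, ceil(0.94 * rows(s))))])
}
end

* interval(date . -1) covers all strictly earlier dates within firm
rangestat (p94) value, interval(date . -1) by(firm)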

Time dummies in SYS-GMM model

Hi,

I estimated a SYS-GMM model with time dummies for different periods (2006-2008, 2009-2015, 2016-2018). I chose these periods after analysing patterns in my dependent variable. My dependent and independent variables are measured relative to the Asian average. Is it reasonable to include these time dummies? I would like to capture the trend in my data with them. Is my justification correct?
What if I add to the set of independent variables some variables that are not relative to the Asian average?

adding a zero in the middle of a string variable

Dear Statalist users,

I have a string variable (patient id) of the form
ED6/01/001
ED6/01/002
ED6/01/003

I would like to automatically convert it to the form
ED6/01/0001
ED6/01/0002
ED6/01/0003

Your assistance is much appreciated
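A sketch, assuming the variable is named patid and the prefix "ED6/01/" is always 7 characters long (adjust the positions otherwise):

Code:
* insert a "0" after the second slash
gen newid = substr(patid, 1, 7) + "0" + substr(patid, 8, .)
* ED6/01/001 -> ED6/01/0001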

Panel data- xtgls vs xtreg

Hello!

I am conducting research into the determinants of FDI using panel data. From previous studies I have seen that fixed effects is generally preferred in this kind of research.

I have a panel of 16 countries and 22 years and have performed unit root tests on each variable to ensure they are stationary. After that I performed tests for heteroscedasticity and autocorrelation, and concluded that both are present in the data. Are there any additional diagnostic tests to perform on the data before using the xtoverid command to test whether to use RE or FE? (I've read in previous posts that the Hausman test is invalid when autocorrelation and heteroscedasticity are present.)

After running
xtreg "depvar" "varlist", re vce(cluster country)
xtoverid

The p value returned is <0.05, which suggests that FE should be used.

Given my panel dimensions, should I use xtgls or xtreg, fe?

e.g.
xtreg "depvar" "varlist", fe vce(cluster country)
or
xtgls "depvar" "varlist", panels(hetero) corr(ar1) force

Many thanks if anyone can help!
Sam

multivariate regression

Hello,

I am trying to analyze panel data using the following regression model.
All variables have been defined and only valid observations are left in the sample.
alpha_i are firm fixed effects and gamma_t are time fixed effects. Can you tell me the command for this regression?
I would really appreciate your help.
Thank you!
Regards
Johanna Array
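Since the model itself did not come through, here is only a generic sketch of a two-way (firm and time) fixed-effects regression; y, x1, x2, firm, and year are placeholder names:

Code:
xtset firm year
xtreg y x1 x2 i.year, fe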

inverse of odds ratio

Hi,
I am doing a meta-analysis exploring the association between hyperglycemia and neurodevelopmental outcomes. In one study it is reported that hyperglycemia was associated with a "decreased chance of survival without disability" at 2 years of age (odds ratio: 0.41; 95% confidence interval: 0.21, 0.78).
Is it possible to invert the odds ratio and conclude that hyperglycemia was associated with an "increased chance of survival with disability" at 2 years of age (odds ratio: 2.44; 95% CI: 1.28, 4.76)?
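The arithmetic of the inversion itself is straightforward (the lower bound of the inverted interval is the reciprocal of the original upper bound, and vice versa); whether the re-labelled outcome is the true complement is a separate question:

Code:
display 1/0.41    // 2.44  inverted point estimate
display 1/0.78    // 1.28  new lower bound (reciprocal of old upper bound)
display 1/0.21    // 4.76  new upper bound (reciprocal of old lower bound)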

Interpreting an Unexpected Negative Beta in HMR

Hello!
I'm hoping someone can help me interpret an unexpected negative beta that I obtained in a hierarchical multiple linear regression. I do not believe it is a suppressor variable because a) the sum of the squared semi-partials is not greater than the r-squared, b) the r-squared is less than .5, and c) there doesn't seem to be anything too wonky in the correlations (technical term!).
A little background about my study: I'm trying to see whether one of three executive function assessments better predicts academic achievement. The 3 IVs are a performance-based measure of inhibition, a teacher's rating of inhibition, and a teacher's rating of attention. The DV is a score on an academic test (i.e., reading, math, or science, each one run separately). The teacher's rating of inhibition has negative beta values in about half of the models (I'm looking at 3 different subjects across 2 different grades).
I'm not sure how to interpret it or what to do about it, if it is uninterpretable.
I would greatly appreciate any help.
Thank you!!
--Emily

Run Many Regressions

Hello Forum Members,

I have loaded my 'DATA' file and I seek to run all of these models on a simple regression.

CAT = X1 + X2, for CAT1 = 1 and CAT2 = 1
CAT = X1 + X2, for CAT1 = 2 and CAT2 = 1
CAT = X1 + X2, for CAT1 = 3 and CAT2 = 1
CAT = X1 + X2, for CAT1 = 4 and CAT2 = 1
CAT = X1 + X2, for CAT1 = 1 and CAT2 = 2
CAT = X1 + X2, for CAT1 = 2 and CAT2 = 2
CAT = X1 + X2, for CAT1 = 3 and CAT2 = 2
CAT = X1 + X2, for CAT1 = 4 and CAT2 = 2

DOG = X1 + X2, for CAT1 = 1 and CAT2 = 1
DOG = X1 + X2, for CAT1 = 2 and CAT2 = 1
DOG = X1 + X2, for CAT1 = 3 and CAT2 = 1
DOG = X1 + X2, for CAT1 = 4 and CAT2 = 1
DOG = X1 + X2, for CAT1 = 1 and CAT2 = 2
DOG = X1 + X2, for CAT1 = 2 and CAT2 = 2
DOG = X1 + X2, for CAT1 = 3 and CAT2 = 2
DOG = X1 + X2, for CAT1 = 4 and CAT2 = 2

FROG = X1 + X2, for CAT1 = 1 and CAT2 = 1
FROG = X1 + X2, for CAT1 = 2 and CAT2 = 1
FROG = X1 + X2, for CAT1 = 3 and CAT2 = 1
FROG = X1 + X2, for CAT1 = 4 and CAT2 = 1
FROG = X1 + X2, for CAT1 = 1 and CAT2 = 2
FROG = X1 + X2, for CAT1 = 2 and CAT2 = 2
FROG = X1 + X2, for CAT1 = 3 and CAT2 = 2
FROG = X1 + X2, for CAT1 = 4 and CAT2 = 2

My attempt to do it is right here:
foreach OUT in CAT DOG FROG {
for values CAT1 = 1/4 { forvalues CAT2 = 1/2 {
regress `OUT' X1 X2 if CAT1 = CAT1 & CAT2 = CAT2 }}}

But when I run this, I do not get all of the models: I should end up with 24 but only get 8. Where should I improve it?
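For what it's worth, a corrected sketch: forvalues is one word, the loop indices need their own names (the original compares CAT1 and CAT2 to themselves), and Stata's equality test is == rather than =:

Code:
foreach OUT in CAT DOG FROG {
    forvalues i = 1/4 {
        forvalues j = 1/2 {
            regress `OUT' X1 X2 if CAT1 == `i' & CAT2 == `j'
        }
    }
}

This yields 3 x 4 x 2 = 24 regressions.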

How do I count the number of variables when using abbreviations?

I can see the total number of variables in my data using

Code:
di c(k)

How can I use this pattern if I just want to count a subset?

Say I have:

Code:
sysuse auto, clear
 describe m*
What command can I use to return a value of 2?
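One sketch: ds leaves the matching variable names in r(varlist), so the count is a wordcount() away:

Code:
sysuse auto, clear
ds m*
display wordcount("`r(varlist)'")   // 2 (make and mpg)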

Treatment effects estimation with spillovers

Hi.

Does anyone know if there's a Stata module to estimate treatment effects in the presence of spillovers?

I'm especially interested in outcomes where there are spillovers between different units of the treated group, rather than spillovers between the treated and untreated groups. Thanks.

Discrete Choice Model with Alternative-Specific Variables that do not vary across individuals

Hello all,

I have a dataset with different choices for several individuals. I also have individual characteristics as well as alternative-specific characteristics. The problem is that these alternative-specific characteristics do not vary across individuals, so I cannot include them in conditional logit models (the asclogit command in Stata).

My question is: does a discrete choice model allowing alternative-specific variables that do not vary across individuals exist?

To illustrate the case, this is an example of the dataset I have:

id chosen alternative alternative_price gender income
1 0 1 10 1 500
1 1 2 20 1 500
1 0 3 30 1 500
2 1 1 10 0 750
2 0 2 20 0 750
2 0 3 30 0 750
3 0 1 10 0 250
3 0 2 20 0 250
3 1 3 30 0 250

Where id is an individual identifier; chosen = 1 when individual i chooses that alternative; alternative_price is the price of each alternative (note it is the same for every individual); gender is 1 if the individual is a woman, 0 otherwise; and income is individual income (note that it varies across individuals but not within individuals).

Thank you very much.
Best

Propensity Score Matching within cluster

Hi,

I have a question about implementing the propensity score matching within cluster using psmatch2.

My dataset includes the variables: treatment variable (RTW_count) , various firm-level controls (tobin_w cash_w roa_w debtratio_w lnassets_w leverage_w), and industry cluster variable (firstwo_naics). See below for the data.

I adjusted the codes provided in the psmatch2 help page in the following way:
Code:
egen g = group(firstwo_naics)
levels g, local(gr)
foreach j of local gr {
psmatch2 RTW_count tobin_w cash_w roa_w debtratio_w lnassets_w if g==`j'
}


However, this code does not seem to be working.

Stata returns the following on the screen with no further output:

. foreach j of local gr {
2. psmatch2 RTW_count tobin_w cash_w roa_w debtratio_w lnassets_w if g==`j'
3.
. }


My goal is to obtain the variables that psmatch2 creates when run without the cluster variable, but the code above does not create new variables such as "_treated" or "_support".


Much would be appreciated if someone can comment on this!

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float(RTW_count tobin_w cash_w roa_w debtratio_w lnassets_w leverage_w) byte firstwo_naics
0 1.4117475    .1255888   .01818895  .41935185  7.075738   11.495863 33
0 1.0394835    .0533733 -.028822854   .4413978  7.357684   1.7078707 33
0 1.5396392   .04172599   .05432045   .6400175  9.119431   4.0838575 32
0 1.8424835   .04193934  .004954757   .3715897  7.364347    3.441593 33
0  1.381001  .034328308  .030666623   .3049008  10.32833    2.401133 32
0 1.2163697  .021849006   .06155884  .28422347  7.537855    .7655392 33
0 1.1130164  .007099738  .023159573   .3064277 10.178806   1.0946697 22
0 1.0275791   .22798973   .06355332  .09566192 11.881747    .3628804 33
0 1.2394315   .05167659   .08594223   .1057986  5.716475   .23590237 33
0  2.485095  .030012894   .04960003   .1596793  7.122329    .4773329 33
0  1.338234     .127203  .021276595    .299516  9.301186     .915434 32
0 1.1242658  .002595821   .02218049  .39475325 11.834386   1.3043437 22
0 1.3676734   .06147165    .0699737   .2639509 10.800534    .9225397 42
0   1.14155   .04974028 -.035062816   .6031046  10.40765    8.101419 22
0  1.215149 .0021540194   .02763432   .3457141 10.517534   1.2508574 22
0 1.2765794   .07084684  -.05155954  .16686454  6.434964   .38130435 33
0  1.513258    .2616633  .064705886   .3403651  8.503095   1.5537037 33
0 1.1593891   .09081676   .05755396  .13889123  10.76342    .3373073 32
0   1.33144   .02634814   .06614867  .27658588 10.671626    .9008839 42
0  1.280555   .00878693   .06019711  .10917515  12.75476    .1970656 32
0  1.415927  .063277066   .08701111  .18293536 10.790967    .3847186 42
0 1.0194193   .02826197  .022338806   .3118294 10.986817    .8181334 22
0 1.7501464   .04068061   .07393116   .3680548  8.491281   1.0070763 33
0  .9703704    .1735927    .0992181  .11776894  9.241318   .23342423 32
0 1.2468325  .018325264  .036066122   .3413759 10.083974      .91991 31
0  2.302936    .1725539   .09760378   .2188524  8.080985    .4748369 33
0 1.1312692  .007453904   .02667713  .26520205  8.942069    .5346691 32
0 1.0098481   .06741735 -.016890429   .4081132  8.843183     3.89931 33
0  1.064157   .10774326  -.05540624  .39689565   8.21775   1.1760724 33
0 1.7053282   .20321023   .06226149  .19770665  7.293603    .3929206 32
0   1.86859    .1737166   .04632824  .28761142 8.7919445    .8580216 33
0   1.10013    .1099058  .015500203   .6371536 13.461824    4.024349 32
0 1.6567788   .08106069   .05258825   .2925457  7.977659     .657416 33
0  1.285767 .0007065813 -.000665061 .000621123  6.216019 .0006457226 42
0  2.438212   .09039787   .09638409  .25069538  9.713416     .746086 32
0   1.65619   .21340063   .06573533  .12288298  8.338162   .26685715 33
0  1.293333   .02008421    .0264337   .2380384 10.701287    .7798185 42
0 1.1129006   .11501994  -.07231609   .3557026 12.641514    3.549208 33
0 1.2727318  .033515073  .002939919  .56139976  8.632733    3.857965 42
0  2.100004    .3278434   .17910004          0  5.356577           0 33
end
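A sketch of the loop using levelsof (the current name of the old levels command) and capture noisily, so that any per-group error message is shown rather than swallowed; whether each industry group has enough treated and control observations is a separate issue:

Code:
egen g = group(firstwo_naics)
levelsof g, local(gr)
foreach j of local gr {
    display as text "industry group `j'"
    capture noisily psmatch2 RTW_count tobin_w cash_w roa_w debtratio_w lnassets_w if g==`j'
}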

pystata installation in Python

The StataCorp presentation "Call Stata from Python", available at https://www.stata.com/meeting/us20/s...0_Zhao_Xu.html, has the following commands to be run within Python (through IDLE, Jupyter, or Spyder) for configuration:

Code:
import sys
sys.path.append("C:/Program Files/Stata/utilities")
from pystata import config
config.init()
I tried the same using Python extension in VS Code and get the following error on line 3 of the code

Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'pystata'

I tried the following without any luck
Code:
 pip install pystata
A simple Google search directed me to the following GitHub page, https://github.com/cpbl/pystata-MOVED-TO-GITLAB, which seemed to be the right place, but I was not able to figure out the installation process on my own.

Any help would be deeply appreciated @ Zhao Xu


Panel data, calculating a variable's growth rate for each panel

Dear all:

I have US nationwide county-level data for the period 2012 through 2017. My panel id is "FIPS". One of my variables is the county obesity rate. Now I want to calculate the growth rate of obesity rates from 2012 to 2017 for each FIPS. Here is my code:
Code:
 egen obese_change = (obese[_n] - (obese[_n-6]))/ obese[_n-6] if year==2012, by(FIPS)
But I got result says :
Code:
 unknown egen function (()

If I switch to :
Code:
 gen obese_change = (obese[_n] - (obese[_n-6]))/ obese[_n-6] if year==2012, by(FIPS)
Then I got result says :
Code:
 option by() not allowed
Could anyone offer some help?

Thanks a lot!
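A sketch of one way to do it: egen does not accept arbitrary expressions, and plain generate does not accept by(), but the by prefix with explicit subscripts handles both, assuming one observation per FIPS-year so that within each FIPS the first observation is 2012 and the last is 2017:

Code:
bysort FIPS (year): gen obese_change = (obese[_N] - obese[1]) / obese[1]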

How to solve the problem of insufficient observations when using xtreg?

Hi,

I am trying to study the difference in intergenerational income mobility between Muslims and Hindus in India. I am working with panel data in long format with rounds in 2005 and 2012. I have used xtset IDHH year (IDHH is the unique household id).

xtdescribe result:

Code:
   IDHH:  10201010, 10201130, ..., 3.402e+09        n =       7083
   year:  2005, 2012, ..., 2012                     T =          2
          Delta(year) = 1 unit
          Span(year)  = 8 periods
          (IDHH*year uniquely identifies each observation)

Distribution of T_i:   min     5%    25%    50%    75%    95%    max
                         1      1      1      1      1      1      1

     Freq.  Percent    Cum. |  Pattern*
 ---------------------------+----------
     4528    63.93    63.93 |  .1
     2555    36.07   100.00 |  1.

I have 2 religions and 3 castes. I used the command egen group = group(year CASTE RELIGION) to create the groups. However, I get an "insufficient observations" error when I run the command: xtreg logchildwage logheadwage i.STATEID i.NPERSONS GENDER URBAN POOR AGE headage childage2 headage2 if group==1, r.

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input int(year STATEID NPERSONS) float(logchildwage logheadwage) int(RELIGION CASTE) float group
2005 1  8 2.4713535 2.5513964 0 2 3
2005 2  4  3.327874 2.7198846 0 1 1
2005 1 13 1.7972836 1.7972836 0 2 3
2005 2  8 1.8725867 2.2131913 0 1 1
2005 3  9 2.5477076 2.8353896 0 1 1
2012 1  8  3.310443 4.1859117 0 1 5
2005 3  7 2.0794415  2.167995 0 2 3
2012 1  6   2.70805   2.70805 0 1 5
2005 2  7 2.7186005 2.1796038 0 3 4
2005 4  4  2.512788  2.512788 1 1 2
2005 4  5  2.512788  2.512788 0 3 4
2005 5  5 3.3718674 2.5986776 0 2 3
2005 5  4 2.4183996 3.8923995 0 1 1
2005 5  3  4.509575  3.816428 0 1 1
2005 5  4 2.4837954 2.4837954 0 1 1
2012 1  9  2.525729 3.4927645 0 3 8
2012 2  9  3.218876  3.624341 1 1 6
2012 2  9   1.89712  2.931194 1 1 6
2012 2  7  2.931194  2.931194 1 1 6
2012 2  8  2.813411  2.931194 0 1 5
2012 2  3 2.3025851  .5108256 0 1 5
2012 6  4  2.931194 3.1135154 1 1 6
2012 3  3  2.484907 2.3025851 0 2 7
2012 2  7  2.748872  2.525729 0 3 8
2012 2 10  2.931194  2.931194 1 1 6
2012 5  4  3.358638  3.218876 0 3 8
2012 5  3  2.525729  2.525729 0 2 7
2012 5  5  2.813411  3.218876 1 1 6
2012 5  5  2.511935 3.4011974 0 2 7
2012 5  5  3.610918 3.6888795 0 3 8
end
label values STATEID ZonalCouncils
label def ZonalCouncils 1 "Northern Zonal Council- 01", modify
label def ZonalCouncils 2 "Central Zonal Council- 02", modify
label def ZonalCouncils 3 "Eastern Zonal Council- 03", modify
label def ZonalCouncils 4 "Western Zonal Council- 04", modify
label def ZonalCouncils 5 "Southern Zonal Council- 05", modify
label def ZonalCouncils 6 "North Eastern Zonal Council- 06", modify
label values NPERSONS NPERSONS
label values RELIGION Religion
label def Religion 0 "Hindu", modify
label def Religion 1 "Muslim", modify
label values CASTE CASTE
label def CASTE 1 "General", modify
label def CASTE 2 "SC/ST", modify
label def CASTE 3 "OBC", modify
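A quick hedged diagnostic before the regression: count how many usable observations each group actually contributes, since a group with fewer observations than regressors will trigger exactly this error:

Code:
* observations with both wages nonmissing, per group
bysort group: count if !missing(logchildwage, logheadwage)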

change the low-bound interval in rangerun with a loop

I would like to change the low-bound interval in rangerun within a loop. The idea is conveyed in the following code, but rangerun does not accept it this way. Is there a workaround?


Code:
foreach w in lowA lowB {
    set more off
    capture program drop myprog
    program myprog
        reg y AD2 HD3
    end
    rangerun myprog, interval(order `w' 0) verbose
}




Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input byte(order y HD3 AD2) float(lowA lowB)
  1 2 0 0  2  2
  2 1 1 0  3  3
  3 0 0 0  4  4
  4 0 0 0  5  5
  5 1 1 0  6  6
  6 1 0 0  7  7
  7 0 0 0  8  8
  8 0 0 0  9  9
  9 1 0 0 10 10
 10 3 1 0 11 11
 11 1 0 0 12 12
 12 1 1 0 13 13
 13 1 0 0 14 14
 14 3 0 0 15 15
 15 6 0 0 16 16
 16 2 0 0 17 17
 17 2 0 0 18 18
 18 0 0 0 19 19
 19 0 1 0 20 20
 20 1 0 0 21 21
 21 0 0 0 22 22
 22 2 1 0 23 23
 23 0 0 0 24 24
 24 3 0 0 25 25
 25 1 1 0 26 26
 26 1 0 0 27 27
 27 2 0 0 28 28
 28 0 0 0 29 29
 29 2 0 0 30 30
 30 1 1 0  0 31
 31 1 0 0  0 32
 32 1 0 0  0 33
 33 1 0 1  0 34
 34 1 0 0  0 35
 35 0 1 0  0 36
 36 1 0 0  0 37
 37 0 0 0  0 38
 38 0 1 0  0 39
 39 0 0 0  0 40
 40 4 0 0  0  0
 41 0 0 0  0  0
 42 2 0 0  0  0
 43 2 0 0  0  0
 44 1 0 0  0  0
 45 1 0 0  0  0
 46 1 1 0  0  0
 47 1 0 0  0  0
 48 3 0 0  0  0
 49 0 0 0  0  0
 50 2 1 0  0  0
 51 0 0 0  0  0
 52 1 0 0  0  0
 53 2 0 0  0  0
 54 1 1 0  0  0
 55 0 0 0  0  0
 56 0 0 0  0  0
 57 2 0 0  0  0
 58 3 1 0  0  0
 59 1 0 0  0  0
 60 0 0 0  0  0
 61 3 0 0  0  0
 62 0 1 0  0  0
 63 3 0 0  0  0
 64 2 0 0  0  0
 65 2 0 0  0  0
 66 1 0 0  0  0
 67 2 1 0  0  0
 68 0 0 0  0  0
 69 1 0 0  0  0
 70 5 1 0  0  0
 71 1 0 0  0  0
 72 2 0 1  0  0
 73 1 0 0  0  0
 74 2 0 0  0  0
 75 0 1 0  0  0
 76 0 0 0  0  0
 77 2 0 0  0  0
 78 1 0 0  0  0
 79 0 1 0  0  0
 80 1 0 0  0  0
 81 3 0 0  0  0
 82 4 0 0  0  0
 83 2 1 0  0  0
 84 4 0 0  0  0
 85 0 0 0  0  0
 86 1 1 0  0  0
 87 1 0 0  0  0
 88 5 0 0  0  0
 89 0 0 0  0  0
 90 1 1 0  0  0
 91 0 0 0  0  0
 92 1 0 0  0  0
 93 1 0 0  0  0
 94 1 1 0  0  0
 95 1 1 0  0  0
 96 2 0 0  0  0
 97 1 0 0  0  0
 98 2 0 0  0  0
 99 0 0 0  0  0
100 5 1 0  0  0
end
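A sketch that defines the program once, outside the loop; program definitions inside a foreach block are a common source of odd behavior, and this at least isolates whether interval() itself is rejecting the variable low bound:

Code:
capture program drop myprog
program myprog
    reg y AD2 HD3
end

foreach w in lowA lowB {
    rangerun myprog, interval(order `w' 0) verbose
}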

Noconstant in xtregar

Hello all,
I am currently running an analysis of balanced panel data using a least squares dummy variable (LSDV) model. I want to know the individual effect for each unit. However, one unit is automatically dropped from the model as the reference group. How can I drop the constant and include all unit dummies in the model?

Thank you very much.
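For a plain LSDV fit, one sketch uses the ibn. factor-variable operator, which suppresses the base level so every unit dummy enters, together with noconstant; the variable names are placeholders, and whether xtregar itself allows this combination is a separate question:

Code:
regress y x1 x2 ibn.unit, noconstant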

asclogit to run McFadden's conditional logit model. model doesn't converge

Hi everyone,

I hope you are good. I am trying to understand college choice for high school seniors. Specifically, I want to know the relationship between school choice and institutional characteristics. I am running McFadden's conditional logit model. Below is an example of my dataset. I have a variable called student_id which indicates a student's id. I have a variable called school_id which indicates a school's id. I have a variable called instnm which is the name of the school. I have four unique schools and the variable school_mode tells you the number of unique schools. I have two institutional characteristic variables with the variable student_services describing the amount of money allocated to student services and the variable academic_support representing the money allocated to academic support.


Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float student_id long school_id str91 instnm float(school_mode Dest_chosen) long(student_services academic_support)
1 151351 "Indiana University-Bloomington"   1 1 30001999  1822905
1 170976 "University of Michigan-Ann Arbor" 2 0 19154000 32586000
1 240444 "University of Wisconsin-Madison"  3 0 18868514 25590210
1 243780 "Purdue University-Main Campus"    4 0 26458803  5262139
2 151351 "Indiana University-Bloomington"   1 0 30001999  1822905
2 170976 "University of Michigan-Ann Arbor" 2 1 19154000 32586000
2 240444 "University of Wisconsin-Madison"  3 0 18868514 25590210
2 243780 "Purdue University-Main Campus"    4 0 26458803  5262139
3 151351 "Indiana University-Bloomington"   1 0 30001999  1822905
3 170976 "University of Michigan-Ann Arbor" 2 0 19154000 32586000
3 240444 "University of Wisconsin-Madison"  3 1 18868514 25590210
3 243780 "Purdue University-Main Campus"    4 0 26458803  5262139
4 151351 "Indiana University-Bloomington"   1 0 30001999  1822905
4 170976 "University of Michigan-Ann Arbor" 2 0 19154000 32586000
4 240444 "University of Wisconsin-Madison"  3 0 18868514 25590210
4 243780 "Purdue University-Main Campus"    4 1 26458803  5262139
5 151351 "Indiana University-Bloomington"   1 0 30001999  1822905
5 170976 "University of Michigan-Ann Arbor" 2 1 19154000 32586000
5 240444 "University of Wisconsin-Madison"  3 0 18868514 25590210
5 243780 "Purdue University-Main Campus"    4 0 26458803  5262139
6 151351 "Indiana University-Bloomington"   1 0 30001999  1822905
6 170976 "University of Michigan-Ann Arbor" 2 0 19154000 32586000
6 240444 "University of Wisconsin-Madison"  3 1 18868514 25590210
6 243780 "Purdue University-Main Campus"    4 0 26458803  5262139
7 151351 "Indiana University-Bloomington"   1 0 30001999  1822905
7 170976 "University of Michigan-Ann Arbor" 2 0 19154000 32586000
7 240444 "University of Wisconsin-Madison"  3 1 18868514 25590210
7 243780 "Purdue University-Main Campus"    4 0 26458803  5262139
end

I am using Stata 15 and I ran the following line:
Code:
asclogit Dest_chosen student_services academic_support, case(student_id) alternative(school_mode)
When I run this line, Stata says "not concave" and convergence is not achieved. Does anyone know how I can fix this? Any suggestions or solutions would be greatly appreciated. I promise to pay it forward.
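One common first step, sketched here with no guarantee it resolves the problem: rescale the dollar-denominated covariates, since values in the tens of millions can make the likelihood maximization numerically unstable:

Code:
gen ss_m = student_services/1e6    // student services, in millions
gen as_m = academic_support/1e6    // academic support, in millions
asclogit Dest_chosen ss_m as_m, case(student_id) alternatives(school_mode)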

Annual Average by Year and Calculate Percent

Hello Statalisters,

I have a dataset with 42 variables and 682 observations of starting-level employment (counts) for all industries in the state of New York. The data are being used to determine the top ten industries (by race) in the state over the entire period.
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str11 county int year long(accommodationan administrativea) int agriculturefore long(artsentertainme construction11 educationalserv)
"Albany"   2008 6672  7421 19 472  904 6363
"Albany"   2009 6348  6815 36 432  846 6546
"Albany"   2010 6456  6681 43 404  784 6609
"Albany"   2011 6694  6712 42 461  920 7247
"Albany"   2012 6652  7275 27 444  947 7146
"Albany"   2013 6847  7312 35 464  962 7158
"Albany"   2014 6912  7497 33 558 1257 7211
"Albany"   2015 7532  8531 47 677 1451 7344
"Albany"   2016 8448 12286 51 748 1317 7644
"Albany"   2017 8727 12950 74 656 1376 8191
"Albany"   2018 9725 12794 65 653 1418 8144
"Allegany" 2008  181    40  0   0   18  364
"Allegany" 2009  228    39  .   3    6  371
"Allegany" 2010  237    22  .   0   17  413
"Allegany" 2011  263    21  0   0   13  456
"Allegany" 2012  278    22  0   0    3  456
"Allegany" 2013  286    30  0   0    3  483
"Allegany" 2014  388    36  0   3    8  535
"Allegany" 2015  370    45  0   0    0  541
"Allegany" 2016  391    29  .   0   13  536
end
My old code that I saved from another project that is similar has the following:
Code:
bysort county  year: egen yr_agri = sum(agriculturefore)
then,
Code:
bysort county year: egen agrisum = sum(agriculturefore)
then,
Code:
 generate agripct = (agrisum / yr_agri )*100
I assumed that after running these commands I could simply use collapse (mean) var-var, by(year), but this ends up with all the variables having the same exact percentage across the data. Where am I going wrong? I am not super comfortable with loops, so if there's a brute-force way to first find the sum of employment for all sectors across all years by each county and then find the overall percentage, that would be great.
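A sketch of one reading of the goal. In the original code the two egen calls use identical by-groups, so agrisum always equals yr_agri and the percentage is always 100; note also that egen's sum() function is nowadays called total(). If the intent is each county's share of the statewide total within a year:

Code:
bysort year: egen yr_agri = total(agriculturefore)           // statewide total per year
bysort county year: egen agrisum = total(agriculturefore)    // county total per year
gen agripct = 100 * agrisum / yr_agri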