Regressions with multiple splits

December 25, 2019, 3:05 pm

Hi!

I have a question about producing regression output as efficiently as possible. I want 18 regressions that use the same dependent and independent variables and the same fixed effects. I want to split the sample by three levels of KZ, by negative versus positive cash flows, and by young versus mature firms. The code for these variables is shown below. The table should have one row with all cash flows, one with positive and one with negative cash flows. Each column represents a KZ level, and these columns are further split into young and mature firms.

The last part of the code already loops over the KZ levels, but I don't know how to split these further.

Thanks in advance!

Code:

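* Indicator for negative cash flow
g negcf=0
replace negcf=1 if cf<0

* Firm age since the first fiscal year in the data; young = under 10 years
bysort gvkey: egen firstyear=min(fyear)
g firmage=fyear-firstyear
g youngfirm=0
replace youngfirm=1 if firmage<10

* Firm-level median KZ, split into terciles (gquantiles is from gtools, SSC)
bysort gvkey: egen group_median=median(KZ)
gquantiles q3_KZ = group_median, xtile nquantiles(3)

* One regression per KZ tercile (reghdfe and eststo are from SSC)
levelsof q3_KZ, local(groups)
foreach group of local groups {
    eststo: quietly reghdfe rd L1.tobin icf if q3_KZ==`group', absorb(fyear sic gvkey) cluster(gvkey)
}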
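
A minimal sketch of one way to get all 18 regressions, assuming `negcf`, `youngfirm` and `q3_KZ` exist as created above and that `reghdfe` and `estout` (both SSC) are installed. It nests the existing KZ loop inside loops over the cash-flow sample and firm age. The stored names such as `neg_kz2_y1` are hypothetical labels chosen for illustration, and treating `cf >= 0` as "positive" (zero included) is an assumption.

```stata
* Sketch: 3 cash-flow samples x 3 KZ terciles x 2 age groups = 18 regressions
eststo clear
levelsof q3_KZ, local(groups)
foreach cfs in all pos neg {
    * Sample restriction for this row of the table (hypothetical definitions)
    if "`cfs'" == "all" local cfcond "1"
    if "`cfs'" == "pos" local cfcond "cf >= 0 & !missing(cf)"
    if "`cfs'" == "neg" local cfcond "cf < 0"
    foreach group of local groups {
        foreach young in 0 1 {
            eststo `cfs'_kz`group'_y`young': quietly reghdfe rd L1.tobin icf ///
                if `cfcond' & q3_KZ == `group' & youngfirm == `young',       ///
                absorb(fyear sic gvkey) cluster(gvkey)
        }
    }
}

* One table per cash-flow sample; each gives one row block of 6 columns
foreach cfs in all pos neg {
    esttab `cfs'_*, se
}
```

Running `esttab` once per sample is only one layout choice; the stored estimates can be combined into a single table in other ways.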

More examples with codes on Econometric model forecasting

December 25, 2019, 3:16 pm

The Stata manual has an example of Klein's model for macroeconomic forecasting. It also has one model with cross-sectional data and one with panel data, and all three models are solved with code. If I want to see more examples of Klein-type models with code, where can I get more help?

Creating variables from text

December 25, 2019, 7:42 pm

Hi,

I have some survey data in which the raw responses are text. For example, there was a question about education where the responses are "undergraduate degree", "postgraduate degree", etc. I'd like to turn this into binary variables (one "undergraduate" variable, one "postgraduate" variable, etc.). I know this can be done manually, but it's quite time-consuming, so I was wondering whether Stata can automate the process.

I was thinking of generating a variable and having it take the value 1 when the response matched a given text (so the variable undergraduate takes the value 1 when the response is "Undergraduate degree"). However, when I tried this I got a "type mismatch" error.

Is there any way around this, or should I just start translating the data into 1s and 0s manually?
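
A sketch with made-up responses (the variable name `education` and its values are assumptions). `tabulate, generate()` creates one 0/1 indicator per distinct response automatically. A "type mismatch" error usually appears when a string variable is compared with a number, so string values need to be written in double quotes.

```stata
clear
input str25 education
"Undergraduate degree"
"Postgraduate degree"
"Undergraduate degree"
"High school"
end

* One 0/1 indicator per distinct response: educ_1, educ_2, ... in sorted order
tabulate education, generate(educ_)

* Or one indicator at a time; the string value must be in double quotes
generate byte undergraduate = education == "Undergraduate degree"
generate byte postgraduate  = education == "Postgraduate degree"
```

The comparison is exact, so responses that differ only in capitalization or spacing would need cleaning first (for example with `strlower()` and `strtrim()`).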

Forval

December 25, 2019, 8:12 pm

Hi,
I have this little problem with the following code:

clear all
forval i=1/1{
use 201`i'-1.dta
merge m:m Semestre Carnet using 201`i+1'-1.dta
}

I work with two datasets, 2011-1.dta and 2012-1.dta, but the fourth line of my code doesn't work. Is there any way to do this in Stata?

Thanks in advance.
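
Inside a macro reference, `` `i+1' `` is not evaluated as arithmetic. A sketch of one fix computes the next year in its own local first (the expression form `` `=`i'+1' `` should also work). It assumes `Semestre` and `Carnet` uniquely identify observations in both files; `merge m:m` is rarely what is wanted, and if the keys are not unique, `joinby` may be the intended tool.

```stata
clear all
forvalues i = 1/1 {
    * Evaluate the arithmetic before building the second file name
    local next = `i' + 1
    use "201`i'-1.dta", clear
    * Assumes Semestre and Carnet uniquely identify observations in both files
    merge 1:1 Semestre Carnet using "201`next'-1.dta"
}
```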

How to incorporate exposure variable in Zero Truncated Negative Binomial Model

December 25, 2019, 10:33 pm

Hello All,

This is my first time using this platform, so apologies in advance for any faux pas. Also, I am new to the world of model building and statistics, so please bear with me.
I am using a ZTNB model where my outcome is the number of waivered physicians at the ZCTA (ZIP Code Tabulation Area) level and my explanatory variables are community characteristics: median income, race, education, marital status, rurality, etc. I already checked the mean = variance assumption, which was not met, so I am using a negative binomial model; my outcome also has no zeros, which is why I am using a zero-truncated model.

However, I am completely confused about the exposure variable that needs to be included in the model to account for variation that may bias the estimates. Please correct me if I am wrong, but I was thinking of using population size as the exposure, because population size will affect the number of waivered physicians available. I am unsure (a) whether this exposure makes sense and (b) how to incorporate an exposure variable in the code. I have read a lot about it and am really confused: most forums say to add "exposure(varname)" to the tnbreg command, but in one Stata document I also read to use "vce(cluster varname) nolog" after the tnbreg command. One last method I read about was to enter "offset(log of variable)". It would therefore be really helpful to understand a little about exposure variables and the right way to account for them in a ZTNB model. Additionally, it would be nice to know whether the way to use an exposure variable changes with other count models: Poisson, negative binomial, zero-inflated Poisson and zero-inflated negative binomial.

I need to submit an abstract for a conference and I am stuck here, so your quick replies can help me a lot.

Thank You,
Sadia Jehan
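
A sketch on simulated data (all variable names, such as `nwaiver`, `pop` and `income`, are hypothetical). It illustrates that `exposure(pop)` and `offset(lnpop)` with `lnpop = ln(pop)` specify the same model: both add ln(pop) to the linear predictor with its coefficient fixed at 1. By contrast, `vce(cluster varname)` only changes the standard errors and `nolog` only hides the iteration log, so neither handles exposure. The same `exposure()` and `offset()` options are available in `poisson`, `nbreg`, `tpoisson`, `zip` and `zinb` (in the zero-inflated models they apply to the count equation).

```stata
clear
set seed 12345
set obs 2000
* Hypothetical ZCTA data: population and one standardized covariate
generate pop    = runiform(1000, 50000)
generate income = rnormal()
* Negative binomial counts as a Poisson-gamma mixture; the rate is per person
generate mu      = exp(-7 + 0.3*income) * pop
generate nwaiver = rpoisson(mu * rgamma(2, 0.5))
* Zero-truncated sample: keep only areas with at least one physician
drop if nwaiver == 0
generate lnpop = ln(pop)

* Two equivalent ways to include population as the exposure
tnbreg nwaiver income, exposure(pop) nolog
scalar b_exposure = _b[income]
tnbreg nwaiver income, offset(lnpop) nolog
scalar b_offset = _b[income]
display b_exposure - b_offset
```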

Performing survival analysis under GSEM

December 26, 2019, 1:38 am

Before I go to my questions, I'll give a short background of my research proposal. Most longitudinal BMI-mortality observational studies use Cox PH to estimate the HRs for any given BMI category (i.e., underweight, normal, overweight, obese), controlling for co-existing illness (e.g., diabetes). However, due to the possibility of reverse causality between disease and BMI (to which the so-called 'obesity paradox' and the J-shaped relationship between BMI and mortality are attributed), these studies exclude the first 5 years of their data to remove individuals who are expected to die because of their underlying disease, and not because of their current BMI per se.

In my research, I propose using a cross-lagged panel model approach to address the issue of reverse causality, including smoking, age and sex as confounders, and thereafter modeling survival. I wanted to see whether the mortality HRs for each BMI category would differ under this approach. The cross-lagged panel model, which I constructed with Stata's SEM Builder and which also incorporates survival, is as follows (based on my understanding):
[Image: path diagram of the cross-lagged panel model with survival]

In this model, the disease/comorbidity in question is diabetes. Diabetes (binary, y/n), BMI (ordinal: underweight, normal, overweight, obese 1, obese 2, obese 3), current smoking (binary, y/n), age (continuous) and sex (m/f) were all recorded at 3 different time points. All variables in the model are observed; there are no latent variables. Note that the added arrows also account for autoregression.

1. With the model constructed this way, is my understanding correct that, to estimate the HRs, all direct and indirect effects (coefficients) from the predictor (e.g., diabetes1) to the outcome (timedth = time to death) should be added together and then converted to an HR?
2. When performing survival analysis under GSEM, should the format of the data be the same as when you would do the usual survival analysis in Stata?
3. Does this take censored data into consideration automatically, or should I create a separate dummy variable for when the data becomes censored?

Thank you so very much.
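
A minimal sketch of the survival part only, on simulated data (variable names follow the post; `died`, `t_event`, `t_cens` and the data-generating values are hypothetical). `gsem` does not use `stset`: the data stay one row per subject with the observed time and a 0/1 failure indicator, and censoring enters through the `failure()` suboption of `family()`. `gsem` fits parametric survival families such as the exponential and Weibull rather than a Cox model, and whether a coefficient is a log hazard ratio or an accelerated-failure-time coefficient depends on the family, so it is worth checking the `gsem` documentation before exponentiating.

```stata
clear
set seed 2019
set obs 1000
* Hypothetical wide data: one row per subject
generate diabetes1 = runiform() < 0.2
* Exponential event times with a higher hazard when diabetes1 == 1
generate t_event = -ln(runiform()) / (0.05 * exp(0.7*diabetes1))
generate t_cens  = 10 + 5*runiform()          // hypothetical censoring times
generate timedth = min(t_event, t_cens)
generate died    = t_event <= t_cens          // 1 = death observed, 0 = censored

* Censoring is handled through failure(); no stset is required
gsem (timedth <- diabetes1, family(exponential, failure(died)))
```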

Principal component analysis in panel data setting

December 26, 2019, 2:32 am

Hello to everyone,

I have a panel of 190 industries over the 2000-2018 period. My dataset contains 4 variables (x1-x4) that are correlated and convey similar information. I would like to do a principal component analysis and extract one variable that accounts for the common variability and correlation of the 4 variables. I type the following:

Code:

bysort industry: pca x1 x2 x3 x4

The principal component analysis is done for each industry (which takes some time, as I have 190). Then I try to predict a single component, as on average it seems to explain the variation in x1-x4. I type the following:

Code:

bysort industry: predict p1, score

Of course, I get the message:

Code:

predict may not be combined with by
r(190);

I read in a previous thread that principal component analysis "pays no attention to panel structure":

https://www.statalist.org/forums/for...-in-panel-data

Should I give up on PCA in a panel data setting? One option is to split my dataset by industry and run the PCA 190 times, which is nonsense.

Any suggestions?
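
A sketch on simulated data (the `common` factor and all values are hypothetical). Because `pca` ignores panel structure, one option is a single PCA on the pooled industry-year data, after which `predict` works without `by`. If a separate PCA per industry is really wanted, a loop fills one score variable without splitting the file by hand; note that the sign of a component is arbitrary, so per-industry scores may not be comparable across industries.

```stata
clear
set seed 2019
set obs 190
generate industry = _n
expand 19                                   // 2000-2018: 19 years per industry
bysort industry: generate year = 1999 + _n
generate common = rnormal()
forvalues j = 1/4 {
    generate x`j' = common + rnormal(0, 0.5)
}

* Option 1: one PCA on the pooled data, then a single score
pca x1 x2 x3 x4, components(1)
predict p1, score

* Option 2: a separate PCA per industry, filling one score variable
generate p1_ind = .
levelsof industry, local(inds)
foreach k of local inds {
    quietly pca x1 x2 x3 x4 if industry == `k', components(1)
    quietly predict tmp if industry == `k', score
    quietly replace p1_ind = tmp if industry == `k'
    drop tmp
}
```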

Plot count data with SE bars on Stata

December 26, 2019, 3:13 am

Hi,

I am looking for code to make a line chart plotting two variables, e.g., the number of times a person visited the doctor (y axis) across registration phases (x axis), grouped by 3 regions. I have seen people use the marginsplot command after running a regression, but I want to plot the unit-level data directly rather than any kind of prediction. Please find attached a sample image of the type of graph I want to reproduce.
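
A sketch on simulated data (the variable names `visits`, `phase` and `region` are hypothetical). It plots raw group means rather than model predictions: `collapse` computes each mean and its standard error by region and phase, and `rcap` draws bars at plus and minus one standard error.

```stata
clear
set seed 1
set obs 300
generate region = ceil(_n/100)              // 3 hypothetical regions
generate phase  = mod(_n, 4) + 1            // 4 hypothetical registration phases
generate visits = rpoisson(2 + phase/2)

* Mean and standard error of the mean for each region-phase cell
collapse (mean) mvisits = visits (semean) se = visits, by(region phase)
generate lo = mvisits - se
generate hi = mvisits + se

twoway (connected mvisits phase if region == 1) (rcap lo hi phase if region == 1) ///
       (connected mvisits phase if region == 2) (rcap lo hi phase if region == 2) ///
       (connected mvisits phase if region == 3) (rcap lo hi phase if region == 3), ///
       legend(order(1 "Region 1" 3 "Region 2" 5 "Region 3"))                    ///
       xtitle("Registration phase") ytitle("Mean number of doctor visits")
```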

How to run conditional indirect difference test in Stata 14.2

December 26, 2019, 4:03 am

Hello all,

I am having difficulty running a conditional indirect difference test in Stata 14.2. Here is my research model: my IV is Transformational Leadership, my DV is Job Performance, the mediating variable is Work Engagement and the moderating variable is Job Meaningfulness.
I ran the conditional indirect effect test with 5000 resamples and a 95% bias-corrected confidence interval to determine whether the indirect effect of transformational leadership on employee job performance is conditional on employee job meaningfulness. The results show that at both high and low levels of job meaningfulness, the mediated relationship between transformational leadership and employee job performance via work engagement was significant, as the confidence intervals did not include zero at either level (at the low level of job meaningfulness, indirect effect = 0.113, SE = 0.051, 95% CI = [0.025, 0.230]; at the high level, indirect effect = 0.047, SE = 0.028, 95% CI = [0.006, 0.119]). Since the mediating effect is significant at both levels of job meaningfulness, I need to run another test showing whether the indirect effects at the two levels are significantly different from each other. So far I cannot figure out how that analysis is done in Stata 14.2. Can anyone please help me with the syntax for that analysis?
Thank you for your time.

How to run conditional indirect difference test with categorical moderator in Stata 14.2

December 26, 2019, 4:25 am

Hello everyone!

Can someone please help me run the conditional indirect difference test with a categorical moderator in Stata 14.2? My research model: the IV is Transformational Leadership, the DV is Job Performance, the mediating variable is Work Engagement, and the moderating variable is Office Design (cellular is coded as 0, and open-plan design is coded as 1).
My hypothesis is that transformational leadership will have an indirect effect on employee job performance via work engagement when the office design is open-plan rather than cellular. The results reveal that the mediated relationship between transformational leadership and employee job performance via work engagement is significant in the open-plan office design (indirect effect = 0.149, SE = 0.073, 95% CI = [0.025, 0.320]) as well as in the cellular design (indirect effect = 0.035, SE = 0.020, 95% CI = [0.001, 0.083]). Now I need to run another test showing whether the indirect effects for the two levels are significantly different from each other. I have looked through some resources and watched videos about moderated mediation, but I cannot find the syntax for a conditional indirect difference test with a categorical moderator. Please help me with the Stata syntax to run that analysis.
Thanks a lot for your time.
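
A sketch on simulated data, assuming the moderator acts on the first stage (transformational leadership to work engagement) and using hypothetical variable names (`tl`, `we`, `jp`, `openplan`). With a 0/1 moderator, the indirect effect is a1*b1 in the cellular condition and (a1 + a3)*b1 in the open-plan condition, so their difference is a3*b1 (often called the index of moderated mediation); a bootstrap confidence interval for that product tests whether the two indirect effects differ. For a continuous moderator probed at one SD above and below the mean, the same logic gives a difference of 2*SD*a3*b1.

```stata
clear
set seed 2019
set obs 400
* Hypothetical data: tl = leadership, we = engagement, jp = performance
generate openplan = runiform() < 0.5        // 0 = cellular, 1 = open-plan
generate tl       = rnormal()
generate tlxop    = tl * openplan           // sem needs the product as a variable
generate we       = 0.2*tl + 0.3*openplan + 0.4*tlxop + rnormal()
generate jp       = 0.5*we + 0.1*tl + rnormal()

* Conditional indirect effects and their difference (delta-method SEs)
sem (we <- tl openplan tlxop) (jp <- we tl)
nlcom (cellular: _b[we:tl]*_b[jp:we])               ///
      (open: (_b[we:tl] + _b[we:tlxop])*_b[jp:we])  ///
      (difference: _b[we:tlxop]*_b[jp:we])

* Bootstrap CI for the difference (5000 reps in the post; 500 here for speed)
bootstrap difference = (_b[we:tlxop]*_b[jp:we]), reps(500) seed(2019): ///
    sem (we <- tl openplan tlxop) (jp <- we tl)
estat bootstrap, percentile bc
```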

Creating a graph with shaded areas that cover positive and negative numbers in the y-axis

December 26, 2019, 8:41 am

Hello,

I created a graph that includes markers for each data point and shaded areas for periods. The chart looks fine when the plotted variable takes positive values. However, when it takes negative values, the shaded areas do not cover them; they are shown only from zero up to the positive values.

The following is the code I am using:

Code:

preserve
    collapse var, by(group year)
    bysort group : gen first = _n == 1
    expand 2 if first, gen(newvar)
    replace year = 2008 if newvar == 1
    replace var = . if newvar == 1
    sort group year
    drop first newvar
    set scheme s1color
    separate var, by(group) veryshortlabel
    twoway scatteri -3 2002.5 1 2002.5 1 2005.5 -3 2005.5, recast(area) color(red*0.2) || ///
    scatteri -3 2005.5 1 2005.5 1 2007.5 -3 2007.5, recast(area) color(red*0.3) || ///
    scatter var? year, ms(dh oh) legend(lab(3 "A") lab(4 "B") order(4 3) position(6) col(1)) yscale(r(-3 1)) ylabel(-3(.5)1) xlabel(2000(1)2007)
restore

An example of the dataset I am using is:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float(individual year group var)
 1 2000 1  -1.287641
 1 2001 1  -.8836126
 1 2002 1 -1.1622982
 1 2003 1  -1.082307
 1 2004 1 -1.1710584
 1 2005 1 -1.1796787
 1 2006 1 -1.2466114
 1 2007 1 -1.1606842
 2 2000 1 -.15043515
 2 2001 1  -.2209688
 2 2002 1 -.19620353
 2 2003 1  -.3402078
 2 2004 1  -.4571495
 2 2005 1  -.4485733
 2 2006 1 -.50437814
 2 2007 1  -.4616068
 3 2000 1  -.8550537
 3 2001 1  -.8602186
 3 2002 1  -.8605418
 3 2003 1  -.8876227
 3 2004 1  -.9352865
 3 2005 1 -1.0311842
 3 2006 1 -1.0150617
 3 2007 1 -1.0626712
 4 2000 1 -1.5768597
 4 2001 1 -1.4882183
 4 2002 1  -1.474381
 4 2003 1  -1.541484
 4 2004 1  -1.532406
 4 2005 1 -1.5681677
 4 2006 1 -1.4693305
 4 2007 1 -1.1766837
 5 2000 1  -.9059771
 5 2001 1  -.9684552
 5 2002 1 -1.0976796
 5 2003 1   -.921452
 5 2004 1  -.9920011
 5 2005 1  -1.285856
 5 2006 1 -1.2974216
 5 2007 1   -1.80346
 6 2000 1   .3173624
 6 2001 1  .25653982
 6 2002 1  .14316559
 6 2003 1  .13082898
 6 2004 1  .08094823
 6 2005 1 -.05380845
 6 2006 1  -.1298359
 6 2007 1    -.43694
 7 2000 1  -.8702447
 7 2001 1 -1.0644857
 7 2002 1 -1.0852207
 7 2003 1 -1.2581582
 7 2004 1 -1.1840831
 7 2005 1  -1.003717
 7 2006 1   -.875128
 7 2007 1  -.9327229
 8 2000 1 -1.3506677
 8 2001 1  -1.451754
 8 2002 1  -1.716019
 8 2003 1 -1.9588758
 8 2004 1 -2.0517306
 8 2005 1  -2.854429
 8 2006 1  -2.816115
 8 2007 1   -1.93434
 9 2000 1  -.4100389
 9 2001 1   -.435545
 9 2002 1 -.49928665
 9 2003 1  -.5177228
 9 2004 1   -.534188
 9 2005 1 -.58403265
 9 2006 1  -.6003072
 9 2007 1  -.5960971
10 2000 0 -1.4216475
10 2001 0 -1.4646496
10 2002 0  -1.521221
10 2003 0  -1.599065
10 2004 0  -1.688853
10 2005 0  -1.668002
10 2006 0 -1.1764842
10 2007 0  -1.364062
end

I appreciate any help you could provide.

Thanks,

Mayra
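
With `recast(area)`, the area is filled between the plotted line and a base that defaults to y = 0, which would explain why nothing is shaded below zero. A sketch of one fix draws only the top edge of each band and sets `base(-3)`, so the fill runs down to the bottom of the y axis. For brevity it uses only individual 1's values from the example data; the band limits are copied from the post.

```stata
clear
input float(year var)
2000  -1.287641
2001  -.8836126
2002 -1.1622982
2003  -1.082307
2004 -1.1710584
2005 -1.1796787
2006 -1.2466114
2007 -1.1606842
end

set scheme s1color
* Each band: top edge at y = 1, filled down to base(-3)
twoway scatteri 1 2002.5 1 2005.5, recast(area) base(-3) color(red*0.2) || ///
       scatteri 1 2005.5 1 2007.5, recast(area) base(-3) color(red*0.3) || ///
       scatter var year, ms(dh) legend(off)                                ///
       yscale(r(-3 1)) ylabel(-3(.5)1) xlabel(2000(1)2007)
```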

Looping through Import Delimited command using locals

December 26, 2019, 10:51 am

Hi Statalist Users,

I read a delimited text file using the standard command:

> import delimited "G:\Shared drives\Bank Stock Prices\Match\Gary\All_States_Features\AK_Features_20191101.txt", varnames(1) encoding(UTF-8)

The command worked as anticipated.

Now I want to import the data for all 50 US states plus some territories, which I then save and merge. To do this, I wrote the following loop:

foreach x in AK AL AR AS AZ CA CO CT DC DE FL FM GA GU HI IA ID IL IN KS KY LA MA MD ME MH MI MN MO MP MS MT NC ND NE NH NJ NM NV NY OH OK OR PA PR PW RI SC SD TN TX UM UT VA VI VT WA WI WV WY {
local ST "G:\Shared drives\Bank Stock Prices\Match\Gary\All_States_Features""`x'""_Features_20191101.txt"
display "`x'"
display "`ST'"
import delimited "`ST'", varnames(1) encoding(UTF-8)
* I have some commands here that save the information I need as DTA files. After the loop, I will append them to each other.
clear
}

When I run this code, it returns.

**********************

. foreach x in AK AL AR AS AZ CA CO CT DC DE FL FM GA GU HI IA ID IL IN KS KY LA MA MD ME MH MI MN MO MP
> MS MT NC ND NE NH NJ NM NV NY OH OK OR PA PR PW RI SC SD TN TX UM UT VA VI VT WA WI WV WY {
2. local ST "G:\Shared drives\Bank Stock Prices\Match\Gary\All_States_Features""`x'""_Features_201911
> 01.txt"
3. display "`x'"
4. display "`ST'"
5. import delimited ""`ST'"" , varnames(1) encoding(UTF-8)
6. clear
7. }
AK
G:\Shared drives\Bank Stock Prices\Match\Gary\All_States_Features\AK_Features_20191101.txt
using required
r(100);

end of do-file

*****************

The display commands tell me that Stata put `x' correctly into the file name, but the import command isn't reading the file name correctly. Can anyone help me get the syntax of the import command in my loop right?

Thanks

Gary
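
A sketch of a simpler loop, assuming the files are named like `AK_Features_20191101.txt` in that folder (an assumption). Building the whole path in one quoted string avoids embedded quote characters in the local. Using forward slashes also sidesteps a Stata rule: a backslash placed immediately before `` ` `` stops the macro from expanding. The state list is shortened here for illustration.

```stata
* Forward slashes work in Windows paths inside Stata
local dir "G:/Shared drives/Bank Stock Prices/Match/Gary/All_States_Features"
foreach x in AK AL AR AS AZ CA {            // shortened list for illustration
    local ST "`dir'/`x'_Features_20191101.txt"
    display "`ST'"
    import delimited using "`ST'", varnames(1) encoding(UTF-8) clear
    * ... commands that save the needed information as .dta files ...
}
```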

Extended Cox model (Cox model with time-varying covariates)

December 26, 2019, 10:53 am

Dear STATALIST

I would like to investigate the association of baseline treatment A (a binary variable) with incident stroke events, adjusted for covariates X, Y, and Z.
The proportional-hazards assumption was not satisfied for A and X according to Schoenfeld residuals and log-log plots.
Therefore, I am trying to use A and X as time-varying covariates.
In this setting, I do not know which of the following specifications is correct in Stata, because some people use 1 and others use 2:
1. stcox A X Y Z, tvc(A X) nohr
2. stcox Y Z, tvc(A X) nohr

If 1 is correct, you get two beta coefficients for A, one from the main equation and one from the tvc equation.
How should I summarize the result in a paper?
Is it OK to show the result as beta(main) + beta(tvc)*_t (time), or are there other choices?

Could you please answer these questions?
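
A sketch using the `cancer` example dataset shipped with Stata (collapsing the drug groups into `treated` is an assumption made for illustration). In specification 1, with `tvc(A)` and the default `texp(_t)`, the log hazard ratio for A at time t is b_main + b_tvc*t; specification 2 drops the main effects, which forces that log hazard ratio to be zero at t = 0. One common way to report such results is the hazard ratio at a few chosen times, which `lincom` can compute with a confidence interval.

```stata
sysuse cancer, clear
stset studytime, failure(died)
generate byte treated = drug != 1            // any active drug vs placebo

* Main effect plus interaction with analysis time (texp(_t) is the default)
stcox treated age, tvc(treated) texp(_t) nohr

* Hazard ratio for treated at 6 and 24 months (log HR = b_main + b_tvc*t)
lincom [main]treated + 6*[tvc]treated, eform
lincom [main]treated + 24*[tvc]treated, eform
```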

Generate new variable by levelsof

December 26, 2019, 11:35 am

format %td datevar

xtabond2 for a dynamic GMM

December 26, 2019, 6:36 pm

Hello dear all,

I face a problem when I run a dynamic GMM model on panel data. My research is about the relationship between the level of financial development (DIFI) and the poverty level (POV). In my model, POV is the dependent variable, DIFI is the independent variable, HLW, IS, UR and GOV are the control variables, and GAP and GDP are the instrumental variables. Equations (2) and (3) are the instrument equations; equations (4) and (5) are the estimated equations.
[Image: model equations]
The first problem is that Stata always shows a warning ("Number of instruments may be large relative to number of observations").

The second problem is how to introduce individual fixed effects into a dynamic GMM model. Which command should I use? I have looked through "help xtabond2", but I did not find any instructions related to this problem.

Thank you.
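
A hedged sketch with the post's variable names, assuming the data are `xtset` and `xtabond2` (SSC) is installed; the panel identifiers, the lag limits and treating DIFI as endogenous are assumptions. Difference and system GMM already remove the individual fixed effects by differencing (or forward orthogonal deviations), so no separate fixed-effects command should be needed. The instrument-count warning is commonly addressed by limiting the lag depth and using the `collapse` suboption.

```stata
* Hypothetical setup: province and year identify the panel
* xtset province year
xtabond2 POV L.POV DIFI HLW IS UR GOV,                  ///
    gmmstyle(L.POV, laglimits(1 3) collapse)            ///
    gmmstyle(DIFI, laglimits(2 3) collapse)             ///
    ivstyle(HLW IS UR GOV GAP GDP)                      ///
    twostep robust small
```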

generating a new var and replacing its values with the previous year's values

December 26, 2019, 7:39 pm

Hi everyone,
A sample of my data is shown below (my current data). As you can see, person 1's starting health status (health_1) is H (healthy) in 2000, becomes U (unhealthy) in 2002, and is H (healthy) again in 2004. I would like to generate a new health variable (e.g., health_2) containing the respondent's health status in the previous wave.
Thanks.

Nader

*My current data

Code:

clear all
input id year str6 (health_1)
1 2000 "H"
1 2002 "U"
1 2004 "H"
2 2000 "H"
2 2002 "H"
3 2000 "H"
3 2002 "H"
3 2004 "H"
end
list

*My goal

Code:

clear all
input id year str10(health_1 health_2)
1 2000 "H" ""
1 2002 "U" "H"
1 2004 "H" "U"
2 2000 "H" "H"
2 2002 "H" "H"
3 2000 "H" "H"
3 2002 "H" "H"
3 2004 "H" "H"
4 2005 ""  "new sample"
end
list
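
A sketch using the example data: after sorting by year within `id`, subscripting with `[_n-1]` picks up the previous wave's value, which also works for string variables, where the `L.` operator cannot be used. It leaves the first wave empty, as for person 1 in the goal; the goal rows for persons 2 and 3 in 2000 show "H", which would need a rule the post does not state.

```stata
clear
input id year str1 health_1
1 2000 "H"
1 2002 "U"
1 2004 "H"
2 2000 "H"
2 2002 "H"
3 2000 "H"
3 2002 "H"
3 2004 "H"
end

* Previous wave's status within each person; empty in the first wave
bysort id (year): generate health_2 = health_1[_n-1]
list, sepby(id)
```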

trend and time

December 26, 2019, 7:54 pm

I want to estimate a difference model with panel data.
Using two years of data, for example, I want to estimate the effect of an independent variable while including a trend.
But the problem is that when I use gen time=_n, the time order is mixed up every time, so the results differ each time I run the analysis.
The other problem is that if I make the time variable, for example, 1 and 2 for every id, its coefficient comes out as zero because of collinearity.
How can I handle that?
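
A sketch on hypothetical two-year data (`y`, `x` and all values are assumptions). `_n` depends on the current sort order, so building `time` from `year` (or sorting by `id` and `year` first) makes it reproducible. In a two-period first-difference model the differenced trend equals 1 for every unit, so it cannot be separated from the constant; the constant of the differenced regression plays the role of the trend.

```stata
clear
set seed 2019
set obs 200
generate id = _n
expand 2
bysort id: generate year = 2010 + 8*(_n - 1)    // two hypothetical years: 2010 and 2018
generate x = rnormal()
generate y = 0.5*x + 0.3*(year == 2018) + rnormal()

* Reproducible time index: 1 for the first year, 2 for the second
egen time = group(year)
xtset id time

* First differences: D.time = 1 for everyone, so the constant estimates the trend
regress D.y D.x
```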

Alternative of Cronbach alpha (for single item measures)

December 26, 2019, 8:46 pm

Hi,
I have data on patients' subjective health, measured by a single question: "How is your health today?". I have to check the validity of this question, so I tried Cronbach's alpha, but it doesn't work for single-item measures. Is there any alternative in Stata 14 that can do the trick?
Thanks.

Chi-square for subsample

December 26, 2019, 8:57 pm

Hi all,
A snapshot of my data is below (I have 950 data points, so it's only a small section). I am running chi-square tests of the frequency of each stage by country using the commands below:

Code:

tab stage1 country, chi2
tab stage2 country, chi2
tab stage3 country, chi2
tab stage4 country, chi2

Now I'd like to compare only certain countries for each stage. For example, how would I test stage1 for countries "1" and "3" only, to see whether there is a significant difference?
TIA

[Image: data snapshot]
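
A sketch with made-up values (coding country as 1 to 3 is an assumption, since the post's data were an image). An `if` qualifier with `inlist()` restricts the table, and therefore the test, to the chosen countries. Running many pairwise tests raises the chance of a false positive, so some adjustment for multiple comparisons may be worth considering.

```stata
clear
input byte(stage1 country)
1 1
0 1
1 1
0 2
1 2
1 3
0 3
0 3
1 3
end

* Restrict the test to countries 1 and 3 only
tabulate stage1 country if inlist(country, 1, 3), chi2
* With small cell counts, Fisher's exact test may be preferable
tabulate stage1 country if inlist(country, 1, 3), exact
```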

drop/keep with multiple conditional statement

December 26, 2019, 9:39 pm

Hi everyone,
A sample of my data is shown below (my current data). I am trying to keep the people whose marital status is married in at least one year, from the first year they are married onward. In other words, I want to keep a person in the data even if his or her marital status changes afterwards. People who are never married should be dropped, as should the years before a person first marries. Thanks.
Nader

*My current data

Code:

clear all
input id year str16 (marital)
1 2000 "married"
1 2002 "married"
1 2004 "married"
2 2000 "never married"
2 2002 "never married"
3 2000 "never married"
3 2002 "married"
3 2004 "divorced"
4 2002 "never married"
4 2004 "married"
5 2002 "divorced"
5 2004 "married"
end
list

*My goal

Code:

clear all
input id year str16 (marital)
1 2000 "married"
1 2002 "married"
1 2004 "married"
3 2002 "married"
3 2004 "divorced"
4 2004 "married"
5 2004 "married"
end
list
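
A sketch using the example data: within each person, sorted by year, a running count of married years becomes positive at the first marriage and stays positive afterwards. Keeping observations where it is positive reproduces the goal listing.

```stata
clear
input id year str16 marital
1 2000 "married"
1 2002 "married"
1 2004 "married"
2 2000 "never married"
2 2002 "never married"
3 2000 "never married"
3 2002 "married"
3 2004 "divorced"
4 2002 "never married"
4 2004 "married"
5 2002 "divorced"
5 2004 "married"
end

* Running count of married years; positive from the first marriage onward
bysort id (year): generate byte evermarried = sum(marital == "married") > 0
keep if evermarried
drop evermarried
list, sepby(id)
```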
