Sample Selection Bias in survey-based studies

April 14, 2016, 5:57 am

Dear experts,

In my previous post I explained my research proposal in a little more detail. Since I have to evaluate the effect of certain firm resources on firm performance, I find a "standard" OLS regression suitable in my case. In addition, I believe that one independent variable in my list is endogenous. Therefore, I find etregress a suitable tool to account for endogeneity, with a growth equation as well as a selection equation. My concern now is whether my data suffer from sample selection bias that I need to account for in my calculations (Heckman, inverse Mills ratio).

My sample consists of answers to a survey that I sent to the total population of spin-off companies that a research institute (the largest and most representative one in the country) has ever spun out. Of the companies I contacted, none had ceased to exist (bankruptcy, merger, etc.) up to the survey date. Thus, survivorship bias is not my primary concern, is it? Not all firms answered my survey, but the response rate is almost 90%. Firms are of course not alike, but a response rate of 90% should nevertheless be sufficient reason to believe that my sample is (more or less) balanced and representative.

Of course there are many kinds of sample selection bias (undercoverage bias, voluntary response bias, ...), but those are standard limitations in the majority of sample-based studies, I believe.

My ultimate question is: do you see a bias that I am not aware of and that I need to correct for in order to avoid biased estimates? As mentioned, I study the effect of firm resources on firm performance (measured by revenue growth, etc.), with one endogenous variable (external investment = treatment). My sample of course has a few missing data points in the dependent variable Y (firm growth), since not all firms wanted to provide the data. Also, I observe each firm in only one state: growth for firms that received the treatment and growth for those that did not, but not the counterfactual. Do you believe I need to correct for sample selection bias via an inverse Mills ratio or a similar measure?

Any help is much appreciated!
Alex

↧

Regression with continuous dependent variable with ordinal independent variables

April 14, 2016, 5:58 am

Hello,

Given a continuous dependent variable and multiple ordinal independent variables (such as items from a Likert scale), how do I fit a regression model? I have seen a similar thread for R, but I am not sure how to do this in Stata (http://stats.stackexchange.com/quest...ndent-variable). Do I need to standardize the ordinal variables? I am using Stata 12.

Any help is very much appreciated!

Thank you,
Rachel
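
One common approach, sketched here under assumptions rather than as a definitive answer: enter each ordinal predictor with factor-variable notation (i.), which Stata 12 supports, so that every level gets its own indicator and no standardization is needed. The example uses the auto dataset shipped with Stata; rep78 only stands in for a Likert-type item, and none of these variable names come from the original post.

```stata
* Illustration only: rep78 (coded 1-5) stands in for an ordinal Likert item
sysuse auto, clear

* i.rep78 enters the ordinal variable as a set of indicator variables
regress price i.rep78 weight

* Joint test that the rep78 indicators are all zero
testparm i.rep78

* For comparison: treating the scale as continuous assumes equal spacing
regress price rep78 weight
```

Comparing the two fits gives a rough sense of whether treating the scale as equally spaced loses much.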

↧

Exporting variable names and corresponding labels

April 14, 2016, 8:25 am

Hey,

Is there a way in Stata to export variable names and their corresponding labels? I tried a few things with "Export" and putexcel and am not getting the right output.

Any help would be greatly appreciated.

Many thanks!

Claudia
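
A minimal sketch of one possible route, assuming Stata 12 or later (for export excel): describe with the replace option turns the variable descriptions into a dataset, which can then be exported. The auto dataset and the file name variable_labels.xlsx are placeholders chosen for illustration.

```stata
* Illustration only: auto.dta and the output file name are placeholders
sysuse auto, clear

preserve
* Replace the data in memory with one row per variable;
* the new dataset includes the variables name and varlab
describe, replace clear
list name varlab in 1/3

* Write names and labels to an Excel sheet
export excel name varlab using "variable_labels.xlsx", firstrow(variables) replace
restore
```

Because of preserve and restore, the original data are back in memory afterwards.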

↧

Trying to use foreach and if to define a local

April 14, 2016, 8:35 am

Hi Stata users

I now have three days of experience with Stata and I really like the program so far. I have now run into a problem that I can't figure out how to solve, and I'm hoping one of you can help me.

I am trying to combine "foreach" and "if" to define a local value using the below code.
I get the error: "num not found"

I can't figure out whether I'm wrong on "foreach" or "if" or both.

Can you see what I am doing wrong? Or help me find an alternative solution?

Thank you for reading this.

Code:

foreach num of numlist 1/5 {
    if num == 1 local Jname Squat
    if num == 2 local Jname Squat_Jumps
    if num == 3 local Jname Drop
    if num == 4 local Jname Drop_Vertical_Jump
    if num == 5 local Jname Static_Broad_Jump
    use "`Jname'.dta"
    .......
}
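
A likely cause, offered as a sketch: inside the loop, num is a local macro, so it has to be dereferenced as `num'. Written bare, Stata looks for a variable or scalar called num, which matches the "num not found" error. In the code below, display stands in for the use command so the example runs without the .dta files; the loop macro f in the second variant is a made-up name.

```stata
* Variant 1: keep the numlist, but dereference the macro as `num'
foreach num of numlist 1/5 {
    if `num' == 1 local Jname Squat
    if `num' == 2 local Jname Squat_Jumps
    if `num' == 3 local Jname Drop
    if `num' == 4 local Jname Drop_Vertical_Jump
    if `num' == 5 local Jname Static_Broad_Jump
    * In real use this line would be: use "`Jname'.dta", clear
    display "`Jname'.dta"
}

* Variant 2: loop over the file names directly (f is a hypothetical name)
foreach f in Squat Squat_Jumps Drop Drop_Vertical_Jump Static_Broad_Jump {
    display "`f'.dta"
}
```

The second variant avoids the chain of if commands altogether, which may be easier to maintain as the list of files grows.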

↧

running post-estimation commands on stored models

April 14, 2016, 8:39 am

Hi there

I have stored a number of model estimates into Stata's memory:

streg i.var1 i.var2 i.var3 i.var4, d(exp)
est store A

streg i.var1 var2 var3 i.var4, d(exp)
est store B

etc...

I want to run the post-estimation command "estat ic" on each of these stored estimates, but I can't work out how.

Any advice greatly appreciated.

Thanks
Tim
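
One way this could be done, sketched with the cancer dataset shipped with Stata rather than the poster's data: make each stored model active again with estimates restore before calling estat ic, or ask estimates stats for an AIC/BIC table across the stored models. The models and variables here are stand-ins.

```stata
* Illustration only: cancer.dta and these covariates are stand-ins
sysuse cancer, clear
stset studytime, failure(died)

streg i.drug age, distribution(exponential)
estimates store A

streg age, distribution(exponential)
estimates store B

* Restore each stored model in turn, then run the postestimation command
foreach m in A B {
    estimates restore `m'
    estat ic
}

* Alternatively, one table of AIC and BIC for all stored models
estimates stats A B
```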

↧

Endogenous count variable

April 14, 2016, 8:43 am

Consider two different models.
(1) DV is a duration (a length of time in minutes)
(2) DV is a frequency (a count)

Their regressors are:
(1) volume1 volume2 and countR, where countR is endogenous
(2) control variables affect both the DVs and the regressor countR
(3) firm, day of the week and week count dummies

Furthermore, the dependent variables need to somehow be considered together. In other words, what is being studied can occur for long periods of time frequently or rarely for short periods of time, etc.

DV (1) is on a daily, per product per firm basis. There are many zeros and it is also within a range (has a cut-off maximum)
DV (2) is a total sum over the entire period (8 weeks) of data

Prior research shows two-way and three-way interactions among volume1, volume2, and countR.

My data is showing me that volume1 and volume2 are highly correlated. So I may have to use them in separate models. But in that case, how do I test for the 3-way interaction?
My contribution will be the data itself (the DVs). They are actual measures of a phenomenon, instead of estimations. So I would like to keep the model as close as possible to prior theory development. Prior work used a simulation for the 3 factors (volume1, volume2, countR) and ran ANOVA tests for their count output.

A second contribution is a 0/1 dummy that tests the effects of a new technology in this field. Using the technology *should* decrease duration and frequency of phenomenon occurrence.

If there is no user-defined command, please advise on manually running this. I've looked at ivreg2 and ivpois and am not sure on incorporating interaction effects across stages on the latter command. Please let me know if I've neglected including a detail of the setup. To reiterate, I am interested in studying the same relationships between the three variables (volume1, volume2, countR) as prior literature, just with using exact measures of the phenomenon instead of estimated or simulated ones.

Thanks.

↧

Summing across numerous variables if observation equals a specific value

April 14, 2016, 8:44 am

Hey!

I have 9 different variables that I want to sum up. But I only want to sum the 1's.
I tried: egen new_varname = rowtotal(var1-var9) if var1-var9==1

But that gives me missing values, since not all observations equal 1.

If there are, e.g., three 1's in the variable range, the number I should get is 3.

Can someone help?

Thanks so much.

Claudia
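
A possible solution, sketched on a small made-up dataset: the if qualifier selects observations (rows), not values within a row, so it cannot do this job. The egen function anycount() counts, for each observation, how many variables in the list equal the given value.

```stata
* Made-up data for illustration: 9 variables, 3 observations
clear
input var1 var2 var3 var4 var5 var6 var7 var8 var9
1 0 1 2 . 1 0 0 3
0 0 0 0 0 0 0 0 0
1 1 1 1 1 1 1 1 1
end

* Count how many of var1-var9 are equal to 1 in each row
egen n_ones = anycount(var1-var9), values(1)
list n_ones
```

Since every counted value is 1, the count of 1's is the same as their sum.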

↧

recoding sequence of variables to create total score (Griffiths Mental Development Scales)

April 14, 2016, 9:00 am

Hi!
I need to recode some developmental data from a scale called the Griffiths Mental Development Scales for over 400 children. Each subscale includes 90+ items, and the database is at the item level. In these scales, children first show an unimportant pattern of missings (.), failures (0) and successes (1), and then achieve a baseline level (the highest 6 consecutive successes). After that you apply the scale until the child reaches the ceiling level (the lowest 6 consecutive failures after the baseline), followed again by an unimportant pattern of missings (.), failures (0) and successes (1). The total score is equal to the highest item of the baseline level, plus the number of successes after that until reaching the ceiling (6 failures). Alternatively, the total could be computed as the sum of successes per child per subscale if all items below the baseline level were recoded to 1 and all items above the ceiling level were recoded to 0.

It's kind of simple, but I am just having trouble programming it. Do you have suggestions?

I am using Stata 13.1 in Windows 10.

Thanks, Clara Barata

P.S. I couldn't install dataex (Stata says "cannot write in directory c:\ado\plus\d"; r(603)), so I couldn't send a proper example.

↧

How to open a ".mlib" files with Stata?

April 14, 2016, 9:08 am

Dear all,
I installed an online Stata code (mergersim) in Stata 11.
The program consists of several files, which were installed in C:\ado\plus\m and C:\ado\plus\l.
One of the files is a ".mlib" file. I'm trying to open it with Stata without success.
I got the following error:
file C:\ado\plus\l\lmergersim.mlib not Stata format
r(610);
Is there a way to open a ".mlib" file with Stata?
Thank you for your time,
Anat

↧

Regressors and Hausman test

April 14, 2016, 9:18 am

Hello everybody,
I would really appreciate your help.

I estimated a model with OLS, however two variables (attendance at lectures and attendance at seminars) were suspected of being endogenous.

Therefore, I proceeded to the 2SLS estimation. I had three possible instruments, but only one of them proved to be appropriate. I discussed this with my teacher and she told me not to include the variable Seminars in the 2SLS estimation, as it was highly insignificant in the OLS. So I estimated the model using one endogenous variable (Lectures) and one instrument.

Now I would like to do the Hausman test. The teacher told me that I must have the same regressors in both models. But now I do not know: should I compare the models with the variable Seminars included in both or excluded from both? I tried many ways and always get a similar result: the null hypothesis is not rejected. But I have to report one of the Hausman tests in my study and I am not sure which one, even though the result is the same.

This connects to the next question. I found that OLS is better. What should I mention in the conclusion? Should I interpret the results of the estimation with or without the variable Seminars? I am not sure, because the very first OLS estimates included the variable Seminars.

I am really confused.

Thank you very much,
Romana

↧

estimate weighted cross-classified models

April 14, 2016, 10:47 am

Dear all,

Is the xtmixed command able to incorporate sampling weights when estimating cross-classified models? If so, could you kindly direct me to a source?

Thanks.

Ling

↧

Comparing Characteristics of Ads in Two Different Years

April 14, 2016, 10:56 am

Hello! Many thanks in advance for any help.

I am working on a thesis where I am comparing characteristics of ads in 1975 and 2015. I am looking at characteristics such as "Woman has Utilitarian Grip" where the variable can take on two values - 1 for yes and 2 for no. I was hoping to find a test that compares the proportion of yes to no in 1975 vs. 2015. Each observation is one ad that is either from 2015 or 1975 (I've included a picture of the data browser to better explain). Is there a test that will test if there is a change from 1975 to 2015? Thanks!

Jennifer
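
One standard option, sketched here with invented counts (the numbers below are made up purely so the example runs): a chi-squared test on the 2 x 2 table of the characteristic by year, or equivalently a two-sample test of proportions. The variable names year, grip, n and yes are assumptions.

```stata
* Invented counts for illustration: grip = 1 (yes) or 2 (no)
clear
input year grip n
1975 1 30
1975 2 70
2015 1 45
2015 2 55
end

* Expand to one observation per ad, as in the poster's layout
expand n

* 0/1 indicator, needed by prtest
generate byte yes = (grip == 1)

* Chi-squared test of association, with column percentages
tabulate yes year, chi2 column

* Two-sample test of proportions across the two years
prtest yes, by(year)
```

With small cell counts, adding the exact option to tabulate would request Fisher's exact test instead.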

↧

2-way cluster using logit2 by Peterson - way to get marginal effects?

April 14, 2016, 11:14 am

My question is similar to one that was posted a while back but unanswered: http://www.stata.com/statalist/archi.../msg00196.html

I am using the logit2 ado file by Peterson (at http://www.kellogg.northwestern.edu/...rogramming.htm) to 2-way cluster on a dataset of ~15,000 observations with 50 binary variables in Stata 14.1; however, I am unable to use the margins command following logit2. Error: e(sample) does not identify the estimation sample. Does anyone have suggestions?

Also, is there a way to get the ORs to display with logit2 (other than manually exponentiating the coefficients)? Currently, I get an error that the OR option is not allowed. Thanks in advance.

↧

regress the stock returns on a constant

April 14, 2016, 12:32 pm

Hi guys,

I am hoping to get some help.

I have a list of 20 companies' stock returns over the last 12 months and I would like to regress the stock returns on a constant. How can I achieve that?

Then how can I obtain the variance-covariance matrix from the regression?

Thank you for your help
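
A minimal sketch, using the auto dataset shipped with Stata because the returns data are not shown: regress with no regressors fits a constant only, and the estimated constant is the sample mean. The variance-covariance matrix of the estimates is stored in e(V). The last command is an assumption about what may be wanted if the goal is the covariance of returns across companies instead.

```stata
* Illustration only: price stands in for a stock-return variable
sysuse auto, clear

* Regression on a constant only; _cons equals the sample mean
regress price

* Variance-covariance matrix of the estimated coefficients
estat vce
matrix V = e(V)
matrix list V

* If the covariance between several series is wanted instead
correlate price mpg weight, covariance
```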

↧

ANOVA and Post-hoc estimates using data including multiple imputations

April 15, 2016, 6:06 am

Dear all,

Once more, multiple imputation is the bane of my existence.

I hope somebody can help me with what started out as a small calculation. I couldn't find much information on the web, and what I found all dealt with multi-factor designs and/or repeated measures, which is not what I am trying to do, and I didn't see how I could apply it to my problem. What's more, the imputations in my data set make it very hard to adapt anything to my problem, since most commands do not work. I am using Stata 13.

My main problem is this: I am simply trying to compare five group means on a 4-point scale.
But since a substantial share of the data was missing, I decided to use multiple imputation (m=5) to handle it. I was easily able to get a table combining the means of each group over all imputations using

Code:

 mi estimate: mean scale, over(gruppe)

Now I want to know whether the differences in the means are significant and which groups differ from the others. Normally I would simply use the anova or oneway command with post-hoc tests and be done with it. The problem is that Stata does not allow the anova or oneway command with mi estimate.

After some digging I found a kind of workaround: using the mixed command with effect coding (following the advice in the article by Ginkel and Kroonenberg, 2014).
This is the Syntax I used:

Code:

  mi estimate: mixed scale Gruppe2 Gruppe3 Gruppe4 Gruppe5 ||

So far so good. But as far as I understand, the results from this analysis only show me which groups differ significantly from the grand mean, not which groups differ significantly from each other, as I would get from post-hoc tests. Unfortunately, Ginkel and Kroonenberg do not address post-hoc tests.

In addition, I discovered that I can force Stata to run the anova command anyway by using the cmdok option. However, the results look more like the ones from the mixed model, so I basically end up with the same information.

I tried to use the contrast postestimation command to get a post-hoc like result. But, once again, that doesn't work with mi:estimate. If I try to force it to run, using the cmdok option again, I get an error message.

Code:

 requested action not valid after most recent estimation command
an error occurred when mi estimate executed contrast on m=1

Does somebody know a way around it? I simply need a measure to see if the groups differ in order to report it.

I would gladly appreciate any help or suggestions.

Thanks in advance.

P.S. Sorry for the long post. I hope it makes sense.

↧

Counting events in event study

April 15, 2016, 6:37 am

Dear all

I have a dataset looking like this:

Date	Company_id	event date	estimation_window	event_window	event_id
2015-01-01	1		1		1
2015-01-02	1		1		1
2015-01-03	1				1
2015-01-04	1			1	1
2015-01-05	1	2015-01-05		1	1
2015-01-06	1			1	1
2015-01-07	1
2015-01-08	1		1		2
2015-01-09	1		1		2
2015-01-10	1				2
2015-01-11	1			1	2
2015-01-12	1	2015-01-12		1	2
2015-01-13	1			1	2
2015-01-14	1
2015-01-15	1
2015-01-01	2		1		3
2015-01-02	2		1		3
2015-01-03	2				3
2015-01-04	2			1	3
2015-01-05	2	2015-01-05		1	3
2015-01-06	2			1	3
2015-01-07	2

I want to create the variable called event_id. This variable counts the events and stores the event number in all rows from the beginning of the estimation window to the end of the event window. In the dataset there is a gap of one day between the estimation window and the event window. In some cases the estimation window does not have observations for all of its first days, and in some cases the event window does not have observations for all of its last days.

This example is a simplification of my real dataset. In the real dataset the event- and estimation windows are longer.

Could someone please help me with this?

Regards

↧

scatteri options

April 15, 2016, 7:47 am

I am graphing a three-way interaction by plotting points derived from the adjusted mean value of peer support (y) for Black and White males and females at two values of SES (4 and 8). I am having trouble drawing lines that aren't connected to each other. Does anyone know how to use the connect option to join the points into 4 separate lines? When I add connect(l) at the end of the command, the 4 lines are joined into one. Using connected after "twoway" results in an error. I have looked through the manual entries for scatteri and scatter and haven't found an example that fits. I also thought that I might have to separate the lines with || as you would for a scatter plot with multiple regression lines, but that didn't work either. I have included the syntax I am using below. Thanks so much,

Z

graph twoway scatteri ///
0.96185 4 (3)"BF" ///
0.92185 8 (3)"BF" ///
0.98085 4 (3)"BM" ///
0.78885 8 (3)"BM" ///
0.78185 4 (3)"WF" ///
0.66985 8 (3)"WF" ///
1.00485 4 (3)"WM" ///
0.92085 8 (3)"WM", title(Adj. Mean Value Peer Support, box bexpand) ///
ytitle("Peer Support") xtitle("Subjective Social Status") xlabel(0(2)10)
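
One approach that may work, given as a sketch: give each group its own scatteri plot inside twoway, each with its own connect(l), so that only the two points within a group are joined. The coordinates and labels are the ones from the command above; legend(off) is an added assumption, since the marker labels already identify the lines.

```stata
* Each parenthesized scatteri is a separate plot, so lines are not joined across groups
twoway (scatteri 0.96185 4 (3) "BF" 0.92185 8 (3) "BF", connect(l)) ///
       (scatteri 0.98085 4 (3) "BM" 0.78885 8 (3) "BM", connect(l)) ///
       (scatteri 0.78185 4 (3) "WF" 0.66985 8 (3) "WF", connect(l)) ///
       (scatteri 1.00485 4 (3) "WM" 0.92085 8 (3) "WM", connect(l)), ///
       title("Adj. Mean Value Peer Support", box bexpand) ///
       ytitle("Peer Support") xtitle("Subjective Social Status") ///
       xlabel(0(2)10) legend(off)
```

Options placed after the final comma apply to the whole graph, while connect(l) inside each pair of parentheses applies to that plot only.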

↧

Case-crossover study: conditional logistic regression

April 15, 2016, 8:09 am

I have data from a case-crossover study. Prescription data on whether individuals have a given prescription at time 1 (yes/no) and time 2 (yes/no). I am interested in a within person analysis.
I also know each individual's gender, and it is the effect of gender on the prescription outcome that I wish to estimate. I planned to analyse this with a conditional logistic regression model (clogit); however, gender does not change within individuals between times 1 and 2, but only differs between individuals. Does anyone have a suggestion as to where to start on the analysis? Any pointers would be gratefully appreciated. Thank you.

↧

cmp : Bivariate ordered probit and conditional probabilities

April 15, 2016, 8:27 am

Dear all,

I fit a bivariate ordered probit model using cmp. After estimation, I would like to compute the conditional probabilities Pr(y2=x|y1=z), with y1 the outcome of the first equation, y2 the outcome of the second, and x, z two real numbers; the same goes for conditional expectations.

I know that cmp can compute such quantities (see http://www.statalist.org/forums/foru...d-expectations or the help file in Stata).

I used the command "predict pr(3 3) eq(result) cond(2 2, eq(Wintg))" (with result=y2 and Wintg=y1), but it failed (error message: "something required"). I think I've missed a point (maybe more!) in the syntax.

Does anybody have any advice?
Thanks,
Olivier

↧

Simple: creating one label for many values, not dataset or variable

April 15, 2016, 10:19 am

I was not able to find this in the help: how can I create one label for several values of one variable? Example: I need NAICS codes 31-35 in the variable SECTOR to have one label, "Manufacturing". When I follow the general rules and then tabulate SECTOR, "Manufacturing" appears 5 times. It looks like I cannot use a dash or a slash: 31-35 Manufacturing or 31/35.
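
A possible workaround, sketched on made-up data: a value label attaches to individual values, so five distinct codes will always tabulate as five rows. Collapsing the codes into a new variable with recode gives a single labeled category. The name sector_grp and the choice of 31 as the collapsed code are assumptions.

```stata
* Made-up data for illustration
clear
input SECTOR
31
32
33
35
44
end

* Collapse codes 31-35 into one value and label it; other codes are left unchanged
recode SECTOR (31/35 = 31 "Manufacturing"), generate(sector_grp)
tabulate sector_grp
```

The original SECTOR variable is left untouched, so the detailed codes remain available.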

↧