Channel: Statalist

fixed effects: excluding one entity?

Dear all,

I am working with a panel data set of 31 provinces over 16 years, with the logged amount of development finance as the Y variable and a set of X variables. I am using Stata 12.1 and both reg and xtreg to examine the effect (I use reg only to justify the FE analysis; it is not the center of my research).

The problem is that province number 31 received no development finance in 11 of the 16 years, so the logged Y variable is 0 for those 11 years. My question is how to deal with this province in the fixed effects analysis, because including it turns previously significant variables insignificant.

I have tried both excluding the province and including a dummy (i.province31) in reg. However, the dummy is omitted in xtreg. So I compared xtreg with and without province 31; the within R-squared values are as follows:

with province31: R-sq: within = 0.0375
without province31: R-sq: within = 0.1105


The two codes I used are:
Code:
xtreg lnFINPP_PY2 GRPpc_L rSTBDGT_L TOPOP_L rFRBDGT_L EAST , cluster(province_id) fe
Code:
xtreg lnFINPP_PY2 GRPpc_L rSTBDGT_L TOPOP_L rFRBDGT_L EAST if(province_id < 31), cluster(province_id) fe
I will attach both outputs; I hope they are readable. I am new to Stata and thankful for any suggestions. Thank you.


Mei

Storing the output of -margins- command

Hi all,
I use Stata 14.1. I ran a regression using the -areg- command and then calculated the elasticities of several covariates using the -margins- command.
Is there an equivalent command to -esttab- which I can use to create a table of the -margins- results?
Thanks,
Anat
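
One possible route, sketched here under assumptions: adding the post option to -margins- stores its results in e(b) and e(V), after which the community-contributed -esttab- (part of the estout package on SSC) can tabulate them like any other set of estimates. The auto dataset, the variable names and the file name margins_elast.rtf below are stand-ins, not the original poster's data.

```stata
* Hypothetical stand-in data and model; replace with your own.
* esttab is community-contributed: ssc install estout
sysuse auto, clear

areg price mpg weight, absorb(rep78)

* post replaces the areg results in e() with the margins results
margins, eyex(mpg weight) post

* Tabulate the posted elasticities with their standard errors
esttab using margins_elast.rtf, se replace
```

Because post overwrites the regression results, it may help to run -estimates store- on the model first if it is needed again later.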

Binary Response Panel Data with Self-Selection into Binary Treatment

Hi all,
In a recent paper, Semykina & Wooldridge (2015) (link) suggest that Stata's -biprobit- command can be used to estimate average treatment effects (ATE) for binary responses with self selection into a binary treatment when the data has a panel structure.

I had previously seen in Austin Nichols's slides here that biprobit can be used for endogenous switching (self selection) with binary treatment and binary response, but I did not know that it could be extended to a panel data context. For the cross-sectional case, the command would simply be something like:

biprobit (Y= control_variables treatment) (treatment= control_variables Instruments)

Semykina & Wooldridge (2015) suggest that the above command can be modified for panel data, explaining briefly in a footnote:

in Stata estimating treatment effects can be implemented by pooling the data and estimating the augmented equation (with time averages) using the “biprobit” command. Standard errors robust to serial dependence can be obtained using “cluster” option.
MAIN QUESTIONS:

Can anyone here provide more details on how to implement this in Stata? For example, what do they mean by “the augmented equation (with time averages)”? What are these time averages? Should I include year dummies? Does this method handle the panel structure through random effects?

Also, are there ways to get ATE and ATET separately? Can we do it with the margins command? Finally, is there a way to perform the test for selection bias outlined in Semykina & Wooldridge (2015) after the biprobit command in Stata?

ALTERNATIVES:

As an alternative to the -biprobit- command, I think there must be a way to do this with the -cmp- command by David Roodman in a manner similar to that discussed in posts such as this or this. Any guidance on how to exactly implement the self selection case and calculate ATE and ATET with the -cmp- command would also be appreciated.

The -biprobit- command looks more attractive to me at this point because my panel data has a survey structure with probability weights and -biprobit- works with the -svy:- prefix. Although the requirement to use vce(cluster) noted by Semykina & Wooldridge (2015) will probably not let me use the svy prefix anyways.

Another approach based on control functions is outlined in Murtazashvili & Wooldridge (2016), but I can't find Stata code for that either. I know the control function approach is what is used by Stata's -eteffects- command, but that command also does not handle panel data.

References:

Murtazashvili & Wooldridge (2016) - A control function approach to estimating switching regression models with endogenous explanatory variables and endogenous switching

Semykina & Wooldridge (2015) - Binary response panel data models with sample selection and self selection
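
A minimal sketch of one reading of the footnote, offered as an assumption rather than as the authors' code: "time averages" is taken here to mean the within-unit means of the time-varying exogenous variables (a Mundlak-style correlated random effects device), added to both equations of a pooled -biprobit- with standard errors clustered on the panel identifier. The data are simulated and every variable name (id, year, x1, z1, treat, y) is hypothetical.

```stata
* Simulated panel: 500 units observed for 5 years (all names hypothetical)
clear
set seed 2016
set obs 500
gen id = _n
gen c  = rnormal()                 // unobserved unit heterogeneity
expand 5
bysort id: gen year = _n
gen x1 = rnormal() + 0.5*c         // covariate correlated with c
gen z1 = rnormal()                 // instrument, excluded from the outcome equation
gen u  = rnormal()                 // shared error that makes treat endogenous
gen byte treat = (0.5*x1 + 0.8*z1 + 0.5*c + u > 0)
gen byte y     = (0.3*x1 + 0.7*treat + 0.5*c + 0.5*u + rnormal() > 0)

* Time averages of the exogenous variables, one value per unit
foreach v of varlist x1 z1 {
    bysort id: egen mean_`v' = mean(`v')
}

* Pooled bivariate probit augmented with the time averages and year dummies
biprobit (y = treat x1 mean_x1 mean_z1 i.year) ///
         (treat = x1 z1 mean_x1 mean_z1 i.year), vce(cluster id)
```

Whether year dummies belong in the model, and how ATE and ATET should be computed afterwards, is not settled by this sketch and should be checked against the paper.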

Panel Data Robust Standard Errors...

Hi all,

I have been running panel data fixed effects models with robust standard errors but do not get the intuition behind why I am doing this. I understand clustered standard errors for pooled OLS models (because of correlation in the error across time), but am confused with the fixed effects, so it would be great if someone could clarify why I am doing this. Essentially, all I know is that it is robust to heteroskedasticity and autocorrelation, but do not see how it applies to my panel data. Thanks in advance!
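
A short illustration, assuming the nlswork example dataset that ships via -webuse- rather than the poster's data: the fixed effect removes only the time-constant part of each unit's error, so the remaining idiosyncratic errors can still be heteroskedastic and serially correlated within a unit. Clustering on the panel variable allows for both. In recent Stata versions, vce(robust) with xtreg, fe is treated as clustering on the panel variable.

```stata
* Example data shipped with Stata; variable choice is illustrative only
webuse nlswork, clear
xtset idcode year

* Conventional standard errors: assume i.i.d. idiosyncratic errors
xtreg ln_wage age tenure, fe

* Cluster-robust standard errors: same coefficients, different standard errors
xtreg ln_wage age tenure, fe vce(cluster idcode)
```

Comparing the two outputs shows that only the standard errors change, which is a quick way to see how much the i.i.d. assumption matters in a given panel.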

Panel Data - methodology and testing

Hi all,

I would like to ask for help with my thesis, which is about FDI and its determinants. I have data for 38 countries over a period of 18 years. My initial fixed effects model shows a very low R2, only 6%.
So I decided to take three-year averages of the same data and run the FE model again. The significance of the variables is almost the same, but this time the R2 is almost 80%. Could you advise me on what the reason could be?
I have to admit I am not very skilled in econometrics and I am not sure whether FE is even the right model; I was only able to test that it is better than RE.
Second, how can I test for heteroscedasticity or autocorrelation after running FE? And is it even necessary?

Thank you very much for your help.

Tabulate measures

Dear Statalist,

I prepared the following example. Consider this dataset:


Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
* Columns: year (S002EVS), weight (S017), variables (F050-F059)
input byte S002EVS double S017 byte(F050 F051 F052 F053 F054 F055 F059)
1 1.0042262823081847  0 .a  0  0  0  0  0
1 1.2452805990773763  1  1  1  1  1  1  1
1  .9072044203720419 .a .a .a .a .a .a .a
1 1.0972472427211808  0  0  0  0  0  0  0
1  .9072044203720419  0  1  0  0  0  0  0
1 1.2772878112625108  1 .a  1  0  1  0  0
1  .6291417645137987  1 .a  1  0  1 .a  1
1  .6811534843146201  1  0  0  0  0  0  0
2  .3732121389114244  1 .a  1  0  0  1  0
2  .3732121389114244  1  1  1  0  1  1  1
2 2.3170253624084234  0  1  0  0  0  1  0
2  .3732121389114244  1  1  1  0  0  1  0
2  .3732121389114244 .a  0  0  0  0  0  0
2 2.3170253624084234 .a  0  0  0  0  0  0
2  .3732121389114244  1 .a  1  1  1  1  1
2  .3732121389114244 .a  0  1  0  0  0  0
2  .3732121389114244  1  0  1 .a .a  1  1
2  .3732121389114244  0  0  0  0  0  0  0
2  .3732121389114244  1  0  0  0  0  0  0
2  1.555050578797592  1  1  1  1  1  1  1
2 1.5083990614336737 .a  0  0  0  0  0  0
2  .6686717488829684  0  0  0  0  0  0  0
2 1.5083990614336737  1 .a  1 .a .a .a .a
2  1.897161706133059  0  0  0  0  0  0  0
2  .6686717488829684  1  0 .a  0  0  0  0
2  .1244040463038087  1  1  1  0  0  1  0
3 1.4629639878194398 .a .a .a .a .a  1 .a
3  1.212610284928649  0 .a .a  0  0  0 .a
3 1.0152899045483108  0  0 .a  0  0  0 .a
3 1.1133530127472218  0  0 .a  0  0  0 .a
3  2.009412926773146  1 .a .a  0  0  1 .a
3  1.119732314482523  0  0 .a  0  0  1 .a
3  .8222739038680567  1  1 .a  0  0  1 .a
3  1.316188761651862  0  0 .a  0  0  0 .a
3  .8831216549330272  0  0 .a .a  0  0 .a
3  1.119732314482523  1  0 .a  0  0  1 .a
3  1.316188761651862  1  1 .a  0  0  0 .a
3  .8831216549330272  1  1 .a  1  1  1 .a
3 .44226835369296097  1  1 .a  1  1  1 .a
4  .7997490250066603  0  0 .a  0  0  0 .a
4  .7836737805734815  1  1 .a  1  1  1 .a
4  .9710811465300937  1  1 .a  1  1  1 .a
4 1.1524775830534932  0  0 .a  0  0  0 .a
4 1.4225175893045623  1  1 .a  0  1  0 .a
4  .7997490250066603  1  0 .a  0  0  0 .a
4 1.4225175893045623  1  1 .a  1  1  1 .a
4 1.3632131789935933  1  1 .a  0  0  0 .a
4  .7997490250066603  0  0 .a  0  0  0 .a
4  .7644363603754695  1 .a .a  0  0  0 .a
4   .665211916477448  1  0 .a  0  1  1 .a
end
label values S002EVS S002EVS
label def S002EVS 1 "1981-1984", modify
label def S002EVS 2 "1990-1993", modify
label def S002EVS 3 "1999-2001", modify
label def S002EVS 4 "2008-2010", modify

It represents values of six dummy variables of a single country over four data surveys.

My goal is to reproduce this output:
[image of the target table not preserved]

Note that the picture represents one country on one year but I want to show the values of variables among four data sessions.

I started to implement this code:

Code:
sort S002EVS S003
by S002EVS S003: tabulate S002 F050 [iw=S017]
But the output was:

(Example)


S002EVS = 1981-198, S003 = Belgium
no observations
In any case, my question is:

How can I reproduce that table?

I hope I was clear enough.

Thanks for the attention.
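
Since the target table did not survive, the following is only a guess at what is wanted: weighted shares of each dummy per survey wave. It assumes the -dataex- example above is already in memory and uses analytic weights, because -tabstat- does not accept iweights.

```stata
* Weighted mean (share of 1s) of each dummy, by survey wave
* Assumes S002EVS, S017 and the F05x variables from the example are loaded
tabstat F050 F051 F052 F053 F054 F055 F059 [aweight = S017], ///
    by(S002EVS) statistics(mean) format(%5.3f) nototal
```

Variables that are entirely missing in a wave simply show a missing mean for that wave.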

Help with Path analysis

Hi, I need help with path analysis. First of all:
a) I'm new to Stata
b) I have never done path analysis
c) I'm a sociologist, so I'm not good at math

I downloaded pathreg via the findit command and I've been using this FAQ: http://www.ats.ucla.edu/stat/stata/faq/pathreg.htm I tried a very simple model with just 3 variables and I think it worked.

But I need to be able to do this classic model by Blau and Duncan: http://dspace.library.uu.nl/bitstrea...802/image2.gif

How can I write the whole command to get that?

Also, how should I interpret the results? I've tried reading some manuals, but they're mostly for mathematicians or statisticians. I need to be able to understand them in layman's terms. Any recommendations?

What happens if some cases have missing data?

Any other advice on path analysis or Stata in general?

THANKS!

Export tables from Stata to Word

Hi Statalist users,

I would like to export statistics (mean min max sd p50) from Stata to Word. I am using the following command:
Code:
univar SIZE FFLOAT
I would like to export this to a nice Word table from Stata, how can I do this? With which command?

Also I would like to sort the variables in the descriptive statistics by a dummy variable named CRDELIST (which equals 1 for cross-delisting and 0 for cross-listing).

Thanks
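
One way this is sometimes done, sketched with stand-in data: -estpost tabstat- collects the statistics by group and -esttab- writes them to an RTF file that Word opens directly. Both come from the community-contributed estout package on SSC. The auto dataset, its variables and the file name descriptives.rtf are assumptions; SIZE, FFLOAT and CRDELIST would take the place of price, mpg and foreign.

```stata
* Stand-in data; replace price/mpg with SIZE/FFLOAT and foreign with CRDELIST
* estpost and esttab are community-contributed: ssc install estout
sysuse auto, clear

* Collect descriptive statistics separately for each value of the dummy
estpost tabstat price mpg, by(foreign) ///
    statistics(mean min max sd p50) columns(statistics)

* Write one column per statistic to an RTF file readable by Word
esttab using descriptives.rtf, cells("mean min max sd p50") ///
    noobs nonumber nomtitle replace
```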

running regression 1,000 times with random variables

Hi Everyone,

I am creating random variables with 100 observations as follows:

set obs 100
gen z1 = rnormal(6,1)
gen a=3
gen B=2
gen gama=1
gen y=a+B*z1+rnormal(0,1)
gen ym= y+ gama*rnormal(0,1)

I am trying to run the regression of ym on z1 1,000 times. I want to record the average, min and max of a and B (constant and coefficient) and of R^2. I also want to record the t statistics for each regression.

Your help will be appreciated.

Thanks in advance.
Ulas
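
A minimal sketch using -simulate-, assuming the data are meant to be redrawn in every replication. The program name onesim and the returned scalar names are made up for illustration; the true values 3, 2 and 1 are taken from the code above.

```stata
clear all
set seed 12345

* One replication: draw the data, run the regression, return the results
capture program drop onesim
program define onesim, rclass
    drop _all
    set obs 100
    gen z1 = rnormal(6, 1)
    gen y  = 3 + 2*z1 + rnormal(0, 1)
    gen ym = y + 1*rnormal(0, 1)
    regress ym z1
    return scalar a   = _b[_cons]
    return scalar B   = _b[z1]
    return scalar t_a = _b[_cons]/_se[_cons]
    return scalar t_B = _b[z1]/_se[z1]
    return scalar r2  = e(r2)
end

* Run 1,000 replications; the results replace the data in memory
simulate a=r(a) B=r(B) t_a=r(t_a) t_B=r(t_B) r2=r(r2), ///
    reps(1000) nodots: onesim

* Mean, min and max across replications
summarize a B t_a t_B r2
```

After -simulate- each row of the dataset is one regression, so the individual t statistics can be listed or saved as needed.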

Parallel loop execution?

I have the following loop in Mata


Code:
mata:
result = 0

for (i = 1; i <= 100; i++) {
   result_i = // Operations using the i-th observation
   result = result + result_i
}
end
This should be executable in parallel. Each thread takes, say, 50 observations and returns the sum, and at the end I collect each result. Something like

Code:
mata:
result_thread1 = result_thread2 = 0

// Thread 1
waitfor_thread1 = 1
for (i = 1; i <= 50; i++) {
    result_i = // Operations using the i-th observation
    result_thread1 = result_thread1 + result_i
}
waitfor_thread1 = 0

// Thread 2
waitfor_thread2 = 1
for (i = 51; i <= 100; i++) {
   result_i = // Operations using the i-th observation
   result_thread2 = result_thread2 + result_i
}
waitfor_thread2 = 0

// Collect
while (waitfor_thread1 | waitfor_thread2) {}
result = result_thread1 + result_thread2
end
But I would like the three code chunks to execute in parallel. Is such a thing possible? Workarounds and hacks welcome.

Thanks!
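
As far as I know, Mata does not expose user-level threads, so the pseudocode above cannot be run as written. One workaround that is sometimes enough is to replace the loop with a vectorized expression. The sketch below assumes a hypothetical per-observation operation (squaring a value) purely to show the pattern.

```stata
mata:
// Hypothetical data and operation: sum of squares over 100 observations
x = runiform(100, 1)

// Loop version, as in the original post
result_loop = 0
for (i = 1; i <= rows(x); i++) {
    result_loop = result_loop + x[i]^2
}

// Vectorized version: one elementwise operation followed by a sum
result_vec = sum(x:^2)

// Relative difference between the two; should be zero up to rounding
reldif(result_loop, result_vec)
end
```

If the per-observation work cannot be vectorized, splitting the data across separate Stata processes (for example with the community-contributed -parallel- package) is another option to investigate.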

Iteratively adding to a local

Hello Statalisters,

I am attempting to run some simulations, incrementing sample size and stratification factors to test for balance. The below code does most of what I need it to do.

1. It creates datasets of different sample sizes based on the values in -- local sample --
2. It creates covariate variables based on the categories in -- local cats --
3. It conducts stratified randomisation at different sizes of the dataset

The bit I'm struggling with is highlighted in red below. If you run the code you can see on the first loop that it creates what I need, iteratively adding "obs1" and then "obs1 obs2" and then "obs1 obs2 obs3", etc., until it is complete. But it then goes on because it is a nested loop. What I want to do is, after each iteration, stratify based just on obs1, and then on obs1 and obs2, etc., until I'm stratifying my randomisation on all the variables generated in the first loop.

I've tried different things, such as moving both of the bottom two snippets into the top part of the code, but to no avail. If anyone has any ideas, I'd welcome them!

Code:
**Clear memory
clear

local sample 100 200 400 800 1600 3200 6400 10000 //set up local for different sample sizes
local strata = ""
foreach size of local sample{
    preserve
    qui set obs `size' //set different sample sizes
    qui gen id = _n //generate a unique id
    local cats "`"2"' `"2"' `"2"' `"2"' `"2"' `"2"'" //set up local for category numbers of strat vars
    forval i = 1/6{
        local catind :word `i' of `cats'
        qui gen obs`i' = mod(_n,`catind') //generate 6 i number of strat vars based on values of local cats
    }
    
   qui ds obs*, skip(1)
        local stratlist "`r(varlist)'" //store all strat vars in a local
        *di `"`stratlist'"'{
        
        forval stratnum = 1/6{
            local strat: word `stratnum' of `stratlist'
            local strata `"`strata' `strat'"'
            di `"`strata'"'
            
        }
        qui egen strata=group(`strata') //gen a variable that makes unique groups based on strat vars
        set seed 31540 //setting a seed for replicability
        qui gen randomnum = runiform() //generating a random number
        qui bysort strata: egen order=rank(randomnum) //generating a rank order var based on the random number
        qui bysort strata: gen treat = (order <= _N/2) //assigning condition based on rank
            
        foreach var of local strata{
            tab `var' treat, r
        }
        des, s
        
    restore
}
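
A stripped-down sketch of one possible restructuring, for a single sample size: the local is reset before the inner loop, and the grouping, randomisation and tabulation all happen inside that loop, so each pass stratifies on one more variable. The variable names stratum, u and treat are made up, and the strat vars are drawn at random here rather than with mod(). In the full code, the reset would go inside the foreach over sample sizes, since a local defined outside it keeps growing across sizes.

```stata
clear
set seed 31540
set obs 100

* Six hypothetical binary stratification variables
forval i = 1/6 {
    gen byte obs`i' = runiform() < 0.5
}

local strata                      // reset before building the list
forval k = 1/6 {
    local strata `strata' obs`k'  // obs1, then obs1 obs2, and so on
    display "Stratifying on: `strata'"

    egen stratum = group(`strata')
    gen u = runiform()
    * Within each stratum, treat the half with the lowest random numbers
    bysort stratum (u): gen byte treat = (_n <= _N/2)

    tab obs`k' treat, row

    * Clean up so the next pass can recreate the variables
    drop stratum u treat
}
```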

Strange Problem Formatting the Contents of a local macro

I am having a problem formatting the contents of a local macro. Consider the following Stata code run under version 14.2:
Code:
/* Example #1 */
di %20.0fc 123456789

/* Example #2 */
local x 123456789
di `x'

/* Example #3 */
local x `=123456789'
di %20.0fc `x'

/* Example #4 */
local x : di %20.0fc `=123456789'
di `x'
The output below shows that the first three examples produce the expected results. The fourth shows the commas replaced with spaces.
Code:
. /* Example #1 */
. di %20.0fc 123456789
         123,456,789

. 
. /* Example #2 */
. local x 123456789

. di `x'
1.235e+08

. 
. /* Example #3 */
. local x `=123456789'

. di %20.0fc `x'
         123,456,789

. 
. /* Example #4 */
. local x : di %20.0fc `=123456789'

. di `x'
123 456 789
I appreciate any comments on why the fourth formatting example fails. I believe that I have used this code before.

Best wishes,
Alan
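
A likely explanation, with a small sketch: in Example #4 the macro already holds the formatted text 123,456,789, so `di `x'` expands to `di 123,456,789`. In -display-, a comma between directives means "show one blank", so the three numbers are printed separated by spaces. Putting the macro in double quotes makes -display- treat it as a literal string.

```stata
* The macro stores the already formatted text
local x : display %20.0fc 123456789

* Unquoted: display parses the contents as three numbers separated by commas
display `x'

* Quoted: the contents are shown as a literal string, commas included
display "`x'"
```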

Problem: Margins reports "not estimable" for predictions from a linear model; H-matrix contains values outside {-1,0,1}

Dear Statalisters,

I'd like to use margins and marginsplot to generate conditional parallel trends graphs from the linear predictions following a linear regression:
Code:
reg ln_wage i.immiyear##ethn ///
        yoe ftexp_after c.ftexp_after#c.ftexp_after ///
        1.male ///
        1.married ///
        1.marr_male ///
        age c.age#c.age ///
        ysm ///
        1.good_german_now ///
        5.lfs ///
        2.firmsize ///
        ue_rate ///
        1.bula_neu ///
        2.regtyp ///
        1.poland 1.romania 1.ussr ///
        if inrange(immiyear,1993,1998) & inrange(immiage, 18,64), r
  
margins immiyear, at( ethn==1 ethn==0 )
marginsplot, x(immiyear) recast(line) xline(1996)
However, when executing margins, I get the following output:
Predictive margins                                Number of obs   =        542
Model VCE    : Robust

Expression   : Linear prediction, predict()
1._at        : ethn_ger        =           1
2._at        : ethn_ger        =           0

             |            Delta-method
             |     Margin   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
_at#immiyear |
     1 1993  |          .  (not estimable)
     1 1994  |          .  (not estimable)
     1 1995  |          .  (not estimable)
     1 1996  |          .  (not estimable)
     1 1997  |          .  (not estimable)
     1 1998  |          .  (not estimable)
     2 1993  |          .  (not estimable)
     2 1994  |          .  (not estimable)
     2 1995  |          .  (not estimable)
     2 1996  |          .  (not estimable)
     2 1997  |          .  (not estimable)
     2 1998  |          .  (not estimable)
After reading a reply to an earlier Statalist post on a similar problem with a logit model (see http://www.stata.com/statalist/archi.../msg00514.html), I had Stata display the H-matrix, and it turned out that it does contain values outside the set {-1,0,1}. The post suggested turning the estimability check off by adding the -noestimcheck- option, because the logit model there only contained one factor variable.

When I use the -noestimcheck- option, margins does work and I get predictions that look reasonable.

So, finally, here is my question: Is it ok to turn off the estimability check of the margin command in my case - with a linear model and quite a few factor variables (i.e. dummies)?

Thanks a lot and best regards,
Boris Ivanov

Problem: -ICE- Not Imputing for All Cases/Rows

Dear Statalisters,

I have been trying to impute missing values in my longitudinal dataset using the -ice- package developed by Patrick Royston in Stata 13.
I have data from 1988 to 2014 for 153 countries, and 9 variables (2 dependent, 7 predictors). Roughly 15% of rows have one or more of the variables missing.
I have been following every rule in the book so far:

1. Data is reshaped to wide to accommodate the longitudinal nature of it during imputations;
2. I have defined a custom equation for every variable Y that lists all the other 8 variables and values from other years for Y on the right hand side;

Dry-run shows no sign of any problem, and the actual imputation command runs smoothly too.

And now the problem: When I open the imputed dataset, I see many empty rows in between different imputations. It seems that -ice- runs from row #1 (Afghanistan) to #65 (Iran) without any problem and imputes (with valid non-missing values) all the missing values, then stops and imputes all missing values for all variables in other rows. I am quite sure that this cannot be due to misspecification because even the string variable "country" has missing values for the rows (countries) following Iran after imputation.

Has anyone dealt with a similar situation before? I know Patrick Royston is not a member of Statalist so I emailed him directly for help. Any advice is highly appreciated!

Finding and counting duplicates across variables

I am more or less a beginner using Stata. I am looking for a way to find and count duplicates across variables for each participant.

I have data on courses taken while enrolled in adult education. What I am trying to find out is whether a person has taken the same course more than one time. For instance, eyeballing the data tells me there are some who have taken GMATH 1 time, while others have taken it 2 or 3 times. It looks like this (they are string variables):

ID Course1 Course2 Course3 Course4 Course5 Course6... Course18
1 GMATH GREAD GSCI GSOCS . . .
2 GMATH GMATH GREAD GREAD GSCI GSCI .
3 GMATH GMATH GREAD GSCI GSOCS . .

I would like to produce the following columns/new variables for each participant (e.g. the number of times they enrolled each course)
GMATH GREAD GSCI GSOCS
1 1 1 1
2 2 2 .
2 1 1 1
There are about 20 courses available. Most people take between 1 and 6 but some people due to retaking the same courses have taken as many as 18. My goal in laying out new columns with the noted information is to find out how many people are retaking courses, if they are retaking more than one course, and which courses are retaken most often.

Is this even possible?

Greg
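
This looks feasible. A minimal sketch with a small made-up dataset (six course variables instead of eighteen, four course codes instead of about twenty): loop over the course codes, and for each one count how many of the Course* variables equal it. The new variable names n_GMATH etc. and retook_any are hypothetical.

```stata
* Small stand-in dataset modelled on the example above
clear
input byte ID str5(Course1 Course2 Course3 Course4 Course5 Course6)
1 "GMATH" "GREAD" "GSCI"  "GSOCS" ""      ""
2 "GMATH" "GMATH" "GREAD" "GREAD" "GSCI"  "GSCI"
3 "GMATH" "GMATH" "GREAD" "GSCI"  "GSOCS" ""
end

* One counter per course code: number of times each person enrolled
foreach c in GMATH GREAD GSCI GSOCS {
    gen byte n_`c' = 0
    foreach v of varlist Course* {
        replace n_`c' = n_`c' + (`v' == "`c'")
    }
}

* Flag anyone who took at least one course more than once
gen byte retook_any = 0
foreach c in GMATH GREAD GSCI GSOCS {
    replace retook_any = 1 if n_`c' > 1
}

list ID n_* retook_any
```

Courses never taken come out as 0 rather than missing here; `tabulate n_GMATH` and similar commands then show how often each course is retaken.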

Predict and Robvar Functions

I was wondering if someone can help me out with some Stata commands. I am working on the statistical analysis of a panel model for the quantitative section of my thesis. I have been through estimation by pooled regression (OLS) and by fixed and random effects methods. Now I am proceeding with several tests for serial correlation and heteroscedasticity. I am following Stata instructions from manuals and other similar research, and the syntax so far was very clear and straightforward. However, when I was doing the test for heteroscedasticity of the random effects model, I stumbled on a couple of lines that I couldn't quite understand. The lines are listed below:


xtreg dependent_variable independent_variable, re

predict pred, e

robvar pred, by(Numero)

From reading and watching several online tutorials, I understood that the predict command is used to generate a new variable. In this case, I suppose, it should be a variable capturing the variance of the errors, used to check the null hypothesis of homoscedasticity, i.e., no heteroscedasticity. In another manual the same predict command is used with che, which I couldn't understand either. Both commands have generated a new variable in my dataset that I don't know how to interpret. I finally understood that Numero is the entity or group variable; in my case there are 85 "codecompany" groups across 485 observations. I would like to understand what Stata is doing in the command "predict pre, e" or "predict che, e".

I would much appreciate any help on this matter.

Best regards,

Luiz Alfredo Santos
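
A short illustration using the Grunfeld example data from -webuse-, not the poster's data: after xtreg, re, the e option of -predict- creates a new variable holding the estimated idiosyncratic error component e_it (the name before the comma, pred or che, is just the name chosen for that new variable). -robvar- then runs Levene-type tests of whether the variance of that variable is equal across the groups given in by().

```stata
* Example panel shipped with Stata: 10 companies over 20 years
webuse grunfeld, clear
xtset company year

* Random effects model
xtreg invest mvalue kstock, re

* New variable ehat = estimated idiosyncratic error e_it
predict double ehat, e

* Robust tests of equal variance of ehat across companies
robvar ehat, by(company)
```

Small p-values from -robvar- suggest that the error variance differs across companies, that is, groupwise heteroscedasticity.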

Nearest Neighbour Matching command: nnmatch

Hi

I am using the nearest neighbour matching command. I would like to match on two variables - varX and varY. For the varX I want an exact match and thus can use the ematch() option. However for the varY I would like a match which is between, say, 80-120%, of varY.

Any hints or suggestions on how to match within a boundary?

Thanks,
Laura McCann

Test Retest (ICC) reliability with missing data

Hello

I am working with a test and retest of my schedule of 30 items. I will be summing them to form a total score, which will be the subject of an ICC calculation. At least in part because of the population I am working with, I have a lot of missing data for this postal questionnaire. For 101 administrations of my schedule only 59 have complete data at both administrations. It's a lot of data to throw away (and will result in bias).

I'm trying to think of a way of running my ICC calculation without relying on complete cases only.

I have a few ideas, but before I run away with them, I wondered if fellow Stata users might have some insights or could point me in the right direction.

Thank you

Mark Wilberforce
PhD candidate, University of Manchester, UK

Looping If/Or Through all Variables in Dataset

Hello. I have a dataset containing a rather large number of variables each representing a term in a contract. So one observation may have 90 contract terms. Each term has a string variable identifying the generic type of the term and a second string variable identifying the term itself. To make the dataset more conducive to analysis, I am trying to generate a new set of indicator variables one for each generic contract term type and one for each specific contract term. I would like to look across a single observation and see whether any variable contains the string. Writing the loop with if and or seems rather clunky. I know that anycount does something similar for numeric values. Does anyone have any ideas on how to extend this to string data?

Thanks!

Seeking .ado for -epctile-

Hi all,
I'm seeking the .ado file for the command -epctile-. It does not seem to be available on the repec site, and it does not seem to be supported by Stata 14.1. At least, I cannot install it easily in my remote desktop setup.

I checked the developer's personal website for the .ado but I can no longer find the link. Here is his site: http://web.missouri.edu/~kolenikovs/stata
If someone has the .ado, can you please send it to me? Alternatively, I am looking for instructions on how to get this command from my own copy of Stata 13.1/IC on my personal desktop computer (not the remote one). TIA Rebecca