Heterogeneous Treatment Effects

Hi there,

I ran a completely randomized experiment and have pre- and post-experimental data. The average treatment effect is essentially zero. I have also checked as many observable variables as I could for potential heterogeneous treatment effects, without finding any. However, one can think of plenty of unobservable variables that might influence the treatment effect and lead to heterogeneity. Are you aware of an easy tool to check for this?

Of course I did my homework and searched the web. The solutions that appear there seem to require a decent knowledge of Bayesian statistics (which I do not have).

My idea was then to estimate individual treatment effects and look at their distribution. However, estimating an individual treatment effect (that is, with a single treated observation), even with many pre-treatment periods and many control individuals, seems to be highly biased when using OLS with time fixed effects.
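
A minimal sketch of a crude first check along these lines, with hypothetical variable names (y for the post-treatment outcome, treat for the 0/1 assignment): a constant treatment effect would only shift the treated outcome distribution, so differences in spread or shape between the two arms hint at heterogeneous effects:

Code:
sdtest y, by(treat)        // compares outcome variances across arms
ksmirnov y, by(treat)      // compares the full outcome distributions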

Anyhow, I would love to discuss how to tackle potential unobservable heterogeneity in treatment effects (preferably with methods that are implemented in Stata).


Looking forward to your answers and many thanks!

Loading an ado file from third party creator

Good afternoon,
I have downloaded an ado file from a third-party site and am trying to load it into Stata. I have put the ado file in the BASE ado directory, as I understood should be done, but it does not work.
Is there something else I should do to make it work?
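
For reference, a sketch of the commands that show where Stata actually looks for ado-files (the BASE directory is normally reserved for official files; user-written files usually belong in the PERSONAL or PLUS directory):

Code:
adopath            // lists the directories Stata searches, in order
sysdir             // shows where BASE, PLUS and PERSONAL point
which myprogram    // hypothetical program name; reports where (or whether) Stata finds it
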
Thanks

Alessandro

How to break -title()- into several lines in esttab output

Hi Statalist

I am having problems formatting the output of esttab when outputting to a .txt file. I would like to (i) insert a subtitle, i.e. break the text in the -title()- option over two lines; and (ii) insert blank space between tables when I concatenate tables using the -append- option, which I thought would be most easily done by adding another blank line at the top of the title. Web searches and playing around with typical escape characters (_n, \n, _newline...) haven't helped.
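
For concreteness, a hypothetical sketch of the kind of call I am making (file and model names are made up); the question is how to get the title text onto two lines and how to add blank space above it when appending:

Code:
esttab m1 m2 using results.txt, append ///
    title("Table 2: Main results -- Dependent variable: log wages")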

Thanks in advance,

Sébastien Willis

Size of mlib libraries in Stata 14

Dear All,

is it normal for even a simple Mata library in Stata 14 to be compiled as an approximately 280 KB file?

I recall that in Stata 9-10 the files were much more compact, around 10 KB. The mlib file contains a huge block of binary zeroes inside, which I believe is unused space allocated for some reason, and I would like to minimize the size of the library.
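
For reference, a minimal sketch of a library build with a single hypothetical function; my understanding is that the size() option of mata mlib create sets the maximum number of members, and thereby the amount of preallocated directory space, so a smaller value may shrink the file:

Code:
mata:
real scalar hello() return(42)
mata mlib create lhello, replace size(16)   // size() caps the number of members the library can hold
mata mlib add lhello hello()
mata mlib index
end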

Thank you, Sergiy

bilateral trade panel data

Hi there,

I am still new to Stata. Here is my question:

1) I am dealing with bilateral trade data in a panel covering several years. I have pairs of countries with trade flows (imports and exports) as well as trade costs and country-specific characteristics such as distance, transportation and population. The question is how to generate the ratio between a country's imports from a partner and the partner's exports to that country. So, if X = exports and M = imports, and I have country i and country j, I need the ratio Mij/Xij, i.e. imports of country i from country j divided by exports of country j to country i.

2) So, basically, how can I reorganize my existing data into this particular structure:

Mij Xij Mji Xji

Mij: Import of country i from country j

Xij: Export of country i to country j

Mji: Import of country j from country i

Xji: Export of country j to country i
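
One possible approach, sketched here with hypothetical variable names (iso_i and iso_j for the two country codes, year, imp and exp for the flows), is to merge the data onto a copy of itself with the country identifiers swapped, so that each (i, j, year) row also carries the mirror flows for (j, i, year):

Code:
* build a mirror copy with i and j swapped
preserve
keep iso_i iso_j year imp exp
rename (iso_i iso_j imp exp) (iso_j iso_i imp_mirror exp_mirror)
tempfile mirror
save `mirror'
restore

* attach the mirror flows and form the ratio M_ij / X_ji
merge 1:1 iso_i iso_j year using `mirror', keep(match master) nogenerate
gen ratio = imp / exp_mirror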


Hopefully, with your great knowledge, you can help me solve this matter.

Many Thanks,

Mohd

After collapse: Fillin , Contract?

Hello guys,

I posted something similar before, but this time it is a little more complicated. I tried to use the contract and fillin commands to do this, but it didn't work (I lost some information). I collapsed some data by clinic, date, and illness severity (high, low, medium, missing). I need the number of people in each group. The issue is that on some dates none of the patients were in the "Max" illness category, so that combination does not appear after the collapse. For example, DOP only has the Min and Med illness categories. I need a row for clinic DOP with Max illness and 0 people. I would also like to keep all the other variables.

Clinic Room RUNDATE2 Type clevel all agecat1 agecat2 agecat3 agecatm
AM MAIN 2-Jan-14 general Min 33 0 0 33 0
AM MAIN 2-Jan-14 general Med 15 0 1 14 0
AM MAIN 2-Jan-14 general Max 1 0 0 1 0
AM MAIN 6-Jan-14 Intensive Min 30 0 0 30 0
AM MAIN 6-Jan-14 Intensive Max 3 0 0 3 0
DOP lower B 9-Jan-14 NA Min 25 0 0 25 0
DOP lower B 9-Jan-14 NA Med 12 0 0 12 0


Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str4 clinic str7 room float date str9 type str3 clevel byte(all agecat1 agecat2 agecat3 agecatm)
"AMKC" "MAIN"    -16800 "general"   "Min" 33 0 0 33 0
"AMKC" "MAIN"    -16800 "general"   "Med" 15 0 1 14 0
"AMKC" "MAIN"    -16800 "general"   "Max"  1 0 0  1 0
"AMKC" "Lower"   -16796 "Intensive" "Min" 30 0 0 30 0
"AMKC" "Lower"   -16796 "Intensive" "Max"  3 0 0  3 0
"AMKC" "lower B" -16793 "NA"        "Min" 25 0 0 25 0
"AMKC" "lower B" -16793 "NA"        "Med" 12 0 0 12 0
end
format %td date
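
For reference, a minimal sketch of the fillin-based route on the example data above (variable names taken from the dataex listing); note that fillin creates every combination of the listed variables, which may be more combinations than actually wanted, and the count variables are set to zero on the newly created rows:

Code:
fillin clinic room date type clevel
foreach v of varlist all agecat1 agecat2 agecat3 agecatm {
    replace `v' = 0 if _fillin
}
drop _fillin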

I would appreciate any help.

Thank you,
Marvin

esttab command

Hi,
I have run a fixed-effects regression and would like to export my results to a table in Word.
I have shown my results and my attempt to export them to a table below. My esttab command was originally successful, but I wanted to change the significance stars from the default, so I attempted to do so by adding the starlevels option; this is not working, as can be seen in the screenshot below.
Does anyone know why this is, and how I could change the significance stars?

[screenshot of the esttab attempt omitted]
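
For what it is worth, a hypothetical sketch of the syntax the starlevels() option expects (symbols and cutoffs are made up; note the option name is starlevels, not starlevel):

Code:
esttab using results.rtf, replace ///
    starlevels(* 0.10 ** 0.05 *** 0.01)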

Any help on this would be much appreciated,

Many thanks

Within and Between Design in Panel Data

Dear Members,

I have run a market experiment that involves both a within- and a between-subject design. I have 6 separate sessions. In each experimental session, each subject goes through 9 tasks (treatments), playing 7 periods within each task (treatment), so I have 63 periods in each session. Having 6 sessions poses a problem of serial correlation, since observations within each session might be more correlated than observations across sessions (indeed, the same group of subjects plays within a session). I want to run a panel model (e.g. pooled OLS, fixed effects, or random effects), where the period is the time unit. Is it correct to set the subject as the cross-sectional unit and then cluster the standard errors at the session level (to account for serial correlation)? My concern comes from the fact that I have a within design in each session but, since I have 6 separate sessions (involving six different groups of subjects), it also seems to me that I have a between-subject design in place.
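
A minimal sketch of the setup I have in mind, with hypothetical variable names (y for the outcome, treatment for the task indicator, subject, period, session); note that with only six sessions the number of clusters is small, which is a concern in itself:

Code:
xtset subject period
xtreg y i.treatment, fe vce(cluster session)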

Many Thanks!!!



More efficient loop to rename variables based on ending

Hello,

I have a set of variables like this: H1a01 H1a02 H1a03 I61 I62 I63 J1a01 J1a02 J1a03 J1a11 J1a12 J1a13. I need to rename the variables depending on the ending character as follows:
if the last character is 1, replace the 1 with agree
if the last character is 2 replace the 2 with neutral
if the last character is 3 replace the 3 with disagree.

So far this does the job:

Code:
foreach v of varlist *1{
    label var `v' `v'agree
    }
    
foreach v of varlist *2{
    label var `v' `v'neutral
    }
    
foreach v of varlist *3{
    label var `v' `v'disagree
    }
    
foreach v of varlist Hb1a01{
    local l : var label `v'
    rename `v' `l'
    }

But, is there a more elegant way to achieve this, maybe a loop within a loop?
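
For comparison, a hypothetical sketch of a shorter route using the rename group syntax (available since Stata 12, I believe), which avoids the label detour entirely; the locals map each trailing digit to its suffix:

Code:
local suf1 agree
local suf2 neutral
local suf3 disagree
forvalues n = 1/3 {
    rename *`n' *`suf`n''
}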

Thanks.

Claudia



get a list of groups used in a mixed model

I’m running a mixed model where the group variable is subject ID.

Whenever I add my 4th effector variable, the model drops 3 subject IDs. I can't tell from the raw data which 3 subject IDs these are, though, as there are 4,000+ lines of data and sorting by the 4th effector variable doesn't reveal any obvious differences between subjects.

Is there a way to get a list of the group variable values (subjects) used in the mixed model I just ran? That way I could get the list of subject IDs in the 1st and 2nd models, see who is dropped, and figure out why from there.
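
A sketch of one way this might work, assuming the estimation-sample flag e(sample) is read right after each model (variable names are hypothetical):

Code:
mixed y x1 x2 x3 || subject_id:
levelsof subject_id if e(sample), local(ids1)

mixed y x1 x2 x3 x4 || subject_id:
levelsof subject_id if e(sample), local(ids2)

local dropped : list ids1 - ids2
display "dropped subjects: `dropped'"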

Sufficient model for discrete choice data when IIA is likely violated - clogit, mixedlogit, other?

Hi everyone,

I have collected data from an unlabelled discrete choice experiment, in which agents chose one alternative from choice sets consisting of 6 different alternatives. There were several different choice sets, and each agent only chose from 2 of them. The underlying theory and the hypotheses to be tested suggest that only alternative-specific variables affect choice probabilities.

Given these circumstances, I opted for conditional logit estimation using clogit, with no alternative-specific constants (because of the unlabelled nature of the alternatives) and with the coefficients of the alternative-specific variables constrained to be equal across alternatives. So basically, I guess, the simplest version of the conditional logit model, only utilising its ability to estimate coefficients based on data from different choice sets.

The explanatory variables are all binary/dummy variables describing qualitative characteristics of the alternatives, and the model includes various interaction terms of these.
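
For concreteness, a hypothetical sketch of that estimation in long form, with one row per alternative per choice occasion (variable names made up: chosen is the 0/1 choice indicator, attr1 and attr2 two of the alternative-specific dummies, occasion the choice-occasion identifier):

Code:
clogit chosen i.attr1##i.attr2, group(occasion)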

The aim of the analysis is to:
a) Determine whether the alternative-specific variables and interactions have a significant effect on choice probabilities
b) Given these effects, establish a ranking of choice probabilities for different alternatives according to the values of their variables, using coefficient tests.

Tests so far using this approach have delivered intuitively meaningful results, coherent with the observations from the descriptive analysis and other statistical tests done in the preparation phase.

However:
Both common sense about the choice process in this particular case and the observed distribution of choices within the different choice sets suggest a violation of the IIA assumption underlying the clogit model. When comparing two choice sets, the substitution pattern appears almost sequential, with the majority of choices going to the alternative offering the most preferable combination of variable values in the respective set, and only a few, almost randomly distributed, choices of other alternatives, regardless of their variable values.
To use the "red bus/blue bus"/train analogy:
Agents were observed to choose between the train, the red bus and the blue bus with probabilities p_t = 0.2, p_rb = 0.4 and p_bb = 0.4, i.e. ratios 1:2:2.
Now the blue bus gets an upgrade and is twice as fast/comfortable/whatever as the red bus, and we would expect the majority of bus customers to be drawn to the blue bus, but not proportionally (or at all) from the train users, so that the ratio of train to blue bus differs from the 1:2 we saw before. In our case this is not due to unobserved factors linked to the alternatives, as they are unlabelled and appropriately randomized in their presentation; it seems truly down to the substitution pattern, which, colloquially speaking, works along the lines of "only the best available option matters", whilst the criteria for defining "best" appear to be very homogeneous across agents.

Now my question with regard to the aim of the analysis:
  • Does the model still give correct information at least on the significance of the coefficients, even though it may give wrong predictions about substitution?
    All the information on the effects of IIA violations I could find in the literature was that "predictions will be wrong". What does that mean in the context of my analysis?
  • If not, what would be a sufficient alternative model, given that the specification is, after all, rather simple?
    Is it mixlogit? I understand mixlogit can accommodate all kinds of substitution patterns, but not how individual-specific variables would come into play here, given that they should not be relevant. (From a brief literature review I understand that nested logit could help, but given the complicated sub-nesting structure my data would require, I am a bit hesitant on that front. After all, I only need a model with testable coefficients.)

Thanks a lot for your help in advance!
Best regards,
Fabio

using functions in ado files??

How can I best use functions, like subinstr, _getfilename, in ado programming?

For example, I don't know when to use the function subinstr(s1, s2, n) with parentheses and when to use subinstr local var s1 s2, all without parentheses, which is a completely different syntax. Is the latter considered a function or an extended macro function?

Why can't I use the regular subinstr() in an ado file, or can I?

Also, how do I declare a local that receives the return value of a function in ado programming? I tried "local var :", "local var =", and "global var =", and nothing works. It seems the behaviour differs depending on whether you type at the command line, in a do-file, or in an ado-file.
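
To illustrate the distinction, here is a minimal sketch of both forms (the example strings are made up): the string function subinstr() is used inside an expression after an equals sign, while the extended macro function form uses a colon and no parentheses:

Code:
local s "hello world"

* string function, used in an expression (note the = sign and the parentheses)
local t = subinstr("`s'", "world", "Stata", .)

* extended macro function, used after a colon (no parentheses)
local u : subinstr local s "world" "Stata", all

display "`t'"
display "`u'"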

Can anybody clarify? Sorry, I'm new to Stata programming.

Thanks
Emmanuel Segui

Infix problem

My old Stata commands for reading in a DAT file with infix aren't working. I'm using a dictionary because it's a multi-record data set.

I've entered three versions of the same command intended to read the dictionary file. They differ only in the use of quotation marks:

infix using "C:\WP.DOC\Jews\Yankelovich1981\Yank8225.dict

infix using "C:\WP.DOC\Jews\Yankelovich1981\Yank8225.dict"

infix using C:\WP.DOC\Jews\Yankelovich1981\Yank8225.dict

Each command returns the same error (r(601)), which is not helpful:

file C:\WP.DOC\Jews\Yankelovich1981\Yank8225.dict not found

The dictionary file is a text file created in Notepad (attached). There is nothing wrong with the directory. The following command (with an intentional error) located the data file in the same directory:

. infix using "C:\WP.DOC\Jews\Yankelovich1981\YANK8225.DAT
000114233121213122121212 22 24411514777113137411133312333 1111211122222222221
file does not contain dictionary
r(613);


I'm using Stata IC 14.2 with Windows 7.
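
In case it helps with diagnosis, a small sketch of how one might check what file names Stata actually sees in that directory (the path is copied from above); my guess is that a hidden extension mismatch, e.g. Notepad saving the file as Yank8225.dict.txt, could produce exactly this error:

Code:
dir "C:\WP.DOC\Jews\Yankelovich1981\"
confirm file "C:\WP.DOC\Jews\Yankelovich1981\Yank8225.dict"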

Any suggestions? Thanks.

ASROL : New Version - Speed Advantage

Thanks to Kit Baum, version 3 of ASROL is available on SSC now. New users can install it by
Code:
ssc install asrol
and existing users can update their version by
Code:
adoupdate asrol

Description

asrol calculates descriptive statistics in a user-defined rolling window. asrol efficiently handles all types of data structures, such as data declared as time series or panel data, undeclared data, and data with duplicate values, missing values, or time-series gaps.

asrol uses efficient Mata code, which makes this version extremely fast compared to other available programs. The speed gain matters most in large data sets. This version also overcomes a limitation of the previous version of asrol, which could calculate statistics in a rolling window of no more than 104 periods. The new version can accommodate a rolling window of any length.

Speed Comparison
On panel and time-series data, ASROL and RANGESTAT (from SSC) both perform well, with a marginal speed advantage for ASROL. However, ASROL outperforms RANGESTAT significantly in panels with duplicate observations. See the results of the following tests, run in Stata 14.2 on about 5 million observations. I have not yet tested different window lengths.

Code:
clear
set obs 1000
gen industry=_n
gen year=_n+1917
expand 5
bys industry: gen country=_n
expand 1000
bys ind: gen company=_n
gen profit=uniform()
tsset company year

timer clear 
timer on 1
asrol profit, s(mean) w(100)
timer off 1

timer on 2
rangestat (mean) profit, interval(year -99 0) by(company) 

timer off 2

timer list
/*
  1:    119.22 /        1 =     119.2250
  2:    130.56 /        1 =     130.5600
*/

cap drop mean100_profit profit_mean

timer clear 
timer on 1
asrol profit, s(mean) w(year 100) by(country industry)
timer off 1

timer on 2
rangestat (mean) profit, interval(year -99 0) by(country industry) 

timer off 2
assert mean100_profit== profit_mean

timer list
/*
   1:     40.98 /        1 =      40.9840
   2:    681.49 /        1 =     681.4930
*/

Creating neighbor information variables in country panel data

Hi everyone,

For my panel data set, whose unit of analysis is the country, I am trying to create variables that describe information about contiguous countries. That is, I want a variable that indicates the average GDP of each country's contiguous neighbors. As an example, for an individual observation I want the variable to show: what is the average GDP of all of Germany's neighboring countries in the year 2000? How do I create this variable?

I was advised to look at spatial lags, but as far as I understand, the concept of spatial lags uses longitudes and latitudes and then sets a distance threshold below which a country is considered a neighbor. As I only care about contiguous neighbors, though, I don't see how spatial lagging would help me here.
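
One possible route, sketched here under the assumption that a contiguity list can be obtained as a pair data set (file and variable names are hypothetical: pairs.dta with country and neighbor, panel.dta with country, year and gdp), is to attach every neighbor's GDP by year and then average within country-year:

Code:
use pairs, clear                       // one row per contiguous pair: country neighbor
                                       // assumes each pair appears in both directions (A-B and B-A)
rename country focal
rename neighbor country
joinby country using panel             // brings in gdp of each neighbor for every year
collapse (mean) nbr_gdp = gdp, by(focal year)
rename focal country
merge 1:1 country year using panel, nogenerate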

I would be very happy about some advice. Thank you in advance!

Cash Flow Volatility

Hello,

I want to compute cash flow volatility; just a few details follow.

Cash flow volatility, defined as the standard deviation of industry cash flow to assets, is computed as follows:
1. For each firm-year compute the standard deviation of cash flow to assets for the previous 10 years (at least three observations must be available)
2. Average the firm cash flow standard deviations each year across each four-digit SIC code

Hope you can help me!
My code so far:

Code:
bysort gvkey year: generate uncf = (oibdp - txt - txditc - xint - dvp - dvc)/at
label var uncf "Undistributed Cash Flow"

xtset gvkey year
ssc install tsegen
tsegen sd_firm = rowsd(L(1/10).uncf, 3)
label var sd_firm "Standard Deviation of Cash Flow"

bysort SIC year: egen mean_SD = mean(sd_firm)
label var mean_SD "Industry Cash Flow Volatility"

Thankful for any advice.

How to: xtabond2

Hello everyone,

My group and I (MSc students) are working on a project on inequality and growth. We have a panel dataset and want to perform system GMM.

However, we find it hard to construct the xtabond2 command, since it is our first time using it.

Our dependent variable is the growth rate; our main regressors are lagged GDP per capita and the Gini coefficient (as our measure of inequality); and our control variables are the price level of investment, female secondary education, and male secondary education.
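
Purely as a hypothetical starting point (variable names are made up, and the instrument choices below are placeholders that would need careful thought, e.g. following Roodman's 2009 guide to xtabond2), a system-GMM call might look something like:

Code:
xtset country year
xtabond2 growth L.gdppc gini pinv fem_sec male_sec, ///
    gmm(L.gdppc gini, lag(2 4) collapse) ///
    iv(pinv fem_sec male_sec) ///
    twostep robust small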

Could you help us with that?

Thank you very much in advance!


natural response rate in probit

Hi

I was wondering if there was a way to adjust for the natural response rate, or background rate, when using probit regression in Stata.

Example: in a study of mortality in insects exposed to various doses of a toxin, there may be a baseline rate of mortality among insects that were not exposed to the toxin at all. In SPSS and SAS there are options that adjust the probit regression for this baseline rate.
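
In case it clarifies what I am after, the model I have in mind is P(y=1|x) = c + (1 - c)*Phi(xb), where c is the natural (background) response rate. A rough, untested sketch of coding this by hand with ml, using hypothetical variable names (died, dose) and keeping c in (0,1) via an inverse-logit transform:

Code:
capture program drop nr_probit_lf
program nr_probit_lf
    // ln L for P(y=1) = c + (1 - c)*normal(xb), with c = invlogit(a)
    args lnf xb a
    quietly replace `lnf' = ln(invlogit(`a') + (1 - invlogit(`a'))*normal(`xb')) if $ML_y1 == 1
    quietly replace `lnf' = ln((1 - invlogit(`a'))*normal(-`xb'))                if $ML_y1 == 0
end

ml model lf nr_probit_lf (xb: died = dose) (background:)
ml maximize
display "estimated natural response rate = " invlogit(_b[background:_cons])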

Is there a way to adjust for this in Stata?

Thank you

Missing Response Categories from Logistic Analysis

Hi

I am new here and also a novice in statistical analysis. I have a problem I hope someone can help me address.

I notice that the number of response categories for some of my predictor variables drops to three (including the reference category) instead of four when I run the logistic regression. What could be the cause? The variables of concern are PSEX_10 and SAA_101. I have attached a screenshot of the output in question. Thanks

Using -collapse- to get grouped statistics that AREN'T cumulative?

Hi,

I'm trying to aggregate some individual data to state level. I'm using the 2012 Current Population Survey.

I have sorted my data based on my grouping variable and used the following syntax:

Code:
collapse (sum) prcitshp native_born foreign_born foreign_noncit foreign_nat [pweight = a_fnlwgt], by(gestfips)

where gestfips is the FIPS state code, a_fnlwgt is the final sampling weight, prcitshp is a citizenship indicator, and the four variables that follow are 1/0 indicators derived from the citizenship variable.

When I run this syntax, I get a dataset that has a cumulative/running sum. I need data that are NOT cumulatively summed.

For example, the new data is such that:
SUM FREQ
637054.6 1
728817.7 1
757748.7 1
etc... etc..
There is one observation for each state, which is ideal. But I need the count for each state, separately.

How can I achieve this? Is there another way, other than -collapse-?
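
For comparison, a sketch of the same state-level totals computed without collapse, using egen totals by state (variable names as above); this keeps the full data set and can be listed one row per state:

Code:
bysort gestfips: egen wt_native  = total(native_born  * a_fnlwgt)
bysort gestfips: egen wt_foreign = total(foreign_born * a_fnlwgt)
egen state_row = tag(gestfips)
list gestfips wt_native wt_foreign if state_row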

Thank You!