Channel: Statalist

keep variables if several corresponding properties are in specific rows for each variable

Hello,

After searching the forum without finding a solution, I am writing this post. I have a time series with many variables (more than 1,000). The variables are distinguished by more than 15 different criteria. My goal is to merge these variables depending on these properties.
Something like this example:
Code:
property_date  var1        var2         var3        var4
property2      Failure     Unscheduled  Failure     Failure
property3      01.01.2016  02.04.2011   01.01.2016  03.04.2017
property4      444         2321         22          22
01.01.2017     .           .            3           .
01.02.2017     .           4            3           2
01.03.2017     .           4            3           2
01.04.2017     7           4            .           2
01.05.2017     7           4            .           .
An easy way would be to encode the properties in the variable names and then drop or keep the variables by name before merging (e.g., for the variable "var_failure_444", the commands "keep *failure*", "keep *444*", and so on).
But the problem is that I have too many properties, and the variable names would become too long to hold them all. Hence my question: how can I keep or drop variables when the relevant properties are stored in specific rows of each variable? Or is there another way to do this? I would be very grateful if someone could give me an answer.
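A minimal sketch of one possible approach, assuming the variables are strings and the property values sit in fixed observation numbers (here property2 in observation 1 and property4 in observation 3; both are assumptions, adjust to your layout):

Code:
* keep only variables whose property rows match the desired values;
* foreach expands the varlist up front, so dropping inside the loop is safe
foreach v of varlist var* {
    if `v'[1] != "Failure" | `v'[3] != "444" {
        drop `v'
    }
}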


xtreg

Hello everybody,
Can I use xtreg when the dependent variable is binary?
Thanks.

Propensity Score Matching

Hi,

I'm doing propensity score matching using the psmatch2 command in Stata.
My cohort consists of 17,435 patients, of whom 8,474 (49%) received treatment and 8,961 (51%) did not.

After using the psmatch2 command with nearest-neighbor matching (caliper 0.2), I end up with a matched cohort of only 4,584 patients, only 26% of my total cohort.
Does anyone know what the problem might be? How many patients do I minimally need after propensity score matching? I lost 19 patients because very high age predicted success perfectly.
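For reference, a hypothetical sketch of the kind of call described above (the covariate and outcome names are placeholders, not from the post); the _support variable that psmatch2 creates shows how many observations fall off common support:

Code:
* caliper(0.2) discards treated patients with no control within
* 0.2 of their propensity score, which can shrink the matched sample
psmatch2 treatment age sex comorbidity, outcome(mortality) neighbor(1) caliper(0.2) common
tab _support treatment, column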

Thanks!

t-stat of the main and interaction effect

Dear Stata experts,

In one of my analyses, the t-statistic of the interaction term is lower than that of the main variable, even though its coefficient is larger. For example, I get a t-statistic of 5.780 for DISC (beta 0.823), whereas for the interaction DISC*SIZE I get a t-statistic of 3.280 (beta 9.582). Is there an explanation for this, perhaps that firms with larger SIZE make more disclosures?

I would appreciate your expert opinion.

Thanks.

Kind regards,

Aryan

Problems w/ tenure calculations in panel data

Hi there!


I am having problems calculating tenure in an unbalanced panel (a few missings). I have tried a couple of options for several hours, but the answer still eludes me.

I have 3 companies (categorized by gvkey), from 2002-2010. For each year I have data on individuals that are part of one specific work group. If they stop working for company x in year y, then they are not featured in the group in the following year y+1.
I formatted work_start and work_end as:
gen int date_of_work_start = daily(work_start, "DMY", 2016)
gen int date_of_work_end = daily(work_end, "DMY", 2016)
format date_of_work_start %td
format date_of_work_end %td

Data example:
gvkey ; year ; companyname ; individual_name ; date_of_work_start ; date_of_work_end
1 ; 2004 ; Stapconsulting ; Marc Foster ; 03jan1988 ; 12sep2006

In this case the tenure should be 2004 "minus" 03jan1988. For a solution like this, I would probably need to convert the year to a calendar year-end date (31dec2004), or else Stata won't like it. Also, I guess date_of_work_end only becomes relevant when the panel year equals the year of date_of_work_end. For example, let's look at the same company and person two years later:

Data example:
gvkey ; year ; companyname ; individual_name ; date_of_work_start ; date_of_work_end
1 ; 2006 ; Stapconsulting ; Marc Foster ; 03jan1988 ; 12sep2006

Tenure then should be 12sep2006 "minus" 03jan1988.

Is there an elegant way to calculate tenure like this for each gvkey and each year? If it helps, I would be willing to drop the days and months of work_start and work_end and focus only on years. Sadly, I'd lose information that way ...
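A minimal sketch of one way to do this, assuming one observation per gvkey-individual-year and tenure measured in years:

Code:
* reference date: 31 Dec of the panel year, capped at the work-end date,
* so the final year uses date_of_work_end and earlier years use year-end
gen double refdate = min(mdy(12, 31, year), date_of_work_end)
format refdate %td
gen tenure = (refdate - date_of_work_start) / 365.25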

Thank you in advance.

BR

How to use tabulate frequencies to build a density variable

Good afternoon,
I am trying to build a density variable using the frequencies I have computed with the tabulate command. I have found that one way to do this is
Code:
tab birth_date [iweight=wgt], matcell(x)
svmat x
But this way I obtain a variable whose entries are the ordered elements of the matrix; the frequencies are not associated with the values of the variable of interest.
How can I obtain a variable of frequencies whose values correspond to the values of the original variable?
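A hedged sketch of an alternative that skips the matrix step and keeps the (weighted) frequency aligned with each observation's birth_date:

Code:
* weighted count of each birth_date, stored alongside the variable itself
egen double freq = total(wgt), by(birth_date)
egen double totw = total(wgt)
gen double density = freq / totw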

ATT with OLS with third-party control

Hello,

How can I estimate the ATT with OLS while controlling for third variables? And what advantage does matching have over OLS with third-variable controls?

Thank you!

Return list inside a loop

Dear Statalist,

I am running my code, but something is wrong that I am not able to fix. It is probably something quite easy, but I have not found an answer in previous questions.
I have several years (2000 to 2016) and the wage for each year. I would like the ratio between the minimum wage (I have this variable) and the average wage for each year, so that I can create a graph with year on the x-axis. I guess the problem is that return list picks up the last r(mean) rather than the value for each year before dividing by the minimum wage, but I do not understand where the mistake is.
I run the following loop:

Code:
foreach num of numlist 0/16 {
    su wage, de
    return list
    ge kaitz=mw/r(mean)
}
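For comparison, a hedged sketch of one possible fix, assuming a numeric year variable exists: summarize within each year and replace into a single pre-created variable (gen inside the loop fails after the first pass because kaitz already exists, and an unrestricted summarize uses all years at once):

Code:
gen kaitz = .
forvalues y = 2000/2016 {
    quietly summarize wage if year == `y'
    replace kaitz = mw / r(mean) if year == `y'
}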

I look forward to your advice on fixing the code.
Thank you very much.

With kind regards,
Simona

xtreg and countries used

Hi guys! I have a panel of 185 countries, but when I use fixed-effects models, Stata tells me that the number of ids used is 127-132, depending, I think, on the controls I use and their variability. My question is: how can I verify which countries are included in the model, so that I can check the features of the sample? Is there some post-estimation command that tells me this? Thank you!
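A minimal sketch, with hypothetical variable names: after estimation, the e(sample) function marks the observations the model actually used.

Code:
xtreg y x1 x2, fe
* list the country ids contributing at least one observation
levelsof country if e(sample), local(used)
display "`used'"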

Decomposition of (pseudo) R-squared

Hello,

I am using shapley2 to decompose a measure of fit after a probit regression. I would like to use it for IV or panel estimation, but shapley2 doesn't work (yet) with panel data. Trying LSDV results in a "matrix has missing values" error. I also tried shapleyx, which seems not to work with probit. There's a relevant discussion here: http://www.statalist.org/forums/foru...-decomposition, but it doesn't help me with decomposition after ivreg2.

I was wondering if anyone can suggest ways of decomposing pseudo R-squared (or another measure of fit) after ivreg2? Thanks!

Here's an example code:

Code:
* this works
sysuse auto
shapleyx weight i.foreign  length, result(e(r2)):ivreg2 price @
Code:
* this doesn't work: "Error in the called program!"
shapleyx weight i.foreign (i.rep78 = mpg) length, result(e(r2)):ivreg2 price @

Time series models with binary outcome

I'm wondering about the best way to do a time series model where the outcome is binary (depressed/not depressed), and there are three waves.
  1. The simplest approach seems to be a random effects model: "xtlogit depressed X X2 X3, re." But some of my regressors only have data for 2 out of 3 waves, so if I include them, Stata won't use all three waves of data (I can tell because the output says the "max" number of obs per group is 2). Is there any way to fix this—i.e., to use the max number of available waves per regressor?
  2. One potential option is to type "logit L0.depressed L(0/2).X L(0/2).X2 L(0/1).X3", where X3 is the variable that doesn't have any data in wave one. But would that work? Would it produce unbiased estimates?
  3. Another option is to observe how a change in X affects the change in depression: "logit D(0/1).depressed L(0/2).X L(0/2).X2 L(0/1).X3." But can this model account both for people who become depressed (0 --> 1) and people who become undepressed (1 --> 0)? It seems like the "logit" command wouldn't work here since the values can be either -1, 0, or 1 depending on whether the person became depressed, undepressed, or remained the same, whereas logit assumes a binary outcome.
  4. Do any of these models help reject the possibility of reverse causality? For example, if the random effects model shows that X–X3 really are associated with depression, can I know that they lead to depression rather than depression leading to them?
Any thoughts on a better model would be greatly appreciated!
Max

autoregressive model and saving residuals

Hey,

I have daily panel data for 225 stocks over 3 years and am having trouble with the following regression:
[equation image omitted: the lag regression with day-of-week dummies described below]
As a measure of volatility, I want to regress each stock's log return (logr) on the 12 lags of its own log returns together with day-of-the-week dummies, in order to save the absolute values of the residuals afterwards. I created separate variables for the 12 lags (logr_1, logr_2, etc.) and dummies for each trading day of the week (monday, tuesday, etc.).
Since I'm new to Stata:
1) I don't know how to implement these models efficiently without running them separately for each day and stock in my data;
2) I don't know how to save the residuals as a variable afterwards.
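A hedged sketch of one way, assuming a numeric stock identifier id and a date variable date (both names are assumptions), and omitting one weekday dummy to avoid collinearity with the constant:

Code:
* note: with calendar dates, weekends create gaps for the lag operator;
* a sequential trading-day index avoids that
bysort id (date): gen t = _n
xtset id t
gen double absres = .
levelsof id, local(stocks)
foreach s of local stocks {
    quietly regress logr L(1/12).logr monday tuesday wednesday thursday if id == `s'
    quietly predict double e if e(sample), residuals
    quietly replace absres = abs(e) if id == `s'
    drop e
}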

Kind regards,

Gianni

Creating a new variable depending on 2 other variables

Hi there. I'd like to create a new variable that takes a specified value, say 1, for all rows of an x1 group if there is a 3 or an 8 anywhere in x2 for that group. To illustrate with the data below, there should be four 1s for the rows where x1 is 100106, and six 0s for the rows where x1 is 100110.

Thank you for your time and consideration.

Code:
x1     x2
100106 .
100106 .
100106 3
100106 3
100110 .
100110 .
100110 .
100110 .
100110 .
100110 .
100202 .
100202 .
100202 3
100202 3
100205 .
100205 .
100205 3
100301 .
100301 .
100301 .
100301 8
100307 .
100307 .
100307 3
100401 .
100401 .
100401 3
100401 .
100404 .
100404 .
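A minimal sketch of one approach: inlist() marks the rows where x2 is 3 or 8, and taking the group maximum spreads the flag to every row of the x1 group.

Code:
* flag is 1 for every row of an x1 group containing a 3 or an 8 in x2
egen byte flag = max(inlist(x2, 3, 8)), by(x1)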

WIOD Data using R or STATA or Matlab

Hi everyone!
I am working on my research paper and I am using the WIOD data series to determine the following:
a) Gross world trade
b) Gross world value-added trade

Then I am zooming in on Brazil as my country study to determine:
domestic value added in other countries' exports
foreign value added
trade in intermediates
the participation and position of Brazilian industries (upstream vs. downstream)

I am familiar with R, so I have imported the R data series and cleaned the matrix. However, I am not sure how to compute the values above. For example, is world gross exports the sum over all rows and columns? Any guidance would be appreciated.

Thank you in advance,

Ross

foreach/forval loops - non consecutive values

Hi Statalist,

I'm trying to use the foreach and forval commands along with local macros for an assignment (I can't use other commands, unfortunately). I have panel data from 1978-2002 for three different variables, but it is unbalanced: there are missing years of observation, and they aren't the same for each variable. I'm trying to create summary measures (mean, max, and min) over 5-year periods.

Here is a sample of my data, for the variable BMI, which I use in my examples.
Code:
   Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
     bmi1981 |      5408    23.14015    3.794426   12.41162   51.48926
     bmi1982 |      5402    23.41694    3.909643   14.87748   54.86472
     bmi1985 |      5374    24.23713    4.373466    15.5041   57.03351
     bmi1986 |      5214    24.61282    4.530016   15.44678   52.89976
     bmi1988 |      5153    25.26213    4.956609   10.97429   61.23257
-------------+--------------------------------------------------------
     bmi1989 |      5184    25.44633    4.967208   15.05543   60.51997
     bmi1990 |      5158    25.87399    5.137704     15.545   54.33413
     bmi1992 |      5159    26.41051    5.437539   8.319527   58.23669
     bmi1993 |      5149    26.56395    5.391187   12.84161   61.02365
     bmi1994 |      5126    26.79813    5.518233   7.830252    61.0214
-------------+--------------------------------------------------------
     bmi1996 |      5086    27.26668    5.749034   16.13866   67.78237
     bmi1998 |      4982    27.63579     5.83412   12.91224   65.29601
     bmi2000 |      4974    28.08289    5.987594   10.14982   70.17795
     bmi2002 |      4935    28.36394    6.085327   7.601644   70.84908
I've tried the following pieces of code so far:

Code:
   foreach var in bmi hrs pov {
        foreach start of numlist 1978/2002 {
            local end=`start'+4
            cap egen mean`var'`start'`end'=rowmean(`var'`start'-`var'`end')
            cap egen max`var'`start'`end'=rowmax(`var'`start'-`var'`end')
            cap egen min`var'`start'`end'=rowmin(`var'`start'-`var'`end')
        }
    }
The problem with the above code, of course, is that it creates a new five-year period starting at every year (1978-1982, 1979-1983, ...) when I need non-overlapping periods every 5 years (1978-1982, 1983-1987, ...).

I then tried the below code:
Code:
forval num = 1981(5)2002 {
    local end = `num' + 4
    cap egen meanbmi`num'`end' = rowmean(bmi`num'-bmi`end')
}
The problem with this is that if the starting year of a period is missing, the whole period is skipped. So, for example, I get 1981-1985 and 1986-1990; but since there is no data for 1991 (though there is for 1992-1994), the 1991-1995 period is skipped completely.

I'm very lost as to how I could overcome this using the aforementioned commands (foreach, forval, local). I would very much appreciate any help I can get!
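A hedged sketch of one way around this using only the permitted commands (plus capture/confirm to test whether a variable exists): for each fixed five-year window, build a varlist of the years that actually exist, then apply the row functions to that list.

Code:
foreach var in bmi hrs pov {
    forvalues start = 1978(5)2002 {
        local end = `start' + 4
        local vlist
        forvalues y = `start'/`end' {
            capture confirm variable `var'`y'
            if !_rc local vlist `vlist' `var'`y'
        }
        if "`vlist'" != "" {
            egen mean`var'`start'`end' = rowmean(`vlist')
            egen max`var'`start'`end' = rowmax(`vlist')
            egen min`var'`start'`end' = rowmin(`vlist')
        }
    }
}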

Thank you!

Propensity Score Matching

Dear All,

I have three samples of people that I will call A, B and C. I want to match each person in A with a person in B and a person in C. Is there a way to do that in Stata? Thanks a lot!

Insignificant results of Poisson regression when adding country fixed effects

Dear all,

I will try to explain my query as precisely and accurately as I can, but please don't mind any inefficient communication due to my language barrier.

I am evaluating the impact of foreign aid from new donors on World Bank conditions. I have 54 African countries and 33 years in my panel. The dependent variable is 'Average number of World Bank loan conditions per project' (a count variable). The main explanatory variable is 'Foreign aid from new donors'. The count nature of the dependent variable calls for a Poisson model. I decided to use fixed effects, as the Hausman test suggests that random effects are strongly inconsistent. I also want to control for systematic differences across countries that may influence the dependent variable.

The problem is that when I use fixed effects or i.country, the main explanatory variable turns insignificant. However, when I use only i.year or run the model with random effects, the results are significant.

Let me show to you what I did:

A) Setting the data as panel
. xtset country year

B) Running poisson regression with fixed country and year effects
. xtpoisson DV IV, fe
. poisson DV IV i.country i.year

(To the best of my knowledge, there is no difference between using xtpoisson, fe and poisson with i.country i.year, so I tried both.)

*Both regressions give an insignificant value for the main independent variable.

C) Running poisson regression with random effects and including year dummies only

. xtpoisson DV IV, re
. poisson DV IV i.year

*Both regressions give a significant value for the main explanatory variable.

As far as I have understood, the problem lies in using country fixed effects. But on what grounds can I exclude country fixed effects when I have reasons to include them in the model, both according to logic as well as Hausman test? Am I messing up the commands which is causing this problem?

I look forward to your guidance.

Problem with the option "osample" from the command "teffects nnmatch"

Dear Statalist,

My team and I are running a propensity score matching analysis on agricultural data in order to assess the impact of credit on several farmer outcomes. For this we were using the psmatch2 command, but I was frustrated that we never managed to completely balance the treatment and control groups on the observable covariates. So we decided to switch to the teffects nnmatch command, which allows matching exactly on observable covariates, which is useful for categorical variables, for example.

Anyway, we are now getting an error after running the command, similar to the one in this thread: http://www.statalist.org/forums/foru...match-question

We first run a line of code like the following:

Code:
teffects nnmatch (dep_variable x_covariate) (treatment)
This gives the error message: "no exact matches for observation 23472; use option osample() to identify all observations with deficient matches".

We then include the osample option:

Code:
teffects nnmatch (dep_variable x_covariate) (treatment), osample(newvar)
After Stata identifies the observations for which an exact match can't be found (they have a 1 in newvar), we do the following:

Code:
teffects nnmatch (dep_variable x_covariate) (treatment) if newvar == 0
Unfortunately, we end up with an error message similar to the first one: Stata claims it didn't find exact matches for a certain observation, different from the one at the beginning. Each run takes about 2 hours, so I'm not sure how to deal with this issue without running the command with the osample option several times.
Does anybody know a solution for this problem?
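One hedged idea, sketched below: let Stata iterate unattended, accumulating the flagged observations after each failed run (this does not reduce the number of runs, it only automates them). Variable names follow the post.

Code:
gen byte excluded = 0
local rc = 1
while `rc' != 0 {
    capture teffects nnmatch (dep_variable x_covariate) (treatment) if !excluded, osample(flag)
    local rc = _rc
    if `rc' != 0 {
        * exclude the deficient matches found in this pass, then retry
        replace excluded = 1 if flag == 1
        drop flag
    }
}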


Best regards,


Juan

Diff in Diff Power Analysis by Simulation

I was working on a simple multi-period diff-in-diff. I have a policy implemented at the state level, so the diff-in-diff is clustered by state.

Y_st = a + B1*X_st + B2*Treat_s + B3*Post_t + B4*Year_t + B5*(Post_t * Treat_s) + e_st

I found no statistically significant effect, and one of my explanations was that the treatment group was too small and that the treatment effect is too small to be detected at the state level with proper precision. Power analysis by simulation (varying the size of the treatment group) seems like the best way to back up my claim. I've learned about simulation on my own and am near the point of writing my data-generating process, but have a few concerns:

1) Is there a package that would make this easy?

2) Can I use runiform() for treat / post? Do I have to make treatment correlated with X (I would think not, since this is taken care of by the treatment-group dummy)?

3) How can I ensure that my post-treatment variable is conditional on being in the treatment group?

4) Is there literature on this type of simulation?
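A hedged sketch of the simulation skeleton using only the built-in simulate command; every magnitude below (number of states, treated share, effect sizes, policy year) is a placeholder assumption. The treat*post interaction also addresses concern 3: the post-treatment term is nonzero only in treated states.

Code:
capture program drop ddsim
program define ddsim, rclass
    clear
    set obs 50                                // states
    gen state = _n
    gen byte treat = runiform() < 0.3         // assumed share of treated states
    expand 10                                 // 10 years per state
    bysort state: gen year = 2000 + _n
    gen byte post = year >= 2006              // assumed policy year
    gen treatpost = treat * post              // nonzero only for treated states post-policy
    gen y = 0.1*treat + 0.15*treatpost + rnormal()        // true effect 0.15
    regress y treat treatpost i.year, vce(cluster state)  // i.year absorbs the post main effect
    test treatpost
    return scalar reject = (r(p) < 0.05)
end

simulate reject = r(reject), reps(500): ddsim
summarize reject        // the mean of reject is the estimated power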

Best Regards,

Esteban Fernandez

Using putexcel to output titles in multiple columns on sample statistics

I'm attempting to generate ~40 sets of sample statistics, and would like Stata to output the name of each series into Excel. My current code looks like this:

Code:
//INPUTS
local Peer1 PEER1_PV_PS PEER1_PV_PE PEER1_PV_EVEBITDA PEER1_PV_EVEBIT
local ncol = 2
local nrow = 2

//LOOP
local col1 : word `ncol' of `c(ALPHA)'
local ++ncol
foreach var of local Peer1 {
    summarize `var', detail                 // separator() is a tabulate option, not a summarize one
    local col : word `ncol' of `c(ALPHA)'
    * write the series name as quoted text in row 1, so it no longer
    * overwrites the cell where the rscalars block starts
    putexcel `col'1=("`var'") `col1'`nrow'=rscalarnames `col'`nrow'=rscalars using "V:\SummaryStats.xlsx", modify
    local ++ncol
}

Each of PEER1_PV_PS, PEER1_PV_PE, PEER1_PV_EVEBITDA, and PEER1_PV_EVEBIT is a series (of which there are ~40), and the putexcel cell assignments are the part I'm having issues with.

The current output looks like this: [screenshot omitted]

The output I want looks like this: [screenshot omitted]