
Independent variables based on same variable

Hi!

When it comes to Stata, I am a complete rookie. I have a question about which method I should use in Stata with the dataset I want to analyze. Please excuse my poor English.

I have gathered secondary data on the economic situation (net worth minus the worth of real estate) of 20,000 households in a city in my country, divided into 18 districts. I also have secondary data on housing prices in the city, divided into the same 18 districts and also separated into three categories: apartments, detached houses, and townhouses.

Using the net worths and housing prices, I want to find out how much cash an average household in a specific district has when it sells its house, apartment, etc.

The dependent variable is the household's worth when it sells its real estate, and I guess the household's net worth (before selling the estate) and the housing prices are independent variables. But what about the 18 districts and the three categories of housing? And how would this be done in Stata? Which method can be used? Every reply means a ton. Thanks in advance.
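One hedged sketch of what such a model could look like, using factor variables for the districts and housing categories; the variable names (cash_at_sale, networth, price, district, housetype) are hypothetical placeholders, not names from the poster's data:

Code:
* a minimal sketch, not a definitive method
regress cash_at_sale c.networth c.price i.district i.housetype
margins district#housetype    // average predicted cash per district and housing category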

Create graph with different colors when y<0

Hello,

I would like to know how to create a line graph that draws the y values (in this case "diff12") in another color (for example red) when y < 0.
An example of the data is below:

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input int year float diff12
1995  .5012169
1995  .5012169
1995  .5012169
1995  .5012169
1995  .5012169
1995  .5012169
1995  .5012169
1995  .5012169
1996         .
1997         .
1997         .
1997         .
1997         .
1998         .
1999   .832201
1999   .832201
1999   .832201
1999   .832201
2000 -.3124819
2000 -.3124819
2000 -.3124819
2000 -.3124819
2000 -.3124819
2000 -.3124819
2000 -.3124819
2000 -.3124819
2000 -.3124819
2000 -.3124819
2003  .4381542
2003  .4381542
2003  .4381542
2003  .4381542
2003  .4381542
2003  .4381542
2003  .4381542
2003  .4381542
2003  .4381542
2003  .4381542
2003  .4381542
2003  .4381542
2004  -.623652
2004  -.623652
2004  -.623652
2004  -.623652
2004  -.623652
2004  -.623652
2004  -.623652
2004  -.623652
2004  -.623652
2004  -.623652
2004  -.623652
2004  -.623652
2004  -.623652
2004  -.623652
2004  -.623652
2004  -.623652
2004  -.623652
2005 -.4752033
2005 -.4752033
2005 -.4752033
2005 -.4752033
2005 -.4752033
2005 -.4752033
2005 -.4752033
2005 -.4752033
2005 -.4752033
2005 -.4752033
2005 -.4752033
2005 -.4752033
2005 -.4752033
2005 -.4752033
2005 -.4752033
2005 -.4752033
2005 -.4752033
2006 -.2263384
2006 -.2263384
2006 -.2263384
2006 -.2263384
2006 -.2263384
2006 -.2263384
2006 -.2263384
2006 -.2263384
2006 -.2263384
2006 -.2263384
2006 -.2263384
2006 -.2263384
2006 -.2263384
2006 -.2263384
2006 -.2263384
2006 -.2263384
2006 -.2263384
2007 -1.281406
2007 -1.281406
2007 -1.281406
2007 -1.281406
2007 -1.281406
2007 -1.281406
2007 -1.281406
2007 -1.281406
2007 -1.281406
end
format %ty year
Thank you in advance.
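One possible approach (a minimal sketch, not a definitive answer) is to split diff12 into its nonnegative and negative parts and overlay two line plots, collapsing to one observation per year first, since the example repeats a single value per year:

Code:
preserve
collapse (mean) diff12, by(year)
gen pos = diff12 if diff12 >= 0 & !missing(diff12)   // nonnegative part
gen neg = diff12 if diff12 < 0                       // negative part
twoway (line pos year, lcolor(navy)) (line neg year, lcolor(red)), legend(off)
restore
Note that a gap will appear where the series crosses zero, because each segment belongs to only one of the two variables.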

Metapow for diagnostic accuracy study sample size estimation

Hi, how do I use the metapow command to calculate the sample size for a diagnostic accuracy study? What code should I use? Thanks.

Saving results as a variable using loops

Hello,

I am trying to convert a program that I have written into loops, to make it easier to change the variables it uses. I am trying to get a list of odds ratios for a list of different risk factors (this form is required by another program).

The program should loop through a list of variables, perform a logistic regression, and then save the odds ratios into a variable. So far I have got really close: the first variable does exactly what I want it to, but when I add the second variable the results of the first do not seem to be saved. I am hoping someone can help me retain the previous results, so that I have one variable listing the odds ratios for all of the variables.

Here is the code I have so far:


Code:
local variables a b c
gen output = .
local j = 0                           // row counter, kept across variables

foreach var of local variables {

    // fit the model inside the loop, since _b[] refers to the most recent
    // estimation ("outcome" is an assumed dependent-variable name)
    logistic outcome i.`var'

    // store one odds ratio per non-base level of the (numeric) risk factor
    levelsof `var', local(levels)
    local base : word 1 of `levels'
    foreach i of local levels {
        if `i' == `base' continue
        local ++j
        replace output = exp(_b[`i'.`var']) in `j'
    }
}

Thanks
Cydney

Merging "Payment Diary" Datasets for Panel Dataset creation

Hi everyone,

I would like to merge 4 publicly available datasets from the "Diary of Consumer Payment Choice" series.

The datasets are for the following years: 2015, 2016, 2017, and 2018.

My questions:

1) 2017 and 2018 have each been split into three datasets: transaction-level data, individual-level data, and day-level data. 2015 and 2016 are each single datasets that include variables from all three of those categories, so I do not believe there is an issue with those two. How would I merge the three categories for 2017 and 2018? Once all four years are in the same format I would then be able to merge them together, as the same pool of individuals completed the surveys across time, enabling me to use panel data.

2) The payment diaries track individuals' spending behaviour across three days, so there may be up to 3 rows for the same individual (under the same "id" column) across three different days for that year (under a "date" column). Other individuals may have only one or two entries if they did not report spending data for more days across the allocated survey dates. If I were able to merge all four datasets together, would I end up with, e.g., a maximum of 12 rows for the same "id" but with 12 different date values? And would it be cleaner if I averaged out the spending (and other variables) for each year? For example, the person with "id==1020" may spend $10 on 3rd Oct 2016, $20 on 4th Oct, and $30 on 5th Oct. Averaging these values would give "average daily spend across 3 days", which I could label as "2016" under a new "time" variable for each "id". After merging the four years together I would then have only four "time" entries: 2015, 2016, 2017, 2018. I am not sure how this would work for individuals who report only one date entry, as their average spend would equal their single entry.
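A minimal sketch of one way the merging in question 1 and the averaging idea in question 2 could be coded, assuming hypothetical file names and hypothetical variables id, date, and spend:

Code:
* combine the three 2017 files into one transaction-level file
use dcpc2017_transactions, clear
merge m:1 id using dcpc2017_individuals, nogenerate
merge m:1 id date using dcpc2017_days, nogenerate

* average daily spend per person, labelled with the survey year
collapse (mean) spend, by(id)
gen time = 2017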

The aim of the project is to figure out if contactless spending in the U.S. has a causal relationship with spending values.

Here is a link to the data series: <https://www.frbatlanta.org/banking-a...e.aspx?panel=2>

I am using Stata/MP 15.1.

I would really appreciate any help and advice you may have.

Thank you in advance.

Jack


Test of Strict Exogeneity

Dear Statalisters,

When carrying out the test of strict exogeneity after accounting for endogeneity by use of lead variables, how does one determine the number of leads to consider?

From Piermartini and Yotov (2016), "Estimating trade policy effects with structural gravity":

In order to test whether the specification with pair fixed effects has accounted properly for possible ‘reverse causality’ between trade and RTAs, we follow Jeff Wooldridge and Baier and Bergstrand (2007) to implement an easy test for the “strict exogeneity” of RTAs. Specifically, we add a ‘future lead’ of RTAs,
Code:
RTA_ij,t+4
, to specification (28) and estimate
Code:
X_ij,t = exp[β5 RTA_ij,t + β6 RTA_ij,t+4 + π_i,t + χ_j,t + µ_ij] + ε_ij,t
Their findings reveal that the estimate of the future level of RTA is neither economically nor statistically different from zero. Similar to Baier and Bergstrand (2007), we interpret this as evidence that reverse causality is not present in our specification.

In my case, I have run a model similar to the one above. However, when I include a future lead of RTA (RTA_ij,t+3 or RTA_ij,t+4), the coefficients are statistically significant, implying the presence of reverse causality. When I use RTA_ij,t+5, the coefficients are statistically insignificant, indicating that reverse causality is not present.

I am at a crossroads on whether I am right to presume that reverse causality has been addressed based on the coefficient of the fifth lead of RTA.
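For reference, a sketch of how such a lead test is often coded, assuming a pair-year panel and the community-contributed ppmlhdfe command (ssc install ppmlhdfe); all variable names here are hypothetical:

Code:
xtset pairid year
gen rta_lead5 = f5.rta   // five-period lead of the RTA dummy
ppmlhdfe trade rta rta_lead5, absorb(exporter#year importer#year pairid) cluster(pairid)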



Help on running xtivreg with interaction

Dear Statalist,

I have a question on the implementation of an interaction term in an IV regression. I found some previous discussions but no solution yet, which is why I am posting this question.
I have a panel of firms and run the following main regression, where initial_size is a dummy denoting the initial size of a firm:

Code:
xtreg outcome c.tariff_rate c.tariff_rate#i.initial_size i.year, fe robust
Now I want to instrument tariff_rate, but I am unsure what the correct Stata syntax would be. I tried the following command, with tariff_rate_instrument being my instrumental variable (for now, the lagged tariff rate):

Code:
xtivreg outcome i.year (c.tariff_rate c.tariff_rate#i.initial_size = c.tariff_rate_instrument c.tariff_rate_instrument#i.initial_size), fe vce(robust)
but it gave me the error "depvars may not be interactions". I do not understand this error and, on a more general level, am unsure how exactly to code IV regressions with interactions in a panel in Stata.
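A workaround that is often suggested (sketched here, not verified on these data) is to build the interaction terms by hand, since xtivreg does not accept factor-variable interactions among the instrumented regressors:

Code:
* generate the interactions as plain variables (initial_size is a 0/1 dummy)
gen tariff_x_size = tariff_rate * initial_size
gen instr_x_size  = tariff_rate_instrument * initial_size
xtivreg outcome i.year (tariff_rate tariff_x_size = tariff_rate_instrument instr_x_size), fe vce(robust)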

Any help would greatly be appreciated!

All the best
Leon

Recreating table from estpost tabstat and esttab

Hello,
Yesterday I failed to save my do-file and I am trying to recreate a table I made.

I have the LaTeX output:
Code:
\begin{table}[htbp]\centering
\def\sym#1{\ifmmode^{#1}\else\(^{#1}\)\fi}
\caption{}
\begin{tabular}{l*{3}{c}}
\hline\hline
                    &\multicolumn{1}{c}{(1)}&\multicolumn{1}{c}{(2)}&\multicolumn{1}{c}{(3)}\\
                    &\multicolumn{1}{c}{pre}&\multicolumn{1}{c}{post}&\multicolumn{1}{c}{change}\\
\hline
2                   &       4.59         &       5.7         &       4.6         \\
                    &     (3.221)         &     (3.654)         &     (2.55)         \\
[1em]
3                   &       22.57         &       17.84         &       2.54         \\
                    &     (3.609)         &     (4.283)         &     (12.18)         \\
[1em]
Total               &       22.29         &       23.77         &       4.6         \\
                    &     (4.77)         &     (6.7)         &     (11.97)         \\
\hline
Observations        &         200         &         200         &         200         \\
\hline\hline
\end{tabular}
\end{table}
My best attempt at recreating this table is:

Code:
eststo clear
eststo: estpost tabstat pre post change if id2 ==1 | id3 ==1 , by(id) statistics(mean sd)
esttab est1 using "post.tex", replace main(mean) aux(sd)
However, this is not producing the table as required.
Can anybody spot where this is failing?

Thank you.

Varying results based on model specification and cluster option with panel data

Hi,


I am doing a study to understand a policy's effects. I have tried different models and have been getting different results with different specifications, and it is a little unclear why.

My dataset has observations of users for different time periods, and there is a policy that affects one group, while the other is a control group.

When I use the following, I get the expected results; that is, I can see the effect of the treatment. Here i.cid gives the firm fixed effects, while the individual fixed effects are at the user level, which is why I used user in the xtset:

Code:
xtset user week
xtnbreg activity i.post##i.treat tenure tenuresqr i.week i.cid, fe
The dataset has 43,433 observations, but with this regression many observations are dropped (leaving about 16,016):

Code:
note: 2109 groups (27417 obs) dropped because of all zero outcomes
note: 14.week omitted because of collinearity
note: 0.cid omitted because of collinearity
Since I have panel data, I probably have to specify cluster(user). When I do this with nbreg or xtreg with the cluster option, I do not get the expected results.


Code:
 xtreg activity i.post##i.treat tenure tenuresqr i.week i.cid, fe cluster(user)


Code:
note: 1.post omitted because of collinearity
note: 1.treat omitted because of collinearity
note: 14.week omitted because of collinearity
This gives me 43,433 observations, but my results are not significant. I was wondering: should I flag the observations with non-zero outcomes and run the clustered model on only those? Would that produce results similar to the first model shown above, which I ran without the cluster option?



Thanks so much,
Veeresh

Data interpretation

Hi all,

I hope this finds you all well in quarantine.
I have a problem that I am not sure strictly belongs on Statalist, but since it is driving me crazy I will give it a try.
Basically, I matched two datasets: DB_1 runs from 1996 to 2003; the other, DB_2, runs from 2004 to 2015. I checked that, year by year, the two databases did not differ too much, and then ran xtreg with fixed effects and time dummies in the form i.Year. The dependent variable y is the log of sales, which was increasing over time (i.e., when I collapsed both databases by Year). Now, the output of the regression seems kind of weird with respect to the time dummies: they are all statistically significant and increasing from 1996 to 2015.

Is there something I am doing wrong here? I mean, is it normal that the time dummies are all statistically significant and increasing when sales are as well? It seems that the entire increasing trend of sales is captured by the dummies (i.e., it is as if the time dummies completely explain the increasing behavior of sales).

Thanks a lot,

Federico

Different results between ivreg2 and xtivreg2

Hi, I'm trying to use instrumental-variables regression with panel data.

At first, I use xtivreg2:
xtset firm year
xtivreg2 y x1 (x2 = a b), fe robust

Then I try to replicate the results with ivreg2:
ivreg2 y x1 (x2 = a b) i.firm i.year, robust
ivreg2 y x1 (x2 = a b) i.year, cluster(firm) robust

But both results differ from those of xtivreg2, fe.
All three runs have the same number of observations, so no singletons were removed.
May I ask exactly which model xtivreg2, fe runs?
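For what it's worth, a sketch of the within (demeaning) transformation that xtivreg2, fe applies; this is an assumption to verify rather than a guarantee, but the point estimates should match the demeaned ivreg2 run below, while robust standard errors can still differ from the dummy-variable runs because of degrees-of-freedom corrections:

Code:
* demean every variable by firm, then run ivreg2 on the demeaned data
foreach v in y x1 x2 a b {
    egen double mean_`v' = mean(`v'), by(firm)
    gen double w_`v' = `v' - mean_`v'
}
ivreg2 w_y w_x1 (w_x2 = w_a w_b), cluster(firm)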

Safe drinking water, forecasting

Greetings!

I am currently working on a project at my university, where I am supposed to estimate safe drinking water availability for different nations by 2030, to help figure out which nations need help getting water.

My team and I have gathered data for the period 2000-2017 on a country basis for all countries in the world, from reliable sources like WASH. Our overall dependent variable is safe water availability, and it is determined by three criteria: water must be available on premises, available when needed, and free from contamination.

We've decided to follow the methodology used by the UN, meaning that we run separate regressions on the three criteria, with the lowest of the three being the estimate for safe water availability.

I've run into a problem when it comes to the structure of the analysis. I know what I want to do, but the way of getting there is blurry to me. I have data on a lot of different aspects, such as GDP in 2010 dollars, the urban population share, the poverty share, life expectancy, the government stability score, etc., and I am certain that I am working with panel data, because the countries remain the same after all.

I'm leaning towards two different options (a sketch of the first follows after this list):
(1) using fixed effects, clustering on the individual countries. The trick here is that I believe I can figure out how the beta coefficients change over time, or take the overall average of the beta coefficients and just assume that they are unchanged by 2030, then use 2030 estimates of the independent variables to calculate availability based on those coefficients.

(2) trying to use time series, because it is easier to extend to water availability by 2030; however, I have little experience with time series.
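A minimal sketch of option (1), with all variable names hypothetical placeholders:

Code:
xtset country year
xtreg safewater lngdp urbanshare povertyshare lifeexp govstability, fe vce(cluster country)
* a 2030 forecast would then combine these coefficients with
* projected 2030 values of the regressors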

I am also dealing with a lot of missing observations, which I highly doubt I will find through additional research, but I can use interpolation or missing-value dummies to solve this at least partially.

I appreciate any suggestions that might come out of this, because I'm honestly swamped.

Kind regards,
Mikkel

Scaled regression variables

Hi all,

For my master's thesis, I want to investigate the impact of the NSFR (my main independent variable) on investments made by banks. The outcome of my regression showed a significant and positive relation between the NSFR and total investments. People then told me to 'scale' my regression, so I ran the same regression but with ln(total investments) and ln(total assets). The outcomes of that regression are totally different and not significant. Did I make a mistake, or is it normal that outcomes change when taking the natural logarithm of the dependent variable and/or a control variable?
Thank you in advance,
Thomas

Command for confidence intervals of proportions for meta-analyses

Using Stata 16.1
Aim: performing a meta-analysis of proportions where there are two different denominator options, plus a sub-analysis (i.e., three separate analyses by season).
I am preparing my data (I have 150 observations in a format similar to the one below):
ID   Season   Success   Samplesize_by_1st_definition   Samplesize_by_2nd_definition   MalesPercent
     Autumn         4                             50                             20             40
     Winter        10                            110                             30             70
     Spring        50                           1080                            680             30
     Spring         5                            210                             28             62
     Autumn        20                            480                            250             76
     Spring         3                             28                             10             38
I have generated two new variables: proportion1 = Success/Samplesize_by_1st_definition and proportion2 = Success/Samplesize_by_2nd_definition.

Q1) How do I generate new variables holding the 95% CI for all observations? (I would generate a separate variable for each of the two possible sample-size options.)
Using cii proportions I would have to work out each observation individually, and I wonder if there is a faster way. Thanks.
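One faster possibility is a sketch using the beta-quantile form of the exact (Clopper-Pearson) interval, which is what cii proportions computes by default; it is written here for the first denominator definition only:

Code:
* lower/upper exact 95% limits, with the 0-success and all-success edge cases
gen double lb1 = cond(Success == 0, 0, ///
    invibeta(Success, Samplesize_by_1st_definition - Success + 1, .025))
gen double ub1 = cond(Success == Samplesize_by_1st_definition, 1, ///
    invibeta(Success + 1, Samplesize_by_1st_definition - Success, .975))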

Q2) Is there a way to use the Meta suite point-and-click interface in Stata 16.1 that would allow me to incorporate the logit transformation (as this is proportion data rather than controlled-trial data) and Clopper-Pearson CIs into my meta-analysis? Or would I have to use the command box?
Thanks

SUCRA command in network meta analysis

Hello,

I have a question about the sucra command in Stata for network meta-analysis. I ran my analysis with the sucra prob command using one reference treatment, and then I chose another reference treatment; the only thing that changed was the sucra results. The best treatment was the same in both analyses, but the ranking of the other treatments was different.

Is that correct?

favplots updated on SSC

Thanks to Kit Baum, favplots has been updated on SSC.

The original announcement in 2011 is copied here with minor edits.

Thanks to Kit Baum as usual, a new package favplots has been posted on SSC. As promised in http://www.stata.com/statalist/archi.../msg01294.html this is just a program to get a friendlier-formatted alternative to avplots after regress. Stata 9 up is required. The program is indicative, and certainly not definitive, and intended to underline that you can change the presentation of graphs if you don't like the default. Thus avplots will happily emit axis titles with coefficient estimates and t statistics to 8 decimal places and 4,5,6 significant figures, which is not what I want myself or my students to be looking at. This might be considered a belated update of avplots2 (SSC), which is for Stata 7.
The update is to match avplots in handling factor variables appropriately. Back in 2011 Dave Airey reported that it didn't do that. More recently Nikos Kakouros flagged the issue again, so thanks to them both.
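A minimal usage sketch, assuming only that favplots is called after regress in the same way as official avplots:

Code:
ssc install favplots
sysuse auto, clear
regress mpg weight displacement foreign
favplots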

Data management, frequent download of an online data set, detect changes between different versions

Hello All,

Every 6 to 8 hours I have to download a dataset that contains information on the health status of unique individuals (health status in this dataset can change very rapidly). After each download I need to create a health report listing any changes that occurred between the previous and the current version.

I am struggling to make the report-creation process easy and simple to repeat. I have tried the following strategy:
1) after each download, create a variable that flags the health status of individuals as AM or PM, depending on the time it was downloaded;
2) try to merge the data together to create a difference score (unfortunately I am running into multiple merge issues that make this process daunting):
a) none of the unique identifiers uniquely identifies each observation, and for some individuals this information is missing;
b) some individuals appear up to 4 times in each dataset, and to decide which observation to keep for these individuals I have to look at the date of assessment of their health status and keep the most recent one (I have done this literally by reading the information listed; there must be a more efficient way; see the sketch after this list);
c) when unique identifiers are missing, I try to use first and last name, but that is not trustworthy because the datasets contain many individuals with similar first and last names, so I end up having to look at other demographic or personal information (phone numbers, addresses) to try to confirm that my merge is working (this requires a lot of back and forth between datasets and many attempts before the merge works perfectly);
d) sometimes after the last download, unique identifiers are added or altered in the online database, so I have to keep track of these changes to make sure that the next time I merge the most recent file with the old one I add these identifiers (at this point I am unable to keep track of these changes).
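For point (b) above, a minimal sketch, assuming a hypothetical assessment-date variable named assess_date:

Code:
bysort id (assess_date): keep if _n == _N   // keep each person's most recent assessment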

Are there different steps that I might be able to take? I am considering appending the two datasets instead of merging them and treating the file as long rather than wide, but I am not sure what I would do next to identify changes in health status without getting stuck dealing with multiple duplicates. (I would expect each individual to have one or two observations in the appended file.)

Any suggestions or thoughts on the most efficient strategy would be much appreciated.

Best wishes,
Patrick

Add mean and sd column in correlation matrix in Stata

I'm trying to create a correlation matrix that also includes the mean and SD of each variable.
```
** Set variables used in summary and correlation
local variables relationship commission anxiety enjoyment negotiation_efficacy similarity_values similarity_behaviors SPT_confidence own_SPT_effort

** Descriptive statistics (a count row is added so that table[count] below exists)
estpost summarize `variables'
matrix table = ( e(count) \ e(mean) \ e(sd) )
matrix rownames table = count mean sd
matrix list table

** Correlation matrix
correlate `variables'
matrix C = r(C)
local k = colsof(C)
matrix C = C[1..`=`k'-1',.]
local corr : rownames C
matrix table = ( table \ C )
matrix list table

estadd matrix table = table

** Build the cells() spec, accumulating drop() to blank the upper triangle
local cells table[count](fmt(0) label(Count)) table[mean](fmt(2) label(Mean)) table[sd](fmt(2) label(Standard Deviation))
local drop
foreach row of local corr {
    local drop `drop' `row'
    local cells `cells' table[`row'](fmt(4) drop(`drop'))
}
display "`cells'"

** note the /// continuations, which the original call was missing
esttab using Report.rtf, ///
    replace ///
    noobs ///
    nonumbers ///
    compress ///
    cells("`cells'")
```


If it helps, this is what the correlation code looks like:

```
asdoc corr relationship commission anxiety enjoyment negotiation_efficacy similarity_values similarity_behaviors SPT_confidence own_SPT_effort ranger_SPT_effort cooperative_motivation competitive_motivation, nonum
```

This correlation matrix looks exactly how it should, but I'm essentially hoping to add means and SDs at the top.

*This is cross-posted here: https://stackoverflow.com/questions/...44775_61471719

2SLS Panel data instrument test

Good Afternoon,

I have used the following code to run a 2SLS IV regression:

xtivreg lnXt (visits=cost) lngdp_cap_const_d lngdp_cap_const_o member_eu_joint member_wto_joint i.year ,fe

I would now like to test my instrument (cost) for relevance and weakness. How do I go about this?
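One possibility (a sketch, assuming the community-contributed xtivreg2, from ssc install xtivreg2, fits this setup) is to refit the model with first-stage output, since the ivreg2/xtivreg2 family reports the first-stage F and weak-identification statistics:

Code:
xtivreg2 lnXt (visits = cost) lngdp_cap_const_d lngdp_cap_const_o member_eu_joint member_wto_joint i.year, fe first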

Thanks,

Sophie

Graph X axis values in descending order

Hi,
I am trying to use the data below to graph two variables (Payout1 and Payout2) against Payoutratio. I am using the following command:

graph twoway line Payout1 Payout2 Payoutratio

However, I need the x-axis values (Payoutratio) in descending order. Can you please advise on how to do this?

Thank you.

----------------------- copy starting from the next line -----------------------
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input double(Payoutratio Payout1 Payout2)
                 3             4217.6             4217.6
               2.9             4217.6             4217.6
               2.8             4217.6             4217.6
2.6999999999999997             4217.6             4217.6
2.5999999999999996             4217.6             4217.6
2.4999999999999996            2530.56            1897.92
               2.4            2530.56            1897.92
               2.3            2530.56            1897.92
2.1999999999999993            2530.56            1897.92
 2.099999999999999            2530.56            1897.92
                 2            2530.56            1897.92
               1.9            2530.56            1897.92
               1.8            1687.04 1265.2800000000002
               1.7            1687.04 1265.2800000000002
               1.6            1687.04 1265.2800000000002
               1.5            1687.04 1265.2800000000002
               1.4 1687.0400000000002 1265.2800000000002
               1.3 1687.0400000000002 1265.2800000000002
               1.2  843.5200000000001             632.64
1.0999999999999983  843.5200000000001  632.6400000000001
                 1  843.5200000000001  632.6400000000001
                .9  843.5200000000001  632.6400000000001
                .8  843.5200000000001  632.6400000000001
                .7  843.5200000000001             632.64
                .6                  0                  0
                .5                  0                  0
                .4                  0                  0
                .3                  0                  0
                .2                  0                  0
                .1                  0                  0
end
------------------ copy up to and including the previous line ------------------
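A minimal sketch of one way to do this, using the xscale(reverse) axis option:

Code:
graph twoway line Payout1 Payout2 Payoutratio, sort xscale(reverse)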