Channel: Statalist

Highly Correlated variables

Hi,

I am struggling to decide whether I should include two variables that are highly correlated (0.75) in my regressions. Let's say the two variables are A and B. Variable A is theoretically identified as an important variable for my work. When I run a regression using just variable A along with the other independent variables, it is strongly significant; however, when I also include variable B, variable A turns insignificant.

Hence my question is whether it is possible to argue that, since variable A and variable B are highly correlated, I am excluding variable B from my model because it introduces multicollinearity.
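
For what it is worth, a minimal sketch of one way to inspect the overlap between A and B, using hypothetical names (y for the outcome, x1 and x2 for the other regressors):

Code:
* compare the model with and without B, and check variance inflation factors
regress y A x1 x2
estat vif
regress y A B x1 x2
estat vif      // a large jump in the VIFs for A and B signals overlapping information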

Regards,
Naveed

replace if id== (60 different ids)

I have id coded as numbers (111, 112, 113, etc.). I need to replace the value of the dummy variable for these 60 countries.

I can do

replace dummy = 1 if id==111 | id==112 | id==113 ... and so on

but is there a faster way to include all the numbers without repeating "if" and "|"?
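
A minimal sketch of two alternatives, assuming the 60 ids are known in advance (the ids below are placeholders):

Code:
* inlist() accepts a long list of numeric values (well over 60) in one call
replace dummy = 1 if inlist(id, 111, 112, 113, 121, 135)

* if the ids form consecutive runs, inrange() is shorter still
replace dummy = 1 if inrange(id, 111, 140) | inrange(id, 201, 230)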

Thanks!

Entering names in the panel variable so that they appear in the graphs

Hello all,

I'm working with a panel database consisting of 27 states (the "ufs" variable) and 5 years (the "ano" variable). Since the "ufs" variable has to be numeric for the data to be declared as a panel in Stata, I would like to know how I can generate graphs with the "xtline" command so that the states are identified by their names rather than by their numbers.

The following is an example of the data:

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float(ufs ano c_agua c_sanea c_lixo) double c_eletri
11 1995  70.13055 69.979805  64.66196 92.16683959960938
11 1999  78.71894  65.56843  79.75066 94.88819122314453
11 2003  79.97647  29.19688  86.09749 98.85565948486328
11 2007  80.35242 26.919937  66.68182 96.37512969970703
11 2011   89.5533 74.009575  71.45464 97.26209259033203
12 1995  41.06366   43.6436 67.637566 90.82560729980469
12 1999  54.21292  54.83765 67.468445 92.68186950683594
12 2003  48.37075  55.72659  78.12277 96.14187622070313
12 2007   54.3322  54.97054  60.93433 79.70413970947266
12 2011  66.42459  46.31048  71.11994 65.96186065673828
13 1995   70.4156  41.83774  61.66943 95.74156188964844
13 1999  77.66714  44.24916  74.29386   97.355712890625
13 2003  75.16041  65.98423   84.0844 95.04579162597656
13 2007  68.67624  63.70018   67.1271 62.38432693481445
13 2011  75.88032  60.19961  70.12315 92.59435272216797
14 1995  80.12115  45.72259  78.20289 94.86805725097656
14 1999  90.13356  59.84703  90.12817 94.23001098632813
14 2003   81.2544  81.56107  90.99955 92.30663299560547
14 2007  80.02519  82.05774  74.84228  90.9218978881836
14 2011  91.30391  92.30712  78.60506 79.00890350341797
15 1995  57.16278  30.63802  49.59181 93.66167449951172
15 1999  65.73316  37.51514  72.52907  93.9507064819336
15 2003  60.46532  54.61437  81.88491 98.43669128417969
15 2007   56.1209  48.68232  68.92506 74.73282623291016
15 2011   68.3638  47.06928  62.93219 93.52276611328125
16 1995  83.26266  52.88682  85.04466 87.99190521240234
16 1999  85.65773  62.45413  87.29572                50
16 2003  86.85825  8.416875  92.61077               100
16 2007  85.59927  56.31346  89.56208 81.36116027832031
16 2011  81.02116  47.72298  82.34124 93.74565124511719
17 1995  57.86197  38.76949  48.79651 77.99098205566406
17 1999  64.24211  39.57572  58.11267 78.25991821289063
17 2003    68.249 20.311735  65.71319  80.6869888305664
17 2007   78.2776 34.449173  71.33123 85.90726470947266
17 2011  84.74978  35.94188  73.55253  97.3940200805664
21 1995  47.83818  54.68647  36.12455  70.2873306274414
21 1999  54.96838  51.89187  46.13671 84.13594055175781
21 2003   45.9462    64.474  46.00704 72.51520538330078
21 2007  56.86871   66.0366  54.18975 84.07508087158203
21 2011 64.260994  59.06073  44.78274 97.91349029541016
22 1995  39.83717  62.85266 26.950205  64.4334945678711
22 1999  48.18757 76.191376  36.14882 71.33477783203125
22 2003  46.62004  71.95264 37.724396  70.8060302734375
22 2007  58.11373  84.53735  44.69562 82.36566162109375
22 2011  75.41342  90.93296  55.73922 77.72025299072266
23 1995  61.11352  65.31702  60.02096 79.82911682128906
23 1999 67.293686  59.87697  65.40835 85.84603881835938
23 2003  62.99915  46.83268  62.96225 90.56932830810547
23 2007  75.81334  49.28032  65.70203 96.02381896972656
23 2011  77.00421  50.69102  69.38615 98.89627075195313
24 1995   49.6605 31.497255  58.40273 88.69646453857422
24 1999  67.71037  39.57728  70.93141 93.20174407958984
24 2003  79.18073  49.64762   76.8028  96.9113998413086
24 2007   82.6077  50.12773   80.8027 95.53111267089844
24 2011   90.5584 74.935165  84.47235 99.00155639648438
25 1995  68.84746  41.17607  58.79748 89.35157012939453
25 1999  70.54264  46.37498 69.355675  95.9611587524414
25 2003  73.40282   41.7138  68.43815 96.85930633544922
25 2007  80.05762  55.15846 75.144554 96.29634857177734
25 2011  82.20724  64.00437   80.3877 76.19192504882813
26 1995  60.12439 35.653267  52.62579 87.18683624267578
26 1999  65.83828  38.53943  64.52209   93.156494140625
26 2003  65.95528  41.25827  67.19721 96.39022064208984
26 2007  71.75505  54.85276  68.34307 98.18158721923828
26 2011  83.52392  65.24072  79.17237                95
27 1995  66.58415  49.33341 65.113464 90.81238555908203
27 1999  76.26701  52.71275  77.42204    92.88916015625
27 2003   60.0394  19.58231  62.75574 89.88362884521484
27 2007  72.07579  29.77365  69.59839  94.1175765991211
27 2011  78.00479   47.9293  71.99599 98.63948822021484
28 1995  73.88713  46.21758 64.132256 92.01268005371094
28 1999  76.86752  54.97613  68.26872 96.78164672851563
28 2003 76.011536  57.04014 70.802505 89.77570343017578
28 2007  88.88799  67.34151  79.07401 97.68515014648438
28 2011   84.9326  65.43551 37.954727 98.82212829589844
29 1995  57.61999  52.68351  49.43821 77.36774444580078
29 1999  63.07106  51.67653  55.54785 81.17224884033203
29 2003  60.65941  49.01187  59.97225 83.09835815429688
29 2007  76.93422  60.18959  66.15131 93.24917602539063
29 2011  83.74115  56.02823  71.43936 97.15894317626953
31 1995  83.02977  68.29614  65.79865 89.90357208251953
31 1999  90.93318  72.87088  76.89732 95.85769653320313
31 2003  92.24457  73.25545   80.2851 96.85916900634766
31 2007   95.0182  78.06746  83.16222 98.77991485595703
31 2011  97.54518  76.90763  86.75912 99.90019226074219
32 1995  81.36028   61.2541  59.83765 93.92584991455078
32 1999   91.7849  66.39647  73.15999 98.93997955322266
32 2003   95.0909 65.931015  72.76338    94.50634765625
32 2007  98.87093  74.40137  82.13373 97.91439056396484
32 2011  91.30183  80.01942  88.24581          99.27065
33 1995  85.88965  80.74239  74.17772 95.69561767578125
33 1999  91.08992  80.60247  87.83428 98.25041198730469
33 2003  97.41634  82.95869  96.22091 99.92850494384766
33 2007  98.32677  88.61712  97.33032          96.22874
33 2011  98.47787  88.09235  96.70547             99.38
35 1995  90.41308  80.09817  86.46069 96.92806243896484
35 1999  93.28075  83.39911  91.29987  98.1938705444336
35 2003  98.99474  89.26913  96.75125 99.88768005371094
35 2007  99.32756  91.70065  98.14366 98.79551696777344
35 2011   99.1378  95.42812  98.72699          99.50583
end
Here is the syntax used to generate the graphs:

Code:
xtline c_agua c_sanea c_lixo c_eletri, ytitle(Taxa de Cobertura) yscale(range(0 100)) ttitle(Ano) tscale(range(1995 2015))
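
For what it is worth, a minimal sketch of one approach: attach the state names to -ufs- as value labels, since -xtline- titles the panels with the panel variable's value labels (the names below are only illustrative, not a complete list):

Code:
label define ufs_lbl 11 "Rondonia" 12 "Acre" 13 "Amazonas", replace
label values ufs ufs_lbl
xtline c_agua c_sanea c_lixo c_eletri, ytitle(Taxa de Cobertura) ///
    yscale(range(0 100)) ttitle(Ano) tscale(range(1995 2015))

Alternatively, if a string variable holding the state names exists, -encode- creates the numeric panel variable and its value labels in one step.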

Thanks in advance.

Girlan Oliveira

hausman (not positive definite)

Dear Statalist-users,

I need your help. Here is the model I use in Stata to measure the effect of ESG (Environmental, Social, Governance) performance on qtobin (performance).
I have an unbalanced panel with large N and small T.

I am trying to choose between the RE and FE models; however, when I conduct a Hausman test, the message "(V_b-V_B is not positive definite)" appears.
I also downloaded the module xtoverid (by Mark Schaffer) to account for heteroskedasticity and autocorrelation; however, it does not work after "xtreg, re vce(cluster panelid)".
I read some posts that advised using hausman with the sigmamore or sigmaless option; however, the results differ from the original Hausman test. Which one should I use?


First attempt, plain Hausman test:

Code:
xtreg qtobin esg levier tventes logassets i.year, fe
estimates store fixed
xtreg qtobin esg levier tventes logassets i.year, re
estimates store random
hausman fixed random

chi2(17) = (b-B)'[(V_b-V_B)^(-1)](b-B) = 8.81
Prob>chi2 = 0.9461
(V_b-V_B is not positive definite)



Second attempt, hausman with sigmamore:

Code:
xtreg qtobin esg levier tventes logassets i.year, fe
estimates store fixed
xtreg qtobin esg levier tventes logassets i.year, re
estimates store random
hausman fixed random, sigmamore

Test: Ho: difference in coefficients not systematic
chi2(17) = (b-B)'[(V_b-V_B)^(-1)](b-B) = 94.27
Prob>chi2 = 0.0000


Third attempt, hausman with sigmaless:

Code:
xtreg qtobin esg levier tventes logassets i.year, fe
estimates store fixed
xtreg qtobin esg levier tventes logassets i.year, re
estimates store random
hausman fixed random, sigmaless

Test: Ho: difference in coefficients not systematic
chi2(17) = (b-B)'[(V_b-V_B)^(-1)](b-B) = 94.99
Prob>chi2 = 0.0000



Fourth attempt, with xtoverid:

Code:
. xtreg qtobin esg levier tventes logassets i.year, re vce(cluster companynum)

Random-effects GLS regression Number of obs = 8,729
Group variable: companynum Number of groups = 871

R-sq: Obs per group:
within = 0.1687 min = 2
between = 0.2265 avg = 10.0
overall = 0.2046 max = 14

Wald chi2(17) = 778.44
corr(u_i, X) = 0 (assumed) Prob > chi2 = 0.0000

(Std. Err. adjusted for 871 clusters in companynum)
------------------------------------------------------------------------------
| Robust
qtobin | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
esg | .0040501 .0008212 4.93 0.000 .0024407 .0056596
levier | -.0001237 .0025782 -0.05 0.962 -.0051768 .0049295
tventes | -.0002569 .0000141 -18.16 0.000 -.0002846 -.0002291
logassets | -.5623834 .0389293 -14.45 0.000 -.6386835 -.4860833
|
year |
2003 | .2826151 .0543909 5.20 0.000 .1760109 .3892194
2004 | .1633955 .0536977 3.04 0.002 .0581498 .2686411
2005 | .2557374 .0535302 4.78 0.000 .1508202 .3606546
2006 | .3497602 .0573786 6.10 0.000 .2373002 .4622202
2007 | .455518 .0602367 7.56 0.000 .3374563 .5735796
2008 | -.1981683 .0537734 -3.69 0.000 -.3035623 -.0927743
2009 | -.0686121 .0555378 -1.24 0.217 -.1774641 .0402399
2010 | -.0442643 .0587347 -0.75 0.451 -.1593821 .0708535
2011 | -.0372261 .0591157 -0.63 0.529 -.1530907 .0786386
2012 | .0322761 .058474 0.55 0.581 -.0823308 .1468831
2013 | .2631256 .0613331 4.29 0.000 .1429149 .3833363
2014 | .2181927 .0608679 3.58 0.000 .0988937 .3374917
2015 | .2377145 .0656265 3.62 0.000 .1090889 .3663401
|
_cons | 10.23278 .6051246 16.91 0.000 9.046753 11.4188
-------------+----------------------------------------------------------------
sigma_u | 1.048925
sigma_e | .61969319
rho | .74127255 (fraction of variance due to u_i)
------------------------------------------------------------------------------

. xtoverid
2002b: operator invalid
r(198);
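
For what it is worth, one common explanation for that last error is that xtoverid does not accept factor-variable notation such as i.year. A hedged sketch of the usual workaround is to expand the year dummies with the older xi: prefix before calling xtoverid:

Code:
xi: xtreg qtobin esg levier tventes logassets i.year, re vce(cluster companynum)
xtoverid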


Thanks again for your time

Predicted probability for another probit regression

Hi all,

I am performing the following task:

1. Perform a probit regression on a binary variable.
2. Predict the probability from 1.
3. Use the predicted probability from step 2 in another probit regression, on another binary variable, as one of the explanatory variables.

The marginal effects are, however, troubling: the predicted probability has a marginal effect that is larger than 1 and statistically significant. Does anyone have an idea how to resolve this?
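
A minimal sketch of the three steps described above, with hypothetical variable names (d1 and d2 for the two binary outcomes, x1 x2 z for the regressors):

Code:
probit d1 x1 x2 z
predict double phat, pr          // step 2: predicted probability from the first probit
probit d2 x1 x2 phat             // step 3: use the prediction as a regressor
margins, dydx(phat)              // average marginal effect of the generated regressor

Note that because phat only varies between 0 and 1, an average marginal effect above 1 does not by itself imply an impossible change in the predicted probability.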

Thanks in advance.
Felix

Sorting a Range of Variables Using gsort?

Hi there,

I need to sort on a large number of variables, including one variable that needs to be sorted from largest to smallest (descending). Can you do something like the command below, where the data are sorted ascending on all variables between make and trunk (including both of these variables) and then sorted descending on foreign?

Code:
sysuse auto, clear
gsort make-trunk +foreign
Trying to avoid having to write the following:
Code:
gsort -make -price -mpg -rep78 -headroom -trunk +foreign
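
For what it is worth, a minimal sketch of one workaround is to let -unab- expand the range and then build the sign-prefixed list in a loop (this sketch reproduces the long command above):

Code:
sysuse auto, clear
unab rangevars : make-trunk        // expand the range into explicit variable names
local spec
foreach v of local rangevars {
    local spec `spec' -`v'         // descending on each variable in the range
}
gsort `spec' +foreign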
Thanks,
Erika

Reference year inflation rate applied to different years.

Hi all,

I am analysing some trial data from respondents who filled in questionnaires over three separate years. I wish to inflate the amounts from 2013, 2014, and 2015 to the equivalent 2016 dollar amounts.

For example, for someone who filled in a questionnaire on 12/5/2013 I wish to inflate their cost by 2.7%, e.g. 100 * 1.027.
For someone with another date, 1/2/2014, I wish to inflate by 2%.

Is there a way I can make Stata assign a rate to each date and then multiply by the relevant inflation amount?
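
A minimal sketch of one way to do this, assuming the questionnaire date is a daily (%td) date variable called qdate and the cost is in a variable called cost; the factors below use the rates from the post, with a placeholder for 2015:

Code:
gen int year = year(qdate)
gen double inflator = .
replace inflator = 1.027 if year == 2013
replace inflator = 1.020 if year == 2014
replace inflator = 1.010 if year == 2015   // placeholder rate; substitute the actual one
gen double cost2016 = cost * inflator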

Thanks



Fixed effect model doesn't work...Where is the mistake?

Hey everybody,

You have probably heard this a thousand times, but I am having serious trouble with my fixed-effects model. Let's start from the beginning:

I'm currently running a panel regression for 23 European countries over the period 2001-2012. In my research I want to explore the impact of sovereign credit ratings on M&A activity. I therefore gathered all the data, including GDP, inflation, interest rates, exchange rates, M&A volume, and some other control variables. From a logical and theoretical perspective I would say that a fixed-effects model is appropriate for this kind of question (that is also what all other researchers have done so far). However, when using a fixed-effects model I only get very small t-statistics (around 0) for all my variables, which is pretty surprising; at least for GDP I should find something significant. The Hausman test also suggests using a random-effects model. When using RE I do get better results, but the standard error and coefficient of the rating are extremely high and all other coefficients are extremely low.

So currently I'm not really sure how to go on. On the one hand I get more or less good results with a random-effects model, but in theory a fixed-effects model should be appropriate. I don't know where the mistake could hide, as I think my data are pretty good and the correlation table also makes sense.

My code for the fixed- and random-effects models:

Code:
egen country1 = group(Country)
xtset country1 Year, yearly
xtreg MAvolume LnGDP LnRating Inflation InterestRates ExchangeRates TradeOpeness, fe
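
For completeness, a minimal sketch of the fixed- versus random-effects comparison described above, using the same variable names:

Code:
xtreg MAvolume LnGDP LnRating Inflation InterestRates ExchangeRates TradeOpeness, fe
estimates store fe
xtreg MAvolume LnGDP LnRating Inflation InterestRates ExchangeRates TradeOpeness, re
estimates store re
hausman fe re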


I have also posted my results, and I hope you understand my issue.

Thank you very much for your help!

Propensity score matching in cross sectional study?

Hello,

Can I use a propensity-score matching (PSM) design to reduce selection bias in a cross-sectional study?
Some studies use PSM in cross-sectional studies; however, PSM was originally developed for longitudinal cohort studies.


Thank you
Tyler Rim,

Bivariate probit regression: willingness to pay

Dear all,

I am using bivariate probit regression, but there is a problem here.

I want to estimate "Willingness To Pay" using contingent valuation method(double dichotoumous question).

In order to do this, I need to use bivariate probit regression, but for some reason, STATA does not show the result.

It only says "not concave"


- dependent variables : response1 response2
- independent variables : price1 price2 gender age number_of_family etc


If I put only one price, either price1 or price2, it shows the result.

However, as far as I know, I need to include "price1" and "price2" together.

Do you have any idea why it doesn't show the result?

There was no missing data.

Do I have to set options? In that case, what kind of options should I set?
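
For reference, a minimal sketch of the two-equation setup described above, using the variable names from the post; the -difficult- maximization option is only a hedged suggestion for a "not concave" likelihood, not a guaranteed fix:

Code:
biprobit (response1 = price1 gender age number_of_family) ///
         (response2 = price2 gender age number_of_family), difficult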

Attached is a small part of the whole data file.

Please help.

Thank you.

creating parallel loops for two sets of varlists

Dear statalisters,

I am having some difficulty with parallel loops (not nested), really hoping to get some help!

Here is a subset of my dataset

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str4 firm str5 id str25 name byte a double waistbeltattachknsbyfolder int waistbelttopstitchkns byte(waistbeltattachsn waistbelttopstitchsn b)
"7001" "43"  "MALEKA"             . . .  .   . .
"7001" "78"  "MUSA. FUL BANU"     . . .  .  50 .
"7001" "95"  "RAJU MIAH"          . . . 80   . .
"7001" "101" "MOUS.MOMINA"        . . .  .   . .
"7001" "208" "OSNA"               . . .  .   . .
"7001" "225" "JAHANARA"           . . .  .  50 .
"7001" "227" "MAKSUDA"            . . .  .   . .
"7001" "313" "KULSUM"             . . .  .   . .
"7001" "383" "HABIBA"             . . .  .   . .
"7001" "390" "MUSA. JESMIN BEGUM" . . .  3 100 .
"7001" "391" "RINA"               . . .  .  55 .
"7001" "398" "MUSA.AKHI"          . . .  .   . .
"7001" "406" "JESMIN AKTAR"       . . .  .   . .
"7001" "672" "MS.MARJIA AKTER"    . . . 50   . .
"7001" "699" "MS.RINA"            . . .  .   . .
"7001" "783" "MS.SAHERA BEGEUM"   . . .  .   . .
"7001" "817" "NASIMA"             . . .  .   . .
"7001" "883" "MS.SAJEDA BEGUM"    . . .  .   . .
"7001" "885" "MS.HOSNEARA KHATUN" . . .  .   . .
"7001" "896" "MS.HUJERA BEGUM"    . . .  .   . .
end

The variables a through b are different processes, each of which uses a different machine. My end goal is to identify how many machines each individual uses. If a number is reported for an individual for a certain process, it tells us that they are able to use the machine associated with that process (for instance, the individual with id 95 can only use the machine associated with waistbeltattachsn). So far I have this:

Code:
foreach var of varlist a-b {
    qui gen `var'm = "1" if `var' != .
}

foreach var of varlist am-bm {
    qui replace `var' = "Kansai" if "`var'" == "waistbeltattachknsbyfolderm"
    qui replace `var' = "Kansai" if "`var'" == "waistbelttopstitchknsm"
    qui replace `var' = "SNLS" if "`var'" == "waistbeltattachsnm"
    qui replace `var' = "SNLS" if "`var'" == "waistbelttopstitchsnm"
    qui replace `var' = "SNLS" if "`var'" == "waistbeltmouthclosem"
}
This part creates new variables that identify the machine associated with each process (e.g. the Kansai machine for 'waistbeltattachknsbyfolder'); Kansai and SNLS are the types of machines. The issue is that the new `var'm variables all contain the machine type regardless of whether the individual has a score reported for that process.
What I would like to do is something like the following, although I am not sure how it should be written:

Code:
foreach var of varlist am-bm {
replace `var' = "" if subinstr(`var',-1,1) ==.
}
Here, I would hope it would take the variable, say "waistbeltattachknsbyfolderm" in the first instance, and replace it with an empty cell if the variable "waistbeltattachknsbyfolder" (without the m at the end) is missing.

However, I get an invalid-syntax error. I think that if I could supply a second varlist, so that the loop runs over the two varlists in parallel rather than over their product, it would work perfectly, but I am not sure how to do that!
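
For what it is worth, a hedged sketch of two possibilities, using the variable names from the post: subinstr() expects string arguments, so one option is to recover the source variable name with substr(); another is to walk two parallel lists (process, machine) with forvalues.

Code:
* option 1: strip the trailing "m" to find the source variable
foreach var of varlist am-bm {
    local src = substr("`var'", 1, length("`var'") - 1)
    replace `var' = "" if missing(`src')
}

* option 2: iterate two parallel lists in a single pass
local procs    waistbeltattachknsbyfolder waistbelttopstitchkns waistbeltattachsn waistbelttopstitchsn
local machines Kansai                     Kansai                SNLS              SNLS
local n : word count `procs'
forvalues i = 1/`n' {
    local p : word `i' of `procs'
    local m : word `i' of `machines'
    capture drop `p'm
    gen `p'm = cond(!missing(`p'), "`m'", "")
}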

I am not sure whether I have taken far too roundabout an approach here, or whether someone has a better idea of how to do this!
Could really do with some help,

Thanks statalisters!

Generate age variable

Dear Statalisters,

I have a data set with different waves. Now I want to generate an age variable. I have the birth year and the birth month, as well as the day, month, and year of the interview. These are all separate variables, which are numeric and discrete.
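
A minimal sketch, assuming the variables are named birth_year, birth_month, int_year, and int_month (without the birth day, ties in the birth month cannot be resolved exactly):

Code:
gen age = int_year - birth_year
replace age = age - 1 if int_month < birth_month   // birthday not yet reached in the interview year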

Thanks for your help

Best,

Friederike

Control Function Approach and Nonlinear Estimation

Hi,
I have a question about using a control function approach to deal with endogenous regressors in nonlinear estimation, in particular comparing a two-step approach with "ivpoisson cfunction".

Suppose you have a set of exogenous variables X and instruments Z, and Y2 is endogenous. Say you want to regress Y1 on X and Y2 using Poisson regression. Then estimating

E(Y1|X,Y2) = exp(b1*X + b2*Y2)

will give inconsistent estimates.

As far as I know, you could use a control function approach: regress Y2 on X and Z by OLS, calculate the residuals u_hat, and include those residuals in the Poisson regression

E(Y1|X,Y2,u_hat) = exp(b1*X + b2*Y2 + b3*u_hat)

to produce consistent estimates. (To obtain valid standard errors that account for the first-stage estimation, one has to apply, e.g., bootstrapping.)
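
In Stata terms, a minimal sketch of this two-step procedure might look as follows (hypothetical variable names; the bootstrap wrapper for the standard errors is omitted):

Code:
regress y2 x1 x2 z1 z2                  // first stage: endogenous regressor on exogenous vars and instruments
predict double u_hat, residuals
poisson y1 x1 x2 y2 u_hat, vce(robust)  // second stage: include the control function u_hat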

I compared the latter with the command "ivpoisson cfunction". I was wondering why my point estimates are different. Any ideas?

Days between dates in a long dataset with all dates in one column

Hello,

I have long cohort data and would like to know the number of days between visit dates for each unique patient id, without reshaping the data. How can I compute the number of days between two consecutive visits, and the number of days between the first visit and some later visit?
The data looks like:
PatientID Visit# DateofVisit
10-915 1 1-May-15
10-915 2 13-Dec-15
10-915 3 18-Feb-16
10-915 4 21-Mar-16
10-915 5 19-Apr-16
10-915 6 17-May-16
10-915 7 16-Jun-16
10-915 8 11-Jul-16
10-986 1 30-Aug-15
10-986 2 1-Sep-15
10-986 3 20-Sep-15
10-986 4 1-Dec-15
11-1050 1 2-Jul-16
11-1050 2 3-Sep-16
11-1050 3 14-Oct-16
2-109 1 4-Mar-15
2-109 2 25-May-15
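
A minimal sketch, assuming the visit date is held as a string such as "1-May-15" in a variable called dovisit, the patient identifier in patientid, and the visit number in visitnum (names hypothetical, since the actual ones are not shown in full):

Code:
gen visitdate = date(dovisit, "DM20Y")          // "20Y" reads the two-digit years as 20xx
format visitdate %td
bysort patientid (visitnum): gen days_prev  = visitdate - visitdate[_n-1]   // gap to previous visit
bysort patientid (visitnum): gen days_first = visitdate - visitdate[1]      // gap to first visit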
Best,
Preeti

Does it make sense to use robust VCE clustering at an aggregate level for cross-country data?

Hello Statalisters,

The Stata manual mentions that "The cluster–robust VCE estimator requires that there are many clusters and the disturbances are uncorrelated across the clusters. The panel variable must be nested within the cluster variable."
Also, some econometrics materials suggest two conditions under which robust VCE clustering at aggregate level should be used:
1. Data are nested (e.g. student-classroom-school)
2. Explanatory variables are at aggregate level, but dependent variable is at individual level.

So I feel that robust VCE clustering at aggregate level (rather than individual level) applies more to survey data that has a clear nested structure.

My question is: for a cross-country database with each country as the panel variable, does it make sense to cluster at the continental level, -vce(cluster continent)-, rather than at the country level, -vce(robust)- or -vce(cluster panelvar)-? Here continent refers to regions like Southeast Asia, North America, and so on.

The relationship between country and continent is somewhat similar to that between, for example, student and class. But there are two differences:
1. the nested structure of country-continent is naturally formed, unlike the nested structure of student-class, which may be organised by people's decisions;
2. both the depvar and the indepvars are at the same level (a country-level depvar is regressed on country-level indepvars).

So these two points cast doubt on the use of robust VCE clustering at the continental level for cross-country data, but I am not quite sure whether this doubt holds.

Thank you very much.

Rcall command: "too many numeric literals, r(130)"

I have started to use the Rcall command in Stata 14.2, and after each R: command I get the message "too many numeric literals, r(130);". Otherwise the commands go through fine. Any idea how to get rid of this error message?

Merging two data files

I have two data files (2012 and 2013) on Medicare Provider Utilization and Payment Data (from data.cms.gov). An example showing the ten lowest and ten highest observations is given below:
HTML Code:
. hilo npi code ama
10 lowest and highest observations on npi

  +-------------------------------+
  |        npi    code        ama |
  |-------------------------------|
  | 1003000126   99222     135.25 |
  | 1003000126   99223     198.59 |
  | 1003000126   99231      38.75 |
  | 1003000126   99232      70.95 |
  | 1003000126   99233     101.74 |
  |-------------------------------|
  | 1003000126   99238      71.06 |
  | 1003000126   99239     105.01 |
  | 1003000134   88304      11.64 |
  | 1003000134   88305   37.72996 |
  | 1003000134   88311       12.7 |
  +-------------------------------+

  +-----------------------------+
  |        npi    code      ama |
  |-----------------------------|
  | 1992999825   99213    76.05 |
  | 1992999825   99214    80.25 |
  | 1992999825   99214   112.18 |
  | 1992999825   99215   150.24 |
  | 1992999874   99221     96.1 |
  |-----------------------------|
  | 1992999874   99222   130.23 |
  | 1992999874   99223   191.48 |
  | 1992999874   99232    68.59 |
  | 1992999874   99233    98.33 |
  | 1992999874   99239   101.33 |
  +-----------------------------+
For each npi (the primary identifier) there are several different codes (code) and their respective outcomes (ama). As such, when I use npi as the match variable in -merge-, it results in an error (r(459)).
Please suggest a correct way to merge such files. Thank you!
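
For what it is worth, a minimal sketch under the assumption that each (npi, code) pair appears at most once per year's file (file names below are hypothetical):

Code:
use medicare2012, clear
merge 1:1 npi code using medicare2013

Two hedged caveats: -duplicates report npi code- will show whether the pair really is unique within each file, and if both files contain a variable with the same name (such as ama), merge keeps the master's values, so renaming first (e.g. ama2012 and ama2013) avoids losing the 2013 figures.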

Extracting data from multiple excel files

Hi guys,

Thank you in advance for any advice. The problem I have encountered is running a loop to extract data from multiple Excel files into one Stata file. The task I wish to complete is described below:

I have approximately 800 Excel files in one folder on my computer. Each .xls file is a quarterly report for a specific bank in my dataset. The name of each file corresponds to the specific bank, year, and quarter, as follows: 'bank-name-year-quarter.xls'. From each file I need to extract two columns (the variable names in one, and the values in the other). I also need Stata to recognise the year and quarter from the file names in order to create a panel data set.

Ideally the panel data set would have the following structure in long form:

Bankname Year Quarter {variables 1-n}.

I am new to writing a loop for this purpose, and have not had much success in writing the appropriate code.
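
For what it is worth, a rough sketch under strong assumptions: the bank names are known in advance (the list below is purely hypothetical), file names follow bankname-year-quarter.xls exactly, and each sheet has variable names in its first row.

Code:
clear all
tempfile combined
save `combined', emptyok

local banks "alpha beta gamma"                  // hypothetical bank names
forvalues year = 2000/2016 {
    forvalues q = 1/4 {
        foreach bank of local banks {
            capture import excel using "`bank'-`year'-`q'.xls", firstrow clear
            if _rc continue                     // no file for this combination
            gen bankname = "`bank'"
            gen year     = `year'
            gen quarter  = `q'
            append using `combined'
            save `combined', replace
        }
    }
}
use `combined', clear

One caveat: if the same column is read as string in one file and numeric in another, -append- will complain about a type mismatch, which needs handling case by case.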

Hopefully someone has dealt with a similar issue before!

Regards,

Sascha

ppml and multicollinearity

Hi, I have a problem with my data set. I have observed severe multicollinearity among the independent variables (VIF > 10). Can I apply the PPML method to this data set without addressing this multicollinearity (correlation between the independent variables)?

Separating my minimum value?

Hello,

I am having difficulty finding information on this, likely because I am not wording the question properly. If I have a large data set of heart rates and I want to separate it into two groups (i.e. people with heart rates less than 100 and people with heart rates greater than 100), how would I do that? Thank you for your time.
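
A minimal sketch, assuming the variable is called heartrate (the post does not say where exactly 100 should go; here it is placed in the upper group):

Code:
gen byte hr_group = heartrate >= 100 if !missing(heartrate)
label define hr_group 0 "below 100" 1 "100 or above"
label values hr_group hr_group
tabstat heartrate, by(hr_group) statistics(n mean min max)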