Channel: Statalist
Viewing all 73200 articles

Loop through each observation of a string variable located in another dataset

Hi all, I need to loop through each observation of a (string) variable of another dataset. What is the best way of doing this?

I am trying to do:

foreach x in var_ext {
replace var1 = strpos(var2,x)
}

where var_ext is a variable in another dataset while var1 and var2 are variables in the dataset that is open.
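One possible approach (a sketch; the other dataset's file name is assumed to be other.dta): pull the distinct values of var_ext into a local macro with -levelsof-, then loop over them. Note that -foreach x in var_ext- loops over the literal word "var_ext", not over the variable's observations.

```stata
preserve
use "other.dta", clear
levelsof var_ext, local(vals)   // distinct values of var_ext
restore

gen var1 = 0
foreach x of local vals {
    * record position of the first matching external value in var2
    replace var1 = strpos(var2, "`x'") if var1 == 0
}
```

If you need every observation rather than the distinct values (e.g. duplicates matter), reading the values via a -merge- or, in Stata 16+, a frame would be alternatives.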

Bonferroni-corrected correlation matrix to Excel

Dear all,

I want to create a couple of correlation matrices (with Bonferroni correction) and include these into MS word.
I want to use Bonferroni adjustments since I have around 20 variables.

Stata's -pwcorr- command allows for Bonferroni adjustments, but does not seem to offer saving the output into a file.
And due to the number of variables, the Stata output is quite distorted, which makes it unreadable.

The Stata command -mkcorr- allows storing the output in a file, but does not seem to allow the Bonferroni correction.

Is there any alternative way to solve this problem?

Many thanks,
Andreas
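One workaround (a sketch with placeholder variable names and output file): the Bonferroni adjustment just multiplies each pairwise p-value by the number of comparisons, so you can compute it by hand from -correlate- and write the results with -putexcel-. With 20 variables there are 20*19/2 = 190 comparisons.

```stata
putexcel set corrmatrix.xlsx, replace
local m = 190                        // number of pairwise comparisons
quietly corr var1 var2               // one pair; wrap in a double loop over your varlist
local rho = r(rho)
local n   = r(N)
local t   = `rho' * sqrt(`n' - 2) / sqrt(1 - `rho'^2)
local p   = min(1, `m' * 2 * ttail(`n' - 2, abs(`t')))   // Bonferroni-adjusted, capped at 1
putexcel A1 = "var1 vs var2" B1 = `rho' C1 = `p'
```

Looping this over all pairs and writing each result to its own row gives a full matrix that opens cleanly in Excel or Word.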

xtoprobit: how to work around the missing fe option

Hi, Stata users!
I have panel data with an ordinal dependent variable, so the natural econometric model is an ordered probit. Unfortunately, xtoprobit does not offer a fixed-effects option. Does anyone have a suggestion for how I should handle this? Is it feasible to run xtoprobit depvar indvar1 indvar2 i.Country i.Year (since I want to specify fixed effects at the country and year level)?
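The specification you describe would effectively be a pooled model with country and year dummies; one common workaround (a sketch, not a recommendation: whether it is appropriate depends on your setting, and the incidental-parameters problem can bite when T is small) is:

```stata
* pooled ordered probit with country and year dummies,
* standard errors clustered at the country level
oprobit depvar indvar1 indvar2 i.Country i.Year, vce(cluster Country)
```

Note that xtoprobit is a random-effects estimator; adding i.Country on top of country-level random effects largely absorbs them.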

Multi-level model (xtmelogit) vs. adjusting for PSU and strata (svy: logit)?

What is the appropriate way to specify models that incorporate two levels of clustering (if that is the right term)? I initially used xtmelogit (level 1=child, level 2=sibling groups, level 3=counties). These are experimental data; the intervention was implemented separately in 9 counties and served children (many in sibling groups). A colleague recommended that, since I don’t care about estimating county-level impacts, xtmelogit might be overkill and I could run models simply adjusting for strata (county) and PSU (sibling group) which I then did using svy: logit. (If I understand correctly, this suggestion is also made by the authors of GLLAMM.) However, results using the two approaches differ, which makes me think either that I’m doing something wrong, or that one approach is better than the other. Can anyone advise? Thank you in advance!

Below I've provided some sample output and definitions of my key variables.

EXPER: 1=treatment, 0=control (Independent variable of interest)
MOMCLOSE: 1= good outcome, 0=bad outcome
Siteid=county identifier (level 3 id, with dummy indicators called site# )
randcid = case id/sibling group id (level 2 id)
fpcvar =(fpc, calculated per county, number of respondents divided by number of youth in the original sample)


. svyset randcid, strata (siteid) fpc(fpcvar)

pweight: <none>
VCE: linearized
Single unit: missing
Strata 1: siteid
SU 1: randcid
FPC 1: fpcvar

MODEL 1

.
. foreach var in momclose {
2. svy: logit `var' exper, or
3. }
(running logit on estimation sample)

Survey: Logistic regression

Number of strata = 9 Number of obs = 303
Number of PSUs = 263 Population size = 303
Design df = 254
F( 1, 254) = 0.21
Prob > F = 0.6451

------------------------------------------------------------------------------
| Linearized
momclose | Odds Ratio Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
exper | 1.053215 .1184012 0.46 0.645 .8440495 1.314214
_cons | .6947368 .0540147 -4.68 0.000 .5961066 .8096862
------------------------------------------------------------------------------

MODEL 2
.
. foreach var in momclose {
2. svy: logit `var' exper site268 site269 site271 site272 site273 site274 sit
> e275 site276, or
3. }
(running logit on estimation sample)

Survey: Logistic regression

Number of strata = 9 Number of obs = 303
Number of PSUs = 263 Population size = 303
Design df = 254
F( 9, 246) = 6.36
Prob > F = 0.0000

------------------------------------------------------------------------------
| Linearized
momclose | Odds Ratio Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
exper | 1.135454 .1281456 1.13 0.261 .9091681 1.41806
site268 | 1.169825 .282946 0.65 0.517 .7265324 1.883593
site269 | .5536779 .1434254 -2.28 0.023 .3324338 .9221662
site271 | .8257458 .1561435 -1.01 0.312 .5690086 1.198323
site272 | 1.252144 .280899 1.00 0.317 .8049818 1.947701
site273 | 1.53346 .2869998 2.28 0.023 1.060719 2.216892
site274 | .5546835 .1121758 -2.91 0.004 .3724597 .8260591
site275 | 3.282409 .9750136 4.00 0.000 1.828688 5.891772
site276 | .7522824 .1431926 -1.50 0.136 .5171112 1.094405
_cons | .6761679 .0953349 -2.78 0.006 .5122318 .8925705
------------------------------------------------------------------------------

. svyset, clear

.
MODEL 3

. xtmelogit momclose exper || siteid: || randcid: , or

Refining starting values:

Iteration 0: log likelihood = -206.31694 (not concave)
Iteration 1: log likelihood = -203.61326
Iteration 2: log likelihood = -202.51347

Performing gradient-based optimization:

Iteration 0: log likelihood = -202.51347
Iteration 1: log likelihood = -202.47848
Iteration 2: log likelihood = -202.4783
Iteration 3: log likelihood = -202.4783

Mixed-effects logistic regression Number of obs = 303

--------------------------------------------------------------------------
| No. of Observations per Group Integration
Group Variable | Groups Minimum Average Maximum Points
----------------+---------------------------------------------------------
siteid | 9 20 33.7 57 7
randcid | 263 1 1.2 4 7
--------------------------------------------------------------------------

Wald chi2(1) = 0.11
Log likelihood = -202.4783 Prob > chi2 = 0.7348

------------------------------------------------------------------------------
momclose | Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
exper | 1.144452 .455809 0.34 0.735 .5243044 2.49811
_cons | .5649229 .1772804 -1.82 0.069 .3054012 1.044979
------------------------------------------------------------------------------

------------------------------------------------------------------------------
Random-effects Parameters | Estimate Std. Err. [95% Conf. Interval]
-----------------------------+------------------------------------------------
siteid: Identity |
sd(_cons) | .2495608 .4194843 .0092555 6.729036
-----------------------------+------------------------------------------------
randcid: Identity |
sd(_cons) | 1.9571 .7909925 .8863118 4.321548
------------------------------------------------------------------------------
LR test vs. logistic regression: chi2(2) = 6.42 Prob > chi2 = 0.0404

Note: LR test is conservative and provided only for reference.

.
.
end of do-file

Stata cutting off the first letter of a string variable's observations

Hi everyone,

For whatever reason, Stata is cutting off the first character in my county variable (which is my first variable, if that makes any difference):
ex. Baldwin County shows up fine in my CSV, but when I import to Stata, it shows up only as "aldwin County"
All my other variables are fine.

Is there any way I can fix this?

Thank you,
Diana
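A common cause is a byte-order mark (BOM) or an encoding mismatch at the very start of the CSV, which affects only the first field of the file. A sketch of things to try (the file name is a placeholder):

```stata
* re-import specifying the encoding explicitly
import delimited "counties.csv", clear encoding(utf-8)

* if that doesn't help, try the Windows codepage Excel often writes
import delimited "counties.csv", clear encoding(windows-1252)
```

If neither works, opening the CSV in a text editor and re-saving it as plain UTF-8 without a BOM usually clears the problem.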

Create variable with distinct prices

Dear all,
In my data set I have the variable “board”, which records the size of boards of directors, taking values 4-34.
I want to create the variable “board_sizeC” which will take the values:
S if “board” = 4-7
R if “board” = 8-10
L if “board” = 11-34
How can I do it?

I use StataMP 13.1 in Windows 10.
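One way to create the grouped variable (a sketch) is nested -cond()- calls with -inrange()-:

```stata
gen board_sizeC = cond(inrange(board, 4, 7),   "S", ///
                  cond(inrange(board, 8, 10),  "R", ///
                  cond(inrange(board, 11, 34), "L", "")))
```

Alternatively, -egen board_sizeC = cut(board), at(4 8 11 35)- gives numeric groups (coded by their lower bounds) that you can then label.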

Zero-inflated negative binomial models taking forever

Dear Stata users,

Help would be much appreciated in figuring out what's wrong with my ZINB models.
Here's the syntax I'm using:

zinb DV controls (16 of them) c.IV1##c.IV2, inflate(inflation variable) vuong

Here, IV1 is a categorical variable, and I have 3 versions of it. In version 1, it's a 2-group variable (IV=0, 1); in version 2, it's a 3-group variable (IV=0, 1, 2); in version 3, it's a 9-group variable (IV=0, ..., 8).
The models when using the 2-group variable run fine.
However, when using the 3-group and 9-group variables, the models take forever to run--they don't even start the convergence process.

I do have a lot of observations (~650K), but I doubt this is causing the problem.
I also tried running separate models with -if- qualifiers for each group, instead of including the categorical variables, but the problem remains.

What could be the possible causes for these long running times? Please let me know if you need further information to make any suggestions or guesses.

Many thanks,
Daniel
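Since IV1 is categorical, c.IV1 treats it as continuous; with 3 or 9 groups you need factor-variable notation, and the -difficult- maximization option sometimes helps stalled optimizations (a sketch; variable names are placeholders):

```stata
* i.IV1 creates a dummy per group instead of treating IV1 as continuous
zinb DV ctrl1-ctrl16 i.IV1##c.IV2, inflate(z) vuong difficult
```

Feeding starting values from a simpler model without the interaction (via the -from()- maximize option) can also speed convergence with ~650K observations.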

Transforming negative values in order to solve heteroskedasticity

Hi all,

Just a question regarding heteroskedasticity. I am using the level of interest rates, which are negative for some countries during my sample period. The problem is that my dependent variable and this specific variable show high levels of heteroskedasticity. I was wondering how I can solve this issue by transforming the variables. Transforming the dependent variable alone does not help, so I also need to transform my independent variable.

Further, I am using a fixed-effects model and am aware that vce(robust) will also deal with heteroskedasticity. However, the robust and conventional standard errors show large differences, which is why I want to control the heteroskedasticity in order to prevent model misspecification.

I cannot use another variable that only has positive values. I was wondering if it is possible to rescale my variable by adding a constant of at least |minimum value| + 0.00001 to all observations (x + constant).
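One transformation that is defined for negative values and behaves like a log for large magnitudes is the inverse hyperbolic sine (a sketch; variable names are placeholders):

```stata
gen dep_ihs  = asinh(depvar)          // asinh(x) = ln(x + sqrt(x^2 + 1))
gen rate_ihs = asinh(interest_rate)
```

Unlike log(x + constant), the IHS does not depend on an arbitrary shift, which matters because results from shifted-log transformations can be sensitive to the constant chosen.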

Thanks a lot,

Daniel

Very different results between Heckman Maximum likelihood vs. Twostep

I ran the Heckman selection model, using both ML and two-step estimation of the returns to education, but I get completely different results.

With the ML method, my t-statistic on educ is statistically significantly different from zero, but with the two-step method it becomes highly insignificant!

How can they give such different results?
My dependent variables and exclusion restrictions are identical for both methods.

Help please?

Foreach command help

Hi all
I have a table in long format with a municipality identifier, a column for the year of observation, and an income variable.
I would like to generate a column containing the average income of each municipality across all years (the mean for each municipality).

As an example, the table is like:

mun | year | income | average income
a   | 2000 |     10 | ??
b   | 2000 |     12 | ??
c   | 2000 |     12 | ??
a   | 2001 |    561 | ??
b   | 2001 |     51 | ??
c   | 2001 |     65 | ??


I have been trying to use foreach command without success.
Any help ?
thanks a lot

Augusto
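No loop is needed here; -egen- with -bysort- does this in one line:

```stata
* mean income within each municipality, repeated on every row of that municipality
bysort mun: egen average_income = mean(income)
```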

heckman selection model

heckman lwage educ exper expersq nwifeinc age, select( inlf= educ exper expersq age faminc )
heckman lwage educ exper expersq nwifeinc age, select( inlf= educ exper expersq age faminc ) twostep


(inlf= takes a binary value, either you are in labor force or not)

I ran the Heckman selection model, using both ML and two-step estimation of the returns to education, but I get completely different results.

With the ML method, my t-statistic on educ is statistically significantly different from zero, but with the two-step method it becomes highly insignificant!

How can they give such different results?
My dependent variables and exclusion restrictions are identical for both methods.

Help please?

Minimum value of remaining observations

I have a dataset with panel observations of patients' test results and I want to mark patients who meet two criteria:
  1. Two consecutive tests, at least two weeks apart, both fall under 60.
  2. Two non-consecutive tests, at least 3 months apart, both fall under 60.
The panel is sorted on patid and testdate. So far, I've only been able to code the first condition.

bysort patid (testdate): gen ir = 1 if ///
    test < 60 & ///
    testdate[_n-1] <= testdate - 14 & test[_n-1] < 60


In the second condition any subsequent value works, not only the next one, so I can't use something like testdate[_n+1]. Is there a way of referring to all remaining observations of a group?
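One way to look at all remaining observations of a group is -rangestat- (from SSC), which computes statistics over a window of observations defined relative to each row; a sketch for the second condition:

```stata
* ssc install rangestat
* minimum test value among this patient's tests at least 90 days later
rangestat (min) mintest = test, interval(testdate 90 .) by(patid)
gen ir2 = test < 60 & mintest < 60
```

Here interval(testdate 90 .) restricts the window to observations whose testdate is 90 or more days after the current one, with no upper bound.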

Propensity Score Matching on Panel Data

Hi all,

I'm currently looking to perform a propensity score matching (PSM) estimator on panel data. My study consists of 39 countries over a 23 year period (1990 - 2012), and I'm trying to ascertain the impact of my treatment variable, which is a particular policy. To give you an idea of the treatment variable data, if the Czech Republic implemented this policy in 2002, it would be assigned a dummy value of 0 before 2002 and a 1 from 2002 to 2012. I was wondering if it would be appropriate to conduct PSM on this dataset as it is? In this case, for example, the Czech Republic in 2002 would be matched with several countries that are most similar to it (based on my control variables), with the only difference being that other countries did not implement this policy. Could I perhaps specify my matches, so that the Czech Republic in 2002 is only able to match with observations in 2002 (this way I could account for heterogeneity across time)? I would really appreciate your help. Thank you very much.

Duke
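One way to enforce within-year matches is simply to estimate the matching year by year (a sketch; the outcome and covariates are placeholders, and with only 39 countries per year the pool of potential matches will be thin):

```stata
forvalues y = 1990/2012 {
    display as text "Year `y'"
    * propensity-score matching restricted to observations from year `y'
    capture noisily teffects psmatch (outcome) (policy x1 x2 x3) if year == `y'
}
```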

HEGY Seasonal Unit root

Hi,

Just as a suggestion: a routine to perform seasonal unit root (HEGY) tests for monthly time series would be very useful.

Thanks.

Reshape Help

Hello Statalist Users,

I am using Stata 13.1 and am trying to organize county level data. I was able to reshape my data into a long form so that the year data is a variable; however, I need to reshape the data wide in order to create new variables which are under the "Description" variable. Below is a portion of the data using two counties and a few years.

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input int fips str11 geoname str52 description int YEAR str5 y_
1001 "Autauga, AL" "Cash receipts from marketings (thousands of dollars)" 1970 "10026"
1001 "Autauga, AL" "Cash receipts from marketings (thousands of dollars)" 1971 "12390"
1001 "Autauga, AL" "Cash receipts from marketings (thousands of dollars)" 1972 "13945"
1001 "Autauga, AL" "Cash receipts: Crops"                                 1970 "4608" 
1001 "Autauga, AL" "Cash receipts: Crops"                                 1971 "6846" 
1001 "Autauga, AL" "Cash receipts: Crops"                                 1972 "6697" 
1001 "Autauga, AL" "Cash receipts: Livestock and products"                1970 "5418" 
1001 "Autauga, AL" "Cash receipts: Livestock and products"                1971 "5544" 
1001 "Autauga, AL" "Cash receipts: Livestock and products"                1972 "7248" 
1003 "Baldwin, AL" "Cash receipts from marketings (thousands of dollars)" 1970 "27591"
1003 "Baldwin, AL" "Cash receipts from marketings (thousands of dollars)" 1971 "30553"
1003 "Baldwin, AL" "Cash receipts from marketings (thousands of dollars)" 1972 "33743"
1003 "Baldwin, AL" "Cash receipts: Crops"                                 1970 "17462"
1003 "Baldwin, AL" "Cash receipts: Crops"                                 1971 "20442"
1003 "Baldwin, AL" "Cash receipts: Crops"                                 1972 "21546"
1003 "Baldwin, AL" "Cash receipts: Livestock and products"                1970 "10129"
1003 "Baldwin, AL" "Cash receipts: Livestock and products"                1971 "10111"
1003 "Baldwin, AL" "Cash receipts: Livestock and products"                1972 "12197"
end
label var fips "FIPS" 
label var geoname "GEONAME" 
label var description "DESCRIPTION"
I need the output to look like this:

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input int fips str11 geoname int year long cashreceiptsfrommarketingsthousa int(cashreceiptscrops cashreceiptslivestockandproducts)
1001 "Autauga, AL" 1970 10026  4608  5418
1001 "Autauga, AL" 1971 12390  6846  5544
1001 "Autauga, AL" 1972 13945  6697  7248
1003 "Baldwin, AL" 1970 27591 17462 10129
1003 "Baldwin, AL" 1971 30553 20442 10111
1003 "Baldwin, AL" 1972 33743 21546 12197
end
label var year "YEAR" 
label var cashreceiptsfrommarketingsthousa "Cash receipts from marketings (thousands of dollars)" 
label var cashreceiptscrops "Cash receipts: Crops" 
label var cashreceiptslivestockandproducts "Cash receipts: Livestock and products"

I tried to use a reshape code as I have used in previous examples; however, when I execute this one, the data disappears and says there is no data for description.

Code:
reshape wide y_, i( fips geoname YEAR) j( description ) string

What am I missing? Should I have not reshaped long first?

Thank you for any help.

Amie Osborn
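The reshape fails because the values of description contain spaces and punctuation, which cannot appear in variable names. One fix (a sketch) is to condense them with strtoname() first, shortened so that "y_" plus the stub stays under Stata's 32-character limit, then destring the result:

```stata
* turn each description into a legal, short variable-name stub
gen desc = substr(strtoname(description), 1, 25)
reshape wide y_, i(fips geoname YEAR) j(desc) string
destring y_*, replace     // y_ was imported as string
```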





could not calculate numerical derivatives -- flat or discontinuous region encountered

Hello Statalist Users,
I am encountering a problem while estimating a double-hurdle model using the -craggit- command: I receive the message <could not calculate numerical derivatives -- flat or discontinuous region encountered>. The frustrating part is that Stata was producing good results three days ago with the same variables. I have made no changes to the variable list or observations.
I found some discussion of this issue on Statalist, but it was not helpful for my case. Is there any solution to this problem?
Thank you,
Dadhi

Calculate the asset growth rate for quarterly panel data

Dear Statalists,

My main target here is to calculate the asset growth rate over the last quarter for a specific firm; the data are a quarterly panel, for example:
firm_id year quarter assets
37 1997 1 58891
37 1997 2 57317
37 1997 3 57993
37 1997 4 60834
37 1998 1 61299
37 1998 2 61260
37 1998 3 60447
37 1998 4 61351
37 1999 1 61655
37 1999 2 62144
37 1999 3 63308
37 1999 4 63963
242 1997 1 19925
242 1997 2 20764
242 1997 3 19845
242 1997 4 20676
242 1998 1 20663
242 1998 2 20898
242 1998 3 20761
242 1998 4 21914
242 1999 1 21404
242 1999 2 21404
242 1999 3 21824
242 1999 4 22885
279 1997 1 89224
279 1997 2 88541
279 1997 3 89206
279 1997 4 87353
279 1998 1 86906
279 1998 2 86241
279 1998 3 88516
279 1998 4 90896
279 1999 1 90206
279 1999 2 92940
279 1999 3 92302
279 1999 4 93211

firm_id represents a unique firm identifier, so, is "egen" function appropriate here?

All I need is to generate another variable named as assets_growth_rate=(Assets in Quarter_n+1-Assets in Quarter_n)/Assets in Quarter_n.

Many thanks for your help in advance.

With kind regards,

Cong
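-egen- isn't needed here; declare the panel with a quarterly date and use the lag operator (a sketch using your variable names):

```stata
gen qdate = yq(year, quarter)    // quarterly date from year and quarter
format qdate %tq
xtset firm_id qdate
* growth over the last quarter
gen assets_growth_rate = (assets - L.assets) / L.assets
```

Written this way the rate is the growth into the current quarter; (F.assets - assets)/assets would instead give the growth into the next quarter, matching the formula exactly as you wrote it.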

Line plot with 2 Y axes

Dear all,

Imagine I have groups of 5 variables called gdp1 gdp2 gdp3 gdp4 gdp5, population1 population2 population3 population4 population5, ... with many observations in each variable, corresponding to 5 different years (I have the variable year, ranging from 1 to 5).

I can't find how to plot the means and standard deviations in a line with year on the horizontal axis. It would also be perfect to have two different axes, but I don't even manage with one.

How can I do this?

thank you very much

joan
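Assuming the data are first reshaped long so there is a single gdp and a single population variable (hypothetical names), you can collapse to means and standard deviations by year and plot with a second axis:

```stata
preserve
collapse (mean) gdp_m = gdp (sd) gdp_sd = gdp (mean) pop_m = population, by(year)
twoway (line gdp_m year) (line pop_m year, yaxis(2)), ///
    ytitle("GDP") ytitle("Population", axis(2)) xtitle("Year")
restore
```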

LSDVC - too many instruments


I have a problem with the implementation of the LSDVC method in Stata. I have two panels (N=8, T=120 and N=16, T=180), but when I try to run the command, the software reports a "too many instruments" error. Can I reduce the number of lags with the LSDVC command?
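Assuming the command is -xtlsdvc- (from SSC): the instruments come from the initial GMM estimator, so choosing the Anderson-Hsiao initial estimator, which uses far fewer instruments than Arellano-Bond or Blundell-Bond, may avoid the error (a sketch with placeholder variables):

```stata
* initial(ah) = Anderson-Hsiao starting estimator; vcov(50) = bootstrap SEs, 50 reps
xtlsdvc depvar x1 x2, initial(ah) vcov(50)
```

Also note that the Nickell bias that LSDVC corrects is of order 1/T; with T of 120-180 it is already small, so plain fixed effects (xtreg, fe) may be adequate here.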

How to get an overall p-value for an independent categorical variable using binary logistic regression ?

Hello,

I would like to know how to get an overall p-value for an independent categorical variable in a binary logistic regression. When I run the model, I get p-values for each group of the categorical variable, but not an overall p-value for the variable itself.

Could somebody help me?

Thank you very much!
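-testparm- after the regression gives a joint Wald test across all levels of the categorical variable (a sketch with placeholder names):

```stata
logit outcome i.group age sex
testparm i.group        // overall p-value for group
```

A likelihood-ratio alternative: fit the model with and without i.group, -estimates store- each, then compare with -lrtest-.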