Channel: Statalist

nested logit model cannot converge

Hi all,

I’m running nested logit models and I always encounter the following outcome:
……
Iteration 444: log pseudolikelihood = -4999.7542 (backed up)
BFGS stepping has contracted, resetting BFGS Hessian
Iteration 445: log pseudolikelihood = -4999.7542 (backed up)
BFGS stepping has contracted, resetting BFGS Hessian
Iteration 446: log pseudolikelihood = -4999.7542 (backed up)
cannot compute an improvement -- flat region encountered

Does anyone know what might be wrong? It is quite disappointing to wait so long for the iterations only to end up with this result.
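For reference, a hedged sketch, not from the original post, of options often tried when nlogit stalls in a flat region (all variable names below are placeholders):

Code:
* -difficult- switches to a stepping algorithm better suited to flat regions
nlogit choice x1 || nestvar: , base(1) || altvar: x2, case(caseid) difficult
* starting values from an earlier or simpler fit can also help:
* matrix b0 = e(b)
* nlogit ... , ... from(b0, copy)

A flat region can also indicate a poorly identified dissimilarity parameter, which is worth checking in the interim output.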
Best regards,
Mei

calculating OR with 95% CI from margins output

How can one calculate 95% CI of OR using output of margins command? For example in the margins output below-
------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     _at#var |
        1 0  |   .3653013   .0049853    73.28   0.000     .3555303    .3750724
        1 1  |    .409233    .017422    23.49   0.000     .3750866    .4433794
        2 0  |   .3928284   .0064312    61.08   0.000     .3802235    .4054332
        2 1  |   .5142021    .023704    21.69   0.000     .4677432    .5606611
------------------------------------------------------------------------------

The OR for 2 0 vs 1 0 would be .3928284/.3653013 but how can we calculate the corresponding 95% CI of this OR?
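A hedged sketch of one way to get a delta-method CI for such a ratio (the coefficient names below are guesses; check them with margins, coeflegend):

Code:
* re-run margins with -post- so its results become e(b)/e(V)
margins var, at(...) post
* ratio of two margins with a delta-method CI
nlcom (ratio: _b[2._at#0.var] / _b[1._at#0.var])
* or work on the log scale and exponentiate the CI limits:
* nlcom (lnratio: ln(_b[2._at#0.var]) - ln(_b[1._at#0.var]))

Note that since these margins are predicted probabilities, their ratio is strictly a risk ratio rather than an odds ratio.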

dataset for panel data analysis-turning rows into columns?

Dear all,

my question might seem pretty basic at first.

I have a dataset that looks as follows:

Code:
 
firm  revenue2013  revenue2012  revenue2011  revenue2010  revenue2009
1             4234          234          111          423          313
2          2342342         5676           44        43534         3453
3             2342          232           33        34534         4534
4                0           88          222        57567          567
5 ...          234          234          555        68678          867
Variable "firm" represents firm ID.
Variables revenue2013-revenue2009 represent firms' revenue for the period 2009-2013. I would like my dataset to be modified for panel data analysis. Hence, I would like the following dataset "format":

firm year revenue
1 2013 4234
1 2012 234
1 2011 111
1 2010 423
1 2009 313
2 2013 2342342
2 2012 5676
2 2011 44
2 2010 43534
2 2009 3453
.. .. ..
I know that the following command is needed to declare the data as a panel:
Code:
xtset firm year, year
However, this does not solve my issue.

How can I modify the current dataset format (table 1) into the dataset format given in table 2?
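A hedged sketch, assuming the revenue variables all share the stub revenue (so that the long-format reshape applies):

Code:
* turn revenue2009-revenue2013 into one firm-year row each
reshape long revenue, i(firm) j(year)
xtset firm year

After reshape long, the numeric suffixes (2009-2013) become the values of the new variable year.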

Thank you in advance, whoever knows the answer to my question!

Mina

# of observations in my panel data set

Hello,

I have a panel data set of, according to Stata, 5,296 observations. This can be seen either by using the count command or at the bottom right of the Data Editor.

My question is the following: how can it be that according to Stata I have 5,296 observations, which is equal to the total number of rows, while I have multiple variables?

To my knowledge, an observation is just a data point in a data set: either a value of (in my case) total assets of firm X in year T, total assets of firm X in year T+1, or total assets of firm Y in year T, and so on.

It is not logical to me to see that I have 5,296 rows, each row representing firm X, year T, quarter t, while I have more than 12 variables.

Does Stata mean something else by "observation" than what I understand an observation to be? I read something about _n and _N; maybe that explains it.

Thank you in advance for providing an answer to my questions.

Yannick

To be more clear, this is how my dataset looks:

1     Bank X   Year   Q1   assets   equity
2     Bank X   Year   Q2   assets   equity
........
3     Bank Y   Year   Q1   assets   equity
...................
....................
5296  Bank Z   Year   Q1   assets   equity
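For what it is worth, a hedged aside: in Stata an "observation" is a whole row (one case), not a single cell, so 5,296 observations combined with 12+ variables is consistent. For example:

Code:
count              // number of observations (rows)
display _N         // the same number, via the built-in _N
describe, short    // observations and variables are counted separately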

Generating variables for time series data

Hi, I have to write a .do file that creates a pretend set of time-series data, in order to answer a set of questions using that data. So far I've set the number of observations and set a seed; the next instruction tells me to 'generate covariates'. I can't figure out how to generate y. Obviously I am very new to this, so any help is much appreciated!
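A hedged sketch, with every name and number invented, of one way to generate covariates and an outcome after setting the number of observations and the seed:

Code:
clear
set obs 200
set seed 12345
gen t = _n                    // time index
tsset t                       // declare the data as time series
gen x1 = rnormal()            // a covariate
gen x2 = runiform()           // another covariate
gen y  = 1 + 0.5*x1 - 2*x2 + rnormal()   // outcome from a chosen model

The coefficients in the last line define the true data-generating process, which the later questions can then try to recover.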
Thanks.

How can I save a spatial weight matrix into a .dta file?

Hello everybody,

forgive me for my basic question; I would just kindly like to know whether there is a way, in Stata, to save a spatial weight matrix as a .dta file.

For instance, in my case, after generating the spatial weight matrix with the command spatwmat

"spatwmat, name(W) xcoord(longitude) ycoord(latitude) band(0 x) standardize"

I would like to save this generated spatial weight matrix into a .dta file.

Is there a way?
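One hedged possibility, untested here: spatwmat leaves W as a Stata matrix, and svmat turns a matrix's columns into variables, which can then be saved:

Code:
svmat W, names(w)          // columns of W become variables w1, w2, ...
keep w*
save W_matrix.dta, replace

The file name W_matrix.dta is of course arbitrary.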

Thank you very much!

best,

Kodi

How to write a Stata command

Dear Statalisters,
I'd like to obtain a weight vector by writing a Stata command that carries out the computation. I have already done this in a Stata .do file, but I want to convert it into a Stata command. The code has to read the dataset first, but I don't think that is a barrier to the conversion. The Stata code is listed as follows:

mata
w = st_data(., .)
cols(w)
w = st_data(., 3..32)
cols(w)
ei = st_data(., 33)
rows(ei)
T = 14
N = 30
w1 = J(1, 1, .)
for (i=1; i<=T; i++) {
    t1 = (i-1)*N + 1
    t2 = i*N
    w2 = st_data(t1::t2, 3..32) * st_data(t1::t2, 33)
    w1 = (w1 \ w2)
}
J = J(421, 1, 1)
I = I(421, 1)
w1 = select(w1, J - I)    // drop the initial missing row
w1
end
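A hedged sketch, with the program name invented, of how such a do-file computation can be wrapped in an ado-file so that it runs as a command:

Code:
*! myweights.ado -- hypothetical wrapper around the Mata code above
program define myweights
    version 12
    mata: myweights_work()
end

mata:
void myweights_work()
{
    // paste the computation here, replacing the hard-coded
    // 421, T=14, N=30 with values derived from st_nobs() etc.
}
end

Saving this as myweights.ado on the adopath makes myweights available as a command.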

I would greatly appreciate it if anyone could help me fix this problem. Thanks.
Best wishes!

ARMAX models: Effect Size and R-squared

Hi everybody and happy Xmas

Is there an easy way in Stata to get the percentage of the variance explained by an ARMAX model? (Similarly to the adjusted R-squared in multiple linear regression)

Moreover, working with unstandardized predictors, is there a way to find which one has the strongest effect on the dependent variable? (effect size)
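There is no built-in R-squared after arima, but a hedged workaround is to square the correlation between the observed series and the fitted values (the specification below is a placeholder):

Code:
arima y x1 x2, ar(1) ma(1)
predict yhat, xb
corr y yhat
display "pseudo R-squared = " %6.4f r(rho)^2

For comparing effect sizes across predictors, one rough option is to standardize them first, e.g. egen zx1 = std(x1), and refit.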

Thanks a lot,
Andrea.

User-written command "dups" doesn't always generate the _count / _group variables

Hey Everyone,

I have a pooled dataset that shows how many applications each graduate gets from companies (a company interested in the graduate sends an application = invitation). Each graduate has a unique id, which can occur several times in the dataset if that graduate got more than one invitation. All in all I have 30,000 observations.

I want to count how many invitations every graduate got, and found the user-written command "dups".

Code:
dups graduate_id, unique

Sometimes this command generates the _count and _group variables. The _count variable is exactly what I need, since it holds the total number of observations within each group (= per graduate). But most of the time I only get the output, and Stata 12 does not generate the new variables.

this is what I want
graduate_id _count _group
11 4 1
11 4 1
11 4 1
11 4 1
33 1 2
21 3 3
21 3 3
21 3 3
55 2 4
55 2 4


As an alternative I tried to use the command:


sort graduate_id
quietly by graduate_id: gen dup = cond(_N==1, 0 , _n)

This gives me the number of the observation within the group, but I am only interested in the overall number of observations within each group.
graduate_id dup
11 1
11 2
11 3
11 4
33 0
21 1
21 2
21 3
55 1
55 2
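For the record, a hedged alternative that needs no user-written command: _N (the group size) rather than _n (the position within the group):

Code:
bysort graduate_id: gen n_invitations = _N    // plays the role of _count
egen group_id = group(graduate_id)            // plays the role of _group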


Can you help?

About time dimension of an unbalanced panel data set in xtabond2


Dear Statalis Members,


Are there any requirements about missing values (e.g., that at least five consecutive annual observations of the dependent variable must exist without gaps) that have to be satisfied when estimating a small-T, large-N unbalanced panel data set with the xtabond2 command?

Thanks,

Rümeysa Bilgin

Interactions with Plausible Values

Hey!
I'm using the pv prefix in order to correctly work with PIAAC data. So far all regression outputs make sense, but I don't know how to include interactions with plausible values.
For example, I want to know whether the returns to numeracy skills for women are different from those for men.

Here is the command without interaction term (lwage as the dependent and some independent variables as gender, age etc. are in the macro called "base"):

pv, pv(PVNUM*) jrr jk(1) weight(SPFWT0) rw(SPFWT1 - SPFWT80): regress lwage $base @pv [aw = @w]

Does anybody know how to deal with interactions with plausible values? Any idea would be highly appreciated!

Country dummies

Hello all,
I am struggling with a model. I am regressing the log of GDP per capita on other economic variables (expenditure, investment). The analysis covers 28 EU countries with panel data for 25 years; I should mention that the panel is unbalanced. Because the first stationarity tests (Fisher-type and Im-Pesaran-Shin unit-root tests) showed that some independent variables are not stationary, I decided to difference the model.
The Hausman test for the differenced model suggested a fixed-effects model. According to the Pesaran CD test, the Wald test, and Breusch-Pagan / Cook-Weisberg, the residuals are not correlated and the model is heteroskedastic.
I have two questions.
1. It seems that the Wooldridge test does not work. Is there another test for autocorrelation?
xtserial D.(y x2 x3 x6 x7 x8 x9 x10 x11 x12 x14 x15 x16 x17 x18 x19 x20 x21 x22 x23 x25 x30 x31 x32 x37 x39 x40 x41 x42 x44 x45 x47 x48 x49 x51 x52 x53 x54 x55 x56 x59 x61 x62 x63 x64 x65 x66)
factor variables and time-series operators not allowed
r(101);
2. x61-x66 are 6 dummy variables. I also want to use country-group variables: I already have 4 dummies in my Excel file: East Europe, South Europe, West Europe, and North Europe. West Asia (Cyprus) is not used because of collinearity. Simply regressing the model in levels yields plausible estimates, but with the differenced variables they are all omitted.

reg y x67 x68 x69 x70

      Source |       SS       df       MS              Number of obs =     686
-------------+------------------------------           F(  4,   681) =  131.80
       Model |  289.752692     4  72.438173            Prob > F      =  0.0000
    Residual |  374.289011   681  .549616757           R-squared     =  0.4363
-------------+------------------------------           Adj R-squared =  0.4330
       Total |  664.041703   685  .969403946           Root MSE      =  .74136

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         x67 |  -1.305004   .1603857    -8.14   0.000    -1.619913   -.9900939
         x68 |  -.0524679   .1575831    -0.33   0.739     -.361875    .2569391
         x69 |  -.2082225   .1587413    -1.31   0.190    -.5199036    .1034587
         x70 |   .6369881   .1601524     3.98   0.000     .3225363    .9514398
       _cons |   9.673973   .1482723    65.24   0.000     9.382848    9.965099
------------------------------------------------------------------------------

I tried to regress log GDP per capita on East Europe alone, but it is also omitted.

reg D.y D.x67
note: _delete omitted because of collinearity

      Source |       SS       df       MS              Number of obs =     658
-------------+------------------------------           F(  0,   657) =    0.00
       Model |           0     0         .             Prob > F      =       .
    Residual |  7.68248688   657  .011693283           R-squared     =  0.0000
-------------+------------------------------           Adj R-squared =  0.0000
       Total |  7.68248688   657  .011693283           Root MSE      =  .10814

------------------------------------------------------------------------------
         D.y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         x67 |
         D1. |          0  (omitted)
             |
       _cons |   .0573657   .0042156    13.61   0.000     .0490881    .0656433
------------------------------------------------------------------------------
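On question 1, a hedged workaround: xtserial rejects time-series operators, so the differences can be generated explicitly first (the list below is shortened):

Code:
foreach v of varlist y x2 x3 x6 {    // extend to the full variable list
    gen D_`v' = D.`v'
}
xtserial D_y D_x2 D_x3 D_x6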

Panel Analysis with Index Dependent Variable

Hi there. I have a question that I hope will be simple for someone to answer.

I am working with panel data (2 waves only) that looks at 10 methods of how people engage in their communities (voting, community meetings, etc.). Each of these 10 methods is a binary variable that indicates whether they engaged in that way or not (0=no; 1=yes). I have created a composite index variable that summarizes the total amount of a person's engagement; for example, if the respondent selected "yes" for 3 methods of engagement, their index score is 3. The minimum index score is 0, the maximum index score is 10, and all scores are non-negative integers. I want this index score to be my dependent variable in a regression model with a handful of predictive independent variables.

What statistical method for regression should I employ for this panel data that will make sense for having a summary index dependent variable described above?

How do I debug an r(9999) error when loading a C plugin?

I'm following the instructions here (http://www.stata.com/plugins/ ) for creating a Stata plugin. I'm using Stata 14.1 MP on a Windows 7 64-bit machine, and Visual Studio 2012. I downloaded hello.c, stplugin.c, and stplugin.h, followed the instructions for creating a C DLL project in Visual Studio, and compiled the plugin. It compiled successfully. However, when I place "hello.dll" in my working directory and run this code in a do-file:

Code:
capture program drop myhello
program myhello, plugin using("hello.dll")
plugin call myhello
I get this error:
Code:
Could not load plugin: .\hello.dll
r(9999);
The error code -r(9999)- doesn't have a help file either. How do I begin debugging an error like this? The C plugin compiles successfully, the do-file and the resulting DLL file are in the same directory, and as far as I can tell I'm following the instructions from StataCorp correctly. Where do I start? The -r(9999)- error message isn't helpful, and there aren't any debugging instructions on StataCorp's page.

Where do I start looking for the problem? Without any information besides "Could not load plugin", I don't even know how to begin debugging this.

Kleibergen-Paap rk Wald F stat in ivreg2 with aweight

I am currently running an IV model using ivreg2 version 4.1.09 (23Aug2015). The model has 3 endogenous variables, and standard errors are clustered. Each observation is weighted using [aweight="Name_weight_var"].

The LM and Wald versions of the Kleibergen-Paap rk statistic that I obtain are extremely low (on the order of e-10). Those statistics are, however, inconsistent with the F-stats on the excluded instruments from the first-stage regressions, which never go below 8 (this is true for both F and APF in the matrix e(first)). The inconsistency between the two sets of statistics disappears when observations are not weighted.

An earlier post suggests that a bug of the same type existed in an earlier version of ivreg2 and was subsequently fixed (http://www.stata.com/statalist/archi.../msg00058.html). The problem seems, however, to persist with "aweight". Is there any way to obtain correct Kleibergen-Paap rk statistics in ivreg2 when observations are weighted, standard errors are clustered, and the number of endogenous variables is greater than 1? Thanks in advance!

Overlaying a line with a specific slope on a scatterplot

Dear all

I am trying to overlay on a scatterplot a straight line that passes through the origin with a specific slope. The code that I use is the following:

twoway scatter var1 var2 || function y=20000*x , range(-0.50 0.50)

var1 has a min of -500 and a max of 500
var2 has a min of -0.50 and a max of 0.50

The problem I have is that the line is always a 45-degree line and the scatterplot is distorted, with the points bunched together. This is because the values on the y axis are, predictably, going through the roof. The x axis, which shows var2, looks fine.

I was wondering whether there is a way to draw the line without the y axis taking on the values of the function. Instead I would like only the slope of the line to change (away from 45 degrees) while keeping the original values of the variable (var1). This would not distort my graph.
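One hedged option: restrict the function's range so that its y values stay inside the range of var1. With a slope of 20000 and var1 in [-500, 500], that means x in [-0.025, 0.025]:

Code:
twoway (scatter var1 var2) ///
       (function y = 20000*x, range(-0.025 0.025))

The line is then clipped to the data region, so the scatter keeps its original axes and the line's visual slope reflects the true slope rather than 45 degrees.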

Any ideas?

Thank you!!

Cynthia

Rename entire list of variables

I have a .dta file with 390 variables with widely varying names containing both letters and numbers, in no particular pattern or order. I want to change the variable names to the generic v1, v2, v3, ... in the order in which they appear in the variable list (in the .dta file). How do I do this?
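A hedged sketch, assuming no variable is already named v followed by a number; the varlist is expanded once before the loop runs, so renaming inside it is safe:

Code:
local i = 0
foreach var of varlist _all {
    local ++i
    rename `var' v`i'
}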

Bootstrap to adjust for clustering in a 16-cluster sample (with incomplete replicates), or simple multilevel model?

Hello,

I am using Stata 12, working with a dataset of 585 observations in 16 clusters (service locations) in 3 countries. There are only 2 types of service location; country and type of service location are in the model and are obviously correlated with cluster, because the 16 clusters are split among 6 unique combinations of country and service-location type (i.e., one combination is represented by only 1 cluster; the others are made up of 2 or 3 clusters each).

I'd like to obtain correct 95% CI for model estimates and am attempting to use bootstrap in this situation with so few clusters as follows:

bootstrap, reps(300): logistic depvar covariates, vce(cluster cluster_var)

When I run this code, I see a number of red "x" marks indicating that in those bootstrap replications "collinearity in replicate sample is not the same as the full sample, posting missing values", and these replications are not used in the standard-error estimates, with the following message (e.g., with 106 red x marks): "Note: one or more parameters could not be estimated in 106 bootstrap replicates; standard-error estimates include only complete replications." I assume this is because of the high correlation between covariates and clusters (service locations).

I have tried 1,000 reps, but those runs will not complete. I have been able to get several runs with 300 reps (of which about one third to one half are incomplete replicates). Across the several runs of 300 attempted replications, the 95% CIs are similar (and obviously the beta estimates are the same). Would it be reasonable to use these estimates?
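A hedged note: to resample whole clusters rather than individual observations, bootstrap has its own cluster() and idcluster() options, e.g.:

Code:
bootstrap, reps(300) seed(1234) cluster(cluster_var) idcluster(newid): ///
    logistic depvar covariates

With only 16 clusters the resampling space is small, which may itself explain some degenerate replicates.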

Have also thought of using a multilevel logit with the 16 unique service locations forming 16 groups for level 2, and include country and type of service location in the model as covariates. (keeping structure simple with random intercept only, for unique service location at level 2, but no random slopes, cross level interactions, etc., in the model).

Any advice would be greatly appreciated.

Many thanks!

Computing area of envelopes around CDFs

Dear Statausers,

I am using Stata v.12. I have plotted cumulative distribution functions of children's math test scores across four sub-groups; the graph is attached for reference. As the CDFs cross each other, I want to compute the area above the left envelope of the CDFs to establish stochastic dominance. I would appreciate input on how I might do this.

Namrata

