
Margins meologit: After 12 hours I have no results!

I want to ask you the following question about Stata 15:

- The dataset I am working with has almost 200,000 observations.

- I estimate a multilevel ordered logit with 8 integration points. The estimation takes about 2 hours on my computer.

- I save the estimates and request the average marginal effects of all the exogenous variables with the following command:

margins, dydx (c_sexo_r c_edad c_casado_pareja c_educacion_primaria c_educacion_secundaria c_educacion_superior c_desempleado c_salud_7 c_religion c_indigena c_junta_vecinos c_numper c_p_1_10_lrenta__f_e_d_f_ec

After more than 12 hours I still have no results, even though the estimation itself converges in about 2 hours.
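For concreteness, a minimal sketch of the workflow described above; y, x1, x2 and group are placeholders, not the actual variable and level names.

Code:
* placeholders only: y, x1, x2 and group stand in for the real variables and grouping level
meologit y x1 x2 || group:, intpoints(8)
estimates store m1
margins, dydx(x1 x2)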

1. What could this be due to?

2. Could you give me some clue about how to get the result?

Thank you

Forecast of Markov Switching model

Hello everyone,

I need help checking, out of sample, a Markov-switching model that I estimated.



Markov-switching dynamic regression

Sample: 1 - 3000                               No. of obs      =      3,000
Number of states = 2                           AIC             =    -3.5325
Unconditional probabilities: transition        HQIC            =    -3.5282
                                               SBIC            =    -3.5205
Log likelihood = 5304.716

------------------------------------------------------------------------------
           B |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
State1       |
       _cons |   .0536254   .0005334   100.54   0.000       .05258   .0546707
-------------+----------------------------------------------------------------
State2       |
       _cons |   .1901039    .003082    61.68   0.000     .1840632   .1961446
-------------+----------------------------------------------------------------
      sigma1 |   .0208053   .0003912                      .0200525   .0215863
-------------+----------------------------------------------------------------
      sigma2 |    .100986    .002102                      .0969491    .105191
-------------+----------------------------------------------------------------
         p11 |   .9836962   .0029618                      .9767509   .9885909
-------------+----------------------------------------------------------------
         p21 |   .0266072   .0047537                      .0187185   .0376927
------------------------------------------------------------------------------


I need to obtain the smoothed state probabilities from this model, not the fitted series.
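For reference, a minimal sketch of how smoothed state probabilities are usually requested after -mswitch- (as I understand the postestimation options); out-of-sample points would need to be present in the dataset when predict is run.

Code:
* sketch only: run directly after the -mswitch dr- fit shown above
predict prsm*, pr smethod(smooth)    // prsm1, prsm2 = smoothed probabilities of each state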

R2 from -xtreg, fe- vs R2 from -areg, absorb(id)-

Dear all,

I am running a panel regression with individual fixed effects. When using the -xtreg, fe- command, I get very low R2 values (within, between, and overall all range from 0.000 to 0.10).

However, the much-quoted page https://www.princeton.edu/~otorres/Panel101.pdf suggests that when reporting the R2 for fixed-effects models, the R2 from -areg, absorb(id)- is preferred to the R2 obtained from -xtreg, fe-. Is this indeed the case, and if so, why? Indeed, with -areg, absorb(id)- I get an adjusted R2 that is much more sensible (0.63). However, since my standard errors change between the two specifications, I am a bit confused as to whether it is appropriate to report the R2 from -areg- while using the standard errors from the -xtreg, fe- specification.

Many thanks in advance.

Edit: I should have mentioned that I am clustering standard errors at the geographical level, which is why I think the standard errors from -xtreg, fe- are more appropriate, as explained in this post: https://www.stata.com/statalist/arch.../msg00596.html
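For reference, a sketch of the two specifications being compared, with y, x and geo as placeholder names. As I understand it, the clustered standard errors differ because -areg- counts the absorbed fixed effects in its degrees-of-freedom correction while -xtreg, fe- does not, and the within R2 from -xtreg, fe- deliberately excludes the variance explained by the fixed effects themselves, which is why -areg-'s R2 looks so much higher.

Code:
* placeholders: y, x and geo stand in for the actual outcome, regressors and cluster variable
xtreg y x, fe vce(cluster geo)            // reports within/between/overall R2
areg  y x, absorb(id) vce(cluster geo)    // R2 here also credits the absorbed fixed effects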

SURE model (seemingly unrelated regression)

Estimation via a SURE (seemingly unrelated regression equations) model.

                    Dependent variable
                    delta_cash    short-term debt    long-term debt
Intercept
Working capital
R&D
Net income
My questions are based on the table above. Suppose I have equations like
delta_cash = intercept + working capital + R&D + net income + error, and so on, so that I have 3 equations.
I need to estimate this system by SURE (seemingly unrelated regression equations). I would like to impose certain restrictions such as
-α_1 + α_2 + α_3 = 0 (that is, the negative of the intercept in the first equation, plus the intercept in the second equation, plus the intercept in the third equation, should equal zero). Similarly, -β_11 + β_21 - β_31 = 1.


The first subscript of the beta coefficients indicates the column (i.e., the dependent variable) while the second subscript denotes the row (independent variable).
I am not very experienced with Stata, hence I used the menus (Statistics > Linear models and related > ...) for SURE, but I am unable to impose my constraints. Kindly help me here; I have not been able to solve this myself for the past week.
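A command-line sketch of the setup described above: -sureg- accepts a constraints() option, and coefficients can be referenced as [equation]variable when defining constraints. All variable names below (delta_cash, st_debt, lt_debt, wcap, rd, ni) are placeholders.

Code:
* sketch only: variable names are placeholders for the actual ones
constraint 1 [st_debt]_cons + [lt_debt]_cons - [delta_cash]_cons = 0
constraint 2 [st_debt]wcap - [delta_cash]wcap - [lt_debt]wcap = 1
sureg (delta_cash wcap rd ni) (st_debt wcap rd ni) (lt_debt wcap rd ni), constraints(1 2)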

Expand the observations based on a value

Hello fellow Stata users,

I have a very particular question that I have been trying to resolve for the last couple of days using the FAQ and search, but eventually I need some guidance.

So I have a dataset of coded sentences that looks like the following (I have >2000 observations). DOC_id represents the newspaper ad, CS_id represents the sentence within the ad, so there could be several CS_id per one value of DOC_id. Name is the code for the candidate.
Code:
case_id   DOC_id  CS_id  Name
 1        120700 200831 4350
 2        120701 200833 4350
 3        120703 200275 4350
 4        120704 200276 4350
 5        120705 200277 4350
 6        120727 200882 4233
 7        120728 200889 4233
 8        120738 200980 5034
 9        120738 200979 5034
10        120739 200981 5034
11        120739 200982 5034
12        120740 200984 4210
13        120740 200983 4210
14        120741 200985 4210
15        120741 200986 4210
To replicate:
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float case_id double(DOC_id CS_id SACT_name)
 1 120700 200831 4350
 2 120701 200833 4350
 3 120703 200275 4350
 4 120704 200276 4350
 5 120705 200277 4350
 6 120727 200882 4233
 7 120728 200889 4233
 8 120738 200980 5034
 9 120738 200979 5034
10 120739 200981 5034
11 120739 200982 5034
12 120740 200984 4210
13 120740 200983 4210
14 120741 200985 4210
15 120741 200986 4210
end


Ideally, I need to expand the observations based on how many times each DOC_id should repeat: e.g. 120700 would repeat twice, 120701 would not repeat, and 120738, which already repeats three times, would have another 5 repetitions per cluster, so 3 x 5 = 15. This information is stored in a separate dataset. I know that a simple expandcl can help me, but I face the following problems:

1) As I said, this information is external, so for now I would have to specify it for each DOC_id manually, and I would like to avoid doing that for this number of observations...
2) I need to assign a new DOC_id to each new (set of) observations. The newvar created by expandcl numbers the clusters but does not distinguish old observations from new ones. My aim is to see how many ads a candidate has published, including repeated ones, and having the same DOC_id for duplicate ads is not informative.

I suspect there is an easy solution to this problem, but so far searching for an answer has not helped me much. I would be grateful for your suggestions.
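A sketch of one possible route, under heavy assumptions: the external file (here called counts.dta, a hypothetical name) is assumed to hold DOC_id together with a variable n_copies giving how many times each ad should appear.

Code:
* sketch only: counts.dta and n_copies are hypothetical names
merge m:1 DOC_id using counts, keep(master match) nogenerate
replace n_copies = 1 if missing(n_copies)     // ads that should not be duplicated
expand n_copies
bysort DOC_id CS_id: gen copy = _n            // 1 = original row, 2+ = added copies
egen new_DOC_id = group(DOC_id copy)          // distinct id for every copy of an ad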


Need help with combining unequal datasets.

Hello,

So let me first describe what I am working with and the steps I took.

One dataset has, for every company, every year stated like 1992, 1993, 1994, 1995, and so on, with financial information about the companies.

The next dataset is about CEO financials, and there the years run 1992, 1992, 1992, 1993, 1993, 1993, 1993, and so on, because some companies have more or fewer CEOs per year.

I used a 1:m merge, and it looks like it merged. I then looked at CEO gender, which is the main part of my study, and deleted the rows that did not match.

I checked whether it removed rows of information; it kept the information from dataset 1 and duplicated the company-financials information across the CEO rows, like this (merged):

Year    CEO FIN      Company FIN
X1      X1-Male      X1
X1      X1-Male      X1
X1      X1-Female    X1

Now, of course, every year is counted multiple times, so the sample size looks misleading: it shows something like 293,123 observations only because the years are counted more than once.

Is there a way to reduce the rows when using the data in calculations? Secondly, I thought of keeping only one row per company-year: if there is a female CEO that year (if more than one, I would summarize), I'll use only X1-Female, and otherwise I'll use X1-Male, so that there is one row per company-year. But I am not sure how to perform such a task.
Does anyone know how to do this, or even whether it is possible? Thanks.
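A sketch of one way to keep a single row per company-year, preferring the female-CEO row when one exists; company, year and ceo_gender (a string holding "Female"/"Male") are placeholder names for variables in the merged data. If several female CEOs can occur in the same company-year, the remaining duplicates would need to be summarized (e.g. with collapse) rather than simply dropped.

Code:
* sketch only: company, year and ceo_gender are placeholder names
bysort company year: egen any_female = max(ceo_gender == "Female")
keep if ceo_gender == "Female" | any_female == 0    // drop male rows when a female CEO exists
bysort company year (ceo_gender): keep if _n == 1   // then keep one row per company-year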

Metan - invalid label specifier for xlabel()

Hi All,

I am new to this forum and looking for some advice! I am conducting a systematic review and meta-analysis using the metan command, and for some reason, when I try to add x-axis labels to my command, I constantly get an "invalid label specifier" error message, which looks like this:

My code:
metan a b c d if risk_factor=="Sex", or random by (risk_factor) sgweight nowarning lcols(study year country agegroup agerange ) texts(230) xlabel(0.0,0.5,1.0,2.0,3.0,4.0) force

Error message:
invalid label specifier, : 0 "1" . "0" -.6931471805599453 ".5" 0 "1" .6931471805599453 "2" 1.09861228866811 "3" 1.386294361119891 "4":

I've used metan many times before and never had this error message, so I have no idea how to make it stop!
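One guess, offered for checking rather than as a confirmed fix: with the or option the x-axis is on a log scale, and 0 has no logarithm, which would match the `. "0"` entry in the error message. A sketch of the same call with 0.0 dropped from xlabel():

Code:
* sketch only: as above, but without the 0.0 label
metan a b c d if risk_factor=="Sex", or random by(risk_factor) sgweight nowarning ///
    lcols(study year country agegroup agerange) texts(230) xlabel(0.5,1,2,3,4) force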

Does anyone have any advice?

Thank you!

Insufficient observations for imputation

Hi,
I am trying to impute values for a single variable in a 3-year panel; the variable I am trying to impute is time-invariant. I am thus trying to impute values at the group level, where the group is the panel id unit (media_outlet). After running the following code:
Code:
mi set wide
mi register imputed subscribers
mi impute regress subscribers, by(media_outlet) add(5) rseed(12345)
I receive the error "insufficient observations" for numerous outlets, and the imputation does not take place. Is there a way to simply skip those groups, so that values are still imputed for the outlets that do have sufficient observations?
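A quick diagnostic sketch (not a fix): counting complete cases of subscribers per outlet should show which groups are triggering the error.

Code:
* sketch only: lists outlets with no observed values of subscribers
bysort media_outlet: egen n_nonmiss = total(!missing(subscribers))
tabulate media_outlet if n_nonmiss == 0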

Thank you.

MLOGIT reporting and interpreting interactions between binary and continuous variables

Dear all,

I am running a multinomial logit model on panel data, where I regress a categorical variable for households' migration decision (0 = no migration, 1 = rural migration, 2 = urban migration, 3 = international migration) on interactions of the continuous temperature and precipitation variables with a binary agriculture variable (1 = agricultural dependence). Additionally, I control for some other household-specific characteristics (educ, income, assets, ...):
mlogit migr c.temp##i.agriculture c.precip##i.agriculture c.educ c.income i.assets, cluster(State)

I would like to report the marginal effects for each outcome category. Now, I know that it gets tricky with the interaction terms. What I am doing right now is that, for instance for outcome category 1, I estimate the marginal effects in the following manner:

margins, dydx(temp precip ) at(Agriculture=(0 1)) predict(outcome(1)) post

If I understand this correctly, the coefficients give me the marginal change in the probability that a household migrates to another rural area when temperature or precipitation changes by one unit, separately for agricultural and non-agricultural households. Is this a correct interpretation?

What I would like to show, however, is whether the marginal effect on the probability of migrating to, let's say, a rural area is significantly different for agricultural and non-agricultural households; in other words, whether the two marginal effects are significantly different. Is this also a valid way to report the results, or is the previous approach better? Additionally, I would like the output table to include the marginal effects of the additional controls. How do I specify this in the command?
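As I understand -margins-, the difference can be tested directly by asking for pairwise comparisons across the at() levels; the second line sketches how the average marginal effects of the remaining controls could be requested for the same outcome (names as in the model statement above).

Code:
* sketch only: tests whether the effects of temp and precip differ by agriculture status
margins, dydx(temp precip) at(agriculture=(0 1)) predict(outcome(1)) pwcompare(effects)
* average marginal effects of the other controls for outcome 1
margins, dydx(educ income assets) predict(outcome(1))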

I have tried to go through a lot of posts and literature (also here on Statalist), but I am still not really sure.
Could you please help me?
I would be super grateful.

Thank you!

Best,

Barbora

coefplot: option cismooth causes error (invalid '14285714')

Hi all,
Until today, I had no problem using the cismooth option with coefplot. Now, when I want to use this option, I get an error:
Code:
sysuse auto.dta, clear
logit foreign price i.rep78
margins rep78, post
coefplot
coefplot, cismooth

Do you have any idea what might be causing this?
Thank you.

Creating statistical output

Dear community,
How can I create this kind of output with several models?
[attachment: screenshot of the desired output table]

I am running logistic regressions and would like help with the eststo command.
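If the goal is a side-by-side table of several logistic regressions, a minimal sketch using eststo/esttab from the user-written estout package (ssc install estout); y, x1 and x2 are placeholder names.

Code:
* sketch only: y, x1, x2 are placeholders; estout provides eststo and esttab
eststo clear
eststo m1: logit y x1
eststo m2: logit y x1 x2
esttab m1 m2, se star(* 0.10 ** 0.05 *** 0.01) label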

How do I estimate a trend using this data?

Hi,

I'm a beginner with Stata (and statistical analysis) and it's really stressing me out.

I have time-series data containing the year (1992-2000) and age-specific suicide rates per 100,000 population, from age 0 to >85 at 5-year intervals:

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input int(Year Reg) str1(Group Sex) byte CCl int Cause byte(Drac0 Drac1 Drac5 Drac10) int(Drac15 Drac20 Drac25 Drac30 Drac35 Drac40 Drac45 Drac50 Drac55 Drac60 Drac65 Drac70 Drac75 Drac80 Drac85)
1992 1100 "T" "M"  9 173 0 0 4 53 254 429 600  742  775  794  839  982  850  800  671  699 870 1119 1280
1993 1100 "T" "M"  9 173 0 0 3 56 323 534 747  901  982 1037 1073 1296 1097  928  831  786 979 1238 1353
1994 1100 "T" "M"  9 173 0 0 2 60 354 649 863 1022 1089 1164 1190 1405 1170 1026  974  827 994 1016 1229
1995 1100 "T" "M"  9 173 0 0 2 57 366 725 847  989 1046 1130 1164 1327 1116  957  959  781 856 1087 1248
1996 1100 "T" "M"  9 173 0 0 2 50 351 734 828  949  985 1070 1095 1183 1022  906  978  752 856 1173 1157
1997 1100 "T" "M"  9 173 0 0 2 54 347 724 767  855  905  962  994 1030 1008  825 1046  855 880  976 1296
1998 1100 "T" "M"  9 173 2 0 2 52 335 709 722  808  839  898  934  894  973  758  917  837 782    .    .
1999 1100 "T" "M" 10 249 0 0 4 62 339 757 800  837  906 1004 1052 1040 1091  859 1063 1000 791  962 1236
2000 1100 "T" "M" 10 249 0 0 5 64 363 796 867  877  886  986 1059 1072 1017  832  956 1001 760  975 1289
end
format %ty Year
Now what I'm trying to do is estimate a trend for suicides from 1992 to 1997 (I also want to include an offset term, log(population per year and age group), to adjust for annual changes in population figures and age structure).

I then want to check whether suicides in 1998 were above the number predicted by that trend (there was an economic crisis in 1998, so my hypothesis is that suicides went up).
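A sketch of one way this kind of analysis is often set up, with the assumptions flagged: the Drac* columns are treated as counts (if they really are rates per 100,000 they would first need converting back to counts), there is one row per year as in the excerpt, and a population variable pop per year and age group is assumed to be merged in from elsewhere.

Code:
* sketch only: assumes counts and an external population variable pop
reshape long Drac, i(Year) j(agegrp)
rename Drac deaths
* ... merge in pop (population for each Year and agegrp) here ...
poisson deaths c.Year i.agegrp if Year <= 1997, exposure(pop)
predict expected, n                                 // predicted counts under the 1992-1997 trend
list Year agegrp deaths expected if Year == 1998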

How would I go about this? (I'm so sorry if this is a very stupid question)

Robust F-stat simple linear regression


Hi all,

Does anyone know how Stata calculates the F-statistic in a simple linear regression of the type
Code:
reg y x, robust
?

I need to code it in another software package, so the exact formulas used for the variance estimators would be very much appreciated.

I am aware of the content of this document: https://www.stata.com/manuals/rregress.pdf
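For what it is worth, my understanding of the [R] regress documentation: with vce(robust) the variance estimator is the HC1 sandwich, V = n/(n-k) * (X'X)^-1 * (sum_i e_i^2 x_i x_i') * (X'X)^-1 with k including the constant, and the reported F is the Wald statistic for all slope coefficients being jointly zero divided by the number of restrictions, referred to an F(q, n-k) distribution. In the one-regressor case it therefore reduces to the squared robust t, which is easy to check:

Code:
* sketch only: in a one-regressor model the robust F equals the squared robust t
sysuse auto, clear
regress price weight, robust
display "reported F = " e(F) _col(30) "robust t^2 = " (_b[weight]/_se[weight])^2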

Thanks!!

Cluster analysis (kmeans) error message "insufficient observations"

Hi, I am trying to perform a cluster analysis, and the Stata command I am trying is as follows:

cluster kmeans v1-v2870, k(2)

Variables v1 to v2870 are 2,870 different variables. I get the error message "insufficient observations, r(2001)". When I reduce the variable list to v1-v1150, the cluster analysis works. Am I trying to use too many variables? I.e., is there a restriction on the number of observations that Stata handles?

My data set consists of 2,870 rows (each row is a region identified by "region_id"). The variables v1-v2870 each contain a distance measure between the row's region (region_id) and each of the other regions (hence the 2,870 variables).

Reshaping a tabbed data set

I am requesting help coding up a clean do-file that will reshape a multi-tab data set.
I'm running Stata 15.

I have a data set (.xlsx) that describes conference sessions people attended. The data were auto-collected via electronic beacon & name badge. Tab title = person's name (e.g. "adam jones", "barbara spencer"). <person> n = 40. Each tab contains rows (only one column) logging data on three variables:
1. Session title (e.g. "Sunday AI session") = <title>
2. Number of minutes a person was in a session (e.g. 24.2) = <duration>
3. Date and time the person entered the room (e.g. 6/23/2019 3:59 PM EDT) = <date>

Each tab contains between 3 and 45 rows of data, representing 1 to 15 sessions that person attended.

Goal = I want to create a data set with persons listed as rows (observations), thus turning the tabs into rows, and I want Stata to reshape the data presented as rows (<title>, <duration>, and <date>) into variables. There are hundreds of sessions. I presume I'd leave the data long, per my sample below, and zero (0) minutes <duration> and no <date> would indicate that the person never attended the session.

I believe the order of commands would be something as follows:
1. Turn the tabbed data set (.xlsx) into a long-format data set (.dta), with <person> attached to each corresponding row as appropriate.
2. My long data set would include <person> and <session>. I would then reshape <duration> and <date> into wide format. So my final data set would look like:

person            session     duration   date
adam jones        session 1   20.3       6/23/2019 3:59 PM EDT
adam jones        session 2   14.3       6/24/2019 3:01 PM EDT
adam jones        session 3   0
barbara spencer   session 1   0
barbara spencer   session 2   0
barbara spencer   session 3   17.2       6/24/2019 11:01 AM EDT

Can anyone help me with the code I need to automate this?
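A sketch of the first step (stacking the sheets with the person's name attached); conference.xlsx is a placeholder file name, and each sheet is assumed to hold the single column described above. Splitting each person's stacked rows into <title>, <duration>, and <date> would then follow from the fact that they repeat in groups of three.

Code:
* sketch only: "conference.xlsx" is a placeholder; each worksheet = one person
import excel using "conference.xlsx", describe
local nsheets = r(N_worksheet)
forvalues s = 1/`nsheets' {
    local name`s' `"`r(worksheet_`s')'"'
}
tempfile stacked
forvalues s = 1/`nsheets' {
    import excel using "conference.xlsx", sheet("`name`s''") clear
    gen person = "`name`s''"
    if `s' > 1 append using "`stacked'"
    save "`stacked'", replace
}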

Problems with multilevel multinomial logistic regression syntax

Hello Everyone,
I am working on the determinants of employment status choice. I tried using the multi-level multinomial logistic regression. I am not fully aware if I am doing it correctly. First, I tried the simplified version of non-multi level command :
gsem ( 2.employment_status <- i.dist i.sex schooling Total_hrs occupation industry), mlogit nocapslatent
The stata converged the result. Then I used multi level version.
gsem ( 2.employment_status <- i.dist i.sex schooling Total_hrs occupation industry M1[hhid]@1), mlogit nocapslatent latent(M1)
The stata also converged this result.
I used number 2 infront of employment_status since the stata did not converge when I type i.employment_status. Hence, I used the given categorical number for depvar to calculate. Therefore, I am not sure if above given stata command makes any difference.
Second, I used hhid (household number) as individual are naturally nested within household.
I think household is also nested within districts. Here, I am stuck. This looks like three level model. If so, how do I calculate them? This is my first time working with this model and I am not sure if I am following the path.
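For what it is worth, a sketch of how a third (district) level might be added, mirroring the two-level command above; i.dist is dropped from the fixed part because district now enters as a random-effects level, and dist is assumed to nest hhid.

Code:
* sketch only: district- and household-level random intercepts
gsem (2.employment_status <- i.sex schooling Total_hrs occupation industry ///
      M1[dist]@1 M2[dist>hhid]@1), mlogit nocapslatent latent(M1 M2)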

putdocx textblock doesn't work

Hi there,

I'm using Stata 15 on a Mac and trying to work with the commands below.
However, the putdocx textblock begin command does not work.
The error code is r(198), and Stata says "unknown subcommand textblock".

I can create and save tables, graphs, etc.; only the textblock command is not working.
Can you please help?
Many thanks!

putdocx begin
putdocx paragraph, style(Heading1)
putdocx text ("SL Entrepreneurship Survey")

putdocx textblock begin
We use data from the Survey conducted in Sierra Leone (SL) on 07/08/19 to study the
Perceptions of Returns to Entrepreneurship in Sierra Leone. 661 observations were
reported without duplications.
putdocx textblock end

putdocx save SLReport, replace

How to explain a non-significant result in a fixed-effects model?

Hi everyone,
My research is on the effect of microfinance on SMEs. My data cover 100 SMEs over 8 years; I have one dependent variable, profit, and 4 independent variables, including my main variable (the amount of microfinance). I used xtreg y x i.year, fe vce(cluster id) to analyse my panel data. However, the p-values of all variables are larger than 0.05, and I don't know how to explain the result. Could anyone give me some advice?

J(*,*,*) function

Hi all,
Does anyone know why using the J(*,*,*) function results in a matrix that looks as if it is a triangular matrix? For example, typing
Code:
. matrix D = J(5,5,10)
. matlist D
gives this:
             |        c1         c2         c3         c4         c5
-------------+-------------------------------------------------------
          r1 |        10
          r2 |        10         10
          r3 |        10         10         10
          r4 |        10         10         10         10
          r5 |        10         10         10         10         10
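If the matrix itself is fine and only the display is puzzling: -matlist- prints symmetric matrices as a lower triangle by default, and J(5,5,10) is symmetric. A quick check, assuming the nohalf option (which should force the full matrix to be displayed):

Code:
matrix D = J(5,5,10)
matlist D, nohalf    // request the full matrix even though it is symmetric
display D[1,5]       // the upper-triangle entries really are there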


Thank you,
Stan