Channel: Statalist

Non-linear (panel data) models (with constraints)

Dear all,

I am trying to estimate these two equations (extracted from Arvin and Baum (1997), "Tied and untied foreign aid: a theoretical and empirical analysis", Keio Economic Studies 34(2)):


[the two equations appeared here as an image; not reproduced]

I have data on A1 and A2, so I can form a discrete approximation of A1dot/A1 and A2dot/A2. The six remaining parameters have to be estimated. My data set is a panel (29 individuals over 38 years). My goal is first to use the pooled data to estimate the 6 parameters (I may have to put constraints on some of them), and then to use current non-linear panel techniques to estimate the same 6 parameters (again possibly with constraints). My question: is anyone aware of a built-in Stata command that solves such a non-linear panel data model, ideally with the option of imposing constraints on some of the parameters? In particular, does the new command menl introduced in Stata 15 do what I want?
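For the pooled stage, nl fits nonlinear least squares from a substitutable expression, and menl (Stata 15+) extends this to mixed-effects panels. A minimal sketch, under the assumption that one growth equation can be written in the form below (g1, A1, A2, id, and the parameter names are placeholders, not the paper's notation):

```stata
* Pooled nonlinear least squares; constraints are imposed by rewriting
* the expression, e.g. forcing b3 = -b2 in the second call.
nl (g1 = {b1=0.1} + {b2}*A1 + {b3}*A2)
nl (g1 = {b1=0.1} + {b2}*A1 - {b2}*A2)

* Panel version with a random intercept by individual (menl, Stata 15+)
menl g1 = {b1} + {b2}*A1 + {b3}*A2 + {U[id]}
```

The same reparameterisation trick (writing one parameter in terms of another inside the expression) is how equality constraints are typically imposed in both commands.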

Thanks for your help,

Luciano

Compute New Variable - Based on Responses to Other Variables


Hello Listservers,


I have 3 variables:

X (condom use during anal sex)
Y (condom use during oral sex)
Z (condom use during vaginal sex)

Each is coded 1=Yes, 2=No, and 0, 8, 9=Missing.

I want to create a new variable that is set to 1 (CONDUSE = 1) if the respondent uses a condom for any type of sex, and 2 if not. Here is an excerpt of the raw data:

Code:
       | ANALCOND   ORALCOND   VAGCOND   conduse |
       |-----------------------------------------|
221. |        0          0         0         . |
222. |        0          0         0         . |
223. |        0          0         0         . |
224. |        0          0         0         . |
225. |        0          0         0         . |
     |-----------------------------------------|
226. |        0          2         2         2 |
227. |        0          0         0         . |
228. |        0          1         1         1 |
229. |        0          2         1         2 |
230. |        0          0         0         . |
     |-----------------------------------------|
231. |        0          0         0         . |
232. |        0          0         0         . |
233. |        0          2         2         2 |
234. |        0          0         0         . |
235. |        0          1         1         1 |
     |-----------------------------------------|

I tried the following command:


Code:
/* Create conduse = use of condom */
gen conduse=1 if ANALCOND==1 | ORALCOND==1 | VAGCOND==1
replace conduse=2 if ANALCOND==2 | ORALCOND==2 | VAGCOND==2
fre conduse
However, I am not sure it is counting those who are not using condoms correctly. For example, as you can see, case #229 uses a condom vaginally, but it is still coded 2 (not using a condom).
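A sketch of one way to get the intended coding: assign the non-user code first and then let any "yes" override it, so a single 1 always wins (the posted code does the opposite, which is why a 2 on one variable overwrites a 1 from another):

```stata
* Classify as a user (1) if ANY act used a condom; fall back to 2 only
* when no act did. inlist(1, a, b, c) is true if any argument equals 1.
gen conduse = .
replace conduse = 2 if inlist(2, ANALCOND, ORALCOND, VAGCOND)
replace conduse = 1 if inlist(1, ANALCOND, ORALCOND, VAGCOND)
```

On the listed data this codes case #229 as 1, since VAGCOND==1 overrides ORALCOND==2.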

I will appreciate some help from you all.

thanks - Cy





esttab: betas (standardized coefficients) not displayed in table

Hey,

I use esttab to create a table of a regression analysis that includes factor variables.
In the regression output I get the usual standardized beta coefficients, but when I use esttab, the cells for the interaction effects created with factor variables display only standard errors and significance levels, not the coefficients themselves. I have searched the web but have not come across this issue. Any experiences or recommendations?

Thanks a lot!

Regression code:
Code:
reg pol_interest_w5 RELATIVE_SOCIALMEDIA_W3 edu female i.generation##c.RELATIVE_SOCIALMEDIA_W3
esttab call:
Code:
esttab, b(3) se(3) beta ar2 not onecell nogaps nolz constant star(+ 0.10 * 0.05 ** 0.01 *** 0.001)

Table:
[regression table shown as an image; not reproduced]

Linearity assumption of multinomial logistic regression?

Hi Statalist community,
How can I test the linearity assumption of multinomial logistic regression?

Differences in Differences in Stata

Hi,

I have data on student satisfaction pre and post reform in a treatment and a control group. I figured that in order to estimate the DiD coefficient I can use the diff command, which I call with
Code:
diff satisfaction, t(treated) p(post) cluster(countrycode)
This gives me a DiD coefficient of -5. However, I also want to see whether the nature of the degree plays a part in the treatment effect. Degree is a dummy variable. How can I specify it with the diff command?

Finally, what is the best way to graph the data to visualise the "parallel trends" assumption?
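diff is community-contributed, so one hedged alternative is the equivalent regression formulation, sketched here under the assumption that a dummy named degree and a time variable named year exist:

```stata
* The i.treated#i.post coefficient reproduces the DiD estimate; the
* triple interaction lets the treatment effect differ by degree type.
reg satisfaction i.treated##i.post##i.degree, vce(cluster countrycode)

* For parallel trends, plot group means of satisfaction over time:
preserve
collapse (mean) satisfaction, by(treated year)
twoway (line satisfaction year if treated==0) ///
       (line satisfaction year if treated==1), ///
       legend(order(1 "Control" 2 "Treated"))
restore
```

Roughly parallel pre-reform lines in the plot support the parallel-trends assumption.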

Thanks,

Alex

Recoding string values as missing

Hi Everyone,
I have a question about string data. My data are mostly numeric (numbers circled on a Likert-scale survey), but in some cases responses that will be coded as missing have been entered as string values (e.g., when 2 responses were accidentally circled on the survey, the data were entered as "2,3", and if no response was circled, "no response" was entered into the file).

I can easily deal with the "no response" entries because those are all entered identically (I have replaced them all with "."). What I'm stuck on is getting all the other string responses (i.e., when 2 values were circled) coded as ".a". This has me stumped because they could be pretty much any combination (e.g., 2,4; 4,5; 1,5, etc.). Is there a way to make all of these ".a" that is easier than coding every possible combination?
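A sketch of one way to catch every combination in a single pass: real() returns missing for anything that is not a plain number, so any "two answers circled" string is flagged automatically (the string variable name response is hypothetical):

```stata
* real() converts clean numbers; "2,3"-style entries come back missing,
* so everything non-numeric and non-blank gets .a in one step.
gen answer = real(response)
replace answer = .a if missing(answer) & response != "" & response != "."
```

This avoids enumerating the combinations entirely, and leaves the already-cleaned "." entries as system missing.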
Thanks!
Candy

Problem using rdrobust and rdplot commands in Stata 14.1

Hi all,
I am facing a problem using the rdrobust and rdplot commands in Stata 14.1. Please help me resolve this issue. When I type the following code

rdplot vote margin, binselect(es) ci(95)
the following error pops up:
command rdplot is unrecognized
However, I have checked three times that rdrobust and all the other rd packages are already installed. I have tried everything to reinstall and replace the existing packages, but it keeps giving me the same message, i.e.


package name: st0366_1.pkg
from: http://www.stata-journal.com/software/sj17-2/

checking st0366_1 consistency and verifying not already installed...
installing into c:\ado\plus\...
file c:\ado\plus\next.trk already exists

r(602);
Please guide me on how I can resolve this issue, as I need to run regression discontinuity designs and plots on my datasets. What could be the other possible ways? An early response would be highly appreciated.
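The r(602) error means an installation tracking file already exists. One hedged way around it (a sketch; the package name and URL are taken from the output above) is to let net install overwrite, or to remove the package record first:

```stata
* Option 1: reinstall, allowing existing files to be overwritten
net install st0366_1, from(http://www.stata-journal.com/software/sj17-2/) replace

* Option 2: remove the old package record first, then reinstall
ado uninstall st0366_1
net install st0366_1, from(http://www.stata-journal.com/software/sj17-2/)
```

If the stale next.trk file itself is the problem, option 2 (uninstall, then a fresh install) is usually the cleaner route.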

thanks


Problem with generating class membership variable using latent profile analysis

Hello all:

I have a longitudinal dataset of adolescent relationships (N=683), each of which has one continuous measure of relationship length in months (MosTotalTog).

I have been successful in generating a three-class solution as the best fit, using this code:
Code:
gsem (MosTotalTog <- _cons), lclass(C 3) nocapslatent nonrtolerance

I would like to generate a "predicted class membership" variable (predclass) to use in other analyses. My code is this:
Code:
predict cpost*, classposteriorpr
egen max = rowmax(cpost*)
generate predclass = 1 if cpost1==max
replace predclass = 2 if cpost2==max
replace predclass = 3 if cpost3==max

When I run a frequency distribution on this variable, however, I only see two classes, 2 and 3, which between them account for all 683 observations. I am clearly "losing" class 1 somewhere, but I can't figure out where.
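One common cause (an assumption, not a diagnosis from the output shown): egen creates max as a float by default, while the posterior probabilities may be stored at higher precision, so an equality test like cpost1==max can fail. A precision-safe sketch of the same logic:

```stata
* Force double precision throughout so the equality comparisons are
* exact rather than subject to float rounding.
predict double cpost*, classposteriorpr
egen double max = rowmax(cpost*)
generate predclass = 1 if cpost1 == max
replace  predclass = 2 if cpost2 == max
replace  predclass = 3 if cpost3 == max
```

If class 1 reappears with doubles, the float comparison was the culprit.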

Many thanks!
Devon



Panel data + individual-irrelevant, time-variant data

Hi Statalist Forum,

I am looking to merge three datasets. One is panel data containing information pertaining to specific gilts over time, along with some time-invariant variables such as the Duration of the respective gilts, e.g.
Code:
Gilt name   Date    Yield   Duration (years)
Gilt 1      Date1   1.2     1
Gilt 1      Date2   1.3     1
Gilt 1      Date3   1.2     1
Gilt 2      Date1   2.1     2
Gilt 2      Date2   2.4     2
Gilt 2      Date3   2.3     2
Gilt 3      Date1   1.8     1.5     etc.

I want to merge this with more panel data, namely, cumulative daily gilt purchase data (purchase amount) (specific to gilts) e.g.
Code:
Date of purchase   Gilt Name   Purchase amount
Date 1             Gilt 1      30
Date 1             Gilt 2      28
Date 1             Gilt 3      29
Date 1             Gilt 6      41
Date 2             Gilt 1      36
Date 2             Gilt 2      32
Date 2             Gilt 3      28
Date 2             Gilt 7      24

And then I would also like to merge time-variant, individual-unrelated data, namely, Overnight Indexed Swap data e.g.
Code:
Date     OIS (1 year spot curve)   OIS (3 year spot curve)
Date 1   1.2                       1.6
Date 2   1.3                       1.4
Date 3   1.1                       1.4

From what I have read in the merge help PDF, I cannot figure out which types of merge I need to perform.
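A sketch, assuming the gilt name and date variables are called giltname and date in all three files (hypothetical names; the file names are placeholders too): the purchases file matches gilt-date pairs one to one, while each OIS row matches many gilt-date rows sharing a date.

```stata
use gilts_panel, clear
* Purchases are specific to a gilt on a date: 1:1 on both keys
merge 1:1 giltname date using purchases, keep(master match) nogenerate
* OIS rates vary only by date: many gilt-date rows per OIS row
merge m:1 date using ois_rates, keep(master match) nogenerate
```

The m:1 direction is what handles the "individual-irrelevant, time-variant" OIS data.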

Any help would be greatly appreciated

Kind regards,

Jack

Interaction between binary and count variable

Hi Statalist,

I'm working on data from a randomised controlled trial and looking at heterogeneity in the treatment effect. I want to see whether effect sizes fade out over time, and therefore exploit variation in the timing of the follow-up interview. For this, I interact time to follow-up with treatment status (0/1 for control or intervention) to predict the main outcomes of the trial. The time to follow-up ranges from February to July and is coded in months, i.e. it takes one of the values 1, 2, 3, 4, 5, 6. I now wonder whether Stata allows me to specify an interaction term between a count and a binary variable.
I had first used something like this, but I think it may not be accurate to treat the follow-up variable as continuous:
Code:
reg ESS_pca_C3 ESS_pca_C1 Time_FU_C##i.TrialArm, vce(cluster ClusterID_AT1a)
What would you recommend?
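Two hedged sketches using the variable names from the post. Note that in a factor-variable interaction Stata treats an unprefixed variable as a factor by default, so the continuous treatment needs an explicit c. prefix:

```stata
* Months-to-follow-up as categorical: imposes no functional form on
* the fade-out, one effect per month
reg ESS_pca_C3 ESS_pca_C1 i.Time_FU_C##i.TrialArm, vce(cluster ClusterID_AT1a)
margins Time_FU_C#TrialArm

* As continuous: assumes a linear fade-out in months
reg ESS_pca_C3 ESS_pca_C1 c.Time_FU_C##i.TrialArm, vce(cluster ClusterID_AT1a)
```

With only six values, the categorical version is a useful check on whether the linear-fade assumption is reasonable.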

Thank you!

Hourly date conversion

Hello.

Below I have a string variable called dates, imported from Excel, and I wish to convert this string into readable dates in Stata.

I load the variable into Stata in string format and create date1 as an attempt to convert these dates, by doing:

gen date1 = date(dates,"YMDhms")

And get the following:

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str19 dates float date1
"2012-01-01 00:00:00" 18993
"2012-01-01 01:00:00" 18993
"2012-01-01 02:00:00" 18993
"2012-01-01 03:00:00" 18993
"2012-01-01 04:00:00" 18993
"2012-01-01 05:00:00" 18993
"2012-01-01 06:00:00" 18993
"2012-01-01 07:00:00" 18993
"2012-01-01 08:00:00" 18993
"2012-01-01 09:00:00" 18993
"2012-01-01 10:00:00" 18993
"2012-01-01 11:00:00" 18993
"2012-01-01 12:00:00" 18993
"2012-01-01 13:00:00" 18993
"2012-01-01 14:00:00" 18993
"2012-01-01 15:00:00" 18993
"2012-01-01 16:00:00" 18993
"2012-01-01 17:00:00" 18993
"2012-01-01 18:00:00" 18993
"2012-01-01 19:00:00" 18993
"2012-01-01 20:00:00" 18993
"2012-01-01 21:00:00" 18993
"2012-01-01 22:00:00" 18993
"2012-01-01 23:00:00" 18993
"2012-01-02 00:00:00" 18994
"2012-01-02 01:00:00" 18994
"2012-01-02 02:00:00" 18994
"2012-01-02 03:00:00" 18994
"2012-01-02 04:00:00" 18994
"2012-01-02 05:00:00" 18994
"2012-01-02 06:00:00" 18994
"2012-01-02 07:00:00" 18994
"2012-01-02 08:00:00" 18994
"2012-01-02 09:00:00" 18994
"2012-01-02 10:00:00" 18994
"2012-01-02 11:00:00" 18994
"2012-01-02 12:00:00" 18994
"2012-01-02 13:00:00" 18994
"2012-01-02 14:00:00" 18994
"2012-01-02 15:00:00" 18994
"2012-01-02 16:00:00" 18994
"2012-01-02 17:00:00" 18994
"2012-01-02 18:00:00" 18994
"2012-01-02 19:00:00" 18994
"2012-01-02 20:00:00" 18994
"2012-01-02 21:00:00" 18994
"2012-01-02 22:00:00" 18994
"2012-01-02 23:00:00" 18994
"2012-01-03 00:00:00" 18995
"2012-01-03 01:00:00" 18995
"2012-01-03 02:00:00" 18995
"2012-01-03 03:00:00" 18995
"2012-01-03 04:00:00" 18995
"2012-01-03 05:00:00" 18995
"2012-01-03 06:00:00" 18995
"2012-01-03 07:00:00" 18995
"2012-01-03 08:00:00" 18995
"2012-01-03 09:00:00" 18995
"2012-01-03 10:00:00" 18995
"2012-01-03 11:00:00" 18995
"2012-01-03 12:00:00" 18995
"2012-01-03 13:00:00" 18995
"2012-01-03 14:00:00" 18995
"2012-01-03 15:00:00" 18995
"2012-01-03 16:00:00" 18995
"2012-01-03 17:00:00" 18995
"2012-01-03 18:00:00" 18995
"2012-01-03 19:00:00" 18995
"2012-01-03 20:00:00" 18995
"2012-01-03 21:00:00" 18995
"2012-01-03 22:00:00" 18995
"2012-01-03 23:00:00" 18995
"2012-01-04 00:00:00" 18996
"2012-01-04 01:00:00" 18996
"2012-01-04 02:00:00" 18996
"2012-01-04 03:00:00" 18996
"2012-01-04 04:00:00" 18996
"2012-01-04 05:00:00" 18996
"2012-01-04 06:00:00" 18996
"2012-01-04 07:00:00" 18996
"2012-01-04 08:00:00" 18996
"2012-01-04 09:00:00" 18996
"2012-01-04 10:00:00" 18996
"2012-01-04 11:00:00" 18996
"2012-01-04 12:00:00" 18996
"2012-01-04 13:00:00" 18996
"2012-01-04 14:00:00" 18996
"2012-01-04 15:00:00" 18996
"2012-01-04 16:00:00" 18996
"2012-01-04 17:00:00" 18996
"2012-01-04 18:00:00" 18996
"2012-01-04 19:00:00" 18996
"2012-01-04 20:00:00" 18996
"2012-01-04 21:00:00" 18996
"2012-01-04 22:00:00" 18996
"2012-01-04 23:00:00" 18996
"2012-01-05 00:00:00" 18997
"2012-01-05 01:00:00" 18997
"2012-01-05 02:00:00" 18997
"2012-01-05 03:00:00" 18997
end
However, if I go into the Data Editor and format date1 as a clock value, like this:

format %tcCCYY-NN-DD_hh:MM_AM date1

it shows the following date values, as if the conversion was not successful (note: this is how it looks in the editor/preview):

Code:
dates                          date1
2012-01-01 00:00:00    1960-01-01 12:00 AM
2012-01-01 01:00:00    1960-01-01 12:00 AM
2012-01-01 02:00:00    1960-01-01 12:00 AM
2012-01-01 03:00:00    1960-01-01 12:00 AM
2012-01-01 04:00:00    1960-01-01 12:00 AM
2012-01-01 05:00:00    1960-01-01 12:00 AM
2012-01-01 06:00:00    1960-01-01 12:00 AM
2012-01-01 07:00:00    1960-01-01 12:00 AM
2012-01-01 08:00:00    1960-01-01 12:00 AM
2012-01-01 09:00:00    1960-01-01 12:00 AM
2012-01-01 10:00:00    1960-01-01 12:00 AM
2012-01-01 11:00:00    1960-01-01 12:00 AM
2012-01-01 12:00:00    1960-01-01 12:00 AM
2012-01-01 13:00:00    1960-01-01 12:00 AM
2012-01-01 14:00:00    1960-01-01 12:00 AM
2012-01-01 15:00:00    1960-01-01 12:00 AM
2012-01-01 16:00:00    1960-01-01 12:00 AM
2012-01-01 17:00:00    1960-01-01 12:00 AM
2012-01-01 18:00:00    1960-01-01 12:00 AM
2012-01-01 19:00:00    1960-01-01 12:00 AM
2012-01-01 20:00:00    1960-01-01 12:00 AM
2012-01-01 21:00:00    1960-01-01 12:00 AM
2012-01-01 22:00:00    1960-01-01 12:00 AM
2012-01-01 23:00:00    1960-01-01 12:00 AM
2012-01-02 00:00:00    1960-01-01 12:00 AM
2012-01-02 01:00:00    1960-01-01 12:00 AM
2012-01-02 02:00:00    1960-01-01 12:00 AM
2012-01-02 03:00:00    1960-01-01 12:00 AM
2012-01-02 04:00:00    1960-01-01 12:00 AM
2012-01-02 05:00:00    1960-01-01 12:00 AM
Any help would be appreciated.
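The likely culprit (hedged; inferred only from the output shown): date() returns a daily date, while %tc formats expect milliseconds since 01jan1960, so a value like 18993 displays as a moment just after 1960-01-01. A datetime needs clock() and a double:

```stata
* clock() parses "2012-01-01 13:00:00"-style strings into milliseconds
* since 01jan1960; doubles are required because clock values exceed
* float precision.
gen double datetime = clock(dates, "YMDhms")
format datetime %tcCCYY-NN-DD_HH:MM:SS

* If a daily date is also wanted, derive it from the clock value:
gen date2 = dofc(datetime)
format date2 %td
```

The original gen date1 = date(dates, "YMDhms") was in fact working as a daily date; it only looked wrong because it was displayed with a clock format.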




synth: error says predictor missing for all periods, although the variable is not missing for every period

Hey Statalist!

I am trying to construct a synthetic Great Britain to investigate the impact of the EU on its trade.

My code looks like this:
Code:
synth eksport GDP_percapita landarea life_expec pop_growth inflation CAB invest_GDP industry_GDP, trunit(29) trperiod(1973)
where eksport is Danish for export (my dependent trade variable), and GDP_percapita, landarea, life_expec, pop_growth, inflation, CAB, invest_GDP, and industry_GDP are my control variables.

I have data from 1969 to 2015 for 29 countries to construct a synthetic UK. But when I run my code I get the following error:

Code:
----------------------------------------------------------------------------------------------------------------
Synthetic Control Method for Comparative Case Studies
----------------------------------------------------------------------------------------------------------------

First Step: Data Setup
----------------------------------------------------------------------------------------------------------------
control units: for 7 of out 29 units missing obs for predictor GDP_percapita in period 1969 -ignored for averaging
control units: for 7 of out 29 units missing obs for predictor GDP_percapita in period 1970 -ignored for averaging
control units: for 7 of out 29 units missing obs for predictor GDP_percapita in period 1971 -ignored for averaging
control units: for 7 of out 29 units missing obs for predictor GDP_percapita in period 1972 -ignored for averaging
control units: for at least one unit predictor GDP_percapita is missing for ALL periods specified
I thought it meant that I had a country where GDP_percapita was missing for all years, but I tabulated the missing values and countries have at most 22 missing values, while I have 49 years.
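A sketch for checking which units trip the error, on the assumption that synth only averages each predictor over the pre-treatment periods (here 1969-1972, given trperiod(1973)), so a unit can fail with far fewer than 49 missing years; "country" is a placeholder for the panel id variable:

```stata
* Count missing GDP_percapita values per country inside the
* pre-treatment averaging window only
egen nmiss_pre = total(missing(GDP_percapita) & year < 1973), by(country)
* Units missing the predictor in all 4 pre-treatment years
tab country if nmiss_pre == 4 & year == 1969
```

If any control unit shows up here, that would explain the "missing for ALL periods specified" message despite the overall missing counts looking fine.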

Does anyone know what it could mean, and would you be able to help me?

Thanks in advance
Julia

RECLINK with several variables

Hi all,

I am trying to match two datasets which do not share any common numeric variable. Company name is important in this matching.
The first dataset has ID1, YEAR, COMPANY NAME, STATE, and other variables, but some years for some ID1 values are missing.
The second dataset (COMPUSTAT) has ID2, YEAR (from birth to some point), COMPANY NAME, and other variables.

Last time, I just generated a new ID for each of them: bid for the first one and gid for the second one.
I then ran the command below, but I found several ridiculous mismatches, such as mismatched state or year:
Code:
reclink cmpy using compustat_claned_test2.dta, gen(recscore) idm(bid) idu(gid)
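One hedged idea that usually reduces such mismatches (a sketch; name_std and compustat_std are hypothetical names, and it assumes the Compustat file is unique on standardized name and year): take exact matches on a standardized name plus year first, so the fuzzy matcher only sees the leftovers and cannot pair records across years.

```stata
* Standardize names, then take exact name+year matches before any
* fuzzy matching; reclink is then run only on the unmatched residue.
gen name_std = upper(trim(cmpy))
merge m:1 name_std year using compustat_std, keep(master match)
* records with _merge == 1 remain for reclink, ideally year by year
```

Blocking the fuzzy step by year (or state) in this way prevents a high name-similarity score from overriding an obvious year mismatch.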

I need some help.

Sang-min

<<The first dataset>>

+---------------------------------------------------------------------------------+
| year newid cmpy statef~s prof tech profip techip |
|---------------------------------------------------------------------------------|
1. | 1982 6 3ELAELECT 36 4 2 4 2 |
2. | 1979 7 3M 6 850 0 850 0 |
3. | 1980 7 3M 6 . . 850 0 |
4. | 1981 7 3M 6 . . 850 0 |
5. | 1982 7 3M 6 850 0 850 0 |
|---------------------------------------------------------------------------------|
6. | 1983 7 3M 6 . . 732 1.5714286 |
7. | 1984 7 3M 6 . . 614 3.1428571 |
8. | 1985 7 3M 6 . . 496 4.7142857 |
9. | 1986 7 3M 6 . . 378 6.2857143 |
10. | 1987 7 3M 6 . . 260 7.8571429 |
|---------------------------------------------------------------------------------|
11. | 1988 7 3M 6 . . 142 9.4285714 |
12. | 1989 7 3M 6 24 11 24 11 |
13. | 1990 7 3M 6 . . 24 11 |
14. | 1991 7 3M 6 . . 24 11 |
15. | 1992 7 3M 6 24 11 24 11 |
|---------------------------------------------------------------------------------|
16. | 1993 7 3M 6 . . 24 11 |
17. | 1994 7 3M 6 . . 24 11 |
18. | 1995 7 3M 6 24 11 24 11 |
19. | 1996 7 3M 6 . . 24 11 |
20. | 1997 7 3M 6 . . 24 11 |
|---------------------------------------------------------------------------------|
21. | 1998 7 3M 6 24 11 24 11 |
22. | 1979 7 3M 27 4000 0 4000 0 |
23. | 1980 7 3M 27 . . 4333.3333 0 |
24. | 1981 7 3M 27 . . 4666.6667 0 |
25. | 1982 7 3M 27 5000 0 5000 0 |
|---------------------------------------------------------------------------------|
26. | 1983 7 3M 27 . . 5300 0 |
27. | 1984 7 3M 27 . . 5600 0 |
28. | 1985 7 3M 27 . . 5900 0 |
29. | 1986 7 3M 27 6200 0 6200 0 |
30. | 1987 7 3M 27 . . 6533.3333 0 |
|---------------------------------------------------------------------------------|
31. | 1988 7 3M 27 . . 6866.6667 0 |
32. | 1989 7 3M 27 7200 0 7200 0 |
33. | 1990 7 3M 27 . . 7250 666.66667 |
34. | 1991 7 3M 27 . . 7300 1333.3333 |
35. | 1992 7 3M 27 7350 2000 7350 2000 |
|---------------------------------------------------------------------------------|
36. | 1993 7 3M 27 . . 7800 2000 |
37. | 1994 7 3M 27 . . 8250 2000 |
38. | 1995 7 3M 27 8700 2000 8700 2000 |
39. | 1996 7 3M 27 . . 8700 2000 |
40. | 1997 7 3M 27 . . 8700 2000 |
|---------------------------------------------------------------------------------|
41. | 1998 7 3M 27 8700 2000 8700 2000 |
42. | 1986 7 3M 48 225 0 225 0 |
43. | 1987 7 3M 48 . . 225 0 |
44. | 1988 7 3M 48 . . 225 0 |
45. | 1989 7 3M 48 225 0 225 0 |
|---------------------------------------------------------------------------------|
46. | 1990 7 3M 48 . . 353 68 |
47. | 1991 7 3M 48 . . 481 136 |
48. | 1992 7 3M 48 609 204 609 204 |
49. | 1993 7 3M 48 . . 609 204 |
50. | 1994 7 3M 48 . . 609 204 |
+------------------------------------

<<The second dataset>>

gvkey year cmpy at state ..................
1000 1961 AEPLASTIKPAK . CA
1000 1962 AEPLASTIKPAK . CA
1000 1963 AEPLASTIKPAK . CA
1000 1964 AEPLASTIKPAK 1.416 CA
1000 1965 AEPLASTIKPAK 2.31 CA
1000 1966 AEPLASTIKPAK 2.43 CA
1000 1967 AEPLASTIKPAK 2.456 CA
1000 1968 AEPLASTIKPAK 5.922 CA
1000 1969 AEPLASTIKPAK 28.712 CA
1000 1970 AEPLASTIKPAK 33.45 CA
1000 1971 AEPLASTIKPAK 29.33 CA
1000 1972 AEPLASTIKPAK 19.907 CA
1000 1973 AEPLASTIKPAK 21.771 CA
1000 1974 AEPLASTIKPAK 25.638 CA
1000 1975 AEPLASTIKPAK 23.905 CA
1000 1976 AEPLASTIKPAK 38.586 CA
1000 1977 AEPLASTIKPAK 44.025 CA
1001 1978 AMFOODSERV . OK
1001 1979 AMFOODSERV . OK
1001 1980 AMFOODSERV . OK

interpreting "regress" output

Dear all,
I'm using Stata 14.2.
I ran the following pooled OLS regressions:
1) regress growth_t_t1 L.lntot_revenue_def L.lnLP_def_1, vce(robust)
where
growth_t_t1=lntot_revenue_def-L.lntot_revenue_def
and
2) regress lntot_revenue_def L.lntot_revenue_def L.lnLP_def_1, vce(robust)

The estimated coefficients and their significance, as well as the standard errors and the number of observations, are identical between the two regressions, with the exception of the coefficient on L.lntot_revenue_def.

OUTPUT REGRESSION 1:
Code:
Linear regression                       Number of obs =  684,323
                                        F(2, 684320)  =  1965.00
                                        Prob > F      =   0.0000
                                        R-squared     =   0.0136
                                        Root MSE      =   .55838

                    |               Robust
        growth_t_t1 |      Coef.   Std. Err.       t    P>|t|     [95% Conf. Interval]
--------------------+----------------------------------------------------------------
L.lntot_revenue_def |  -.0337006   .0006104   -55.21   0.000     -.034897   -.0325043
       L.lnLP_def_1 |  -.0111504   .0011786    -9.46   0.000    -.0134604   -.0088403
              _cons |   .2581096    .005334    48.39   0.000      .247655    .2685642

OUTPUT REGRESSION 2:
Code:
Linear regression                       Number of obs =  684,323
                                        F(2, 684320)  > 99999.00
                                        Prob > F      =   0.0000
                                        R-squared     =   0.9017
                                        Root MSE      =   .55838

                    |               Robust
  lntot_revenue_def |      Coef.   Std. Err.       t    P>|t|     [95% Conf. Interval]
--------------------+----------------------------------------------------------------
L.lntot_revenue_def |   .9662994   .0006104  1583.09   0.000      .965103    .9674957
       L.lnLP_def_1 |  -.0111504   .0011786    -9.46   0.000    -.0134604   -.0088403
              _cons |   .2581096    .005334    48.39   0.000      .247655    .2685642


However, if I run the same regressions without L.lntot_revenue_def among the independent variables, I obtain different results from the two regressions.

3) regress growth_t_t1 L.lnLP_def_1, vce(robust)
4) regress lntot_revenue_def L.lnLP_def_1, vce(robust)

OUTPUT REGRESSION 3:
Code:
Linear regression                       Number of obs =  684,323
                                        F(1, 684321)  =  1328.18
                                        Prob > F      =   0.0000
                                        R-squared     =   0.0054
                                        Root MSE      =   .56069

             |               Robust
 growth_t_t1 |      Coef.   Std. Err.       t    P>|t|     [95% Conf. Interval]
-------------+-----------------------------------------------------------------
L.lnLP_def_1 |  -.0413937   .0011358   -36.44   0.000    -.0436198   -.0391675
       _cons |   .1277148   .0044164    28.92   0.000     .1190589    .1363707

OUTPUT REGRESSION 4:
Code:
Linear regression                       Number of obs =  686,252
                                        F(1, 686250)  > 99999.00
                                        Prob > F      =   0.0000
                                        R-squared     =   0.2304
                                        Root MSE      =   1.5695

             |               Robust
lntot_reve~f |      Coef.   Std. Err.       t    P>|t|     [95% Conf. Interval]
-------------+-----------------------------------------------------------------
L.lnLP_def_1 |   .8574252   .0023444   365.74   0.000     .8528303      .86202
       _cons |   3.985948     .00832   479.08   0.000     3.969641    4.002255

I would like to know why the results are "equal" when I control for L.lntot_revenue_def .
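This pattern follows directly from the definition of growth_t_t1 given above. Writing y_t for lntot_revenue_def and x_t for lnLP_def_1, regression 2 and regression 1 are the same model:

```latex
y_t = a + b\,y_{t-1} + c\,x_{t-1} + e_t
\;\Longrightarrow\;
y_t - y_{t-1} = a + (b - 1)\,y_{t-1} + c\,x_{t-1} + e_t
```

Subtracting y_{t-1} from both sides changes nothing except the coefficient on y_{t-1}, which shifts down by exactly one; indeed 0.9662994 - 1 = -0.0337006, matching the two outputs. Without the lagged level in the model, the two left-hand sides are genuinely different regressands, so regressions 3 and 4 differ.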
Thank you very much in advance
Chiara

Plotting predicted probabilities and 95CIs following discrete time proportional hazards modelling

Dear all

I am looking for some advice about the best way to graphically plot a non-linear continuous predictor in a discrete time proportional hazards model. I have found Stephen Jenkins' pages on DTPH very useful, but have got to a position where I require some guidance from the community.

I am modelling the time to an outcome on the age scale (years) and wish to see if two variables (moving house in a given year, binary) and total distance moved (continuous variable) predict the outcome, having controlled for a set of covariates.

I find that a model with a cubic term for moves gives best fit to the data as follows:

Code:
cloglog outcome discrete_age i.moved_house c.distance c.distance#c.distance c.distance#c.distance#c.distance, eform
I assume I have set up the model correctly and specified the cubic term properly.

I have tried to use the margins command to estimate the marginal hazard but understand that this is not advisable in a proportional hazards scenario since the underlying hazards are not known. If I do so, however, I run the following

Code:
margins, at (distance=(min(range)max))
marginsplot, noci
which produces the figure here.

[figure not reproduced]

While this gives me some visual clue as to the relationship, I am concerned that it is wrong to do this, but I do not know what to do instead. When 95% CIs are added to the plot, I am sure these are wrong, as I get the following:

[figure not reproduced]

I presume they should get broader over greater distances, because a histogram shows that the vast majority of participants have very small cumulative distances, with very few over 2,000 km. The dataset has 1.4m participants.

I would be grateful for any advice on a correct way to visually plot this non-linear relationship between distance and outcome, with correct 95% CIs, following discrete time proportional hazards modelling.
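One hedged alternative (a sketch, not necessarily the recommended approach for DTPH models): compute the predicted discrete-time hazard implied by the complementary log-log link and its delta-method CI directly with predictnl, then plot against distance:

```stata
* After the cloglog fit: h = 1 - exp(-exp(xb)) is the discrete-time
* hazard; predictnl applies the delta method for pointwise SEs.
predictnl hazard = 1 - exp(-exp(xb())), ci(haz_lo haz_hi)

* Plot against distance (overlay of CI band and point estimate)
sort distance
twoway (rarea haz_lo haz_hi distance) (line hazard distance)
```

Because the band is computed at each observation's own covariate values, it will widen where the data are sparse, which is the behaviour expected at large distances.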

Thanks

James

Reshape Wide to Long With Multiple Suffixes (year not being at the end)

Dear All,

I'm trying to reshape my employment data from a wide to a long format. My general problem is that the variables are wide both in terms of year and in terms of job number: people might have held multiple jobs in the same year, and each one appears in a separate variable. The variable names therefore look like JobTenure_1987_03, where the first part is the variable, the second part (1987) is the year, and the third part (here: 03) is the number of the job held that year about which there is information (i.e. the variable gives the tenure at the third job that individual held in 1987).

I am now looking to get the data into a long format and wondering how to do so. I am not sure whether I should reshape the data twice (keeping both a variable for the number of the job entry in a specific year and a variable for the year) or just use the year as the long-format variable. Does anybody have any thoughts on the difference this would make in my analysis, and on what my Stata syntax should look like to make this process efficient? I'm asking about the latter because I have several dozen variables with the structure described above, and I'm wondering whether there is a smart way to reshape them.
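A sketch of one way to do it in a single reshape, assuming a person identifier pid (hypothetical name) and variable names exactly like JobTenure_1987_03: with j(..., string), the whole "1987_03" suffix becomes one string, which can then be split into year and job number.

```stata
* One reshape; the combined "year_job" suffix is split afterwards
reshape long JobTenure_, i(pid) j(year_job) string
gen year   = real(substr(year_job, 1, 4))
gen jobnum = real(substr(year_job, 6, 2))
drop year_job
rename JobTenure_ JobTenure
```

With several dozen stubs, list them all in the same reshape call (reshape long JobTenure_ JobWage_ ..., i(pid) j(year_job) string), so every variable is reshaped in one pass.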

Thanks for the help,
J

Visualizing two regressions in one graph

Dear Statalist respected users,

I ran two regression models and I want to visualize the impact of two different interactions (one from each model) on an outcome variable, which is the same in both models.
For example:
the first model: y = B0 + B1X1 + B2X2 + B3X2*X3 + e

the second model: y = S0 + S1X1 + S2X2 + S3X2*X3 + e

I want to visualize the impact of B3 and S3 on y in one graph.
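A hedged sketch using the community-contributed coefplot (ssc install coefplot); the model specifications and variable names below are placeholders, not the actual models:

```stata
* Fit both models, store the estimates, then overlay the interaction
* coefficients (B3 and S3) with their CIs in one plot.
reg y c.x1 c.x2##c.x3            // first model (placeholder spec)
estimates store m1
reg y c.x1 c.x2##c.x3 z          // second model (placeholder spec)
estimates store m2
coefplot m1 m2, keep(c.x2#c.x3) xline(0)
```

If instead you want predicted values of y over the interaction, run margins after each model, store the results, and overlay them with marginsplot or twoway.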

Thanks a lot in advance.

Kind regards,
Mohammed Kasbar

Code for reporting summary statistics on data that is used for regression.

Hello, everyone,

Is there a direct command to report summary statistics on the data that is actually used in a regression?

summarize in Stata only reports the original data.

Because of the log transformations, the original data and the data used in the regression differ. Or, in research papers, should we just report the original data without considering the log variables?
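A sketch using e(sample), which every estimation command leaves behind to mark the observations actually used (the variable names below are placeholders):

```stata
* Summarize exactly the observations and variables the model used
regress lny lnx1 lnx2
summarize lny lnx1 lnx2 if e(sample)
```

The if e(sample) restriction also works with the untransformed variables, so you can report raw-scale statistics for the estimation sample if that is what the paper requires.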


Thanks.





Calculating the number of groups corresponding to a given observation

Hi everyone,

The data I have below show the number of groups made in each grade for a particular activity.

From this, I want to ascertain the number of mixed-sex groups, that is, for a particular grade, how many groups did girls and boys make together?

I want to generate a new variable having the number of mixed-sex groups for each observation of grade.

How can I do that?

Code:
grade   girls_group_1   boys_group_1   girls_group_2   boys_group_2
  8           5               0              5              0
  9           0               6              1              5
 10           5               1              3              5
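A sketch for counting mixed-sex groups per row, assuming a group counts as "mixed" when it contains at least one girl and at least one boy (variable names from the post):

```stata
* Each (girls_group_k, boys_group_k) pair describes one group; a group
* is mixed when both counts are positive. Sum the indicators.
gen mixed_groups = (girls_group_1 > 0 & boys_group_1 > 0) ///
                 + (girls_group_2 > 0 & boys_group_2 > 0)
```

On the data shown, this yields 0 mixed groups for grade 8, 1 for grade 9, and 2 for grade 10.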
Thanks for helping!

interpretation of insignificant results

Hi,

I have run a regression of test scores on parental income, and have run two separate regressions based on the type of school students attend (fee-paying or not).

The coefficient on parental income has now become insignificant. How do you go about interpreting this? Can you say something about collinearity, i.e. that because I have split the sample into fee/non-fee schools, the parental-income effect will be less apparent when comparing only parents who pay fees?

Thanks