Values of Coefficient and Power of Test in Data Editor

August 22, 2019, 9:30 pm

Hello,

I am using the command "bysort var: reg y x", which gives me one set of regression results for each value of var.

For each set of regression results, I want the coefficient on x ("Coef.") and its p-value ("P>|t|") stored as variables in the Data Editor, on the rows for the corresponding value of var. After the analysis, my data should look like this (example):

var	y	x	Coef.	P>|t|
1	0.8	2005	2.03	0.000
1	0.9	2006	2.03	0.000
1	1.5	2007	2.03	0.000
1	0.78	2008	2.03	0.000
2	5.23	2011	-0.73	0.98
2	6.56	2012	-0.73	0.98
2	4.33	2013	-0.73	0.98

Is there a way to do this in Stata?
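A minimal sketch of one way to do this, assuming var is numeric: loop over the groups, run the regression on each, and write `_b[x]` and its two-sided p-value back into that group's rows. The variable names coef_x and p_x are illustrative, and the data are the example rows above, so the estimates will not equal the illustrative 2.03 and -0.73.

```stata
* Example rows from the layout above (var, y, x)
clear
input byte var double y int x
1 0.8  2005
1 0.9  2006
1 1.5  2007
1 0.78 2008
2 5.23 2011
2 6.56 2012
2 4.33 2013
end

* New variables to hold the per-group results
generate double coef_x = .
generate double p_x = .

* Assumes var is numeric; for a string var, compare with "`g'" instead
levelsof var, local(groups)
foreach g of local groups {
    quietly regress y x if var == `g'
    quietly replace coef_x = _b[x] if var == `g'
    * Two-sided p-value, as in the P>|t| column of -regress-
    quietly replace p_x = 2 * ttail(e(df_r), abs(_b[x] / _se[x])) if var == `g'
}
list, sepby(var)
```

An alternative is -statsby-, which collects the same statistics into a new dataset that can then be merged back on var.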

Many thanks,
Sagnik

↧

How to convert -mixed- to -gsem-

August 22, 2019, 10:06 pm

Does anyone know how to convert a three-level model with random intercepts and random slopes from -mixed- to -gsem-?

For example, the model fit with -mixed- is:

Code:

 mixed y x1 x2 x3 x4 c.x5#c.x4 || level2id: || level1id: x1 x2 x4, cov(un) reml

Can the same model be fit with -gsem-?

↧

nscale: new package for rescaling variables to run from 0 to 1 now available on SSC

August 22, 2019, 11:02 pm

Thanks to Kit Baum, a new package called nscale is now available from the SSC archive.

Code:

. ssc install nscale

to install the package.

If you want to rescale variables to lie between 0 and 1, nscale can handle almost everything related on its own: reverse coding, renaming variables, and recoding selected values to missing (e.g., excluding 99 = "don't know" in survey questions).

Code:

newvar = (var - min(var)) / (max(var) - min(var))

is the essential formula for this command.
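To see the formula in action without the package, here is a rough sketch in plain Stata. It assumes a hypothetical 1-5 survey item called item in which 99 codes "don't know", and it uses the observed minimum and maximum. It does not reproduce nscale's own options.

```stata
* Hypothetical 1-5 survey item; 99 = "don't know"
clear
input byte item
1
3
5
99
2
end

* Treat the DK code as missing before rescaling
replace item = . if item == 99

* (var - min) / (max - min), using the observed minimum and maximum
quietly summarize item
generate double item_01 = (item - r(min)) / (r(max) - r(min))

* Reverse-coded version: 0 becomes 1 and 1 becomes 0
generate double item_01_rev = 1 - item_01
list
```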

Note: the N in nscale stands for normalization, the term commonly used for this kind of rescaling (at least in political psychology).

↧

Displaying contents of a constraint: negative values shows up as "0-,12345" etc.

August 23, 2019, 1:25 am

Dear all,

I have been a long-time reader of Statalist and usually find what I need in its excellent archive of Q&A. But now I have run into a problem for which I found no answer, so this is my first post.

Here is the problem: I (quietly) run a constrained regression, looped over a few hundred sub-samples of a large dataset. In some cases, I need to display the contents of the constraints (they feed into CSV-style output in my log file). I store them in a local macro, but displaying negative values has proved tricky. This happens whether I use decimal points or decimal commas. Here is a much-abridged version of what I do:

Code:

. set dp period, perm    
(set dp preference recorded)
. capture drop gender
.         gen gender = 0
. constraint define 1 gender = -.12345
. constraint get 1
. local cgender = r(contents)
.         display in smcl `cgender'
0-.12345
. set dp comma, perm     
(set dp preference recorded)
.         display in smcl `cgender'
0-,12345

As you can see, the minus sign and the leading zero are "mixed up" and display as “0-.” or “0-,”. I need the value displayed in a normal format; any of the following would work: -0,12345 | -0.12345 | -,12345 | -.12345. How can I achieve that?
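A possible explanation, offered tentatively: without quotation marks, -display- treats the expanded macro (gender = -.12345) as something to evaluate rather than as text, and the leading 0 appears to be the value of the variable gender. A sketch that shows the stored text is to wrap the macro in double quotes. The last step assumes the constraint has the form "varname = value" and keeps everything after the equals sign; the local name cvalue is illustrative.

```stata
* Reproduce the setup from the post
clear
set obs 1
generate gender = 0
constraint define 1 gender = -.12345
constraint get 1
local cgender `"`r(contents)'"'

* Quoted: -display- shows the stored text instead of evaluating it
display "`cgender'"

* Assumes the form "varname = value": keep the text after "=", trimmed
local cvalue = strtrim(substr("`cgender'", strpos("`cgender'", "=") + 1, .))
display "`cvalue'"
```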

Thanks a lot to anybody who can help!

Malte

↧

Use e(sample) on imputed dataset

August 23, 2019, 2:23 am

Dear Statalist,

I'm fitting a Cox regression with a categorical predictor.

To obtain the count within each category after -stcox- I use

Code:

tab indepvar if e(sample)

Is there a way to make that code work on imputed datasets? I tried the same code after running mi estimate: stcox, but it returned the following:

Code:

tab indepvar if e(sample)
no observations

Best regards,
Sigrid

↧

Hausman test issues in panel data

August 23, 2019, 2:39 am

Hi all, I am Sonia Kaur. I am currently using Stata 15.1. I joined this forum recently and need some advice on the thesis I am writing.

My question: my thesis is on the determinants of bank profitability, and I have to analyse approximately 17,000 banks over 64 quarters. The determinants I have chosen are lagged ROA, size, the credit risk ratio, a few more ratios, and inflation, GDP and interest rates.

I have a few issues I need to clarify, and I really hope you can give me some insights.

1) First, I declared my data as a panel using the xtset command (unbalanced, over 64 quarters). Then I ran a pooled regression with the reg command, with ROA as the dependent variable and lagged ROA and the rest as independent variables. Two of the variables were insignificant (the interest coverage ratio and bank efficiency), so I dropped them from my equation (lagged ROA is still included).

Then I ran FE and RE models using the remaining variables plus lagged ROA, but I ran into a problem with the Hausman test: it gives me a "not positive definite" error. After some research, I stored the two sets of estimates as fe_all and re_all and ran "hausman fe_all re_all, sigmamore", and the error no longer appears. Can I legitimately solve the problem this way, or am I just forcing the data to work my way with these commands? Is there an underlying problem I am not seeing? I am really confused, so please do give me your insights. Thank you so much.
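To make the sequence concrete, here is a self-contained sketch of the FE, RE and Hausman steps described above on simulated data. The panel, the variable names (bank, quarter, roa, size) and the data-generating process are all hypothetical; the sketch only shows the mechanics and does not settle whether sigmamore is appropriate for this model.

```stata
* Simulated panel: 50 hypothetical banks observed for 20 quarters
clear
set seed 12345
set obs 50
generate bank = _n
generate double u_i = rnormal()          // bank-specific effect
expand 20
bysort bank: generate quarter = _n
* size is correlated with the bank effect, so FE and RE should disagree
generate double size = rnormal() + 0.5 * u_i
generate double roa = 1 + 0.3 * size + u_i + rnormal()

xtset bank quarter

quietly xtreg roa size, fe
estimates store fe_all
quietly xtreg roa size, re
estimates store re_all

* sigmamore: both covariance matrices use the disturbance variance
* estimated by the efficient (here RE) model
hausman fe_all re_all, sigmamore
```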

↧

Saving margins estimates as a new variable

August 23, 2019, 4:06 am

Hi Statalisters,
Is it possible to save margins estimates as a new variable? In the example below, I would like to create a variable containing the marginal effects of ycn across the whole range of age, so that I can use it in a twoway graph (that way I can combine several margins plots in one graph). The variable can of course be created manually, but that is tedious.

I found a similar post (https://www.statalist.org/forums/for...=1566557316931), but I cannot get the code provided there to work in my case.

Code:

. use http://www.stata-press.com/data/r13/margex, clear
(Artificial data for margins)

. reg outcome c.ycn c.age c.age#c.ycn

      Source |       SS       df       MS              Number of obs =    3000
-------------+------------------------------           F(  3,  2996) =  182.34
       Model |  65.2517193     3  21.7505731           Prob > F      =  0.0000
    Residual |  357.387947  2996  .119288367           R-squared     =  0.1544
-------------+------------------------------           Adj R-squared =  0.1535
       Total |  422.639667  2999  .140926865           Root MSE      =  .34538

------------------------------------------------------------------------------
     outcome |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         ycn |  -.0018226   .0010404    -1.75   0.080    -.0038625    .0002173
         age |    .008331   .0019362     4.30   0.000     .0045346    .0121275
             |
 c.age#c.ycn |   .0000576   .0000252     2.29   0.022     8.18e-06    .0001071
             |
       _cons |  -.1989264   .0783917    -2.54   0.011    -.3526334   -.0452194
------------------------------------------------------------------------------

. margins, dydx (ycn) at (age=(20(1)60))
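One possible sketch, assuming the results table is what is wanted as variables: -margins- leaves its estimates in the matrix r(table), with one column per at() value, so transposing it and passing it to -svmat- creates variables b, se, ll, ul and so on in the first 41 observations. The matrix name M and the variable age_grid are illustrative. The saving() option of -margins-, which writes the estimates to a separate dataset, is another route.

```stata
use http://www.stata-press.com/data/r13/margex, clear
quietly regress outcome c.ycn c.age c.age#c.ycn
margins, dydx(ycn) at(age=(20(1)60))

* r(table) has one column per at() value; transpose so each row is one age
matrix M = r(table)'
* Creates variables named after the rows of r(table): b, se, ..., ll, ul
svmat double M, names(col)
* Rebuild the age grid used in at(): 20, 21, ..., 60
generate age_grid = 19 + _n if !missing(b)

twoway (rarea ll ul age_grid) (line b age_grid), ///
    ytitle("Marginal effect of ycn") xtitle("Age")
```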

Thanks in advance!

Kind regards,
David

↧

How do I keep only the matched observation after data set merge (STATA 15)

August 23, 2019, 4:34 am

Hi,

I used a one-to-many merge (merge 1:m) to combine two datasets. Once that is done, how can I keep only the matched observations?

merge 1:m hhcode using "xyz.dta"
(label province already defined)
(label region already defined)

    Result                           # of obs.
    -----------------------------------------
    not matched                       379,662
        from master                       142  (_merge==1)
        from using                    379,520  (_merge==2)

    matched                           120,347  (_merge==3)
    -----------------------------------------
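A small self-contained sketch of two standard ways to keep only the matches, using made-up household and member files in place of the real data (the names households and members are illustrative): drop everything except _merge == 3 after the merge, or ask -merge- to do it with keep(match).

```stata
* Hypothetical household file (one row per hhcode)
clear
input hhcode income
1 100
2 200
3 300
end
tempfile households
save `households'

* Hypothetical member file (several rows per hhcode)
clear
input hhcode member
2 1
2 2
3 1
4 1
end
tempfile members
save `members'

* Option 1: merge, then keep matched observations (_merge == 3)
use `households', clear
merge 1:m hhcode using `members'
keep if _merge == 3
drop _merge
local n_keep = _N

* Option 2: let -merge- keep only the matches in one step
use `households', clear
merge 1:m hhcode using `members', keep(match) nogenerate
```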

Best,
Shehryar

↧

Mixed effects model identification

August 23, 2019, 4:46 am

This isn’t necessarily a Stata-specific question, but it seems relevant to the broader community nonetheless.

After reviewing a couple of texts on mixed-effects models (i.e., Raudenbush & Bryk, 2002; Rabe-Hesketh & Skrondal, 2012), it seems that model identification is discussed little, if at all, for these models: the first reference has no index entries on it and the latter has only one. So I was wondering whether anyone in the Stata community knows of any commands, papers, or procedures that can be used to determine whether a model is identified.

Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical linear models: Applications and data analysis methods. New York City, NY: Sage Publications Inc.

Rabe-Hesketh, S., & Skrondal, A. (2012). Multilevel and longitudinal modeling using Stata (3rd ed.). College Station, TX: Stata Press.

↧

Loop over several files in a directory and save each dataset in a new .dta

August 23, 2019, 5:09 am

Hello. I would like to run the same do-file on several datasets in a directory, with the loop included in my master do-file. I have different merge datasets but would like to perform the same cleaning operations on all of them. At the moment I copy or comment out "use data1.dta" at the beginning and "save data1_done.dta" at the end of every do-file.

I am having trouble storing the names of all .dta files in a local macro, looping over that local, and finally saving each result as a new file with "_done" appended to the name.

Code:

******************************************************
* Generating two test data files in current directory
******************************************************
clear

* Nick Data 2005
set seed 123
set obs 100
forval j = 1/10 {
    gen v`j' = uniform()
}
save "data2005.dta", replace
clear

* Nick Data 2015
set seed 123
set obs 6
egen year = seq(), from(1991) to(1993) block(2)
gen country = cond(mod(_n, 2), "UK", "US")
gen growth = 1 + rnormal()

save "data2015.dta", replace
clear

****************************************************************
* Creating a local list of all .dta files in directory
* Doing a loop over all files, e.g. dropping first obs
* Saving each dataset in a new file, append "done" in file name
****************************************************************

local files : dir . files ".dta"

 foreach aaa of local files {
     use `aaa'
     drop if _n == 1
 save `aaa'+"_done.dta", replace
 }

PS: The test data files are unrelated to my loop problem. You can take any other two files and perform any other operation on them.
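A sketch of a corrected loop, with three changes to the posted version: the pattern "*.dta" matches every .dta file (".dta" on its own matches only a file literally named ".dta"); use needs the clear option, because data already in memory have been changed; and the new file name is built in a local before -save-, since `aaa'+"_done.dta" is not combined into one name there. The test files testa.dta and testb.dta are hypothetical. Note that on a second run the *_done files would also be picked up.

```stata
* Two small test files in the current directory (lower-case names,
* because -dir- may return lower-case names on Windows)
clear
set obs 3
generate x = _n
save "testa.dta", replace
save "testb.dta", replace

* Names of all .dta files in the current directory
local files : dir . files "*.dta"

foreach f of local files {
    use "`f'", clear
    drop if _n == 1
    * Remove the 4-character ".dta" extension and append "_done.dta"
    local stem = substr("`f'", 1, strlen("`f'") - 4)
    save "`stem'_done.dta", replace
}
```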

↧

Renaming multiple variables in a loop

August 23, 2019, 5:20 am

Hi,

How can I rename multiple variables in a loop, without renaming every single one manually? I am using the following code but get an error:

. local a " qq01 qq02 qq03 qq04 qq05 qq06 qq07 qq08a qq08b qq09a qq09b"

. local b "qq_old01 qq_old02 qq_old03 qq_old04 qq_old05 qq_old06 qq_old07 qq_old08a qq_old8b qq_old9a qq_old9b"

. foreach x of varlist `a' {
2. rename `x' `b'
3. }

syntax error
Syntax is
rename oldname newname [, renumber[(#)] addnumber[(#)] sort ...]
rename (oldnames) (newnames) [, renumber[(#)] addnumber[(#)] sort ...]
rename oldnames , {upper|lower|proper}
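In the posted loop, rename `x' `b' hands -rename- the entire list in b as the new name for each variable, which triggers the syntax error. A sketch that pairs the i-th old name with the i-th new name, shown on three of the variables from the post:

```stata
* Three of the variables from the post, for illustration
clear
set obs 1
foreach v in qq01 qq02 qq08a {
    generate `v' = 1
}

local a qq01 qq02 qq08a
local b qq_old01 qq_old02 qq_old08a

* Pair the i-th name in local a with the i-th name in local b
local n : word count `a'
forvalues i = 1/`n' {
    local old : word `i' of `a'
    local new : word `i' of `b'
    rename `old' `new'
}
describe, simple
```

As the second line of the syntax message suggests, a group rename does the same without a loop: rename (`a') (`b'). If every new name simply inserts _old after qq, rename qq* qq_old* also works.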

Best,
Shehryar
A new Stata user

↧

Sample Distribution by Years

August 23, 2019, 6:10 am

Dear Statalists,

I have an unbalanced panel and want to create a table summarizing the distribution of firms by year. The -panelstat- command would achieve the desired result, but I am not able to install the package.
Does anyone have any suggestions?
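If the package cannot be installed, built-in commands may be enough for a simple table. A sketch, assuming a firm identifier firm and a year variable with one observation per firm-year (the data below are made up):

```stata
clear
input firm year
1 2015
1 2016
2 2016
2 2017
3 2017
end
xtset firm year

* Number of firms observed in each year (one row per firm-year)
tabulate year

* Participation patterns: which years each firm appears in
xtdescribe
```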
Many thanks in advance!

Best regards,
Lang

↧

missing coefficient and standard error when using areg command

August 23, 2019, 7:09 am

Hi everyone,

I have a dataset, which I included here. I am trying to estimate the effect of import volume on the tariff. Product codes have 6 digits. I include 2-digit industry fixed effects in the regression (for example, products 112345 and 114567 are in the same industry, 11) and run the command:

areg tariff import_volume, a(industry_2digit) robust

My result is:

. areg tariff import_volume, a(industry_2digit) robust

Linear regression, absorbing indicators         Number of obs     =         94
Absorbed variable: industry_2digit              No. of categories =         22
                                                F(1, 71)          =          .
                                                Prob > F          =          .
                                                R-squared         =     1.0000
                                                Adj R-squared     =     1.0000
                                                Root MSE          =     0.0000

---------------------------------------------------------------------------------
        tariff |      Coef.  Std. Err.         t   P>|t|   [95% Conf.   Interval]
---------------+-----------------------------------------------------------------
 import_volume |          0  (omitted)
         _cons |   88.26729          .         .       .            .           .
---------------------------------------------------------------------------------

I cannot explain why R-squared equals 1 (a perfect fit) while the coefficient is 0 (omitted) and has no standard error. I also tried this command:

reg tariff import_volume i.industry_2digit, robust

and get the result:

. reg tariff import_volume i.industry_2digit, robust

Linear regression                               Number of obs     =         94
                                                F(0, 71)          =          .
                                                Prob > F          =          .
                                                R-squared         =     1.0000
                                                Root MSE          =          0

---------------------------------------------------------------------------------
               |               Robust
        tariff |      Coef.  Std. Err.         t   P>|t|   [95% Conf.   Interval]
---------------+-----------------------------------------------------------------
 import_volume |   2.54e-16   3.42e-16      0.74   0.461    -4.29e-16    9.36e-16
               |
industry_2di~t |
             3 |   18.23529   3.86e-15   4.7e+15   0.000     18.23529    18.23529
             7 |  -236.4044   4.92e-14  -4.8e+15   0.000    -236.4044   -236.4044
             8 |   38.23529   3.86e-15   9.9e+15   0.000     38.23529    38.23529
             9 |   38.23529   3.85e-15   9.9e+15   0.000     38.23529    38.23529
            13 |   38.23529   3.86e-15   9.9e+15   0.000     38.23529    38.23529
            15 |   38.23529   3.97e-15   9.6e+15   0.000     38.23529    38.23529
            17 |   38.23529   6.10e-15   6.3e+15   0.000     38.23529    38.23529
            24 |   38.23529   9.04e-15   4.2e+15   0.000     38.23529    38.23529
            25 |   38.23529   1.62e-14   2.4e+15   0.000     38.23529    38.23529
            30 |   38.23529   3.86e-15   9.9e+15   0.000     38.23529    38.23529
            41 |   38.23529   3.86e-15   9.9e+15   0.000     38.23529    38.23529
            44 |   38.23529   3.86e-15   9.9e+15   0.000     38.23529    38.23529
            52 |   38.23529   7.30e-15   5.2e+15   0.000     38.23529    38.23529
            57 |   38.23529   9.04e-15   4.2e+15   0.000     38.23529    38.23529
            61 |   38.23529   3.86e-15   9.9e+15   0.000     38.23529    38.23529
            62 |   38.23529   3.86e-15   9.9e+15   0.000     38.23529    38.23529
            64 |   38.23529   1.68e-14   2.3e+15   0.000     38.23529    38.23529
            71 |   38.23529   4.94e-15   7.7e+15   0.000     38.23529    38.23529
            72 |   38.23529   4.03e-15   9.5e+15   0.000     38.23529    38.23529
            74 |   38.23529   3.94e-15   9.7e+15   0.000     38.23529    38.23529
            80 |   38.23529   6.09e-15   6.3e+15   0.000     38.23529    38.23529
               |
         _cons |   61.76471   3.86e-15   1.6e+16   0.000     61.76471    61.76471
---------------------------------------------------------------------------------

Could anyone help me explain why the two coefficients on import volume are different? Which one is correct? Thanks in advance!
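One check worth running is whether tariff varies at all within each 2-digit industry. If it does not, the industry effects alone reproduce tariff exactly. That would be consistent with R-squared = 1 and with a -regress- coefficient of 2.54e-16, which is zero up to rounding error. It may also explain why -areg- drops import_volume. The sketch below uses made-up data with that property; only the variable names follow the post.

```stata
* Hypothetical data: tariff is constant within each industry
clear
input industry_2digit tariff import_volume
3 80 10
3 80 25
7 30 5
7 30 40
end

* With tariff constant within industry, the dummies fit it perfectly
regress tariff import_volume i.industry_2digit

* Within-industry standard deviation of tariff
* (missing for industries with a single product)
bysort industry_2digit: egen double sd_tariff = sd(tariff)
summarize sd_tariff
```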

↧

generating shares from a categorical variable

August 23, 2019, 7:13 am

Dear Statalist members,

I need help with some Stata code. I have data on cities within the UK.

I want to calculate the proportion of employment in each city at the 1-digit industry level in each year. I have a variable called employment at the city level, and my industry variable (sic_1) is at the 1-digit level (ranging from 1 to 9).

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input int year byte sic_1 double employment str6 city
1997 1  54.08769668519048 "London"
1997 1  73.07463417915024 "Essex"
1997 2  59.05968811831689 "London"
1997 2  61.44326888837763 "Essex"
1997 3  40.98425906135514 "London"
1997 3  62.02598668195702 "Essex"
1997 4   65.1810169258588 "London"
1997 4   75.1066153677049 "Essex"
1997 5  54.12021605954607 "London"
1997 5  53.37966399060853 "Essex"
1997 6  49.52648645433413 "London"
1997 6 52.000408742377566 "Essex"
1997 7  51.60031037604652 "London"
1997 7   52.8027065170633 "Essex"
1997 8  50.37985992614201 "London"
1997 8  51.27768609766951 "Essex"
1997 9  52.20491985980248 "London"
1997 9  50.60917635878345 "Essex"
1998 1   48.6436566075447 "London"
1998 1 48.794584735114896 "Essex"
1998 2 49.801065829718496 "London"
1998 2  52.06410881189149 "Essex"
1998 3  50.99885022670681 "London"
1998 3  52.44109572491248 "Essex"
1998 4 55.552461927164224 "London"
1998 4 56.645853956985015 "Essex"
1998 5  55.14270772565902 "London"
1998 5  54.66427347292895 "Essex"
1998 6 54.040502383238376 "London"
1998 6 52.111713453797314 "Essex"
1998 7  53.09959389875997 "London"
1998 7  54.75493120768624 "Essex"
1998 8 56.730362634566546 "London"
1998 8 58.349022882006906 "Essex"
1998 9  59.78753983409949 "London"
1998 9  61.05003414993989 "Essex"
2001 1  60.00401825408841 "London"
2001 1 58.431435320284805 "Essex"
2001 2  57.28097641466853 "London"
2001 2 57.931585650647364 "Essex"
2001 3  59.38237975034002 "London"
2001 3 56.353335395044134 "Essex"
2001 4  57.12971932143041 "London"
2001 4  57.29789763097837 "Essex"
2001 5 59.763054102687654 "London"
2001 5 62.087372466229354 "Essex"
2001 6 60.917129977206784 "London"
2001 6  61.15803334776842 "Essex"
2001 7   63.6887061635924 "London"
2001 7  68.35047982900696 "Essex"
2001 8  73.73339656952838 "London"
2001 8  80.14376844232532 "Essex"
2001 9  84.99187391851496 "London"
2001 9  81.86854888686999 "Essex"
end

I would like to find the share of each industry (sic_1) in total employment in a particular year. My actual data have 103 cities.
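The question can be read two ways, so here is a sketch of both using a few rows of the example data: (a) each industry's share of a city's employment in a year, and (b) each industry's share of employment summed over all cities in a year. The new variable names are illustrative. With 103 cities the same code should apply unchanged.

```stata
* A subset of the example data above
clear
input int year byte sic_1 double employment str6 city
1997 1 54.08769668519048 "London"
1997 2 59.05968811831689 "London"
1997 1 73.07463417915024 "Essex"
1997 2 61.44326888837763 "Essex"
end

* (a) Share of each industry in the city's total employment that year
bysort city year: egen double city_total = total(employment)
generate double share_city = employment / city_total

* (b) Share of each industry in employment across all cities that year
bysort year sic_1: egen double ind_total = total(employment)
bysort year: egen double year_total = total(employment)
generate double share_ind = ind_total / year_total
list
```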

Many thanks.

Bridget

↧

Dynamically change variable used in evaluation across rows, for example like this: gen result2a = data2_varK `=data2_varJ[_n]'

August 23, 2019, 7:27 am

Hello Statalist,

1. Problem description

I have a dataset produced by a database in the format shown in the table below, although this example is simplified. The same data are given as a dataex listing below the table. The dataset is the result of merging two datasets, data1 and data2, on country; the variable names show which dataset each variable comes from.

Once I have this data, I want to multiply the value in data2_varK by the value of the variable named in data2_varJ. So for Aland I want to multiply data1_varY by data2_varK (22 * .25), but for Fland I want to multiply data1_varZ by data2_varK (5 * .5). Most countries have a value in only one of data1_varX, data1_varY and data1_varZ, but some, like Cland in the example, have values in more than one.

We want to do this many times, and sometimes the operation is not multiplication. It could also be generating a dummy for whether the value in data1_varX, data1_varY or data1_varZ is above a cut-off that also comes from data2.

country	data1_varX	data1_varY	data1_varZ	data2_varJ	data2_varK
Aland		22		data1_varY	.25
Bland		34		data1_varY	.75
Cland	15	42		data1_varX	.6
Dland	24			data1_varX	.85
Eland			34	data1_varZ	.75
Fland			5	data1_varZ	.5

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str5 country byte(data1_varX data1_varY data1_varZ) str10 data2_varJ double(data2_varK result)
"Aland"  . 22  . "data1_varY" .25  5.5
"Bland"  . 34  . "data1_varY" .75 25.5
"Cland" 15 42  . "data1_varX"  .6    9
"Dland" 24  .  . "data1_varX" .85 20.4
"Eland"  .  . 34 "data1_varZ" .75 25.5
"Fland"  .  .  5 "data1_varZ"  .5  2.5
end

2. What we've tried so far

Test 2a;

Code:

gen result2a =  data2_varK * `=data2_varJ[_n]'

We have tried something like this, but `=data2_varJ[_n]' is expanded only once, before -generate- runs, and it takes the value of data2_varJ from the first row for all rows. Since data2_varJ in the first row is "data1_varY", the code evaluates to this for every row:

Code:

gen result2a =  data2_varK * data1_varY

Test 2b;

Code:

bys data2_varJ : gen result2b =  data2_varK * `=data2_varJ[_n]'

This has the same result as test 2a. `=data2_varJ[_n]' is evaluated into data1_varY for all rows, like this:

Code:

bys data2_varJ : gen result2b =  data2_varK * data1_varY

Test 2c;

Code:

bys data2_varJ : gen result2c =  data2_varK * `=data2_varJ[1]'

This gave the same result as in 2b.

3. What works but we are concerned will be too slow

Code:

        
gen results3a = .
forvalue obs = 1/`=_N' {
    replace results3a =  data2_varK * `=data2_varJ[`obs']' if _n == `obs'
}

We are only piloting this so far, so it might not be too slow, but we are concerned that it will be. We would also love to solve this in a single line, since we could then plug it into the database so that it happens on the fly. Nevertheless, it is good to have this loop as an option of last resort in case nobody has a better suggestion.
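Because a `=...' expression is expanded once before the command runs, it cannot pick a different variable on each row. Here is a sketch of two alternatives, assuming data2_varJ only ever names one of data1_varX, data1_varY or data1_varZ. The first loops over those three candidate names, so it takes three passes however many rows there are. The second is a single -generate- with nested cond(), which is the one-command form asked for. The names result_loop and result_cond are illustrative.

```stata
* Example data from the post
clear
input str5 country byte(data1_varX data1_varY data1_varZ) str10 data2_varJ double(data2_varK result)
"Aland"  . 22  . "data1_varY" .25  5.5
"Bland"  . 34  . "data1_varY" .75 25.5
"Cland" 15 42  . "data1_varX"  .6    9
"Dland" 24  .  . "data1_varX" .85 20.4
"Eland"  .  . 34 "data1_varZ" .75 25.5
"Fland"  .  .  5 "data1_varZ"  .5  2.5
end

* (1) Loop over the candidate variables rather than over observations
generate double result_loop = .
foreach v of varlist data1_varX data1_varY data1_varZ {
    replace result_loop = data2_varK * `v' if data2_varJ == "`v'"
}

* (2) One command: pick the variable named in data2_varJ with nested cond()
generate double result_cond = data2_varK * cond(data2_varJ == "data1_varX", data1_varX, ///
    cond(data2_varJ == "data1_varY", data1_varY, ///
    cond(data2_varJ == "data1_varZ", data1_varZ, .)))
list country result result_loop result_cond
```

The same nested cond() can feed a cut-off dummy. Stata treats missing as larger than any number, so such a dummy should be restricted with an if !missing() condition.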

Thanks,
Kristoffer

↧

5-year relative survival: stpm2

August 23, 2019, 7:50 am

Dear Listers,

I am running relative survival analysis to explore disease status (yes vs. no) and have 4 categorical predictors in my model (age, sex, smoke and bmi). I am using stpm2 and would like to obtain 5-year relative survival associated with each predictor's categories.

I have two questions:

1) I am currently using the following:
stpm2 relapse i.sex i.age2 i.age3 i.age4 i.age5 i.smk2 i.smk3 i.bmi1 i.bmi3 i.bmi4, df(3) scale(hazard) bhazard(rate) eform tvc(agec2 agec3 agec4 agec5) dftvc(2)

g rs5 = 5
predict rs_sex1, surv timevar(rs5) at(sex 1)
predict rs_sex2, surv timevar(rs5) at(sex 2)
predict rs_age1, surv timevar(rs5) at(age 1)
etc...

I am using the survival option, but in some examples I have seen meansurv used instead. Which one is appropriate in this case, and what is the main difference between surv and meansurv?

2) I noticed that the estimated 5-year relative survival for the reference category is identical across some predictors. Is this OK because it may reflect overall survival, or am I doing something wrong?
For example, age category 1: RS5 = .45,
and smoking category 1: RS5 = .45.

Thanks in advance!

↧

Label two Y axis with Xtline code

August 23, 2019, 7:50 am

I am using xtline to graph two variables with very different scales, with the code below:

xtline ManufacturingProperty, addplot(line FeeinLieuandJointInd year, yaxis(2))

It works in the sense that it allows different scales on the left and right y axes, but I cannot figure out how to label the second y axis, since addplot() does not seem to accept ytitle() as a suboption.
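Axis titles are options of the graph as a whole rather than of the added plot. So one approach is to give ytitle() a second time, with the axis(2) suboption, as a main option at the end of the command. This assumes xtline passes standard twoway axis options through. The sketch below uses made-up panel data with the post's variable names.

```stata
* Made-up panel with two series on very different scales
clear
set obs 20
generate id = ceil(_n / 10)
bysort id: generate year = 2000 + _n
generate ManufacturingProperty = 1000 * year - 2000000
generate FeeinLieuandJointInd = year / 1000
xtset id year

* ytitle() without axis() labels the left axis; axis(2) labels the right one
xtline ManufacturingProperty, ///
    addplot(line FeeinLieuandJointInd year, yaxis(2)) ///
    ytitle("Manufacturing property") ///
    ytitle("Fee in lieu and joint ind.", axis(2))
```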

↧

MHTEXP*Theorem 3.1 Adjusted Values Extraction

August 23, 2019, 8:03 am

Hi everyone,

I have been trying to create a matrix of List's MHT-adjusted p-values that I get after running the mhtexp command, but I can't manage to do so. Could you help me find the mistake? Something is probably wrong with how the matrices are specified in the last lines.

gen Treat1=0
replace Treat1=1 if incentivize==1
replace Treat1=2 if spillover==1
replace Treat1=3 if spillovercontrol==1
mhtexp $Demographics, treatment(Treat1)
matlist results
matrix define D == results
matlist D
local i=0
foreach var in $Demographics {
local i=`i'+1
display `i'
forvalues j=1(1)8{
local j = [(`j'-1)*3 +1]
display `j'
matlist D
matrix A = J(8,2,.)
matlist A

mat A[`i', 1] = mat D[`j', 1]
mat A[`i', 3] = mat D[`j', 3]
}
}
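Here is a sketch of how the matrix part could be written. It rests on assumptions, because the layout of mhtexp's results matrix is not shown: D is a made-up 24 x 3 stand-in, every third row starting at row 1 is wanted, and columns 1 and 3 of D go into columns 1 and 2 of A. It illustrates five points about the posted code:

- `matrix define D == results` should use a single `=`.
- A should be created once, before the loops, not re-created on every pass.
- The row index needs its own local instead of overwriting j.
- The right-hand side takes no `mat` keyword.
- A J(8,2,.) matrix has no column 3.

```stata
* Hypothetical stand-in for the mhtexp results matrix (24 x 3)
matrix D = J(24, 3, 0)
forvalues r = 1/24 {
    matrix D[`r', 1] = `r'
    matrix D[`r', 3] = `r' / 100
}

* Create the target matrix once, outside the loop
matrix A = J(8, 2, .)
forvalues j = 1/8 {
    * Separate local for the row index, so j itself is not overwritten
    local row = (`j' - 1) * 3 + 1
    * Right-hand side is a scalar expression; el() picks one element
    matrix A[`j', 1] = el(D, `row', 1)
    matrix A[`j', 2] = el(D, `row', 3)
}
matlist A
```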

Thanks a million,
kind regards,

↧

Two-level Logistic Regression with Complex Survey Design - Query

August 23, 2019, 8:14 am

Hello,

I am relatively new to Stata (and Multilevel Modelling more broadly), and would be grateful for some support. My question is regarding running a 2-level logistic regression.

More specifically, I am trying to run a 2-level logistic regression that takes into account the complex survey design, but I'm not sure my Stata code is correct. I have chosen to run a 2-level logistic regression with -melogit-, because I cannot use -svyset- with -xtlogit-. Please note, I am using Stata 15.1.

My data are 2-level: multiple observations over time (2009-2018) are nested within individuals (identified by the pidp variable in the code below). My dependent variable, empstatus, is coded 1 = employed and 0 = unemployed. My dataset comprises approximately 160,000 observations.

I have gone through a variety of material, including the Stata manual and Rabe‐Hesketh, S., & Skrondal, A. (2006). Multilevel modelling of complex survey data. Journal of the Royal Statistical Society: Series A (Statistics in Society), 169(4), 805-827, but I haven't always fully understood it, particularly the discussion of needing weights at both levels.

I have also looked at previous questions but haven't found advice on a specific way to get Stata to run the model I want. I have found advice on applying weights in a multilevel logit model, for example through the following code:

Code:

melogit empstatus i.gender age i.race [pweight=h_indinui_lw] || pidp: , allbase

However, unless I am mistaken, such code doesn't also take into account clustering and stratification (identified by the strata and psu variables in the code below), which is what I would like to do.

From all the code I have played around with, the below seems to make the most sense to me (but as I said I'm not 100% certain it is indeed doing what I want).

Step1: Communicate the Complex Survey Design

Code:

svyset, clear
svyset psu, strata(strata) weight(h_indinui_lw) singleunit(scaled)

Note: If I were using the -xtreg- command I would normally run the -svyset- command as follows:

Code:

svyset, clear
svyset psu [pweight = h_indinui_lw], strata(strata) singleunit(scaled)

But that generates an error when I then run -melogit-, so I specified -svyset- differently in this case.

Step 2: Run the 2-level logistic regression taking into account Complex Survey Design

Code:

svy: melogit empstatus i.gender age i.race || pidp: , or allbase

The command runs with no error in Stata 15.1, but I'm not sure if the output reflects what I actually think I am running. Therefore, I would be very grateful if you could advise:

(i) if the code above (Step 1 and Step 2 combined) is indeed asking Stata to run a 2-level logit regression taking into account Complex Survey Design (i.e. clustering, weights, and stratification all at the same time)

(ii) whether there is a book or article you can point me to that covers multilevel logistic regression with a specific focus on applying complex survey design in Stata.

Many thanks in advance for your help,

Samir Sweida-Metwally

↧

Help on using coded variables to run the mincer wage equation?

August 23, 2019, 8:16 am

Hi,

I'm trying to run the Mincer wage equation using OLS and a random-effects model. In my survey data, the education and age variables are coded in categories, and years of experience is not given.

However, Mincer stated that years of experience can be obtained with the formula age - years of education - 5, but since my education and age variables are coded in categories, getting potential years of experience is not straightforward.

I was wondering whether I could assign years of education based on the education level and use the midpoint of each age category (e.g., code 2 = 18-64) in order to run the Mincer wage regression.

Would that be correct, or would it bias the results of my thesis?
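A sketch of the proposed approach, with every mapping hypothetical. Education codes are translated into assumed years of schooling, and age codes into band midpoints (18-64 has midpoint 41). Potential experience is then computed as age minus years of schooling minus 5, floored at zero. Whether such coarse midpoints are adequate for the thesis is a separate question this does not settle.

```stata
* Hypothetical coded survey data
clear
input byte educ_code byte age_code
1 2
2 2
3 3
end

* Hypothetical mapping: education code -> years of schooling
recode educ_code (1 = 6) (2 = 12) (3 = 16), generate(educ_years)
* Hypothetical mapping: age code -> band midpoint (2 = 18-64 -> 41)
recode age_code (2 = 41) (3 = 70), generate(age_mid)

* Potential experience = age - schooling - 5, floored at zero
generate exper = max(age_mid - educ_years - 5, 0)
generate exper2 = exper^2
list
```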

I hope to get a response soon.

↧