check whether variable contains specific realization

August 31, 2016, 6:14 am

Hi all,

I would like to check (in an if condition) whether a given value of var2 appears anywhere in var1 and, if so, mark the corresponding row in var3.
If possible, I would also like to use a by prefix to restrict the check to specific regions within var1.

For example, because both 8 and 5 appear in var1, var3 should be set to 1 in the rows where var2 is 8 or 5, as shown below:

var1	var2	var3
0	.	.
1	.	.
2	.	.
3	8	1
4	.	.
5	10	.
6	.	.
7	5	1
8	11	.
9	.	.

I don't know whether there is a simple expression for this or whether I have to do it "by hand" with some helper variables.
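One way to do this without helper variables is to loop over the observations and, for each non-missing var2, count how many rows of var1 hold that value. A minimal sketch using the example data from the post; it makes one pass over the data per observation, so it may be slow on very large datasets. The region variable mentioned in the comment is hypothetical.

```stata
* Example data from the post (. = missing)
clear
input var1 var2
0 .
1 .
2 .
3 8
4 .
5 10
6 .
7 5
8 11
9 .
end

* var3 = 1 in every row whose var2 value occurs somewhere in var1
generate byte var3 = .
forvalues i = 1/`=_N' {
    if !missing(var2[`i']) {
        * To restrict the search to a group, add a condition such as
        * & region == region[`i'] (region is a hypothetical variable)
        quietly count if var1 == var2[`i']
        if r(N) > 0 quietly replace var3 = 1 in `i'
    }
}
list, noobs
```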

Thanks
Tim

STATA SEM builder and survey weighted data

August 31, 2016, 5:30 pm

I am a new Stata user: I want to analyse some GSEM models with survey-weighted data, which I understand is a new feature in Stata 14. I have noticed that, even when it is accessed through the Survey menu, the command line generated by the Stata SEM Builder DOES NOT include the "svy:" prefix needed to produce appropriately weighted estimates (as I had, perhaps naively, assumed it would). To obtain appropriately weighted estimates, you have to edit the command line yourself. This is just a cautionary note for anyone else attempting a similar analysis.
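For reference, a minimal sketch of what the hand-edited command line looks like. It uses Stata's NHANES II example dataset purely as a stand-in; the model (highbp on age and female) is illustrative and not the poster's.

```stata
* Stand-in data: NHANES II extract with its design variables
webuse nhanes2, clear
svyset psuid [pweight = finalwgt], strata(stratid)

* The Builder writes "gsem ..."; add the svy: prefix by hand
svy: gsem (highbp <- age i.female, logit)
```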

Graphing binary response variable with extreme independent variable

August 31, 2016, 8:13 pm

My question is relatively simple.

I'm trying to study whether primary school completion (yes/no, binary) varies with income for 3 racial groups. The problem with my data is that, for some groups, a large proportion of people are bunched at zero or very low income. I also have some extreme outliers with very high incomes. Essentially, I just want a way to visualize the relationship between the probability of having completed primary school and income (the income elasticity of primary school completion). Income is measured in local currency. Since I am using survey data with sampling weights, I want to use the weights appropriately. The data are at the individual level, and the question is essentially: "Have you completed primary school?" I also have several other controls for gender, age, etc.

I want to know whether the income elasticity is different for the 3 groups.

What would be the best way to do this, graphically and econometrically?
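One possible approach, sketched with simulated data: a survey-weighted logit with an income × group interaction, using log(1 + income) to keep the zeros and pull in the very high incomes, then margins for group-specific elasticities and marginsplot for the predicted-probability curves. All variable names (primary, income, group, female, age, wt) and the log(1 + income) transformation are assumptions, not taken from the post. With lninc = ln(1 + income), eydx(lninc) is the elasticity with respect to 1 + income, which is close to the income elasticity except near zero.

```stata
* Simulated stand-in data; every variable name here is hypothetical
clear
set seed 12345
set obs 3000
generate byte   group   = 1 + mod(_n, 3)
generate double income  = cond(runiform() < .2, 0, exp(rnormal(8, 1.5)))
generate double lninc   = ln(1 + income)
generate byte   female  = runiform() < .5
generate byte   age     = 15 + floor(50*runiform())
generate byte   primary = runiform() < invlogit(-2 + .25*lninc + .3*(group == 2))
generate double wt      = 1 + 4*runiform()      // placeholder sampling weight

svyset [pweight = wt]

* Let the income slope differ by group
svy: logit primary c.lninc##i.group i.female c.age

* Group-specific elasticities d ln(Pr)/d ln(1 + income);
* vce(unconditional) treats the covariates as sampled
margins group, eydx(lninc) vce(unconditional)

* Predicted probability of completion across log income, by group
margins group, at(lninc = (0(2)14)) vce(unconditional)
marginsplot
```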

How to obtain initial values for melogit

August 31, 2016, 9:53 pm

I have a problem with my model: the Stata iterations always end with a "not concave" message. One possible solution is to supply initial values, but I don't know the syntax for it. I've searched the net and came across the ml search syntax, but couldn't really work through it. If anyone could explain it thoroughly, it would be a great help.
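ml search belongs to ml, the tool for likelihoods you program yourself, so it is not the usual route for melogit; melogit accepts starting values through its from() option (see help melogit for its startvalues(), startgrid() and difficult options as well). A common workflow, sketched with Stata's Bangladesh contraceptive-use example data as a stand-in for the poster's model: fit a cheaper approximation first, then reuse its estimates as starting values.

```stata
* Stand-in data and model from Stata's melogit examples
webuse bangladesh, clear

* Step 1: a cheaper fit (Laplace approximation) to obtain starting values
melogit c_use urban age child* || district:, intmethod(laplace)
matrix b0 = e(b)

* Step 2: refit with the default integration method, starting from b0
melogit c_use urban age child* || district:, from(b0)
```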

Many thanks

FE model: sufficient number of transitions?

September 1, 2016, 12:33 am

Hi all,

I am wondering whether there is a way, or a rule of thumb, to decide how many transitions I need to observe in order to run an FE model.

I am looking at transitions out of cohabitation and observe 69 transitions. I would like to run an FE model with a continuous variable (income) as my dependent variable and Cohabitation_Separated (0 if the person is cohabiting, 1 if the person has separated from cohabitation) as my main independent variable. I don't intend to include many control variables (maybe around 5).

Are there any power analyses or other tests I should run to make sure that 69 is a sufficient number?
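In a fixed-effects model only people whose status changes over time contribute to the coefficient on Cohabitation_Separated, so a useful first check is how many such people, and how much within-person variation, the data contain. A minimal sketch with made-up data; the panel identifiers id and wave are assumptions.

```stata
* Made-up person-wave panel (id and wave are hypothetical names)
clear
input id wave income Cohabitation_Separated
1 1 20 0
1 2 22 0
1 3 18 1
2 1 30 0
2 2 31 0
3 1 15 1
3 2 16 1
end
xtset id wave

* Between vs within variation of the main regressor
xtsum Cohabitation_Separated

* Flag people whose status changes at least once
bysort id: egen byte smin = min(Cohabitation_Separated)
by id: egen byte smax = max(Cohabitation_Separated)
generate byte changer = smin != smax
egen byte onep = tag(id)
count if changer & onep

xtreg income Cohabitation_Separated, fe
```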

Thanks,
Nicole

Converting EFA and Cluster Analysis Data to Multi-Dimensional Scaling in Stata

September 1, 2016, 12:49 am

I originally posted this in the Sandbox and was advised to post it here. I appreciate any help. Here is my first posting:

I am doing research on factors of student success using a 40-question survey. The survey is composed of 5-point Likert scale questions. I will be using the same dataset for Exploratory Factor Analysis with factor rotation (EFA), K-Cluster Analysis (CA), and Multi-dimensional Scaling (MDS). I have already completed the design in Stata for the EFA and CA, and have successfully done a practice run with dummy data.

I am now having difficulty converting the same dataset (of factors) for use with MDS. Basically, I want to start by creating a 40 × 40 correlation matrix and then square its elements. I know how to use correlate/pwcorr and how to do the squaring separately, but I can't find the right combination of commands to do both and then pass the result to mds for the MDS solution and map. Is this the proper approach, or am I going in the wrong direction?
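One way to chain correlate, the squaring step and MDS, sketched with a few variables from Stata's auto dataset standing in for the 40 survey items. mds itself scales observations, so the sketch passes the matrix to mdsmat instead. Because MDS works on dissimilarities while r² is a similarity, 1 - r² is used as the dissimilarity; that conversion is an assumption, not the only possible choice.

```stata
* Stand-in items from the auto dataset (use your 40 items instead)
sysuse auto, clear
local items price mpg weight length turn displacement

* Correlation matrix of the items
quietly correlate `items'
matrix R = r(C)

* Dissimilarity D = 1 - r^2, element by element (0 on the diagonal)
mata: st_matrix("D", 1 :- st_matrix("R"):^2)
matrix rownames D = `items'
matrix colnames D = `items'

* Classical MDS on the dissimilarity matrix, then the configuration plot
mdsmat D
mdsconfig
```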

Thank you so much for any help.

Jose

qui and quietly are the same command? but "quietly" brings syntax error

September 1, 2016, 1:38 am

Hello everyone,

I am learning Stata by following existing do-files.
The command below works fine, but when I replace qui with "quietly" (as I believe these two are the same command), I get the error message "invalid syntax".
Could you please explain to me what is wrong?

PS: Would you recommend other good sources for learning Stata? I want to learn by replicating existing do-files, but I'm open to other good ways to learn.
Thanks a lot!

qui {
rename State state_old
gen state =.
replace state = 00 if state_old==99
label define state_lbl 00 "notdefined" 01 "somewhere" 01 "delhi" ///
label value state state_lbl
}
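qui is simply an abbreviation of quietly, so the two should behave identically. Two things in the block itself stand out: the trailing /// joins the next line onto label define, so Stata reads label value state state_lbl as part of the label definition, which is likely to trigger an error; and the value 01 is labeled twice. A sketch of the block without those two issues, on made-up data; the code 2 for "delhi" is a hypothetical choice, since the original used 01 twice.

```stata
* Made-up stand-in data with a variable called State
clear
input State
99
1
2
end

quietly {
    rename State state_old
    generate state = .
    replace state = 0 if state_old == 99
    * One label per value; 2 for "delhi" is a hypothetical code
    label define state_lbl 0 "notdefined" 1 "somewhere" 2 "delhi"
    label values state state_lbl
}
```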

Survival analysis

September 1, 2016, 3:53 am

Hi all,

In the review of a scientific article in which I used two types of survival analysis, 1) the Kaplan-Meier method with a log-rank test and 2) a Cox regression model, I received the following recommendation from the Editorial Board:

"If the author decided to use semiparametric approach, proportional hazard assumption should be tested first. If the assumption met, the author should report the result from Cox regression. In term of the survival curve, since the author adjusted for covariates, the survival curve can be produced based on the clinically meaningful definition of the covariate. For example if age is the covariate, it could be mean age, so that the survival curve will represent the survival of patient with average age. If the assumption not met, the author may consider to use non-parametric approach, but need to respect to the shape of the survival curve before choosing a test to test the difference between the curve (such as Log-Rank test). ".

The assumption has been met, so I will report the results from the Cox regression, but I am unsure what to do with the survival curves. Should I keep the same graph made with sts graph, failure by() and simply not report the log-rank test? Or should I create a new graph using sts graph, failure by() adjustfor()?
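One way to produce the kind of curves the reviewers describe, sketched with Stata's drugtr example dataset as a stand-in (drug and age are its variables, not the poster's). sts graph's adjustfor() adjusts the curves to covariate values of 0, so age is centred first so that the adjusted curves refer to a patient of mean age. stcurve after stcox is an alternative that, by default, holds covariates not named in at1()/at2() at their means.

```stata
* Stand-in data from Stata's survival examples
webuse drugtr, clear
stset studytime, failure(died)

* 1) Cox model and test of the proportional-hazards assumption
stcox drug age
estat phtest, detail

* 2) Adjusted Kaplan-Meier-type curves for a patient of mean age
summarize age, meanonly
generate double age_c = age - r(mean)
sts graph, failure by(drug) adjustfor(age_c)

* 3) Curves from the fitted Cox model, other covariates at their means
stcurve, failure at1(drug = 0) at2(drug = 1)
```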

Thanks,
Germán

Confidence Interval for Logistic Regression

September 1, 2016, 4:20 am

Dichotomous outcome: negative biopsy (0), positive biopsy (1)
Three predictors: x, y, z (all continuous variables)

I use the command logit outcome x y z to get the intercept and betas.

Risk score = intercept + beta1*x + beta2*y + beta3*z

Probability of positive biopsy = exp(risk score) / (1 + exp(risk score))

How do I get a confidence interval for the probability of a positive biopsy?
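A common approach is to build the interval on the risk-score (logit) scale, where predict's stdp option gives the standard error of the linear prediction, and then transform both endpoints with invlogit(), which keeps them inside (0, 1). A sketch with simulated data; the coefficients used to simulate outcome are made up. margins gives a delta-method interval directly on the probability scale for a chosen covariate pattern.

```stata
* Simulated stand-in data; the coefficients are made up
clear
set seed 2016
set obs 500
generate double x = rnormal()
generate double y = rnormal()
generate double z = rnormal()
generate byte outcome = runiform() < invlogit(-0.5 + 0.8*x - 0.4*y + 0.3*z)

logit outcome x y z

* Risk score (linear prediction) and its standard error
predict double xb, xb
predict double se_xb, stdp

* 95% CI on the logit scale, transformed to the probability scale
generate double p    = invlogit(xb)
generate double p_lo = invlogit(xb - invnormal(0.975)*se_xb)
generate double p_hi = invlogit(xb + invnormal(0.975)*se_xb)

* Delta-method CI for one chosen covariate pattern
margins, at(x = 1 y = 0 z = 0)
```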

Thanks

Reshaping data

September 1, 2016, 4:36 am

Dear all,
I am trying to reshape the data below so that each hhid is represented by a single observation.
So far, I have tried creating a new "j" variable and using hhid as my "i", in vain. What can I do?
hhid  mem  Gender  Marital_stat   Age  b06
1     4    female  Never married  13   10
1     2    female  Married        32   8
1     3    female  Never married  21   6
1     5    male    Never married  18   4
1     4    male    Married        73   2
1     6    male    Never married  8    6
1     7    female  Never married  4    10
2     3    female  Never married  21   4
2     6    male    Never married  15   1
2     4    male    Never married  18   2
2     5    female  Never married  11   6
2     7    male    Never married  9    9
2     2    female  Married        40   1
2     8    male    Never married  7    10
2     1    male    Married        50   0
3     2    female  Married        28   4
3     5    male    Never married  1    6
3     4    female  Never married  5    2
3     3    male    Never married  12   5
3     1    male    Married        39   2
4     4    male    Never married  2    1
4     2    female  Married        28   2
4     1    male    Married        30   4
4     3    male    Never married  4    9
5     2    female  Married        22   9
5     3    female  Never married  1    11
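reshape wide needs the j() variable to be unique within each i(). In the data above, household 1 lists mem = 4 twice (a 13-year-old female and a 73-year-old male), so reshape would stop with mem as j. A sketch using households 1 and 5 from the post: duplicates report exposes the problem, and numbering members 1, 2, ... within each household is one workaround. Whether mem = 4 appearing twice is a data error is worth checking first; the new variable j is a hypothetical helper.

```stata
* Households 1 and 5 from the post
clear
input byte hhid byte mem str6 Gender str13 Marital_stat byte Age byte b06
1 4 "female" "Never married" 13 10
1 2 "female" "Married"       32  8
1 3 "female" "Never married" 21  6
1 5 "male"   "Never married" 18  4
1 4 "male"   "Married"       73  2
1 6 "male"   "Never married"  8  6
1 7 "female" "Never married"  4 10
5 2 "female" "Married"       22  9
5 3 "female" "Never married"  1 11
end

* j() must be unique within i(); this shows the duplicated mem
duplicates report hhid mem

* Workaround: number the members 1, 2, ... within each household
bysort hhid (mem): generate byte j = _n
reshape wide mem Gender Marital_stat Age b06, i(hhid) j(j)
```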

Advice on creating a loop to generate a time updated slope variable

September 1, 2016, 4:58 am

Dear Statalist

This is my first post. I'm afraid I'm not a very experienced programmer and I can't find similar answers online, but I really hope one of you will be able to help me.

I am analysing renal function (eGFR) during exposure to different drug categories. I would normally do this by looking at eGFR slopes, using eGFR as the dependent variable in a mixed-effects model or in OLS, i.e. regress egfr time i.drug

The dataset is currently set up like this, with eGFR time-updated monthly (although, for the slope coefficient, the slope is expressed as change in eGFR per year). There are 50,000 individual IDs, each with between 1 and 500 eGFR values.

id	egfr	drug	date
1	75	1	1nov2000
1	75	2	1dec2000
1	64	2	1jan2001
1	64	2	1feb2002
2	90	1	1jul2004

etc.

Although this gives me an estimate of the overall eGFR slope, I am interested in generating a time-updated slope variable, with a value for the slope at each time point. The slope value at each date for each subject would use all the eGFR measurements up to and including that time point.

My problem is generating this time-updated slope variable. I would like to use linear regression (or simply a slope formula) to compute the slope and store it as my new variable, but I don't know how to write a loop to do this or how to put the results into my dataset.
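A loop over observations may not be needed: the OLS slope from the first n points can be written with running sums, slope = (n·Σty - Σt·Σy) / (n·Σt² - (Σt)²), and Stata's sum() function computes running sums within by-groups. A sketch using the example rows from the post; it assumes there are no missing eGFR values (sum() would treat them as 0) and measures time t in years since each person's first value, so the slope is per year.

```stata
* Example rows from the post
clear
input byte id egfr drug str9 sdate
1 75 1 1nov2000
1 75 2 1dec2000
1 64 2 1jan2001
1 64 2 1feb2002
2 90 1 1jul2004
end
generate date = daily(sdate, "DMY")
format date %td

* Running sums within id, in date order
bysort id (date): generate n = _n
by id: generate double t   = (date - date[1]) / 365.25   // years since first value
by id: generate double Sx  = sum(t)
by id: generate double Sy  = sum(egfr)
by id: generate double Sxx = sum(t^2)
by id: generate double Sxy = sum(t*egfr)

* OLS slope of egfr on t using all values up to and including this row
generate double slope = (n*Sxy - Sx*Sy) / (n*Sxx - Sx^2) if n >= 2
list id date egfr slope, sepby(id)
```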

I would be very grateful for any advice

Many thanks
Lisa

Reorder multiple numbered variables (stub ending) sequentially and make them alternate

September 1, 2016, 5:10 am

I'm using Stata 14 and have a seemingly simple question about ordering the variables in my dataset. I have multiple variables whose names end in a number. I'd like to order these variables so that they alternate. For instance, if I have the two variables 'districtX' and 'perhundistrictX', where X stands for 1, 2, 3, ..., I'd like this:

district1
perhundistrict1
district2
perhundistrict2
district3
perhundistrict3

rather than this:

district1
district2
district3
perhundistrict1
perhundistrict2
perhundistrict3

I'm hoping to avoid typing out every variable name by hand, since there are N × 59 numbered variables in total.
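order with the last option moves the named variables to the end of the dataset, so moving each numbered set to the end in turn interleaves them. A sketch with three of each variable; for the real data, replace 3 with the largest suffix and list all the stub names in the local stubs (assumed here to be just district and perhundistrict).

```stata
* Small stand-in: three of each variable, created in the "wrong" order
clear
set obs 1
forvalues i = 1/3 {
    generate district`i' = `i'
}
forvalues i = 1/3 {
    generate perhundistrict`i' = 100*`i'
}

* Move each numbered set to the end in turn, which interleaves them
local stubs district perhundistrict
forvalues i = 1/3 {
    foreach s of local stubs {
        order `s'`i', last
    }
}
describe, simple
```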

Any help is appreciated.

Item response model in Stata 12

September 1, 2016, 5:13 am

Dear all,

I would like to use the item response theory (IRT) models introduced in Stata 14 in Stata 12.

However, whenever I try to install them via "ssc install itr" or "findit itr", I cannot find them.

Is it not possible to use item response model with Stata 12?

P.S.: the command for the item response model is
irt grm

Thank you,
Maya

Interaction terms

September 1, 2016, 5:30 am

Hi there, I am looking at the determinants of debt source before and after the financial crisis of 2008 and wondering what would be the best way to go about this in Stata.

I have my dependent variable, 12 continuous independent variables, and a dummy variable 'postcrisis', which equals 1 when year >= 2008.

I have created an interaction term for each of my independent variables by multiplying them by 'postcrisis', and want to run xtreg with random effects including the 12 independent variables, the dummy variable 'postcrisis' and the 12 interaction terms.

However, I am not sure that this is the right approach. I was planning to run two separate regressions, one for each period, but thought this might reduce the accuracy of my results.

Would I be right to think that if I ran the regression as above, the coefficients on the interaction terms now tell me the relationship between the independent variables and dependent variable after 2008, and the coefficients of the original independent variables now describe the relationship before 2008?
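For reference, in this specification the coefficient on x1 is the slope when postcrisis = 0 (before 2008), the interaction coefficient is the change in that slope after 2008, and the post-2008 slope is the sum of the two. A sketch with simulated panel data and hypothetical names (firm, year, y, x1, x2), using factor-variable notation so Stata builds the interaction terms itself.

```stata
* Simulated stand-in panel: 100 firms, 10 years each
clear
set seed 2008
set obs 100
generate int firm = _n
generate double u = rnormal()                 // firm-level random effect
expand 10
bysort firm: generate int year = 2003 + _n
generate byte postcrisis = year >= 2008
generate double x1 = rnormal()
generate double x2 = rnormal()
generate double y = 1 + 0.5*x1 + 0.3*x2 + 0.4*postcrisis*x1 + u + rnormal()

xtset firm year
xtreg y i.postcrisis##c.(x1 x2), re

* Post-2008 slope of x1 = pre-2008 slope + change
lincom x1 + 1.postcrisis#c.x1

* Slopes of x1 and x2 in each period
margins postcrisis, dydx(x1 x2)
```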

Thanks,
Lily

Multiple observations per id

September 1, 2016, 5:46 am

Hello,

I'm new to Stata, so please excuse my inexperience. My dataset has multiple observations per ID. In simplified form it looks like this:

ID	Dummy	X	Irrelevant data	More irrelevant data
1	1	5
1	1	5
1	1	5
2	0	2
2	0	2
2	0	2
3	1	4
3	1	4
3	1	4

I want to run a t test comparing the mean of X between the two values of the dummy variable. However, I need only one observation per ID instead of three. I don't want to delete observations, since I need the irrelevant data for other tests.

How can I do this?
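egen's tag() function marks exactly one observation per ID with 1 (all others get 0), so the t test can be restricted to those rows without dropping anything. A sketch using the three IDs from the post plus a made-up fourth ID, so that each dummy group has more than one observation; the name first is a hypothetical choice.

```stata
* IDs 1-3 from the post plus a made-up ID 4
clear
input ID Dummy X
1 1 5
2 0 2
3 1 4
4 0 3
end
expand 3                      // three identical rows per ID, as in the post

* 1 for exactly one row per ID, 0 for the others
egen byte first = tag(ID)

ttest X if first, by(Dummy)
```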

Thanks in advance.

Is there anything more efficient than rowtotal?

September 1, 2016, 6:38 am

Hi,

First let me provide a generalised version of my dataset.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str2 a byte(b1 b2 b3 c1 c2 c3 d1 d2 d3 e1 e2 e3)
"b1" 1 4 7  5  3 9 7  5  3  2 1 6
"b2" 3 6 9 12 15 6 9 12 15  3 7 3
"b3" 5 3 3  3  3 3 3  3  3 15 9 4
"c1" 8 2 4  6  8 2 4  6  8  3 3 5
"c2" 9 4 5  6  7 4 5  6  7  8 4 1
"c3" 9 5 7  9 11 5 7  9 11  7 5 1
"d1" 8 6 1  2  5 6 1  3  4 11 7 1
"d2" 7 4 1  2  7 4 1  5  1  5 1 5
"d3" 5 3 2  1  7 3 2  1  2  7 1 9
"e1" 4 3 3  3  3 3 3  3  3  7 2 8
"e2" 6 2 8 14 20 2 8 14 20  3 3 9
"e3" 6 1 9 17 25 1 9 17 25 20 8 6
end

I want to compute row sums over groups of variables. Here, as in my real dataset, the groups are defined by common variable prefixes. For this very simple dataset, I used the following code, which gives the desired results.

Code:

egen b = rowtotal(b1-b3)
egen c = rowtotal(c1-c3)
egen d = rowtotal(d1-d3)
egen e = rowtotal(e1-e3)

keep a b c d e

The problem is that my real dataset contains 5,000 variables, each with 5,000 observations, so I would need to generate a large number of new variables, which is time-consuming.

My data are formatted like a square matrix, just like the sample dataset above. I was therefore wondering whether it would be quicker to convert the data to a matrix and compute the sums there.

Is there another alternative I have not yet thought of?
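A matrix approach can work: if X holds the data (observations × variables) and A is a 0/1 matrix with one column per prefix, then X*A gives all the prefix row sums in one multiplication. A sketch in Mata using the example data; it assumes the first variable (a) is an identifier, that each prefix is the variable name with its trailing digits removed, and that missing values should count as 0, as in rowtotal(). For 5,000 × 5,000 values, X needs roughly 200 MB of memory as doubles.

```stata
* Example data from the post
clear
input str2 a byte(b1 b2 b3 c1 c2 c3 d1 d2 d3 e1 e2 e3)
"b1" 1 4 7  5  3 9 7  5  3  2 1 6
"b2" 3 6 9 12 15 6 9 12 15  3 7 3
"b3" 5 3 3  3  3 3 3  3  3 15 9 4
"c1" 8 2 4  6  8 2 4  6  8  3 3 5
"c2" 9 4 5  6  7 4 5  6  7  8 4 1
"c3" 9 5 7  9 11 5 7  9 11  7 5 1
"d1" 8 6 1  2  5 6 1  3  4 11 7 1
"d2" 7 4 1  2  7 4 1  5  1  5 1 5
"d3" 5 3 2  1  7 3 2  1  2  7 1 9
"e1" 4 3 3  3  3 3 3  3  3  7 2 8
"e2" 6 2 8 14 20 2 8 14 20  3 3 9
"e3" 6 1 9 17 25 1 9 17 25 20 8 6
end

mata:
    // Variables to sum: everything after the identifier a
    k     = st_nvar()
    names = st_varname(2..k)
    X     = st_data(., 2..k)
    _editmissing(X, 0)                 // rowtotal() treats missing as 0

    // Prefix = variable name with trailing digits removed (b1 -> b)
    stubs = J(1, cols(names), "")
    for (j = 1; j <= cols(names); j++) {
        stubs[j] = regexr(names[j], "[0-9]+$", "")
    }
    groups = uniqrows(stubs')

    // 0/1 matrix mapping each variable (row) to its prefix (column)
    A = J(cols(names), rows(groups), 0)
    for (h = 1; h <= rows(groups); h++) {
        A[., h] = (stubs' :== groups[h])
    }

    // All row sums at once, stored as new variables named after the prefixes
    idx = st_addvar("double", groups')
    st_store(., idx, X * A)
    st_local("groups", invtokens(groups'))
end

keep a `groups'
```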

Kaplan Meier Survival Analysis - Censoring

September 1, 2016, 11:54 am

Hey guys and girls,

I am doing a project in a hospital, looking at the survival of renal cancer patients on a particular treatment. I am going to need to do a Kaplan-Meier analysis to get some survival curves, and this is my first time doing this. I have organised my data into the form I need and have calculated the number of days each person survived (or, for those still alive, the number of days until they were last seen in clinic). I am slightly confused about how to censor my data. Do I censor the ones who are still alive? If so, do I assign them a 0 or a 1? I know this is a fundamental part of the analysis, but I have never done statistics before in my life, and I don't want to get this essential component wrong!
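In stset, the failure variable marks who had the event: a nonzero value (conventionally 1) means the event, here death, was observed, and 0 means the observation is censored, i.e. the person was still alive when last seen. A minimal sketch with made-up numbers; the names days and died are assumptions.

```stata
* Made-up data: follow-up time and event indicator
*   died = 1 : died (event observed)
*   died = 0 : alive when last seen in clinic (censored)
clear
input days died
120 1
365 0
410 1
500 0
730 1
800 0
end

stset days, failure(died==1)
sts list
sts graph
```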

Best,

Simon

product looping?

September 1, 2016, 12:12 pm

Dear all,

I'm curious whether I can do something like this in Stata:

for t = 1:12 {
replace yyy = xxx*R[_n-1]*...R[_n-t] if missing(yyy)
}

Essentially, I want a for loop that, wherever yyy is missing, replaces it with xxx times the product of R lagged 1 month through t months. Of course, I can do:

replace yyy = xxx*R[_n-1] if missing(yyy)
replace yyy = xxx*R[_n-1]*R[_n-2] if missing(yyy)
replace yyy = xxx*R[_n-1]*R[_n-2]*R[_n-3] if missing(yyy)
etc.

But I'm curious whether there's something nicer. I'd very much appreciate any pointers.
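Stata has no for t = 1:12 construct, but forvalues with a local macro that grows by one lagged term per pass does the same job. A sketch on made-up monthly data (month, R, xxx and yyy values are invented); for panel data, sort by id and month and prefix replace with by id:. Note that, as written in the post, every product contains R[_n-1], and a missing value anywhere in a product makes the whole product missing, so rows still missing after t = 1 stay missing in the later passes.

```stata
* Made-up monthly series for illustration
clear
set obs 20
generate int month = _n
generate double R   = 1 + _n/100
generate double xxx = 10
generate double yyy = cond(_n == 1 | mod(_n, 5) == 0, ., 1)

* Build the product R[_n-1]*R[_n-2]*...*R[_n-t] one term at a time
local prod "R[_n-1]"
forvalues t = 1/12 {
    if `t' > 1 local prod "`prod'*R[_n-`t']"
    quietly replace yyy = xxx*`prod' if missing(yyy)
}
```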

Thanks so much for your time.

Best,

John

How to check if data present every period

September 1, 2016, 12:58 pm

My data consist of GP practices and periods, among other things. For each practice I (might) have data for each period from 201201 to 201512 (monthly, 2012 to 2015). I strongly suspect that some practices do not have data for every single period. Is there a way to check which practices are present in every period and which are not?
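A sketch that counts the distinct periods per practice and lists the practices with fewer than the 48 months from 201201 to 201512. The variable names practice and period are assumptions, and the data are simulated, with practice 2 given one missing month.

```stata
* Simulated stand-in: 3 practices x 48 months, one gap for practice 2
clear
set obs 3
generate int practice = _n
expand 48
bysort practice: generate int m = _n - 1
generate long period = (2012 + floor(m/12))*100 + mod(m, 12) + 1
drop if practice == 2 & period == 201306

* Count distinct periods per practice (robust to duplicate rows)
bysort practice period: generate byte first = _n == 1
by practice: egen nperiods = total(first)

* One row per practice; list those missing at least one month
egen byte onepractice = tag(practice)
list practice nperiods if onepractice & nperiods < 48, noobs
```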

gladder command

September 1, 2016, 2:04 pm

Hi,

I would like to check different transformations of my dependent variable. I am using the gladder command, but for some reason I only get some of the transformations (cubic, square, identity and sqrt). The other five (log, 1/sqrt, inverse, 1/square and 1/cubic) are missing. I am using Stata 12.0.

My code is: gladder depression

Any insight as to how to get the remaining histograms would be greatly appreciated.
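One possible explanation, offered as an assumption to check rather than a diagnosis: log, 1/sqrt, inverse, 1/square and 1/cubic are all undefined at zero, and since sqrt is shown the values are presumably non-negative, which points to zeros in depression. The sketch below uses simulated scores (not the poster's data) to count such values and, as one common workaround, shifts the variable so that every value is positive before calling gladder.

```stata
* Simulated scores for illustration only (0-9, including zeros)
clear
set seed 7
set obs 200
generate byte depression = floor(10*runiform())

* How many values would make log or 1/x undefined?
count if depression <= 0

* One common workaround: shift so that every value is positive
generate double depression1 = depression + 1
gladder depression1
```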

Thanks.
