Channel: Statalist

check whether variable contains specific realization

Hi all,

I would like to check (in an if condition) whether a given value of var2 appears anywhere in var1 and, if so, mark that row in var3.
If possible, I would also like to use a by prefix to run the check within specific regions of var1.

So if there is an 8 or a 5 in var1, var3 should be set to 1 in the corresponding row, as shown below:
var1  var2  var3
0
1
2
3     8     1
4
5     10
6
7     5     1
8     11
9

I don't know whether there is a simple expression for this or whether I have to do it step by step with some helper variables.

Thanks
Tim
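
One possible way to do this, as a minimal sketch rather than a definitive answer: collect the distinct values of var1 with levelsof and flag every row whose var2 equals one of them. The toy data are copied from the table in the post, with empty var2 cells entered as missing; the loop assumes var1 has a manageable number of distinct values.

```stata
* Toy data from the post (empty var2 cells entered as missing)
clear
input var1 var2
0 .
1 .
2 .
3 8
4 .
5 10
6 .
7 5
8 11
9 .
end

* Collect the distinct values of var1, then flag rows whose var2 is one of them
gen byte var3 = .
levelsof var1, local(vals)
foreach v of local vals {
    replace var3 = 1 if var2 == `v'
}
list, noobs
```

For a check within groups, the same loop could be run once per group with an extra if condition on a (hypothetical) group variable.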

STATA SEM builder and survey weighted data

I am a new user of Stata because I want to analyse some gSEM models with survey-weighted data, which I understand is a new feature in Stata 14. What I have noticed is that, even when accessed through the Survey menu, the command line generated by the Stata SEM Builder DOES NOT include the "svy:" prefix needed to produce appropriately weighted estimates (I had, perhaps naively, assumed it would). To obtain the appropriately weighted estimates, it is necessary to edit the command line. This is just a cautionary note to anyone else trying to do a similar analysis.

Graphing binary response variable with extreme independent variable

My question is relatively simple.

I'm trying to study whether primary school completion (yes/no, binary) varies with income for 3 racial groups. The problem with my data is that for some groups a large proportion of people are bunched at zero or very little income. I also have some extreme outliers with very high incomes. I essentially just want a way to visualize the relationship between the probability of having completed primary school and income (the elasticity of primary school completion with respect to income). Income is measured in local currency. Since I am using survey data with sample weights, I want to be able to use the weights appropriately. The data are at the individual level, and the question asked is essentially "Have you completed primary school?". I also have several other controls for gender, age, etc.

I want to know whether the income elasticity is different for the 3 groups.

What would be the best way to do that graphically and econometrically?

How to obtain initial values for melogit

I have a problem with my model: the Stata iterations always end with a "not concave" error. One possible solution may be to supply initial values, but I don't know the syntax for that. I searched the net and found the ml search syntax, but I couldn't really work through it. If anyone could explain it thoroughly, it would be a great help.

Many thanks

FE model: sufficient number of transitions?

Hi all,

I am just wondering if there is a way or a rule of thumb on how many transitions I need to observe in order to run a FE model?

I am looking at transitions out of cohabitation and am observing 69 transitions. I would like to run a FE model with a continuous variable (income) as my dependent variable and my main independent variable Cohabitation_Separated (0 if person is cohabiting and 1 if person is separated from cohabitation). I don't intend on including too many control variables (maybe around 5 variables).

Are there any power tests I should run or any other test I should do to make sure that 69 is a sufficient number?

Thanks,
Nicole

Converting EFA and Cluster Analysis Data to Multi-Dimensional Scaling in Stata

I originally posted this in the Sandbox and was advised to post it here. I appreciate any help. Here is my first posting:

I am doing research on factors of student success using a 40-question survey. The survey is composed of 5-point Likert scale questions. I will be using the same dataset for Exploratory Factor Analysis with factor rotation (EFA), K-Cluster Analysis (CA), and Multi-dimensional Scaling (MDS). I have already completed the design in Stata for the EFA and CA, and have successfully done a practice run with dummy data.

I am now having difficulties converting the same dataset (of factors) for use with MDS. Basically, I want to start by creating a 40x40 correlation coefficient matrix and then square the results. I know the "correlate / pwcorr" commands and how to do the squaring separately, but I just can't find the appropriate combination of commands for doing that and then passing the result on to the "mds" command for the MDS solution and map. Is this the proper way, or am I going in the wrong direction?

Thank you so much for any help.

Jose

qui and quietly are the same command? but "quietly" brings syntax error

Hello everyone,

I am learning Stata by following existing do-files.
The command below works fine, but when I replace qui with "quietly" (as I believe the two are the same command), I get the error message "invalid syntax".
Could you please explain what is wrong?

PS. Could you recommend other good sources for learning Stata? I want to learn by replicating existing do-files, but I am open to other good ways to learn.
Thanks a lot!

qui {
rename State state_old
gen state =.
replace state = 00 if state_old==99
label define state_lbl 00 "notdefined" 01 "somewhere" 01 "delhi" ///
label value state state_lbl
}

Survival analysis

Hi all,

During the review of a scientific article in which I performed two types of survival analysis, 1) the Kaplan-Meier method with the log-rank test and 2) a Cox regression model, I received the following recommendation from the Editorial Board:

"If the author decided to use semiparametric approach, proportional hazard assumption should be tested first. If the assumption met, the author should report the result from Cox regression. In term of the survival curve, since the author adjusted for covariates, the survival curve can be produced based on the clinically meaningful definition of the covariate. For example if age is the covariate, it could be mean age, so that the survival curve will represent the survival of patient with average age. If the assumption not met, the author may consider to use non-parametric approach, but need to respect to the shape of the survival curve before choosing a test to test the difference between the curve (such as Log-Rank test). ".

The assumption has been met and I will report the result from the Cox regression, but I have doubts about what to do with the survival curves: should I keep the same graph made with sts graph, failure by() without reporting the log-rank test, or should I create a new graph using sts graph, failure by() adjustfor()?



Thanks,
Germán

Confidence Interval for Logistic Regression

Dichotomous outcome: Negative biopsy(0) Positive biopsy (1)
Three Predictors: x, y, z (all continuous variables)

I use the command logit outcome x y z to get intercept and betas

Risk score = intercept +beta1*x + beta2*y +beta3*z

Probability of positive biopsy = exp(risk score) / (1 +exp(risk score))

How do I get a confidence interval for the probability of a positive biopsy?


Thanks
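
One common approach, sketched here under stated assumptions: build the interval on the linear predictor (the "risk score") and transform both endpoints with the inverse logit, so the limits always stay between 0 and 1. The sketch uses Stata's shipped auto dataset as a stand-in, with foreign, weight and mpg playing the roles of outcome and two of the predictors; those names are not from the post.

```stata
* Stand-in data: foreign plays the role of "outcome", weight and mpg of the predictors
sysuse auto, clear
logit foreign weight mpg

* Linear predictor (risk score) and its standard error
predict double xb, xb
predict double se, stdp

* Point estimate and 95% limits, transformed to the probability scale
gen double p  = invlogit(xb)
gen double lb = invlogit(xb - invnormal(0.975)*se)
gen double ub = invlogit(xb + invnormal(0.975)*se)
list foreign p lb ub in 1/5, noobs
```

Here invlogit(xb) is the same quantity as exp(risk score) / (1 + exp(risk score)) in the post.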

Reshaping data

Dear all,
I am trying to reshape the data below so that each hhid is represented by a single observation.
So far I have tried, in vain, creating a new "j" variable and using hhid as my "i". What can I do?
hhid  mem  Gender  Marital_stat   Age  b06
1     4    female  Never married  13   10
1     2    female  Married        32   8
1     3    female  Never married  21   6
1     5    male    Never married  18   4
1     4    male    Married        73   2
1     6    male    Never married  8    6
1     7    female  Never married  4    10
2     3    female  Never married  21   4
2     6    male    Never married  15   1
2     4    male    Never married  18   2
2     5    female  Never married  11   6
2     7    male    Never married  9    9
2     2    female  Married        40   1
2     8    male    Never married  7    10
2     1    male    Married        50   0
3     2    female  Married        28   4
3     5    male    Never married  1    6
3     4    female  Never married  5    2
3     3    male    Never married  12   5
3     1    male    Married        39   2
4     4    male    Never married  2    1
4     2    female  Married        28   2
4     1    male    Married        30   4
4     3    male    Never married  4    9
5     2    female  Married        22   9
5     3    female  Never married  1    11

Advice on creating a loop to generate a time updated slope variable

Dear Statalist

This is my first post. I'm afraid I'm not a very experienced programmer and I can't find similar answers online, but I really hope one of you will be able to help me.

I am doing an analysis of renal function (eGFR) during exposure to different drug categories. I would normally do this by looking at eGFR slopes, using eGFR as the outcome in a mixed effects model or OLS, i.e. regress egfr time i.drug

The dataset is currently set up like this, with eGFR time-updated per month (although for the purposes of the slope coefficient, the slope is expressed as change in eGFR per year). There are 50,000 individual IDs with between 1 and 500 eGFR values in the dataset.

id egfr drug date
1 75 1 1nov2000
1 75 2 1dec2000
1 64 2 1jan2001
1 64 2 1feb2002
2 90 1 1jul2004
etc..

Although this gives me an estimate of the overall eGFR slope, I am interested in generating a time-updated slope variable, with a value for the slope at each time point. The slope value at each date for each subject would use all the eGFR measurements up to and including that time point.

My problem is generating this time updated slope variable; I would like to use linear regression (or indeed, a slope equation) to generate the slope and use this as my new variable but I don't know how to create a loop to do this, or incorporate the result in my dataset.

I would be very grateful for any advice

Many thanks
Lisa

Reorder multiple numbered variables (stub ending) sequentially and make them alternate

I'm using Stata 14 and have a seemingly simple question about ordering the variables in my list. I have multiple variables that are tagged with numbered stubs at the end. I'd like to order these variables so that they alternate. For instance, if I have the two variables 'districtX' and 'perhundistrictX', where X represents 1, 2, 3, ..., X, I'd like this:

district1
perhundistrict1
district2
perhundistrict2
district3
perhundistrict3

rather than this:

district1
district2
district3
perhundistrict1
perhundistrict2
perhundistrict3

I'm hoping to avoid having to type out every variable for all numbered stubs (N x 59).

Any help is appreciated.
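
A short loop may be enough here, as a sketch: order with the last option moves one district/perhundistrict pair to the end of the dataset on each pass, so after the loop the pairs sit in alternating order. The toy dataset below uses 3 stubs; for the real data the loop would run over however many stubs exist (59 is assumed from the post).

```stata
* Toy dataset with the variables in the "wrong" order
clear
set obs 1
forvalues i = 1/3 {
    gen district`i' = .
}
forvalues i = 1/3 {
    gen perhundistrict`i' = .
}

* Move each pair to the end in turn; pairs end up alternating
forvalues i = 1/3 {
    order district`i' perhundistrict`i', last
}
describe, simple
```

Other variables in the dataset keep their relative order in front of the reordered pairs.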

Item response model in Stata 12

Dear all,

I would like to use the item response models (introduced in Stata 14) in Stata 12.

However, whenever I try to install them via "ssc install itr" or "findit itr", nothing is found.

Is it not possible to use item response models with Stata 12?

P.S: the command for the item response model is
irt grm

Thank you,
Maya

Interaction terms

Hi there, I am looking at the determinants of debt source before and after the financial crisis of 2008 and wondering what would be the best way to go about this in Stata.

I have my dependent variable, 12 continuous independent variables, and a dummy variable 'postcrisis' which equals 1 when year >= 2008.

I have created an interaction term for each of my independent variables by multiplying them by 'postcrisis', and want to run xtreg with random effects including the 12 independent variables, the dummy variable 'postcrisis' and the 12 interaction terms.

However, I am not sure that this is the right approach to my problem. I was planning to run 2 separate regressions for each time period, but thought this might reduce the accuracy of my results.

Would I be right to think that if I ran the regression as above, the coefficients on the interaction terms now tell me the relationship between the independent variables and dependent variable after 2008, and the coefficients of the original independent variables now describe the relationship before 2008?

Thanks,
Lily

Multiple observations per id

Hello,

I'm new to Stata, so please excuse my inexperience. In my dataset I have multiple observations per ID. In simplified form it looks like this:
ID Dummy X Irrelevant data More irrelevant data
1 1 5
1 1 5
1 1 5
2 0 2
2 0 2
2 0 2
3 1 4
3 1 4
3 1 4
I want to run a mean-comparison t test of X by the dummy variable. However, I only need 1 observation per ID instead of the 3 observations. I don't want to delete observations, since I need the irrelevant data for other tests.

How can I do this?

Thanks in advance.
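
One way to do this without dropping anything, as a minimal sketch: egen's tag() function marks exactly one observation per ID, and the test can then be restricted to the tagged rows. The toy data below are made up (six IDs, three identical rows each) so that both groups have more than one ID; the variable name first is also an assumption.

```stata
* Made-up toy data: one row per ID, then tripled to mimic the layout in the post
clear
input ID Dummy X
1 1 5
2 0 2
3 1 4
4 0 3
5 1 6
6 0 1
end
expand 3

* Flag one observation per ID and run the t test on flagged rows only
egen byte first = tag(ID)
ttest X if first, by(Dummy)
```

All 18 observations stay in the dataset for the other tests.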

Is there anything more efficient than rowtotal?

Hi,

First let me provide a generalised version of my dataset.

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str2 a byte(b1 b2 b3 c1 c2 c3 d1 d2 d3 e1 e2 e3)
"b1" 1 4 7  5  3 9 7  5  3  2 1 6
"b2" 3 6 9 12 15 6 9 12 15  3 7 3
"b3" 5 3 3  3  3 3 3  3  3 15 9 4
"c1" 8 2 4  6  8 2 4  6  8  3 3 5
"c2" 9 4 5  6  7 4 5  6  7  8 4 1
"c3" 9 5 7  9 11 5 7  9 11  7 5 1
"d1" 8 6 1  2  5 6 1  3  4 11 7 1
"d2" 7 4 1  2  7 4 1  5  1  5 1 5
"d3" 5 3 2  1  7 3 2  1  2  7 1 9
"e1" 4 3 3  3  3 3 3  3  3  7 2 8
"e2" 6 2 8 14 20 2 8 14 20  3 3 9
"e3" 6 1 9 17 25 1 9 17 25 20 8 6
end

I wish to sum across variables within each row. Here, as in my true dataset, I want to sum the variables that share a common prefix. For this very simple dataset, I used the following code, which gives the desired results.

Code:
egen b = rowtotal(b1-b3)
egen c = rowtotal(c1-c3)
egen d = rowtotal(d1-d3)
egen e = rowtotal(e1-e3)

keep a b c d e

The issue lies in the fact that my true dataset contains 5000 variables, each with 5000 observations. Therefore, I need to generate a large number of additional variables, which is time consuming.

My data is formatted like a square matrix, just like the sample dataset I provided above. Therefore, I was wondering if I convert the data to a matrix and then compute the sums, would this be quicker?

Is there another alternative I have not yet thought of?
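
As a sketch of one alternative: looping over the prefixes at least avoids writing one egen line per prefix. This does not make rowtotal() itself any faster, and it assumes, as in the example, that the variables for each prefix are stored next to each other and numbered 1 to 3. The toy data are the first three rows of the example dataset.

```stata
* First three rows of the example dataset from the post
clear
input str2 a byte(b1 b2 b3 c1 c2 c3 d1 d2 d3 e1 e2 e3)
"b1" 1 4 7  5  3 9 7  5  3  2 1 6
"b2" 3 6 9 12 15 6 9 12 15  3 7 3
"b3" 5 3 3  3  3 3 3  3  3 15 9 4
end

* One row total per prefix
foreach p in b c d e {
    egen `p' = rowtotal(`p'1-`p'3)
}
keep a b c d e
list, noobs
```

With thousands of prefixes, the list after "in" could be built programmatically instead of typed.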

Kaplan Meier Survival Analysis - Censoring

Hey guys and girls,

I am doing a project in a hospital, looking at the survival of renal cancer patients on a particular treatment. I need to run a Kaplan-Meier analysis to get some survival curves, and this is my first time doing this. I have organised all my data as needed and have calculated the number of days each person survived (or the days until they were last seen in clinic, if they were still alive). I am slightly confused about the process of censoring my data. Do I censor the ones who are still alive? If so, do I assign them a 0 or a 1? I know this is a fundamental part of the analysis, but I have never done stats before in my life, and I don't want to get this essential component wrong!

best,

simon
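
In Stata's stset convention, the failure variable is nonzero when the event (death) was observed and 0 when the patient is censored, so patients still alive at their last clinic visit would get a 0. A minimal sketch with made-up data, where days and died are hypothetical variable names:

```stata
* Made-up data: days of follow-up and a death indicator
* died = 1 means death observed, died = 0 means still alive when last seen (censored)
clear
input days died
120 1
340 0
415 1
560 0
610 1
790 0
end

* Declare the survival-time structure, then list and plot the Kaplan-Meier estimate
stset days, failure(died)
sts list
sts graph
```

Adding by(groupvar) to sts graph would draw one curve per group, for a grouping variable of your choice.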

product looping?

Dear all,

I'm curious if I can do something like this in Stata,

for t = 1:12 {
replace yyy = xxx*R[_n-1]*...R[_n-t] if missing(yyy)
}

Essentially, I want a for loop that updates yyy to xxx times the product of the R's lagged 1 month through t months if yyy is missing. Of course, I can do:

replace yyy = xxx*R[_n-1] if missing(yyy)
replace yyy = xxx*R[_n-1]*R[_n-2] if missing(yyy)
replace yyy = xxx*R[_n-1]*R[_n-2]*R[_n-3] if missing(yyy)
etc.

But I'm curious if there's something nicer. I'd very much appreciate your lead.

Thanks so much for your time.

Best,


John
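
A literal translation of the repeated replace lines into a loop might look like the sketch below: forvalues runs t from 1 to 12 while a helper variable (prod, a hypothetical name) accumulates the running product of the lagged R values. The data are made up. Note that once yyy has been filled on one pass it is no longer missing, so later passes only touch rows that are still missing, exactly as in the hand-written sequence.

```stata
* Made-up data for illustration
clear
set obs 6
gen double R   = 1 + _n/10
gen double xxx = 2
gen double yyy = .

* prod holds R[_n-1]*...*R[_n-t] after pass t
gen double prod = 1
forvalues t = 1/12 {
    replace prod = prod * R[_n-`t']
    replace yyy  = xxx * prod if missing(yyy)
}
list, noobs
```

With panel data, the lags would usually be taken within each panel, for example by running the loop under a by prefix or using time-series operators after tsset.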

How to check if data present every period

So my data consists of GP practices and periods, among other things. In other words for each practice I (might) have data for each period from 201201 - 201512 (months from 2012 to 2015). I strongly suspect some practices do not have data for every single period. Is there a way to check which practices are present in every period and which ones are not?
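
One way to check this, as a sketch: count the distinct periods observed for each practice and compare that count with the number of periods expected. The variable names practice and period are assumptions, and the toy data below have 3 periods; for months 201201 to 201512 the expected count would be 48.

```stata
* Made-up toy data: practice 2 has no record for 201202
clear
input practice period
1 201201
1 201202
1 201203
2 201201
2 201203
end

* Number of distinct periods observed per practice
egen byte tagpp = tag(practice period)
bysort practice: egen nper = total(tagpp)

* Number of distinct periods in the whole dataset
egen byte tagp = tag(period)
count if tagp
local nperiods = r(N)

* complete = 1 if the practice appears in every period
gen byte complete = nper == `nperiods'
tabulate practice complete
```

Comparing against the periods seen in the data assumes each period appears for at least one practice; otherwise the expected count can be typed in directly.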






gladder command

Hi,

I would like to check different transformations of my dependent variable. I am using the "gladder" command, but for some reason I am only getting a few of the transformations (cubic, square, identity, and sqrt). I am missing 5 (log, 1/sqrt, inverse, 1/square, and 1/cubic). I am using Stata 12.0.

My code is: gladder depression

Any insight as to how to get the remaining histograms would be greatly appreciated.

Thanks.