Channel: Statalist
Viewing all 72800 articles

Overlay 3 Kaplan-Meier survival graphs from 3 different populations

Hello,
I'm doing some cross-cohort research on SEP and mortality across 3 different cohorts and would like to overlay their survival curves onto one graph. Does anyone have a solution for this? I have viewed previous threads (https://www.stata.com/statalist/arch.../msg00875.html), but my needs are slightly different and I have had trouble adapting them, so I would appreciate any help.

My cohorts have 3 different ages at follow-up (Cohort 1: 70, Cohort 2: 58, Cohort 3: 43), so each survival curve has a different length and needs its own stset. I've stripped the code down to the bare minimum needed to show what I need:

Code:
* stset takes its if qualifier before the comma, not after the options
stset exitdate70 if cohort==1 & cc1==1, failure(dead70) origin(entrydate) enter(age_26) id(pid) scale(4)

sts graph if cohort==1 & cc1==1, by(housing)

stset exitdate58 if cohort==2 & cc2==1, failure(dead58) origin(entrydate) enter(age_26) id(pid) scale(4)

sts graph if cohort==2 & cc2==1, by(housing)

stset exitdate43 if cohort==3 & cc3==1, failure(dead43) origin(entrydate) enter(age_26) id(pid) scale(4)

sts graph if cohort==3 & cc3==1, by(housing)
Thank you.
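One possible way to get all three cohorts on one graph is to skip the three separate stset calls: build a single exit date and a single death indicator that pick up each cohort's own variables, stset once, and let sts graph draw one curve per cohort. Because each cohort's follow-up ends at its own exit dates, the curves naturally have different lengths. The sketch below runs on simulated toy data; the variable names mirror the post, but all values are invented, and the post's enter(age_26), scale(4) and complete-case conditions (cc1-cc3) would need to be added back for real data.

```stata
* Simulated toy data standing in for the three cohorts (all values invented)
clear
set seed 2019
set obs 300
gen pid = _n
gen byte cohort = ceil(_n/100)
gen byte housing = mod(_n, 2)
gen entrydate = 0

* Cohort-specific exit dates and death indicators, as in the post;
* follow-up is shorter in the younger cohorts
gen exitdate70 = 1 + 43*runiform() if cohort == 1
gen exitdate58 = 1 + 31*runiform() if cohort == 2
gen exitdate43 = 1 + 16*runiform() if cohort == 3
gen byte dead70 = runiform() < 0.30 if cohort == 1
gen byte dead58 = runiform() < 0.20 if cohort == 2
gen byte dead43 = runiform() < 0.10 if cohort == 3

* Harmonize: one exit date and one failure indicator for everyone
gen exitdate = cond(cohort == 1, exitdate70, cond(cohort == 2, exitdate58, exitdate43))
gen byte dead = cond(cohort == 1, dead70, cond(cohort == 2, dead58, dead43))

* stset once for all cohorts; with real data, add the complete-case
* restriction and the original enter()/scale() options, e.g.
* stset exitdate if (cohort==1 & cc1==1) | (cohort==2 & cc2==1) | (cohort==3 & cc3==1), ...
stset exitdate, failure(dead) origin(time entrydate) id(pid)

* One Kaplan-Meier curve per cohort; by(cohort housing) would split by housing too
sts graph, by(cohort)
```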

mediation package: medsens command and why it does not function

Dear All,

I am currently using the mediation package (Hicks and Tingley, 2011) to conduct a causal mediation analysis. I need to run a sensitivity test with the medsens command after the medeff command. However, Stata 14 does not give me the full results and stops with the following error message:

You must have moremata installed to run this program
net install moremata.pkg
r(198);

end of do-file

r(198);

In fact, I do have moremata installed: if I run the command "net install moremata.pkg", I get the following message:
checking moremata consistency and verifying not already installed...
all files already exist and are up to date.


I would be so grateful if you could tell me how to solve this problem.

Thanks a million in advance.

Best,

Marie

Quantile regression t-stats insignificant

Hi Joao Santos Silva and Clyde Schechter,

I am running qreg on a panel of 372 firms with 237 monthly observations each, a total of 88,164 observations. I am focusing on the 95th and 99th percentiles. None of my t statistics are significant (all the p-values are large), although there is a strong relationship between the variables. Any idea what is going wrong and how this issue can be resolved?

Best Regards

Export either table or tabdisp output as a LaTeX file

I am using:

Code:
gen ycc_table=.
replace ycc_table=year if evnt3w==1
by country (ycc_table), sort: gen ycc_count =_n if ycc_table!=.
tabdisp country ycc_count, cell(ycc_table)

or just:

Code:
table country ycc_count, c(mean ycc_table)

which creates the table I want to export.

I want to save this table as a .tex file, but it doesn't work. Is there a way to save this table as a LaTeX file?

Convolutions of two WTP empirical distributions

Dear statalist,

Has anyone coded a convolution test to evaluate the equality of two empirical WTP distributions? I have tried to code it, but since I'm a new Stata user I'm finding it challenging.

I would appreciate any help.

Greetings.

Survival analysis

Dear all,
I'm restructuring my historical datasets to perform survival analysis. My data are in a non-standard historical format, and I would be very grateful for help with it. My master dataset contains information on companies in their year of founding, such as their unique identifier, capital, and headquarters location. It covers companies founded from, say, 1800 to 1914. So it looks like cross-sectional data, where the DATE column is the founding date of the firm:
1st variable - a unique identifier
2nd variable - DATE, the founding date in YEAR-MONTH-DAY format
3rd variable - location of headquarters, etc.
Then I have 6 separate cross-sectional datasets on these companies for the years 1847, 1869, 1874, 1892, 1905 and 1914. If a company is listed in the 1847 dataset, it survived until 1847, and similarly for the other years. The year datasets' variables partially overlap with the master dataset: they contain the companies' unique identifier and some variables such as capital. None of the datasets has a survival dummy.

To put the data in survival-analysis form, I created a variable failure equal to 0 for all companies listed in 1847, 1869, 1874, etc. Then I merged the founding year into the year datasets. By comparing the founding year with the survey years 1847, 1869, 1874, 1892, 1905 and 1914, I created year0 and year1 for each company, and then used append to build one dataset from all the year datasets. However, for each company I know when failure equals 0 but not when it equals 1, because the year datasets include only surviving companies. Here is what my dataset looks like now:
Id year0 year1 failure
1 1836 1847 0
1 1847 1869 0
1 1869 1874 0
2 1836 1847 0
2 1847 1869 0
From this, we know that the firm with id 1 was founded in 1836 and was last listed in 1874, so it died somewhere in the interval 1874-1892. The firm with id 2 died somewhere in the interval 1869-1874. My question is how to add one more observation per firm with failure equal to 1: if a firm is not listed in one of the year datasets, we know for sure that it failed during the corresponding interval. What I want is:
Id year0 year1 failure
1 1836 1847 0
1 1847 1869 0
1 1869 1874 0
1 1874 1892 1
2 1836 1847 0
2 1847 1869 0
2 1869 1874 1
where the last row for each firm (failure = 1) is the observation I need to add, recording when the firm fails.
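A minimal sketch of one way to do this, assuming (as in the example) that each firm's rows are consecutive survey intervals and that a firm still listed in 1914 is censored rather than failed: copy each firm's last observed interval with expand, then turn the copy into the interval from the last listing to the next survey year, with failure set to 1. The next-year mapping below is built from the survey years listed in the post.

```stata
clear
input id year0 year1 failure
1 1836 1847 0
1 1847 1869 0
1 1869 1874 0
2 1836 1847 0
2 1847 1869 0
end

* Flag each firm's last observed interval
bysort id (year1): gen byte last = _n == _N

* Duplicate that row, except for firms still listed in the final survey (censored)
expand 2 if last & year1 < 1914, generate(new)

* Turn the copy into the interval from the last listing to the next survey year
replace year0 = year1 if new
recode year1 (1847=1869) (1869=1874) (1874=1892) (1892=1905) (1905=1914) if new
replace failure = 1 if new

drop last new
sort id year0
list, sepby(id) noobs
```

recode applies only the first rule that matches each value, so 1847 becomes 1869 and is not passed on to the 1869=1874 rule.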

I'm just starting my career as a researcher, so my description of the problem may not be clear; feel free to ask any questions. I really appreciate any help or comments.
Thank you!





How to calculate the experience diversity of panels?

Every expert panel has about 5 experts, and each expert has different kinds of experience, such as service, R&D, or management. How can I calculate the experience diversity of each panel? Maybe with something like a Herfindahl index?

Here are the tips on the calculation method.
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input int(expID firmID) byte(exp_state exp_manager exp_RD exp_adm exp_teacher exp_designer exp_lawyer exp_reporter exp_medical exp_engineer exp_foreign exp_service)
137 1 . . . . . . . . . . . .
 70 1 0 1 0 0 0 0 0 0 0 0 0 0
141 1 . . . . . . . . . . . .
 41 1 . . . . . . . . . . . .
 35 1 . . . . . . . . . . . .
  8 2 0 0 1 0 0 0 0 0 0 1 0 0
112 2 . . . . . . . . . . . .
111 2 . . . . . . . . . . . .
187 2 0 1 0 1 0 0 0 0 0 0 0 1
 98 2 . . . . . . . . . . . .
170 3 . . . . . . . . . . . .
254 3 1 1 0 0 0 0 0 1 0 0 0 0
  1 3 . . . . . . . . . . . .
149 3 . . . . . . . . . . . .
174 3 0 0 0 0 0 0 0 0 0 0 0 0
 73 4 . . . . . . . . . . . .
 99 4 . . . . . . . . . . . .
121 4 0 1 1 0 0 0 0 0 0 1 0 0
110 4 . . . . . . . . . . . .
263 4 0 1 0 0 0 0 0 0 0 0 0 0
end
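A minimal sketch, assuming diversity is measured with Blau's index, which is 1 minus the Herfindahl index of the experience-category shares within each panel. It further assumes that each experience flag an expert holds counts once (an expert with several flags contributes to several categories) and that experts with missing data are simply ignored. It reuses the -dataex- extract above; blau, total and sumsq are illustrative names.

```stata
clear
input int(expID firmID) byte(exp_state exp_manager exp_RD exp_adm exp_teacher exp_designer exp_lawyer exp_reporter exp_medical exp_engineer exp_foreign exp_service)
137 1 . . . . . . . . . . . .
 70 1 0 1 0 0 0 0 0 0 0 0 0 0
141 1 . . . . . . . . . . . .
 41 1 . . . . . . . . . . . .
 35 1 . . . . . . . . . . . .
  8 2 0 0 1 0 0 0 0 0 0 1 0 0
112 2 . . . . . . . . . . . .
111 2 . . . . . . . . . . . .
187 2 0 1 0 1 0 0 0 0 0 0 0 1
 98 2 . . . . . . . . . . . .
170 3 . . . . . . . . . . . .
254 3 1 1 0 0 0 0 0 1 0 0 0 0
  1 3 . . . . . . . . . . . .
149 3 . . . . . . . . . . . .
174 3 0 0 0 0 0 0 0 0 0 0 0 0
 73 4 . . . . . . . . . . . .
 99 4 . . . . . . . . . . . .
121 4 0 1 1 0 0 0 0 0 0 1 0 0
110 4 . . . . . . . . . . . .
263 4 0 1 0 0 0 0 0 0 0 0 0 0
end

* Count how many experience flags each panel has per category (missing counts as 0)
collapse (sum) exp_*, by(firmID)

* Total number of flags per panel
egen total = rowtotal(exp_*)

* Herfindahl index: sum of squared category shares
gen double sumsq = 0
foreach v of varlist exp_* {
    replace sumsq = sumsq + (`v'/total)^2 if total > 0
}

* Blau diversity: 0 = all experience in one category; higher = more diverse
gen double blau = 1 - sumsq if total > 0
list firmID total blau, noobs
```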

by()-option in twoway graphs: Getting rid of the box around the title

Dear Statalisters,

I have a rather simple question about formatting graphs, but I couldn't figure it out myself, and the forum's search function didn't yield the desired results.
So, here it comes:

I am using the by() option in a twoway graph to get separate graphs by an indicator variable. Apparently, some things related to the overall look of the graph need to be specified within the by() option, e.g. turning off the default note or making the background white (see example below).

Now, I would also like to get rid of the coloured boxes around the titles of the subgraphs (the blue boxes around "Domestic" and "Foreign").
I tried adding things like title(nobox) or title(fcolor(white) bcolor(white)) inside the by() option, but that didn't work. Adding them to the options of the whole graph didn't work either (i.e. a line like title(fcolor(white) bcolor(white)) below the legend() option in the example).

Does anybody know how to turn off these boxes?

Thank you in advance and best regards,
Boris


Example Code:
Code:
sysuse auto, clear
twoway (scatter price mpg, by(foreign, note("") graphregion(color(white)))) ///
            (lfit price mpg), ///
            legend(region(lcol(white)))
Example output (blue circles and X's added by hand to stress how badly I need to get rid of these boxes):
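In by() graphs, the panel titles ("Domestic", "Foreign") are drawn as subtitles, and title-type options given outside by() apply to the individual panels. A sketch that should therefore remove the boxes is to restyle the subtitles at the graph level, leaving the text argument empty so that only the styling changes. This assumes the default scheme; box defaults vary between schemes and Stata versions.

```stata
sysuse auto, clear
* Panel titles from by() are subtitles; subtitle(, nobox) keeps their text
* but drops the surrounding box
twoway (scatter price mpg) (lfit price mpg), ///
    by(foreign, note("") graphregion(color(white))) ///
    subtitle(, nobox) ///
    legend(region(lcolor(white)))
```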

odbc load can't find data source

Hi All -

I've been trying to query an ODBC (.accdb) file via Stata 16 on a Mac. (This is for a local file.)

- I've tried to follow these steps: https://www.stata.com/support/faqs/d...figuring-odbc/

- I've set up the data source using both ODBC Manager and iODBC Administrator64. The latter program will test a connection, and reports that it is working.

- I've looked at the odbc.ini file in the /Library/ folder and set it to resemble the sample at the link above.

- odbc.ini looks like what follows (with filenames and paths removed, but they are correct in the actual file)

Code:
[ODBC Data Sources]
database_name = Actual Access

[database_name]
Driver  = /...path to driver.../ataccess.so
DBQ = /...path to data source .../filename.accdb
CacheResults = No
ConnectionType = Direct
Host = localhost
but
Code:
 odbc list
does not return the data source (this is despite a few restarts).

Is there another way to configure access to the file from within Stata itself?

ksmirnov Test

Dear all,

Can I use the one-sample ksmirnov test to test for a uniform distribution (see below)?

ksmirnov X = uniform()

If not, could you recommend an appropriate test for comparing a distribution with the uniform distribution?
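In the one-sample form, the expression to the right of = must be the hypothesized cumulative distribution function evaluated at the variable, not a random draw; uniform() (runiform() in current Stata) generates new random numbers, so it does not describe any CDF. For a uniform distribution on [a, b] the CDF is (x - a)/(b - a), so on [0, 1] it is just x. A minimal sketch on simulated data, where a = 0 and b = 10 are illustrative choices. If a and b are estimated from the same data (e.g. its minimum and maximum), the reported p-value is only approximate.

```stata
clear
set seed 42
set obs 500
* Simulated data that really are uniform on [0, 10]
gen double x = 10*runiform()
* One-sample test against U(0, 10): pass the CDF (x - a)/(b - a)
ksmirnov x = (x - 0)/(10 - 0)
display "D = " r(D)
```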

Thank you in advance,




Discrete choice experiment : WTP with effects coding

Hello,
I'm afraid this is not strictly a Stata question. I have conducted a discrete choice experiment. In this choice experiment there are three alternatives in each choice set. Alternatives 1 and 2 are characterized by 2 attributes with 3 levels and a payment attribute. Alternative 3 is a fixed "opt out". The design is D-efficient.
In order to properly estimate the ASC I have decided to use effects coding. For simplification, I only present information relative to one attribute plus the payment attribute.
For my first attribute, Nature, which has three levels (L1, L2 and L3), I have created two variables, nature_l1 and nature_l2. These variables are coded as follows:
- nature_l1 takes value 1 if the alternative is Nature L1, value 0 if the alternative is Nature L2, and value -1 if the alternative is Nature L3.
- nature_l2 takes value 0 if the alternative is Nature L1, value 1 if the alternative is Nature L2, and value -1 if the alternative is Nature L3.

I use a mixed logit model (mixlogit) to estimate respondents' choices:

mixlogit choice, group(choiceset) id(id) nrep(500) rand(nature_l1 nature_l2 ASC Payment)

I want to estimate WTP in preference space first. How should I do this?
For example, suppose I want to estimate the marginal WTP for Nature L1 relative to Nature L3. Let b1 and b2 be the mean coefficients of nature_l1 and nature_l2, respectively, and b3 the coefficient of the payment attribute in the mixlogit results.
My impression is that I should calculate this marginal WTP with the following formula: (2b1+b2)/b3.
Is this the correct approach?
How could this be estimated in wtp space?
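Under effects coding, the omitted level's effect is minus the sum of the other two, so the utility difference between Nature L1 and Nature L3 is 2b1 + b2, which supports the numerator proposed above. One caveat, assuming the usual definition of marginal WTP as the negative ratio of an attribute coefficient to the payment coefficient: the ratio carries a minus sign, since b3 is normally negative.

```latex
\begin{aligned}
V_{L1} &= b_1, \qquad V_{L2} = b_2, \qquad V_{L3} = -(b_1 + b_2),\\
V_{L1} - V_{L3} &= b_1 + (b_1 + b_2) = 2b_1 + b_2,\\
\mathrm{WTP}_{L1\ \text{vs}\ L3} &= -\frac{2b_1 + b_2}{b_3}.
\end{aligned}
```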

Thanks
Best,

Keep a group of observations if one record meets a certain condition

Hi All,

Here's an example of the sort of dataset that I'm working with:

clear
input studyid edvisits frequent_user
1 0 1
1 2 1
1 1 1
1 0 1
1 0 1
2 0 0
2 0 0
2 0 0
2 0 0
2 0 0
3 1 1
3 0 1
3 2 1
3 0 1
3 0 1
4 1 0
4 0 0
4 0 0
4 0 0
4 1 0
end

I am trying to subset the data to create 3 separate datasets: people with no ED visits, people with few ED visits, and people with frequent ED visits.

For example, the "few ED visits" dataset would contain all the records for the person with studyid #4, and the "frequent ED visits" dataset would contain
all records for studyids #1 and #3.

By looking around the forums I think the code should look something like this:
*no edvisits*
bysort studyid (edvisits) : drop if edvisits[1] > 0

*few edvisits*
bysort studyid (edvisits) : drop if edvisits[1] == 0
bysort studyid (frequent_user) : drop if frequent_user == 1

*frequent edvisits*
bysort studyid (edvisits) : drop if edvisits[1] == 0
bysort studyid (frequent_user) : keep if frequent_user == 1

but I don't seem to be having any success so I'm clearly missing something.

I think I've gotten close to the answer with these two posts, but, as I said, no luck so far. Any help would be much appreciated: https://www.stata.com/statalist/archive/2005-08/msg00361.html
https://www.statalist.org/forums/forum/general-stata-discussion/general/1396103-drop-whole-group-of-observations-if-one-fulfils-condition
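A minimal sketch, assuming the three groups are: no ED visits at all, at least one visit with frequent_user == 0, and frequent_user == 1. Part of the difficulty with the attempt above is that after bysort studyid (edvisits), edvisits[1] is each person's smallest value, so edvisits[1] == 0 is true for anyone with at least one zero record, including frequent users. Summing visits per person first avoids that. The variable ed_group and the file names are hypothetical.

```stata
clear
input studyid edvisits frequent_user
1 0 1
1 2 1
1 1 1
1 0 1
1 0 1
2 0 0
2 0 0
2 0 0
2 0 0
2 0 0
3 1 1
3 0 1
3 2 1
3 0 1
3 0 1
4 1 0
4 0 0
4 0 0
4 0 0
4 1 0
end

* Total ED visits per person, repeated on every record of that person
bysort studyid: egen total_visits = total(edvisits)

* 1 = no visits, 2 = few visits, 3 = frequent user
gen byte ed_group = cond(total_visits == 0, 1, cond(frequent_user == 1, 3, 2))

* Save one dataset per group, keeping all records of each person
local names none few frequent
forvalues g = 1/3 {
    local name : word `g' of `names'
    preserve
    keep if ed_group == `g'
    save "ed_`name'.dta", replace
    restore
}
```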

Tabout and Fre suddenly not working?

For some reason, when I export frequency tables to Excel and/or Word (as .csv or .rtf files), they aren't formatted into neat rows and columns. I'm re-running code that has worked before, so I'm quite puzzled. I've tried both -fre- and -tabout- and gotten the same results. Strangely, -esttab- still works fine. Does anybody have any ideas?

Break in consecutive values to identify groups

Suppose I have a dataset like this:

clear
set obs 20
gen x = 1
gen id = 1 in 1/10
replace id = 2 in 11/20
replace x = x + 5 in 6/10
replace x = x + 5 in 16/20
As you can see, there is a break in the x variable. I would like to group observations based on this break, i.e., have groups id11 and id12 for id = 1, with the change in value as the delimiter.

This is just an example. More generally, I have minute-level data that cross the day boundary, and I would like to identify nights (e.g. starting at 10 PM and ending at 6 AM). Since the data are minute-level, there are breaks where, on the same date, the difference between successive times is more than 1 minute (e.g. jumping from 6 AM to 11 PM, the start of a new night). I thought of using these jumps to identify the nights, because dates are useless as groups here: a single date contains both the end of one night and the beginning of the next.

Any ideas are appreciated.
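A common pattern is to mark the start of a new spell whenever the value differs from the previous observation within the same id, and take a running sum of those starts. The sketch below applies it to the example; obsno is an added sort key that only preserves the current order. For the minute-level case, the same running sum can flag gaps longer than one minute in a %tc time variable; ts in the comment is a hypothetical name.

```stata
clear
set obs 20
gen x = 1
gen id = 1 in 1/10
replace id = 2 in 11/20
replace x = x + 5 in 6/10
replace x = x + 5 in 16/20

* Make the current order explicit so it survives sorting
gen long obsno = _n

* A new spell starts whenever x differs from the previous value within id;
* x[_n-1] is missing at the first observation of each id, so spells start at 1
bysort id (obsno): gen spell = sum(x != x[_n-1])

* One identifier per id-spell combination (playing the role of id11, id12, ...)
egen group = group(id spell)

* Minute-level analogue (ts is a hypothetical %tc variable): a gap of more
* than one minute starts a new night
* bysort id (ts): gen night = sum(ts - ts[_n-1] > msofminutes(1))

list id x spell group, sepby(id)
```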

Multilevel regression code doubt

Hi all,

I have data on 100 employees per company, for several companies. I would like to run a multilevel regression where the dependent variable is at the company level, the independent variables are at the employee (individual) level, and the control variables are at both levels, that is:

- Dependent: Firm innovation level (inn)
- Independent: Employee's productivity (Empprod), Employee creativity (empcrea)
- Controls: Firm financial performance (ROA), firm size (size), Employee tenure (emptenure), employee level of studies (empstud)

All my variables are CONTINUOUS, so I ran:

xtmixed inn emprod empcrea roa size emptenure empstud, mle

My questions are:

- Why does the program not work when I add the syntax || company? Is this methodology appropriate for these data, or should I use another?
- How can I obtain postestimation statistics such as R-squared or the standard deviation at both levels?

I would be very grateful if someone could shed light on my problem.

Meta Command for prevalences

I'm trying to run a meta-analysis using the metan command for the rate of a progression outcome variable (events/100,000 people/year). I need help with the Stata syntax.
What is the correct command, and what calculations do I need to do beforehand?
Thank you so much,

Find attrition at personal level

Dear Stata users,
I am using an unbalanced panel dataset. Here is an example of my dataset.
hhid pid year attrition
1 1 2000 Successful
1 1 2001 Successful
1 . 2002 moved out of country
1 1 2003 Successful
2 2 2001 Successful
2 2 2002 Successful
hhid is the household id and pid the personal id. I only have information on survey attrition (whether the interview was successful or not) at the household level. I want to create a variable indicating that the individual will "move out of the country" in the next year or, as in this example, that individual 1 is not in the country in 2002. Note that an individual can leave the country, return, and continue to participate in the survey in 2003 (but does not participate while out of the country). Could anyone help me?
I am using Stata 14.

Simple question on asdoc and label lengths

Hi all,

I am using asdoc to export some simple summary statistics to Word, using labels as variable names. When they export, the end of each label is cut off. I have tried all the different options to extend it, but I think the length of labels must be limited somehow. In any case, I am generating a fairly large number of summary statistics, and it would be incredibly helpful to be able to use the full labels if possible.

Any help would be much appreciated!

Thanks all!!

Variable labels and frmttable

I am using frmttable to generate a summary table.
As far as I can see the option varlabels should automatically add variable labels in the first column of the table. Am I doing this the wrong way? Code below.


Code:
cls
clear all

set obs 1000
ge u = rnormal()
label var u "Some label"

su u
mat a = r(mean),r(sd)

frmttable, statmat(a) varlabels

Baseline hazard function for Piecewise constant model

I need to plot baseline hazard functions for a piecewise constant model. The problem I find is that the hazard needs to be constant in each interval, and then have discrete changes between intervals, which I am not able to achieve.

The code I am using is the following:

streg e1 e2 e3 e4 log_trabalhadores innovator1 exporter1 i.ano i.region, dist(exp) nolog vce(cluster NPC_FIC) nocons

mat b = e(b)
mat list b
scalar n = colsof(b) - 1
scalar list n
mat b =b[1,2...n]
mat score xb = b

ge h=exp(_b[e1]) if age==1
replace h=exp(_b[e2]) if (age>1 & age<=4)
replace h=exp(_b[e3]) if (age>=5 & age<=6)
replace h=exp(_b[e4]) if age>=7



twoway(connect h age, sort )

However, in the graph I obtain, the "jumps" between the 4 periods I have defined are not discrete. How can I make them discrete? Am I doing something fundamentally wrong?
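The diagonal segments likely come from connect, which draws straight lines between successive points. A step function can instead be drawn with the stairstep connect style, which holds each value flat up to the next x value and then jumps vertically. A minimal sketch with invented hazard values standing in for exp(_b[e1]) to exp(_b[e4]); they are not estimates.

```stata
clear
set obs 10
gen age = _n
* Invented interval-specific hazards, mimicking the four periods in the post
gen double h = exp(-2.0) if age == 1
replace h = exp(-2.5) if inrange(age, 2, 4)
replace h = exp(-2.8) if inrange(age, 5, 6)
replace h = exp(-3.1) if age >= 7
* stairstep: horizontal segment to the next age, then a vertical jump
twoway line h age, sort connect(stairstep)
```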