summarize command

September 16, 2018, 2:52 pm

When I use sum for three variables, I get summaries that include their means. If I want the quantiles at 0.95, 0.75, 0.5, 0.25, and 0.05 listed after the mean, can I use sum, or do I need another command?
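
One possible approach, offered only as a sketch: tabstat can list chosen percentiles next to the mean, and summarize with the detail option also reports these percentiles (among others). The auto dataset and the variables price, mpg, and weight are stand-ins for the poster's three variables, which the post does not name.

```stata
* Illustrative only: the auto dataset stands in for the poster's three variables
sysuse auto, clear

* Mean followed by the 95th, 75th, 50th, 25th, and 5th percentiles
tabstat price mpg weight, statistics(mean p95 p75 p50 p25 p5) columns(statistics)
```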

↧

Generate

September 16, 2018, 7:56 pm

Hello guys, I am really new to Stata, and I have a question. I have a huge database, and I want to generate a new variable from three existing variables. The new variable will help me estimate the prevalence of chronic lung disease from those three variables (emphysema, COPD, and chronic bronchitis). However, people could respond yes to emphysema, COPD, and chronic bronchitis (all three variables). I have to generate this new variable, but I do not want to "double or triple count" those who responded yes to more than one of them.

Thank you in advance.
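
A minimal sketch of one way to do this, under stated assumptions: the variable names emphysema, copd, and bronchitis and the coding (1 = yes, 0 = no, . = missing) are hypothetical, since the post does not give them. An "any of the three" indicator counts each person once regardless of how many conditions they report.

```stata
* Hypothetical toy data: 1 = yes, 0 = no, . = missing
clear
input byte(emphysema copd bronchitis)
1 1 1
0 1 0
0 0 0
. 0 0
. . .
end

* 1 if any condition is reported, so a person with all three counts once
generate byte chronic_lung = (emphysema == 1) | (copd == 1) | (bronchitis == 1)

* Set the indicator to missing when no yes is recorded but at least one answer is missing
replace chronic_lung = . if chronic_lung == 0 & missing(emphysema, copd, bronchitis)

tabulate chronic_lung, missing
```

How to treat partly missing answers is a design choice; the replace line above is only one option.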

↧

assumption of normality

September 16, 2018, 8:11 pm

Hello,
I would like to know whether using the xtreg command with panel data requires assuming that the sample is normally distributed.
Thank you
Erasmo

↧

Creating binary variable defined on existing variable

September 16, 2018, 10:37 pm

Hi,

So, this is probably really simple but I can't seem to figure it out: I'm trying to create a binary variable. The binary variable will be based on another variable, a variable with values between 0-4 (this is social scientific experimental data so the data points are individual answers). I want everyone who has a 2 or higher to be a "1" in the binary variable, and everyone below 2 should be 0. How do I do this?

Thanks!
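
A short sketch of one common way to do this. The variable name score is an assumption, since the post does not name the 0-4 variable. Stata treats numeric missing values as larger than any number, so the if qualifier is there to keep missing answers missing.

```stata
* Hypothetical variable name: score holds the 0-4 answers
clear
input byte score
0
1
2
3
4
.
end

* The if qualifier keeps missing answers missing instead of coding them as 1
generate byte high = (score >= 2) if !missing(score)

tabulate score high, missing
```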

↧

Seeking right method to compute confidence interval

September 16, 2018, 11:17 pm

Hi,
I have some anthropometric measurements up to 12 months.
I would like to compute the difference in mean growth from 0 to 6 months in Phase1 and Phase2, separately for males and females.
Can anyone guide me on how to proceed to get the desired result?

Code:

clear
input str6 study double month float gender long counts double avg_weight
"Phase1" 0 1 195 2.7541025641025643
"Phase1" 0 2 185 2.716378378378378
"Phase1" 6 1 195 5.945502645502645
"Phase1" 6 2 185 5.48494318181818
"Phase2" 0 1 220 2.5724409090909104
"Phase2" 0 2 144 2.5227708333333334
"Phase2" 6 1 260 5.716719230769237
"Phase2" 6 2 167 5.325449101796409
end
label values gender s1q118
label def s1q118 1 "Male", modify
label def s1q118 2 "Female", modify

↧

Regression Discontinuity Graph

September 17, 2018, 2:24 am

Hi,

I am working in Stata 15 and am trying to use twoway lpoly to produce a regression discontinuity graph. My running variable is age and the outcome variable is trust level, which is a categorical variable. I set a cutoff of 33 years of age. When I run the lpoly and rdplot commands, the resulting graphs seem to have scale problems, especially on the x axis. Could you please help me fix this and get a better graph?

Here are the two commands I am using for the twoway lpoly graph and rdplot, respectively:

twoway lpoly Trust1 agefromthresh if agefromthresh<0 || lpoly Trust1 agefromthresh if agefromthresh>=0
rdplot Trust1 agefromthresh

I have attached the two graphs which I get from running the commands above.

↧

Replacing missing values of given variables in a large merged dataset

September 17, 2018, 2:43 am

Hi all,
I have data collected in a household survey about individual children (age, sex, malaria, etc.) and household characteristics (household size (hh_size), water source (WAT_SO), access to a sanitary facility, etc.). The household data are the same for all children in a given household, which is identified by the question number (qno), but they are often entered for only one child, leaving missing values for the other children in the same household. I need to fill in the missing values, but to identify a specific household in the merged dataset I also need the survey (season_year), the livelihood (livelihood), the region (region), the district (district), and the cluster (cluster) where it is found. Getting to the household questionnaire means sorting by season_year, then livelihood, then region, then district, then cluster, then qno. I looked at similar posts but did not find one where the sorting has to go through several layers and cover multiple variables. I tried to attach a sample of my dataset as a .dta file, but the file was rejected, and I am not sure how else to do this.

Thanks for your help

Mona
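
A possible sketch, assuming the household variables are numeric and that each household has at most one distinct nonmissing value per variable. The toy values below are made up; only the variable names come from the post. Numeric missing values sort last, so after sorting within household the first observation holds a nonmissing value whenever one exists.

```stata
* Toy data using the variable names from the post; values are made up
clear
input season_year livelihood region district cluster qno hh_size WAT_SO
1 1 1 1 1 1 5 2
1 1 1 1 1 1 . .
1 1 1 1 1 1 . .
1 1 1 1 1 2 . .
1 1 1 1 1 2 3 1
end

* Within each household, sort nonmissing values first, then copy them to the missing rows
foreach v of varlist hh_size WAT_SO {
    bysort season_year livelihood region district cluster qno (`v'): ///
        replace `v' = `v'[1] if missing(`v')
}

list, sepby(qno)
```

The /// line continuation works in a do-file, not when typed interactively. String variables would need different handling, because empty strings sort first rather than last.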

↧

Generate different possible combinations

September 17, 2018, 3:15 am

Hi,

I would like to generate multiple groups of observations for each different combination of a group of numbers. For example:

var1 group order
1 1 1
2 1 2
3 1 3
4 1 4
5 1 5
6 1 6
7 1 7

and would like to get the following:

var1 group order
1 1 1
2 1 2
3 1 3
4 1 4
5 1 5
6 1 6
7 1 7
1 2 1
2 2 2
3 2 3
4 2 4
5 2 5
7 2 6
6 2 7
1 3 1
2 3 2
3 3 3
7 3 4
4 3 5
5 3 6
6 3 7

etc..

for all potential combinations of the numbers 1 to 7, so a total of 7!=5040 different groups. Any help would be much appreciated.

Thank you

↧

How to obtain different font sizes in the same line of a graph title?

September 17, 2018, 3:55 am

Dear Statalisters,

My aim is to produce a graph title with different font sizes, something like this:

Age=65 years and eGFR=30 mL/min per 1.73m²

because the unit for eGFR is so long that it takes up half the space unless I reduce its font size. The problem is that I cannot split the title between the number and its unit of measure; that would be very awkward to read.

I guess the solution would be to use some SMCL trick, but I haven't found anything helpful in the Stata manual for my problem.

Thank you for your attention.

Dino

↧

Percentage of total by category

September 17, 2018, 3:57 am

Dear Statalists,

I have a simple problem that I cannot seem to figure out. I have three variables: X, Y and Z, where X is a dummy variable (0-1), Y is a categorical variable with four values and Z is a variable capturing time (years).

I would like to visualise, out of the total of X for each value of Z (year), how much goes to each category in Y.

In other words, I would like to plot the results of the following tabulation for each of the values in Z:

Code:

 tab Y X if Z==`i', col

An example:

Y	X=0	X=1
1	30	10
2	30	20
3	5	40
4	35	30
Total	100%	100%

I have tried catplot and collapsing the data, but all I can get is the results of the following tabulation:

Code:

 tab Y X if Z==1, row

This seems relatively easy so I might be missing something really basic.

Many thanks in advance.
Guillem

↧

New version of somersd on SSC

September 17, 2018, 3:59 am

Thanks as always to Kit Baum, a new version of the somersd package is now available to download from SSC. In Stata, use the ssc command to do this, or adoupdate if you already have an old version of somersd.

The somersd package is described as below on my website, and computes confidence intervals for a range of rank statistics, with the options of clustering and/or sampling-probability weighting. The new version contains an improved version of the Mata function tidottree(), which uses a search tree algorithm to compute jackknife pseudovalues for these rank statistics. The new tidottree() uses the quadsum() function to improve precision when adding tiny sums of weights to huge sums of weights, which can lead to loss of precision in really big datasets.

Best wishes

Roger

-------------------------------------------------------------------------------------
package somersd from http://www.rogernewsonresources.org.uk/stata12
-------------------------------------------------------------------------------------

TITLE
somersd: Kendall's tau-a, Somers' D and percentile slopes

DESCRIPTION/AUTHOR(S)
The somersd package contains the programs somersd, censlope and cendif,
which calculate confidence intervals for a range of parameters behind
rank or "nonparametric" statistics. somersd calculates confidence
intervals for generalized Kendall's tau-a or Somers' D parameters,
and stores the estimates and their covariance matrix as estimation results.
It can be used on left-censored, right-censored, clustered and/or
stratified data. censlope is an extended version of somersd, which also
calculates confidence limits for the generalized Theil-Sen median slopes
(or other percentile slopes) corresponding to the version of Somers' D
or Kendall's tau-a estimated. cendif is an easy-to-use program to
calculate confidence intervals for Hodges-Lehmann median differences
(or other percentile differences) between two groups. The somersd package
can be used to calculate confidence intervals for a wide range of
rank-based parameters, which are special cases of Kendall's tau-a,
Somers' D or percentile slopes. These parameters include differences
between proportions, Harrell's c index, areas under receiver operating
characteristic (ROC) curves, differences between Harrell's c indices or
ROC areas, Gini coefficients, population attributable risks, median
differences, ratios, slopes and per-unit ratios, and the parameters
behind the sign test and the Wilcoxon-Mann-Whitney or Breslow-Gehan
ranksum tests. Full documentation of the programs (including methods and
formulas) can be found in the manual files somersd.pdf, censlope.pdf and
cendif.pdf, which can be viewed using the Adobe Acrobat Reader.

Author: Roger Newson
Distribution-date: 16september2018
Stata-version: 12.1

INSTALLATION FILES (click here to install)
cendif.ado
censlope.ado
somers_p.ado
somersd.ado
_bcsf_bisect.mata
_bcsf_bracketing.mata
_bcsf_regula.mata
_bcsf_ridders.mata
_blncdtree.mata
_somdtransf.mata
_u2jackpseud.mata
_v2jackpseud.mata
blncdtree.mata
tidot.mata
tidottree.mata
lsomersd.mlib
cendif.sthlp
censlope.sthlp
censlope_iteration.sthlp
mf_bcsf_bracketing.sthlp
mf_blncdtree.sthlp
mf_somdtransf.sthlp
mf_u2jackpseud.sthlp
somersd.sthlp
somersd_mata.sthlp

ANCILLARY FILES (click here to get)
cendif.pdf
censlope.pdf
somersd.pdf
-------------------------------------------------------------------------------------
(click here to return to the previous screen)

↧

multivariate binary data on stata

September 17, 2018, 4:10 am

Hi, I'm completely new to Stata. I have a binary dataset in which the dependent variable is intimate partner violence (1 if yes, 0 otherwise). My independent variables are an education dummy (no education, primary, secondary, and tertiary), employment status (1 if employed, 0 if not), and type of employment (seasonal, occasional, and permanent). I wish to run a PCA and some descriptive analysis, such as bar graphs. What are the commands?

↧

how to hide zero percent in blabel option of graph bar

September 17, 2018, 4:54 am

Dear Stata Users,

I have a question about the option for labeling bars. When I draw bar charts of grouped data (percentages of different activities for three business types), I want to add a label to each bar. However, there are some zero percentages in my data, and the -blabel()- option displays them just like the nonzero values. My question is how to hide the zero percentages in the corresponding bars.

Code:

graph bar v2 v3 v4 v5, over(v1) stack blabel(bar, format(%9.1f) posi(center)) nofill legend(row(1) order(1 "None" 2 "One" 3 "Two" 4 "Three"))

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str9 v1 float(v2 v3 v4 v5)
"business1" 31.47 68.53    0   0
"business2" 67.33 29.53 3.13   0
"business3" 78.93   7.4 13.2 .47
end

↧

Package control: Packages over-writing other packages

September 17, 2018, 5:22 am

Hi there

I have a more general question that I haven't been able to find the answer for. How does one specify which command should be used, if multiple commands from different user-written packages share the same name?

This problem occurs when I try to run rdplot from the rdrobust package. Stata instead runs rdplot as programmed in the rdplot package.

Code:

. which rdplot
C:\Program Files (x86)\Stata15\ado\base\r\rdplot.ado
*!version 7.5.1  2018-07-05

How do I make rdrobust the default package?

↧

Fixed effects and dynamic ordinary least squares

September 17, 2018, 6:46 am

Dear all,

I am currently working on a research topic related to the environmental effects of disaggregated renewable energy.
I am estimating some regression models with the objective of evaluating the effect of renewable energy diffusion on GHG emissions.

The models are:

Y_it = B0 + B1 RES_it + B2 FOSSIL_it + u_it

Where: RES = share of renewable energy production; FOSSIL = share of fossil energy production; Y = GHG emissions (E)

I would like to know what the difference is between fixed effects and DOLS (dynamic ordinary least squares).
I have 21 years and 28 countries.

I would very much appreciate some thoughts on this problem.
Thanks in advance!

Matteo

↧

Issue with syntax

September 17, 2018, 6:49 am

Dear Statalist,

I am starting to use panel data for the first time, and I want to create a simple variable that computes the gap between two groups (high- and low-educated people) in the time they spend on activity X. The structure of my data is as follows:

Code:

      ID    wave        time    moth_degree
61105964       2       13.25              1
61105964       3      16.375              1
61105964       4    33.71667              1
61105964       5    7.758338              1
61105964       6    5.833334              1
61105966       2         3.5              1
61105966       4           7              1
61105966       5    16.33334              1
61105966       6    10.20834              1

What I would like to have is a variable that calculates the gap in time spent (time) between children (ID) whose mother has a degree (moth_degree==1) and those whose mother does not (moth_degree==0). I know it is an easy task, but after trying many options I could not find a way!

Many many thanks!
Best
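
One possible reading of this request, offered as a sketch: compute the mean of time in each education group within each wave, then take the difference. Computing the gap per wave is an assumption, and the rows for ID 61105970 with moth_degree == 0 are made up so that both groups appear.

```stata
* Made-up values; moth_degree is 1 (degree) or 0 (no degree)
clear
input long ID byte wave double time byte moth_degree
61105964 2 13.25 1
61105964 3 16.375 1
61105966 2 3.5 1
61105970 2 10 0
61105970 3 12.5 0
end

* Group means per wave, stored on every row of that wave
bysort wave: egen mean_deg   = mean(cond(moth_degree == 1, time, .))
bysort wave: egen mean_nodeg = mean(cond(moth_degree == 0, time, .))

* Gap between the two group means in each wave
generate gap = mean_deg - mean_nodeg

list, sepby(wave)
```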

↧

Time dummies and time trend simultaneously

September 17, 2018, 8:09 am

Hello everyone,

I would like to ask whether it makes sense to include time dummies and a time trend in the same model specification.

I am running a panel data regression with macroeconomic variables, so I would like to include time dummies. However, some variables show a clear linear trend in the long term, so I would also like to include a trend to de-trend my data. My data therefore include something like this:

Panel identifier	year	trend
1	2001	1
1	2002	2
1	2003	3
1	2004	4
1	2005	5
2	2001	1
2	2002	2

and also the time dummies which take the value 1 or 0 as usual. I do not know if the model is sensible if I include both (time dummies and a time trend at the same time).

Thanks in advance.

Regards

↧

Histogram discrete is not discrete

September 17, 2018, 11:54 am

This is driving me bananas.
I have a set of discrete data spanning 1-30.
I want a histogram with bins from 1-30.
When I type:

histogram hday if hday<=30, discrete

I get a histogram that lumps together the values of 1 (the min) and 2.
No amount of tampering with the starting value, the bin width, the number of bins, etc. has solved this problem for me.
What is broken under the hood?

Thanks

↧

Different confidence intervals for linear regression

September 17, 2018, 2:48 pm

I have run a linear regression, which gives a 95% CI. I would like to calculate the 80% and 99% CIs for β₁, but I can't find a command other than the one that gives me a general CI for the X variable. Is there a command or menu option I can use to calculate them? Is there also a way to show a picture of the probability distribution for each calculation?

Thanks
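
A brief sketch of one way to get other confidence levels: regress accepts a level() option, and typing regress without a variable list replays the last estimates, so the level can be changed without refitting. The auto dataset and the variables price and mpg are stand-ins for the poster's data.

```stata
* Illustrative only: price and mpg stand in for the poster's Y and X
sysuse auto, clear

* Fit the model and report 80% confidence intervals
regress price mpg, level(80)

* Replay the same estimates with 99% confidence intervals
regress, level(99)
```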

↧

Generate mean of a variable for each level of another variable

September 17, 2018, 3:40 pm

Hi there,
I want to create several variables that store the means of other "mother" variables (trunk and displacement) for different values of an index variable (rep78). Then I want to estimate the difference between the means, also for the different levels of the index variable. I used the following code:

Code:

sysuse auto, clear
drop if missing(rep78)
levelsof rep78, local(levels)
foreach l of local levels {
    summarize trunk displacement if rep78 == `l'
    egen disp_`l' = mean(displacement) if rep78 == `l'
    egen trunk_`l' = mean(trunk) if rep78 == `l'
    gen dif_`l' = disp_`l' - trunk_`l'
    di dif_`l'
}

As you can see, the output displays the summaries of trunk and displacement for the different values of rep78. But it only displays the means difference for rep78=3. Why?
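
A likely explanation, offered with some caution: display followed by a variable name shows that variable's value in the first observation only. If the first observation in the dataset has rep78 == 3, every other dif_ variable is missing there, so only dif_3 shows a number. The sketch below is one alternative that takes the group means from r(mean) after summarize instead of reading a variable; the local macro names m_disp and m_trunk are illustrative.

```stata
sysuse auto, clear
drop if missing(rep78)
levelsof rep78, local(levels)
foreach l of local levels {
    * Mean of displacement within this level of rep78
    quietly summarize displacement if rep78 == `l', meanonly
    local m_disp = r(mean)
    * Mean of trunk within this level of rep78
    quietly summarize trunk if rep78 == `l', meanonly
    local m_trunk = r(mean)
    * Show the difference between the two means for this level
    display "rep78 = `l': mean difference = " `m_disp' - `m_trunk'
}
```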

↧