Channel: Statalist

How to calculate the dependency ratio in Stata

Please let me know how to calculate the dependency ratio in Stata. Is the code below correct?

bys ScheduleNo: egen Age_Working = count(B2Q1) if Age>=15 & Age<=64
bys ScheduleNo: egen count_Age_Working= max(Age_Working)
recode count_Age_Working (.=0)
bys ScheduleNo: egen Age_Dep = count(B2Q1) if Age<=14 | Age>=65
bys ScheduleNo: egen count_Age_Dep= max(Age_Dep)
recode count_Age_Dep(.=0)
gen DRatio = (count_Age_Dep/count_Age_Working)*100
recode DRatio(.=0)
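A minimal sketch of one way to compute the ratio, on made-up toy data. It assumes one row per household member, that ScheduleNo identifies the household and that Age is in completed years. Two details differ from the code above. First, Stata treats a missing Age as larger than any number, so `Age>=65` is true for missing ages unless they are excluded explicitly. Second, a household with no working-age members gets a missing ratio rather than 0, because the ratio is undefined there. The original `count(B2Q1)` also required B2Q1 to be nonmissing; add `& !missing(B2Q1)` to both conditions if that filter matters.

```stata
* Hypothetical toy data: one row per person, ScheduleNo = household id
clear
input ScheduleNo Age
1 30
1 35
1 5
1 70
2 40
2 .
3 10
3 80
end

* Count working-age (15-64) and dependent (0-14, 65+) members per household.
* inrange() returns 0 for a missing Age; the !missing() guard stops a
* missing Age from being counted as 65+.
bysort ScheduleNo: egen n_working = total(inrange(Age, 15, 64))
bysort ScheduleNo: egen n_dep     = total(Age <= 14 | (Age >= 65 & !missing(Age)))

* Dependents per 100 working-age members; left missing (not 0)
* when a household has no working-age members
generate DRatio = 100 * n_dep / n_working if n_working > 0
list ScheduleNo Age n_working n_dep DRatio, sepby(ScheduleNo)
```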

Estimate a part of a data set

Hello!

In the data set I have the variables "college" (= 1 if the student went to college, 0 otherwise), "period" (= 2 for the 2nd period, = 1 for the 1st period), "degree" (= 1 if the student obtained a high school degree, 0 otherwise), and some other variables describing the student's characteristics.

So, I want to estimate the utility from going to college, conditional on the student having obtained a high school degree, and in period 2 only.

I could simply drop the 1st-period observations and the students who did not obtain a high school degree, but I do not want to, since I still need the whole data set for other regressions.

Is there any way to proceed without dropping out observations?

Thank you,
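A sketch of the usual approach: an `if` qualifier on the estimation command restricts the estimation sample without removing anything from memory. The data below are simulated stand-ins, `x` is a hypothetical placeholder for the student characteristics, and a probit is assumed as the binary choice model.

```stata
* Hypothetical simulated data standing in for the real data set
clear
set seed 12345
set obs 1000
generate period  = 1 + (runiform() < 0.5)
generate degree  = runiform() < 0.7
generate x       = rnormal()
generate college = runiform() < normal(0.2 + 0.5*x)

* Estimate only on period-2 students with a high school degree;
* the if qualifier restricts the estimation sample, nothing is dropped
probit college x if period == 2 & degree == 1

* Flag the observations that were actually used
generate byte insample = e(sample)
tabulate period degree if insample

* All 1,000 observations are still in memory for other regressions
display _N
```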

Estimate the impact of a dummy variable in terms of other variables

Hello,

I estimated the utility from going to school conditional on some variables, including a dummy variable "parent_school", which = 1 if at least one of the student's parents went to high school, and a variable "distance", which measures the distance from the student's house to the school.

Now I need to express the impact on utility of not having a parent who went to high school in terms of meters.

First, how do I compute the impact of not having a parent who went to high school? Is it just the estimated parameter for "parent_school"?

Second, how can I express that impact in meters? Is it equal to the ratio of the estimated parameters for "parent_school" and "distance"?

Thank you,
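A sketch, assuming utility is linear in both variables with coefficients b_p (parent_school) and b_d (distance, in meters). Relative to having such a parent, not having one changes utility by -b_p. The distance change with the same effect solves b_d * delta = -b_p, so delta = -b_p/b_d meters. With b_p > 0 and b_d < 0 this is positive, i.e. "like living delta meters farther away". The ratio does not depend on the probit scale normalization, because both coefficients share it. `nlcom` reports the ratio with a delta-method standard error. The data and the outcome name `school` are hypothetical.

```stata
* Hypothetical simulated data: school = 1 if the student goes to school,
* parent_school = 1 if a parent went to high school, distance in meters
clear
set seed 2024
set obs 2000
generate parent_school = runiform() < 0.5
generate distance      = 5000 * runiform()
generate school = runiform() < normal(0.3 + 0.6*parent_school - 0.0002*distance)

probit school parent_school distance

* Meters of extra distance that lower utility by as much as not having
* a parent who went to high school: -b_parent / b_distance.
* nlcom reports a delta-method standard error; e(b) is left unchanged.
nlcom (meters: -_b[parent_school] / _b[distance])
matrix meters_b = r(b)
```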

Intraday Data (1 Minute data)

Hello,
I would appreciate it if you could help me with formatting 1-minute data.
I have a datetime variable in the form "2005-01-01T00:00:00.000000000Z". As a first step, I ran the following commands:

gen str date = substr(datetime, 1, 10)
gen hour = real(substr(datetime,12,2))
gen min = real(substr(datetime,15,2))
gen year= real(substr(datetime,1,4))
gen month= real(substr(datetime,6,2))
gen day= real(substr(datetime,9,2))
gen mdy=mdy(month,day,year)
format mdy %td
gen mytime=Cmdyhms(month,day,year,hour,min,0)
format mytime %tc

However, when I tsset my dataset by "mytime", I get the error "repeated time values in sample". I guess I couldn't figure out the correct format for the seconds.

Pinar
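One likely cause, offered as an assumption since the data are not shown: %tc values are milliseconds since 1960 and must be stored as `double`. `gen mytime = ...` creates a float, which around 2005 can only represent values about every two minutes, so neighbouring minutes collapse into the same value. Also, `Cmdyhms()` returns leap-second-adjusted %tC values; with a %tc format, use `mdyhms()` or `clock()` instead. If the file holds several series (e.g. several assets), tsset also needs a panel variable. A sketch on three made-up timestamps in the post's format:

```stata
* Hypothetical example timestamps in the format from the post
clear
input str30 datetime
"2005-01-01T00:00:00.000000000Z"
"2005-01-01T00:01:00.000000000Z"
"2005-01-01T00:02:00.000000000Z"
end

* Keep "YYYY-MM-DD hh:mm:ss" (drop the fractional seconds and Z),
* then convert; datetimes must be stored as double
generate double mytime = clock(subinstr(substr(datetime, 1, 19), "T", " ", 1), "YMDhms")
format mytime %tc

* Daily date, if needed
generate mdy = dofc(mytime)
format mdy %td

* One observation per minute
tsset mytime, delta(1 minute)
```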

Testing differences between BLUPs after mixed

Dear Statalisters,


I'm running an empty three-level mixed model, with individuals nested in cities nested in states. The dependent variable is satisfaction with city services.

Code:
mixed satisfaction   || state: || city:
After that I'm retrieving BLUPs and standard errors for the city level:


Code:
predict eb*, reffects relevel(city)  
gen beta0 = _b[_cons] + eb1
predict eb_se*, reses relevel(city)  
gen eb_se1_l = beta0 - 1.96*eb_se1  //ci
gen eb_se1_u = beta0 + 1.96*eb_se1

What's the best way to test whether specific cities differ significantly? Is it possible to use OLS, like this:

Code:
regress beta0 i.city
margins city, pwcompare(group sort)

Looking forward to your comments.
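A hedged note and sketch. Because beta0 is constant within each city, `regress beta0 i.city` fits perfectly, so its standard errors are zero and say nothing about the uncertainty in the BLUPs. A rough alternative is a z statistic built from two cities' BLUPs and their `reses` standard errors. It ignores the covariance between the two predictions (nonzero, for example, for cities in the same state), so treat it only as an approximation. The simulated data and the choice of cities 1 and 2 are hypothetical.

```stata
* Hypothetical simulated data: 5 states x 4 cities x 30 respondents
clear
set seed 4321
set obs 5
generate state = _n
generate u_state = rnormal(0, 0.5)
expand 4
bysort state: generate city = (state - 1)*4 + _n
generate u_city = rnormal(0, 0.7)
expand 30
generate satisfaction = 5 + u_state + u_city + rnormal()

mixed satisfaction || state: || city:

* City-level BLUPs and their standard errors
predict eb*, reffects relevel(city)
predict eb_se*, reses relevel(city)

* Approximate z test for cities 1 and 2 (ignores the covariance
* between the two predictions, so treat it as a rough guide)
summarize eb1 if city == 1, meanonly
scalar u_a = r(mean)
summarize eb_se1 if city == 1, meanonly
scalar se_a = r(mean)
summarize eb1 if city == 2, meanonly
scalar u_b = r(mean)
summarize eb_se1 if city == 2, meanonly
scalar se_b = r(mean)
scalar z = (scalar(u_a) - scalar(u_b)) / sqrt(scalar(se_a)^2 + scalar(se_b)^2)
display "z = " scalar(z) "   two-sided p = " 2*normal(-abs(scalar(z)))
```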

Creating variable from searching multiple strings

Hi there - probably an easy question for most of you, but I'm new to Stata and can't find the answer to this.
I am trying to create a new variable "ethnicity_Hisp" by searching seven other variables (ethnicity1, ethnicity2, etc.) for the string "Hispanic or Latino".
I've tried to do this a couple of different ways (shortened to just two variables here for simplicity):
gen variable ethnicity_Hisp=1 if ethnicity1 == "Hispanic or Latino" | ethnicity2 == "Hispanic or Latino"
gen variable ethnicity_Hisp=1 if strpos(ethnicity1, "Hispanic or Latino) | strpos(ethnicity2, "Hispanic or Latino"

I keep getting back "too many variables".
Is there a way to do this without multistep code, where I generate the variable and then replace it with 1 for each of the ethnicity variables?

Thanks!
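Likely cause of the error: `generate` expects `generate [type] newvar = exp`, and `variable` is not a storage type, so Stata reads `variable` as the new variable's name and `ethnicity_Hisp` as one name too many. Drop the word `variable`. (The second attempt is also missing a closing quote and parenthesis.) A sketch of two options on hypothetical toy data: `inlist()` gives a one-line exact match (at most 10 string arguments in total, enough for 7 variables), and a `strpos()` loop also catches the phrase inside longer entries. Both produce 0/1 rather than 1/missing.

```stata
* Hypothetical toy data with seven ethnicity string variables
clear
input str20 ethnicity1 str20 ethnicity2 str20 ethnicity3 str20 ethnicity4 str20 ethnicity5 str20 ethnicity6 str20 ethnicity7
"White" "" "" "" "" "" ""
"Asian" "Hispanic or Latino" "" "" "" "" ""
"" "" "" "" "" "" "Hispanic or Latino"
end

* One line, exact match: inlist() is 1 if the first argument equals any
* of the others (string calls allow at most 10 arguments in total)
generate byte ethnicity_Hisp = inlist("Hispanic or Latino", ethnicity1, ethnicity2, ///
    ethnicity3, ethnicity4, ethnicity5, ethnicity6, ethnicity7)

* Substring match (e.g. if entries can contain extra text): start at 0
* and set to 1 whenever any variable contains the phrase
generate byte hisp_substr = 0
forvalues i = 1/7 {
    replace hisp_substr = 1 if strpos(ethnicity`i', "Hispanic or Latino") > 0
}
list ethnicity_Hisp hisp_substr
```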

What does -teffects ipwra- actually do?

Dear Statalist

I am trying to understand what -teffects ipwra- actually does.

I understand that -teffects ra-

Code:
webuse cattaneo2, clear

// Regression adjustment
teffects ra (bweight mmarried prenatal1 fbaby medu) (mbsmoke)
does the following

Code:
// -teffects ra- using -regress-
qui regress bweight mbsmoke##(mmarried prenatal1 fbaby c.medu)
margins r.mbsmoke
and -teffects ipw-

Code:
// Inverse probability of treatment weighting
teffects ipw (bweight) (mbsmoke mmarried prenatal1 fbaby medu, logit)
does this:

Code:
// -teffects ipw- using -regress-
logit mbsmoke mmarried prenatal1 fbaby medu
predict probab
generate w = 1/probab       if mbsmoke == 1
replace  w = 1/(1 - probab) if mbsmoke == 0
regress bweight mbsmoke [pw = w]
I would now assume that -teffects ipwra- does a combination of the two,
but nothing I tried with -regress- reproduced the -teffects ipwra-
result (and I couldn't find, or didn't understand, the page in the documentation explaining
what -teffects ipwra- actually does).

Can anybody help?

Thanks for your consideration
KS
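My understanding, stated with some caution: for the ATE, -teffects ipwra- fits the outcome regression separately in each treatment group, weighting each observation by the inverse of the estimated probability of the treatment it actually received. It then averages the predicted potential outcomes over the whole sample. The sketch below should reproduce the point estimate using the post's covariates in both models. The standard errors will differ, because -teffects- accounts for the estimated weights and a manual approach does not.

```stata
webuse cattaneo2, clear

* 1) Treatment model: propensity score from a logit
logit mbsmoke mmarried prenatal1 fbaby medu
predict double ps, pr

* 2) Inverse-probability weights for the ATE
generate double w = cond(mbsmoke == 1, 1/ps, 1/(1 - ps))

* 3) Weighted outcome regressions, one per treatment group,
*    with linear predictions for every observation
regress bweight mmarried prenatal1 fbaby medu if mbsmoke == 1 [pw = w]
predict double y1, xb
regress bweight mmarried prenatal1 fbaby medu if mbsmoke == 0 [pw = w]
predict double y0, xb

* 4) ATE = average difference in predicted potential outcomes
generate double te = y1 - y0
summarize te, meanonly
scalar ate_manual = r(mean)
display "manual IPWRA ATE = " scalar(ate_manual)

* For comparison: the first element of e(b) is the ATE
teffects ipwra (bweight mmarried prenatal1 fbaby medu) ///
    (mbsmoke mmarried prenatal1 fbaby medu, logit)
matrix b_ipwra = e(b)
scalar ate_teffects = b_ipwra[1,1]
```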

Descriptive Statistics per Country - Stata to Word Output

Hi,

I have started working on the do-file for my thesis. I managed to run the commands I need and export the regression tables to Word. However, I am struggling with the Word output for the descriptive statistics per country. I want to build a table like the example below, but I haven't succeeded: either the Word output is completely empty, or the countries appear next to each other rather than in rows, and so on.
Country | Observations (absolute) | Observations (relative) | Variable 1: mean | Variable 1: SD | Variable 2: median | Variable 3: median | Variable 4: median
AT | 20 | 20% | | | | |
AU | 55 | 55% | | | | |
BE | 15 | 15% | | | | |
BG | 20 | 20% | | | | |
Total | 100 | 100% | | | | |
Variable 1 is the main dependent variable, measured at the firm level; I'd like to show its mean and SD. Variables 2, 3, and 4 are country-level variables that hardly change (that's why I use the median).

If it is not possible to show mean, SD and median for different variables in the same table, I would also appreciate two separate tables (one with variable 1, one with variables 2, 3, and 4).

Here is some code I have tried so far.
Code:
eststo clear
estpost tabstat VARIABLES, by(GGISO) statistics(count mean sd) columns(statistics)
esttab using Summary.rtf, ///
        append ///        
        noobs ///
        unstack ///
        nonumbers ///
        nolz ///
        nomtitle ///
        coeflabel(VARIABLES and LABELS) ///
        b(%9.2f) ///
        title ("Summary Statistics - Country")
eststo clear
Thank you very much!
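One possible route, sketched on made-up data and assuming Stata 15 or newer (needed for putdocx). Build the table as a small data set with `collapse`, one row per country, then write that data set to Word. GGISO is the country variable from the post; var1-var4 are hypothetical stand-ins for the real variable names. Note that `(count)` counts nonmissing values of var1. The Total row is not produced here and would need to be appended separately. Wrap the code in preserve/restore to keep the firm-level data.

```stata
* Hypothetical toy data: firm-level var1, country-level var2-var4
clear
input str2 GGISO var1 var2 var3 var4
"AT" 1.5 10 3 7
"AT" 2.5 10 3 7
"AU" 3.0 20 4 8
"AU" 5.0 20 4 8
"AU" 4.0 20 4 8
end

* One row per country: N, mean and SD of var1, medians of var2-var4
collapse (count) n_obs=var1 (mean) var1_mean=var1 (sd) var1_sd=var1 ///
    (median) var2_med=var2 var3_med=var3 var4_med=var4, by(GGISO)

* Relative number of observations, in percent
egen total_obs = total(n_obs)
generate share = 100 * n_obs / total_obs
format var1_mean var1_sd share %9.2f

* Write the table to Word, one country per row
putdocx begin
putdocx table tbl = data(GGISO n_obs share var1_mean var1_sd var2_med var3_med var4_med), varnames
putdocx save Summary.docx, replace

list
```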

Creating trace plots after data imputation

Hello,

I struggle with creating trace plots after data imputation.

Imputation code has the following structure:

Code:
mi impute chained variables, add(100) rseed (54321) savetrace(trace1, replace) augment

Code for generating trace plots:

Code:
use trace1,clear
describe

reshape wide *mean *sd, i(iter) j(m)
tsset iter
tsline SES_mean*, name(mice1,replace)legend(off) ytitle("Mean of SES")
tsline SES_sd*, name(mice2, replace) legend(off) ytitle("SD of SES")
graph combine mice1 mice2, xcommon cols(1) title(Trace plots of summaries of imputed values)
Every time Stata gets to tsline, the following error occurs:

Code:
too many variables specified
r(103);
The output of describe and reshape follows (in case it is relevant for resolving the issue):

Code:
Contains data from trace1.dta
  obs:         1,276                          
 vars:            38                         
 size:       193,952 

Data                               long   ->   wide
-----------------------------------------------------------------------------
Number of obs.                     1276   ->      11
Number of variables                  38   ->    4177

Setting maxvar to 32767 did not help. My Stata version is 13 SE. Is it really the case that too many variables are specified, so that I cannot proceed, or is there a solution? Thank you in advance for your answers.
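A different route that avoids reshape altogether, sketched under the assumption that the trace file holds the variables m, iter, SES_mean and SES_sd (the names the post's own code relies on). With 100 imputations, the wide layout creates one plotted variable per imputation. The long layout keeps a single variable: sort by imputation and iteration, then use `connect(L)`, which joins consecutive points only while x increases, so each imputation's chain is drawn as its own line. Simulated stand-in for trace1.dta:

```stata
* Hypothetical stand-in for trace1.dta (with the real file: use trace1, clear)
clear
set seed 54321
set obs 33
generate m = ceil(_n/11)
bysort m: generate iter = _n - 1
generate SES_mean = rnormal()
generate SES_sd   = 1 + runiform()

* Stay in long format; connect(L) restarts the line for each imputation
sort m iter
twoway line SES_mean iter, connect(L) name(mice1, replace) legend(off) ytitle("Mean of SES")
twoway line SES_sd iter, connect(L) name(mice2, replace) legend(off) ytitle("SD of SES")
graph combine mice1 mice2, xcommon cols(1) title("Trace plots of summaries of imputed values")
```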

Print conditional statement

I wrote this code to get Stata to print a conditional statement:

Code:
if p_value < alpha {
    di "Reject null"
}
else {
    di "Do not reject null"
}
but it prints this:

[screenshot of the output not preserved]

Is there a way to get Stata to run this "quietly" so that the only output on the screen is either "Reject null" or "Do not reject null"?
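A sketch, assuming p_value and alpha are scalars. If a variable named p_value exists, it takes precedence over the scalar, and `if` would then look only at its first observation; `scalar()` avoids that ambiguity. `cond()` picks the message in a single line. A do-file still echoes the command lines themselves; commands executed inside a `program` are not echoed, if fully clean output is needed.

```stata
* Hypothetical values; in practice these come from an earlier test
scalar p_value = 0.03
scalar alpha   = 0.05

* cond() returns the first string if the condition is true, else the second;
* scalar() makes sure the scalars (not same-named variables) are used
local verdict = cond(scalar(p_value) < scalar(alpha), "Reject null", "Do not reject null")
display "`verdict'"
```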

PCA for repeated measures

Hello Statalist contributors,

I am looking for a solution to a problem that seems to be unanswered on this forum so far, even though people have asked similar questions.

I have 42 blood biomarkers, and it is common practice to reduce them to a smaller number of factors to understand the data. I would therefore conduct a PCA and go from there, but in this case I have two sets of these data on the same participants, 3 months apart. I would like to compare the PCAs, but of course I cannot simply conduct a PCA at each time point and compare them. So the question is: are there known methods for conducting and comparing PCAs from two time points? I have found that there is a method called Multiple Factor Analysis in R, and an article on the subject, but sadly I am simply inept in R. Is there something equivalent in Stata?

Any advice would be greatly appreciated.

J-M
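One simple option, offered only as a hedged sketch and not as an equivalent of Multiple Factor Analysis: stack both time points in long format and fit a single PCA, so both visits share the same components, then compare component scores between visits within participants. The pooled PCA treats the two visits as separate rows and ignores the within-person correlation when extracting the components. The variable names (id, time, bm1-bm5) and the simulated data are hypothetical; the real data would have 42 biomarkers.

```stata
* Hypothetical simulated data: 50 participants x 2 visits, 5 biomarkers
clear
set seed 99
set obs 50
generate id = _n
generate f = rnormal()
expand 2
bysort id: generate time = _n
forvalues j = 1/5 {
    generate bm`j' = f + 0.3*(time == 2) + rnormal(0, 0.8)
}

* One PCA on the stacked data: the same loadings apply at both visits
pca bm1-bm5, components(2)
predict pc1 pc2, score

* Compare first-component scores between visits (paired, within participant)
keep id time pc1 pc2
reshape wide pc1 pc2, i(id) j(time)
ttest pc11 == pc12
```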

Help regarding panel categorical data analysis in Stata

I am currently researching crises and the usefulness of different models in predicting them. It would be nice to get some help from any of you regarding Stata and categorical panel data.
I am interested in the following:

1) Any books that explain categorical data analysis in detail, especially how to build a model and predict events

2) Any websites that contain lessons regarding categorical panel data

3) Any websites where I can find academic papers and their databases and do files to replicate them

Thanks in advance.

Counterfactual binary choice model

Hello,

I want to run a counterfactual for my binary choice model of the schooling decision. Specifically, suppose that only 50% of students are now (randomly) allowed to enter school. I want to estimate the impact of this on total school enrollment.
I don't know how to build "50% of students are randomly allowed to enter school" into my estimation.

Is there anyone having an idea?

Thanks in advance,
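A sketch under explicit assumptions: the model is a probit; a random half of students is admitted; admitted students choose as the estimated model predicts; students not admitted cannot enroll; and nothing else changes (no peer or equilibrium effects). Counterfactual expected enrollment is then each student's predicted probability times an admission indicator, summed over students. The data and the covariate `x` are hypothetical.

```stata
* Hypothetical simulated data standing in for the schooling data
clear
set seed 777
set obs 1000
generate x = rnormal()
generate school = runiform() < normal(0.4 + 0.8*x)

* 1) Fit the binary choice model and predict each student's probability
probit school x
predict double p_enroll, pr

* 2) Randomly allow exactly half of the students to enter school
generate double u = runiform()
sort u
generate byte allowed = (_n <= _N/2)

* 3) Counterfactual probability: 0 if not allowed, unchanged otherwise
generate double p_cf = p_enroll * allowed

* Predicted total enrollment, baseline vs counterfactual
summarize p_enroll
display "baseline total:       " r(sum)
summarize p_cf
display "counterfactual total: " r(sum)
```

Because admission is random, the counterfactual total should come out close to half the baseline total. Repeating step 2 with different seeds shows how much it varies across draws.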

Stata module tstf

Hello,

Do folks have experience using the Stata module tstf to fit intervention time-series models? I am unable to run the commands provided in the help file. If anyone has any insights or advice on this module, please share them here. Much appreciated.

Gina

Lincom issue

Hello,
I am a Ph.D. student using Stata for a statistics class. Unfortunately, I keep having issues with the lincom command. For example, when I run
"lincom [egalitarianism]Male - [egalitarianism]Female"
I get the error "equation [egalitarianism] not found", r(303). The odd thing is that this is an example from a Stata workbook, so the code should be correct as written. All the other lincom examples give me the same error, although when my professor ran them, the calculations worked without error. Instead of an error, I should get the estimate with its standard error, p-value and similar information.
I am typing the commands exactly as they appear in the book, but each time I run them I still get the same error. Is this an issue with Stata itself, or with my own data set?
Thank you,
Matthew Gomez
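A hedged sketch. The error means the most recent estimation results contain no equation named egalitarianism. `lincom` only sees the coefficients of the last estimation command, so it must run right after the book's model, and the names must match what that model stored. Names can differ between Stata versions (for example, I believe Stata 16 changed how `mean ..., over()` labels its results), which could explain why the book's syntax works for your professor but not for you. That is a guess, because the post does not show the estimation command. The `coeflegend` option prints the exact names to use. Toy data with a hypothetical coding of sex (0 = Female, 1 = Male):

```stata
* Hypothetical toy data
clear
input egalitarianism sex
3 0
4 0
5 1
6 1
end
label define sexlbl 0 "Female" 1 "Male"
label values sex sexlbl

* Show the exact coefficient names stored by this version of Stata
mean egalitarianism, over(sex) coeflegend

* A version-independent way to get the Male - Female difference with its SE
regress egalitarianism i.sex
lincom 1.sex
```

The regression-based standard error uses the pooled residual variance, so it can differ slightly from the one based on `mean`.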

Plotting the decomposition of differences in distribution

Hi everybody

I am a graduate student writing my master's thesis on "welfare differentials between rural and urban areas". I am trying to decompose the gap between urban and rural areas, first with an Oaxaca-Blinder decomposition and then with a quantile-regression-based decomposition following Machado (I use the cdeco command).

However, my question is this: after the counterfactual regressions, the paper I am following plots the decomposition of the differences in the distribution, as in the image I attached to this post. I have no idea how to make this graph and could not find any code for it online. I am also new to Stata. Please help!
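A generic sketch of this kind of figure: the total difference and its characteristics and coefficients components, plotted by quantile. It assumes the quantile-by-quantile estimates have already been copied into a data set. The variable names and numbers below are placeholders, not cdeco output; check cdeco's help file for where it stores its results.

```stata
* Placeholder estimates: replace with the cdeco results for each quantile
clear
input q total chars coefs
0.1 0.40 0.25 0.15
0.3 0.45 0.28 0.17
0.5 0.50 0.30 0.20
0.7 0.55 0.32 0.23
0.9 0.60 0.35 0.25
end

* Total differential and its two components across the distribution
twoway (line total q) (line chars q, lpattern(dash)) (line coefs q, lpattern(shortdash)), ///
    xtitle("Quantile") ytitle("Urban - rural difference") ///
    legend(order(1 "Total" 2 "Characteristics" 3 "Coefficients")) ///
    name(decomp, replace)
```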

Convert a string variable

Hi,

I have two data sets. The first contains a (full) university name variable (about 7,100 universities and colleges) and a zip code variable; the names in this file are full names such as Arizona State University. The second contains a university name variable (about 600 universities) and other variables. I want to match the two data sets on university name, so that I can use the second data set with the zip code variable added. But the university names in the second data set are abbreviated, like Arizona State U. or Babson C., so Stata failed to match the two files at all. I used the "match" command.

So, my question is how to convert strings such as Arizona State U. or Babson C. in the second data file into the full university or college names.

Thank you in advance.
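A sketch on two tiny hypothetical files. Expand a trailing " U." or " C." into the full word, then combine the files with Stata's `merge` command. Only these two patterns are handled here. Other abbreviations (e.g. "U. of ...") need their own rules, and leftover mismatches may need fuzzy-matching tools such as the community-contributed reclink or matchit from SSC. For `merge m:1`, the full-name file must be unique on the name.

```stata
* Hypothetical file 1: full names with zip codes
clear
input str40 univ_name str5 zip
"Arizona State University" "85281"
"Babson College" "02457"
end
tempfile fullnames
save `fullnames'

* Hypothetical file 2: abbreviated names
clear
input str40 univ_name
"Arizona State U."
"Babson C."
end

* Expand a trailing " U." or " C." into the full word
replace univ_name = substr(univ_name, 1, strlen(univ_name) - 3) + " University" ///
    if substr(univ_name, -3, .) == " U."
replace univ_name = substr(univ_name, 1, strlen(univ_name) - 3) + " College" ///
    if substr(univ_name, -3, .) == " C."

* Match on the expanded names; the using file must be unique on univ_name
merge m:1 univ_name using `fullnames', keep(master match)
tabulate _merge
```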

LaTeX quirk?

Trying \ref{mylabel}, which previews fine but turns into ??? after posting. It works if you use CODE tags:

Code:
\ref{mylabel}
I wonder if it works when we escape the backslash \\ref{mylabel}?

How to remove the left blank margin in a plot

I want to make a simple plot of a consumer's surplus in Stata. By default, Stata leaves a blank margin on the left, which makes the graph look strange (the triangle is not closed). I searched Statalist and tried many ways to remove it, including xscale() and plotregion(margin(zero)), but none of them worked. Could anyone tell me how to remove the blank space between AB and the y axis in the attached graph? Many thanks in advance.

The code I wrote is as follows:
Code:
clear
set obs 100
generate q=_n-1
generate p=50
generate cons=100-q
twoway (line cons q) (line p q, lpattern(dash)), xline(25, lpattern(dash)) xtitle("Consumption") ytitle("Price") text(51 51 "O", place(nw)) text(50 1 "A", place(s)) text(100 1 "B", place(n)) xlabel(25 "Garbage") ylabel(50 "p" 100 "a") title("Free Disposal of Garbage") xscale(range(0 105)) legend(off) graphregion(margin(zero))
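A hedged suggestion. In the posted command, `graphregion(margin(zero))` only removes the margin outside the plot region. The gap between the y axis and the lines comes from the plot region's own default margin, which `plotregion(margin(zero))` removes. Below is the same graph with that option added (not tested against the attached figure). The labels placed at the edges may then need small position tweaks.

```stata
clear
set obs 100
generate q = _n - 1
generate p = 50
generate cons = 100 - q

* plotregion(margin(zero)) removes the inner margin, so the axes meet the data
twoway (line cons q) (line p q, lpattern(dash)), ///
    xline(25, lpattern(dash)) xtitle("Consumption") ytitle("Price") ///
    text(51 51 "O", place(nw)) text(50 1 "A", place(s)) text(100 1 "B", place(n)) ///
    xlabel(25 "Garbage") ylabel(50 "p" 100 "a") title("Free Disposal of Garbage") ///
    xscale(range(0 105)) legend(off) ///
    plotregion(margin(zero)) graphregion(margin(zero))
```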

Matsize too small.

Dear All,

I want to initialize an empty column vector with as many rows as there are observations. I permanently set matsize to the maximum, but I still get an error saying that matsize is too small.
Code:
.
.         *initialize the residual variable
.         gen n=_n

.         egen maxn=max(n)

.         scalar nobs=maxn

.         set matsize 11000, permanently
(set matsize preference recorded)

.         matrix e2 = J(nobs,1,.)
matsize too small
    You have attempted to create a matrix with too many rows or columns or attempted to
    fit a model with too many variables.  You need to increase matsize; it is currently
    11000.  Use set matsize; see help matsize.

    If you are using factor variables and included an interaction that has lots of
    missing cells, either increase matsize or set emptycells drop to reduce the
    required matrix size; see help set emptycells.

    If you are using factor variables, you might have accidentally treated a continuous
    variable as a categorical, resulting in lots of categories.  Use the c. operator on
    such variables.
r(908);

end of do-file

r(908);
What am I doing wrong?

Thanks,
Sumedha.
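A likely explanation: Stata matrices are capped by matsize (at most 11,000 in the version shown in the log), so a column vector with one row per observation fails once the data have more than 11,000 observations. Two containers without that cap: a Mata matrix, or simply a variable. Also, `scalar nobs = _N` gives the number of observations directly, without the helper variables. A sketch with 20,000 simulated observations; the names e2 and nobs follow the post, and e2_rows is a hypothetical helper.

```stata
clear
set obs 20000

* Number of observations without creating helper variables
scalar nobs = _N

* Option 1: a Mata column vector of missing values, one row per observation
mata: e2 = J(st_nobs(), 1, .)
mata: st_numscalar("e2_rows", rows(e2))

* Option 2: store residuals in a variable instead of a matrix
generate double e2 = .

display "rows in Mata vector: " scalar(e2_rows)
```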