Stata Econometrics Winter School and Stata Conference, Porto, 20-24 January 2020

December 20, 2019, 7:43 am

Stata Econometrics Winter School, Porto, Portugal, 20-24 January 2020

The sixth annual Stata Econometrics Winter School runs in Porto, Portugal, from 20 to 24 January 2020.
This series of one-day short courses is jointly organised with the Faculdade de Economia da Universidade do Porto. The School aims to provide the full set of tools and techniques that any modern applied economist needs to know. Participants will learn to apply the techniques properly using Stata statistical software.
The courses that comprise the 2020 Stata Winter School are:

Day 1: Introduction to Stata
Day 2: Data Analysis, Linear Regression, and Spatial Econometrics
Day 3: Linear Panel Data Models
Day 4: High-dimensional fixed-effects & Managing Output Files
Day 5: Introduction to programming in Stata

Who Should Attend?

Academic staff, Master's and PhD students, and professionals who need to analyse data. The courses aim to offer an effective way to reach an advanced level of econometric analysis. To get the most out of the courses, basic knowledge of statistics and econometrics is therefore required.
All courses will be delivered in English.
To find out more, please visit: https://www.timberlake.co.uk/courses...hool-pt20.html

I will be offering the Day 5 coverage of Stata programming.

Kit

↧

If macro equals multiple values or range

December 20, 2019, 8:24 am

I need Stata to do something if a global macro is equal to 7 or 8 (or falls within a range). I came across Nick's answer (https://www.stata.com/statalist/arch.../msg00810.html), which points in the direction of what I am looking for:

Code:

if strpos("$global", "7") {
 ...
 }

But I do not know how to make this condition test a range or more than one number.
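
A minimal sketch of one possible approach, assuming the global holds a single number. The name myglobal is hypothetical and used only for illustration. -inlist()- checks whether a value is in a list, and -inrange()- checks whether it lies in a closed interval.

```stata
* Hypothetical global used only for illustration
global myglobal 7

* True if the value is 7 or 8
if inlist($myglobal, 7, 8) {
    display "value is 7 or 8"
}

* True if the value lies anywhere from 7 to 12 (endpoints included)
if inrange($myglobal, 7, 12) {
    display "value is between 7 and 12"
}
```

If the global could be empty, the expression would not parse, so a check that the global is defined would be needed first.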

↧

Loop: Calculate Distance Based on Positional Data

December 20, 2019, 9:18 am

Dear Statalist,

I am currently using a dataset that includes positional data of individuals in different buildings. For each individual, I have the x and y coordinates in a given building for a specific point in time. I am now interested in calculating the distance that a given individual covers in a specific building. For simplicity, I am assuming that each individual moves in a straight line.

Each individual starts from the entrance (x = 0 and y = 0). So the distance for the first observation is the length of the vector (x,y):

Code:

gen distance = sqrt((position_x-0)^2+(position_y-0)^2)

However, for the subsequent rows in the dataset, I need to take the individual's previous position into account, as given in the previous row (i.e., the x,y coordinates of the previous row rather than the entrance coordinates). Hence, I am wondering how I could calculate this variable by referring to the previous row. I posted the desired result in the column "distance" below:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input byte ind_id double(position_x position_y) str1 building int time double distance
1 612205.1 616313.1 "a" 172  868698.4066392777
1 612033.3 616526.2 "a" 773 273.72769315501506
2 623037.9 297557.4 "b"  56  690446.6895649295
2 618588.2 304253.1 "b" 123  8039.417179124367
2 624784.1 302962.8 "b" 199  6328.826976620569
2 621880.7 306320.6 "c"  37  693230.0592277069
end

I suppose this requires some form of loop. Any help with this problem would be highly appreciated.

Best,

Marvin
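
A sketch of one way to do this without an explicit loop, assuming that each individual's path restarts at the entrance (0,0) in every building and that time orders the observations within a building, as the desired values in the example suggest. The new variable is named distance2 (a hypothetical name) so it can be compared with the posted distance column.

```stata
* Sort each individual's observations within a building by time, then
* measure from the previous position, or from the entrance at (0,0)
* for the first observation in each individual-building group
bysort ind_id building (time): gen double distance2 = ///
    sqrt((position_x - cond(_n == 1, 0, position_x[_n-1]))^2 + ///
         (position_y - cond(_n == 1, 0, position_y[_n-1]))^2)
```

Under -by:-, _n and the subscript [_n-1] are evaluated within each group, so the first row of a group never picks up the last row of the previous one. The /// line continuation works in a do-file, not when typed interactively.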

↧

Generating variables to count, sum, or determine the median of select rows in data set

December 20, 2019, 10:30 am

I have a data set with publication information per faculty member and the school they work at. I'm trying to generate a variable for the total faculty per institution and another for the total publications per institution, and I have done it in the most roundabout way. I'm wondering if there is a much simpler way to do this. Below are two cases in which I assigned an ID from 1 to 201 to each institution (because there are that many different schools in my data set), generated a count or total for ID 1 using egen, copied the generated variable and dropped the original (so I could generate it again), and created a forvalues loop for the remaining IDs.

Code:

egen institution_group = group(institution_name)

egen tot_faculty_id = anyvalue(institution_group), values(1)
egen tot_faculty_temp = count(tot_faculty_id) if tot_faculty_id == 1
gen tot_faculty = tot_faculty_temp
drop tot_faculty_temp tot_faculty_id

forvalues institution_group = 2/201 {
    egen tot_faculty_id = anyvalue(institution_group), values(`institution_group')
    egen tot_faculty_temp = count(tot_faculty_id) if tot_faculty_id == `institution_group'
    replace tot_faculty = tot_faculty_temp if tot_faculty == .
    drop tot_faculty_temp tot_faculty_id    
    }

Code:

egen inst_tot_pub_top50_id = anyvalue(institution_group), values(1)
egen inst_tot_pub_top50_temp = total(tot_pub_top50) if inst_tot_pub_top50_id == 1
gen inst_tot_pub_top50 = inst_tot_pub_top50_temp
drop inst_tot_pub_top50_temp inst_tot_pub_top50_id

forvalues institution_group = 2/201 {
    egen inst_tot_pub_top50_id = anyvalue(institution_group), values(`institution_group')
    egen inst_tot_pub_top50_temp = total(tot_pub_top50) if inst_tot_pub_top50_id == `institution_group'
    replace inst_tot_pub_top50 = inst_tot_pub_top50_temp if inst_tot_pub_top50 == .
    drop inst_tot_pub_top50_temp inst_tot_pub_top50_id    
    }

I hope you are able to follow what I've done. Is there an easier way? I did the same thing with medians, and will have to do the same thing (averages, medians, and totals) for 8 more variables and I would love to find a faster option.
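
A sketch of a shorter route, assuming there is one row per faculty member and that institution_name identifies the school. -egen- functions accept a by() option, so no loop over institution IDs should be needed. The new variable names and the contents of the varlist in the loop are placeholders.

```stata
* Number of faculty rows per institution (assumes one row per faculty member)
bysort institution_name: gen tot_faculty = _N

* Institution-level total, mean and median of one publication variable
egen inst_tot_pub_top50 = total(tot_pub_top50), by(institution_name)
egen inst_mean_pub_top50 = mean(tot_pub_top50), by(institution_name)
egen inst_med_pub_top50 = median(tot_pub_top50), by(institution_name)

* Same statistics for several variables; extend the placeholder varlist
foreach v of varlist tot_pub_top50 {
    egen tot_`v' = total(`v'), by(institution_name)
    egen mean_`v' = mean(`v'), by(institution_name)
    egen med_`v' = median(`v'), by(institution_name)
}
```

Note that _N counts every row in the institution, whereas -egen-'s count() counts only nonmissing values, so the two can differ when data are missing.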

↧

Coefficients for levels of attributes in rologit

December 20, 2019, 11:10 am

Dear colleagues,

I ran a rologit command for my study, which assesses consumer preferences for attributes of fresh fish. My explanatory variables are selling type, size, source and price. Three of the attributes (explanatory variables) have 3 levels each, coded -1, 1 and 0. I am able to get results from the rologit model, but I would like to see coefficients for the other levels of each attribute rather than just one coefficient per attribute, as is currently the case. I have not been able to find the specific command that will give me coefficients for the other attribute levels. Attached are pictures showing how the variables appear and the results from the rologit command.

Best regards,

Edith

↧

stata2mplus

December 20, 2019, 11:14 am

I have exported a dta file to a dat/Mplus file and received no error messages. Furthermore, I've inspected both files and found no discrepancies. Yet, when running basic analyses or EFA in Mplus, I am told that one of my categorical variables contains a non-integer value. The only non-integer value would be the missing value code, ".", which is being registered in Mplus as the default -9999. And I have specified how missing values should be read in Mplus.
Has anyone encountered this peculiar error, or does anyone have any insight into what might be causing it?

↧

How to add the title for the second y-axis in graph when -by()- is used

December 20, 2019, 1:43 pm

Dear Statalist,

I have a problem adding the title for the second y-axis. If I do not use -by()-, Stata displays the title for the extra y-axis. The problem is that when I use -by()- together with ytitle("name", axis(2)), the graph does not display the title of the extra y-axis.

Data I used:

Code:

 sysuse auto, clear
 sort foreign weight

If I type

Code:

twoway line price weight || line length weight, yaxis(2)

the title of the extra y-axis (length) is displayed.

However, if I use -by()- and type

Code:

twoway line price weight, by(foreign) || line length weight, yaxis(2)

Stata by default suppresses the title of the extra y-axis (length).

I cannot display the title of the extra y-axis (length) even if I follow the Stata documentation and type

Code:

twoway line price weight, by(foreign) || line length weight, yaxis(2) ytitle("Length", axis(2))

If I use the r2title() option by typing

Code:

twoway line price weight, by(foreign) || line length weight, yaxis(2) r2title("Length")

the title shows up in both plots (foreign and domestic). This is not what I want either.

So I wonder whether there is a way to make the left plot show the title of the first y-axis and the right plot show the title of the extra y-axis.

Kind regards,
Yugen

↧

How to plot similar graphs in Stata?

December 20, 2019, 1:45 pm

Hello, could anyone tell me how to plot a similar graph in Stata?

↧

Post-estimation test for mlogit (comparing overall effect of treatments)

December 20, 2019, 1:45 pm

Hi all, and thanks in advance for any responses.

I'm running an analysis on an experiment in which there are three conditions (control, treatment 1 and treatment 2) and within each treatment, participants can choose 1 of 3 options (choice 1, choice 2, or choice 3).

I'm analyzing the data using mlogit (multinomial regression) and am wondering if there's a way to test whether the overall effect of treatment1 (relative to control) on choice differs from the overall effect of treatment2 (relative to control) on choice.

The basic command is:
mlogit choice_ordered treatment1 treatment2, base(2)

I follow this with

"test treatment1", which gives me the overall effect of treatment 1.

And "test treatment2", which gives me the overall effect of treatment 2.

I thought I could compare these two effects by running
"test treatment1 = treatment2"

But when I run this, Stata changes the command in the output to:
"test [1]treatment1 = [1]treatment2", which tests the effect of treatment 1 vs treatment 2 on Choice 1 only. This is not the same as testing whether the overall effects differ.

Any ideas on how to do this?

Any and all advice would be deeply appreciated!
Best,
David
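
One possible sketch, assuming the outcome is coded 1, 2, 3 without value labels, so that with base(2) the two estimated equations are named 1 and 3 (an assumption; check the equation names in your own output). -test- can stack restrictions across equations with the accumulate option, giving a joint test that the two treatment coefficients are equal in every equation.

```stata
mlogit choice_ordered treatment1 treatment2, base(2)

* Equality of the two treatment coefficients in the equation for choice 1
test [1]treatment1 = [1]treatment2

* Add the same restriction for choice 3 and test both jointly
test [3]treatment1 = [3]treatment2, accumulate
```

The second -test- reports the joint Wald test of both restrictions, which is one way to read "the overall effects differ".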

↧

Json with authentication file

December 20, 2019, 2:11 pm

Hi, I'm trying to read a JSON file from a URL that requires authentication to access the API. Can somebody provide an example of how to do this? I did not see such an option in insheetjson.

Thank you,

Danielken

↧

Directional symmetry

December 20, 2019, 3:57 pm

Dear Statalist

I have seen many researchers decompose variables into positive and negative values. For instance, suppose we are examining the impact of sales growth on trade credit and, before the decomposition, the coefficient on growth is negative. To get additional information about the effect of the variable, they decompose growth into positive growth and negative growth: the former takes the positive values of sales growth and 0 otherwise, and the latter takes the negative values and 0 otherwise. This can be done with the following code:

generate growth_positive = growth*(growth>0)
generate growth_negative = -growth*(growth<0)

So my question is why they decompose it. Is it because the growth coefficient before the decomposition was negative? If so, should we also decompose growth when we get a positive coefficient?

Devoting some of your valuable time to answering my question is highly appreciated.

Looking forward to hearing from you.

Many thanks in advance.

↧

How to create a variable concatenating two variables

December 20, 2019, 4:24 pm

Suppose "address" is a string variable and "year" is a numeric (float) variable.
I would like to create a variable region_time that joins "address" and "year".
For example,
address: abc
year: 1980
region_time: abc1980
How do I do this in Stata 16?
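
A minimal sketch, assuming the variables are named address and year as described: string() turns the number into text, and + joins two strings.

```stata
* string() converts the numeric year to text so it can be joined to address
gen region_time = address + string(year)
```

If year is missing in some rows, string() returns "." there, so those rows may need separate handling.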

↧

Creating date variable

December 20, 2019, 4:44 pm

Hello!

How can I create a new date variable in monthly format for the data example below?

Thanks,
Anton

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float product_id str9 date
 1 "26-Apr-17"
 1 "1-Sep-19" 
 1 "8-Aug-17" 
 1 "12-Jan-18"
 1 "27-Mar-17"
 1 "12-Nov-18"
 1 "9-Sep-17" 
 1 "15-Nov-18"
 1 "24-Aug-17"
 1 "2-Jan-17" 
 1 "13-May-17"
 1 "30-Jan-19"
 1 "25-Sep-18"
 1 "13-Sep-16"
 1 "2-Jul-19" 
 1 "18-Sep-18"
 1 "23-Nov-17"
 1 "24-Oct-19"
 1 "24-Aug-19"
 1 "28-Jan-17"
 1 "22-Feb-18"
 1 "29-Jan-17"
 1 "14-Apr-19"
12 "28-Aug-14"
14 "28-Aug-17"
16 "30-Jan-16"
16 "23-Dec-16"
16 "18-Sep-15"
16 "22-Aug-14"
16 "15-Sep-14"
16 "23-Sep-13"
16 "12-Jan-15"
16 "13-Jan-16"
16 "10-Nov-16"
16 "9-Jan-18" 
16 "7-Dec-14" 
16 "6-Jul-12" 
16 "15-Apr-18"
16 "27-Dec-12"
16 "22-Feb-16"
16 "11-Mar-15"
16 "4-Nov-14" 
16 "12-Mar-15"
16 "24-Nov-14"
16 "24-Apr-15"
17 "29-Nov-15"
17 "5-Mar-16" 
17 "11-Dec-17"
17 "26-Aug-19"
17 "22-Jun-17"
end
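
A sketch of one way to do this, assuming all two-digit years belong to the 2000s, as they appear to in the example. The new variable names are hypothetical. The string is first converted to a daily date and then to a monthly date.

```stata
* Convert the string to a daily date; "20Y" reads two-digit years as 20xx
gen daily_date = daily(date, "DM20Y")
format daily_date %td

* Convert the daily date to a monthly date
gen monthly_date = mofd(daily_date)
format monthly_date %tm
```

With this mask, "26-Apr-17" becomes 26apr2017 as a daily date and 2017m4 as a monthly date.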

↧

Problem with asdoc table 3 way

December 20, 2019, 5:56 pm

Hi all,

I keep having problems with the results from asdoc for a three-way table. Example:

Code:

sysuse auto, clear

set seed 123
gen rbi=rbinomial(1,.5)
asdoc table rep78 foreign rbi, c(count mpg) sc col row


------------------------------------------------------------------------
Repair    |                       rbi and Car type                      
Record    | ------------- 0 ------------    ------------- 1 ------------
1978      | Domestic   Foreign     Total    Domestic   Foreign     Total
----------+-------------------------------------------------------------
        1 |        1         .         1           1         .         1
        2 |        6         .         6           2         .         2
        3 |       15         1        16          12         2        14
        4 |        6         5        11           3         4         7
        5 |        .         8         8           2         1         3
          |
    Total |       28        14        42          20         7        27
------------------------------------------------------------------------

Unfortunately, the result that asdoc writes to the Word file is different.
You can check that the row totals do not agree with the Stata results.

↧

interpreting interaction effects continuous by categorical variables

December 20, 2019, 8:45 pm

Hi all,

I am having difficulty interpreting interaction effects. I watched some videos and read forums, but I am still confused.
Here is my model: Gender inequality = B0 + B1*mosque attendance + B2*gender + B12*gender*mosque attendance
Mosque attendance runs from 0 to 6, the frequency from never attending to attending daily; gender is coded female=0 and male=1. The gender inequality scale is also continuous, so OLS is applied.
Can someone explain and interpret the coefficients of mosque attendance, gender, and the interaction? What do they mean? Is there a sole effect of mosque attendance in this model? How do we interpret male and female effects separately? I would appreciate it if you could be as specific as possible.
Thank you for your time.
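
In the model as written, B1 is the slope of mosque attendance for women (gender = 0) and B1 + B12 is the slope for men. B2 is the male-female difference when attendance is 0, and B12 is the difference between the two slopes. A sketch of how this could be checked in Stata follows; the variable names inequality, attendance and male are hypothetical.

```stata
* inequality, attendance and male are hypothetical variable names
regress inequality c.attendance##i.male

* Slope of attendance separately for women (male = 0) and men (male = 1)
margins male, dydx(attendance)

* Male-female gap in predicted inequality at selected attendance levels
margins, dydx(male) at(attendance = (0 3 6))
```

The first -margins- call should reproduce B1 and B1 + B12. The second shows how the gender gap changes with attendance, which is what the interaction term captures.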

↧

Nationwide inpatient sample

December 20, 2019, 10:45 pm

Hi everyone,

I recently started learning Stata for my NIS data analysis. I have no past experience with any statistical software. How should I proceed?

↧

Adjusting the Scale of Cluster Graph

December 21, 2019, 1:22 am

Hey everybody,

I am currently conducting a cluster analysis in Stata and I have a small problem.
As you can see in the attached picture, the bottom of the graph is the area "where the music plays". Therefore, I'd love to make this area more visible.

Is there a way to change the scale so that the area from 0-10 is very big and the area from 10-65 is very small?
I tried almost everything in the "Graph editor" but it seems that I cannot find the right option (if this is even possible).

The command I used is dendrogram.

I hope you understand my issue and thanks to anybody in advance who is willing to help!

Best regards,
Max

↧

Dividing data into periods and then groups

December 21, 2019, 5:19 am

Hi!

I want to split my panel data into separate cross-sections, one per time period, and then classify the companies within each period into size groups (1 for small, 2 for medium, 3 for large) based on their mtob ratios, with cut-offs that will differ from period to period.

Thank you in advance
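
One possible sketch, under two assumptions not stated in the post: the period variable is numeric and called year (a hypothetical name), and the three groups are terciles of mtob within each period. -xtile- cannot be combined with -by:-, so the sketch loops over periods.

```stata
* year is a hypothetical name for the period variable
gen byte size_group = .
levelsof year, local(periods)
foreach p of local periods {
    * Terciles of mtob computed within this period only
    xtile tmp_group = mtob if year == `p', nquantiles(3)
    replace size_group = tmp_group if year == `p'
    drop tmp_group
}
```

The data stay in one panel file; size_group can then be used with -if- or -by- to work on each period's cross-section.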

↧

Calculating egfr using creatinine

December 21, 2019, 5:48 am

Dear users

Could someone please share their experience of using the Stata module written by Phil Clayton to calculate eGFR from creatinine? https://ideas.repec.org/c/boc/bocode/s457731.html
I have installed this module but cannot work out the syntax I should use to generate eGFR from creatinine, gender and age. Can someone please help?

Thanks

↧

Multiple linear regression - Cross-sectional data - Percentage as response variable

December 21, 2019, 9:04 am

Dear community,

I'm trying to run a multiple linear regression with Stata 16 using cross-sectional data. My response variable (d17aum - 1252 obs) captures the increase in the value of export activity in 2009 compared with the previous year for small and medium enterprises. It is therefore expressed as a percentage, from 1 to 100. My main explanatory variables are family ownership (continuous, also expressed as a percentage - 6554 obs) and external management (categorical, taking the value 1 when the CEO is not a member of the family controlling the firm - 6827 obs). Other explanatory variables will be included as controls. This is what the data look like:

input int d17aum float(fam_own2 external_man)
. 82 0
. 100 0
. 100 0
. 100 0
. 71 0
. 50 0
. 100 0
10 50 0
. 92 0
25 77 0
10 27 0
. 100 0
. 100 0
. 100 0
. 95 0
. 100 0
10 51 0
. 100 0
. 100 0
. 100 1
. 52 0
. 100 1
10 52 0
. 90 0
. 99 0
. 62 0
. 60 0
. 52 0
. 77 1
. 67 0
. 53 0
15 100 0
. 100 0
. 27 1
1 52 0
. 52 0
. 99 0
. 100 0
. 47 0
. 100 0
6 100 1
5 100 0
. 100 0
. 100 0
. 78 0
. 100 0
. 52 1
. 51 0
. 100 0
. 100 0
. 100 0
. 63 0
. 50 0
. 92 0
. 80 0
. 100 0
. 100 0
. 26 0
. 100 0
. 100 0
. 27 0
. 27 0
40 77 0
. 100 0
. 77 0
. 52 0
. 77 0
. 37 0
. 100 0
. 100 0
. 27 0
. 100 0
. 100 0
. 100 0
. 77 0
6 52 0
. 100 0
. 100 0
. 100 0
. 62 0
. 76 0
. 100 0
. 77 0
10 100 0
. 60 0
. 52 0
. 77 0
. 100 0
. 100 0
. 100 0
. 100 0
. 100 0
. 35 0
. 62 0
. 62 0
. 52 0
. 53 0
. 69 0
. 76 0
. 100 0
end

As you can see in boxplot.png and hist.png, my DV is characterized by outliers and a right-skewed distribution. Hence I used ln(DV) to normalize the distribution and obtain a bell curve (see log_distribution.png) and a more reasonable boxplot (see log_boxplot.png). The data now look like this:
input float(wd17 fam_own2 external_man)
. 82 0
. 100 0
. 100 0
. 100 0
. 71 0
. 50 0
. 100 0
2.3025851 50 0
. 92 0
3.218876 77 0
2.3025851 27 0
. 100 0
. 100 0
. 100 0
. 95 0
. 100 0
2.3025851 51 0
. 100 0
. 100 0
. 100 1
. 52 0
. 100 1
2.3025851 52 0
. 90 0
. 99 0
. 62 0
. 60 0
. 52 0
. 77 1
. 67 0
. 53 0
2.70805 100 0
. 100 0
. 27 1
0 52 0
. 52 0
. 99 0
. 100 0
. 47 0
. 100 0
1.7917595 100 1
1.609438 100 0
. 100 0
. 100 0
. 78 0
. 100 0
. 52 1
. 51 0
. 100 0
. 100 0
. 100 0
. 63 0
. 50 0
. 92 0
. 80 0
. 100 0
. 100 0
. 26 0
. 100 0
. 100 0
. 27 0
. 27 0
3.6888795 77 0
. 100 0
. 77 0
. 52 0
. 77 0
. 37 0
. 100 0
. 100 0
. 27 0
. 100 0
. 100 0
. 100 0
. 77 0
1.7917595 52 0
. 100 0
. 100 0
. 100 0
. 62 0
. 76 0
. 100 0
. 77 0
2.3025851 100 0
. 60 0
. 52 0
. 77 0
. 100 0
. 100 0
. 100 0
. 100 0
. 100 0
. 35 0
. 62 0
. 62 0
. 52 0
. 53 0
. 69 0
. 76 0
. 100 0
end

However, when I check for normality using swilk, the null hypothesis of normality is rejected, so the DV is still not normally distributed. I've also been told to use winsor to solve the issue, but no matter how I use it [p(0.1 to 0.5); highonly, lowonly or neither] the Shapiro-Wilk test still does not indicate normality. Any regression I try to run (524 obs), including other control variables, turns out not to be significant (F > 0.5), with an extremely low R-squared. The linearity and homoskedasticity assumptions do not hold either.

At this point, I started wondering whether linear regression is the right model for my data. I've been reading a lot on the forum and on the web about analysis with percentages as dependent variables. Since my knowledge of statistics is not the best, I got even more confused. Some say I could treat the percentage as a continuous variable, with linear regression being the best model to use. Some say I could treat it as a count variable (even though the variable doesn't count anything), since it has a right-skewed distribution, so Poisson would be more suitable. Others suggest breaking my DV into categories according to its percentiles and using a logistic regression, while a beta regression would not be possible since I would have some values equal to 0 and 1.

Since the model has to be useful for hypothesis testing, which model might fit my data best?

↧