Problems importing a csv

July 17, 2019, 12:09 pm

Hello, I'm having problems importing a simple csv and cannot figure out why, so I'm hoping someone here can help.

I have a simple csv with 18 columns and about 855,000 observations. When I try to import it into Stata, the columns remain intact, but the data are split across extra rows, with several comma-separated values merged into a single cell. The result is 1.922 million observations. I have never run into this problem when importing a csv before.

Here is an example of the data in its correct format:

Column headers (as pasted): County | Non Standard Location | Modification | State 2 | Construction Type | Construction Description | WD Publication Date | Wage Group | Rate Effective Date | Craft Title | Craft Title 2 | Craft Title 3 | Craft Title 4 | Hourly | Fringe | Rate Type

Values from one row (as pasted): AK20190001 | Aleutians East | BUILDING AND HEAVY CONSTRUCTION PROJECTS (does not include residential construction consisting of single family homes and apartments up to and including 4 stories) | Alaska | Building | 4/19/2019 | ASBE0097-001 | 1/1/2018 | Asbestos Workers/Insulator | includes application of all insulating materials protective coverings, coatings and finishings to all types of mechanical systems | 38.68 | 21.57 | CBA

Here is a sample of what is happening after import (this is all in one cell in Stata):

,8,Alaska,Building,,4/19/2019,ELEC1547-005,4/1/2019,Linemen,"Including Equipment Operators

Here is my code:

import delimited using "file name", varnames(1)

Thank you!
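One possible cause, judging from the unclosed double quote in the sample above, is that some quoted fields contain embedded line breaks, which import delimited treats by default as the end of a record. A minimal sketch, assuming that this is the cause and that the file is comma-delimited; "file name" is the placeholder from the post:

```stata
* Treat a line break inside double quotes as part of the field, not as a new record
import delimited using "file name", varnames(1) bindquotes(strict) clear

* Compare the result with the expected shape (18 columns, about 855,000 rows)
describe, short
```

If the observation count is still too high after this, the quoting in the file itself may be unbalanced, and it would be worth inspecting a few of the affected rows in a text editor.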

↧

Problem with difference-in-differences: how to simulate data with the same distribution of real data

July 17, 2019, 12:12 pm

Dear All,

I ran a difference-in-differences model on the data I want to analyze, to evaluate the impact of a policy. However, the number of observations is limited. Because of this, I would like to simulate more data distributed in the same way as the real data, to see whether and how the standard errors shrink.

I am familiar with the Stata commands program and simulate, but I am not sure how to emulate the exact distribution of the real data. How can I achieve this?

Thank you for your help.

Best,

CS
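One way to mimic the empirical distribution without specifying it is to resample the observed rows with replacement. The sketch below is only an illustration under stated assumptions: the variable names y, treated and post are hypothetical, the model is a plain two-group, two-period regression, and observations are treated as independent (clustered data would need the cluster() option of bsample). The program enlarges the sample k-fold by resampling and returns the interaction coefficient and its standard error; simulate collects them over many replications.

```stata
capture program drop did_resample
program define did_resample, rclass
    args k                      // k = size of the resampled data relative to the real sample
    preserve
    expand `k'                  // k copies of every row, so _N becomes k times the original N
    bsample                     // draw _N rows with replacement from the expanded data
    regress y i.treated##i.post
    return scalar b  = _b[1.treated#1.post]
    return scalar se = _se[1.treated#1.post]
    restore
end

* Save the real data first: simulate replaces the data in memory with its results
simulate b=r(b) se=r(se), reps(200) seed(2019): did_resample 2
summarize b se
```

Resampling adds no new information about the policy; it only shows how the standard error would behave if a sample k times larger came from the same distribution as the observed one.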

↧

Possible to export variable labels as matrix of strings?

July 17, 2019, 12:15 pm

I wish to export a matrix (or list) of variable labels from Stata to TeX. I thought I'd do this by filling a matrix with the labels and then exporting it via frmttable. However, I just realized that Stata matrices cannot hold strings. Is that true under any circumstances?

Below is a reproducible example of what I'd *like* to do, except that the last Mata st_matrix command doesn't run. In fact, normally this loop wouldn't involve Mata at all; Sstats would be created as a Stata matrix directly.

Is there some way to export a series of 20-25 variable labels automatically, in a one-column table, that doesn't involve matrices? Or can I somehow force a string-based matrix from Mata? Thanks!

Code:

sysuse auto.dta, clear
gl mylist mpg rep78 headroom trunk weight length turn displacement gear_ratio

clear matrix
local rows = 0
foreach var of varlist $mylist  {
        local rows = `rows'+1
}
local cols 1
mata Sstats = J(`rows',`cols',"")

tokenize "$mylist"
forval i = 1/`rows' {
    local x : variable label ``i''
    mata Sstats[`i',1]= "`x'"
    di "`x'"
}
mata Sstats
mata st_matrix("Sstats",Sstats)
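Stata matrices are numeric only, so st_matrix() cannot receive a Mata string matrix. One route that avoids matrices altogether is to write the labels straight to a text file with file write. The following is a minimal sketch, not a replacement for frmttable; the output file name varlabels.tex and the bare one-column tabular layout are assumptions made for illustration:

```stata
sysuse auto.dta, clear
local mylist mpg rep78 headroom trunk weight length turn displacement gear_ratio

tempname fh
file open `fh' using "varlabels.tex", write replace text
file write `fh' "\begin{tabular}{l}" _n
foreach var of varlist `mylist' {
    local lbl : variable label `var'
    * one table row per variable label, terminated by the LaTeX row separator
    file write `fh' `"`lbl' \\"' _n
}
file write `fh' "\end{tabular}" _n
file close `fh'
```

Labels that contain characters special to LaTeX (such as % or &) would need escaping before being written.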

↧

Problem sketching hourly time labels on x (or t) axis of a two-way graph

July 17, 2019, 12:21 pm

Hi experts;

I am trying to draw a twoway rcap figure showing the start and end times of a few people's performances (horizontally).

Here are my xvar1 and xvar2 (the start and end points of each rcap). My y-axis values are the people performing during these time intervals.

Start_Time End_Time
08:30:00 16:00:00
12:00:00 13:30:00
12:00:00 13:00:00
13:30:00 15:15:00
13:30:00 15:29:59
14:00:00 15:00:00
14:00:00 15:00:00
14:30:00 15:29:59
14:45:00 15:00:00
15:00:00 16:59:59
15:00:00 16:00:00
15:30:00 16:14:59
16:00:00 16:45:00
16:30:00 17:30:00
17:00:00 18:00:00
17:00:00 18:00:00
17:00:00 18:00:00
17:30:00 18:29:59
17:30:00 19:30:00
18:00:00 20:15:00
18:30:00 19:30:00
19:00:00 21:00:00
19:30:00 21:00:00
20:00:00 21:45:00
20:30:00 21:40:00
21:00:00 22:30:00
21:00:00 21:29:59
22:00:00 22:59:59
23:00:00 01:59:59
23:30:00 01:59:59

My problem is that I cannot label the x (or t) axis (the horizontal axis) properly. I know that with numbers (not hours) it is easy to create labels, as below:
xscale(range(0 24))
xlabel(0(2)24) // showing 0 2 4 6 ...... 22 24 on your x-axis

But with times, I don't know how to do the same thing.

Here is what I want:
08:00:00 09:00:00 10:00:00 ....... 21:00:00 22:00:00 23:00:00 // or even better below
08:00 am 09:00 am 10:00 am ........ 08:00 pm 09:00 pm 10:00 pm

The format of Start_Time and End_Time above is %tcHH:MM:SS.

Could you please tell me how to label the x-axis of my figure as described?

Thanks
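Because %tc values are stored as milliseconds, axis labels can be requested as an ordinary numlist in milliseconds and then displayed with a time format. A minimal sketch, assuming that Start_Time and End_Time hold time-of-day values only (that is, times on 01jan1960) and that person_id is a hypothetical numeric variable identifying each performer:

```stata
* %tc values are milliseconds since 01jan1960 00:00:00, so one hour = 3,600,000 ms
local first = msofhours(8)
local last  = msofhours(23)
local step  = msofhours(1)

* With the horizontal option, the two time variables are drawn along the x axis
twoway rcap Start_Time End_Time person_id, horizontal ///
    xlabel(`first'(`step')`last', format(%tcHH:MM) angle(45)) ///
    xtitle("Time of day")
```

For 12-hour labels, format(%tchh:MM_am) could be used in place of format(%tcHH:MM). Intervals that run past midnight (for example 23:00:00 to 01:59:59) have an end value smaller than the start value under this assumption, so they would need one day, msofhours(24), added to End_Time before plotting.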

↧

weighted mean of proportions with confidence intervals

July 17, 2019, 12:42 pm

I am looking to compute the mean and confidence intervals of an outcome measure across groups, using survey data.
Each observation is associated with a numerator, a denominator, and the proportion of the two, as well as a survey probability weight and a stratum. Each observation is also associated with one of the 50 states. Could someone please advise on the command and syntax to compute the weighted proportions and confidence intervals of the outcome measure across states?
Many thanks.
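A possible starting point is to declare the design with svyset and then use the svy prefix with over(). This is a sketch with hypothetical variable names (pw, stratum, state, num, den, prop), and it assumes there is no separate sampling-unit (cluster) identifier:

```stata
* Hypothetical names: pw = probability weight, stratum = stratum identifier,
* state = state identifier, num/den = numerator/denominator, prop = num/den
svyset [pweight = pw], strata(stratum)

* Ratio of weighted totals (sum of num over sum of den) by state, with 95% CIs
svy: ratio (num/den), over(state)

* Alternative: weighted mean of the per-observation proportions by state
svy: mean prop, over(state)
```

The two estimates answer slightly different questions: the ratio weights observations by their denominators, while the mean of proportions treats each observation's proportion equally apart from the survey weight.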

↧

Best command for multilevel multinomial logistic regression models

July 17, 2019, 12:44 pm

Dear Statalist users,
I have Stata/SE 14.2.
I have a pooled cross-sectional time-series dataset. Monthly surveys are conducted on different nationally representative samples of individuals who are nested in provinces, so it is not panel data.
I only have 7 waves of surveys.
My dependent variable is vote choice and is a nominal variable listing four political parties.
I would like to use mixed models, as individuals from the same province may have intra-class correlations, though I am not sure how to handle the survey waves, as there are only 7 of them.
I was planning to add the wave as a variable for fixed effects.

Two questions:
1) If I am to use a multinomial choice model, what would be the best command for me to use? In Stata 14, there is no multilevel mixed effects model for nominal variables (such as memlogit), and I have seen clogit, pomlogit, or fomlogit as options. The manual suggests gsem though I am having trouble applying the example given in the manual to my case.
What would be the best command for my case?
2) How would you recommend I handle the errors from intra-wave correlations?

Thanks.
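For reference, the gsem approach mentioned in the first question can be sketched as a multinomial logit with one province-level random intercept per non-base outcome. All names here are hypothetical (vote coded 1 to 4 with party 1 as the base outcome, covariates x1 and x2, wave, province), and this is an illustration of the syntax rather than a recommended specification:

```stata
* Each non-base outcome gets its own province-level random intercept;
* latent variable names (M2, M3, M4) must begin with a capital letter
gsem (2.vote <- x1 x2 i.wave M2[province]@1) ///
     (3.vote <- x1 x2 i.wave M3[province]@1) ///
     (4.vote <- x1 x2 i.wave M4[province]@1), mlogit
```

Here i.wave enters as fixed effects, as planned in the post. Models of this kind can be slow to converge with three correlated random intercepts.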

↧

Error with -traj- code

July 17, 2019, 12:49 pm

Hello StataListers,

I am running the following code with the -traj- program in Stata looking at monthly cost trajectories. Here, t_1-t_12 represent 12 months:

traj, var(total_cost*) indep(t_*) model(cnorm) min(0) max(748938.3) order (2 2 2)

I get the following error:

"total_cost2 is not within min1 = 0 and max1 = 748938.3"

The problem is, the maximum value for total_cost2 is actually $748,938.3. So the error doesn't make sense to me. If I increase the max to 748938.4, I get another error of "unrecognized command."

Question 1: How can I avoid getting the top error?

I tried setting "min" and "max" to values beyond the range of the actual values in the dataset, and that didn't work.
I tried specifying min(0) and max1 ... max12, with the values for max1 ... max12 being the actual maximum values in the dataset, and got an error for max7, suggesting I cannot specify past max6.

Question 2: The above specifies that there should be 3 groups, but this is somewhat arbitrary on my part. Is there a way to find out the "best" number of groups? Or is this a situation where I should run models with different numbers of groups and then compare model fit statistics among these models? There doesn't seem to be a lot of documentation on this; I've read the Jones and Nagin 2013 paper and they don't address this aspect of trajectory modeling.

Thanks in advance for any advice.

↧

Help with merging and updating data

July 17, 2019, 2:29 pm

Hello

I am trying to expand and update the data from a previous research project for political scientists.
The data can be found here:

https://dataverse.harvard.edu/datase...910/DVN/0UNUAM

In the original project the data run up to 2014, and the authors merge their data with the MARPOR data (2016a update). My aim is to expand the time span up to 2018 and to use the MARPOR data (2018b update).

I have tried on my own, but the last lines of the merge procedure are failing. For example, in the USA case the observations for the Obama administration continue to be inserted even after the end of its term, in the period of the Trump administration. Also, a series of variables are inserted incorrectly or omitted when the files are merged (e.g., start/end period, membership of the OECD, the EU, etc.). This happens for several countries and several variables. The original do-file is given below; I only updated the end date to 2018. Any help is wholeheartedly appreciated.

/************************************************** ********************************
Do file to take the WKB governments data and create a government partisanship dataset (based on MARPOR data).

Please edit:
1) cd "path" ==> The path for the working directory (the folder that contains the WKB data set)
2) "infile" ==> This is the name of the Seki-Williams Government data; only change this if you changed the name!
3) "t_outfile" ==> The name you wish to call your new Stata data file that contains the government data at a new unit of analysis (the default is annual).
4) "p_outfile" ==> The name you wish to call your new Stata data file that contains government partisanship data; the default time period is annual.
5) Choose the temporal dimension by commenting out all but one dimension with the "*". Depending on your machine, it might take a few moments for the data set to be created.

Please note:
1) If you are working with an older version of Stata, make sure that you have set a high enough memory, especially for daily data sets (the daily data set is quite large).
2) Run the entire do file at one time! Since the do file relies on temporary data and variables, these are deleted as soon as Stata is done executing the command.
3) If you want to include supporting parties, then comment out lines 392-394.

This produces the following files:
1) Governments dataset with a different unit of analysis: e.g., government/year (if you select year): you name this file in -t_outfile-
2) Government partisanship dataset (named via -p_outfile-); NOTE that the unit of analysis will be government-year (or other time period), so there will be multiple observations per time period!

************************************************** ********************************/

************************************************** ********************************/
************************************************** ********************************/
*** 1: Working directory
cd ""

*** 2: Name the new data set with the transformed unit of analysis
local t_outfile "Seki-Williams Transformed Governments--Version 2.0"

*** 3: Name the new government partisanship data set
local p_outfile "Seki-Williams Government Partisanship"

*** : Confirm the name of the original data set
local infile "Seki-Williams Governments--Version 2.0"

*** 3: Select one of the following; comment out the rest
*global t "daily"
*global t "monthly"
*global t "quarterly"
global t "annual"
************************************************** ********************************/
************************************************** ********************************/

clear
version 11
set mem 750m
set more off

use "`infile'.dta", clear

*** Generate the start date in Stata -ts- format
gen start_ts = mdy(startmonth,startday,startyear)

*** Drop an errant observation for Macedonia
drop if ccode == 343 & start_ts == .

*** Drop the later elections for Sri Lanka
drop if mapp == 0 & ccode == 780

*** Generate the end date in Stata -ts- format
sort ccode govtseq
bys ccode: gen end_ts = (start_ts[_n+1])
replace end_ts = date("31dec2018","DMY") if end_ts == . & inlist(ccode, 2, 20, 200, 205, 210, 211, 212, 220, 225, 230, 235, 255, 290, 305, 310, 316, 317, 325, 338, 344, 349, 350, 352, 355, 360, 366, 367, 368, 375, 380, 385, 390, 395, 666, 740, 900, 920)
format start_ts %td
format end_ts %td

sort ccode govtseq
tempfile wkb
save `wkb', replace

*** Create the variable measuring the last government's end date
preserve
sort ccode govtseq

tempvar ts_start
gen `ts_start' = mdy(startmonth,startday,startyear)
bys ccode: gen ts_final = `ts_start'[_n+1]
replace ts_final = end_ts if ts_final == .
format ts_final %td
drop if end_ts == .

keep ccode ts_final
lab var ts_final "Final government end date in TS format"
sort ccode
tempfile l
save `l', replace
restore

sort ccode
merge ccode using `l', keep(ts_final)
drop _merge
sort ccode govtseq

gen startquarter = startmonth
recode startquarter (1/3=1) (4/6=2) (7/9=3) (10/12=4)

tempvar s_ts e_ts

if "$t" == "daily" {
gen `s_ts' = mdy(startmonth,startday,startyear)
local t
}

if "$t" == "monthly" {
gen `s_ts' = ym(startyear,startmonth)
local t mofd
}

if "$t" == "quarterly" {
gen `s_ts' = yq(startyear,startquarter)
local t qofd
}

else if "$t" == "annual" {
gen `s_ts' = startyear
local t year
}

bys ccode: gen `e_ts' = (`s_ts'[_n+1])
replace `e_ts' = `t'(end_ts) if `e_ts' == .
cap drop duration
gen duration = `e_ts' - `s_ts'

drop if `e_ts' == .
drop if duration == .
keep ccode `s_ts' `e_ts' duration govtseq

local obs = _N
quietly foreach i of numlist 1(1)`obs' {
preserve
keep if _n == `i'
quietly sum duration if _n==1
local dur = r(mean)+1
quietly sum `s_ts' if _n==1
local start = r(mean)
local end = `start'+`dur'+1
expand `dur'
egen ts = seq(), f(`start') t(`end')
tempfile t_`i'
save `t_`i'', replace
restore
}

use `t_1', clear
quietly foreach i of numlist 2(1)`obs' {
append using `t_`i''
}
lab var ts "Date in $t TS format"
sort ccode ts

sort ccode govtseq
merge ccode govtseq using `wkb'
drop if _merge==2
drop _merge

if "$t" == "daily" {
format ts %td
}

if "$t" == "monthly" {
format ts %tm
}

if "$t" == "quarterly" {
format ts %tq
}

else if "$t" == "annual" {
format ts %ty
}

lab var start_ts "Government start date in daily time series format"
lab var end_ts "Government end date in daily time series format"
lab var duration "Government duration in $t format"

cap drop __*
order ccode govtseq ts start_ts end_ts duration
sort ccode ts govtseq

save "`t_outfile'.dta", replace

************************************************** ********************
*** Generate the government parties dataset
************************************************** ********************
use `wkb', clear

* First, create a temporary variable that counts the number of government parties
tempvar no_parties
gen `no_parties' = 10 if py10seat !=.
replace `no_parties' = 9 if py10seat == . & py9seat !=.
replace `no_parties' = 8 if py9seat == . & py8seat !=.
replace `no_parties' = 7 if py8seat == . & py7seat !=.
replace `no_parties' = 6 if py7seat == . & py6seat !=.
replace `no_parties' = 5 if py6seat == . & py5seat !=.
replace `no_parties' = 4 if py5seat == . & py4seat !=.
replace `no_parties' = 3 if py4seat == . & py3seat !=.
replace `no_parties' = 2 if py3seat == . & py2seat !=.
replace `no_parties' = 1 if py2seat == . & py1seat !=.
replace `no_parties' = 0 if gparties == 0
replace `no_parties' = 5 if ccode == 325 & inlist(govtseq, 57, 58)
tab2 gparties `no_parties'

list ccode govtseq if gparties ~= `no_parties'
list ccode govtseq if `no_parties' == .

tab `no_parties', miss

gen seat = py1seat
gen name = py1name
gen cab_perc = py1cab_perc
gen mpp = mpppy1

* 2 government parties
preserve
keep if `no_parties' == 2
expand 2
sort ccode govtseq
tempvar id
bys ccode govtseq: gen `id' = _n
qui replace seat = py2seat if `id' == 2
qui replace name = py2name if `id' == 2
qui replace cab_perc = py2cab_perc if `id' == 2
qui replace mpp = mpppy2 if `id' == 2

drop py1name - mpppy10
order country ccode govtseq name seat mpp

tempfile np2
save `np2', replace
restore

* 3 government parties
preserve
keep if `no_parties' == 3
expand 3
sort ccode govtseq
tempvar id
bys ccode govtseq: gen `id' = _n
qui foreach i of numlist 2 3 {
replace seat = py`i'seat if `id' == `i'
replace name = py`i'name if `id' == `i'
replace cab_perc = py`i'cab_perc if `id' == `i'
replace mpp = mpppy`i' if `id' == `i'
}
drop py1name - mpppy10
order country ccode govtseq name seat mpp

tempfile np3
save `np3', replace
restore

* 4 government parties
preserve
keep if `no_parties' == 4
expand 4
sort ccode govtseq
tempvar id
bys ccode govtseq: gen `id' = _n
qui foreach i of numlist 2(1)4 {
replace seat = py`i'seat if `id' == `i'
replace name = py`i'name if `id' == `i'
replace cab_perc = py`i'cab_perc if `id' == `i'
replace mpp = mpppy`i' if `id' == `i'
}
drop py1name - mpppy10
order country ccode govtseq name seat mpp

tempfile np4
save `np4', replace
restore

* 5 government parties
preserve
keep if `no_parties' == 5
expand 5
sort ccode govtseq
tempvar id
bys ccode govtseq: gen `id' = _n
qui foreach i of numlist 2(1)5 {
replace seat = py`i'seat if `id' == `i'
replace name = py`i'name if `id' == `i'
replace cab_perc = py`i'cab_perc if `id' == `i'
replace mpp = mpppy`i' if `id' == `i'
}
drop py1name - mpppy10
order country ccode govtseq name seat mpp

tempfile np5
save `np5', replace
restore

* 6 government parties
preserve
keep if `no_parties' == 6
expand 6
sort ccode govtseq
tempvar id
bys ccode govtseq: gen `id' = _n
qui foreach i of numlist 2(1)6 {
replace seat = py`i'seat if `id' == `i'
replace name = py`i'name if `id' == `i'
replace cab_perc = py`i'cab_perc if `id' == `i'
replace mpp = mpppy`i' if `id' == `i'
}
drop py1name - mpppy10
order country ccode govtseq name seat mpp

tempfile np6
save `np6', replace
restore

* 7 government parties
preserve
keep if `no_parties' == 7
expand 7
sort ccode govtseq
tempvar id
bys ccode govtseq: gen `id' = _n
qui foreach i of numlist 2(1)7 {
replace seat = py`i'seat if `id' == `i'
replace name = py`i'name if `id' == `i'
replace cab_perc = py`i'cab_perc if `id' == `i'
replace mpp = mpppy`i' if `id' == `i'
}
drop py1name - mpppy10
order country ccode govtseq name seat mpp

tempfile np7
save `np7', replace
restore

* 8 government parties
preserve
keep if `no_parties' == 8
expand 8
sort ccode govtseq
tempvar id
bys ccode govtseq: gen `id' = _n
qui foreach i of numlist 2(1)8 {
replace seat = py`i'seat if `id' == `i'
replace name = py`i'name if `id' == `i'
replace cab_perc = py`i'cab_perc if `id' == `i'
replace mpp = mpppy`i' if `id' == `i'
}
drop py1name - mpppy10
order country ccode govtseq name seat mpp

tempfile np8
save `np8', replace
restore

* 9 government parties
preserve
keep if `no_parties' == 9
expand 9
sort ccode govtseq
tempvar id
bys ccode govtseq: gen `id' = _n
qui foreach i of numlist 2(1)9 {
replace seat = py`i'seat if `id' == `i'
replace name = py`i'name if `id' == `i'
replace cab_perc = py`i'cab_perc if `id' == `i'
replace mpp = mpppy`i' if `id' == `i'
}
drop py1name - mpppy10
order country ccode govtseq name seat mpp

tempfile np9
save `np9', replace
restore

* 10 government parties
preserve
keep if `no_parties' == 10
expand 10
sort ccode govtseq
tempvar id
bys ccode govtseq: gen `id' = _n
qui foreach i of numlist 2(1)10 {
replace seat = py`i'seat if `id' == `i'
replace name = py`i'name if `id' == `i'
replace cab_perc = py`i'cab_perc if `id' == `i'
replace mpp = mpppy`i' if `id' == `i'
}
drop py1name - mpppy10
order country ccode govtseq name seat mpp

tempfile np10
save `np10', replace
restore

keep if `no_parties' == 1

qui foreach i of numlist 2(1)10 {
append using `np`i''
}

drop py1name - mpppy10

*** 2: Include supporting parties? If so, comment out these lines:
tempvar supp
gen `supp' = strpos(name, "[")
drop if `supp' != 0

rename mpp party
sort ccode govtseq party
order country ccode govtseq party name seat cab_perc
cap drop __*

tempfile gp
save `gp', replace

************************************************** ********************
*** Generate the government partisanship dataset
************************************************** ********************
use `gp', clear

* Drop all those countries that are not in the MARPOR data:
drop if inlist(ccode, 51, 110, 315, 560, 565, 571, 750, 770, 771)

*** Change some of the previous election dates so that they are consistent with the manifesto data
recode peday (5=7) if ccode == 2 & peyear == 1946
recode peyear (1946=1944) if ccode == 2
recode peday (7=2) if ccode == 2 & peyear == 1950
recode peyear (1950=1948) if ccode == 2
recode peday (2=4) if ccode == 2 & peyear == 1954
recode peyear (1954=1952) if ccode == 2
recode peday (4=6) if ccode == 2 & peyear == 1958
recode peyear (1958=1956) if ccode == 2
recode peday (6=8) if ccode == 2 & peyear == 1962
recode peyear (1962=1960) if ccode == 2
recode peday (8=3) if ccode == 2 & peyear == 1966
recode peyear (1966=1964) if ccode == 2
recode peday (3=5) if ccode == 2 & peyear == 1970
recode peyear (1970=1968) if ccode == 2
recode peday (4=7) if ccode == 2 & peyear == 1974
recode peyear (1974=1972) if ccode == 2
recode peday (7=2) if ccode == 2 & peyear == 1978
recode peyear (1978=1976) if ccode == 2
recode peday (2=4) if ccode == 2 & peyear == 1982
recode peyear (1982=1980) if ccode == 2
recode peday (4=6) if ccode == 2 & peyear == 1986
recode peyear (1986=1984) if ccode == 2
recode peday (6=8) if ccode == 2 & peyear == 1990
recode peyear (1990=1988) if ccode == 2
recode peday (8=3) if ccode == 2 & peyear == 1994
recode peyear (1994=1992) if ccode == 2
recode peday (3=5) if ccode == 2 & peyear == 1998
recode peyear (1998=1996) if ccode == 2
recode peday (5=7) if ccode == 2 & peyear == 2002
recode peyear (2002=2000) if ccode == 2
recode peday (7=2) if ccode == 2 & peyear == 2006
recode peyear (2006=2004) if ccode == 2
recode peday (2=4) if ccode == 2 & peyear == 2010
recode peyear (2010=2008) if ccode == 2

recode peday (21=23) if ccode == 20 & peyear == 2006
recode peday (19=18) if ccode == 211 & peyear == 2003
recode peday (19=10) if ccode == 395 & peyear == 2003
recode peday (9=8) if ccode == 385 & peyear == 1985

gen past_ts = mdy(pemonth,peday,peyear)
format past_ts %td

*** Merge in the MARPOR data (2018b update)
preserve
use "MPDataset_MPDS2018b.dta", clear

gen ccode = country
recode ccode (11 = 380) (12 = 385) (13 = 390) (14 = 375) (15 = 395) (21 = 211) (22 = 210) (23 = 212) (31 = 220) (32 = 325) (33 = 230) (34 = 350) (35 = 235) (41 = 255) (42 = 305) (43 = 225) (51 = 200) (53 = 205) (54 = 338) (55 = 352) (61 = 2) (62 = 20) (63 = 900) (64 = 920) (71 = 740) (72 = 666) (73 = 780) (74 = 640) (75 = 339) (76 = 371) (77 = 373) (78 = 370) (79 = 346) (80 = 355) (81 = 344) (82 = 316) (83 = 366) (84 = 372) (85 = 265) (86 = 310) (87 = 367) (88 = 368) (89 = 343) (90 = 359) (91 = 341) (92 = 290) (93 = 360) (94 = 365) (96 = 317) (97 = 349) (98 = 369) (113 = 730) (171 = 70)

sort country edate party
order countryname country ccode edate party rile

gen past_ts = edate
format past_ts %td

*** Correct some of the previous election dates so that they are consistent with the SW data.
replace past_ts = date("18may1954","DMY") if past_ts == date("18apr1954","DMY") & ccode == 205
replace past_ts = date("18jun1969","DMY") if past_ts == date("16jun1969","DMY") & ccode == 205
replace past_ts = date("26jun1949","DMY") if past_ts == date("29jun1949","DMY") & ccode == 211
replace past_ts = date("24mar1990","DMY") if past_ts == date("25mar1990","DMY") & ccode == 310
replace past_ts = date("05jun1992","DMY") if past_ts == date("06jun1992","DMY") & ccode == 316
replace past_ts = date("02jun2006","DMY") if past_ts == date("03jun2006","DMY") & ccode == 316
replace past_ts = date("28may2010","DMY") if past_ts == date("29may2010","DMY") & ccode == 316
replace past_ts = date("25oct2013","DMY") if past_ts == date("26oct2013","DMY") & ccode == 316
replace past_ts = date("05jun1992","DMY") if past_ts == date("06jun1992","DMY") & ccode == 317
replace past_ts = date("25sep1998","DMY") if past_ts == date("26sep1998","DMY") & ccode == 317
replace past_ts = date("05apr1992","DMY") if past_ts == date("06apr1992","DMY") & ccode == 325
replace past_ts = date("27mar1994","DMY") if past_ts == date("28mar1994","DMY") & ccode == 325
replace past_ts = date("09apr2006","DMY") if past_ts == date("10apr2006","DMY") & ccode == 325
replace past_ts = date("17jun2001","DMY") if past_ts == date("18jun2001","DMY") & ccode == 355
replace past_ts = date("16sep1956","DMY") if past_ts == date("26sep1956","DMY") & ccode == 380
replace past_ts = date("20sep1998","DMY") if past_ts == date("21sep1998","DMY") & ccode == 380
replace past_ts = date("13sep1981","DMY") if past_ts == date("14sep1981","DMY") & ccode == 385
replace past_ts = date("10sep1989","DMY") if past_ts == date("11sep1989","DMY") & ccode == 385
replace past_ts = date("12sep1993","DMY") if past_ts == date("13sep1993","DMY") & ccode == 385
replace past_ts = date("15sep1997","DMY") if past_ts == date("16sep1997","DMY") & ccode == 385
replace past_ts = date("03nov1959","DMY") if past_ts == date("03jul1959","DMY") & ccode == 666
replace past_ts = date("01nov1965","DMY") if past_ts == date("02nov1965","DMY") & ccode == 666

sort ccode past_ts party
tempfile marpor
save `marpor', replace
restore

sort ccode past_ts party
merge ccode past_ts party using `marpor'
drop if _merge == 2
drop _merge

sort ccode govtseq party

tempvar nonmiss denom perc_gp_nonmiss
gen `nonmiss' = cond(!missing(rile), 1, 0)
bys ccode govtseq: egen `denom' = sum(seat * `nonmiss')
gen `perc_gp_nonmiss' = (seat * `nonmiss') / `denom'

tempvar nump denomp
bys ccode govtseq: egen `nump' = total(cond(rile == ., 1, 0) * seat)
bys ccode govtseq: egen `denomp' = total(seat)
gen percpartmiss = 100 * (`nump' / `denomp')
recode percpartmiss (.=100)
lab var percpartmiss "% of government seats with missing MARPOR data"

*** These composite variables have been used in other studies to measure free-market economic position (ecopos: Tavits 2007 AJPS), total economic emphasis (econ4: Williams, Seki and Whitten 2016 PSRM) and hawk score (hawk: Whitten and Williams 2011 AJPS).
gen ecopos = (per401 + per402 + per407 + per414) - (per403 + per404 + per405 + per406 + per412)
egen econ4 = rowtotal(per401 - per416)
gen hawk = (per104 - per105 - per106)

local V rile planeco markeco welfare intpeace ecopos econ4 hawk per101 per102 per103 per104 per105 per106 per107 per108 per109 per110 per201 per202 per203 per204 per301 per302 per303 per304 per305 per401 per402 per403 per404 per405 per406 per407 per408 per409 per410 per411 per412 per413 per414 per415 per416 per501 per502 per503 per504 per505 per506 per507 per601 per602 per603 per604 per605 per606 per607 per608 per701 per702 per703 per704 per705 per706

qui foreach v of local V {
tempvar `v'
gen ``v'' = `v' if mpp_pm == party
bys ccode govtseq: egen pm_`v' = sum(``v''), missing

tempvar perc_`v'
gen `perc_`v'' = `perc_gp_nonmiss' * `v'
bys ccode govtseq: egen govt_`v' = sum(`perc_`v''), missing

lab var pm_`v' "PM's `v' score for that government (govtseq)"
lab var govt_`v' "Government's weighted `v' score (only includes available MARPOR data)"
}

*** Save this as a temporary file that we can eventually merge into the data sets with different time periods:
keep ccode govtseq govt_* pm_* percpartmiss
duplicates drop ccode govtseq, force

sort ccode govtseq
tempfile p
save `p', replace

************************************************** ****************************
*** Make sure that you have created the Governments data set in your preferred time dimension!
************************************************** ****************************
use "`t_outfile'.dta", clear

* Drop all those countries that are not in the MARPOR data:
drop if inlist(ccode, 51, 110, 315, 560, 565, 571, 750, 770, 771)

sort ccode govtseq
merge ccode govtseq using `p'
drop if _merge == 2
drop _merge

*** Drop the "duration" values for the last government in each country.
replace duration = . if end_ts == date("31dec2018","DMY")

lab def ccode 2 "USA" 20 "Canada" 51 "Jamaica" 52 "Trinidad & Tobago" 70 "Mexico" 95 "Panama" 110 "Guyana" 200 "Great Britain" 205 "Ireland" 210 "Netherlands" 211 "Belgium" 212 "Luxembourg" 220 "France" 225 "Switzerland" 230 "Spain" 235 "Portugal" 255 "Germany" 265 "German Democratic Republic" 290 "Poland" 305 "Austria" 310 "Hungary" 315 "Czechoslovakia" 316 "Czech Republic" 317 "Slovakia" 325 "Italy" 338 "Malta" 339 "Albania" 341 "Montenegro" 343 "Macedonia" 344 "Croatia" 346 "Bosnia-Herzegovina" 349 "Slovenia" 350 "Greece" 352 "Cyprus" 355 "Bulgaria" 359 "Moldova" 360 "Romania" 365 "Russia" 366 "Estonia" 367 "Latvia" 368 "Lithuania" 369 "Ukraine" 370 "Belarus" 371 "Armenia" 372 "Georgia" 373 "Azerbaijan" 375 "Finland" 380 "Sweden" 385 "Norway" 390 "Denmark" 395 "Iceland" 560 "South Africa" 565 "Namibia" 571 "Botswana" 640 "Turkey" 666 "Israel" 730 "Korea" 740 "Japan" 750 "India" 770 "Pakistan" 771 "Bangladesh" 780 "Sri Lanka" 900 "Australia" 920 "New Zealand"
lab val ccode ccode

order ccode ts govtseq start_ts end_ts
sort ccode ts govtseq
compress

save "`p_outfile'.dta", replace

↧

Doubt interpreting margins after performing interactions with Piecewise exponential regression

July 17, 2019, 4:35 pm

Hello,
For a project I am doing, I need to perform survival analysis on a sample of firms. In order to do so, I am doing a Piecewise Exponential regression.

First of all, I have stset my data in the following way (it may be useful to show this in order to understand the doubts I have) - stset age, id(ID) failure(death==1)

One of the things I have to do is to perform interactions between some variables in my model. One of these interactions is the one between the binary variable "tech_intensity" (which is 1 for high-tech firms and 0 for low-tech firms) and the binary variable "innovator1" (which is 1 for innovators and 0 for non-innovators).
The code I'm using looks like this:

streg e2 e3 e4 log_trabalhadores exporter1 i.tech_intensity##i.innovator1 i.ano i.region, dist(exp) nolog nohr
margins tech_intensity#innovator

The variables other than the ones I mentioned are other relevant covariates and controls.

My problem is with the output I obtain from the margins command, which is the following:

margins tech_intensity#innovator

Predictive margins                              Number of obs     =     93,317
Model VCE    : OIM

Expression   : Predicted median _t, predict()

-------------------------------------------------------------------------------------------
                          |            Delta-method
                          |     Margin   Std. Err.       z    P>|z|    [95% Conf. Interval]
--------------------------+----------------------------------------------------------------
tech_intensity#innovator1 |
                     0 0  |   12.42644   .2460552    50.50   0.000     11.94418     12.9087
                     0 1  |   13.95517   .8232267    16.95   0.000     12.34167    15.56866
                     1 0  |   14.15248   .6992042    20.24   0.000     12.78206    15.52289
                     1 1  |   29.37967   9.804994     3.00   0.003     10.16223     48.5971

From what I can ascertain, the margins I am obtaining are the predicted median of _t. If _t is the analysis time at which a record ends (the time at which the individual stops being at risk), I don't understand how the values can be this large, considering that the maximum age of the firms in my sample is 11 years. Am I interpreting something wrong, or is there some point to all this that I am missing?

Furthermore, I have one additional doubt. Another interaction I need to understand is the one between this tech_intensity variable and a continuous logarithmic variable. When I run the margins command, it tells me that "factor variables may not contain noninteger values". Is there something I can do to run the margins command when one of the variables involved is a logarithmic one, or is it a limitation I have to abide by?

Thank you very much,
Rui Agostinho
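
A possible way to look at both points, as a minimal sketch on simulated data rather than the poster's firms (the variable log_size, the seed and the hazard of 0.05 are made up for illustration). In an exponential model the median survival time is ln(2) divided by the hazard, so when fewer than half of the subjects in a cell fail within the observation window, the predicted median can lie beyond the longest observed age. For the second point, the usual pattern is to mark the continuous variable with c. in the model and to put it in at() rather than in the margins list.

```stata
* Simulated data: every name and number below is hypothetical
clear
set seed 12345
set obs 500
gen id = _n
gen byte tech_intensity = runiform() < 0.3
gen log_size = rnormal(2, 1)

* Exponential durations with hazard 0.05, censored at 11 years
gen age = -ln(runiform()) / 0.05
gen byte death = age <= 11
replace age = min(age, 11)
stset age, id(id) failure(death==1)

* c. marks the continuous variable inside the interaction
streg i.tech_intensity##c.log_size, dist(exp) nolog nohr

* Default prediction is the median of _t, which here exceeds the
* censoring time of 11 because the model extrapolates
margins tech_intensity, at(log_size = (1 2 3))

* The same margins on the linear-predictor scale
margins tech_intensity, at(log_size = (1 2 3)) predict(xb)
```

Whether the median, the linear predictor or another predict() statistic is the right scale depends on the research question; the sketch only shows the syntax.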

↧

confidence interval for mean difference

July 17, 2019, 5:06 pm

≫ Next: Listing values and labels

≪ Previous: Doubt interpreting margins after performing interactions with Piecewise exponential regression

Dear Listers,

I would like to know how to capture the confidence interval for a mean difference. I have two groups, intervention and control. After running ttest, I want to retrieve the confidence interval for the mean difference. I know the ttest output shows the difference, but I have more than 10 variables and don't want to copy and paste the results from the screen.

Code:

* Example generated by    -dataex-. To install:    ssc install    dataex
clear
input float(cost_strip    cost_monitoring) byte    rand_trt
0       0 1
0 128.205 0
.87       0 1
.29       0 1
0 167.055 0
0  168.35 0
.58 177.415 0
0 164.465 0
0   181.3 0
.29 185.185 0
1.45       0 1
.29 182.595 0
.29       0 1
.29       0 1
.29       0 1
0  189.07 0
.29       0 1
0       0 1
0 120.435 0
0 167.055 0
.58  186.48 0
.29       0 1
0       0 1
.29  170.94 0
0       0 1
.29  186.48 0
0       0 1
.29       0 1
.29       0 1
0       0 1
end

Code:

global X cost_strip cost_monitoring

foreach x of global X {
    {
        qui sum `x' if rand_trt==0
        gen int`x'= r(mean)
        qui sum `x' if rand_trt==1
        scalar con`x' = r(mean)
        gen     diff = int`x' - con`x'
        qui ci mean diff
        scalar c = r(lb)
        scalar d = r(ub)
    }
    *di "`x' " _skip(10) %12.1f c _skip(10) %12.1f d _skip(10)
}

rand_trt is an intervention dummy. But this code doesn't work.

Many thanks in advance.

BW

Kim
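
One way this could be done, as a minimal sketch: ttest leaves the two group means, the standard error of the difference and the degrees of freedom in r(), so the interval can be rebuilt from those instead of being read off the screen. The block uses the first ten rows of the dataex example above; the scalar names d_mean and half_w are made up here, and a 95% level is assumed.

```stata
* First ten rows of the dataex example
clear
input float(cost_strip cost_monitoring) byte rand_trt
0       0 1
0 128.205 0
.87     0 1
.29     0 1
0 167.055 0
0  168.35 0
.58 177.415 0
0 164.465 0
0   181.3 0
.29 185.185 0
end

foreach x of varlist cost_strip cost_monitoring {
    quietly ttest `x', by(rand_trt)
    * ttest treats the lower by() value (rand_trt==0) as group 1
    scalar d_mean = r(mu_1) - r(mu_2)
    * half-width of the 95% interval from the t distribution
    scalar half_w = invttail(r(df_t), 0.025) * r(se)
    display "`x'" _col(20) %10.3f scalar(d_mean)              ///
        _col(32) %10.3f (scalar(d_mean) - scalar(half_w))     ///
        _col(44) %10.3f (scalar(d_mean) + scalar(half_w))
}
```

Each line of output holds the variable name, the difference in means (control minus intervention) and the lower and upper bounds.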

↧

Listing values and labels

July 17, 2019, 9:22 pm

≫ Next: Collapse with percentage of population in a dummy variable wrt region and year

≪ Previous: confidence interval for mean difference

I have a bunch of countries. They are coded as numbers (values) in an integer variable. For each value, the country name has been attached as a value label.

It's a sizable dataset and I need to do some manipulations based on country.

Initially, I didn't notice that what appear as values are actually labels, and hence did a

Code:

tab country

and wrote commands based on country names, e.g.

Code:

replace countrycode = "ALB" if country == "Albania"

Got a type mismatch and only then realized that country is not string.

So now I want to change the commands to refer to the country number rather than the name. But how do I figure out which number (value) corresponds to which country name (label)? Is there a tab command that will show me a "dictionary" of values and labels?
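
A few standard commands cover this; the sketch below runs them on a toy variable. The label name ctry and the three countries are invented for the example. In a real dataset, describe shows which value label is attached to country.

```stata
* Toy data: label name and codes are hypothetical
clear
input byte country
1
2
3
end
label define ctry 1 "Albania" 2 "Belgium" 3 "Canada"
label values country ctry

* Which value label is attached to the variable?
describe country

* The full value-to-label "dictionary"
label list ctry

* Tabulate the underlying numbers instead of the labels
tabulate country, nolabel

* Refer to a value through its label inside an expression
gen str3 countrycode = ""
replace countrycode = "ALB" if country == "Albania":ctry

* Or create a string copy of the labels
decode country, gen(country_name)
```

The "text":labelname form lets the existing name-based commands be kept with only a small change.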

↧

Collapse with percentage of population in a dummy variable wrt region and year

July 17, 2019, 9:32 pm

≫ Next: Calculate Percentile of Top Scale Response in Longitudinal data

≪ Previous: Listing values and labels

Hi everyone,

I am currently working with data in the following format:

Year Region Dummy
2001 11 0
2001 11 0
2001 11 1
2001 12 1
2001 12 1
2001 12 0
2002 11 1
2002 11 1
2002 11 1
2002 12 1
2002 12 0

I want to collapse the data so that I get a result like this:

Year Region Share of the population in a region in a given year that has dummy value 1
2001 11 0.33
2001 12 0.67
2002 11 1
2002 12 0.5

Any help in this regard is highly appreciated.
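
Since the mean of a 0/1 variable is the share of ones, one likely approach is collapse with (mean). A minimal sketch using the rows posted above; the new variable name share is arbitrary.

```stata
clear
input int Year byte(Region Dummy)
2001 11 0
2001 11 0
2001 11 1
2001 12 1
2001 12 1
2001 12 0
2002 11 1
2002 11 1
2002 11 1
2002 12 1
2002 12 0
end

* Mean of the dummy within each Year-Region cell
collapse (mean) share = Dummy, by(Year Region)
list, sepby(Year)
```

With these rows, region 12 in 2002 has two observations (1 and 0), so its share comes out as 0.5.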

↧

Calculate Percentile of Top Scale Response in Longitudinal data

July 17, 2019, 10:14 pm

≫ Next: how do we get the significance of Beta's

≪ Previous: Collapse with percentage of population in a dummy variable wrt region and year

I have longitudinal international survey data, with a variable measured on a 5-point scale.
I would like to assess what percentage of respondents answered `5` in each country and then collapse by country.
Is there an easy way to calculate this percentage, along the lines of the following?

Code:

replace percentile_top=(100*count if response = 1)/(count if response != .), by country
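
A minimal sketch of one way to get there: build an indicator for the top answer that is missing whenever the response is missing, then take its mean by country. The toy data and the names top and pct_top are assumptions for illustration.

```stata
* Toy data: country codes and responses are made up
clear
input str2 country byte response
"AA" 5
"AA" 3
"AA" 5
"AA" .
"BB" 5
"BB" 1
end

* 1 if top answer, 0 otherwise, missing if no response
gen byte top = (response == 5) if !missing(response)

* Mean of the indicator is the share of top answers among respondents
collapse (mean) pct_top = top, by(country)
replace pct_top = 100 * pct_top
list
```

For panel data with several waves, adding the wave variable to by() would give one percentage per country and wave.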

↧

how do we get the significance of Beta's

July 18, 2019, 12:48 am

≫ Next: Marginal Effects with Factor Variables

≪ Previous: Calculate Percentile of Top Scale Response in Longitudinal data

Hi all,

I estimate the betas using GMM in Mata.

: void GMM_DL(todo,betas,crit,g,H)
{
PHI=st_data(.,("phi"))
PHI_LAG=st_data(.,("phi_lag"))
Z=st_data(.,(" lagloglab laglogmat logcapital "))
X=st_data(.,(" logcapital logmaterials loglabor "))
X_lag=st_data(.,(" lagloglab laglogmat logcapital "))
Y=st_data(.,(" logdeflatedrevenue "))
QR_lag=st_data(.,(" logprevioustariff "))
C=st_data(.,("const"))
OMEGA=PHI-X*betas'
OMEGA_lag=PHI_LAG-X_lag*betas'
OMEGA_lag_pol=(C,OMEGA_lag,QR_lag)
g_b = invsym(OMEGA_lag_pol'OMEGA_lag_pol)*OMEGA_lag_pol' OMEGA
XI=OMEGA-OMEGA_lag_pol*g_b
crit=(Z'XI)'(Z'XI)
}

: void DL()
{
S=optimize_init()
optimize_init_evaluator(S, &GMM_DL())
optimize_init_evaluatortype(S,"d0")
optimize_init_technique(S, "nm")
optimize_init_nmsimplexdeltas(S, 0.1)
optimize_init_which(S,"min")
optimize_init_params(S,(2,0.8,0.2))
p=optimize(S)
p
st_matrix("beta_NP",p)
}

I save the matrix and get the betas.

My question is: how do I obtain the standard errors and significance of the coefficients? The matrix only contains the betas.

Thanks!

↧

Marginal Effects with Factor Variables

July 18, 2019, 1:36 am

≫ Next: Identifying when a variable passes a certain threshold

≪ Previous: how do we get the significance of Beta's

I want to calculate the marginal effects of my control variables. I use factor variables for categorical variables and for interactions, but I get different results when I use factor variables than when I do not. I do not understand what I am doing wrong; please find the two sets of results below.

Code:

. glm A c1r c1l c1c B C D E F G H I J K L M N O i.country, fa(b) link(logit) vce(robust)
note: vote1 has noninteger values

Iteration 0:   log pseudolikelihood =  -42.94693  
Iteration 1:   log pseudolikelihood =  -42.86314  
Iteration 2:   log pseudolikelihood = -42.862902  
Iteration 3:   log pseudolikelihood = -42.862902  

Generalized linear models                         No. of obs      =        109
Optimization     : ML                             Residual df     =         66
                                                  Scale parameter =          1
Deviance         =  2.554759242                   (1/df) Deviance =   .0387085
Pearson          =  2.449876389                   (1/df) Pearson  =   .0371193

Variance function: V(u) = u*(1-u/1)               [Binomial]
Link function    : g(u) = ln(u/(1-u))             [Logit]

                                                  AIC             =   1.575466
Log pseudolikelihood = -42.86290183               BIC             =  -307.0742

-------------------------------------------------------------------------------------
                    |               Robust
              A |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
--------------------+----------------------------------------------------------------
                c1r |  -8.777873   6.446331    -1.36   0.173    -21.41245    3.856704
                c1l |  -5.230066   9.232645    -0.57   0.571    -23.32572    12.86559
                c1c |   14.04833   12.72731     1.10   0.270    -10.89674     38.9934
      B |   .5111221   1.002131     0.51   0.610    -1.453019    2.475264
          C |  -.2158331   .1219806    -1.77   0.077    -.4549108    .0232445
       D |  -1.015791   .6120185    -1.66   0.097    -2.215326    .1837427
       E |   -1.34568   .6942753    -1.94   0.053    -2.706434    .0150751
           F |   1.913601   1.263446     1.51   0.130    -.5627074    4.389909
           G |    3.03058   1.415898     2.14   0.032     .2554705     5.80569
           H |  -1.232532    .465843    -2.65   0.008    -2.145567   -.3194963
         I |   9.664852   5.081045     1.90   0.057    -.2938134    19.62352
         J |  -6.138072   5.524545    -1.11   0.267    -16.96598    4.689838
   K |   6.307305   2.485234     2.54   0.011     1.436336    11.17827
L |   .2637226   .1204656     2.19   0.029     .0276144    .4998309
   M |   .0157357   .0524758     0.30   0.764     -.087115    .1185864
N |   -5.59704   1.892737    -2.96   0.003    -9.306737   -1.887342
  O |   1.622794   1.562146     1.04   0.299    -1.438957    4.684544
                    |
            country |
                 2  |  -.3174748   .2825339    -1.12   0.261    -.8712311    .2362816
                 3  |   .9050205   .5325704     1.70   0.089    -.1387983    1.948839
                 5  |   .6366366   .3458818     1.84   0.066    -.0412793    1.314553
                 6  |  -.8061525   .3245252    -2.48   0.013     -1.44221   -.1700948
                 7  |   .9979821   .5256645     1.90   0.058    -.0323014    2.028266
                 8  |  -.3019033   .2208869    -1.37   0.172    -.7348336    .1310271
                 9  |   1.793135   1.120034     1.60   0.109    -.4020915    3.988362
                10  |   2.935015   1.670142     1.76   0.079    -.3384039    6.208433
                11  |   1.032755   .2682285     3.85   0.000     .5070367    1.558473
                12  |   .6142258   .4606796     1.33   0.182    -.2886895    1.517141
                13  |   .6579136   .2909312     2.26   0.024     .0876989    1.228128
                14  |   2.322749    1.06672     2.18   0.029      .232017    4.413481
                15  |   .3860536   .4793302     0.81   0.421    -.5534164    1.325524
                16  |   .3461292   .6025894     0.57   0.566    -.8349244    1.527183
                17  |  -.4832077   .7065197    -0.68   0.494    -1.867961    .9015454
                18  |   1.483425   .3839561     3.86   0.000     .7308851    2.235965
                19  |   .0972213   .3887099     0.25   0.803     -.664636    .8590787
                20  |   .8722559   .5144817     1.70   0.090    -.1361096    1.880621
                21  |    .846566   .3115486     2.72   0.007     .2359419     1.45719
                22  |   1.283534   .5730708     2.24   0.025     .1603358    2.406732
                23  |   1.252627    .498347     2.51   0.012     .2758851    2.229369
                24  |   .4272264   .3648452     1.17   0.242    -.2878572     1.14231
                25  |     2.1732   .8145399     2.67   0.008     .5767307    3.769668
                26  |  -.5704112   .3033152    -1.88   0.060    -1.164898    .0240757
                27  |   1.839924   1.113983     1.65   0.099    -.3434435    4.023291
                    |
              _cons |  -3.763331    1.28113    -2.94   0.003      -6.2743   -1.252362
-------------------------------------------------------------------------------------

.

Code:

 
 glm A c.c1#D c.c1#E c.c1#creelection1 B i.C i.D i.E c.averagegovtexp#D c.averagegovtexp#E c.a
> veragegovtexp#creelection1 c.I c.J c.K c.L c.M c.N c.O i.country, fa(b) link(logit) vce(robus
> t)

note: 1.E#c.c1 omitted because of collinearity
note: 1.creelection1#c.c1 omitted because of collinearity
note: 1.E#c.averagegovtexp omitted because of collinearity
note: 1.creelection1#c.averagegovtexp omitted because of collinearity
note: A has noninteger values

Iteration 0:   log pseudolikelihood = -42.941667  
Iteration 1:   log pseudolikelihood = -42.859938  
Iteration 2:   log pseudolikelihood =   -42.8597  
Iteration 3:   log pseudolikelihood =   -42.8597  

Generalized linear models                         No. of obs      =        109
Optimization     : ML                             Residual df     =         64
                                                  Scale parameter =          1
Deviance         =  2.548356364                   (1/df) Deviance =   .0398181
Pearson          =  2.444726741                   (1/df) Pearson  =   .0381989

Variance function: V(u) = u*(1-u/1)               [Binomial]
Link function    : g(u) = ln(u/(1-u))             [Logit]

                                                  AIC             =   1.612105
Log pseudolikelihood = -42.85970039               BIC             =  -297.6979

-------------------------------------------------------------------------------------
                    |               Robust
              A |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
--------------------+----------------------------------------------------------------
  rreelection1#c.c1 |
                 0  |   8.470525   19.76285     0.43   0.668    -30.26394      47.205
                 1  |  -1.580529   23.87824    -0.07   0.947    -48.38102    45.21996
                    |
  lreelection1#c.c1 |
                 0  |   6.771615   12.02922     0.56   0.573    -16.80523    30.34846
                 1  |          0  (omitted)
                    |
  creelection1#c.c1 |
                 0  |  -13.67147   15.70461    -0.87   0.384    -44.45194    17.10901
                 1  |          0  (omitted)
                    |
      B |   .5041213   1.014915     0.50   0.619    -1.485075    2.493317
        1.C |  -.2165275   .1214293    -1.78   0.075    -.4545245    .0214696
     1.D |  -1.076306    .632389    -1.70   0.089    -2.315766    .1631532
     1.E |  -1.412453   .6621664    -2.13   0.033    -2.710276   -.1146312
                    |
       D#|
   c.averagegovtexp |
                 0  |   1.571097   2.247716     0.70   0.485    -2.834346     5.97654
                 1  |   3.604304   2.404048     1.50   0.134    -1.107542    8.316151
                    |
       E#|
   c.averagegovtexp |
                 0  |  -3.163227   1.347199    -2.35   0.019    -5.803689   -.5227646
                 1  |          0  (omitted)
                    |
       creelection1#|
   c.averagegovtexp |
                 0  |    1.20869    .519934     2.32   0.020     .1896384    2.227742
                 1  |          0  (omitted)
                    |
         I |   9.703945   4.878562     1.99   0.047     .1421387    19.26575
         J |  -6.184813   5.818585    -1.06   0.288    -17.58903    5.219403
   K |     6.0959   2.741294     2.22   0.026     .7230625    11.46874
L |   .2592149   .1332625     1.95   0.052    -.0019748    .5204045
   M |    .024784    .063311     0.39   0.695    -.0993032    .1488713
N |  -5.528738   1.958356    -2.82   0.005    -9.367046    -1.69043
  O |   1.596015   1.553947     1.03   0.304    -1.449666    4.641695
                    |
            country |
                 2  |  -.3206109   .2972785    -1.08   0.281     -.903266    .2620443
                 3  |   .7806132   .6643167     1.18   0.240    -.5214236     2.08265
                 5  |   .5959516   .4866569     1.22   0.221    -.3578784    1.549781
                 6  |  -.7933917   .3294559    -2.41   0.016    -1.439113   -.1476701
                 7  |   .9247078   .6654651     1.39   0.165    -.3795798    2.228995
                 8  |  -.3031683   .2269195    -1.34   0.182    -.7479223    .1415857
                 9  |   1.808557   1.184715     1.53   0.127    -.5134411    4.130555
                10  |   2.941216   1.806134     1.63   0.103    -.5987413    6.481174
                11  |   1.011056   .3283157     3.08   0.002     .3675687    1.654543
                12  |   .5916792   .5207664     1.14   0.256    -.4290042    1.612363
                13  |    .611112   .3838553     1.59   0.111    -.1412306    1.363454
                14  |   2.317612   1.153793     2.01   0.045     .0562189    4.579004
                15  |   .2948902   .6324098     0.47   0.641    -.9446103    1.534391
                16  |   .2832857   .7178377     0.39   0.693     -1.12365    1.690222
                17  |  -.4926725   .7168042    -0.69   0.492    -1.897583    .9122379
                18  |   1.447373   .4602922     3.14   0.002     .5452173    2.349529
                19  |   .0948476    .443375     0.21   0.831    -.7741515    .9638466
                20  |   .8335054   .5799551     1.44   0.151    -.3031856    1.970196
                21  |   .8195496   .3933186     2.08   0.037     .0486594     1.59044
                22  |   1.217881   .7061443     1.72   0.085    -.1661364    2.601898
                23  |   1.207571   .6129117     1.97   0.049     .0062862    2.408856
                24  |   .4207905   .4571189     0.92   0.357     -.475146    1.316727
                25  |   2.149276   .9624491     2.23   0.026     .2629099    4.035641
                26  |  -.5625191    .294506    -1.91   0.056     -1.13974     .014702
                27  |   1.804801   1.245022     1.45   0.147    -.6353973    4.244999
                    |
              _cons |  -3.550846   1.789157    -1.98   0.047     -7.05753   -.0441618
-------------------------------------------------------------------------------------

. 
end of do-file

↧

Identifying when a variable passes a certain threshold

July 18, 2019, 1:38 am

≫ Next: Matching sample based on multiple variables

≪ Previous: Marginal Effects with Factor Variables

Hi, I have a data set of athletes' training times per day, along with the date on which each training session was held. I wish to identify when each athlete passes 500, 1000 and 2000 training minutes. So far I have used
bysort athlete_id: gen total_training_minutes= sum(minutes)
This gave me the cumulative training minutes of each athlete. After that I used drop if total_training_minutes < 500 to identify the date on which the athlete surpassed the 500 mark. Now I wish to do the same for 1000 and 2000 minutes. I was thinking of using drop if total_training_minutes < 1000, but then I would lose the data on when the athlete passed 500 minutes. Any advice on how to identify the date on which the athlete passed 1000 training minutes without dropping the data on when the athlete passed 500?

Thank you.
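
One way to avoid dropping anything is to store the crossing date for each threshold in its own variable. A minimal sketch on toy data, assuming a date variable called day and treating "passes" as "reaches or exceeds"; the names date_500, date_1000 and date_2000 are made up.

```stata
* Toy data: two athletes, day stands in for the training date
clear
input byte athlete_id int(day minutes)
1 1 300
1 2 300
1 3 500
1 4 1000
2 1 600
2 2 200
2 3 300
end

* Running total in date order within athlete
bysort athlete_id (day): gen total_training_minutes = sum(minutes)

* Earliest day on which the running total reaches each threshold;
* missing if the athlete never gets there
foreach k in 500 1000 2000 {
    bysort athlete_id (day): egen date_`k' = min(cond(total_training_minutes >= `k', day, .))
}
list, sepby(athlete_id)
```

Sorting by the date inside bysort matters here, because sum() accumulates in the current sort order.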

↧

Matching sample based on multiple variables

July 18, 2019, 1:56 am

≫ Next: Cross-equation test in Panel Analysis

≪ Previous: Identifying when a variable passes a certain threshold

Hi,
I have two samples. Sample 1 (the main sample) includes 200 companies from Sweden. Sample 2 includes 5000 companies from the US. Using sample 2, I want to find one matching company for each company included in sample 1. I want to match on SIC code (the company's industry), assets (the value of the company's assets), ROA and audit fees.
All companies are identified by a unique ISIN number. Both samples contain the variables SIC code, year, assets, ROA and audit fees.

I would be grateful if someone could show me how to find matches based on several variables or criteria.

Note:
I am using a Mac and Stata 15.

Best regards,
Mahmoud
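
A rough sketch of one possible approach: pair every Swedish firm with all US firms in the same SIC code using joinby, compute a distance on the remaining variables, and keep the closest pair. The toy data, the _us suffix and especially the distance formula (log assets plus ROA, with audit fees and year left out for brevity) are assumptions, not a recommended metric. Matching here is with replacement, so one US firm can serve several Swedish firms.

```stata
* Toy US sample, saved to a temporary file
clear
input str4 isin_us int sic double(assets_us roa_us)
"US01" 2000 100 .05
"US02" 2000 500 .10
"US03" 3000 120 .02
end
tempfile us
save `us'

* Toy Swedish sample
clear
input str4 isin int sic double(assets roa)
"SE01" 2000 450 .09
"SE02" 3000 100 .03
end

* All Swedish-US pairs within the same SIC code
joinby sic using `us'

* Hypothetical distance: gap in log assets plus gap in ROA
gen double dist = abs(ln(assets) - ln(assets_us)) + abs(roa - roa_us)

* Keep the closest US firm for each Swedish firm
bysort isin (dist): keep if _n == 1
list isin isin_us dist
```

In practice the variables would usually be standardized before being combined, and user-written matching commands may be more convenient for larger problems.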

↧

Cross-equation test in Panel Analysis

July 18, 2019, 2:19 am

≫ Next: Generate variable that takes the name of one of a list of dummy variables

≪ Previous: Matching sample based on multiple variables

Hi,
I'm trying to compare coefficients from different regression models with panel data.
In an ordinary regression I would use a cross-equation test, which tests for the difference between two regression coefficients across independent samples. The test answers the question "does b1 = b2?", where b1 is the effect of an explanatory variable within group 1 and b2 is the effect of the same variable within group 2.
code:

regr Y X1 X2 if X3 == 0

est store zero

regr Y X1 X2 if X3 == 1
est store one

suest zero one

test [zero_Y=one_Y]: X1

If I do the same thing after "xtreg, fe" (fixed effect) the output is:

Constraint 1 dropped

chi2( 0) = .
Prob > chi2 = .

Is there a way to do a cross-equation test using "xtreg, fe"?
thank you
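
One alternative that avoids suest is to fit a single fixed-effects model in which every regressor is interacted with the group indicator, and then test the interaction term. A minimal sketch on simulated panel data; the data-generating values are made up, and X3 is assumed to be constant within panel, so its main effect is absorbed by the fixed effects while the interactions remain identified.

```stata
* Simulated panel: 100 units, 5 periods, hypothetical coefficients
clear
set seed 2019
set obs 100
gen id = _n
gen byte X3 = mod(id, 2)
expand 5
bysort id: gen t = _n
gen X1 = rnormal()
gen X2 = rnormal()
gen Y = 1 + (0.5 + 0.5*X3)*X1 + 0.3*X2 + rnormal()
xtset id t

* Fully interacted fixed-effects model
xtreg Y c.X1##i.X3 c.X2##i.X3, fe vce(cluster id)

* H0: the coefficient on X1 is the same in both groups
test 1.X3#c.X1
```

The interaction coefficient is the difference b2 - b1, so its test plays the role of the cross-equation test.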

↧

Generate variable that takes the name of one of a list of dummy variables

July 18, 2019, 2:50 am

≫ Next: Doubt when exporting string with quotes to .txt

≪ Previous: Cross-equation test in Panel Analysis

Dear Statalist,

I am trying to solve the following problem:

I have 105 dummy variables for the 105 months spanning January 2010 to September 2018, each taking the value 1 if something happened that month. They are named mat201001, mat201002 … mat201809.

I need to generate a new variable that gives me the year and month in which the event first occurred, that is, when a dummy variable first took the value 1. For example, if the first dummy to take the value 1 was mat201202, the new variable would be 201202; if the first dummy to be 1 was mat201801, the new variable would be 201801.

Thank you!
Zuzana
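
A minimal sketch of one way to do this: loop over the dummies in order and fill the new variable only where it is still missing. It assumes the mat* variables sit in the dataset in chronological order; the toy data use three months only, and first_month is a made-up name.

```stata
* Toy data with three of the monthly dummies
clear
input byte(mat201001 mat201002 mat201003)
0 1 1
0 0 1
0 0 0
end

gen long first_month = .
foreach v of varlist mat* {
    * characters 4-9 of the variable name hold the year and month
    replace first_month = real(substr("`v'", 4, 6)) if `v' == 1 & missing(first_month)
}
list
```

Observations in which no dummy equals 1 keep a missing value.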

↧

Doubt when exporting string with quotes to .txt

July 18, 2019, 4:01 am

≫ Next: Convergence fracreg logit

≪ Previous: Generate variable that takes the name of one of a list of dummy variables

Hey all,

Please consider the following scenario:

Code:

clear
set obs 2
gen id_genbank = ">Seq" + " -j " + `"""' + "bla" + `"""' in 1
replace id_genbank = ">Seq" + " -j " + "bla" in 2
export delimited using test.txt, delimiter(tab) replace

*to display where the file was saved
pwd

Can someone explain why observation 1 gets exported with four extra quotes in the .txt file, as

Code:

">Seq -j ""bla"""

despite being shown in Stata browse as

Code:

>Seq -j "bla"

, given that I'm not specifying the option "quote" in the export command? I fail to see why observation 2 gets exported exactly as it is shown in Stata's browser, while observation 1 receives extra quotes.

In the end, I would like to export the observations in a manner such that they look like

Code:

>Seq -j "bla"

when shown in the txt file. Any ideas on achieving that?

Cheers
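
A likely explanation is that export delimited follows the usual delimited-file convention: a field that itself contains double quotes is wrapped in quotes and the embedded quotes are doubled, so the file can be read back unambiguously. If the text should be written exactly as stored, one option is to bypass export delimited and write the lines with the file command. A minimal sketch, assuming a single string variable; the handle name fh is arbitrary.

```stata
clear
set obs 2
* char(34) is the double-quote character
gen id_genbank = ">Seq -j " + char(34) + "bla" + char(34) in 1
replace id_genbank = ">Seq -j bla" in 2

* Write each observation as one raw line of text
file open fh using test.txt, write text replace
forvalues i = 1/`=_N' {
    file write fh (id_genbank[`i']) _n
}
file close fh

* Show the file contents
type test.txt
```

With several variables, the loop body would need to write each field and the tab separators explicitly.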

↧