reshape error: variable J contains all missing values

February 21, 2020, 5:39 am

Hello,

I am relatively new to Stata and would really appreciate some help! I am trying to convert my dataset from long to wide using the command below:

reshape wide pseudo_eventid eventcode days_from_diag_to_treat providercode providerdesc within_six_months_flag ca_of_providercode_name, i(pseudo_patientid) j(event)

However I receive the error message "variable event contains all missing values" yet I have no missing data in this variable. Event is numeric.

The dataset consists of 8 merged datasets and 170 variables...

Can anybody suggest where I might be going wrong by any chance please?

↧

DiD panel data with individual fixed effects and time county fixed effects

February 21, 2020, 5:43 am

Hi,

I am new to Stata and in need of advice on how to translate my model specification into code. I have an individual-level panel data set with 11 million observations from 21 counties, covering a period of 19 years. For the purpose of my project, I have two time dimensions: calendar year (t) and event year (e). I want to use a difference-in-differences model with individual fixed effects, and also add county-time fixed effects. The goal is to estimate the effect of being able to take sick leave before the first childbirth (event year -1) on the long-term propensity to be on sick leave afterwards. There is exogenous regional variation that affects the likelihood that the mother gets sick leave. The individual propensity to be on sick leave is measured by the history of sick leave withdrawals in the years before pregnancy (event years -14 to -2).
This is my specification:

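$$SL_{ict} = \alpha_i + \sum_{\varphi=-1}^{17} \alpha_\varphi \, 1[e=\varphi] + \alpha_{ct} + \sum_{\varphi=-1}^{17} \alpha_\varphi \, 1[e=\varphi] \times HL + \beta_i X_{it} + \beta_c X_{ct} + \varepsilon_{ict}$$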

Where:

- SL is number of sick leave days per year for individual i, living in county c, in calendar year t

- α_i is the individual effect parameter, which captures the individual's propensity to be on sick leave during event years -14 to -2 and is used as the reference point

- α_φ captures how much the individual's propensity to be on sick leave has changed from its reference point during event years -1 to 17. The term in brackets is equal to 1 when φ=e (event year e=-1 to 17)

- α_φ* HL is an interaction between the parameter capturing the individual propensity to be on sick leave in event years -1 to 17 and the county's leniency in that year. HL is a dummy variable indicating whether the county is lenient or strict on giving sick leave.

- α_ct are county-time fixed effects for a given calendar year, because time trends are very regional

- X_it are individual time-varying covariates, i.e. individual characteristics that change over time: indicator variables for additional children, length of education, work sector, mother's income, father's income, and the household's disposable income.

- X_ct are county-level characteristics that change over time, such as the county-level unemployment rate in the calendar year.

My supervisor advised me to use xi:areg with absorb(id) and cluster(id). I don't know how to code it in a way that I can get all of the parameters for each event year. I'm confused about the whole code to be honest. It is way more complex than anything I have ever done. Therefore any help is more than welcome!

↧

Generating a new variable conditional on multiple values - using or command?

February 21, 2020, 6:14 am

Hey,
I hope someone can help me with this:

I have a dataset of households in various zip codes. I want to generate a variable that places the households into five different categories according to an area index of multiple deprivation, using the zip codes that were identified for each category.

The logical way to do this in Stata is to write a command that looks like this:

gen aimd5 = 1 if zip=="12345" | zip=="82456" | zip=="56234" | zip==..............
replace aimd5 = 2 if zip=="75687" | zip=="45688" | zip=="95689" | zip==..............
replace aimd5 = 3 if zip=="14687" | zip=="34687" | zip=="64687" | zip==..............
replace aimd5 = 4 if zip=="54687" | zip=="54645" | zip=="54687" | zip==..............
replace aimd5 = 5 if zip=="64687" | zip=="78987" | zip=="21387" | zip==..............

There would be about 8000 "or" conditions. The problem is that Stata doesn't allow this many "or" conditions in one statement.
This command works fine if I only put in 10 zips or so.

Is there any way for me to perform this operation?
Kind regards
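
One way around the limit on "or" conditions is to hold the zip-to-category mapping in its own dataset and merge it onto the households. The lines below are only a minimal sketch under assumptions: the variable names hhid, zip and aimd5, the handful of zip codes, and the category numbers are made up for illustration, and in practice the lookup would be imported from wherever the 8000 zip codes are stored.

```stata
* Hypothetical lookup table: one row per zip code with its deprivation category
clear
input str5 zip byte aimd5
"12345" 1
"82456" 1
"75687" 2
"45688" 2
end
tempfile lookup
save `lookup'

* Hypothetical household data (stand-in for the real file)
clear
input int hhid str5 zip
1 "12345"
2 "45688"
3 "99999"
end

* Many households per zip, one lookup row per zip
merge m:1 zip using `lookup', keep(master match) nogenerate
list
```

Households whose zip code is not in the lookup keep a missing aimd5, which makes unmatched zip codes easy to spot. Because the sketch uses a tempfile, it has to be run in one go from a do-file.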

↧

Difference between -margins, dydx(categorical_variable)- and -margins r.i.categorical_variable- & Margins for multilevel mixed-effects logit

February 21, 2020, 6:16 am

Hello everyone,

I am conducting a random-effects logistic regression (-xtlogit-) for a panel-like data structure (-xtset subject period-), in which I have 20 observations per subject. The dependent variable is binary (behavior: 0/1). Between subjects, I have five treatment groups and one control group (treatment: categorical 1/6). Furthermore, I have a dispositional personality feature of subjects from a questionnaire (dis: quasi-continuous 1/6).

I want to "see" the effect of this dispositional personality feature on the behavior in the control group as well as the treatment effect and its interaction with this dispositional feature.

First, I ran a RE logistic regression and found an effect of some treatments compared to the control group. However, there is no effect of dis overall.

Code:

xtset subject period
xtlogit behavior i.treatment c.dis, cluster(subject)

Second, I ran a RE logistic regression and found a main and interaction effect of some treatments compared to the control group. I also find an effect of dis in the control group.

Code:

xtset subject period
xtlogit behavior i.treatment##c.dis, cluster(subject)

Third, to visualize the effect of dis in the control group I did the following:

Code:

margins 1.treatment, at(dis(1(1)6))

Fourth, to visualize the interaction effects of treatments and dis I did the following:

Code:

margins, dydx(2.treatment) at(dis(1(1)6))
margins, dydx(3.treatment) at(dis(1(1)6))
margins, dydx(4.treatment) at(dis(1(1)6))
margins, dydx(5.treatment) at(dis(1(1)6))
margins, dydx(6.treatment) at(dis(1(1)6))

I used marginsplot to visualize the results. However, I wonder whether the following command is better or worse. It yields almost, but not 100% the same results:

Code:

margins r.i.treatment_split if treatment==1 | treatment==2, at(dis(1(1)6))
margins r.i.treatment_split if treatment==1 | treatment==3, at(dis(1(1)6))
margins r.i.treatment_split if treatment==1 | treatment==4, at(dis(1(1)6))
margins r.i.treatment_split if treatment==1 | treatment==5, at(dis(1(1)6))
margins r.i.treatment_split if treatment==1 | treatment==6, at(dis(1(1)6))

Do you have any comments regarding my analysis and/or which command (-margins, dydx()- versus -margins r... if,-) you prefer for analyzing the marginal effects at representative values?

--

Also, I include this question here as it is based on the same case: In addition to what I described above, I also have a nested data structure. That's why I created a multilevel mixed-effects logistic regression (-melogit-) to back up my results.

Code:

melogit behavior i.treatment c.dis, || session: || group: || subject:
melogit behavior i.treatment##c.dis, || session: || group: || subject:

However, I was unable to find an "easy" way to replicate my approach from above (marginal effect at representative values in the control group and for treatment groups in contrast to the control group) with a multilevel mixed-effects logistic regression. Do you have any source where I can find an approach of how to do and interpret it?

--

Thanks a lot for any help you can provide!
Kim

↧

Difference-in-difference

February 21, 2020, 6:25 am

Hello everyone,

We are investigating the effects of a newly implemented law on the financial markets, and we are going to use a difference-in-differences approach. FYI, we are beginners in Stata, so please excuse us if our questions are very basic.

Our treatment group is: investment banks in the EU.
Control group: investment banks in the US.
Pre period: from Q1 2014 to Q4 2017
Post period: Q1 2018 to Q3 2019

We have panel data that looks like this (numbers in millions)

[attached image of the data not preserved]

Should we use xtreg or reg when we want to include the individual effects of each country? I have coded country as "encode values labels from string variable".
Would this be an appropriate code for DID?
Code: reg netinc treat post interaction

In order to create panel data we have done the following:
xtset name quarter
panel variable: name (strongly balanced)
time variable: quarter, 1 to 23
delta: 1 unit
Does this make sense?

Another question. We want to include the variable "age" which is years since the company's IPO, but as you can see we only have age listed once in the first period for each company. It is the age of each company in 2019 and we want stata to understand that the age is not connected to quarters. How can we do this?
Any help would be greatly appreciated!

Best regards
Ida
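
A minimal sketch of how such a setup is often coded, on simulated data. Everything below is an assumption for illustration: the 20 firms, the variable names firm, treat, post, age and netinc, and the data-generating numbers are made up; only the 23 quarters and the pre/post split follow the post.

```stata
* Simulated panel: 20 hypothetical firms observed over 23 quarters
clear
set seed 12345
set obs 20
generate firm = _n
generate byte treat = firm > 10        // firms 11-20 stand in for the EU banks
generate age2019 = 5 + firm            // made-up age, known once per firm
expand 23
bysort firm: generate quarter = _n
generate byte post = quarter >= 17     // quarters 17-23 stand for Q1 2018 to Q3 2019

* Age recorded only in the first quarter, as in the post, then copied down
generate age = age2019 if quarter == 1
bysort firm (quarter): replace age = age[1]

generate netinc = 100 + 5*treat + 3*post + 10*treat*post + rnormal(0, 5)
xtset firm quarter

* Pooled OLS: the coefficient on 1.treat#1.post is the DiD estimate
regress netinc i.treat##i.post, vce(cluster firm)

* Firm fixed effects: 1.treat does not vary within firm and is omitted
xtreg netinc i.treat##i.post, fe vce(cluster firm)
```

The factor-variable notation i.treat##i.post creates the interaction on the fly, so a hand-made interaction variable is not needed. A firm-level variable such as age in 2019 is constant within firm, so it can enter the pooled regression but is absorbed by the firm fixed effects.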

↧

Survival analysis t, t0, _d missing at baseline

February 21, 2020, 6:38 am

Dear Stata community,

When I declare the data to be survival data, I do not know how to specify that time0 is pouchage_days = 0, so that the first observation for each patient holds the baseline covariates (thus it should have _t0 = 0 and _t equal to the next observation time in days for each patient). In the example, however, there are no entries for _d, _t, and _t0 in the baseline row. I do not know why this is; please help me populate these correctly when declaring the survival settings in Stata.

This is the command I have used.

stset pouchage_days, id(id) failure(advanced_adenoma==3) origin(observation==1) scale(1)

Thank you so much for your help.

BW

Roshani

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float observation int id float(advanced_adenoma pouchage_days) byte(_st _d) int(_t _t0)
 1 1 1    0 0 .    .    .
 2 1 2 2671 1 0 2671    0
 3 1 1 3028 1 0 3028 2671
 4 1 1 3518 1 0 3518 3028
 5 1 2 3882 1 0 3882 3518
 6 1 2 4246 1 0 4246 3882
 7 1 . 4284 1 0 4284 4246
 8 1 2 4470 1 0 4470 4284
 9 1 2 4815 1 0 4815 4470
10 1 . 4855 1 0 4855 4815
11 1 2 5051 1 0 5051 4855
12 1 2 5219 1 0 5219 5051
13 1 2 5401 1 0 5401 5219
14 1 2 5590 1 0 5590 5401
15 1 2 5954 1 0 5954 5590
16 1 1 6311 1 0 6311 5954
17 1 2 6675 1 0 6675 6311
18 1 . 6864 1 0 6864 6675
19 1 1 7305 1 0 7305 6864
20 1 2 7753 1 0 7753 7305
21 1 3 8614 1 1 8614 7753
22 1 2 9092 0 .    .    .
23 1 1 9387 0 .    .    .
 1 2 1    0 0 .    .    .
 2 2 1 1033 1 0 1033    0
 1 3 1    0 0 .    .    .
 2 3 .  643 1 0  643    0
 3 3 1 1585 1 0 1585  643
 4 3 1 2088 1 0 2088 1585
 5 3 1 2684 1 0 2684 2088
 6 3 1 3422 1 0 3422 2684
 7 3 1 3783 1 0 3783 3422
 8 3 1 4149 1 0 4149 3783
 9 3 1 4513 1 0 4513 4149
10 3 1 5290 1 0 5290 4513
11 3 2 5794 1 0 5794 5290
12 3 2 6620 1 0 6620 5794
13 3 2 7208 1 0 7208 6620
14 3 1 7464 1 0 7464 7208
15 3 2 7691 1 0 7691 7464
 1 4 1    0 0 .    .    .
 2 4 1 5085 1 0 5085    0
 3 4 1 6198 1 0 6198 5085
 4 4 1 6609 1 0 6609 6198
 5 4 2 6994 1 0 6994 6609
 6 4 1 7203 1 0 7203 6994
 7 4 1 7609 1 0 7609 7203
 8 4 2 8095 1 0 8095 7609
 1 5 1    0 0 .    .    .
end

------------------ copy up to and including the previous line ------------------

↧

Creating a margins plot with confidence intervals

February 21, 2020, 6:57 am

Using the marginsplot command, I'm able to create a graph that looks like this:

[attached image not preserved]

However, is there a way in Stata to create a graph that looks like this?

[attached image not preserved]

↧

About boxplot

February 21, 2020, 7:05 am

Hi,
I want to make a box plot that clearly shows the mean, but I have learned that the command graph box only shows the median of a variable in the box. Is there any way to add the mean to the box plot?

↧

help with overlapping scatter plots

February 21, 2020, 7:07 am

Dear All,
I am using the code below to create multiple overlapping scatter plots, each including fitted lines. I am creating the scatter plots for men and women, for different dependent variables and the same independent variable. When a plot is created, I don't know which color is for men and which one is for women. What should I add to the code to label the colors by gender?

thanks in advance

Code:

local j = 1
local names
foreach v of varlist bmi_w waistcm_w hipcm_w whratio_w fatmass_w  fmi_w fat_w bai_w vat_w ffm_w bodyweighkg_w {
     twoway (scatter `v' psaaa_w if sex==0)(scatter `v' psaaa_w if sex==1) (lfit `v' psaaa_w if sex==0)(lfit `v' psaaa_w if sex==1), name(graph`j')
     local names `names' graph`j'
     local ++j
}

graph combine `names'

stata 15.1 mac
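
A minimal sketch of the usual way to label overlaid plots: the legend(order()) option of twoway, which assigns text to each plot in the order the plots are listed. The example uses Stata's shipped auto dataset rather than the poster's data, so the variables price, mpg and foreign are stand-ins; in the loop above the same option would go after the comma, next to name().

```stata
* Shipped example data; foreign (0/1) plays the role of sex
sysuse auto, clear

* Plots are numbered in the order they are typed: 1 and 2 are the
* scatters, 3 and 4 are the fitted lines
twoway (scatter price mpg if foreign==0) ///
       (scatter price mpg if foreign==1) ///
       (lfit price mpg if foreign==0)    ///
       (lfit price mpg if foreign==1),   ///
       legend(order(1 "Domestic" 2 "Foreign" 3 "Fit, domestic" 4 "Fit, foreign"))
```

Which label belongs to sex==0 and which to sex==1 depends on how sex is coded in the data, so that has to be checked before the text is filled in.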

↧

how to calculate buy_and_hold returns, event study

February 21, 2020, 7:11 am

I am trying to calculate the buy and hold returns for an event study. The window is -210, -11. Data is from CRSP.

I tried the two pieces of code below. The first gave only "." in the returnproduct variable, and the second gave the error "not sorted".

Does someone know how to put in the code the estimation window and help me with this?

I have a dataset of CRSP and T1 combined. I have duplicates of 302 for each event id because I had to calculate cumulative abnormal returns over the event window (-2,+2) and estimation window (-91,-300).

Code:

bysort adate_and_cusip: gen mrkt_ret_1 = mrkt_ret+1
egen double returnproduct = total(ln(mrkt_ret_1))
replace returnproduct = exp(returnproduct)

Code:

bysort adate_and_cusip: gen mrkt_ret_1 = mrkt_ret+1
gen double returnproduct = 1 if _n == 1
bysort adate_and_cusip: replace returnproduct = L.returnproduct*mrkt_ret_1
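
A minimal sketch of one way to compound returns within each event over a fixed window, using sums of logs. The toy data and the names event_id, rel_day (trading day relative to the event) and ret are assumptions for illustration; the window (-210, -11) is the one mentioned above.

```stata
* Toy data: two hypothetical events with a few daily returns each
clear
input byte event_id int rel_day double ret
1 -211  0.01
1 -210  0.02
1  -11 -0.01
1  -10  0.05
2 -210  0.03
2 -100  0.01
end

* Log gross return, set only for days inside the window
generate double ln1p = ln(1 + ret) if inrange(rel_day, -210, -11)

* egen total() skips missing values, so days outside the window drop out
bysort event_id: egen double sumln = total(ln1p)

* Buy-and-hold return over the window, repeated on every row of the event
generate double bhr = exp(sumln) - 1
list
```

The by prefix on egen is what keeps the product within one event; without it the total runs over the whole dataset.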

↧

Paired data match - customer supplier pairs

February 21, 2020, 7:25 am

Dear stata users and experts,

Below I display the structure of my two datasets.
Data 1 reports how much a supplier sells to one or more of its customers in year t.
Data 2, the master file, reports each firm's financial data (e.g. annual sales). The master file has firm-year observations; that is, customers and suppliers both show up in Data 2. The data source I retrieve data from could have missing records for customers, so not all customers may show up in Data 2 in my real dataset.

Goal: merge Data 1 into Data 2. I need to account for the fact that one customer buys from one or multiple suppliers and one supplier sells to multiple customers.

Data 1:
Firms in the market buy and sell to one another. My data report the customer identifier (customerid) and how much it purchases from the corresponding supplier (supplierid) in that year.
In my real data, some customers buy from one supplier and some from multiple suppliers. Similarly, some suppliers have one customer and some have more than one.
saletocustomer indicates how much supplierid (e.g. 201) sells to this customer (e.g. id 100) in year 1990.
Data 2:
id is the firm id; it could be a customer or a supplier in Data 1. sales is not the same variable as saletocustomer in Data 1. This dataset could have observations that do not appear in Data 1, e.g. id 103.
My real data have over 10000 panel observations.

Can someone help me with merging Data 1 and 2, if I want to analyze
1. The paired data subsample, i.e. both customerid and supplierid have a match in Data 2
2. Customerid data, i.e. as long as customerid is matched with Data 2, keep it; otherwise, delete it.

Thank you,

Code:

clear
input int(year customerid supplierid) byte saletocustomer
1990 100 201 50 
1990 101 201 100 
1990 101 202 50 
1991 101 201 30 
1991 100 200 40 
1992 100 202 30 
end

Code:

clear
input int(year id sales)
1991 200 100 
1990 201 150 
1991 201 200 
1990 202 80 
1992 202 80 
1990 100 120 
1991 100 110 
1992 100 110 
1990 101 80 
1991 101 100 
1990 103 110 
1991 103 110 
end
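
A minimal sketch of one possible approach: keep Data 1 as the base and merge the firm-year file onto it twice, once on the customer id and once on the supplier id. It uses a subset of the example rows above; the names customer_sales, supplier_sales, customer_merge, supplier_merge and both_matched are made up for illustration.

```stata
* Data 2 (firm-year file), a subset of the example rows
clear
input int(year id sales)
1991 200 100
1990 201 150
1991 201 200
1990 202 80
1990 100 120
1991 100 110
1990 101 80
end
tempfile firms
save `firms'

* Data 1 (customer-supplier pairs), a subset of the example rows
clear
input int(year customerid supplierid) byte saletocustomer
1990 100 201 50
1990 101 202 50
1991 100 200 40
1992 100 202 30
end

* First pass: attach the customer's own sales
rename customerid id
merge m:1 year id using `firms', keep(master match) keepusing(sales)
rename (id sales _merge) (customerid customer_sales customer_merge)

* Second pass: attach the supplier's own sales
rename supplierid id
merge m:1 year id using `firms', keep(master match) keepusing(sales)
rename (id sales _merge) (supplierid supplier_sales supplier_merge)

* Sample 1: both sides found; sample 2: customer found
generate byte both_matched = customer_merge == 3 & supplier_merge == 3
list
```

For the second sample, keep if customer_merge == 3 would retain every pair whose customer has a firm-year record. The m:1 merge assumes that year and id together identify a single row in the firm-year file.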

↧

How to drop observations from large dataset

February 21, 2020, 7:30 am

Hi everyone,

I am using the Nationwide Inpatient Sample dataset and I am interested only in patients with STEMI. After defining STEMI as a variable, how do I remove the rest of the observations (rows), which I will not be using for my analysis, from the dataset?
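
A minimal sketch, assuming the indicator is a 0/1 variable called stemi (a made-up name, as are the toy rows below):

```stata
* Toy data: stemi is a hypothetical 0/1 indicator, missing for one patient
clear
input int patient byte stemi
1 1
2 0
3 1
4 .
end

* Keep only the STEMI rows; rows with stemi equal to 0 or missing are dropped
keep if stemi == 1
count
```

The change only affects the data in memory until it is saved, so saving under a new file name leaves the full dataset untouched on disk.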

↧

Bootstrapping error: Too few observations?

February 21, 2020, 8:21 am

Code:

. *Sensitivity check using Bootstrapping
. global x Family_Firm_Identifier

. global y ROA

.  
. global m Premium1DW

.  
. capture program drop bootm3

. program bootm3, rclass
  1.                
.                 global controls FirmAge FirmSize Growthopp Indebtedness i.SIC_2 i.Fiscal_Year
  2.                 reg $m $x $controls, vce(cluster GVKEY)
  3.                 matrix path1 = e(b)
  4.                 di path1[1,1]
  5.                 global bx1 path1[1,1]
  6.                                
.                 global controls FirmAge FirmSize Growthopp Indebtedness i.SIC_2 i.Fiscal_Year
  7.                 reg $y $x $m $controls, vce(cluster GVKEY)
  8.                 matrix path2 = e(b)
  9.                 di path2[1,2]
 10.                 global bm1 path2[1,2]  
 11.  
.                 return scalar ciemn1 = $bx1*$bm1
 12.  
. end

. bootstrap r(ciemn1), level(95) reps(500) seed(1): bootm3
(running bootm3 on estimation sample)

Bootstrap replications (500)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx    50
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx   100
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx   150
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx   200
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx   250
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx   300
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx   350
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx   400
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx   450
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx   500
insufficient observations to compute bootstrap standard errors
no results will be saved
r(2000);

Dear all, I am running bootstrapping code in order to determine the statistical significance of my indirect effect in a mediation analysis. The underlying model is a pooled OLS model in this case (I also run random effects with clustered standard errors, but I think this should not matter at this point). My dataset is a panel dataset with 1777 observations over 22 years.
In this example I have reduced the replications to 500; normally I use 5000. Nevertheless, how come I get no results at all? As a potentially helpful aside, the mediator, Premium1DW, has very few observations (approx. 100). When I run the same mediation with another mediator that has 800 observations, out of 5000 replications approx. 2000 are drawn successfully. Thank you in advance for any advice / ideas / help.

↧

Livingston - Lewis (1995) method

February 21, 2020, 8:40 am

Hello everyone,
I am working on getting the decision accuracy and decision consistency for an exam result (final scores).
What we have in the data set are ID, final score, and passing score (cutoff). The reliability coefficient alpha has already been provided.
It seems that the Livingston-Lewis method can help get the result.
I was wondering whether there is some Stata code I can run in order to get the results.
Or is there a clear formula to use?
Thank you very much in advance for the help; I really appreciate it.

↧

auditor tenure

February 21, 2020, 8:47 am

Hi All

I kindly request your help below.

My data consists of 'clientid' (client), 'auditorid' (auditor), year and 'tenure' (number of years auditor has audited client).

1. I would like to create new_var1 as shown below. new_var1 is such that, for each client, if the last observation of tenure for an auditor is greater than or equal to 5, new_var1 receives an indicator of 1, otherwise 0.

2. new_var2 is such that, for each client, if an auditor has new_var1 = 1, then the next auditor (if there is one) will have an indicator of 1, otherwise 0, as shown below.

3. new_var3 is such that, for each client, if the last observation of tenure for that auditor is less than 5 and there is a new auditor the following year for that same client, new_var3 receives an indicator of 1, otherwise 0, as shown below.

4. new_var4 is such that, for each client, if an auditor has new_var3 = 1, then the next auditor (if there is one) will have an indicator of 1, otherwise 0, as shown below.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input byte clientid int(auditorid year) byte(tenure new_var1 new_var2 new_var3 new_var4)
 2  778 2009 1 0 0 0 0
 2  778 2010 2 0 0 0 0
 2  778 2011 3 0 0 0 0
 2  778 2012 4 0 0 0 0
 2  778 2013 5 0 0 0 0
 2  778 2014 6 1 0 0 0
 2  628 2015 1 0 1 0 0
 2  628 2016 2 0 0 0 0
 2  628 2017 3 0 0 0 0
 3  801 2013 1 0 0 0 0
 3  801 2014 2 0 0 0 0
 3  801 2015 3 0 0 0 0
 3  801 2016 4 0 0 0 0
 3  801 2017 5 1 0 0 0
 4  888 2010 1 0 0 1 0
 4 1073 2011 1 0 0 1 1
 4  875 2012 1 0 0 0 1
 4  875 2013 2 0 0 0 0
 4  875 2014 3 0 0 0 0
 4  875 2015 4 0 0 0 0
 4  875 2016 5 0 0 0 0
 4  875 2017 6 1 0 0 0
 5  810 2010 1 0 0 0 0
 5  810 2011 2 0 0 0 0
 5  810 2012 3 0 0 1 0
 5  581 2013 1 0 0 0 1
 5  581 2014 2 0 0 1 0
 5  618 2015 1 0 0 0 1
 5  618 2016 2 0 0 0 0
 5  618 2017 3 0 0 0 0
 6  971 2011 1 0 0 0 0
 6  971 2012 2 0 0 0 0
 6  971 2013 3 0 0 0 0
 6  971 2014 4 0 0 0 0
 6  971 2015 5 0 0 0 0
 6  971 2016 6 1 0 0 0
 6  254 2017 1 0 1 0 0
 8  638 2010 1 0 0 0 0
 8  638 2011 2 0 0 0 0
 8  638 2012 3 0 0 0 0
 8  638 2013 4 0 0 0 0
 8  638 2014 5 0 0 0 0
 8  638 2015 6 0 0 0 0
 8  638 2016 7 1 0 0 0
10  495 2010 1 0 0 0 0
10  495 2011 2 0 0 0 0
10  495 2012 3 0 0 0 0
10  495 2013 4 0 0 0 0
10  495 2014 5 0 0 0 0
10  495 2015 6 0 0 0 0
10  495 2016 7 1 0 0 0
13  156 2012 1 0 0 1 0
13  475 2013 1 0 0 0 1
13  475 2014 2 0 0 0 0
13  475 2015 3 0 0 0 0
13  475 2016 4 0 0 0 0
13  475 2017 5 1 0 0 0
14  898 2012 1 0 0 0 0
14  898 2013 2 0 0 0 0
14  898 2014 3 0 0 0 0
14  898 2015 4 0 0 0 0
14  898 2016 5 1 0 0 0
14  482 2017 1 0 1 0 0
15  337 2011 1 0 0 1 0
15  400 2012 1 0 0 0 1
17  978 2009 1 0 0 0 0
17  978 2010 2 0 0 0 0
17  978 2011 3 0 0 0 0
17  978 2012 4 0 0 0 0
17  978 2013 5 1 0 0 0
17  699 2014 1 0 1 0 0
17  699 2015 2 0 0 0 0
17  699 2016 3 0 0 0 0
17  699 2017 4 0 0 0 0
18   45 2010 1 0 0 0 0
18   45 2011 2 0 0 0 0
18   45 2012 3 0 0 0 0
18   45 2013 4 0 0 0 0
18   45 2014 5 1 0 0 0
18  139 2015 1 0 1 0 0
18  139 2016 2 0 0 1 0
18  773 2017 1 0 0 0 1
19  743 2011 1 0 0 0 0
19  743 2012 2 0 0 0 0
19  743 2013 3 0 0 0 0
19  743 2014 4 0 0 0 0
19  743 2015 5 1 0 0 0
19  805 2016 1 0 1 0 0
19  805 2017 2 0 0 0 0
20  116 2011 1 0 0 0 0
20  116 2012 2 0 0 0 0
20  116 2013 3 0 0 0 0
20  116 2014 4 0 0 0 0
20  116 2015 5 1 0 0 0
20 1038 2016 1 0 1 0 0
20 1038 2017 2 0 0 0 0
end
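
A minimal sketch of how the four rules could be coded with by-group subscripts, on a few rows taken from the example above. The helper variables last_of_spell and first_of_spell are made up for illustration, and the sketch reads "next auditor" as the first year of the auditor who follows.

```stata
* A few rows from the example, without the hand-coded indicators
clear
input byte clientid int(auditorid year) byte tenure
2  778 2013 5
2  778 2014 6
2  628 2015 1
2  628 2016 2
4  888 2010 1
4 1073 2011 1
4  875 2012 1
4  875 2013 2
end
sort clientid year

* Last year of an auditor's spell: final row for the client, or a
* different auditor in the next row
by clientid: generate byte last_of_spell = (_n == _N) | (auditorid != auditorid[_n+1])
* First year of a new auditor: not the client's first row, auditor changed
by clientid: generate byte first_of_spell = (_n > 1) & (auditorid != auditorid[_n-1])

* Rule 1: the spell ends with tenure of at least 5
generate byte new_var1 = last_of_spell & tenure >= 5
* Rule 2: new auditor directly after a row flagged by rule 1
by clientid: generate byte new_var2 = first_of_spell & new_var1[_n-1] == 1
* Rule 3: the spell ends with tenure below 5 and a new auditor the next year
by clientid: generate byte new_var3 = last_of_spell & tenure < 5 & ///
    year[_n+1] == year + 1 & auditorid != auditorid[_n+1]
* Rule 4: new auditor directly after a row flagged by rule 3
by clientid: generate byte new_var4 = first_of_spell & new_var3[_n-1] == 1
list, sepby(clientid)
```

Within each client, a subscript that points past the first or last row returns missing, so the comparisons involving the neighbouring row evaluate to false at the edges and the indicator stays 0 there.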

↧

Reconciling coefficients

February 21, 2020, 8:48 am

Good afternoon,

I have a conceptual question. I have a panel dataset and assume that I run a regression of one variable on the other (both in logs) and I include several sets of fixed effects. Due to the large number of fixed effects I use reghdfe. Now assume that I run the same regression, but I interact my explanatory variable with a dummy variable for three different groups, and therefore get a coefficient for each of the three interacted variables. Why is it that if I calculate a weighted average of the three coefficients (using the number of observations for each group) I do not get the same magnitude as the coefficient that I get if I do not interact my explanatory variable?

The same applies if in the original regression without interactions I further control for fixed effects for the three groups.

Let me know if this is clear or not, otherwise I can explain further.

many thanks, Michelle.

↧

Using python integration to parse a strL variable

February 21, 2020, 9:05 am

I am trying to parse a strL variable named "X" in Stata. One of the first steps I need to complete is to remove all characters in the strL variable that fall between the characters "<" and ">". I have attempted using regular expressions via the command:

Code:

replace X = regexr(X, "\<(.)+\>", "")

but this crashes Stata - I suspect because the strL variable X can be very long with lots of text falling between "<" and ">". Sometimes there are hundreds of separate "< text that needs to be removed >" occurrences in a single observation's value of X.

I thought that perhaps I could use Stata's Python integration to load the strL variable X into Python, remove all the text in X between the characters "<" and ">", and return the modified strL variable X back to Stata for further parsing using Stata's excellent substring functions, with which I am already quite familiar. The problem is that I don't have much familiarity with Python, and it appears that working with strL variables in Python is a bit complicated. Whereas I can easily load a str variable from Stata into Python using the sfi module, loading a strL variable seems to work differently, for reasons I don't fully understand. I am looking for any advice that might be helpful in this task, whether it be in native Stata (perhaps there is some other approach for removing the unwanted text that won't crash Stata) or through Python integration.

Thanks in advance!
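
A minimal sketch of a native Stata alternative, assuming Stata 14 or later, where the Unicode regular-expression functions are available. ustrregexra() replaces every match rather than only the first, and the pattern <[^>]*> matches one "<...>" chunk at a time instead of spanning from the first "<" to the last ">". The toy string is made up, and how this behaves on very long strL values has not been checked here.

```stata
* Toy data: one observation with two bracketed chunks to remove
clear
set obs 1
generate strL X = "keep <drop this> keep too <and this> end"

* Remove every "<...>" chunk; [^>]* cannot run past the closing ">"
replace X = ustrregexra(X, "<[^>]*>", "")
list
```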

↧

eaalogit to account for attribute non-attendance

February 21, 2020, 9:07 am

Hello

I am trying to account for attribute non-attendance for my choice experiment via the eaalogit command by Hole, Arne, (2016), EAALOGIT: Stata module to estimate endogenous attribute attendance models, https://EconPapers.repec.org/RePEc:boc:bocode:s457903.

I am using the following code:

Code:

matrix b0 = e(b)  // after I estimated my latent class model via lclogit
eaalogit chosen fpr fsec ftert mnopr mpr msec mtert fnopr_nch fpr_nch fsec_nch ftert_nch mnopr_nch mpr_nch msec_nch mtert_nch fnopr_nch2 fpr_nch2 fsec_nch2 ftert_nch2 mnopr_nch2 mpr_nch2 msec_nch2 mtert_nch2, group(choicecard) id(respondent) keaa(23) eaaspec(x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 x12 x13 x14 x15 x16 x17 x18 x19 x20 x21 x22 x23) from(b0)

I tried this on three versions of Stata. However, I get a different error message on each version:
Version 14
initial vector: extra parameter choice1:fpr found
specify skip option if necessary
r(111);
Version 15
Some variables are collinear - if this is intended use the coll option.
(However, I can't find anything online about how to use this option.)
Version 16
convergence not achieved
r(430);

Anyone who has an idea how to solve this?

Thank you very much in advance!

Kind regards

Eva

↧

How to do wald test of LR statistic of rho when estimating a logistic random effect panel data model with robust standard errors?

February 21, 2020, 9:09 am

I am new to Stata and also new to Statalist, and my statistical background is only moderate. I'm currently running a logistic regression on panel data in Stata 16. I have read that the LR test of rho is an indicator for choosing between the random-effects and the pooled model for logistic panel data. My first question is: why does the LR test of rho disappear every time I use robust standard errors in the random-effects logistic panel data model? And how can I then determine which model is best suited, random effects or pooled?
Second, I would like to know how to determine which logistic panel data model is best, fixed effects or pooled. Is it true that a Wald test can be used as a basis for choosing between fixed-effects logit and the pooled model? Thank you so much for the answers.

↧

Removing the last digit of a variable with unequal length

February 21, 2020, 9:53 am

Hi Statalist!

I have a variable (int format) and I would like to create another variable that removes the last digit of said variable.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input int ward
141
 31
171
133
173
311
381
 91
101
291
101
111
113
123
243
 91
103
132
142
 13
 13
 21
103
 61
121
141
 41
 73
143
233
 21
141
311
 41
103
132
151
271
 41
 83
121
 91
341
241
 22
101
 31
171
193
111
 83
341
181
 83
101
 53
 71
 22
 93
 93
193
 83
 91
111
 62
123
 51
131
101
123
123
 71
 12
 42
 42
 62
 72
 82
 92
 92
112
152
232
242
242
 62
102
202
 12
102
132
182
192
212
222
 11
 61
 81
161
183
end

So if ward is equal to 141, I would like the new variable to be 14. If ward is equal to 92, I would like the new variable to equal 9.
The ward variable is either 2 digits or 3 digits long.

Any tips on how I can do this?
Do I have to make ward into a string variable and then use subinstr() or substr()?

Many thanks!
Kevin
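
A minimal sketch of a purely numeric route, which avoids the string conversion: integer-dividing by 10 drops the last digit whatever the length. The new variable name ward_short is made up for illustration.

```stata
* A few of the example values
clear
input int ward
141
31
92
13
end

* floor(ward/10) drops the last digit: 141 becomes 14, 92 becomes 9
generate int ward_short = floor(ward/10)
list
```

If the dropped digit is also needed, mod(ward, 10) returns it.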

↧