Hi there, I was hoping you could help me estimate the sample size using either the -power- or -ciwidth- command.
I have pilot data (N=17) consisting of analyte concentrations from matched samples collected via two different methods, methodA and methodB. It contains three variables: id, methodA, and methodB. I've fitted a simple linear regression model to map the methodB measurements onto the methodA scale; the results are as follows:
Code:
. regress methodA methodB

      Source |       SS           df       MS      Number of obs   =        17
-------------+----------------------------------   F(1, 15)        =   2849.23
       Model |  12.3342823         1  12.3342823   Prob > F        =    0.0000
    Residual |  .064934729        15  .004328982   R-squared       =    0.9948
-------------+----------------------------------   Adj R-squared   =    0.9944
       Total |   12.399217        16  .774951065   Root MSE        =    .06579

------------------------------------------------------------------------------
     methodA | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
     methodB |   1.000668   .0187468    53.38   0.000     .9607106    1.040626
       _cons |   .1832411   .0526242     3.48   0.003     .0710754    .2954069
------------------------------------------------------------------------------
I want to refine the model using an additional, external dataset of matched samples, which also contains data on the age of the sample. I have no indication of whether age will have an effect, but I want to include both age and the interaction term (methodB*age) in the model to find out. The final model I generate will be applied to a much larger dataset of unmatched samples. The pilot dataset is not representative of the distribution of sample age in the additional training dataset or in the wider, unmatched dataset, and I do not yet have access to the additional dataset.
As well as serving as training data to refine the model and allow the inclusion of the age and interaction predictors, I want to use this additional dataset to validate the model; I plan to perform k-fold cross-validation, along the lines of the sketch below.
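For context, the refit and cross-validation I have in mind for the additional dataset would look roughly like the following. The factor-variable interaction is how I intend to specify methodB*age; the seed, the fold assignment, and k = 5 are placeholders rather than final choices.
Code:
* rough sketch of 5-fold cross-validation for the refined model; the seed,
* fold assignment, and k = 5 are placeholders, not final choices
set seed 12345
generate double u = runiform()
sort u
generate byte fold = mod(_n, 5) + 1          // assign observations to 5 folds
generate double sqerr = .
forvalues k = 1/5 {
    quietly regress methodA c.methodB##c.age if fold != `k'
    quietly predict double yhat if fold == `k'
    quietly replace sqerr = (methodA - yhat)^2 if fold == `k'
    drop yhat
}
quietly summarize sqerr
display "Cross-validated RMSE: " sqrt(r(mean))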
I want to estimate how many samples to include in order to: 1) ensure I have enough power to detect significant relationships between methodA and age / the interaction term; and 2) limit the prediction error to X, where X is a defined value (which I am also unsure how to set).
I read that for a training dataset, sample size should be based on the effect sizes of the predictors (which I have calculated for methodB), whereas for a test dataset, sample size should be based on the magnitude of the prediction error we are willing to detect and the variance of the prediction errors (which I can estimate for methodB). I would be grateful for any general advice on whether this sounds correct. Also, as I only have preliminary data for methodB and not for age or their interaction, how best should I estimate the total effect size, especially given the magnitude of methodB's effect (Cohen's f2; see the code below, and the incremental-f2 sketch that follows it)?
Code:
. ** estimate effect size for methodB
. local r2 : di e(r2)
. local f2methodB = `r2'/(1-`r2')
. di "Cohen's f2: `f2methodB'"
Cohen's f2: 189.9490166125627
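For the added terms, I assume the relevant quantity is the incremental Cohen's f2, (R2_full - R2_reduced)/(1 - R2_full). With a hypothetical full-model R-squared (the 0.9958 below is made up, since I have no pilot estimate for age or the interaction), the calculation would be:
Code:
* incremental f2 for adding age and methodB#age; 0.9958 is a hypothetical
* full-model R-squared, 0.9948 is the reduced (methodB-only) pilot value
local r2full = 0.9958
local r2red  = 0.9948
local f2added = (`r2full' - `r2red')/(1 - `r2full')
di "incremental Cohen's f2: `f2added'"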
I originally set out using the -power- command, as below. However, I was unsure how to specify the number of predictors or the effect size, and how to incorporate age or the interaction term into the estimate of total effect size. I also wondered whether I should instead set the effect size to the smallest estimated effect size among the predictors, e.g. Cohen's f2 = 0.02, to ensure a sufficient sample size to detect an effect that small.
Code:
. power rsq 0.9948, ntested(3)

Performing iteration ...

Estimated sample size for multiple linear regression
F test for R2 testing all coefficients
H0: R2_T = 0  versus  Ha: R2_T != 0

Study parameters:

        alpha =    0.0500
        power =    0.8000
        delta =  191.3077
         R2_T =    0.9948
      ntested =         3

Estimated sample size:

            N =         6
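If I have read the -power rsq- documentation correctly, it can also take full- and reduced-model R-squared values to size an F test of a subset of coefficients, which seems closer to what I need for age and the interaction. Would something like the following be the right approach? (The full-model R-squared is again a made-up placeholder, and I am assuming the full-model value is listed first.)
Code:
* hypothetical full-model R2 (methodB, age, methodB#age) versus the pilot
* reduced-model R2 (methodB only); 2 tested covariates, 1 control covariate
power rsq 0.9958 0.9948, ntested(2) ncontrol(1)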
Then I decided -ciwidth- may be a better option, as it may be more appropriate for sample size determination for model validation. I wrote the following code, but wondered whether there is a way of specifying a regression with multiple predictors, as with the -power- command?
Code:
. quietly regress methodA methodB

. local r = sqrt(e(r2))    // correlation coefficient between methodA and methodB

. su methodA

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
     methodA |         17    2.859964     .880313   1.539736   4.555959

. local asd = r(sd)

. su methodB

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
     methodB |         17    2.674935    .8774185   1.396721   4.384808

. local bsd = r(sd)

. ciwidth pairedmeans, sd1(`asd') sd2(`bsd') corr(`r') probwidth(0.95)
>     width(0.1)
Performing iteration ...

Estimated sample size for a paired-means-difference CI
Student's t two-sided CI

Study parameters:

        level =   95.0000          sd1 =    0.8803
     Pr_width =    0.9500          sd2 =    0.8774
        width =    0.1000         corr =    0.9974
                                  sd_d =    0.0637

Estimated sample size:

            N =        14
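Alternatively, for the prediction-error goal I wondered whether I could treat the prediction errors as a single sample whose standard deviation is roughly the pilot root MSE, and ask -ciwidth onemean- for the N that bounds the CI width. The width of 0.05 below is just a placeholder for whatever X turns out to be.
Code:
* sd() set to the pilot root MSE; width(0.05) is a placeholder for X
ciwidth onemean, sd(.06579) width(0.05) probwidth(0.95)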
Finally, I wondered whether either command has other options I'm unaware of that would let me make even more use of my pilot data?
Thank you so much in advance, any advice is greatly appreciated!