Channel: Statalist

High-dimensional FE & Marginsplot

Hello All,


I am trying to find the impact of firms and other macro-economic variables on exports.

I am interested in the relationship between firms and exports across different regions (OECD / non-OECD).

To test this relationship, I use the following command:

Code:
reghdfe Exports Firms RD Reg FDI i.OECD#(c.Firms),absorb (ccode year) cluster (ccode)

. reghdfe Exports Firms RD Reg FDI i.OECD#(c.Firms),absorb (ccode year) cluster (
> ccode)
(dropped 11 singleton observations)
(MWFE estimator converged in 5 iterations)

HDFE Linear regression                            Number of obs   =        289
Absorbing 2 HDFE groups                           F(   5,     81) =       7.19
Statistics robust to heteroskedasticity           Prob > F        =     0.0000
                                                  R-squared       =     0.9969
                                                  Adj R-squared   =     0.9955
                                                  Within R-sq.    =     0.0441
Number of clusters (ccode)   =         82         Root MSE        =  6.435e+09

                                 (Std. err. adjusted for 82 clusters in ccode)
------------------------------------------------------------------------------
             |               Robust
     Exports | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
       Firms |  -151498.9   33226.54    -4.56   0.000    -217609.3   -85388.46
          RD |   8.18e+09   5.16e+09     1.58   0.117    -2.09e+09    1.85e+10
         Reg |  -2.90e+09   4.56e+09    -0.64   0.526    -1.20e+10    6.17e+09
         FDI |   -2489519    6286609    -0.40   0.693    -1.50e+07    1.00e+07
             |
OECD#c.Firms |
          1  |   127215.7   43671.96     2.91   0.005      40322.2    214109.2
             |
       _cons |   2.96e+10   5.75e+09     5.14   0.000     1.81e+10    4.10e+10
------------------------------------------------------------------------------

Absorbed degrees of freedom:
-----------------------------------------------------+
 Absorbed FE | Categories  - Redundant  = Num. Coefs |
-------------+---------------------------------------|
       ccode |        82          82           0    *|
        year |         4           1           3     |
-----------------------------------------------------+
* = FE nested within cluster; treated as redundant for DoF computation
The model shows there is a statistically significant difference between OECD and non-OECD regions, with firms leading to a decrease in exports for OECD nations and an increase in exports for non-OECD nations.

So I proceed with marginsplot to visualize the relationship.

Code:
reg Exports Firms RD Reg FDI OECD c.OECD#c.Firms

      Source |       SS           df       MS      Number of obs   =       300
-------------+----------------------------------   F(6, 293)       =     14.43
       Model |  6.0689e+23         6  1.0115e+23   Prob > F        =    0.0000
    Residual |  2.0532e+24       293  7.0077e+21   R-squared       =    0.2281
-------------+----------------------------------   Adj R-squared   =    0.2123
       Total |  2.6601e+24       299  8.8968e+21   Root MSE        =    8.4e+10

--------------------------------------------------------------------------------
       Exports | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
---------------+----------------------------------------------------------------
         Firms |   72683.64   60353.12     1.20   0.229    -46096.93    191464.2
            RD |   3.14e+10   6.75e+09     4.65   0.000     1.81e+10    4.46e+10
           Reg |   1.17e+10   8.28e+09     1.41   0.158    -4.58e+09    2.80e+10
           FDI |   2.10e+08   2.66e+08     0.79   0.431    -3.13e+08    7.33e+08
          OECD |   3.98e+10   1.50e+10     2.66   0.008     1.04e+10    6.93e+10
               |
c.OECD#c.Firms |   374095.9   97669.49     3.83   0.000     181873.2    566318.6
               |
         _cons |  -4.86e+10   1.48e+10    -3.28   0.001    -7.78e+10   -1.94e+10
--------------------------------------------------------------------------------


margins, at(Firms=(0(10000)200000) OECD=(0 1))

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
         _at |
          1  |  -5.99e+09   9.93e+09    -0.60   0.547    -2.55e+10    1.36e+10
          2  |   3.38e+10   8.71e+09     3.89   0.000     1.67e+10    5.10e+10
          3  |  -5.27e+09   9.73e+09    -0.54   0.589    -2.44e+10    1.39e+10
          4  |   3.83e+10   8.37e+09     4.58   0.000     2.18e+10    5.48e+10
          5  |  -4.54e+09   9.56e+09    -0.47   0.635    -2.34e+10    1.43e+10
          6  |   4.28e+10   8.09e+09     5.29   0.000     2.69e+10    5.87e+10
          7  |  -3.81e+09   9.43e+09    -0.40   0.686    -2.24e+10    1.47e+10
          8  |   4.72e+10   7.88e+09     6.00   0.000     3.17e+10    6.27e+10
          9  |  -3.09e+09   9.33e+09    -0.33   0.741    -2.15e+10    1.53e+10
         10  |   5.17e+10   7.73e+09     6.69   0.000     3.65e+10    6.69e+10
         11  |  -2.36e+09   9.28e+09    -0.25   0.799    -2.06e+10    1.59e+10
         12  |   5.62e+10   7.67e+09     7.33   0.000     4.11e+10    7.13e+10
         13  |  -1.63e+09   9.26e+09    -0.18   0.860    -1.99e+10    1.66e+10
         14  |   6.06e+10   7.68e+09     7.90   0.000     4.55e+10    7.58e+10
         15  |  -9.06e+08   9.28e+09    -0.10   0.922    -1.92e+10    1.74e+10
         16  |   6.51e+10   7.77e+09     8.38   0.000     4.98e+10    8.04e+10
         17  |  -1.79e+08   9.34e+09    -0.02   0.985    -1.86e+10    1.82e+10
         18  |   6.96e+10   7.93e+09     8.77   0.000     5.40e+10    8.52e+10
         19  |   5.48e+08   9.44e+09     0.06   0.954    -1.80e+10    1.91e+10
         20  |   7.40e+10   8.17e+09     9.07   0.000     5.80e+10    9.01e+10
         21  |   1.27e+09   9.57e+09     0.13   0.894    -1.76e+10    2.01e+10
         22  |   7.85e+10   8.46e+09     9.28   0.000     6.19e+10    9.52e+10
         23  |   2.00e+09   9.74e+09     0.21   0.837    -1.72e+10    2.12e+10
         24  |   8.30e+10   8.82e+09     9.41   0.000     6.56e+10    1.00e+11
         25  |   2.73e+09   9.95e+09     0.27   0.784    -1.69e+10    2.23e+10
         26  |   8.75e+10   9.22e+09     9.48   0.000     6.93e+10    1.06e+11
         27  |   3.46e+09   1.02e+10     0.34   0.735    -1.66e+10    2.35e+10
         28  |   9.19e+10   9.67e+09     9.50   0.000     7.29e+10    1.11e+11
         29  |   4.18e+09   1.05e+10     0.40   0.689    -1.64e+10    2.48e+10
         30  |   9.64e+10   1.02e+10     9.48   0.000     7.64e+10    1.16e+11
         31  |   4.91e+09   1.07e+10     0.46   0.648    -1.62e+10    2.61e+10
         32  |   1.01e+11   1.07e+10     9.44   0.000     7.98e+10    1.22e+11
         33  |   5.64e+09   1.11e+10     0.51   0.611    -1.61e+10    2.74e+10
         34  |   1.05e+11   1.12e+10     9.37   0.000     8.32e+10    1.27e+11
         35  |   6.36e+09   1.14e+10     0.56   0.577    -1.61e+10    2.88e+10
         36  |   1.10e+11   1.18e+10     9.29   0.000     8.65e+10    1.33e+11
         37  |   7.09e+09   1.18e+10     0.60   0.547    -1.61e+10    3.03e+10
         38  |   1.14e+11   1.24e+10     9.21   0.000     8.98e+10    1.39e+11
         39  |   7.82e+09   1.22e+10     0.64   0.521    -1.61e+10    3.17e+10
         40  |   1.19e+11   1.30e+10     9.11   0.000     9.31e+10    1.44e+11
         41  |   8.54e+09   1.25e+10     0.68   0.497    -1.62e+10    3.32e+10
         42  |   1.23e+11   1.37e+10     9.02   0.000     9.63e+10    1.50e+11
------------------------------------------------------------------------------


marginsplot
[marginsplot graph attached]


Would the margins plot be interpreted as:
- The effect of the level of development (OECD) on exports
- Exports increase at different speeds for OECD and non-OECD nations
- Non-OECD nations will export significantly more as firms increase




Isn’t there a conflict between the results of the model and the marginsplot?

Are both procedures correct for this type of model?
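
For reference, here is a minimal sketch of getting the margins from the reghdfe specification itself rather than from a separate -reg- that omits the absorbed country and year effects (that omission may be one source of the discrepancy). Whether margins is appropriate after reghdfe here is exactly the question, so treat this as an assumption-laden illustration, not an answer:

Code:
* Hedged sketch: marginal effects of Firms by OECD status taken directly from
* the fixed-effects model. The OECD main effect is absorbed by the country FE
* and will be reported as omitted; that does not affect dydx(Firms).
reghdfe Exports c.Firms##i.OECD RD Reg FDI, absorb(ccode year) cluster(ccode)
margins OECD, dydx(Firms)
marginsplot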

Data is attached.


Thanks in advance

Applying different colors to data points when using aaplot

Hi Stata Users,
I am using the user-written command aaplot and would like to colour-code the points by the category variable. I am using the syntax and the dataset below.


Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input byte(value_2014 growth category)
21  1 1
11  5 1
20  6 2
16  7 3
14  3 3
18  3 1
19  1 2
20  4 3
21 10 1
22 11 2
10  2 3
17  5 3
18  9 3
12  7 3
11  8 2
10  6 2
14  5 2
17  7 1
20 11 1
22 12 1
21 15 1
15  2 1
15  5 2
17  8 2
19  9 2
10  6 3
10  4 3
14  4 3
20  5 1
16  7 2
19  8 3
18  9 1
10  3 1
11  9 1
12  8 2
14  7 2
15  3 2
18  5 3
19  5 1
13  3 2
14  9 3
15  2 3
15  1 3
16 10 2
17 12 1
19 11 2
18 16 3
12  6 2
10  9 2
20  2 2
end

Code:
aaplot growth value_2014
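For what it's worth, one workaround that does not use aaplot at all (so the regression annotation aaplot adds is lost): split growth by category with -separate- and overlay coloured scatters plus the overall fitted line. A sketch, assuming category takes the values 1, 2, and 3 as in the dataex extract:

Code:
* Hedged sketch of a non-aaplot workaround: colour points by category manually.
separate growth, by(category)
twoway (scatter growth1 value_2014, mcolor(blue))  ///
       (scatter growth2 value_2014, mcolor(red))   ///
       (scatter growth3 value_2014, mcolor(green)) ///
       (lfit growth value_2014, lcolor(black)),    ///
       legend(order(1 "category 1" 2 "category 2" 3 "category 3"))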
Thanks in advance!

Matrix type mismatch when multiplying

I am trying to construct fitted values and confidence intervals for these predictions. Constructing the fitted values works fine, but trying to compute the variance-covariance matrix leads to a type mismatch error.

I have the covariates saved in a matrix X_D. I extracted the variance-covariance matrix of coefficients from a regression and saved it with the tempname `varcov_D'. The coefficients themselves are saved in the matrix `b_all_D'.

When I run

Code:
matrix list X_D
matrix list `varcov_D'
I receive the output
X_D[48,14]
symmetric __00000R[14,14]

As best as I can tell, the content of both matrices is numeric, although some columns in X_D are either 0 or 1.

Running

Code:
matrix var_y_D = X_D * `varcov_D'
Then gives the error

type mismatch
r(109);

When I previously ran
Code:
matrix fit_D = X_D * (`b_all_D')'
everything went smoothly.
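
For what it's worth, here is a minimal sketch of the full quadratic form that the prediction variances require (assuming X_D and `varcov_D' are the 48x14 covariate matrix and the 14x14 e(V) described above; it does not diagnose the r(109) error): the variance-covariance matrix of the fitted values is X*V*X', not X*V alone.

Code:
* Minimal sketch: prediction variances are the diagonal of X*V*X'; the standard
* errors of the fitted values are the square roots of that diagonal.
matrix V_fit = X_D * `varcov_D' * X_D'
matrix se_fit = J(rowsof(X_D), 1, .)
forvalues i = 1/`=rowsof(X_D)' {
    matrix se_fit[`i', 1] = sqrt(el(V_fit, `i', `i'))
}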
I am quite at my wit's end. Does anybody have an idea what the issue is or might be?

anova and post hoc


Is it possible for the ANOVA interaction effect (e.g., diagnosis and time) to be non-significant, and yet for the post hoc tests to find significance in some comparisons of the interaction? Thanks in advance to everybody.

Accounting for baseline in a multiple outcomes radiology analysis. Help!

Dear all,
I have a study with different outcomes, represented by differences in brain areas measured at two different timepoints. The length of follow-up varies from subject to subject and is indicated by the variable "Months_between_MRI Diagnosis".

The areas are indicated starting from the first, "a2", and go progressively up to 67 (a3, a4, ..., a67).

So the outcomes of my analysis are "deltaa2", "deltaa3"..."deltaa67".

The goal of the analysis is to understand whether the delta of each area depends on the followup time or not.

I am using the "wyoung" command to perform a p correction for multiple testing.

If I didn't consider the baseline, the command would be:

Code:
wyoung deltaa2-deltaa67, cmd(regress OUTCOMEVAR Gender Months_between_MRI Diagnosis) familyp(Months_between_MRI) bootstraps(1000) seed(20) replace
However, if I wanted to include the baseline, for each area, how could I do it?
It's a problem, because the wyoung command adjusts all the regression models for the same covariates, while the baselines of the different areas obviously differ from one another.

Does anyone have an idea? I really don't know how to get out of this problem.
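
One possibility worth checking (this is an assumption about the command, not something tested here): recent versions of Julian Reif's wyoung document a controls() option with a CONTROLVARS placeholder that swaps a different set of controls into cmd() for each outcome. If your installed version supports it, the call could look roughly like the sketch below; the variable with the space is written with an underscore here as a guess, and only the first four areas are shown. See -help wyoung-.

Code:
* Hedged sketch, assuming a wyoung version that supports controls()/CONTROLVARS.
wyoung deltaa2 deltaa3 deltaa4 deltaa5, ///
    cmd(regress OUTCOMEVAR Gender Months_between_MRI_Diagnosis CONTROLVARS) ///
    familyp(Months_between_MRI_Diagnosis) ///
    controls("baselinea2" "baselinea3" "baselinea4" "baselinea5") ///
    bootstraps(1000) seed(20) replace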

For convenience, below I report the dataset for the first 5 areas

Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input byte(id Age Gender) float(deltaa2 deltaa3 deltaa4 deltaa5 baselinea2 baselinea3 baselinea4 baselinea5)
 1 30 0   .0019999924  -.02300009 -.036999952  -.23999995 2.406 2.544 1.987 2.917
 2 36 0          .739   .05800011    .3919999   .26099998 1.969 2.105 1.668 1.083
 3 34 0    -.12099997       -.182        .661        .038 2.637 2.274 1.796  .944
 4 44 0    -.13100006  -.14399996    .9399999        .603 1.936 2.137 1.374   .83
 5 49 0          .207       -.337    .7459999 -.003000028 1.783 2.322 1.403  .945
 6 25 0     -.3330001       -.789        .313      -1.806 3.064 2.445 1.973 3.265
 7 24 0     .12799993       -.327       1.289        .495 2.386 2.184  1.28  .905
 8 15 1     -.7780001   -.1940001    .2489999       -.928 3.226 2.501 2.098 3.217
 9 32 0         -.329   .51400006    .7830001        .808 2.497 1.735 1.562  .935
10 13 0         -.284  -.57399994    .3190001        .608  2.35 2.353 1.821 1.099
11 49 0      .9899999   .58000004        .916        1.46 1.646 1.771 1.396   .79
12 67 1     .14900011   .17399994        .541  -.22399995 3.412 2.567 2.179 3.151
13 47 0    -.26400006   .17299993  -.20400003        .587 2.712 2.453  2.15 2.979
14 30 1     -.7479999       -.673        .488      -2.184 3.253  2.56  2.13 3.245
15 22 0     -.7120001  -.03599991       -.406      -1.771 2.609 2.617 2.101 3.211
16 55 0     -.5170001       -.405         .23      -1.223 2.969  2.57 2.125 2.792
17 28 0     -.2819999  -.23100004    .6030001        .828 2.468 2.373 1.797 1.965
18 44 0          .622    .2019999        .525       2.658 1.987 2.293 1.431 1.117
19 35 0    -.04399993   .02899999   1.0109999        .753 2.145 2.333 1.375 1.023
20 20 1     -.3540001  -.13600005   .04399997       -.352 3.204 2.571 2.167 3.707
21 18 0      .5030001   -.4240001    .4259999        .991 1.864  2.84 1.822 1.295
22 54 0      .9620001   .17199998       1.673       1.788 2.043 2.006  1.06  .785
23 39 0    -.20200007   .03000001   .59200007    .3920001 2.272 1.983 1.575 2.005
24 21 0     -.7720001       -.414        .398       -1.82 2.865 2.237 2.006 3.356
25 24 0      .7089999   -.6439999       1.101    .5389999 2.039 2.561 1.637 1.574
26 38 1     .04400001    .4290001    .7620001        .379 2.492 1.823  1.73 1.453
27 33 0      .3289999   -.4630001    .8069999   .22400004 2.113 2.546 1.392 1.468
28 75 1         1.193    .5079999   .11899997        .309  1.88 1.753 1.633 1.648
29 38 1     -.6810001  -.57400006   .04899991  -1.1570001  2.84 2.458 2.097 3.077
30 19 1         -.664   -.4990001   .11499997      -1.177 2.615 2.619 2.267   3.2
31 44 0     -.8909999   -.3210001   .16800007        -.67 3.337 2.539 2.285 2.756
32 45 0          .213    .3359999        .772         .74 2.284 1.599  1.35  .837
33 34 0          .614         .51       1.039       1.702 2.965 2.174   .97  .945
34 12 1      .4209999  -.26199993   .02200002 -.019000005 2.077 2.373 1.856 1.575
35 53 0     1.1710001    .7800001       1.607       1.911 1.851 1.804 1.223 1.132
36 45 1             0           0           0           0     0     0     0     0
37 20 0        -1.591       -1.62      -1.253       -1.57 1.591  1.62 1.253  1.57
38 50 0     -.5209999  -.11800011   .27200004        .252 2.809 2.409 1.905 1.896
39 47 1         -.883       -.488    .8060001       -1.64 3.199  2.39 2.127 3.321
40 40 0          .424        .443       1.588        .862 2.343 1.909 1.219 1.089
41 40 1      .4859999    .6459999    .5619999    .6940001 2.479 1.967 1.689  1.69
42 63 1    .008999996   -.4620001        .357       -.934 2.945  2.69 2.268 3.273
43 34 0             0           0           0           0     0     0     0     0
44 24 0        -2.191      -2.684      -1.053      -1.124 2.191 2.684 1.053 1.124
45 63 0     -.6969999   -.5000001 -.019000055      -1.937 2.755 2.712 2.102 3.464
46 40 1         -.781       -.394    .0510001      -1.963 3.278 2.581 1.964 3.181
47 33 0             0           0           0           0     0     0     0     0
48 44 0    -.13000004 -.005999957       1.058  -.04499999 2.101 1.836 1.228 1.035
49 27 1             0           0           0           0     0     0     0     0
50 19 0          .398   -.2399999        .647    .9030001 2.059  2.38 1.573 1.409
51 52 1    .019999957   .13000005        .645        .066 2.451 2.162 1.693 1.008
52 30 0          .488   -.1919999        .915        .474 1.946 2.204 1.441  .945
53 14 0    -1.1639999       -.708       -.246      -2.251   3.4 2.573 2.047 3.414
54 66 0    .063000105       -.214       1.225    .6380001  2.35 2.053 1.158 1.177
55 63 0    -.56600004   .11599991  -.11699998  -.14500012 2.711 2.201 2.041 3.242
56 36 1        -1.102        .003   .28900003      -1.459  3.28 2.372 2.023 3.276
57 18 1     -.6940001       -.381  -.10599995      -1.207 3.001 2.347 2.074  3.15
58 37 1     -.7559999   -.7770001  -.09600002      -2.003 3.166 2.406 1.917 3.173
59 50 0         1.307   .24700004    .8360001   .54899997 1.543 2.144 1.502 1.149
60 33 0     -.6990001   -.6819999    .1060001      -1.595 2.772 2.576 2.113 3.672
61 49 0         -.452  -.12399995        .315   -.8080001 3.179  2.62 2.113 2.987
62 48 1    -.09499995  -.22099994    .1659999    .2479999 2.739 2.592  1.98 3.332
63 20 1         -.798       -.434   .08999995   -.8780001 2.966 2.339 2.108 3.234
64 72 1     .07299996   .53900003        .226       1.267 2.115  2.03 1.958 1.716
66 45 1     -.4070001  -.44300005  -.12899995      -1.443 3.188 2.595 2.125 3.444
67 53 1    -1.1020001   -.1880001        .867      -1.405 2.764 2.459 2.037 3.434
68 24 0    -.22599994   .14999995 -.021000044   .36200005 2.837 2.354 2.117 3.075
69 61 1    -.28399992  -.08000001   -.1970001   .15700006 3.125 2.465 2.501 2.852
70 43 0     -.3899999       -.697  -.21200003      -1.397 2.793 2.358 2.145 3.063
71 50 0     -.4679999   -.2589999    .3410001  -1.1519998 3.348 2.442 2.151 3.157
72 25 0      .4440001   .17399994         1.4        .356 2.147 1.886 1.399 1.252
73 16 0         1.297    .7469999        .296       1.918  1.42 1.846 1.569  1.71
74 29 0 1.4305114e-08       -.473    .9079999    .7419999 1.985 2.349 1.603 1.165
75 42 1     -.3659999   -.2980001  -.16099995   -.4830001 3.332 2.589 2.259 3.425
76 50 0     .03100011    .1570001        .842  -.10600004 2.132 1.996 1.536 1.378
77 41 0         -.396       -.659  -.26399997      -1.815 2.639 2.418 2.071 3.061
78 24 0     .28700003         -.6       1.004   .17199996  2.42 2.572 1.813 1.039
79 49 1     .10500007  -.11799993   1.1110001        .317 2.144 2.275 1.319   1.4
80 39 0      .1520001       -.434   1.0070001   .33699995 2.317 2.228 1.271  .889
81 17 1     .53999996   .05200009        .829    .7009999  2.28 1.983 1.408 1.547
82 23 1   -.003000055   .15999997    .8239999    .3749999 2.586 1.796 1.299 1.198
83 57 1     -.8339999       -.665   .02900001       -.961  3.07  2.55 2.234 3.205
84 21 1         1.475   -.1119999        .402   .53900003 1.591 2.377 1.562  .936
85 46 0          .719    .3800001        .465   .56799996 2.047 2.293 1.735 1.281
end

Rescaling a 7-Point Likert Scale

Hello everyone,

I have conducted a survey where all responses are based on a 7-point Likert scale. To test the robustness of my findings, I am considering rescaling the responses to a 4-point Likert scale, eliminating the neutral option.

Does this approach make sense for my analysis?

If so, could anyone provide guidance or Stata code for performing this rescaling?
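
If you do go this route, here is a minimal sketch of one possible mapping; the variable name likert7 is a placeholder, and sending the neutral midpoint to missing is only one of several defensible choices, not a recommendation:

Code:
* Minimal sketch: collapse a 1-7 item into four categories, dropping the
* neutral midpoint (4). likert7 is a placeholder name.
recode likert7 (1 2 = 1) (3 = 2) (4 = .) (5 = 3) (6 7 = 4), generate(likert4)
label define lik4 1 "Strongly disagree" 2 "Disagree" 3 "Agree" 4 "Strongly agree"
label values likert4 lik4
tab likert7 likert4, missing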

Thank you!

loop with rowtotal across categorical data

Dear All,

I have several categorical variables, one for each member of the household, indicating whether that person lives in a house within the farm, close to the farm, etc.
I would like to construct new variables counting the number of household members living with the interviewee in each residence category. That is, count how many of q26_1_6 through q26_12_6 equal 1, how many equal 2, and so on.

I tried this:
Code:
foreach var of varlist q26_1_6 q26_2_6 q26_3_6 q26_4_6 q26_5_6 q26_6_6 q26_7_6 q26_8_6 q26_9_6 q26_10_6 q26_11_6 q26_12_6 {
forvalues j=1/4 {
egen hh_member`j'=rowtotal (`var') if `var'==`j'

}
}
But it does not work.
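One possible direction, sketched under the assumption that q26_1_6 through q26_12_6 are stored consecutively so the hyphenated varlist is valid (otherwise list them explicitly): count, for each residence category, how many of the twelve member variables take that value using egen's anycount() function.

Code:
* Hedged sketch: for each category j (1-4), count how many of the member
* variables equal j in each household (row).
forvalues j = 1/4 {
    egen hh_member`j' = anycount(q26_1_6-q26_12_6), values(`j')
}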
I copy a part of the dataset below. Any help?
Thanks in advance


Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input byte(q26_1_6 q26_2_6 q26_3_6 q26_4_6 q26_5_6 q26_6_6 q26_7_6 q26_8_6 q26_9_6 q26_10_6 q26_11_6 q26_12_6)
1 1 . . 3 . . . 1 4 . .
1 1 . . 2 . . . 1 . . .
1 1 . . . . . . 2 . . .
1 1 . . 2 . . . 2 3 . .
1 1 . . 2 . . . 1 . . .
1 . . 1 2 . . . . . . .
1 1 . . . . . . 4 . . .
2 . . . 4 . . . . . . .
1 1 . . 2 2 . . 1 . . .
1 1 . . 2 . . . 3 . . .
1 1 . . 4 4 . . 3 3 . .
1 1 . . 2 4 . . 1 . . .
1 1 . . 4 . . . 4 . . .
1 . 2 . . . . . . . . .
1 . . . . . . . 1 . . .
1 . 1 . . . . . . . . .
1 . 1 . . . . . . . . .
1 . 2 . . . . . . . . .
1 . . . 1 . . . . . . .
1 . 1 . . . . . . . . .
2 . 1 . . . . . . . . .
1 . 1 . . . . . . . . .
1 . . . . . . . . . . .
1 . 2 . . . . . . . . .
1 . . . . . . . . . . .
1 . . . . . . . . . . .
1 . . . . . . . . . . .
1 1 . . . . . . 4 1 . .
1 . 1 . . . . . . . . .
1 1 . . . . . . . . . .
1 . 2 . . . . . . . . .
1 . 2 . . . . . . . . .
1 . . . . . . . . . . .
1 1 . . . . . . 1 . . .
1 . . . . . . . . . . .
1 . . . . . . . . . . .
1 1 1 . . . . . . . . .
1 1 . . . . . . 1 . . .
1 1 . . . . . . 2 1 . .
1 1 . . . . . . 1 1 . .
1 . . . . . . . 2 2 . .
1 . . . . . . . 2 1 . .
1 1 . . . . . . . . . .
1 1 . . . . . . . . . .
1 1 . . . . . . . . . .
1 1 . . . . . . 1 1 . .
1 1 . . . . . . 1 1 . .
1 1 . . . . . . . . . .
1 1 . . . . . . . . . .
1 . . . 1 . . . . . . .
1 1 . . . . . . 1 . . .
1 1 2 . . . . . . . . .
1 . 2 . . . . . . . . .
1 . 1 . . . . . . . . .
1 . . . . . . . 2 . . .
1 1 1 . . . . . . . . .
1 1 . . . . . . 1 4 . .
1 1 . . . . . . 1 . . .
1 . 1 . . . . . . . . .
1 1 . . . . . . 1 1 . .
1 1 . . . . . . 1 1 . .
1 . . . . . . . . . . .
1 . . . . . . . . . . .
1 . . . . . . . 2 3 . .
1 1 . . . . . . . . . .
1 1 2 . . . . . . . . .
1 1 1 1 . . . . . . . .
1 1 . . . . . . . . . .
1 . 1 . . . . . . . . .
1 . . . . . . . . . . .
1 . 2 . . . . . . . . .
1 1 . . . . . . . . . .
1 . . . . . . . . . . .
1 . 1 . . . . . . . . .
1 . . . 1 . . . . . . .
1 . . . . . . . . . . .
1 1 . . . . . . 1 1 . .
1 . . . . . . . 1 1 . .
1 . . . . . . . 1 1 . .
1 1 . . . . . . 2 . . .
1 1 . . . . . . . . . .
1 . . . . . . . . . . .
1 1 . . . . . . 1 . . .
1 . . . . . . . 2 . . .
1 1 . . . . . . . . . .
1 . . . 1 . . . . . . .
1 . 1 . . . . . . . . .
1 . . . . . . . . . . .
1 . . . . . . . 2 . . .
1 1 . . . . . . 1 1 . .
1 . . . . . . . 2 1 . .
2 2 . . . . . . 2 . . .
1 1 . . . . . . 1 1 . .
1 . . . . . . . . . . .
1 1 . . 4 . . . . . . .
1 1 . . 4 . . . 2 . . .
1 1 . . . . . . 4 4 4 .
1 1 . . 1 . . . 1 . . .
1 2 . . 2 . . . 4 . . .
2 2 . . 4 . . . 2 . . .
end
label values q26_1_6 q26_1_6
label def q26_1_6 1 "home within the farm", modify
label def q26_1_6 2 "home close to the farm", modify
label values q26_2_6 q26_2_6
label def q26_2_6 1 "home within the farm", modify
label def q26_2_6 2 "home close to the farm", modify
label values q26_3_6 q26_3_6
label def q26_3_6 1 "home within the farm", modify
label def q26_3_6 2 "home close to the farm", modify
label values q26_4_6 q26_4_6
label def q26_4_6 1 "home within the farm", modify
label values q26_5_6 q26_5_6
label def q26_5_6 1 "home within the farm", modify
label def q26_5_6 2 "home close to the farm", modify
label def q26_5_6 3 "home in a town/village easy to reach from the farm", modify
label def q26_5_6 4 "further residences", modify
label values q26_6_6 q26_6_6
label def q26_6_6 2 "home close to the farm", modify
label def q26_6_6 4 "further residences", modify
label values q26_7_6 q26_7_6
label values q26_8_6 q26_8_6
label values q26_9_6 q26_9_6
label def q26_9_6 1 "home within the farm", modify
label def q26_9_6 2 "home close to the farm", modify
label def q26_9_6 3 "home in a town/village easy to reach from the farm", modify
label def q26_9_6 4 "further residences", modify
label values q26_10_6 q26_10_6
label def q26_10_6 1 "home within the farm", modify
label def q26_10_6 2 "home close to the farm", modify
label def q26_10_6 3 "home in a town/village easy to reach from the farm", modify
label def q26_10_6 4 "further residences", modify
label values q26_11_6 q26_11_6
label def q26_11_6 4 "further residences", modify
label values q26_12_6 q26_12_6

Resolving Data Inconsistencies in Survey Responses

Hi,

I am currently tabulating a string variable, and I've noticed an issue: the enumerator recorded "Don't know" in two different formats. I'm having trouble resolving this. I've attached the results for your reference.

Code:
 . tab vwsc_functionality

vwsc_functio |
      nality |      Freq.     Percent        Cum.
-------------+-----------------------------------
  Don't Know |         54        4.21        4.21
  Don’t know |         65        5.06        9.27
          No |        280       21.81       31.07
         Yes |        885       68.93      100.00
-------------+-----------------------------------
       Total |      1,284      100.00



I’m not sure how to fix this. I look forward to your guidance.
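
A minimal sketch of one fix, assuming the stored strings match what -tab- displays (the second spelling uses the curly Unicode apostrophe shown in the output; check the stored values first, e.g. by browsing the variable):

Code:
* Minimal sketch: fold the two spellings of "Don't know" into one category.
replace vwsc_functionality = "Don't Know" if vwsc_functionality == "Don’t know"
tab vwsc_functionality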

I have attached a screenshot for your reference also.

Thank you.



First-level sampling weights in melogit and re-scaling approach

Dear Statalist users,

I was struggling with the implementation of sampling weights (pweight) for individuals using the melogit command, with first-level observations weighted and nested in countries. It gave both convergence problems and unrealistically small standard errors, suggesting that the sample size of the first-level observations is artificially inflated, although this is not an issue when using the same weight in the regular logit command.
Code:
melogit outcome [pweight=weight] || country:
A brief scan of Google and Statalist showed that I am not the only one encountering this issue. Nevertheless, it appears that the mixed command for multilevel linear regression provides the pwscale option to scale first-level weights so that they sum to the sample size of their corresponding second-level cluster (more information here: https://www.stata.com/features/overv...h-survey-data/). As there does not seem to be a similar option for melogit, this inspired me to manually re-scale the weights with the following procedure:
Code:
bysort country: gen n_country = _N
egen totalcntry_weight = total(weight), by(country)
gen scaled_weight = weight*(n_country/totalcntry_weight)
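The re-fitted model with the scaled weight is then simply the natural next step (a sketch, not a validation of the approach):

Code:
melogit outcome [pweight=scaled_weight] || country: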
Given that only relative sizes may matter for sampling weights, using the scaled weight seems to solve the convergence problems and results in seemingly reasonable standard errors.
Any thoughts on the validity of this approach? I remain a bit reluctant since there might be a reason that there is not a standard pwscale option in melogit...

Parallel trends before the treatment

Hi all,

I am looking at the trends of a control group and a treatment group. I can see parallel trends before the treatment, but after the treatment the trends change for both my control and treatment groups. Does that mean I can't use a diff-in-diff any more?

Thank you,
Mahtab

For example, is the attached picture a violation of diff-in-diff? Blue is control and red is treatment; the right side of the black line is after treatment.

Allowing different numeric formats within table and collect or dtable

I'd like to know whether the new -table-/-collect- system is capable of allowing different numeric formats in the same column. Let's consider the simplest possible example: I want one result with a %6.2f format and one with a %6.1f format. This is a reasonable request, especially for descriptive tables, where variables are measured with more or less precision than one another.

Here is a minimal attempt, and failure, to produce such a table. I've tried other approaches, also without success.

Result:

Code:
---------------------
                Mean
---------------------
Price         6165.26
Mileage (mpg)   21.30
---------------------
Code:

Code:
clear *
cls

sysuse auto, clear

collect clear

dtable price, cont(price) nformat(%6.2f mean) name(D1)
collect layout (var) (result[mean])

dtable mpg, cont(mpg) nformat(%6.1f mean) name(D2)
collect layout (var) (result[mean])

collect combine Tbl = D1 D2

collect layout (collection[D1 D2]#var) (result[mean])
collect style header collection, level(hide)
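
An alternative worth testing (the tag syntax below is an assumption on my part and untested; confirm the dimension and level names with -collect dims- and -collect levelsof var-): keep everything in one collection and attach a different nformat to each variable's cells with -collect style cell-.

Code:
* Hedged sketch (untested): per-variable numeric formats via cell tags in a
* single collection, rather than combining two dtable collections.
sysuse auto, clear
collect clear
dtable price mpg, cont(price mpg) name(D3)
collect style cell var[price], nformat(%6.2f)
collect style cell var[mpg],   nformat(%6.1f)
collect layout (var) (result[mean])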

Fine and Gray model with categorical time varying covariate

Hi everyone,

I am working on a project that investigates the relationship between obesity and the incidence of cancer in dialysis patients. By incorporating both baseline obesity status and chronological changes in BMI (which can happen multiple times) as time-varying covariates, the study aims to determine whether obesity impacts the risk of developing cancer and cancer-related mortality, while accounting for competing risks such as getting a transplant and non-cancer death. The analysis uses a Fine and Gray competing risks model to assess these associations.

I have attached an example of how my data look after splitting each person into multiple rows according to their chronological BMI changes.

This is by using
Code:
  stset bmi_time, failure(failuretype==1) id(id) origin(startdate) scale(365.25)
bmi_date is when the obesity status has been documented
bmi_time is when the obesity status ended
failuretype =1 is the event of interest
failure type 2 and 3 are competing risks

I then use
Code:
 stcrreg i.obese, compete(failuretype==2 3)
to fit the Fine and Gray model. After this, I will add other potential confounders to the model.

My questions are:
  1. Is my approach correct? This is my first time working with time-varying covariates with multiple changes. I manually split the data without using the "split at failure times" method, which is often recommended. Interestingly, my Fine and Gray model shows a reduced risk for the obese cohort, while the Kaplan-Meier curve suggested an elevated risk. Could this discrepancy be due to my approach?
  2. The model runs very slowly, and I need to fit it multiple times to test different covariates. Is this typical, and are there any ways to improve the speed?
  3. For an upcoming analysis, I will have both transplant status and obesity status as time-varying covariates. Can I apply the same approach that I used here?
Many thanks for your help in advance!

Kind Regards,
Bree

interpreting results of svy command in STATA

Hi all,

Maybe this is a very silly question. I am trying to generate population estimates with sample weights using the svy command in Stata.

I used the following code.
Code:
svyset id [pweight=weight]
svy: total id
and got the following output.


Number of strata = 1 Number of obs = 8,245
Number of PSUs = 8,245 Population size = 36,385,946
Design df = 8,244

--------------------------------------------------------------
| Linearized
| Total std. err. [95% conf. interval]
-------------+------------------------------------------------
id | 3.64e+14 2.34e+12 3.60e+14 3.69e+14
--------------------------------------------------------------

The number in the table, 3.64e+14, matches the number in population size (36,385,946), which is what one would expect. My problem is that when I do this with a different round of data, I get the following output.

Number of strata = 1 Number of obs = 7,859
Number of PSUs = 7,859 Population size = 41,786,760
Design df = 7,858

--------------------------------------------------------------
| Linearized
| Total std. err. [95% conf. interval]
-------------+------------------------------------------------
id | 7.11e+14 7.34e+12 6.97e+14 7.25e+14
--------------------------------------------------------------

Why do the number in the table (7.11e+14) and the number in population size (41,786,760) not match? For context, this is the NHATS data, and 41 million matches the report. Then what is 71 million? Is this the correct command to generate a population estimate? If both numbers were 71 million, I would certainly think so, but 41 million is the correct population.
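
For what it's worth, a sketch of how the weighted population count itself is usually reproduced: -total id- sums the values of id, whereas totaling a constant sums the weights. The variable name one is made up here.

Code:
* Sketch: the estimated population size is the weighted total of a constant.
gen byte one = 1
svy: total one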

Appreciate any inputs. Thanks in advance!



Anova Repeated Measures and post-hoc and R2 low

Good morning to everybody. I have to run a repeated-measures ANOVA with diagnostic group (LR, HR-no diagnosis, and HR-NDD infants) as the factor and age (10 days and 6, 12, 18, and 24 weeks) as the repeated measure.

I wrote the code below.
Code:
anova y diagnosis##timepoint, repeated(timepoint) bseunit(id) bse(Diagnosi_num)

For the post hoc comparisons I use the code below.

Code:
margins Diagnosi##timepoint, pwcompare(effects) mcompare(bonferroni)

When I run the model I get a very low R2. So I have to ask you: 1) Is the written syntax correct? 2) Can I accept the results of the model, and therefore the post-hoc results, with an R2 of 0.08? 3) If the syntax is wrong, what is the right one, please? Thanks a million in advance.

Tommaso Salvitti

Mundlak's Approach and clustering standard errors

Hi all,

I am doing an analysis of the effect of sovereign ESG scores on total factor productivity. Originally, I wanted to use a fixed effects model, as the Hausman test indicated I should. However, after reading Bell, A., & Jones, K. (2015). Explaining Fixed Effects: Random Effects Modeling of Time-Series Cross-Sectional and Panel Data. Political Science Research and Methods, 3(1), 133–153. https://doi.org/10.1017/psrm.2014.7, I decided to go for the adjusted Mundlak (within-between) approach, which should give the same results as a fixed effects model and more. What I am now wondering is: should I cluster my standard errors? I cannot seem to find literature on clustering standard errors in random effects models; most of the literature is on fixed effects models. A Breusch–Pagan LM test rejected homoskedastic standard errors, and Drukker's (2003) test confirmed the presence of serial correlation. In general, when there is serial correlation, clustering standard errors helps.
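
For concreteness, a minimal sketch of a within-between (Mundlak-style) random-effects model with cluster-robust standard errors; the panel identifiers id and year and the variables y, x1, x2 are placeholders rather than anything from your data:

Code:
* Hedged sketch: within-between (Mundlak) specification with standard errors
* clustered on the panel unit.
xtset id year
foreach v of varlist x1 x2 {
    bysort id: egen mean_`v' = mean(`v')
    generate dev_`v' = `v' - mean_`v'
}
xtreg y dev_x1 dev_x2 mean_x1 mean_x2, re vce(cluster id)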

Looking forward to your insights.
Kind regards,
Maarten

combination variable and display the name

hello

I work with 9 symptoms of a disease. These are dummy variables: 1 if someone has the symptom, otherwise 0. I need to form the 9C2 pairwise combinations to present dual symptoms. If I want to show the names of the symptoms only when they equal 1, how can I do that?

For example,

Symptom A =1 and Symptom B= 1 but other symptoms are zero.

Then I want to show only "Symptom A" & "Symptom B".
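
A minimal sketch of one way to build that string; the symptom variable names sympA-sympI are placeholders for your nine dummies:

Code:
* Hedged sketch: build a string listing the symptoms that equal 1 for each
* observation, joined by " & ".
generate combo = ""
foreach v of varlist sympA sympB sympC sympD sympE sympF sympG sympH sympI {
    replace combo = combo + cond(combo == "", "", " & ") + "`v'" if `v' == 1
}
list combo if combo != ""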

Many thanks in advance for your help!

BW
Kim

Logging independent variables

Hello!

I am trying to run a panel data regression where two of my independent variables (in two different regressions of the same DV) are the human impacts and economic impacts of certain climatic extremes. My problem is that I am not sure whether I should log the variables or use them as they are. Not logging gives me more significant results, but there is higher variability. Also, since for most economic variables such as GDP I usually use their logged versions, I am not sure whether I should do the same in this case. Here are my code lines for each of the regressions:
Code:
xtreg env_conc_u econ_impact_floods econ_impact_storms econ_impact_droughts econ_impact_wildfires econ_impact_extreme_temps log_active_pop log_gdp perc_sectA urban_pop secondary_educ avg_irrigated, fe robust
xtreg env_conc_u log_econ_impact_droughts log_econ_impact_extreme_temps log_econ_impact_floods log_econ_impact_storms log_econ_impact_wildfires log_active_pop log_gdp perc_sectA urban_pop secondary_educ avg_irrigated, fe robust
xtreg env_conc_u human_impact_floods human_impact_storms human_impact_droughts human_impact_wildfires human_impact_extreme_temps log_active_pop log_gdp perc_sectA urban_pop secondary_educ avg_irrigated, fe robust
xtreg env_conc_u log_human_impact_droughts log_human_impact_extreme_temps log_human_impact_floods log_human_impact_storms log_human_impact_wildfires log_active_pop log_gdp perc_sectA urban_pop secondary_educ avg_irrigated, fe robust

Thank you!

Handling missing values in multi-level models: at least 5 observations per group

Hi everyone,

I am working with a multi-level dataset of individuals nested in counties/localities/municipalities (I cannot post any example because of data privacy). I have read elsewhere that, ideally, I would need at least 5 level-1 observations (individuals) per level-2 group (county). Before I run the multilevel models, I try to accomplish this by typing:

Code:
bysort county_code: gen n = _N
keep if n > 5

However, I am aware that this only removes rows based on the number of observations per county, without considering whether there are missing values in the variables that I later use in my regressions. Since I am appending various individual surveys with different numbers of observations and variables, my multilevel models end up including level-2 units or groups with fewer than 5 observations ("min. observations per group = 1"), which, from what I understand, is not recommended.

How could I handle this issue without simply dropping rows, as I change both the dependent and independent variables across the models I run?
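
One possible direction (a sketch only; y, x1, x2 and the -mixed- call are placeholders for whichever outcome, covariates, and multilevel command a given model uses): count, within each county, the observations that are complete on that model's variables and restrict on that count rather than on the raw group size.

Code:
* Hedged sketch: keep only level-2 units with at least five complete cases on
* the variables of one particular model.
egen byte n_nonmiss = rownonmiss(y x1 x2)
generate byte complete = (n_nonmiss == 3)
bysort county_code: egen n_complete = total(complete)
mixed y x1 x2 if complete & n_complete >= 5 || county_code: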


Thanks in advance,

Entropy balancing procedure

Dear Stata users,

I have just applied entropy balancing. I would like to obtain the output that the pstest command gives after psmatch2 (% bias, reduction in bias, t-tests). Is there a similar command for entropy balancing?
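
If you used the SSC ebalance command, one rough check is to compare weighted covariate means and SDs across groups using the generated weights. This relies on my recollection that ebalance stores the weights in _webal by default (please verify with -help ebalance-); treat, x1, and x2 are placeholders.

Code:
* Hedged sketch: crude balance check with the entropy weights.
ebalance treat x1 x2, targets(1)
tabstat x1 x2 [aweight=_webal], by(treat) statistics(mean sd)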

Any help would be really appreciated.

Best,
N


Merge IDs, start and end of recording with timeseries datasets

Hi All,

I am having issues with this merge.

In one dataset, I have the ID, the device serial number, the start and the end date of the recording.

Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input byte ID str8 device int(StartRecording EndRecording)
1 "62-36-7d" 21489 21497
2 "62-36-7d" 21497 21503
3 "03-04-2e" 21489 21503
end
format %td StartRecording
format %td EndRecording
In another dataset, I have the device number, and a time series (Date 1 & Date 2) of the variable of interest.
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input str8 device int(Date1 Date2) long VariableofInterest
"62-36-7d" 21489 21490    700
"62-36-7d" 21490 21491    350
"62-36-7d" 21491 21492   1000
"62-36-7d" 21492 21493     55
"62-36-7d" 21495 21496    302
"62-36-7d" 21496 21497    650
"62-36-7d" 21497 21498     20
"62-36-7d" 21498 21499    852
"62-36-7d" 21499 21500     39
"62-36-7d" 21500 21501    102
"62-36-7d" 21501 21502    258
"62-36-7d" 21502 21503    657
"03-04-2e" 21489 21490     96
"03-04-2e" 21490 21491    104
"03-04-2e" 21491 21492  36987
"03-04-2e" 21492 21493 201598
"03-04-2e" 21493 21494  98745
"03-04-2e" 21494 21495   3698
"03-04-2e" 21495 21496   2015
"03-04-2e" 21496 21497    258
"03-04-2e" 21497 21498    357
"03-04-2e" 21498 21499    198
"03-04-2e" 21499 21500    963
"03-04-2e" 21500 21501     15
"03-04-2e" 21501 21502    367
"03-04-2e" 21502 21503    126
end
format %td Date1
format %td Date2
The problem is that the same device number can be associated with two different IDs; however, in this case, the start and the end date of the recording also differ.
My aim is to merge these two datasets, resulting in a file with ID, device, Date1, Date2, and VariableofInterest. Any suggestions? Many thanks in advance.
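
One possible sketch using -joinby-: pair every recording with every time-series row for the same device, then keep the rows whose interval lies inside that recording's window. The file names recordings.dta and timeseries.dta are placeholders for your two datasets.

Code:
* Hedged sketch: device-level join followed by an interval restriction.
use recordings, clear             // ID, device, StartRecording, EndRecording
joinby device using timeseries    // adds Date1, Date2, VariableofInterest
keep if Date1 >= StartRecording & Date2 <= EndRecording
order ID device Date1 Date2 VariableofInterest
sort ID Date1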

