Channel: Statalist

Should the common currency dummy be zero for same-country pairs?

Hello to everyone.

I am working with a gravity model covering 133 countries from 2005 to 2018, which I am estimating with PPML.

My question is: should the common currency dummy be 0 or 1 for same-country pairs, i.e., should AFG-AFG be coded 0 or 1?

This question arises because I have been reading related papers that include a common currency dummy, and in all of them the estimated coefficient is positive.

In my gravity model, the common currency dummy has a coefficient of -0.57, and I suspect this is because same-country pairs are coded 1 instead of 0. Is that right?
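If that miscoding hypothesis is right, something along these lines would check and fix it (a minimal sketch; iso_o, iso_d, and comcur are hypothetical names for the exporter/importer identifiers and the dummy):

Code:
* Sketch with hypothetical variable names
count if iso_o == iso_d & comcur == 1    // how many domestic pairs are coded 1?
replace comcur = 0 if iso_o == iso_d     // code same-country pairs as 0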

Kind regards.

Carlos.


Iteratively saving output to matrix

Hi all,

I have one of the classic problems: saving output from a regression to a matrix. I have searched, googled, etc., but could not find a solution.
I am running several regressions with a command called traj, a Stata plugin written in C, downloadable from https://www.andrew.cmu.edu/user/bjones/index.htm. In short, it fits longitudinal finite mixture models. I am running several models, trying to find the best combination of trajectories and polynomial orders, and I am saving BIC, AIC, etc. in a matrix. I want to save another result as well; see below.

The data set actually has 28 time points and values; I am showing the first 5 values for 5 patients. Each time point (time_n) measures an organ failure score (sofa_resp_orig_n) in 660 critically ill patients.
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float(sofa_resp_orig_1 sofa_resp_orig_2 sofa_resp_orig_3 sofa_resp_orig_4 sofa_resp_orig_5 time_1 time_2 time_3 time_4 time_5)
2 1 0 0 0 1 2 3 4 5
2 3 2 3 2 1 2 3 4 5
2 2 . . . 1 2 3 4 5
2 . . . . 1 2 3 4 5
2 2 1 0 0 1 2 3 4 5
end
Here is my problem.


Code:
clear

// Setting polynomials and groups.
local polynom 3
local groups 6

// Creating matrix for later saving of results. This part is run once only.

// Creating matrix
local maxnum: display `polynom'^`groups' +1
display `maxnum'
local columns `groups'
matrix a = J(`maxnum',6,.)
matrix group = J(`maxnum',`columns',.)




timer clear
timer on 1
// Here I run the code, several times with different starting points
// (i.e., from different combinations of polynomials), saving the matrix
// to Excel after a set number of iterations. This is because the
// estimation sometimes does not converge and Stata freezes, and if that
// happens I do not want to redo everything from the first combination.

    // State polynomials, number of groups (trajectories), maximum iterations (before saving to excel), which combination of
    // groups and polynomials to start from.
local polynom 3
local groups 6
local maxiter 2
local iter_start 1

// Setting variable list for traj command
local first "sofa_resp_orig_1"
local connect "-"
local second "sofa_resp_orig_28"
local maxnum: display `polynom'^`groups'


// Setting up matrix columns and naming them
matrix colnames a = "Groups" "Polynomials" "BIC(N)" "BIC(panels)" "e(AIC)" "e(ll)"
local bicn: display -10^99
local bicp: display -10^99
local aic: display -10^99
// Local macro names may not contain parentheses, so names such as
// e(BIC_N_data) are not legal macro names; plain names are used instead.
local bic_n_data: display -10^99
local bic_n_subjects: display -10^99
local aic_full: display -10^99


// Filling up matrix with the possible combinations of polynomials. So I can start over at a specific combination later.
matrix b = J(`maxnum',6,.)
local i 1
forvalues a = 1/`polynom' {
    forvalues b = 1/`polynom' {
        forvalues c = 1/`polynom' {
            forvalues d = 1/`polynom' {
                forvalues e = 1/`polynom' {
                    forvalues f = 1/`polynom' {
                        matrix b[`i',1] = `a'
                        matrix b[`i',2] = `b'
                        matrix b[`i',3] = `c'
                        matrix b[`i',4] = `d'
                        matrix b[`i',5] = `e'
                        matrix b[`i',6] = `f'
                        local ++i
                    }
                }
            }
        }
    }
}

// Extracting polynomial order to traj command
forvalues a = 1/`maxiter'{
local aa = b[`iter_start',1]
local bb = b[`iter_start',2]
local cc = b[`iter_start',3]
local dd = b[`iter_start',4]
local ee = b[`iter_start',5]
local ff = b[`iter_start',6]


// Running the traj command.
    quietly traj, var(`first'`connect'`second') indep(time_1-time_28) model(cnorm) min(0) max(4) order(`aa' `bb' `cc' `dd' `ee' `ff' )
    
    // Saving output to matrix
    matrix a[`iter_start',1]= e(numGroups1)
    matrix a[`iter_start',2]= `aa'`bb'`cc'`dd'`ee'`ff'
    matrix a[`iter_start',3]= e(BIC_N_data)
    matrix a[`iter_start',4]= e(BIC_n_subjects)
    matrix a[`iter_start',5]= e(AIC)
    matrix a[`iter_start',6]= e(ll)    
    
// !! here is my problem !!   See below for description.
    

    local ++iter_start
}
    
timer off 1


di "`maxiter' iterations, total time"
timer list 1

// Saving matrix a (results summary) to excel in case Stata freezes.
cd "whereever/- Renal 5gr 3 poly"
putexcel set `first'`connect'`second'gr`groups'poly`polynom'_`maxiter'_`iter_start' , replace
putexcel A1 = matrix(a), names

In the code above, you can see where I say my problem is. Among other things, the traj command returns in e() a matrix with the percentage of patients in each trajectory, e(groupSize1). I want to save these numbers.
e(groupSize1) is a matrix with one row and n columns, where n is the number of trajectories estimated (6 for six groups/trajectories, 5 for five, and so on).

So I want to:
1. Run a number of traj estimations, starting at a given combination of trajectories and polynomials.
2. Save all results (BIC, AIC, etc.) to matrix a, which the code above does. Steps 1-2 work fine.

3. Save e(groupSize1) for each estimation and add this matrix to matrix a. This I cannot do.
I could easily save e(groupSize1) to a matrix (matrix groupsize = e(groupSize1)). But then either matrix groupsize needs to grow by one row for each new estimation and be appended to matrix a at the end, or, perhaps better, be added to matrix a after each run; in that case I need to append a number of columns to matrix a (corresponding to the columns in e(groupSize1)) every time.

Dummy code (for insertion in the code where my problem is):

matrix groupsize = e(groupSize1)
matrix a[`iter_start',7]= groupsize


This obviously does not work; I get a matrix configuration error, even if I enlarge matrix a to hold more columns. Any suggestions on how to add e(groupSize1) to matrix a for every iteration of traj?
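One possibility (a sketch, untested with traj): create matrix a with the extra columns up front, and copy e(groupSize1) into row `iter_start' element by element, which sidesteps the conformability error:

Code:
* Run once, in place of -matrix a = J(`maxnum',6,.)-; note the colnames
* line would then need one name per column as well.
matrix a = J(`maxnum', 6 + `groups', .)

* Inside the loop, in place of the dummy code above:
matrix groupsize = e(groupSize1)
forvalues g = 1/`=colsof(groupsize)' {
    matrix a[`iter_start', 6 + `g'] = groupsize[1, `g']
}
When a model has fewer than `groups' trajectories, the trailing columns of that row simply stay missing.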

Hope I am making myself somewhat clear...


Best regards,
Jesper

Equality of regression coefficients when estimating regressions with same covariates and different outcomes

Consider a setting where the RHS of the two equations is identical; what differs is the LHS outcome:

Regression 1:
Code:
\[ y = a + bX + e \]
Regression 2:
Code:
\[ y_2 = a_2 + b_2 X + e_2 \]

The goal is to test the equality of the coefficients on X in the two regressions. Browsing the internet, I found two options.

Option 1 is to use suest, as explained, for instance, here: https://stats.idre.ucla.edu/stata/co...s-using-suest/
Option 2 is to manually stack the data (one dataset copy per regression), create a dummy variable identifying the dataset, and estimate a model in which X is interacted with the dummy, as sketched below. This is explained here: https://www.stata.com/support/faqs/s...-coefficients/
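For concreteness, a minimal sketch of Option 2 without fixed effects, using the names from the equations above (with y2 standing in for y_2):

Code:
gen obs_id = _n                      // to cluster on the duplicated unit
expand 2, gen(d)                     // one dataset copy per regression
gen outcome = cond(d == 0, y, y2)    // y in copy 0, y_2 in copy 1
regress outcome i.d##c.X, vce(cluster obs_id)
test 1.d#c.X                         // H0: coefficient on X equal across outcomes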

In my "real life" application, I am using -reghdfe- since I also need to absorb a large number of fixed effects. Since the package does not support suest, I am trying to implement option 2.

When I do not "absorb" fixed effects I have no problems: on top of running the test of interest, I can also retrieve the coefficients b and b_2 (and the two intercepts) from the stacked regression, and these are identical to those obtained when estimating the two original regressions separately.
However, when I absorb fixed effects this result no longer holds: the coefficients from the stacked regression differ from those obtained from the two separate regressions. I believe this is because de-meaning the two separate models is not the same as de-meaning the stacked one.

Is it a sound approach in this setting to first de-mean the two datasets separately, then stack the de-meaned datasets, and finally apply Option 2? My worry is that the standard errors will be calculated incorrectly, since the preliminary estimation step has to be taken into account. Do you know of alternative approaches to reach the same goal?

Generate time-variable for panel data in long format

Dear All, this is my first post here; excuse me if I do anything wrong. I have a question I cannot find a solution to, though I hope it is not too complicated.

I have panel data in a wide format containing individual survey responses over nine different waves. For some of my analyses, I need the panel data in a long format. To be able to reshape the data, however, I must have a variable representing the different waves, i.e., the time dimension. I know how to reshape the data and create the ID variable, but I am not sure how to create the time variable.

For now, the data includes nine different variables with the participants of each wave. It also includes nine variables with the start date of the survey and nine variables with the end date of the survey. Are these variables sufficient if I want to create a variable constituting the time component of my panel data in a long format? And how should I proceed with reshaping this?
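In case the mechanics are useful to others: reshape long itself creates the wave variable through its j() option, so no separate time variable has to exist beforehand. A minimal sketch, assuming the wave-specific variables share stubs such as resp1-resp9, startdate1-startdate9, and enddate1-enddate9 (hypothetical names):

Code:
gen long id = _n
reshape long resp startdate enddate, i(id) j(wave)
xtset id wave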

Thanks in advance. If this were unclear, feel free to ask any questions.

//Tim

Edit:
I think I found a solution!

Truncate independent variables only for regression

Hello,
I am trying to reproduce a model from previous studies and work forward from there. However, the regressions in those papers truncate two independent variables, both at -1 and 1. The regression looks like: dependent-var independent-var (truncated) and [battery of control variables].

My data look like the following (Stata 14). I want to truncate TAXRISK and TAXAVOID.

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float(TOBINQ TAXRISK TAXAVOID PTROAw VOL_PTROAw)
 534.9625         .          . -.02838538         .
 716.5151         .          . .002406625         .
 940.1655         .          .  .02955219         .
1438.7303         .          .   .0464805         .
1799.5825         .          .    .081883         .
1524.2985  .2117158 -.05773715  .08523285 .21286123
1292.2108 .18533356 -.11306289  .08704758  .3091128
 1531.971 1.6120528 -.16798247  .04276229 .21286123
1917.7026 .15945534  -.1646288  .06368567 .21286123
1816.2937 .12556152  -.1762948  .04258824 .21286123
 2008.303 .11273167  -.2057642  .03846694 .21286123
 2161.308 .09726332  -.1876211  .04787452 .34941465
1716.2954 .15766433  -.3829571 -.02318152  .4452159
 1418.811  .5652186  -.4990783  .04112059 .29196942
2139.8582 .23024587  -.5566085  .05063291 .21286123
 1443.271         .          . -.02707996         .
 1434.441         .          .  .02324767         .
 2794.025         .          .  .06039088         .
 2415.032         .          .  .03506268         .
  2956.32         .          .  .06606981         .
end
*edit: I actually mean for calculating the correlations... not the regression
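A minimal sketch of one reading of "truncate at -1 and 1", namely setting values outside [-1, 1] to missing; if the papers instead winsorize, the replace line would assign the bound rather than missing. The truncated copies can then go into -pwcorr- (or the regression):

Code:
foreach v in TAXRISK TAXAVOID {
    gen `v'_tr = `v'
    replace `v'_tr = . if `v' < -1 | `v' > 1   // values outside [-1, 1] -> missing
}
pwcorr TOBINQ TAXRISK_tr TAXAVOID_tr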

Multinomial logit model, IIA

Dear all,
I am trying to analyze the determinants of loan repayment performance in a given organization.
The outcome variable was classified into three categories: 'paid on time' for clients who repaid the loan before the due date; 'delinquent' for clients who repaid late or repaid less than the appropriate amount of their most recent loan; and 'default' for clients who had not paid three months after the due date. After running the model with the mlogit command, I found the result below, but I have some questions regarding the model and the IIA test.

1. Is the overall estimation result good?
2. What is the rationale behind choosing the base category for a specific model?
3. The IIA test does not work for this model; what is the problem?

I need your help.


Code:
 mlogit loanstatus age loansize income area rooms floor tenur hhsize sex educ

Iteration 0:   log likelihood =  -145.5761  
Iteration 1:   log likelihood = -75.804603  
Iteration 2:   log likelihood = -72.239623  
Iteration 3:   log likelihood = -64.135462  
Iteration 4:   log likelihood = -62.519257  
Iteration 5:   log likelihood = -62.061845  
Iteration 6:   log likelihood = -62.058576  
Iteration 7:   log likelihood = -62.058576  

Multinomial logistic regression                   Number of obs   =        155
                                                  LR chi2(20)     =     167.04
                                                  Prob > chi2     =     0.0000
Log likelihood = -62.058576                       Pseudo R2       =     0.5737

------------------------------------------------------------------------------
  loanstatus |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
Default      |
         age |   .0900269   .0497059     1.81   0.070    -.0073949    .1874487
    loansize |   .0000514   .0000259     1.99   0.047     7.49e-07    .0001021
      income |  -.0014927   .0004514    -3.31   0.001    -.0023774   -.0006079
        area |   .0167898   .0949439     0.18   0.860    -.1692968    .2028765
       rooms |  -3.228844   1.858627    -1.74   0.082    -6.871686    .4139975
       floor |   .0920056   .2192906     0.42   0.675    -.3377962    .5218073
       tenur |  -.0978784   .3479336    -0.28   0.778    -.7798156    .5840589
      hhsize |  -.1732183   .3867932    -0.45   0.654     -.931319    .5848824
         sex |   -.199167   .9524633    -0.21   0.834    -2.065961    1.667627
        educ |  -1.596048   .6462633    -2.47   0.014    -2.862701   -.3293956
       _cons |   .5467191   8.059603     0.07   0.946    -15.24981    16.34325
-------------+----------------------------------------------------------------
Delinquent   |
         age |   .0536494   .0354863     1.51   0.131    -.0159024    .1232012
    loansize |   .0000102   .0000203     0.50   0.615    -.0000296    .0000501
      income |  -.0002777   .0001202    -2.31   0.021    -.0005133   -.0000422
        area |     .06586   .0851608     0.77   0.439    -.1010522    .2327721
       rooms |  -1.079745   1.246765    -0.87   0.386     -3.52336    1.363869
       floor |   .3715418   .1467483     2.53   0.011     .0839205    .6591631
       tenur |  -.1872231   .2649991    -0.71   0.480    -.7066118    .3321656
      hhsize |   .0925946   .2187679     0.42   0.672    -.3361827    .5213719
         sex |    .473651   .6225617     0.76   0.447    -.7465475     1.69385
        educ |  -.8049165   .4456839    -1.81   0.071    -1.678441    .0686079
       _cons |   -2.80576   5.615922    -0.50   0.617    -13.81276    8.201244
-------------+----------------------------------------------------------------
Paid_on_time |  (base outcome)
------------------------------------------------------------------------------


. mlogtest, iia

Problem determining number of categories.

**** Hausman tests of IIA assumption

 Ho: Odds(Outcome-J vs Outcome-K) are independent of other alternatives.
You used the old syntax of hausman. Click here to learn about the new syntax.

(storing estimation results as _HAUSMAN)
flat region resulting in a missing likelihood
r(430);
Thank you very much.
Ermiyas



Multiple imputation

Hello,

I am analysing a dataset to recreate a prognostic score. Three of the 10 variables that go into building the score have upwards of 20% missing data. I ran multiple imputation to fill in the missing values for the three variables.

However, when I now use the data to calculate the score (variable name: class, which ranges from 1-8), I get more than one score per patient across the imputed datasets. As a result, stcox does not work because of the variation in the score for the same patient.

I tried using "esampvaryok" but the results that I got were very obviously incorrect.
Code:
mi estimate, esampvaryok hr: stcox i.class, strata(trial)
I understand that averaging the scores is not the correct thing to do.
Is there any way to have Stata produce a single score for each patient in this imputed dataset?

Thank you.


Line Plots -- Proportion and Average with condition

Code:
     +----------------------------------------------------------------------------------------+
     | year                                                firm         sales         country |
     |----------------------------------------------------------------------------------------|
  1. | 2010   ZHONGHANG ELECTRONIC MEASURING INSTRUMENTS CO LTD    433.190944           China |
  2. | 2010                                         VERSATEL AG    288.621564         Germany |
  3. | 2010                                        BRILLIANT AG    13.4939952         Germany |
  4. | 2010                             ROYALE ENERGY FUNDS INC   23.01539744   United States |
  5. | 2010                            MOTRICITY, INCORPORATION   756.2029718   United States |
     +----------------------------------------------------------------------------------------+
I would like to have the following line plots together:
  • Proportion of firms that have sales < 100 if country == "United States" (scale left )
  • Average sales (scale right )
  • Average sales if country == "United States" (scale right)

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input int year str105 firm double sales str22 country
2010 "ZHONGHANG ELECTRONIC MEASURING INSTRUMENTS CO LTD"                   433.190944 "China"        
2010 "VERSATEL AG"                                                         288.621564 "Germany"      
2010 "BRILLIANT AG"                                                13.493995199999999 "Germany"      
2010 "ROYALE ENERGY FUNDS INC"                                            23.01539744 "United States"
2010 "MOTRICITY, INCORPORATION"                                          756.20297178 "United States"
2010 "SILVERLEAF RESORTS, INC."                                           42.30089024 "United States"
2011 "NANFANG ZHONGJIN ENVIRONMENT CO LTD"                                411.5881728 "China"        
2011 "HOLLYSYS AUTOMATION TECHNOLOGIES, LIMITED"                   455.03851327999996 "China"        
2011 "DR HOENLE AG"                                                    66.75547738791 "Germany"      
2011 "HYMER AG"                                                             261.49396 "Germany"      
2011 "ONVIA, INC."                                                        24.29359504 "United States"
2011 "ALTIMMUNE INC"                                                      61.25993844 "United States"
2012 "SHENZHEN TOPBAND COMPANY LIMITED"                            212.43562703999999 "China"        
2012 "QINGDAO ZHONGZI ZHONGCHENG GROUP CO LTD"                             311.453604 "China"        
2012 "SURTECO GROUP SE"                                               248.39538361324 "Germany"      
2012 "HEIDELBERGCEMENT AG"                                              11336.5660875 "Germany"      
2012 "INTICA SYSTEMS AG"                                               16.82341644856 "Germany"      
2012 "DSP GROUP, INC."                                             124.84096704000001 "United States"
2012 "DYNAVOX INCORPORATION"                                               4.08980647 "United States"
2013 "BEIJING ARITIME INTELLIGENT CONTROL COMPANY LIMITED"             315.8520795495 "China"        
2013 "ZHEJIANG NARADA POWER SOURCE COMPANY LIMITED"                      782.89902096 "China"        
2013 "SHANGHAI HUAYI GROUP CORP LTD"                                  641.30546030714 "China"        
2013 "WHIRLPOOL CHINA CO LTD"                                      1322.7587971199998 "China"        
2013 "PROGRESS-WERK OBERKIRCH AG"                                       188.444865625 "Germany"      
2013 "SYNERGETICS USA, INC."                                               91.5605152 "United States"
2013 "TEXAS INSTRUMENTS INCORPORATED"                                  47545.87836879 "United States"
2014 "GUANGDONG HONGTU TECHNOLOGY HOLDINGS CO LTD"                       489.58464285 "China"        
2014 "LEO GROUP CO LTD"                                               1463.2291417812 "China"        
2014 "TIANJIN CHASE SUN PHARMACEUTICAL COMPANY LIMITED"              2229.34650143714 "China"        
2014 "CENTROTHERM INTERNATIONAL AG"                                    79.35253798209 "Germany"      
2014 "E.ON SE"                                                       33174.3689012708 "Germany"      
2014 "ADEPT TECHNOLOGY, INC."                                               113.28268 "United States"
2015 "TIANJIN MOTIMO MEMBRANE TECHNOLOGY CO LTD"                     1040.14742787776 "China"        
2015 "PACIFIC ONLINE LIMITED"                                         343.05631223424 "China"        
2015 "KWEICHOW MOUTAI CO., LTD."                                     42174.2028683804 "China"        
2016 "WUS PRINTED CIRCUIT (KUNSHAN) COMPANY., LIMITED"               1152.27762323149 "China"        
2016 "JIANGSU PHOENIX PUBLISHING&MEDIA CORP LTD"                        3836.62838097 "China"        
2016 "HAISCO PHARMACEUTICAL GROUP CO LTD"                          2407.2290850701897 "China"        
2016 "CYTOTOOLS AG"                                                        18.1413103 "Germany"      
2016 "ADM HAMBURG AG"                                                     264.6070128 "Germany"      
2016 "K&S AG"                                                            4581.0722001 "Germany"      
2016 "CALLAWAY GOLF CO"                                                 1031.47666064 "United States"
2017 "BEIJING STRONG BIOTECHNOLOGIES INC"                              1156.835655162 "China"        
2017 "SHENZHEN INSTITUTE OF BUILDING RESEARCH CO LTD"                 635.84602184376 "China"        
2017 "CHONGQING IRON & STEEL COMPANY LIMITED"                         2769.0095278376 "China"        
2017 "NEXWAY AG"                                                        7.04493487973 "Germany"      
2017 "KERYX BIOPHARMACEUTICALS, INC."                                     554.2444554 "United States"
2017 "NOVAN INC"                                                    67.54282176000001 "United States"
2018 "ZHEJIANG NARADA POWER SOURCE COMPANY LIMITED"                1813.9451278658098 "China"        
2018 "BAOTAILONG NEW MATERIALS CO LTD"                                1278.4957997234 "China"        
2018 "HARBIN ELECTRIC CORPORATION JIAMUSI ELECTRIC MACHINE CO LTD" 477.85802041465996 "China"        
2018 "FORMYCON AG"                                                    280.94941925362 "Germany"      
2018 "EBIX, INC."                                                         1300.962376 "United States"
2018 "MOMENTA PHARMACEUTICALS, INC."                                       1087.04256 "United States"
end
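One way to build the three series might be to collapse to year level and plot with two y-axes (a sketch against the data above; the collapsed variable names are made up):

Code:
preserve
gen byte us_small = sales < 100 if country == "United States"
gen sales_us = sales if country == "United States"
collapse (mean) prop_us_small=us_small avg_sales=sales avg_sales_us=sales_us, by(year)
twoway (line prop_us_small year, yaxis(1)) ///
       (line avg_sales year, yaxis(2))     ///
       (line avg_sales_us year, yaxis(2)), ///
       ytitle("Share of US firms with sales < 100", axis(1)) ///
       ytitle("Average sales", axis(2))
restore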

Mann-Kendall trend test in Stata (tutorials)

Hello Everyone,

I am trying to find out how to do a non-parametric Mann-Kendall trend test to detect monotonic trends.

Here is some example data:

Year variable values = 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018
Events variable values = 420, 350, 305, 288, 250, 209, 175, 89
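If I am not mistaken, the Mann-Kendall trend test is Kendall's rank correlation between the series and time, so Stata's built-in -ktau- should reproduce it (a sketch with the data above):

Code:
clear
input int(year events)
2011 420
2012 350
2013 305
2014 288
2015 250
2016 209
2017 175
2018  89
end
ktau year events   // Kendall's score S and its p-value = Mann-Kendall trend test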

Thanks!

-RRS

Difference in Differences with unbalanced (asymmetric) time periods

Hello list,

I am troubled by a problem I face in my thesis: in a difference-in-differences approach, can I use an unbalanced (asymmetric) number of pre- and post-treatment periods? That is, if I were using t-5 through t+5 in my estimation, can I instead use t-1 through t+5? What are the implications? I fear that including everything back to t-5 adds a lot of irrelevant noise to my sample.

Thank you.

Eben

Displaying number of observations after "repest" command

I use the community-contributed command repest to analyse PISA survey data.
I then use eststo and esttab to export the results to LaTeX.
repest stores the number of observations and the R-squared as coefficients, not as statistics, so I have to use estadd to add them as scalars.
My problem is that when I use estadd I get an error message.


The code I use is -
Code:
repest PISA, estimate(stata: reg pv@math female) ///
    results(add(N r2)) fast
estadd scalar No = e_N
estadd scalar r2 = e_r2
eststo m1
esttab m1
When I run esttab I get
Code:
----------------------------
                      (1)   
                            
----------------------------
female             -13.62***
                 (-12.30)   

_cons               456.4***
                 (411.44)   

e_N                515956   
                      (.)   

e_r2              0.00422***
                   (5.99)   
----------------------------
N                           
----------------------------
But the code gives an error message after the command
Code:
estadd scalar No = e_N
The error message is:
Code:
"e_N not found"
Could you help me extract these scalars (N and r2) after the repest command?
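A guess at a fix (untested): the esttab output above shows e_N and e_r2 stored as coefficients, so they live in e(b) and might be retrieved with _b[] rather than referenced as bare names:

Code:
* Sketch: pull the repest-added results out of e(b)
estadd scalar No = _b[e_N]
estadd scalar r2 = _b[e_r2]
eststo m1
esttab m1, scalars(No r2) drop(e_N e_r2)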


Log values in dependent variable

Hello There

I am in the process of writing an assignment for an econometrics course and I have a couple of questions:

1) I am estimating a multiple linear regression y x1 x2 x3... To check that my model is correctly specified, I run a RESET test (ovtest) and get a p-value of 0, rejecting the null that my model has no omitted variables. When transforming my dependent variable from y to log(y), this changes to 0.8, suggesting that my model is no longer misspecified.

My question is whether this creates problems when the dependent variable is a percentage (% of waste recycled). Since most of the values are rather low percentages, the data are rather skewed, so it makes sense to use a log to correct the distribution. However, if I keep the percentages as decimals, all of the logged dependent-variable values become negative. I can avoid this by multiplying the dependent variable by 100 and keeping values as whole numbers (e.g., 0.33 = 33%).
So, I wanted to ask whether it is possible to log the dependent variable if it is a percentage, and whether it is okay to multiply it by 100 to keep the predicted values positive?

2) So far I have kept the model as log(y*100). Since I have panel data, and since I know regions have many time-invariant characteristics that may affect waste recycled (e.g., geography), I am using an FE model; I also ran a Hausman test to show statistically that this is preferred over an RE model. In my class, we have also learnt that there are situations where a first-differences model is even better, e.g., with positive autocorrelation, or when T is large and N is small. I tested for autocorrelation using the following method:

gen dy = d.y
gen dx1 = d.x1
gen dx2 = d.x2
.... etc. (the differenced variables need new names, since gen y = d.y would clash with the existing y)

then: reg dy dx1 dx2 ... timedummy2 timedummy3 ... timedummy(n)

I then predict the residuals and lag them (which costs one period) and run the regression: reg resid resid_1 timedummy3 timedummy4 ... timedummy(n).
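In code, the procedure looks roughly like this (a sketch, assuming the panel is xtset and using hypothetical names id and year):

Code:
xtset id year
reg D.y D.x1 D.x2 i.year
predict uhat if e(sample), residuals
reg uhat L.uhat i.year     // rho-hat = coefficient on L.uhat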

Looking at the estimated rho (the coefficient on the lagged residual), I could in theory tell whether there is AR(1) autocorrelation. But the estimate is small and insignificant, suggesting no real evidence of autocorrelation. As a result, I chose to run both an FE and an FD model and compare the two (both with robust standard errors, e.g., xtreg, robust, since a previous test showed heteroskedasticity). What I found is that some of the coefficients on my independent variables changed from positive to negative. Does anyone know whether it is possible for coefficients to change sign between FE and FD?
The only reason I could come up with is the quite high standard errors: some coefficients have confidence intervals spanning both negative and positive values.

Besides this, I am unsure whether I am misspecifying my model or whether I have any notable omitted variables.
Any help would be highly appreciated.


How to define missing in value label and apply to multiple variables using this value label?

Hello!

New user here. I had 28 string variables with similar data that I encoded to numeric, using a defined value label (SIRorder).

Code:
 label define SIRorder 1 "S" 2 "I" 3 "R" 4 "NI" 5 "NA" 6 "NULL"
example:
Code:
 encode AugNEW , generate(Aug_new2) label(SIRorder)
I used the value label because not all of the original string variables had each of the data categories, and this gave me more consistency in how the data are encoded. None of the variables have any blank or missing data; the equivalent of missing in the string variables was originally coded as "NULL." In encoding the string variables, I now have a category (#6) for "NULL."

. label list SIRorder
SIRorder:
1 S
2 I
3 R
4 NI
5 NA
6 NULL
7 NB

How can I indicate that I want #6 to be treated as missing for all variables I generated using the value label SIRorder? My attempt to redefine #6 as missing did not work:

Code:
 label define SIRorder . == 6, modify
may not label .
r(198);

Should I (could I?) have encoded NULL as missing when I originally encoded the string variables to numeric?

I could go through each of the 28 new variables and replace #6 with missing, but that seems like a lot of manual work. I may also ultimately want to make SIRorder categories #4, #5, and #7 missing (.), so understanding how best to do this for #6 and apply it across all 28 variables that use the SIRorder value label will be useful down the road.
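In case it helps, a sketch of a loop that finds every variable carrying the SIRorder label via -ds- and recodes category 6 to missing (the same pattern would cover #4, #5, and #7 later):

Code:
ds, has(vallabel SIRorder)           // all variables labeled with SIRorder
foreach v of varlist `r(varlist)' {
    replace `v' = . if `v' == 6      // treat "NULL" (category 6) as missing
}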

Thanks in advance for your patience and help!
Edie Marshall

Panel data regression by groups

Hello, Statalisters,

I am new to Stata. I have panel data with 36 time periods, 19 cross-sections, and 5 independent variables. My purpose is to estimate the coefficient of each independent variable for each cross-section. Is it possible to make such an estimation in Stata? If so, could you please advise me which model I should use?

Here is more information on my model:

export credit longtermcredit insurance gdp exhangerate
i:19 t:36
Export is the dependent variable and the rest are independent.

How can I estimate each coefficient of the independent variables for each cross-section? I need to obtain 5 x 19 coefficients at the end of the estimation.
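For what it is worth, one common pattern for per-panel coefficients is -statsby- with by() set to the panel identifier (a sketch; panelid is a placeholder for your cross-section variable, and the clear option replaces the data in memory with the results):

Code:
statsby _b, by(panelid) clear: ///
    regress export credit longtermcredit insurance gdp exhangerate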

My apology if this question was asked before. Your assistance is highly appreciated.

Thank you,
Elnara

t-test p-values: philosophical question

I am using Stata 15.1 for Mac. After 1:4 case:control matching, my dataset has 1,174 cases (patients with brain tumors) and 4,696 controls (patients without brain tumors). I am comparing the levels of a serum biomarker (the value is unitless, as it is a ratio). Using the two-sample mean-comparison test by groups (ttest serum, by(tumor)), I get the following output. I conclude that the difference between the means is not statistically significant.

Code:
Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
       0 |   4,696     .282771     .000907     .062152    .2809929    .2845491
       1 |   1,174    .2789174     .001807    .0619147    .2753721    .2824627
---------+--------------------------------------------------------------------
combined |   5,870    .2820003    .0008108    .0621185    .2804109    .2835897
---------+--------------------------------------------------------------------
    diff |            .0038536    .0020265               -.0001191    .0078263
------------------------------------------------------------------------------
    diff = mean(0) - mean(1)                                      t =   1.9016
Ho: diff = 0                                     degrees of freedom =     5868

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.9714         Pr(|T| > |t|) = 0.0573          Pr(T > t) = 0.0286


However, when I use the immediate form of the test with rounded values (ttesti 4696 0.283 0.062 1174 0.279 0.062), I get the following output, now with a "significant" p-value.

Code:
Two-sample t test with equal variances
------------------------------------------------------------------------------
         |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
       x |   4,696        .283    .0009047        .062    .2812263    .2847737
       y |   1,174        .279    .0018095        .062    .2754498    .2825502
---------+--------------------------------------------------------------------
combined |   5,870       .2822    .0008094    .0620154    .2806132    .2837868
---------+--------------------------------------------------------------------
    diff |                .004    .0020231                 .000034     .007966
------------------------------------------------------------------------------
    diff = mean(x) - mean(y)                                      t =   1.9772
Ho: diff = 0                                     degrees of freedom =     5868

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.9760         Pr(|T| > |t|) = 0.0481          Pr(T > t) = 0.0240


I realize there is considerable controversy regarding reliance on p values to establish statistical significance.
So, how do I reconcile the above? I would certainly like to say the difference in the biomarker level between cases and controls is significant and therefore has clinical utility.
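For what it is worth, the t-test depends on the data only through n, mean, and SD, so feeding the immediate form the unrounded summary statistics from the first table reproduces the original result exactly; the discrepancy above appears to be purely input rounding:

Code:
ttesti 4696 .282771 .062152 1174 .2789174 .0619147
* -> t = 1.9016, two-sided p = 0.0573, matching -ttest serum, by(tumor)-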

Thank you very much,
Richard

How to obtain bootstrap ROC after logistic regression

I have a binary outcome (positive blood culture, coded 0/1) and a continuous predictor (a risk score, where a higher number indicates greater risk). I run the following code:
Code:
logistic positivebloodculture riskscore, vce(bootstrap, reps(1000) seed(102703) dots(1))
Now I need a bootstrapped ROC curve from the logistic model. Any suggestions on how I can do this? I use Stata 16.
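One possibility, if I understand its options correctly, is -rocreg-, which estimates a nonparametric ROC curve with bootstrap confidence intervals (a sketch; see [R] rocreg for the option names):

Code:
rocreg positivebloodculture riskscore, breps(1000) bseed(102703)
rocregplot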

Thank you,
Al Bothwell

How to stack marginal effects across outcomes in one column for multinomial logit output?

Hello,
This is a fairly simple question, but one I have not been able to find an answer to (although I have found a few unanswered threads). This is also my first time posting, so I apologize if I don't get this completely right (I have read the FAQ!).

I am running multinomial logits and would like my regression output to include a column of coefficients with a column of marginal effects next to it. While I have no problem running the regression or calculating the marginal effects, the -margins- output reports the outcomes side by side, so I have to manually drag the marginal effects for each outcome to line up with the corresponding coefficients in the column.

To help illustrate, my code:

Code:
mlogit zero2_end_cat_0015 di pi, base(1)
estimates store m3, title("1900-1915")
estpost margins, dydx(*) predict(outcome(1)) /* marginal effects - outcome 1 */
estimates store marg3a, title(1)
mlogit zero2_end_cat_0015 di pi, base(1)
estpost margins, dydx(*) predict(outcome(3)) /* marginal effects - increasing to 100% */
estimates store marg3b, title(3)
mlogit zero2_end_cat_0015 di pi, base(1)
estpost margins, dydx(*) predict(outcome(5)) /* marginal effects - partial increase */
estimates store marg3c, title(5)
esttab m3 marg3a marg3b marg3c using marginals1cq.csv, label se replace
What I get from this is a table with four columns instead of two (I would love marg3a, marg3b, and marg3c to stack on top of each other in one column, corresponding to their relevant outcomes from m3).

I have read everything I can find, but nothing enables me to report them in one column. I can get the RRRs reported in one column, and ultimately it would be great to have the coefficient, RRR, and margin listed together. Finally, I can see how this is done for logits or probits, but once there are multiple outcomes I don't see how to get the estimates to stack. Any help is much appreciated!

How to limit the number of graphs displayed on each page

I guess the technically correct way of stating this question is: "how to limit the number of groups displayed in a graph". My full data set produces a lot of graphs, but I'll present just three here as a demo. I am following a suggestion from Friedrich Huebler at https://www.statalist.org/forums/for...of-graphs-page
Using long-form data (after following successful guidance from Nick Cox at https://www.statalist.org/forums/for...variable-names), I have
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str45 event byte month int media
"Hurricane Katrina"                             -2  709
"Hurricane Katrina"                             -1  502
"Hurricane Katrina"                              0 1193
"United Nations Sustainable Development Summit" -2   62
"United Nations Sustainable Development Summit" -1   42
"United Nations Sustainable Development Summit"  0  228
"Volkswagen emissions scandal"                  -2   49
"Volkswagen emissions scandal"                  -1   41
"Volkswagen emissions scandal"                   0 2071
end
if I use 2 graphs per page I believe this would mean
Code:
encode event, gen(id)
summarize id
* Number of graphs: distinct events (r(max), since id runs 1, 2, ...) over 2;
* r(N) would count observations, not events.
local x = ceil(r(max)/2)
* Loop that creates x graphs with up to 2 sub-graphs each
forval y = 1/`x' {
  twoway (line media month) if id > ((`y'-1)*2) & id <= (`y'*2), by(event) ylabel(0(1000)2000, angle(0)) xlabel(-2(1)0) name(graph`y', replace)
  graph export "graph`y'.png", replace    // -graph save- writes .gph; use -graph export- for .png
}
I think this is working but would appreciate any edits or observations of traps I might be setting for myself using this approach. Thank you, Dan