Channel: Statalist
Viewing all 73176 articles
Browse latest View live

Is Stata's C function SF_var_is_string returning the wrong values, or is my C code incorrect?

I have this C code:
Code:
#include <stdio.h>
#include "stplugin.h"

STDLL stata_call(int argc, char *argv[])
{
    ST_boolean b;
    int i;

    for (i = 1; i <= SF_nvars(); i++) {
        b = SF_var_is_string(i);
        if (b) {
            SF_display("string\n");
        }
        else {
            SF_display("numeric\n");
        }
    }
    return(0);
}
I compile it into the plugin -test.dll-, using the instructions here (http://www.stata.com/plugins/) and run it using this Stata code:
Code:
capture program drop ctest
program ctest, plugin using("C:/working/plugin/test/x64/Debug/test.dll")
sysuse auto, clear
plugin call ctest mpg make
This is the output:
Code:
string
numeric
This doesn't look correct. Even if I run it with a single numeric variable, I get this:
Code:
. plugin call ctest mpg
string

. plugin call ctest make
string
What am I doing wrong here? Is my C code incorrect?

I'm using Stata 14.1 MP on a Windows 7 x64 machine, and I compiled the code into a 64-bit DLL with Visual Studio 2012. Can anyone replicate this (on any system/compiler combination)?

Generating random variable from 95% confidence interval

Hello,

I would like to randomly generate a variable from within the 95% confidence interval of a normally distributed variable. I've had no luck finding anything so far. Any suggestions?
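To make the request concrete, here is one possible reading of it sketched in Python (an assumption on my part: "from the 95% CI" taken to mean uniform draws between the CI bounds of the mean; the variable x is simulated):

```python
import random
import statistics

random.seed(0)

# simulated normally distributed variable (mean 10, sd 2)
x = [random.gauss(10, 2) for _ in range(500)]

# 95% confidence interval for the mean: mean +/- 1.96 * standard error
mean = statistics.fmean(x)
se = statistics.stdev(x) / len(x) ** 0.5
lo, hi = mean - 1.96 * se, mean + 1.96 * se

# draw a new variable uniformly from within that interval
draws = [random.uniform(lo, hi) for _ in range(100)]
```

In Stata the same idea would use summarize or ci to obtain the bounds and runiform() for the draws.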

Thanks in advance.
Amit

Possible misspecification in gravity model (PPML, RESET test)

Dear all,
Brief overview: I'm trying to estimate the impact of intrawar presence (1), interwar presence (2), and economic sanctions (3) on exports, using a gravity model.
When it comes to estimating the gravity equation, PPML is the new benchmark. All previous studies on the topic, though, use OLS, so it might be interesting to see whether conventional wisdom holds under this new approach. That's why, before making any inference on my three main variables of interest, I am running a sensitivity analysis to compare different OLS specifications with different PPML specifications.

What's the problem ?
My main concern is about the PPML with time-varying country dummies specification.

To be more specific, I use dummies for every origin country and every destination country, on a three-year basis (following a previous paper by Ruiz and Villarubia, which also uses OLS, not PPML). To be explicit, Germany has 14 dummies in total: Germany as EXporter for the years 1989-1991, Germany as IMporter for the years 1989-1991, Germany as EXporter for the years 1992-1995, and so on.
I need three-year country dummies because my dataset covers 89 countries (92% of world exports) over a 21-year span, from 1989 to 2009, resulting in a balanced panel of 164,472 observations. One-year dummies would require 89 x 21 x 2 = 3,738 dummies, far too many for the computational power at my disposal.

What's my Stata code ?

I create the dummies using
Code:
*where year3 is categorical from 1 to 7 for the years
*origin is the origin country id and destination is the destination country id
xi, prefix(_G) noomit i.origin*i.year3 i.destination*i.year3
I drop the time-invariant country dummies and the time dummies automatically created by the previous code, and I run PPML:

Code:
drop _Gorigin* _Gyear* _Gdestin*

ppml export2 lndistwces contig comlang_off colony _G* if year < 2010, cluster(dyad)
*Where: export2 is exports in billions of 2005 US$ (to allow quicker computation), from Feenstra/UN Comtrade
*lndistwces is weighted distance from CEPII
*contig is 1 for contiguity from CEPII
*comlang_off is 1 for a common language from CEPII
* colony is 1 for previous colonial ties from CEPII
Then I run a RESET test:

Code:
predict XB,xb
gen XB2 = XB^2
quietly ppml export2 lndistwces contig comlang_off colony XB2 _G* if year < 2010, keep cluster(dyad)
test XB2 = 0
Results are as follows

Code:
Number of parameters: 1243
Number of observations: 164472
Pseudo log-likelihood: -61261.918
R-squared: .91971331
Option strict is: off
                                 (Std. Err. adjusted for 7,832 clusters in dyad)
--------------------------------------------------------------------------------
               |               Robust
       export2 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
---------------+----------------------------------------------------------------
    lndistwces |  -.7634372   .0258945   -29.48   0.000    -.8141894    -.712685
        contig |   .3082213   .0659453     4.67   0.000     .1789708    .4374718
   comlang_off |   .2199701   .0614091     3.58   0.000     .0996105    .3403298
        colony |  -.0989539   .1018637    -0.97   0.331     -.298603    .1006952

 test XB2 = 0

 ( 1)  XB2 = 0

           chi2(  1) =    6.23
         Prob > chi2 =    0.0125
From a qualitative point of view results are in line with previous studies, but the RESET test p-value is a bit too low.

My plan is to run the same model including my variables of interest (intrawar, interwar, economic sanctions), and to repeat everything subsetting for Homogeneous products, Reference-Priced products, and Differentiated products following the Rauch classification, to see which products are more sensitive to unstable conditions.

My questions are:

Can the RESET test alone undermine the reliability of my results?
Can the RESET tests of the other models undermine the reliability of those results too?
Am I overthinking this?

Any comment on the code, on the RESET test in particular and on the project in general, would be much appreciated.

number of obs in mixed effect model, panel data

Hi,

I have a data set of 4,500 children's weight status and lifestyle behaviors, with two waves (baseline and follow-up). I am trying to examine the relationship between weight status and lifestyle behaviors; since these children come from different schools, I chose a mixed-effects model. However, after I reshape my data to long format and run my command (below), I get: Number of obs = 9,000. I am very new to panel data and wonder whether I have chosen the right command for this aim. I would very much appreciate any suggestions.

my command: xtmixed bmiz sleep screen hw i.fastf i.ssb fruit veg year || school: year, mle nolog cov(unstructured)

Thanks and happy holidays!!

Chris

multiple imputation, variance

Dear all,

I have a dataset that contains multiply imputed values (5 implicates) and survey weights, and I am trying to estimate the standard errors for the regressors using Rubin's combination rules.
The formula for the total variance is: T= W+ (6/5) B

where:

- W is the within-imputation sampling variance (the average of the 5 complete-data variance estimates V): W = 1/5 Σ Vhat
- B is the between-imputation variance (the variability due to imputation uncertainty): B = 1/4 Σ (Yhat - Ybar)^2

How can I create the loop for this estimation?
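The arithmetic of these combination rules can be sketched as follows (in Python purely to illustrate the formulas; the five estimates and variances are made-up numbers):

```python
# Rubin's combination rules for m = 5 imputations (illustrative numbers only)
m = 5
estimates = [0.52, 0.48, 0.55, 0.50, 0.47]       # the 5 complete-data coefficient estimates
variances = [0.010, 0.012, 0.011, 0.009, 0.013]  # their 5 sampling variances

# W: within-imputation variance, the average of the 5 variances
W = sum(variances) / m

# B: between-imputation variance, 1/(m-1) * sum of squared deviations
ybar = sum(estimates) / m
B = sum((y - ybar) ** 2 for y in estimates) / (m - 1)

# T: total variance, T = W + (1 + 1/m) * B, which is W + (6/5) * B for m = 5
T = W + (1 + 1 / m) * B
se = T ** 0.5  # combined standard error
print(W, B, T)
```

A Stata loop over the 5 implicates would accumulate the same two sums and combine them at the end.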

Thank you in advance.
BR

New Stata Module to Estimate Shirley Almon Generalized Polynomial Distributed Lag Model

TITLE
'ALMON': Module to Estimate Shirley Almon Generalized Polynomial Distributed Lag Model

DESCRIPTION/AUTHOR(S)

almon estimates Shirley Almon Polynomial Distributed Lag Model
for many variables with different lag order, endpoint
restrictions, and polynomial degree order via (ALS - ARCH -
Box-Cox - GLS - GMM - OLS - QREG - Ridge) Regression models.
almon can compute Autocorrelation, Heteroscedasticity, and Non
Normality Tests, Model Selection Diagnostic Criteria, and
Marginal effects and elasticities in both short and long run.

KW: Regression
KW: Shirley Almon
KW: Autoregressive Least Squares (ALS)
KW: Autoregressive Conditional Heteroskedasticity (ARCH)
KW: Box-Cox Regression Model (Box-Cox)
KW: Generalized Least Squares (GLS)
KW: Generalized Method of Moments (GMM)
KW: Ordinary Least Squares (OLS)
KW: Quantile Regression
KW: Ridge Regression
KW: Polynomial Distributed Lag Model
KW: Autocorrelation
KW: Heteroscedasticity
KW: Non Normality
KW: Model Selection Diagnostic Criteria.

Requires: Stata version 11.2

Distribution-Date: 20151222

Author: Emad Abd Elmessih Shehata, Agricultural Economics Research Institute, Egypt
Support: email emadstat@hotmail.com


INSTALLATION FILES (click here to install)
almon.ado
almon.sthlp
almon.dlg

ANCILLARY FILES (click here to get)
almon.dta

Creating a table showing number of unique subjects seen by each of several clinicians

Hello,
The short code below simulates data and creates a report showing how many subjects were seen per care provider, where some subjects have seen multiple providers and some have seen the same provider more than once (for example, in the -list- output, subject 1 has seen provider 2 twice).

I wish instead to create a table showing the number of unique subjects seen per provider. Any advice would be very much appreciated.

Code:
clear
set obs 10
gen id=_n
gen provider=1 in 1/4
replace provider=2 in 5/10
replace provider=3 in 9/10
replace id=1 in 6/7
replace id=2 in 8/9
sort id provider
l, noo sepby(id)
tab id provider
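To make the target concrete, here is the distinct-count logic sketched in Python on the same simulated data (Python only to spell out what the table should contain, not as the Stata solution):

```python
# (id, provider) pairs matching the simulated Stata data above
rows = [(1, 1), (2, 1), (3, 1), (4, 1),
        (5, 2), (1, 2), (1, 2), (2, 2),
        (2, 3), (10, 3)]

# collect the set of distinct subject ids seen by each provider
seen = {}
for subj, prov in rows:
    seen.setdefault(prov, set()).add(subj)

# unique-subject count per provider (subject 1's two visits to provider 2 count once)
unique_counts = {prov: len(ids) for prov, ids in sorted(seen.items())}
print(unique_counts)  # {1: 4, 2: 3, 3: 2}
```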

Question: how to select several variables date1, date2, date3, ..., dateX at one time

Hi Stata Masters,

Before I post my question, I would like to mention that I'm quite new to Stata and English is not my mother tongue. Please forgive me if my question or my English seems weird.

So, I have up to 38 variables named examen1, examen2, ..., examen38. I would like to write a syntax that selects all these variables at once, without clicking them one by one: for example, saying "browse examen1 examen2 examen3 ... examen38" in one go.

Hope it is clear.

Many thanks in advance.

Vini

GMNL model

Dear Stata users,

I am trying to run a GMNL model using this command:

gmnl choice rand(espresso highcalories italy highprice) group(setid) id(id) nrep(100)

but I got this message:
"factor variables and time-series operators not allowed
r(101)"

Please, any suggestion on how to deal with this?

Thanks

Daniele

Using file command to link Stata with LaTeX

Hello everyone.

Basic stuff: I'm using a Mac, Stata/MP 13.1 version, and this is my first post.

Well, the basic idea is that I want to link Stata outputs automatically to LaTeX (for my dissertation). My approach is to write a LaTeX-friendly log file, in which I could specify global macros that LaTeX could find anywhere in the document I'm writing.

For example,

Code:
sysuse auto, clear
mean price
matrix R = r(table)

* I will write a LaTeX-friendly log file with the result previously found
quietly log using outputstata.txt
di in w "\def\price#1{\gdef\@price{#1}}"     // LaTeX code for defining a command
di in w `"\price{`=string(`=R[1,1]',"%20.0fc")'}"'  // LaTeX code for declaring the values of the command
quietly log close
This ends up with a log file that I can later use as an \input{outputstata.txt} in LaTeX, and therefore write a document with the global \@price (commas included), instead of typing the number myself and going back and forth to Stata to check whether the answer has changed given new procedures.

Up to here there is no problem at all, and I have successfully compiled this log file in LaTeX. Hooray!

My problem came when I wanted to check whether that global had been defined before, so I could replace it with the new value. That is, I can't just use the previous procedure to write an ado-file such as:

Code:
outputlatex, name(price) value(`=R[1,1]')
with macros `name' and `value', because, as it is, this will just keep appending new lines to my log file, even if I have already defined that same LaTeX global before.

So, what I found is that there is a file command, with which you can read and write to and from any file.

Therefore, my approach was the following,

Code:
program define outputlatex
    syntax , NAME(string) VALue(string) [NEW]

    * Open *
    if "`new'" == "new" {
         quietly log using "$results/outputstata.txt", name(latex) replace text
    }
    else {
         quietly log using "$results/outputstata.txt", name(latex) append text
    }
    
    * Check if there is a previous global *
    tempname myfile
    file open `myfile' using "$results/outputstata.txt", read write
    file read `myfile' line
    while r(eof) == 0 {
        if `"`line'"' == "\def\\`name'#1{\gdef\@`name'{#1}}" {
            file write `myfile' `"\\`name'{`=string(`value',"%20.0fc")'}"' _n
            * Close *
            file close `myfile'
            quietly log close latex
            exit
        }
        file read `myfile' line
    }
    file close `myfile'

    * If nothing was found *
    di in w "\def\\`name'#1{\gdef\@`name'{#1}}"
    di in w `"\\`name'{`=string(`value',"%20.0fc")'}"'
    * Close *
    quietly log close latex
end
What I do first is check whether I want a brand-new log file. If I don't, I open the previously used log file, named here outputstata.txt. Then I check whether I have already written a LaTeX global under the name `name'. I compare the complete line, to avoid the use of substr() or other string functions. If I do find a previous global, I would like to replace the value of `name' with the new `value'. I assume that if I find the first line, the second one will be the one I'm looking to replace.

AND HERE IS THE PROBLEM: this procedure does not replace the line completely. It stops at the first comma. How should I tell Stata that I want the whole line to be replaced? Moreover, I also tried to replace the first line, but I found that I could not write on the exact line I thought I was on, but on the next one.

So, any ideas on how I can code this ado-file successfully? Maybe I'm not approaching it correctly in the first place, and there is an easier, more efficient procedure. But this is what I came up with, so any help is appreciated.
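For what it's worth, the replace-or-append behavior I am after can be sketched outside Stata like this (Python, with a hypothetical write_latex_global helper; the point is the read-everything / rewrite-everything pattern rather than writing into the middle of an open file):

```python
import os
import tempfile

def write_latex_global(path, name, value):
    """Append a LaTeX \\def pair for `name`, or replace its value line if present."""
    header = "\\def\\%s#1{\\gdef\\@%s{#1}}" % (name, name)
    setter = "\\%s{%s}" % (name, format(value, ",.0f"))
    lines = []
    if os.path.exists(path):
        with open(path) as f:
            lines = f.read().splitlines()
    if header in lines:
        # the value line sits right after the \def line: overwrite it in memory
        lines[lines.index(header) + 1] = setter
    else:
        lines += [header, setter]
    # rewrite the whole file, so no partial in-place writes can truncate a line
    with open(path, "w") as f:
        f.write("\n".join(lines) + "\n")

path = os.path.join(tempfile.mkdtemp(), "outputstata.txt")
write_latex_global(path, "price", 6165.0)
write_latex_global(path, "price", 7000.0)  # second call replaces, not appends
```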

Thanks a lot!

Difference in difference example

I want to apply the DID method to an economic problem: estimating the effect of financial constraints (if any) on investment. I want to be sure I have done everything right, so I will explain my procedure and regression equation. I would just like to hear your opinion about the procedure.

(YOU CAN SKIP THIS PART AND START READING AT THE BOLD TEXT.) In 2012, the government in Croatia passed a law named the "Pre-bankruptcy law" (in short). It enables companies that are illiquid and insolvent (by some parameters) to start a pre-bankruptcy process. During the process, firms try to reach an agreement with their creditors, restructure their business activities, and conclude a pre-bankruptcy agreement. The main result of the agreement is debt forgiveness and a grace period for debt repayments, so after the process ends, the financial position of the firm is improved.

Now, I would like to test the impact of the pre-bankruptcy agreement (for companies that successfully finish the whole process and reduce their debt) on investment. I think the best method to test the relationship between a better financial position (lower debt) and investment is DD estimation, where the control group consists of firms that did not experience an improvement in financial position.

So, the treatment group are firms that have successfully finished the process (improved their financial position), and the control group are firms that are illiquid and insolvent but did not improve their financial position. I have data on the dates when firms made the pre-bankruptcy agreement; they fall in the interval 23.04.2013 - 02.07.2014. I have chosen to take only the period 23.04.2013 - 31.12.2013. I define the variables like this:
treatment = 1 "if firm finished the process"
t = 0 (pretreatment periods: 2011, 2012)
replace t = 1 if year == 2013 | year == 2014 (post-treatment periods)

As you can see, I have two time periods before the treatment and two after it. This is my DID equation (i is the outcome variable, investment):

xtreg (or reg?) i treatment 2011 2012 2013 2014 treatment#t

where 2011 2012 2013 2014 are dummies equal to 1 if the year is 2011, 2012, and so on.
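As a sanity check on the interaction logic (the numbers, not the Stata syntax), the treatment x post comparison recovering the effect can be sketched with simulated data, here in Python with a made-up true effect of 2.0:

```python
import random

random.seed(1)

# simulate a simple 2-group, pre/post panel: treated firms gain +2.0 after treatment
true_effect = 2.0
data = []
for firm in range(400):
    treated = firm < 200
    for post in (0, 1):
        y = (1.0 + 0.5 * treated + 0.3 * post
             + true_effect * treated * post + random.gauss(0, 0.1))
        data.append((treated, post, y))

# the DID estimator: difference of mean changes between treated and control groups,
# which is exactly what the treatment#t interaction coefficient estimates
def mean(xs):
    return sum(xs) / len(xs)

cell = lambda t, p: mean([y for tr, po, y in data if tr == t and po == p])
did = (cell(True, 1) - cell(True, 0)) - (cell(False, 1) - cell(False, 0))
print(round(did, 2))  # close to 2.0
```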

I'm not sure whether this is the right specification because:
1) I have multiple time periods in the DID model (not just 2)
2) treatment doesn't occur on one specific date but during the period 23.04.2013 - 02.07.2014 (I chose to include only observations in the period 23.04.2013 - 31.12.2013, and the treated firms are those that finished the pre-bankruptcy process in that period)

Descriptive stats for analyzed sample -- different Ns for xtregar but not xtreg

Hello
Following a previous forum answer, I used the e(sample) method to get statistics on an analyzed panel sample.
Since it was a panel sample, I also wanted to collapse by participants since I was interested in demographic data.

Here's the code
. use "Level 1 File-One Sub per row.dta", clear
. xtset PID SurveyYr, yearly
. xtregar jobsat unemprate inc_labLog if MEMORIG==1 & NevUnemp == 1, fe rhotype(dw)
. collapse (mean) sex_dum if e(sample), by(PID)
. tab sex_dum

The problem is that the N (Number of groups) for the xtregar command is 2,403, but the N for the tab command is 3,131. I don't understand why the N has changed.

If I use xtreg instead of xtregar, I get the same N in both analyses. I would guess this has something to do with the fact that auto-regressive errors can't be modeled for participants with just one year of data, but that information from those participants is involved in computing standard errors.

Chris

How to save the coefficients of a model

Dear Statalist,
I have this model, which will be run for each firm in each industry every year:
y = a + B1*x1 + B2*x2 + B3*x3 + e
How can I save the estimated coefficients B1, B2, and B3 of this model to use in another model, please?

Recoding

Dears,

We have a large (panel) dataset. For each observation (individual) we have industry code 1 and industry code 2. Both have missing values, though industry code 1 has fewer than industry code 2.
How can one fill in the missing values of incode2 based on the existing information (matched codes) given in the dataset for the same or another individual?

i.e. for each incode2==. incode2 should be 45 if incode1==23 and so on.



Regards



multicollinearity test

Dear all,

How can I test multicollinearity with survey data? I have tried:

display "tolerance = " 1-e(r2) " VIF = " 1/(1-e(r2))

but it returns:
tolerance = . VIF = .

Thank you in advance.
BR


Please help: Error r(2000) when running svy logistic

Hello,
I am using Stata/IC 12.0 and have been having some difficulty running svy logistic. I would greatly appreciate any help.

I am able to run the model no problem using:
svy: logistic outcome mode i.agecat gender i.edcat i.incomecat i.prov

however, when I add a propensity score as a covariate to my model, I get the following error:
svy: logistic outcome mode i.agecat gender i.edcat i.incomecat i.prov propscore
an error occurred when svy executed logistic
r(2000);

I've looked into this error, and confirmed that all my variables are numeric (not string) and also confirmed that the outcome is coded as 0,1.

I'm stuck. Any ideas?

Thanks in advance!
Marcella

j-1 or j number of IMR after mlogit

$
0
0
Hello!
I am trying to estimate a treatment effect (endogenous switching regression) using a multinomial selection equation with three treatment categories. I have tried to read up on the IMR after mlogit, but I still have two points of confusion from the materials I read: 1) is it three or two IMRs that I need to calculate? 2) should I include all the IMRs in each of the three outcome equations (one for each category), or only the IMR of each category in its respective outcome regime? When the selection equation is binary, I normally have two IMRs and include, for example, IMR1 in regime 1 and IMR2 in regime 2, but I somehow become perplexed when the selection equation is multinomial. THANK YOU for your help.

New command calculating sf12 version 2

How can I reset the normal menu view?

Hi everyone, and Merry Xmas!
I have a big problem with my menu view!
Everything is written in capital letters, and I would like to reset it to the default!
Any suggestions?
Thank you

Alphas from dummy variables

Hello,

I have a question about a regression I am running in Stata. At the moment my model is Gearing = a + a1*Public + a2*DebtCrisis + b1*LNStructureAggregate + b2*LNStructureAggregate*Public + b3*LNStructureAggregate*DebtCrisis + e, where the variables Public and DebtCrisis are dummy variables.

So my variables are Gearing, LNStructureAggregate, Public, and DebtCrisis. I use panel data between 2006 and 2011 (the Excel file with my variables, which I imported into Stata, is included).

The commands I want to use are:

-import excel " .... ", sheet("Blad1") firstrow
-xtset id year
-xtdescribe
-gen LNSAxPublic = LNStructureAggregate * Public
-gen LNSAxDebtCrisis = LNStructureAggregate * DebtCrisis
-regress Gearing LNStructureAggregate LNSAxPublic LNSAxDebtCrisis, vce(robust)

Will this be a good regression for the model I use? Specifically with the Alphas from the Dummy variables.

Thanks in advance.