Channel: Statalist

Export Very Long Local to CSV

Hi all,

I am trying to create a vector of cross price elasticities so that I can see their distribution (look at median and percentiles).
I have run into the problem that the matrix expression is too long.

At the moment my code looks like:
Code:
local alpha = 3
foreach market in `markets' {
    foreach firm in `firms' {

        // this is not exactly what I'm doing, but close enough for example purposes

        qui sum mkt_shr if market == `market' & firm == `firm'
        local s = `r(mean)'

        // get statistic and add it to local macro
        local elasts `elasts' \ `alpha'*`s'
    }
}
// get rid of first \
local elasts : subinstr local elasts "\" ""
mat def elasticities = (`elasts')

Here I get the error that the expression is too long. The vector is going to be quite long; I imagine it will be on the order of 1 million elements.

Having the vector in Stata would be best, but if I can export it to csv that would be alright too.

Is Mata the right option here?

All the best,
Eric
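(A possible workaround, sketched under the post's own setup: the locals `markets', `firms', and alpha, and the variable mkt_shr, are taken from the question. A Mata colvector has no macro-length limit, and getmata can pull it back as a variable for percentiles; export delimited assumes Stata 13+, so use outsheet on older versions.)

```stata
* accumulate the elasticities in a Mata colvector instead of a macro
local alpha = 3
mata: elasts = J(0, 1, .)
foreach market in `markets' {
    foreach firm in `firms' {
        qui sum mkt_shr if market == `market' & firm == `firm'
        mata: elasts = elasts \ `alpha'*`r(mean)'
    }
}
* with an empty dataset, getmata creates one observation per element
clear
getmata elast = elasts
summarize elast, detail                      // median and percentiles
export delimited using "elasticities.csv", replace   // Stata 13+
```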

Subtotals of categorical variables in two-way tabulate

Is it possible to calculate subtotals of categorical variables in a two-way tabulation without recoding the variables to generate new ones? I am not an SPSS user, but by talking to my coworkers I believe what I am trying to do is similar to the subtotals function when creating a table in SPSS. I am using Stata 14.2.

Using the auto dataset, this is what I am doing:

Code:
sysuse auto
tab rep78 foreign, row miss
I would like to easily combine repair records 3-5 in this table. I would also like to be able to combine rep78 in other ways, such as 1-4, 4-5, etc. I have searched for a way to do this and not found anything, besides simply using recode on rep78. In my actual data I need to do this for a number of different variables which may need to be recoded in multiple different ways each, so I am trying to find a solution that does not involve creating many new variables. I would also like to avoid manually adding together the row percents I am interested in, which would be possible from the regular output in tabulate, although tedious.
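(One sketch that avoids permanent new variables: recode can write the combined categories into a tempvar, which is dropped automatically when the do-file ends, and its rule syntax accepts a label. The other groupings, such as 1-4 or 4-5, are just different recode rules.)

```stata
sysuse auto, clear
* combine repair records 3-5 in a temporary variable
tempvar grp
recode rep78 (3/5 = 3 "3-5"), gen(`grp')
tab `grp' foreign, row miss
```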



As well, the StataList registration rules require full names as a username, but my full name was prohibited because of the hyphen in my last name (Rose-Silverberg). Is there a way to fix this? I can't be the only person on the forum with a hyphenated name. Thank you!

Svy:logit

This is a conceptual question on the use of svy:logit. Let's say I have a survey question that has a skip pattern that restricted its presentation to individuals of a certain age and sex. For example, say a survey of health and well-being was given to parents of 0-18 year old children and one of the questions concerned whether or not a female child had seen a gynecologist. Suppose this question was restricted to females 15 years and older so it is missing for all males and for females less than 15 years. So I have to do a subpopulation analysis. Let grp be a variable denoting the population of interest with 1 being those in the group, and 0 those outside the group. Because of the skip pattern on the survey, responses for those with grp=0 are all missing, and for grp=1 either 0/1. So my question is when I run (after svyset):

Code:
svy, subpop(grp):logit y i.x
will everything be handled correctly with the data missing on the out of domain cases, or do I need to do something differently to account for them? I'm used to dealing with subpopulation analyses where all the out of domain data are non-missing, but this skip pattern scenario presents a new problem.
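(For what it's worth, a minimal sketch of the usual setup, with female and age as hypothetical variable names: the key point is that the subpop() indicator itself is defined as 0 or 1 for every observation in the design, with the missingness confined to y among the grp == 0 cases.)

```stata
* grp is nonmissing everywhere, even where y is missing by skip pattern
gen byte grp = (female == 1 & age >= 15)
svy, subpop(grp): logit y i.x
```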

replacing missings

Hi,

I'm using stata 13 with OS windows 10.

I have a variable with a lot of missings. I want to replace all the missings with the following non-missing value.
Here is a sample:

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input long permno float(date pends2)
10001 16412     .
10001 16413     .
10001 16414     .
10001 16415     .
10001 16418     .
10001 16419     .
10001 16420     .
10001 16421     .
10001 16422 16252
10001 16425     .
10001 16426     .
end
format %td date
I used this command, but it fills in only the one observation immediately before the non-missing value, not all of them:

by permno (date), sort: replace pends2 = pends2[_n+1] if missing(pends2)


Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input long permno float(date pends2)
10001 16412     .
10001 16413     .
10001 16414     .
10001 16415     .
10001 16418     .
10001 16419     .
10001 16420     .
10001 16421 16252
10001 16422 16252
10001 16425     .
10001 16426     .
end
format %td date
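(A sketch of the standard workaround: reverse the date order so the "following" value becomes the "previous" one; because replace works through the data sequentially, the fill cascades through all consecutive missings.)

```stata
* reverse the order within permno, fill downward, then restore the sort
gsort permno -date
by permno: replace pends2 = pends2[_n-1] if missing(pends2)
sort permno date
```

In the example above, this fills 16252 into all the rows dated 16412-16421; the rows after the non-missing value (16425, 16426) stay missing because no later non-missing value exists.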

xsmle

I am using the xsmle package for a spatial analysis. My output does not report any direct or indirect effects, even though I did not specify the noeffect option. I am using the following command:
xsmle y x1 x2 x3 x4, wmat(matrix name) model(sdm) fe
Thanks.

Cox Regression - time varying

Hello
I have a cohort of breast cancer survivors who had biomarkers measured at baseline (a single time point only) and who have been followed for over 12 years. I noticed that, with the last download of vital status (mortality) updates, some previously strong associations with mortality disappeared. I ran the Cox models censoring dates at 3-year intervals and found that different biomarkers have very different types of associations with the outcome over time: some are consistently strong, some increase over time, and some have an initial strong association that disappears quickly.
I would like to investigate these associations using time-varying effects. However, I am unsure how to frame it and/or express the findings in terms of the effect of varying follow-up time on the outcomes, as the biomarkers themselves don't vary. Sorry, this might be a very obvious question...

Many thanks for your help.
Catherine
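(In case it helps, a minimal sketch using stcox's built-in options for time-varying effects; the stset arguments here are hypothetical placeholders for the actual survival setup.)

```stata
stset futime, failure(died == 1)      // hypothetical time/failure variables
* first, a standard model plus a formal proportional-hazards test
stcox biomarker
estat phtest, detail
* if PH is violated, let the effect vary with ln(analysis time):
* the main coefficient is the log hazard ratio at ln(t) = 0, and the tvc
* coefficient is how the log hazard ratio changes per unit of ln(t)
stcox biomarker, tvc(biomarker) texp(ln(_t))
```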

New on SSC: - prodest - module for production function estimation

Code:
ssc install prodest
prodest is a new and comprehensive Stata module for production function estimation based on the control function approach. It includes the Olley-Pakes (OP 1996), Levinsohn-Petrin (LP 2003), Wooldridge (WRDG 2009), and Ackerberg-Caves-Frazer (ACF 2015) estimation techniques, plus a brand new methodology (Mollisi-Rovigatti, MR forthcoming) designed to better deal with short panels.
Its basic usage is similar to that of existing modules like opreg or levpet, but it adds many features to control the optimization procedures and to address estimation issues: gross output vs. value added, endogenous variables, and attrition in the data. Type
Code:
help prodest
for a complete overview of options and features of the program, plus some clickable examples.

prodest is an ongoing project and the current version (1.0.2) is not meant to be definitive. Suggestions, impressions, and bug reports are therefore more than welcome.

Below are some examples of the program's usage.

Code:
. insheet using https://raw.githubusercontent.com/GabBrock/prodest/master/prodest.csv, names clear
(8 vars, 1,758 obs)


. prodest log_y, free(log_lab1 log_lab2) state(log_k) proxy(log_investment) va met(op) poly(4) opt(
> bfgs) reps(40) id(id) t(year)
.........10.........20.........30.........40


op productivity estimator

Dependent variable: value added                 Number of obs      =      1758
Group variable (id): id                         Number of groups   =       386
Time variable (t): year
                                                Obs per group: min =         1
                                                               avg =       4.6
                                                               max =        12

------------------------------------------------------------------------------
       log_y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    log_lab1 |   .2602104   .0454651     5.72   0.000     .1711005    .3493204
    log_lab2 |   .1609835   .0492148     3.27   0.001     .0645242    .2574428
       log_k |   .2963724   .0824046     3.60   0.000     .1348624    .4578824
------------------------------------------------------------------------------

. prodest log_y, free(log_lab1 log_lab2) state(log_k) proxy(log_investment) va met(op) acf opt(nm)
> reps(50) id(id) t(year)
.........10.........20.........30.........40.........50


op productivity estimator
ACF corrected
Dependent variable: value added                 Number of obs      =      1758
Group variable (id): id                         Number of groups   =       386
Time variable (t): year
                                                Obs per group: min =         1
                                                               avg =       4.6
                                                               max =        12

------------------------------------------------------------------------------
       log_y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    log_lab1 |   .2161255   .0995254     2.17   0.030     .0210593    .4111917
    log_lab2 |   .2362255   .0637043     3.71   0.000     .1113673    .3610836
       log_k |   .4552823   .0930899     4.89   0.000     .2728295     .637735
------------------------------------------------------------------------------

. prodest log_y, free(log_lab1 log_lab2) state(log_k) proxy(log_materials) va met(lp) opt(dfp) reps
> (50) id(id) t(year)
.........10.........20.........30.........40.........50


lp productivity estimator

Dependent variable: value added                 Number of obs      =      1758
Group variable (id): id                         Number of groups   =       386
Time variable (t): year
                                                Obs per group: min =         1
                                                               avg =       4.6
                                                               max =        12

------------------------------------------------------------------------------
       log_y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    log_lab1 |   .2262627   .0386052     5.86   0.000     .1505978    .3019276
    log_lab2 |   .1518426   .0389601     3.90   0.000     .0754821     .228203
       log_k |   .2155473   .0409458     5.26   0.000     .1352951    .2957996
------------------------------------------------------------------------------

Order variables based on their mean

Hi all,

I have the following problem: I have a large dataset that contains many variables and I have to order all variables based on their mean. The variable with the largest mean should be at the beginning of the dataset. For example, assume I have the following data set:

Code:
  
Observation Var1 Var2 Var3 Var4 Var5
1 3 10 100 1 200
2 3 12 80 2 300
3 4 11 90 1 250
4 4 9 100 2 400
Then, the code should order my variables as follows:

Code:
 
Observation Var5 Var3 Var2 Var1 Var4
1 200 100 10 3 1
2 300 80 12 3 2
3 250 90 11 4 1
4 400 100 9 4 2
I was playing around with collapse/reshape, but did not come up with a solution. Is there an (efficient) way to achieve this task?

Would really appreciate your help.

Best,
Antoine
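(A sketch of one brute-force approach: a selection sort over the variable names. It assumes the variables are named Var1-Var5 and that Observation should stay first, so adapt the unab and order lines to the real dataset.)

```stata
* repeatedly pick the remaining variable with the largest mean
unab vars : Var*
local ordered
local remaining `vars'
while "`remaining'" != "" {
    local best
    local bestmean = .
    foreach v of local remaining {
        quietly summarize `v', meanonly
        if missing(`bestmean') | r(mean) > `bestmean' {
            local best `v'
            local bestmean = r(mean)
        }
    }
    local ordered `ordered' `best'
    local remaining : list remaining - best
}
order `ordered', after(Observation)
```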

Creating a frequency table with two variables

Hi All,

I am new to using Stata and need some help creating a table that gives the number of times a variable (let's say x) occurs by another variable (say y). My dataset looks like this:


Comorbids Inc_Key
A 01
A 02
A 03
B 01
C 02
D 02
E 03
F 05

For the above, I want to count the number of times each Inc_Key has 0 comorbids, 1 comorbid, or >= 2 comorbids. A sample table is below:

Comorbid   Frequency
0          0
1          1    [this corresponds to Inc_Key 05]
>= 2       3    [this corresponds to the two instances of 01, three instances of 02, and two instances of 03]

how do i get stata to create this?
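(A sketch, assuming the data look exactly like the listing above, with one row per Inc_Key/comorbid pair. Note that a key with zero comorbids has no rows at all, so the "0" category can only be counted if such keys appear elsewhere in the data.)

```stata
* count comorbids per key, then keep one row per key and categorize
bysort Inc_Key: gen n_com = _N
by Inc_Key: keep if _n == 1
gen byte grp = cond(n_com == 0, 0, cond(n_com == 1, 1, 2))
label define grp 0 "0" 1 "1" 2 ">= 2"
label values grp grp
tab grp
```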


ROC-Sensitivity & Specificity for a given variable value

Dear All,

Would appreciate any help with this question please: Is there a way to get the sensitivity, specificity or even LR+ LR- for a specific variable cutoff of interest?

I know the command "roctab outcome variable, detail" gives you the sensitivity and specificity at different variable cutoffs, but the cutoff I need may not be among them.

Is it possible to specify the cutoff, and do it the other way round?

Thank you very much.
Michelle
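(A hand-computed sketch: it assumes outcome is 0/1, that higher values of testvar predict outcome == 1, and that 4.5 is a hypothetical cutoff of interest.)

```stata
local cutoff = 4.5
gen byte testpos = testvar >= `cutoff' if !missing(testvar)
* tally the four cells of the 2x2 table at this cutoff
quietly count if outcome == 1 & testpos == 1
local tp = r(N)
quietly count if outcome == 1 & testpos == 0
local fn = r(N)
quietly count if outcome == 0 & testpos == 0
local tn = r(N)
quietly count if outcome == 0 & testpos == 1
local fp = r(N)
display "sensitivity = " %6.4f `tp'/(`tp' + `fn')
display "specificity = " %6.4f `tn'/(`tn' + `fp')
display "LR+ = " %6.4f (`tp'/(`tp' + `fn')) / (1 - `tn'/(`tn' + `fp'))
display "LR- = " %6.4f (1 - `tp'/(`tp' + `fn')) / (`tn'/(`tn' + `fp'))
```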

panel data and outliers

Hi

Can anyone guide me on how to detect outliers in panel data after using xtreg, and on how to winsorize?
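(A minimal sketch with hypothetical y, x1, x2: inspect the idiosyncratic residuals from xtreg, then winsorize a variable at the 1st/99th percentiles by hand. The community-contributed winsor2 module on SSC automates the latter step.)

```stata
xtreg y x1 x2, fe
predict double e_resid, e        // idiosyncratic error component e_it
summarize e_resid, detail        // inspect the tails for outliers
* winsorize y at the 1st/99th percentiles by hand
quietly summarize y, detail
replace y = r(p1)  if y < r(p1)
replace y = r(p99) if y > r(p99) & !missing(y)
```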

Problem with factor variables (1.var instead of i.var in regression changes results drastically)

Hello,

when running a regression with factor variables, I encounter certain problems.
I made up an example using the auto dataset.
My problem, basically, is that I do not want to report "empty results" in a regression, which of course occur if one interacts two (0/1) dummies. So reg y i.var1##i.var2 is better done as reg y 1.var1##1.var2. However, new collinearity might arise.

Code:
sysuse auto, clear

gen hro=1 if (headroom ==1 | headroom ==3 | headroom ==5 )
recode hro (.=0)


gen light =1 if weight <=3000
recode light (.=0)

reg length (i.light##i.hro)##i.foreign, robust
reg length (1.light##1.hro)##1.foreign, robust

//So far: everything perfectly fine, i.e. there are exactly the same results


reg length (i.light c.mpg i.hro)##i.foreign, robust
reg length (i.light c.mpg 1.hro)##1.foreign, robust

//So why is there *now* collinearity?

reg length (i.light c.price i.hro)##i.foreign, robust
reg length (i.light c.price 1.hro)##1.foreign, robust


reg length (c.price  i.light##i.hro)##i.foreign, robust
reg length (c.price  1.light##1.hro)##1.foreign, robust
First, I get new collinearity that was not there before.

Second (and this I cannot reproduce as easily with the auto dataset), in my own dataset I get really strange results: using 1.var instead of i.var, I get totally different results and an R^2 of 0.000. This is really puzzling to me.

Thank you very much in advance for your help.


FYI: I use Stata 12 on macOS Sierra.

Generating variables within ID

Dear all,

I need a couple of variables that show whether some values are the same within a particular ID. Let me first show you the data on (hypothetical) mergers. As you can see, I have a unique ID variable, and variables for the names, nationalities, and SIC codes (industry classification) of the companies.

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float ID str9 Name str14 Nationality float SIC
1 "Compaq"    "United States"  3570
1 "HP"        "United States"  3571
2 "Vodafone"  "United Kingdom" 4812
2 "AT&T"      "United States"  4812
3 "DuPont"    "France"         2899
3 "Microsoft" "United States"  7374
end
I need a variable that shows per ID whether the nationalities of the companies match. Thus, for the first merger it would be 1 since the nationalities match. However, for the second and third it would be 0 because the nationalities differ.

Secondly, for the SIC codes I need a variable that shows the relatedness of the SIC codes. The industry-relatedness measure should be 1 for companies operating in the same four-digit SIC code industry (as in the second merger), 0.5 for those whose first three digits of the primary SIC code match (as in the first merger), 0.33 when the first two digits of their primary SIC codes are equal, and 0.25 when only the first digit matches. Of course, the value would be 0 if no digit matches (as in the third merger).

I hope there is a solution to this problem since I could not find it online.

Thanks in advance

Frank Smeets
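(A sketch assuming exactly two companies per ID, as in the example; the SIC codes are compared digit by digit via their 4-character string form.)

```stata
* nationality match: compare the two rows within each ID
bysort ID (Name): gen byte nat_match = Nationality[1] == Nationality[2]
* cascading digit comparison for SIC relatedness
gen str4 sic = string(SIC, "%04.0f")
by ID: gen s1 = sic[1]
by ID: gen s2 = sic[2]
gen relatedness = 0
replace relatedness = 0.25 if substr(s1,1,1) == substr(s2,1,1)
replace relatedness = 0.33 if substr(s1,1,2) == substr(s2,1,2)
replace relatedness = 0.5  if substr(s1,1,3) == substr(s2,1,3)
replace relatedness = 1    if s1 == s2
drop sic s1 s2
```

On the example data this gives nat_match 1/0/0 and relatedness 0.5/1/0 for IDs 1, 2, and 3 respectively.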

Dropped Time dummy - collinearity

Dear all,

I am running a regression where I try to test the effect of employment policy on several outcomes, including some count data, for which I use negative binomial regression. In more detail, my model is the following:

xtnbreg OUTCOME_VAR EMPLOYMENT_PROTECTION CONTROLS i.time, fe

where i.time is supposed to introduce time dummies. The data are a country-level panel.

My problem is that when I introduce into my controls a variable that is supposed to capture how employment-protection rules are defined in other countries (not in country i), Stata drops the time dummy for the last year in the model because of collinearity. I have no idea why this happens; do you have any advice? And collinear with what, exactly?

Thanks a lot,
Mario


Time Series Dummy

Dear all,

I am trying to analyse the "Twist-on-the-Monday" effect. This effect states that after an unfavourable stock market week, the returns on Monday will be abnormally low (negative), while after a favourable stock market week, this effect does not occur.

Therefore, I have to create a dummy variable that takes a value of 0 on Mondays when the preceding week was unfavourable and a value of 1 for all other days.

I have daily returns (trading days), date, and days of the week as text in different columns.

Any help would be greatly appreciated. Thanks,

Antoine
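(A sketch assuming trading-day data with ret as the daily return and date as a Stata daily date. The week identifier is anchored on Sundays, since Stata's wofd() weeks are fixed 7-day blocks from January 1 and do not align with calendar Mondays.)

```stata
sort date
* each Mon-Sat block shares the date of the Sunday that precedes it
gen long wk = date - dow(date)
bysort wk (date): egen double wkret = total(ln(1 + ret))
sort date
gen byte monday = dow(date) == 1
* for a Monday, the preceding row is the last trading day of the prior
* week, so wkret[_n-1] carries the prior week's cumulative log return
gen byte twist = 1
replace twist = 0 if monday & wkret[_n-1] < 0
```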

Issue opening dataset

Hi there. I am currently in a Data Analysis class and we have just started to learn how to use Stata. For my problem set this week, we have to use the dataset from this link, entitled "Data Set-Main": http://isps.yale.edu/research/data/d132#.Vfb4dCpViko. Unfortunately, when I download the dataset, my computer (MacBook) tells me that I don't have the application to open this kind of file. I've tried opening the CSV and using Excel and it still won't work. I cannot even locate the file when I'm in Stata. Does anyone have any advice or next steps?
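(For what it's worth, a sketch with hypothetical filenames: the idea is to change Stata's working directory to wherever the browser saved the download, then load the file by its actual name.)

```stata
cd "~/Downloads"
use "data_set_main.dta", clear           // for the Stata-format file
* or, for the CSV version (Stata 13+; insheet in older versions):
import delimited "data_set_main.csv", clear
```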

Best way to include a dofile into a LaTeX Document?

Hello,

I want to include a do-file in my LaTeX file.

What is the easiest way to do so?

Thank you very much in advance.

Even though I have Stata on the same Mac as LaTeX, it does not work to simply type:
Code:
\documentclass[6pt]{article}
\usepackage{amsmath}
\usepackage{a4}
\usepackage{graphicx}
\usepackage{caption}
\usepackage{setspace}
\usepackage{booktabs}
\usepackage[left=.5cm,right=.5cm,top=1cm,bottom=2cm]{geometry}
\usepackage{longtable}
\usepackage{float}

\usepackage{calc, ifthen, alltt}
\usepackage{subcaption}
\usepackage{stata}

\allowdisplaybreaks





\begin{document}
\catcode`\#=12

\input{tabzus.tex}
\clearpage
\newpage

\begin{stata}
***Regressionen

cd "..."
use "data.dta",clear
 
...

\end{stata}


%\catcode`\#=6

\end{document}
I always get the error stata.sty not found.

Any hints or suggestions? Thank you very much in advance.
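(As far as I know, stata.sty is not part of standard TeX distributions; it ships with the Stata Journal's LaTeX files, so it would need to be downloaded and placed where TeX can find it. If the goal is just to show the do-file verbatim, a sketch with the standard listings package avoids the dependency entirely; the do-file name here is hypothetical.)

```latex
% pull a do-file into the document verbatim, no stata.sty required
\documentclass{article}
\usepackage{listings}
\lstset{basicstyle=\ttfamily\small, breaklines=true}
\begin{document}
\lstinputlisting{analysis.do}
\end{document}
```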

Country pair Fixed Effect

Dear Statalisters,
I am working with panel data on 634 country pairs across 12 years using the gravity model of trade. I want to run a country-pair fixed-effects regression, and I used the following command (with countrypair the pair identifier):
regress institutional_distance FDI contig comlang_off colony comcol col45 smctry dist landlocked popi popj percap_gdpi percap_gdpj gdp_growthi gdp_growthj i.countrypair, robust cluster(countrypair)
I know the time-invariant dummy variables are supposed to be dropped in this regression, but when I run it they are not. What am I doing wrong? Thank you very much in advance.

Obtaining GoF under svy:gsem

Hello Stata Forum Users

I am wondering if there is any way of obtaining some GoF indicators when running an "svy: gsem" model. I am asking because, unlike gsem for non-survey data, which allows "estat ic" after estimation, when I run it after the "svy: gsem" model I get the message "invalid subcommand ic". I've also tried "estat (svy)" and "estat gof" without success (in that case the message is "invalid subcommand (svy)").

Many thanks in advance for any advice.

How to use stptime for calculation of persontime on a mi dataset?

Hi.

Do any of you know how to calculate person-time on an mi-set dataset?

I have tried:

mi estimate: stptime, by(varname) per(1000) dd(4)

as I would if the data were not mi set.

But it says r(198), invalid "by".

Thanks.
Kind regards
Marie

