Selecting cases in multiply imputed data with if

February 14, 2020, 7:18 am

Dear all

I have a small data set with some missing data and first fit two models with listwise deletion:

Code:

capture drop touse
mark touse
markout touse y x1 x2 x3

regress y x1                 if touse     // N = 80
regress y x1 x2 x3           if touse     // N = 80

Then I do some multiple imputation with chained equations:

Code:

mi register imputed y x1 x2 x3 x4

mi impute chained (regress) y x1 x2 x4 ///
                  (logit) x3,  ///
                   add(5) rseed(2) force augment

Note that I have to use the -force- option here because, for one of the variables, not all cases can be imputed.

Now I want to run the models again with the multiply imputed data:

Code:

mi estimate: regress y x1                 // N = 100
mi estimate: regress y x1 x2 x3           // N = 95

This looks odd, with a lower N in the second model. So I now want to restrict the first model to the cases that are complete after imputation in the second model (odd, I know):

Code:

capture drop touse
mark touse
markout touse y x1 x2 x3

mi estimate: regress y x1                 if touse     // N = 80
mi estimate: regress y x1 x2 x3           if touse     // N = 80

But this brings N back down to the listwise-deletion N of the unimputed data. What would be a better way to identify the cases that are complete in the imputed data?

Thanks so much
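
A possible approach, sketched under assumptions and not tested on these data: -mi estimate- has an esample() option that stores each imputation's estimation sample in a new variable, which (to my understanding) is available only for data in the flong or flongsep style. The variable name insample is hypothetical.

```stata
* esample() needs flong or flongsep style, so convert first
mi convert flong, clear

* store the estimation sample of the larger model in each imputation
mi estimate, esample(insample): regress y x1 x2 x3

* refit the smaller model on exactly those observations;
* if mi estimate complains that the estimation sample varies across
* imputations, its esampvaryok option may be needed
mi estimate: regress y x1 if insample == 1
```

The idea is to condition on the cases that were complete after imputation rather than on the complete cases of the original, unimputed data.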

Outreg2 Example 9. Adding r( ) e( ) scalars

February 14, 2020, 7:28 am

Hi Everyone,

I am looking for some clarification on -help outreg2-.

In the examples provided with -help outreg2- (specifically, reg price mpg rep78 head), is it possible to export Adj R-squared and Root MSE? How can I store Adj R-squared and Root MSE as macros?

In the next step (lincom mpg + rep), tstat and pval are calculated and stored as local macros. But those values are already displayed by -lincom-. Is it possible to export t and P>|t| without the extra step of computing r(estimate)/r(se)?

I know that e() includes everything in the ereturn list. What can be included with r()?

I would be truly grateful for a response from someone who has worked with outreg2.
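
A sketch of one way to do this, assuming the user-written outreg2 from SSC: -regress- stores the adjusted R-squared in e(r2_a) and the root MSE in e(rmse), and these can be copied into locals and passed to outreg2's addstat() option. The two-sided p-value below is computed from the t statistic and e(df_r); recent Stata versions may also return r(t) and r(p) directly after -lincom-, so -return list- is worth checking. The file name myfile follows the help file's examples.

```stata
* outreg2 is user-written: ssc install outreg2
sysuse auto, clear
regress price mpg rep78 headroom

* regress stores these in e(); copy them into locals
local adjr2 = e(r2_a)
local rmse  = e(rmse)

* lincom leaves its results in r() and does not touch e()
lincom mpg + rep78
return list
local tstat = r(estimate)/r(se)
local pval  = 2*ttail(e(df_r), abs(`tstat'))

outreg2 using myfile, replace ///
    addstat(Adj R-squared, `adjr2', Root MSE, `rmse', t-stat, `tstat', p-value, `pval')
```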

predicted probabilities with mimrgns

February 14, 2020, 8:22 am

Hi,

I am using Daniel Klein's excellent mimrgns command, downloaded from SSC. I am running into an error when trying to use the predict(pr) option to calculate probabilities. I am trying to do this for an interaction term in a modified Poisson regression. The two variables in the interaction are both binary.

Code:

mi est, eform: glm y i.x##i.z, fam(poisson) link(log) vce(robust)
mimrgns, at(x=(0 1) z=(0 1)) predict(pr) cmdmargins post

The error says:

Code:

option pr not allowed
an error occurred when mi estimate executed mimrgns_estimate on m=1

Any suggestions about what might be going wrong would be helpful. Thanks a lot.
Robbe
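
A likely explanation, offered as a hedged sketch: the message comes from -predict- after -glm-, which has no pr option. Its default statistic is mu, the predicted mean, and with family(poisson) link(log) on a binary outcome that mean is what the modified Poisson approach interprets as the probability. The variable names are those from the post.

```stata
* modified Poisson model fitted on the imputed data
mi estimate, eform: glm y i.x##i.z, family(poisson) link(log) vce(robust)

* glm's predict offers mu (the default predicted mean), not pr;
* with a log link and a binary y, mu is the predicted probability
mimrgns, at(x=(0 1) z=(0 1)) predict(mu) cmdmargins post
```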

Problem Saving LASSO results

February 14, 2020, 8:38 am

I got the following confusing errors when saving LASSO results.

I have fixed it but I would love to know what I did wrong.

When I included the path to the working directory, I could not save anything. When I left it out of the filename, I could save without a problem.

Code:


estimates save "\\path\to\my\directory\results.ster", replace
file "\\path\to\my\directory\results.ster" not found
r(601)

estimates save "\\path\to\my\directory\results.ster" // no 'replace', but file does not exist yet.
file "\\path\to\my\directory\results.ster" already exists.
r(602)

estimates save results.ster, replace  // no filepath
file "results.ster" saved
extended file "results.xster" saved

In this same file I make use of

Code:

graph save "\\path\to\my\directory\graph_name.gph", replace

with no problem. I am confused about why the file path does not work in the -estimates save- command.

Thanks!
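
I cannot tell from here exactly why the backslash path fails, but two workarounds often help with Windows network paths in Stata: writing the path with forward slashes, which Stata also accepts on Windows, or changing to the directory first and saving with a bare file name. A sketch reusing the placeholder path from the post:

```stata
* forward slashes work in Stata on Windows too and avoid backslash quirks
estimates save "//path/to/my/directory/results.ster", replace

* alternatively, change the working directory and save with a bare file name
cd "//path/to/my/directory"
estimates save results, replace
```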

How to calculate the 95%CI for sensitivity and specificity after using rocreg command?

February 14, 2020, 8:51 am

Dear all users,
I have a data set with a binary outcome variable (0/1) and a continuous test variable. I ran the rocreg command to adjust for some covariates.
Here is my command:

Code:

rocreg NRDS test if inclusion==1, ctrlcov(gestationalage) ctrlmodel(linear) probit ml

The output shows the AUC and its 95% CI. I also noticed that the values of sensitivity and the false-positive rate are automatically stored as variables. The challenge now is how to calculate 95% CIs for sensitivity and specificity at given cutoffs (such as test==15.32). Are there any commands or programs for this?
I hope to get some replies.
Thank you very much in advance!!!
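
As far as I know, rocreg does not itself report CIs for sensitivity and specificity at a fixed threshold. A simple alternative, sketched here, is to dichotomize the test at the cutoff and compute exact binomial CIs with -ci proportions- (Stata 14 or later). Note that this ignores the gestational-age adjustment, and it assumes that higher test values indicate NRDS (rocreg's default direction); the variables pos and neg are hypothetical.

```stata
* dichotomize at the cutoff; float() matches a test variable stored as float
gen byte pos = test >= float(15.32) if !missing(test)
gen byte neg = 1 - pos

* sensitivity: share test-positive among NRDS == 1 (exact Clopper-Pearson CI)
ci proportions pos if NRDS == 1 & inclusion == 1, exact

* specificity: share test-negative among NRDS == 0
ci proportions neg if NRDS == 0 & inclusion == 1, exact
```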

Label data

February 14, 2020, 9:08 am

Hi All,

I have a csv file which contains the following data:

Identifiant	Education	Gender
11112	1	0
11113	2	1
11114	3	1
11115	3	1
11116	4	0

The file is accompanied with this codebook:

Variable	Values	Label
Education	1	Primary
Education	2	Junior School
Education	3	High School
Education	4	College / or University
Gender	0	Male
Gender	1	Female

I would like to label the data in the first table using the values from the second table, in Stata 13. I have 2,000 variables in my data, so I would like to write a do-file that labels all of them at once.

I will be grateful for any comments and suggestions.

Best regards,
Antoine
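
One way to do this, sketched under assumptions: read the codebook, write a do-file containing one -label define- line per codebook row plus one -label values- line per variable, then run that do-file on the data. The file names codebook.csv, data.csv and labels.do are hypothetical, and the codebook columns are assumed to be named Variable, Values and Label as shown above.

```stata
* case(preserve) keeps the original variable names in both files
import delimited using "codebook.csv", clear case(preserve) varnames(1)

* write one -label define- line per codebook row
tempname fh
file open `fh' using "labels.do", write replace
forvalues i = 1/`=_N' {
    local v   = Variable[`i']
    local val = Values[`i']
    local lab = Label[`i']
    * e.g. writes: label define Education 1 "Primary", add
    file write `fh' `"label define `v' `val' `"`lab'"', add"' _n
}

* then attach each value label to the variable of the same name
levelsof Variable, local(labelled) clean
foreach v of local labelled {
    file write `fh' "label values `v' `v'" _n
}
file close `fh'

* load the data and run the generated commands
import delimited using "data.csv", clear case(preserve) varnames(1)
do "labels.do"
```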

extracting single matrix from rmatrices returnset for use with putexcel

February 14, 2020, 9:17 am

I am trying to get the results of margins (marginal effects) into Excel for further manipulation. Specifically, I want the entire matrix of results (dydx, se, z, p, CI). I can get this matrix using etable, but I can't use colwise unless I specify a return set like rmatrices. However, rmatrices returns all matrices (the first one is the one I want).

Additionally, I want to use the [, names] option to get the xvars, but I can only use it with etable (obviously not with rmatrices, since it returns many matrices with many names/labels).

I see that my xvars names are stored in a macro called r(xvars) but have no idea how to access this, get it into a matrix or get it into putexcel.

Thank you in advance to anyone who reads this mess! (I'm teaching myself Stata for my job. Normally I use SAS all day long and am basically trying to get the equivalent of out=)
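
A sketch of a more direct route, assuming Stata 14 or later for this putexcel syntax: -margins- leaves its full results table in the matrix r(table), which can be transposed and written with putexcel's names option. The model on the auto data and the file name margins.xlsx are only for illustration.

```stata
sysuse auto, clear
logit foreign mpg weight
margins, dydx(*)

* copy the x-variable names out of r() before anything else overwrites it
local xvars "`r(xvars)'"
display "`xvars'"

* r(table) has rows b, se, z, pvalue, ll, ul, ... and one column per x variable;
* transposing gives one row per x variable
matrix T = r(table)'

* names also writes the row names (x variables) and column names (b, se, ...)
putexcel set "margins.xlsx", replace
putexcel A1 = matrix(T), names
```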

How can I return the last distinct substring in a string?

February 14, 2020, 9:20 am

I use Stata for some of my data analysis. I wanted to ask how to solve this problem: I would like to return the last distinct substring in a given string.

For example, given the string "orange, banana, melon, cocoa, cocoa", the last distinct substring is cocoa, which is the result I want.

Also, we could have a string like "orange,cocoa, banana, orange, orange", and I would like to return the index of the last stable substring in the string (i.e., orange, because no other substring appears after orange at the end of the string).

I look forward to your response.

Many thanks,
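
A sketch assuming Stata 14 or later (for strtrim() and strrpos()): the last distinct substring is simply the last comma-separated item, and the position where its final run starts can be found by walking back through the pieces created by -split-. Reading "index" as the item number is an assumption; the variable names are hypothetical.

```stata
clear
input str60 s
"orange, banana, melon, cocoa, cocoa"
"orange,cocoa, banana, orange, orange"
end

* the last comma-separated item, with surrounding blanks removed
gen last = strtrim(substr(s, strrpos(s, ",") + 1, .))

* number of items = number of commas + 1
gen nitems = strlen(s) - strlen(subinstr(s, ",", "", .)) + 1

* split into item1, item2, ... and trim each piece
split s, parse(",") generate(item)
local k = r(nvars)
forvalues j = 1/`k' {
    replace item`j' = strtrim(item`j')
}

* walk backwards: start is the item number where the final run of -last- begins
gen start = nitems
forvalues j = `k'(-1)1 {
    replace start = `j' if `j' == start - 1 & item`j' == last
}
list s last start
```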

asclogit with different choice alternatives by case - no common base alternative across choice sets

February 14, 2020, 9:22 am

I am new to the asclogit command and have reviewed the manual and other online tutorials/examples. However, I haven't been able to find an answer for the scenario where each case/individual has a different choice set. For example, I am trying to understand college choice, but in my data each student has a different subset of colleges to choose from (the colleges that admitted them, with minimal overlap between students). Here is an example of my current data structure:

id = Student ID
college = admitting college (largely no overlap)
female = example of 'case-specific' variable that doesn't vary by choice
tuition = example of 'alternative-specific' variable that does vary by choice
choice_dummy = binary indicator of choice among alternative (only one choice among 2-3 options)

For example, student 1 chooses college 57, but was also admitted to college 1 and college 20.

id	college	tuition	female	choice_dummy
1	College 1	10	1	0
1	College 57	15	1	1
1	College 20	12	1	0
2	College 87	13	0	1
2	College 12	18	0	0
3	College 3	12	0	0
3	College 37	17	1	0
3	College 20	12	1	1

My challenge is that there is no common "base alternative" across cases/students, because students are admitted to different institutions. I believe that I have the correct data structure, but how should I handle the lack of a common base alternative, which the Stata documentation suggests is needed?

I haven't been able to locate the answer to this scenario. Any guidance is greatly appreciated.
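
A hedged sketch: the base alternative matters for normalizing the alternative-specific constants (and the coefficients on case-specific variables), so a model with only alternative-specific attributes and no constants, i.e. McFadden's conditional logit, does not need one. It can be fitted with asclogit's noconstant option or, equivalently, with -clogit-; in Stata 16 and later the cm suite (cmclogit) plays the role of asclogit. A case-specific variable such as female could then enter only through interactions with alternative attributes (for example c.tuition#i.female). The data are the example rows above.

```stata
clear
input id str10 college tuition female choice_dummy
1 "College 1"  10 1 0
1 "College 57" 15 1 1
1 "College 20" 12 1 0
2 "College 87" 13 0 1
2 "College 12" 18 0 0
3 "College 3"  12 0 0
3 "College 37" 17 1 0
3 "College 20" 12 1 1
end

* numeric identifier for the alternatives
encode college, generate(college_id)

* no alternative-specific constants, so nothing is normalized against a base
asclogit choice_dummy tuition, case(id) alternatives(college_id) noconstant

* the same conditional-logit model fitted with clogit
clogit choice_dummy tuition, group(id)
```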

Mixed Model Interaction with Time

February 14, 2020, 9:22 am

Hi all,

I am hoping to get some feedback on whether my modeling strategy is appropriate.

To give some detail on the data structure, the data consist of cases sentenced in courts from 2006-2017. The focal question is whether the influence of race (i.race) on sentence length (sentenceln) has changed over this time period (c.time). There are 5,295 individual cases over the course of 11 years (c.time), sentenced in 90 courts (district). The cases represent sentenced individuals, so the individuals are different in any given year. I have sought to model this with the following approach.

I have used the mixed command with the case characteristics at level 1, controlling for court (district) at level 2, and interacting time with race. However, given the structure of the data (cases nested within courts, over time), I am hoping to gain insight into whether this is an appropriate modeling approach.

Code:

mixed sentenceln i.race##c.time sex age uscitizen hsgrad somecoll college counts_d plea pretrial crimhist presumptiveln i.departure if sentence>0 || district: time

Performing EM optimization:

Performing gradient-based optimization:

Iteration 0:   log likelihood = -3712.7635
Iteration 1:   log likelihood =  -3712.145
Iteration 2:   log likelihood = -3712.1135
Iteration 3:   log likelihood = -3712.1127
Iteration 4:   log likelihood = -3712.1127

Computing standard errors:

Mixed-effects ML regression                     Number of obs     =      5,295
Group variable: district                        Number of groups  =         90

                                                Obs per group:
                                                              min =          2
                                                              avg =       58.8
                                                              max =        416

                                                Wald chi2(21)     =   18564.95
Log likelihood = -3712.1127                     Prob > chi2       =     0.0000

-------------------------------------------------------------------------------------
         sentenceln |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
--------------------+----------------------------------------------------------------
               race |
              Black |  -.0725788   .0530059    -1.37   0.171    -.1764685     .031311
           Hispanic |  -.0709598   .0528852    -1.34   0.180     -.174613    .0326933
              Other |  -.0432867   .0333231    -1.30   0.194    -.1085987    .0220253
                    |
               time |    .001684   .0031922     0.53   0.598    -.0045727    .0079406
                    |
        race#c.time |
              Black |   .0160827   .0064568     2.49   0.013     .0034276    .0287377
           Hispanic |   .0124189   .0069716     1.78   0.075    -.0012453    .0260831
              Other |  -.0005414   .0048027    -0.11   0.910    -.0099545    .0088717
                    |
                sex |    .221944    .032548     6.82   0.000      .158151    .2857369
                age |   .0012554   .0005795     2.17   0.030     .0001196    .0023912
          uscitizen |   .0130701    .031318     0.42   0.676     -.048312    .0744522
             hsgrad |   .0101234   .0173206     0.58   0.559    -.0238243     .044071
           somecoll |   .0127587   .0199315     0.64   0.522    -.0263063    .0518238
            college |   .0363391   .0275316     1.32   0.187    -.0176219       .0903
           counts_d |   .1278429   .0181558     7.04   0.000     .0922583    .1634276
               plea |  -.0918815   .0223136    -4.12   0.000    -.1356155   -.0481476
           pretrial |   .2790391   .0205249    13.60   0.000      .238811    .3192672
           crimhist |   .1002767   .0187991     5.33   0.000     .0634312    .1371222
      presumptiveln |   .8363359    .007849   106.55   0.000     .8209521    .8517196
                    |
          departure |
   Upward Departure |   .4713081   .0275553    17.10   0.000     .4173007    .5253155
       SA Departure |  -.7133685   .0331309   -21.53   0.000    -.7783038   -.6484332
 Downward Departure |  -.5282394   .0160795   -32.85   0.000    -.5597546   -.4967241
                    |
              _cons |   .2363465   .0668123     3.54   0.000     .1053967    .3672962
-------------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
district: Independent        |
                   var(time) |   5.72e-06   .0000172      1.57e-08    .0020772
                  var(_cons) |   .0015985   .0010319       .000451    .0056651
-----------------------------+------------------------------------------------
               var(Residual) |   .2365268   .0046263      .2276311    .2457701
------------------------------------------------------------------------------
LR test vs. linear model: chi2(2) = 12.24                 Prob > chi2 = 0.0022

Note: LR test is conservative and provided only for reference.
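
Not a verdict on the modeling strategy, but one way to read the race#c.time interaction after this model is with -margins-, sketched below. It assumes time runs from 0 to 11, which the post does not state. In a linear model like this one, the slope contrasts from the last command should reproduce the race#c.time coefficients in the table.

```stata
* predicted log sentence length by race across time (time range assumed 0-11)
margins race, at(time=(0(1)11))
marginsplot, noci

* slope of time within each race group
margins race, dydx(time)

* difference between each group's time slope and the base group's slope
margins r.race, dydx(time)
```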

Export tables using tabout

February 14, 2020, 11:12 am

Hello again,
The problem I have is the following: I have a variable that is a list of codes, and I want to export the table generated by the tab command to an Excel worksheet.
For example, -tab codes- generates:

      Code. |      Freq.     Percent        Cum.
------------+-----------------------------------
  202210101 |      7,286        0.90        0.90
  202210102 |      3,120        0.39        1.29
  202210103 |      4,877        0.61        1.90

I want to export the first column specifically, though a sheet with the entire table would also be fine; I can just delete the other columns.
I tried the following: tabout codes using code.xlsx
This creates the file code.xlsx, but Excel can't open it, as if it does not recognize the .xlsx extension.
Thanks in advance
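
As far as I know, tabout writes a tab-delimited text file, so giving it an .xlsx extension does not turn it into an Excel workbook (a .txt or .xls name that Excel can import is the usual workaround). A sketch of an alternative that builds the frequency table as data and writes a genuine .xlsx file with -export excel-; the variable name codes is taken from the post.

```stata
preserve

* one row per distinct code, with frequency and percent as in -tabulate-
* (nomiss drops missing codes, which -tabulate- also leaves out by default)
contract codes, freq(Freq) percent(Percent) nomiss

* write a real Excel workbook
export excel codes Freq Percent using "code.xlsx", firstrow(variables) replace

restore
```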

Line break in local with embedded spaces?

February 14, 2020, 11:32 am

For this example:

Code:

local my_list_with_spaces `" "first thing" "second thing" "third thing" "'

Is there a way to escape a line break in the Do-file Editor so that it does not end up in the local? Or am I stuck having to write:

Code:

local my_list_with_spaces `" "first thing" "second thing" "'
local my_list_with_spaces `"`my_list_with_spaces' "third thing""'
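
A sketch of one alternative for do-files (it does not work interactively): with #delimit ; a command runs until the next semicolon, so the definition can span several lines. Note also that when a local is extended, the whole right-hand side needs compound double quotes; otherwise Stata strips the outermost pair of double quotes.

```stata
* in a do-file, #delimit ; lets one command continue until the semicolon
#delimit ;
local my_list_with_spaces `" "first thing" "second thing"
    "third thing" "' ;
#delimit cr

* each quoted phrase counts as one word
local n : word count `my_list_with_spaces'
display `n'
```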

Graph Axis Title too Long

February 14, 2020, 11:51 am

Dear All,

I'm generating graphs with Stata, but the l1title is too long and exceeds the length of the graph. Is there a way to wrap the title text onto two lines? That way I would not need to reduce the font size too much or change the graph size, neither of which is a good option.

Any help is appreciated!

Thank you very much!

Best,
Craig
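
Stata's title options accept several quoted strings, and each string is drawn on its own line, so a long l1title() (or ytitle()) can be split by hand. A minimal example on the auto data, with a made-up title:

```stata
sysuse auto, clear

* each quoted string in l1title() becomes a separate line
scatter price mpg, l1title("A long left-axis title that would not fit" "continued on a second line")
```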

save variable names in a row

February 14, 2020, 12:10 pm

Dear all,

How can I store the variable names as a row of the data?

Original data set

Code:

clear
input float(Alcona Allegan Alpena Antrim Arenac)
0 1 0 0 0
end

Expected data set

Code:

clear
input str6 Alcona str7 Allegan str6(Alpena Antrim Arenac)
"Alcona" "Allegan" "Alpena" "Antrim" "Arenac"
"0"      "1"       "0"      "0"      "0"     
end

Best,

Jack
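
A sketch of one way to get there: convert every variable to string, add an observation holding the variable names, and move it to the top. The helper variable neworder is hypothetical.

```stata
clear
input float(Alcona Allegan Alpena Antrim Arenac)
0 1 0 0 0
end

* convert every variable to string (0 becomes "0", 1 becomes "1")
tostring _all, replace

* add an empty observation and fill it with the variable names;
* replace widens the str1 variables automatically
set obs `=_N + 1'
foreach v of varlist _all {
    replace `v' = "`v'" in `=_N'
}

* move the new observation to the top
gen long neworder = cond(_n == _N, 0, _n)
sort neworder
drop neworder
list
```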

Graph Loop Skip "no observations" error without saving graph

February 14, 2020, 12:20 pm

Dear All,

I'm using the following code to generate graphs in a loop. For some variables there are no observations; the code skips the "no observations" error and continues to run, but a graph is still saved for those variables, and its content is the previous graph. For example, if b has no observations, a graph named "b" is still saved, and when I open it, its content is the same as graph a.

Code:

foreach var of varlist a-z{
                capture noisily ///
                catplot `var', title(`var')
                graph save `var', replace
                }

Is there a way to ignore the error but not save the graph?

Any help will be appreciated.

Thank you very much!

Best,
Craig
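
The graph is saved because -graph save- runs whether or not catplot succeeded. One way to skip it, sketched below, is to check _rc, which -capture- sets to the return code of the captured command (0 on success). catplot is user-written (SSC), and the varlist a-z is the one from the post.

```stata
foreach var of varlist a-z {
    * capture lets the loop continue after an error; noisily still shows output
    capture noisily catplot `var', title(`var')

    * only save when catplot succeeded
    if _rc == 0 {
        graph save `var', replace
    }
}
```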

Export graph with changing name to a folder in loop

February 14, 2020, 12:27 pm

Dear All,

I'm trying to use a loop to generate graphs of a list of variables, export the graphs to a specific folder, and name each graph after the looping variable. The following code does not work properly.

Code:

 
foreach var of varlist a-z{
                capture noisily ///
                catplot `var', title(`var')
                graph export "\Graphs\`var'.png"
                }

Any help will be appreciated!

Thank you very much!

Best,
Craig
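
A likely culprit, offered with some hedging: Stata treats a backslash immediately before a left single quote as an escape, so in "\Graphs\`var'.png" the macro `var' is not expanded. Forward slashes, which Stata accepts on Windows as well, avoid this. A sketch that also skips variables for which catplot fails; the folder /Graphs is taken from the post and may need adjusting.

```stata
foreach var of varlist a-z {
    capture noisily catplot `var', title(`var')
    if _rc == 0 {
        * use / rather than \ before the macro so that it is expanded
        graph export "/Graphs/`var'.png", replace
    }
}
```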

Marginsplot and interaction

February 14, 2020, 12:35 pm

Hi all, I am trying to use marginsplot to graph my interactions. It works when I only have a linear interaction term, but I cannot figure out how to use marginsplot to graph the interactions once I include a squared term. Does anyone have advice?

Only a linear interaction term:

Code:

eststo: areg `var' prop_same_gender c.prop_same_race##i.race_3_s

margins race_3_s, dydx(prop_same_race)
quietly margins race_3_s, at(prop_same_race=(0(.1)1))
marginsplot, noci recast(line) ytitle("") title("`title'") legend(size(vsmall))

Regression with squared interaction term:

Code:

 eststo: areg `var' prop_same_gender c.prop_same_race##i.race_3_s c.prop_same_race_sq##i.race_3_s
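
A separately generated prop_same_race_sq stays fixed when -margins- sets prop_same_race to 0, .1, ..., 1, so the predictions ignore the curvature. Letting Stata build the square with factor-variable notation, as sketched below, lets margins move both terms together. absorb(groupvar) is a hypothetical placeholder, since -areg- requires an absorb() variable that the post does not show.

```stata
* c.x##c.x##i.g expands to x, x#x, g, x#g and x#x#g
eststo: areg `var' prop_same_gender ///
    c.prop_same_race##c.prop_same_race##i.race_3_s, absorb(groupvar)

* average slope of prop_same_race by race, now including the squared term
margins race_3_s, dydx(prop_same_race)

* predicted outcome across prop_same_race, one curve per race group
quietly margins race_3_s, at(prop_same_race=(0(.1)1))
marginsplot, noci recast(line) ytitle("") title("`title'") legend(size(vsmall))
```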

How to calculate dropouts in dataset by year

February 14, 2020, 12:47 pm

Dear reader,

Currently, I have a panel dataset with the variables Firm_ID and Year. Across years, firms drop out of the sample, but new ones also join. How could I calculate the number of firms that drop out between, say, 1990 and 1991?

Initially, when I thought I had no "joiners", I just calculated the number of distinct IDs per year, and that is how I realized I had joiners.

Kind regards,
Luis
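
A sketch on made-up data: within each firm, compare each year with the firm's next and previous observed year. A firm with a gap (absent one year, back later) is counted as leaving and then rejoining, which may or may not be what you want. The sample's first and last years are set to missing because exits and entries cannot be observed there.

```stata
clear
input Firm_ID Year
1 1990
1 1991
2 1990
3 1991
end

* 1 if the firm is absent in the next year / was absent in the previous year
bysort Firm_ID (Year): gen byte drops_next = Year[_n+1] != Year + 1
bysort Firm_ID (Year): gen byte joins_now  = Year[_n-1] != Year - 1

* exits are not observable in the last year, entries not in the first
summarize Year, meanonly
local first = r(min)
local last  = r(max)
replace drops_next = . if Year == `last'
replace joins_now  = . if Year == `first'

* number of firms leaving after each year and joining in each year
tabulate Year if drops_next == 1
tabulate Year if joins_now == 1
```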

Keeping first 5 observations per firm id number

February 14, 2020, 12:58 pm

Hi,

I have a dataset with multiple observations per firm ID, and I want to keep only the first 5 observations for each firm.
Does anybody know a command for this?

Regards,

Rohan Kapoor
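
A sketch on made-up data, assuming "first" means first in the current sort order of the data; firm_id and value are hypothetical names. Without a tiebreaker, -bysort firm_id:- alone could reorder observations within a firm, so the original order is saved first.

```stata
clear
input firm_id value
1 10
1 11
1 12
1 13
1 14
1 15
2 20
2 21
end

* remember the current order so "first" means first as the data are sorted now
gen long obsno = _n

* within each firm, keep observations 1 to 5
bysort firm_id (obsno): keep if _n <= 5
drop obsno
```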

Dummy Variable matching companies in different groups over time

February 14, 2020, 1:38 pm

Dear Statalist members,

I want to create a dummy variable for examining a subset of my sample.
My sample consists of three groups. In the pre period (2013 to 2016), companies were exposed to an intermediate level of regulation (segment 1). Then a regulatory change took place, so in the post period (2017 and 2018) companies are exposed to either high regulation (segment 2) or low regulation (segment 3).

I now want to create the dummy variables pre_high and pre_low. For example, pre_high should equal 1 if a company that was in segment 1 in the pre period chose segment 2 in the post period.

I'm stuck right now and not sure how to create such a dummy variable.

Thanks a lot for any suggestions.
Martin
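
A sketch on made-up data, with hypothetical variable names company, year and segment: first build company-level flags for membership in each segment and period with -egen, max()-, then combine them. The resulting dummies are constant within a company, across all of its years.

```stata
clear
input company year segment
1 2015 1
1 2017 2
2 2015 1
2 2018 3
end

* company-level flags: was the company ever in a given segment in a given period?
bysort company: egen byte pre_seg1  = max(segment == 1 & inrange(year, 2013, 2016))
bysort company: egen byte post_high = max(segment == 2 & inrange(year, 2017, 2018))
bysort company: egen byte post_low  = max(segment == 3 & inrange(year, 2017, 2018))

* segment 1 in the pre period and segment 2 (high) or 3 (low) in the post period
gen byte pre_high = pre_seg1 & post_high
gen byte pre_low  = pre_seg1 & post_low
```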
