Channel: Statalist
Viewing all 73184 articles

clustering and concentration Stata code


Hi everyone,

I am in the process of writing one of my dissertation chapters. Using US Census and ACS, I want to measure Middle Eastern and North African immigrants’ spatial concentration and clustering in metropolitan areas.

I am wondering whether anyone has Stata code for measuring clustering and concentration. If you have worked on something similar and are willing to share your code, I would really appreciate it. That would be tremendously helpful!

Thank you,
Sevsem

extract year

Dear All, How can I extract year (Stata format) from the following variable?
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str9 date
"28-Jun-95"
"29-Jun-99"
"30-Jun-00"
"1-Jul-05" 
"2-Jul-15" 
end
Thanks in advance.
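
A minimal sketch of one way to do this, assuming every two-digit year falls between 1951 and 2050. The names `ddate` and `year` and the cutoff 2050 are illustrative choices, not part of the question; the third argument of `date()` is the latest year a two-digit year is allowed to map to.

```stata
clear
input str9 date
"28-Jun-95"
"29-Jun-99"
"30-Jun-00"
"1-Jul-05"
"2-Jul-15"
end

* Convert the string to a Stata daily date; 2050 is an assumed top year,
* so "95" is read as 1995 and "15" as 2015
generate ddate = date(date, "DMY", 2050)
format ddate %td

* Extract the calendar year from the daily date
generate year = year(ddate)
list date ddate year
```

If the data also contain years before 1951, a different cutoff (or a mask such as "DM19Y") would be needed.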

Counting the number of ID type for each country in the data

Hi,

My panel data have 260 panel IDs (firms) observed over a number of time periods, a second variable named ID_Type (which is either 1 or 2 for each firm; shown as IFI and CFI below), and a third variable giving the country each ID belongs to (12 countries in total). The data look like this:


Time  IDs  ID_Type  Country
--    A    IFI      Pak
---   B    CFI      Pak
---   C    IFI      Bahrain
--    D    IFI      India
---   E    CFI      India
---   F    IFI      Pak

I want to produce a table that shows, for each country, the number of IDs and the number of IDs of each ID_Type, such as:

Country  ID  IFI  CFI
Pak      3   2    1
Bahrain  1   1    0
India    2   1    1

All the tables I have made so far show the number of observations for IDs and ID_Types, but I need to count how many unique IDs of each ID_Type exist in each country. I was able to make such a table some time ago but cannot replicate it now. The earlier table that I want to reproduce is given below:
Country     IDs  CFI  IFI
Bahrain      15    8    7
Bangladesh   30   29    1
Egypt        13   11    2
Indonesia    36   23   13
Jordan       33   28    5
Kuwait       38   18   20
Malaysia     71   31   40
Pakistan     56   42   14
Qatar        15    9    6
KSA          18   10    8
Singapore    18   17    1
Total       376  250  126
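
A minimal sketch of one possible approach, using `egen`'s `tag()` function to mark one observation per firm within each country and then tabulating only the tagged observations. The toy data follow the small example in the post (with every "Pak" firm coded the same way); the variable `firsttag` and the `expand 2` step, which mimics repeated time periods, are illustrative assumptions.

```stata
clear
input str1 IDs str3 ID_Type str7 Country
"A" "IFI" "Pak"
"B" "CFI" "Pak"
"C" "IFI" "Bahrain"
"D" "IFI" "India"
"E" "CFI" "India"
"F" "IFI" "Pak"
end

* Duplicate each firm so it appears in two periods, as in a panel
expand 2

* Tag exactly one observation per country-firm combination
egen firsttag = tag(Country IDs)

* Cross-tabulate firms (not observations) by country and type
tabulate Country ID_Type if firsttag
```

Because only tagged observations enter the table, each firm is counted once regardless of how many time periods it has.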



How to rescale a variable by range

Hello Statalist members,
I want to rescale my continuous variables (x, y) by their range (maximum minus minimum), because my other variables are dichotomous. The rescaled values should have a maximum range of 1. Variable x should be scaled by the range of x within code and year, and variable y by the range of y within the focal company.
My data set is at the individual level, i.e., age, tenure, race.
Could you please guide me on how to do this?
Best regards.
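
A minimal sketch of range scaling within groups, assuming the goal is to divide each value by the group's (maximum minus minimum). The auto data stand in for the real data: `price` plays the role of x, and `foreign` and `rep78` play the roles of code and year. The names `xmin`, `xmax` and `x_scaled` are hypothetical.

```stata
sysuse auto, clear

* Group-specific minimum and maximum of the variable to be rescaled
bysort foreign rep78: egen double xmin = min(price)
bysort foreign rep78: egen double xmax = max(price)

* Divide by the group range; groups with zero range are left missing
generate double x_scaled = price / (xmax - xmin) if xmax > xmin

summarize x_scaled
```

Within each group the rescaled values then span exactly 1. Using `(price - xmin) / (xmax - xmin)` instead would additionally anchor them to the interval from 0 to 1.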

How to check if a value in one variable is equal to any value in other variables

I am trying to generate a variable (or variables) indicating that VAR1 has elements in common with two other variables (VAR2 and VAR3), VAR2 has elements in common with only one variable (VAR4), VAR3 has elements in common with only VAR1, and VAR4 has elements in common with only one variable (VAR2). I have to do this by place.

Is there any way to do this in Stata?
Thanks.

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str51 place float(VAR1 VAR2 VAR3 VAR4)
"SAN LEANDRO" 1996 . . .
"SAN LEANDRO" 1997 . . .
"SAN LEANDRO" 1998 . . .
"SAN LEANDRO" 1999 . . .
"SAN LEANDRO" . . 1994 .
"SAN LEANDRO" . . 1995 .
"SAN LEANDRO" . . 1996 .
"SAN LEANDRO" . . 1997 .
"SAN LEANDRO" . 1998 . .
"SAN LEANDRO" . 1999 . .
"SAN LEANDRO" . 2000 . .
"SAN LEANDRO" . 2001 . .
"SAN LEANDRO" . . . 2000
"SAN LEANDRO" . . . 2001
"SAN LEANDRO" . . . 2002
"SAN LEANDRO" . . . 2003
end

Textbox position in bar graphs

I am trying to change the position of subtitles in bar graphs. I've created a MWE from Stata's dataset:
Code:
 
use http://www.stata-press.com/data/r15/lifeexp, clear
graph bar, over(lexp) by(region)
I would like the subtitle textbox (in all subgraphs) to be positioned on the red rectangle shown in the attached screenshot (the screenshot shows only one subgraph).
Does anyone know how to do it?


[Attached image not available]

How to drop some elements from a macro list of files

Hi Statalist, I have a question.

I have the following datasets: 2008_a8.txt, 2010_a4.txt, 2012_a7.txt, 2014_a5.txt, 2015_a2.txt and 2016_a3.txt.
The first part of my .do file generates the following versions of each dataset: *_v2.txt, *_v3a.txt, *_v3b.txt and *_v4.txt.

Thus, if I do the following commands:

local files : dir "C:\Users\nf19281\Desktop\example" files "*.txt"
macro list _files
_files: "2008_a8.txt" "2008_a8_v2.txt" "2008_a8_v3a.txt" "2008_a8_v3b.txt" "2008_a8_v4.txt" "2010_a4.txt" "2010_a4_v2.txt" "2010_a4_v3a.txt" "2010_a4_v3b.txt" "2010_a4_v4.txt"
"2012_a7.txt" "2012_a7_v2.txt" "2012_a7_v3a.txt" "2012_a7_v3b.txt" "2012_a7_v4.txt" "2014_a5.txt" "2014_a5_v2.txt" "2014_a5_v3a.txt" "2014_a5_v3b.txt" "2014_a5_v4.txt"
"2015_a2.txt" "2015_a2_v2.txt" "2015_a2_v3a.txt" "2015_a2_v3b.txt" "2015_a2_v4.txt" "2016_a3.txt" "2016_a3_v2.txt" "2016_a3_v3a.txt" "2016_a3_v3b.txt" "2016_a3_v4.txt"

However, I need my macro list to contain only the initial datasets.
How can I tell Stata to consider only the original datasets?

I tried: local files : dir "C:\Users\nf19281\Desktop\example" files "*.txt" !="*v*.txt" but it doesn't work.

Thank you!
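
A minimal sketch of one possible approach, assuming every derived file name contains "_v" and no original file name does. It builds two lists with the `dir` extended macro function and subtracts one from the other with the macro list operator `-`. The macro names `folder`, `all` and `derived` are illustrative; the folder is the one given in the post.

```stata
* Folder taken from the post; adjust as needed
local folder "C:\Users\nf19281\Desktop\example"

* All .txt files, and the subset produced by the first part of the do-file
local all     : dir "`folder'" files "*.txt"
local derived : dir "`folder'" files "*_v*.txt"

* Keep the elements of all that are not in derived
local files : list all - derived
macro list _files

* Loop over the original datasets only
foreach f of local files {
    display "`f'"
}
```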

Help with counts

Hello,

I am working on a prescribing data set containing details of about 100,000 people. The layout is similar to the table below (Gender: 0=male, 1=female; drug indicators: 0=no, 1=yes):
ID     Gender  Paracetamol_2002  Paracetamol_2002_count  Codeine_2002  Codeine_2002_Count  Ibuprofen_2002  Ibuprofen_2002_Count
Pat01  0       1                 15                      0             0                   1               1
Pat02  0       1                 7                       0             0                   1               4
Pat03  1       1                 3                       1             4                   0               0
Pat04  0       0                 0                       1             6                   1               6
Pat05  1       0                 0                       1             12                  0               0
Pat06  1       1                 12                      1             12                  0               0
Pat07  1       0                 0                       0             0                   0               0
Pat08  0       0                 0                       1             9                   1               18
Pat09  1       0                 0                       1             1                   1               5

There is information on 23 different medicines covering a period of 5 years. I need to find out the following:
1. How many prescriptions were issued for each medicine in each year. For this I used the command 'count if Paracetamol_2002_count > 0'. Is this the best way to do this? I also tried commands such as 'tab paracetamol_2002_count'.
2. How many people received those prescriptions in a given year. I tried 'count if Paracetamol_2002 > 0' but it didn't quite work. Any suggestions on how I might do this? How do I work out how many women, for instance, received prescriptions for each drug in a given year?

I appreciate that these might be somewhat elementary questions, but I would be grateful for any help I can get.
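
A minimal sketch of the distinction between the two quantities, using the paracetamol columns from the table above. It assumes the `_count` variable holds the number of prescriptions per person, so the total issued is the sum of that variable, while the number of people is a count of observations with the indicator equal to 1.

```stata
clear
input str5 ID Gender Paracetamol_2002 Paracetamol_2002_count
"Pat01" 0 1 15
"Pat02" 0 1 7
"Pat03" 1 1 3
"Pat04" 0 0 0
"Pat05" 1 0 0
"Pat06" 1 1 12
"Pat07" 1 0 0
"Pat08" 0 0 0
"Pat09" 1 0 0
end

* 1. Prescriptions issued: sum the per-person counts
quietly summarize Paracetamol_2002_count
display "Prescriptions issued: " r(sum)

* 2. People who received at least one prescription
count if Paracetamol_2002 == 1

* Women (Gender == 1) who received at least one prescription
count if Paracetamol_2002 == 1 & Gender == 1
```

Note that 'count if Paracetamol_2002_count > 0' counts people with at least one prescription, not prescriptions.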

coefplot graph, symmetry between dots

Dear Statalisters,

I am using the coefplot command to create one graph with coefficients from 40 different regressions. However, the position of the dots in the chart is not symmetric, as you can see below:
(for instance, look at Kenya and Pakistan plots)

[Attached image not available]

Here is the code I use:

Code:
coefplot /*
   */(female_ono_nr77_reg, aseq("India") mcol(maroon) lcol(maroon) mlabcolor(maroon) ciopts(color(maroon))) (female_ono_nr77_iv, aseq("India") mcol(navy) lcol(navy) mlabcolor(navy) ciopts(color(navy)))/*
   */(female_ono_nr55_reg, aseq("United Kingdom") mcol(maroon) lcol(maroon) mlabcolor(maroon) ciopts(color(maroon))) (female_ono_nr55_iv, aseq("United Kingdom") mcol(navy) lcol(navy) mlabcolor(navy) ciopts(color(navy)))/*
   */(female_ono_nr135_reg, aseq("Pakistan") mcol(maroon) lcol(maroon) mlabcolor(maroon) ciopts(color(maroon))) (female_ono_nr135_iv, aseq("Pakistan") mcol(navy) lcol(navy) mlabcolor(navy) ciopts(color(navy)))/*
   */(female_ono_nr171_reg, aseq("United States") mcol(maroon) lcol(maroon) mlabcolor(maroon) ciopts(color(maroon))) (female_ono_nr171_iv, aseq("United States") mcol(navy) lcol(navy) mlabcolor(navy) ciopts(color(navy)))/*
   */(female_ono_nr14_reg, aseq("Bangladesh") mcol(maroon) lcol(maroon) mlabcolor(maroon) ciopts(color(maroon))) (female_ono_nr14_iv, aseq("Bangladesh") mcol(navy) lcol(navy) mlabcolor(navy) ciopts(color(navy)))/*
   */(female_ono_nr169_reg, aseq("Ukraine") mcol(maroon) lcol(maroon) mlabcolor(maroon) ciopts(color(maroon))) (female_ono_nr169_iv, aseq("Ukraine") mcol(navy) lcol(navy) mlabcolor(navy) ciopts(color(navy)))/*
   */(female_ono_nr2_reg, aseq("United Arab Emirates") mcol(maroon) lcol(maroon) mlabcolor(maroon) ciopts(color(maroon))) (female_ono_nr2_iv, aseq("United Arab Emirates") mcol(navy) lcol(navy) mlabcolor(navy) ciopts(color(navy)))/*
   */(female_ono_nr143_reg, aseq("Romania") mcol(maroon) lcol(maroon) mlabcolor(maroon) ciopts(color(maroon))) (female_ono_nr143_iv, aseq("Romania") mcol(navy) lcol(navy) mlabcolor(navy) ciopts(color(navy)))/*
   */(female_ono_nr29_reg, aseq("Canada") mcol(maroon) lcol(maroon) mlabcolor(maroon) ciopts(color(maroon))) (female_ono_nr29_iv, aseq("Canada") mcol(navy) lcol(navy) mlabcolor(navy) ciopts(color(navy)))/*
   */(female_ono_nr134_reg, aseq("Philippines") mcol(maroon) lcol(maroon) mlabcolor(maroon) ciopts(color(maroon))) (female_ono_nr134_iv, aseq("Philippines") mcol(navy) lcol(navy) mlabcolor(navy) ciopts(color(navy)))/*
   */(female_ono_nr50_reg, aseq("Spain") mcol(maroon) lcol(maroon) mlabcolor(maroon) ciopts(color(maroon))) (female_ono_nr50_iv, aseq("Spain") mcol(navy) lcol(navy) mlabcolor(navy) ciopts(color(navy)))/*
   */(female_ono_nr144_reg, aseq("Serbia") mcol(maroon) lcol(maroon) mlabcolor(maroon) ciopts(color(maroon))) (female_ono_nr144_iv, aseq("Serbia") mcol(navy) lcol(navy) mlabcolor(navy) ciopts(color(navy)))/*
   */(female_ono_nr49_reg, aseq("Egypt") mcol(maroon) lcol(maroon) mlabcolor(maroon) ciopts(color(maroon))) (female_ono_nr49_iv, aseq("Egypt") mcol(navy) lcol(navy) mlabcolor(navy) ciopts(color(navy)))/*
   */(female_ono_nr80_reg, aseq("Italy") mcol(maroon) lcol(maroon) mlabcolor(maroon) ciopts(color(maroon))) (female_ono_nr80_iv, aseq("Italy") mcol(navy) lcol(navy) mlabcolor(navy) ciopts(color(navy)))/*
   */(female_ono_nr65_reg, aseq("Greece") mcol(maroon) lcol(maroon) mlabcolor(maroon) ciopts(color(maroon))) (female_ono_nr65_iv, aseq("Greece") mcol(navy) lcol(navy) mlabcolor(navy) ciopts(color(navy)))/*
   */(female_ono_nr74_reg, aseq("Ireland") mcol(maroon) lcol(maroon) mlabcolor(maroon) ciopts(color(maroon))) (female_ono_nr74_iv, aseq("Ireland") mcol(navy) lcol(navy) mlabcolor(navy) ciopts(color(navy)))/*
   */(female_ono_nr85_reg, aseq("Kenya") mcol(maroon) lcol(maroon) mlabcolor(maroon) ciopts(color(maroon))) (female_ono_nr85_iv, aseq("Kenya") mcol(navy) lcol(navy) mlabcolor(navy) ciopts(color(navy)))/*
   */(female_ono_nr10_reg, aseq("Australia") mcol(maroon) lcol(maroon) mlabcolor(maroon) ciopts(color(maroon))) (female_ono_nr10_iv, aseq("Australia") mcol(navy) lcol(navy) mlabcolor(navy) ciopts(color(navy)))/*
   */(female_ono_nr17_reg, aseq("Bulgaria") mcol(maroon) lcol(maroon) mlabcolor(maroon) ciopts(color(maroon))) (female_ono_nr17_iv, aseq("Bulgaria") mcol(navy) lcol(navy) mlabcolor(navy) ciopts(color(navy)))/*
   */(female_ono_nr123_reg, aseq("Nigeria") mcol(maroon) lcol(maroon) mlabcolor(maroon) ciopts(color(maroon))) (female_ono_nr123_iv, aseq("Nigeria") mcol(navy) lcol(navy) mlabcolor(navy) ciopts(color(navy)))/*
   */, keep(female_ono_nr77 female_ono_nr55 female_ono_nr135 female_ono_nr171 female_ono_nr14 female_ono_nr169 female_ono_nr2 female_ono_nr143 female_ono_nr29 female_ono_nr134 female_ono_nr50 female_ono_nr144 female_ono_nr49 female_ono_nr80 female_ono_nr65 female_ono_nr74 female_ono_nr85 female_ono_nr10 female_ono_nr17 female_ono_nr123) xline(0) mlabel(cond(@pval<.01, "***", cond(@pval<.05, "**", cond(@pval<.1, "*", "")))) note("* p < .1, ** p < .05, *** p < .01") mlabgap(*0) mlabsize(vsmall) mlabposition(1) msym(d) aseq swapnames sort

 
 
 
*Top 20, only IV
coefplot /*
   */(female_ono_nr77_iv, aseq("India") mcol(navy) lcol(navy) mlabcolor(navy) ciopts(color(navy)))/*
   */(female_ono_nr55_iv, aseq("United Kingdom") mcol(navy) lcol(navy) mlabcolor(navy) ciopts(color(navy)))/*
   */(female_ono_nr135_iv, aseq("Pakistan") mcol(navy) lcol(navy) mlabcolor(navy) ciopts(color(navy)))/*
   */(female_ono_nr171_iv, aseq("United States") mcol(navy) lcol(navy) mlabcolor(navy) ciopts(color(navy)))/*
   */(female_ono_nr14_iv, aseq("Bangladesh") mcol(navy) lcol(navy) mlabcolor(navy) ciopts(color(navy)))/*
   */(female_ono_nr169_iv, aseq("Ukraine") mcol(navy) lcol(navy) mlabcolor(navy) ciopts(color(navy)))/*
   */(female_ono_nr2_iv, aseq("United Arab Emirates") mcol(navy) lcol(navy) mlabcolor(navy) ciopts(color(navy)))/*
   */(female_ono_nr143_iv, aseq("Romania") mcol(navy) lcol(navy) mlabcolor(navy) ciopts(color(navy)))/*
   */(female_ono_nr29_iv, aseq("Canada") mcol(navy) lcol(navy) mlabcolor(navy) ciopts(color(navy)))/*
   */(female_ono_nr134_iv, aseq("Philippines") mcol(navy) lcol(navy) mlabcolor(navy) ciopts(color(navy)))/*
   */(female_ono_nr50_iv, aseq("Spain") mcol(navy) lcol(navy) mlabcolor(navy) ciopts(color(navy)))/*
   */(female_ono_nr144_iv, aseq("Serbia") mcol(navy) lcol(navy) mlabcolor(navy) ciopts(color(navy)))/*
   */(female_ono_nr49_iv, aseq("Egypt") mcol(navy) lcol(navy) mlabcolor(navy) ciopts(color(navy)))/*
   */(female_ono_nr80_iv, aseq("Italy") mcol(navy) lcol(navy) mlabcolor(navy) ciopts(color(navy)))/*
   */(female_ono_nr65_iv, aseq("Greece") mcol(navy) lcol(navy) mlabcolor(navy) ciopts(color(navy)))/*
   */(female_ono_nr74_iv, aseq("Ireland") mcol(navy) lcol(navy) mlabcolor(navy) ciopts(color(navy)))/*
   */(female_ono_nr85_iv, aseq("Kenya") mcol(navy) lcol(navy) mlabcolor(navy) ciopts(color(navy)))/*
   */(female_ono_nr10_iv, aseq("Australia") mcol(navy) lcol(navy) mlabcolor(navy) ciopts(color(navy)))/*
   */(female_ono_nr17_iv, aseq("Bulgaria") mcol(navy) lcol(navy) mlabcolor(navy) ciopts(color(navy)))/*
   */(female_ono_nr123_iv, aseq("Nigeria") mcol(navy) lcol(navy) mlabcolor(navy) ciopts(color(navy)))/*
   */, keep(female_ono_nr77 female_ono_nr55 female_ono_nr135 female_ono_nr171 female_ono_nr14 female_ono_nr169 female_ono_nr2 female_ono_nr143 female_ono_nr29 female_ono_nr134 female_ono_nr50 female_ono_nr144 female_ono_nr49 female_ono_nr80 female_ono_nr65 female_ono_nr74 female_ono_nr85 female_ono_nr10 female_ono_nr17 female_ono_nr123) xline(0) mlabel(cond(@pval<.01, "***", cond(@pval<.05, "**", cond(@pval<.1, "*", "")))) note("* p < .1, ** p < .05, *** p < .01") mlabgap(*0) mlabsize(vsmall) mlabposition(1) msym(d) aseq swapnames sort
Any advice on how to make the graph look better?

Thank you very much,
Estrella

bayesgraph diagnostics in Item Response Model

Please, how do we use the bayesgraph diagnostics command with an Item Response Model?
Thanks for your anticipated assistance.
Olayiwola Adetutu
Kwara State University, Malete,
Nigeria.

How to look at regression between score bands

Hi, I have a binary variable and a score variable. I want to look at the binary variable within different bands of the score variable (specifically, 0-10, 10-20, 20-30, and >30).

I am not sure how to code this.

Code:
clogit case b_var if score== ? , group(set) or

Interpreting Xtabond2 Coefficients & Coefficient Inflation?

Hi,

This is a follow-up to the topic below, but because the question is different, I thought a separate thread might be more useful to anyone searching for this question in the future.
Older topic: https://www.statalist.org/forums/for...are-endogenous

As before, here are the characteristics of my data:
"My data has three waves, but with gaps (unbalanced), and the N is approximately 2500. All the variables below are statistically significant in the FE/RE estimations."

I have estimated the following equation (the original variable names are changed for ease of reading):

Code:
xtabond2 DV L.DV X1 X2 X3 X4 wave2 wave3, gmm(L.DV) gmm(X1, lag(2 .)) gmm(X2, lag(2 .)) gmm(X3, lag(1 .)) iv(X4) iv(wave2 wave3, eq(level)) robust small twostep
All Hansen test p-values, including the incremental ones, are above the 0.25 threshold (Kiviet 2019, Roodman 2009).

My question is as follows:
- In the pooled OLS regression, B for X1 is 0.66, and SE is 0.05.
- In the FE regression, B for X1 is 0.22, and SE is 0.05.
- In the Lagged FE (biased) regression, B for X1 is 0.19, and SE is 0.07.

BUT, in the dynamic panel specified above:
- B for X1 is 4.87, and SE is 2.26.

Other variables (X2, X3, X4) are not like this, they are within the bounds of pooled OLS and FE regression.

What could be the reason for this?

How to generate predicted wage (gap) with margins command and plot a profile with marginsplot

Dear Stata user,

I would like to have predicted mean immigrant-native wage gaps on the vertical axis and years since migration (ysm) on the horizontal axis and generate such a profile based on the following simple regression pooling immigrants and natives:
Code:
regress logwage im ysm im#i.ysm
where im is a binary variable equal to one if immigrant and zero if native.
Code:
margins i.ysm
generates the predicted logwage values. I treat years since migration as a set of binary variables where ysm ranges from 0 through 50 years.
Code:
marginsplot, x(ysm) recast(scatter)
produces the plot with ysm on the horizontal axis.

I have no clue how to get the predicted mean immigrant native wage gap on the vertical axis!?

I would appreciate some thoughts on this.

Thank you very much in advance.

Nico

coefplot change bar color?

My question is how I can change the bar colors when I'm using
HTML Code:
coefplot
I added this
HTML Code:
bar(1,bcolor(navy)) bar(2, bcolor(maroon))
but it says:

option bar() not allowed
r(198);



HTML Code:
coefplot fcn1 fcn5, format(%9.0f) ///
 title("(a) Overall Perception of China", size(medium)) ///
 color(navy) lcolor(black) lpattern(solid) byopts(cols(1)) ///
 ciopts(recast(rcap)) citop citype(logit) ///    
 recast(bar) bar(1,bcolor(navy)) bar(2, bcolor(maroon))  rescale(80) vertical  ///
 ytitle ("") ///
 graphregion(color()) yscale(r(0 50)) ylabel(0(10)50) ///
 addplot(scatter @b @at, ms(i) mlabel(@b) mlabpos(2) mlabcolor(black)) ///   
 xlabel(1 "Favourable" 2 "Neutral" 3 "Unfavourable") ///
ylabel(, labcol(black)) barwidth(.3) legend(label(1 "Remainers") label(3 "Leavers") ring(0) position(2) bmargin(large))
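
The error arises because `bar()` is an option of `graph bar`, not of `coefplot`. A minimal sketch of one possible alternative, assuming the community-contributed `coefplot` package is installed (`ssc install coefplot`): give each model its own `color()` inside parentheses. The auto data and the stored estimate names `m_dom` and `m_for` are illustrative stand-ins for `fcn1` and `fcn5`.

```stata
sysuse auto, clear

* Two stand-in models, one per group
quietly regress price mpg weight if foreign == 0
estimates store m_dom
quietly regress price mpg weight if foreign == 1
estimates store m_for

* Per-model options go inside the parentheses of each model
coefplot (m_dom, color(navy)) (m_for, color(maroon)), ///
    drop(_cons) vertical recast(bar) barwidth(.3)     ///
    ciopts(recast(rcap)) citop
```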

Getting an error from putexcel set ..., open

I have this simple line of code to set up my Excel worksheet, but I get the following error when I try to run it:
putexcel_set_new(): 3010 attempt to dereference NULL pointer
<istmt>: - function returned error


Code:
putexcel set "Data collection tracked", replace open
It was working at first but then, without changing the line, I started to get this error. I appreciate any insight anyone has on how to resolve this.

Thank you,
-Daniel

Coefplot: reduce spacing on y-axis

Hello everybody,

I have a question about how to reduce the spacing between labels on the y-axis when using Ben Jann's coefplot command. This is related to a post I found on Stack Overflow, but the discussion there didn't quite solve the problem, I think. Without a fix, the plot will take up too much space in my document, and I don't want to scale it down in a minipage or similar, because then the labels will also be very small.
Anyway, reading that post gave me the idea to experiment with ysize(), xsize() and aspectratio().

Here is some code for demonstration:
Code:
quietly sysuse auto, clear
quietly regress price mpg trunk length turn
set autotabgraphs on

* i)
coefplot, drop(_cons) xline(0) ///
          xlabel(-600(100)300) ///
          scheme(s1mono) ///
          title(i) ///
          name(i, replace)
* ii)
coefplot, drop(_cons) xline(0) ///
          xlabel(-600(100)300) ///
          xsize(3) ysize(1) ///
          scheme(s1mono) ///
          title(ii) ///
          name(ii, replace)
* iii)
coefplot, drop(_cons) xline(0) ///
          xlabel(-600(100)300) ///
          aspectratio(.33) ///
          scheme(s1mono) ///
          title(iii) ///
          name(iii, replace)
* iv)          
coefplot, drop(_cons) xline(0) ///
          xlabel(-600(100)300) ///
          aspectratio(.33) ///
          xsize(3) ysize(1) ///
          scheme(s1mono) ///
          title(iv) ///
          name(iv, replace)
Number iii comes closest to what I need; however, I would like to remove the empty white space above and below the plotregion. I tried to do that by experimenting with margin() in graphregion() and plotregion(), but that didn't yield the desired result.
What I would like to have is number iii with the grey shaded area removed:

[Attached image not available]

Does anybody know how to achieve that - or where to find the solution?

Thanks in advance and best regards,
Boris

bootstrap doesn't work for handling endogeneity for mlogit

Dear Statalist,
I want to handle endogeneity in mlogit. I have two endogenous variables (each with two levels), and my number of observations is 4807. According to the https://www.statalist.org/forums/for...ction-approach forum, Professor Wooldridge confirms that the bootstrap code works well for obtaining a reasonable estimate of the standard errors for the multinomial logit model.
I implemented this code for my work, but Stata returns the error "insufficient observations to compute bootstrap standard errors no results will be saved". I have read many forum threads about this problem, but I can't understand why it happens or how to handle it. Please help me solve it; I am a beginner in Stata.

global ylist method
global xlist Age hh_size i.work i.marital i.gender i.educ i.region i.hh_income i.spend_reduced /*
*/ i.importance_attribute_payment i.cc_hasbal i.cc_reward i.cc_ratio i.cc_revolver i.dc_free end_cash_bal log_Amount /*
*/ EaseCC EaseDC CostCC CostDC RecordCC RecordDC


// set up the program including 1st and 2nd stage
gen merch_accep_cash_norm = merch_accep_cash-1
gen merch_accep_card_norm = merch_accep_card-1

program define my2sls
probit merch_accep_cash_norm $xlist
predict merch_accep_cash_Hat , pr
gen merch_accep_cash_residual = merch_accep_cash_norm - merch_accep_cash_Hat

probit merch_accep_card_norm $xlist
predict merch_accep_card_Hat , pr
gen merch_accep_card_residual = merch_accep_card_norm - merch_accep_card_Hat

mlogit $ylist $xlist merch_accep_card_residual merch_accep_cash_residual [pw=ind_weight] , baseoutcome(1)
end

// obtain bootstrapped standard errors
*bootstrap "sim" _b _se, reps(400) dots
bootstrap , reps(50): my2sls
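
One possible cause, offered as an assumption rather than a confirmed diagnosis: `bootstrap` calls `my2sls` once per replication, and from the second call on `predict` and `gen` fail because the fitted-value and residual variables already exist, so no replication succeeds. A minimal sketch that drops those variables at the start of each call, reusing the variable names and the globals `$ylist` and `$xlist` defined above (the `capture program drop` line and the `seed()` value are additions, and the data set from the post must be in memory):

```stata
capture program drop my2sls
program define my2sls
    * Remove variables left over from the previous call, if any
    foreach v in merch_accep_cash_Hat merch_accep_cash_residual ///
                 merch_accep_card_Hat merch_accep_card_residual {
        capture drop `v'
    }

    * First stage for each endogenous variable
    probit merch_accep_cash_norm $xlist
    predict merch_accep_cash_Hat, pr
    generate merch_accep_cash_residual = merch_accep_cash_norm - merch_accep_cash_Hat

    probit merch_accep_card_norm $xlist
    predict merch_accep_card_Hat, pr
    generate merch_accep_card_residual = merch_accep_card_norm - merch_accep_card_Hat

    * Second stage with both first-stage residuals as controls
    mlogit $ylist $xlist merch_accep_card_residual merch_accep_cash_residual [pw=ind_weight], baseoutcome(1)
end

* Bootstrap both stages together
bootstrap, reps(50) seed(12345): my2sls
```

Running `my2sls` twice by hand before calling `bootstrap` is a quick way to check whether the program fails on its second call.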

Weighting before Multiple imputation

Hello,
I have annual data from a cohort study since 2014 about smoking status. I would like to use the data (e.g., sex, age, self-rated health, ...) of participants included in 2013 and calculate the annual prevalence of smokers by age, sex, etc. In addition, I would like to use 1) inverse probability weights (already calculated) to account for non-participation in the cohort (in 2013) and 2) multiple imputation to account for missing data in 2013 and in every year since.

I know how to do the two separately, but I do not know whether it is possible to use both techniques in the same study. Thank you for your advice!

Changing matrix values based on values of STATA variable

Dear Statalist,

I am currently running a random forest classification algorithm in Stata, and I wish to construct a cost function needed to run a weighted RF (using the crtrees algorithm).

In order to do this, I need to create an NxN matrix, which starts off as an identity matrix, where the diagonal represents the relative weight of each observation in the dataset (i.e., using the identity matrix itself means that all outcomes are equally weighted). To give you some context, my dataset consists of 19,000 data points, so it will be a 19,000x19,000 matrix.

I have figured out how to create an identity matrix equal to the number of observations in the dataset using Mata. To provide a concrete example, take the following:

sysuse auto
gen N=_N
global N=N
mata: st_matrix("MyMatrix", I($N))

Here is where I am currently stumped, however.

To go from the identity matrix to the matrix that I aim to build (i.e., the cost function), I need to change specific elements of the matrix based on values of a variable in the dataset (dummy variable).

Using the "Automobile" dataset again, this is comparable to changing values of elements in "MyMatrix" for all observations where foreign==1.
For the sake of illustration, assume that I would, for example, wish to attribute the value "2" to all of those observations.

My question now is: is it possible to perform this operation in this manner, or have I misunderstood how Mata works?


Sincerely
Johan Karlsson
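
A minimal sketch of one way to build such a matrix in a single step, assuming the weights belong on the diagonal: read the dummy into a Mata column vector, turn it into weights (1 when foreign==0, 2 when foreign==1), and place them on the diagonal with `diag()`. The Mata name `w` is illustrative. Whether a 19,000x19,000 Stata matrix fits depends on the matrix size limit of the Stata edition in use, which this sketch does not check.

```stata
sysuse auto, clear

mata:
    // Column vector of weights: 1 for domestic cars, 2 for foreign cars
    w = 1 :+ st_data(., "foreign")

    // Diagonal matrix with the weights on the diagonal, saved as a Stata matrix
    st_matrix("MyMatrix", diag(w))
end

* Inspect the first and last diagonal elements
display MyMatrix[1,1]
display MyMatrix[74,74]
```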

