Channel: Statalist
Viewing all 73030 articles

Replace values across multiple variables with wildcard

Hi,

I have a dataset with 20 variables, each with a name beginning with "c". I would like to set all those variables to missing for a certain subset of rows. Specifically, I tried:

replace c* = . if patientid==3408 & visitnum==10

and got: too many variables specified

Is there a limit to the number of variables represented by a wildcard, is my syntax just plain wrong, or am I just completely off trying this approach?

I am using v.16. Thank you!
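A minimal sketch, assuming every variable matching `c*` is numeric. `replace` accepts exactly one variable, which is why the wildcard triggers "too many variables specified". Looping over the expanded varlist with `foreach` applies the same condition to each variable. The toy data are made up for illustration.

```stata
* Toy data (made up): two visits for one patient, three c* variables
clear
input patientid visitnum c1 c2 c3
3408 10 1 2 3
3408 11 4 5 6
end

* -replace- takes a single variable, so loop over every variable matching c*
foreach v of varlist c* {
    replace `v' = . if patientid == 3408 & visitnum == 10
}
list
```

If some `c*` variables are strings, `= .` gives a type mismatch for them; those would need `= ""` instead.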

outreg2 and exporting values not in the eret list in Stata 14.2

Hello Everyone,

My aim is to use outreg2 to export the calculated ICC (intraclass correlation coefficient) and chi2 to a table for my paper, preferably as an Excel file. In addition, Stata 14.2 does not seem to store chi2 in the ereturn list.

I know outreg2 is a user-written command. I would be really grateful if you could tell me how to get chi2 and the ICC into the Excel table for my paper. I did read -help outreg2-, but it is overwhelming and I do not know where to dig deeper.

I used the following code, but have not been successful in exporting either the ICC or chi2:

meologit NumAP || sector:
meologit, coeflegend
display _b[var(_cons[sector]):_cons]
gen var_random_intercept2 = _b[var(_cons[sector]):_cons]
display var_random_intercept2/(var_random_intercept2 + (_pi^2)/3)
gen icc2 = var_random_intercept2/(var_random_intercept2 + (_pi^2)/3)
display icc2
est store null2

quietly meologit NumAP || sector:
outreg2 using null_type_of_AP.xls, e(ll chi2) icc2 dec(5) long replace

I am getting the following error:

option icc2 not allowed
r(198);

error . . . . . . . . . . . . . . . . . . . . . . . . Return code 198
invalid syntax;
option __________ incorrectly specified;
option __________ not allowed;
__________ invalid;
range invalid;
__________ invalid obs no;
invalid filename;
__________ invalid varname;
__________ invalid name;
multiple by's not allowed;
__________ found where number expected;
on or off required;
All items in this list indicate invalid syntax. These errors
are often, but not always, due to typographical errors. Stata
attempts to provide you with as much information as it can.
Review the syntax diagram for the designated command.
In giving the message "invalid syntax", Stata is not helpful.
Errors in specifying expressions often result in this message.

(end of search)

looping with -tabi-

Hi,

Does anyone know how I can loop over the -tabi- command? I have 4 variables (v1 v2 v3 v4) and about 100 lines of data, and I wish to run -tabi x x \ x x, chi2- on the 4 values of each line, but doing it one line at a time is time-consuming (>100 lines).

here is what the data looks like and what I want
line v1 v2 v3 v4
1 12 32 56 78
2
3
4
5
I would like to run -tabi 12 32 \ 56 78,chi2- for line 1 , and then do that in a loop for all lines.

I hope this request is clear.

Any feedback would be appreciated.

Thanks
Vishal
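A sketch of one way to do this: loop over the observation numbers and pass each line's four values to -tabi- through `` `=exp' `` macro expansion. The second data line and the names `chi2_stat` and `p_value` are made up; the loop keeps the chi2 statistic and p-value that -tabi- leaves in `r(chi2)` and `r(p)`. Without the `replace` option, -tabi- leaves the data in memory untouched.

```stata
* Toy data (made up), one 2 x 2 table per line
clear
input line v1 v2 v3 v4
1 12 32 56 78
2 20 15 10 25
end

generate double chi2_stat = .
generate double p_value = .
forvalues i = 1/`=_N' {
    * first row of the table: v1 v2; second row: v3 v4
    tabi `=v1[`i']' `=v2[`i']' \ `=v3[`i']' `=v4[`i']', chi2
    quietly replace chi2_stat = r(chi2) in `i'
    quietly replace p_value = r(p) in `i'
}
list
```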

Multilayer Mapping (Density and Point together)

Hi there -

I would like to create a county-level map showing both population and facility locations. I was able to draw a map of population (see the code below), but I don't know how to add another layer of information to the map. I want to show the facility locations as simple dots on the map. Please help!


spmap pop_county using icoordinates if STATE != 15 & STATE != 02, title(population, size(medsmall)) id(_ID) fcolor(Blues) legend(off)





How to get the first-difference estimator for panel data

Good evening,

I am trying to estimate a dynamic panel data model with n=6 and t=25. A Hausman test indicated that I should use a fixed-effects regression. After detecting heteroskedasticity and autocorrelation, I ran the regression with xtpcse. However, as you may know, fixed-effects results for a dynamic model might be biased because of Nickell bias. My supervisor recommended the Anderson-Hsiao method, which I tried to implement with xtivreg2, fd, instrumenting the lagged dependent variable with different numbers of its own lags. However, after instrumenting with lags 2 and 3, lags 3 and 4, lags 2, 3 and 4, and even up to lag 5, my regression fails the weak identification test (the Cragg-Donald Wald F statistic is always smaller than, or only slightly larger than, the 25% maximal IV size critical value of the Stock-Yogo weak ID test), so I have more or less given up on that.

My supervisor also said that I could use the "simple first-difference estimator", which, as far as I understand, is just an OLS regression of the first-differenced dependent variable on the first-differenced independent variables. How can I get this estimator? Would it be possible with -xtreg d.y d.l.y d.x, fe vce(cluster)- ?

Thank you!
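A sketch, on a simulated panel of the same size (6 units by 25 periods, all values made up): the simple first-difference estimator is OLS on the differenced data, which -regress- can run directly with time-series operators after -xtset-. Adding `fe` via -xtreg- would additionally demean the differenced data, which is a different estimator (first differences plus unit-specific trends). Two caveats worth keeping in mind: with the lagged dependent variable included, LD.y is correlated with the differenced error by construction (the very problem the Anderson-Hsiao instruments address), and with only 6 clusters, cluster-robust standard errors are unreliable.

```stata
* Simulated balanced panel (made up): 6 units observed over 25 periods
clear
set seed 2024
set obs 6
generate id = _n
expand 25
bysort id: generate t = _n
xtset id t

generate x = rnormal()
generate y = rnormal()
* AR(1) process in levels; replace runs observation by observation within id
bysort id (t): replace y = 0.5*y[_n-1] + x + rnormal() if _n > 1

* First-difference estimator: OLS of D.y on LD.y and D.x
* (the constant corresponds to a linear trend in levels; add noconstant to drop it)
regress D.y LD.y D.x, vce(cluster id)
```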

Error of "asdoc": option text() required

Hi All,

I'm using asdoc to output stat results, it worked well with:
Code:
bys country: asdoc tabstat share, stat(N mean)
But when I change the subgroup with code:
Code:
bys company: asdoc tabstat share, stat(N mean)
An error occurred:
Code:
option text() required
I tried changing the code to:
Code:
bys company: asdoc tabstat share, text(Company Details) stat(N mean)
It didn't work, the same error appeared.

My guess is that it is because the string values in my "company" variable are too long, but I don't know how to solve it.

Any help would be appreciated. Thank you!

Best,
Craig


wald chi square and prob >chi square is .

Hello, I'm using an ordered probit model, and in the results the Wald chi-square and Prob > chi-square are missing (.). Is that a problem?
Thanks in advance to anyone who can help.

Counting observations after a certain event with two date variables

Hi all,

I have a database of projects, each of which belongs to a particular category and has a launch date and a deadline. For each project, I'd like to compute how many projects in the same category were launched within 24 hours after its deadline. Project duration is not constant. I tried the code shown below, but I'm not getting what I need. I suspect it has to do with the sorting and the fact that campaign duration is heterogeneous.

Code:
sort category launched_at
gen campaigns_launched_24 = 0
local more = 1
local i = 0
while `more' {
    local i = `i' + 1
    by category: gen doit = (launched_at[_n + `i']  - deadline) <= msofhours(24)
    replace campaigns_launched_24 = campaigns_launched_24 + doit
    count if doit
    local more = r(N)
    drop doit
}
Here's a sample of my data:

Code:
clear
input str11 category long project_id double(launched_at deadline)
"Art" 1578671837 1556799741000 1559389080000
"Art"  353710709 1557766931000 1.5620832e+12
"Art"  199916122 1557783598000 1558646760000
"Art"  995325523 1557853227000 1564105380000
"Art"  725084811 1559851618000 1567454940000
"Art"  446705094 1560400631000  1.563174e+12
"Art" 1036167768 1562718749000 1.5653058e+12
"Art"  541268297 1563030134000 1567396740000
"Art" 1097561326 1563825605000 1571500740000
"Art" 1358511195 1564508515000 1.5659277e+12
"Art" 1262817514 1564860621000 1569383940000
"Art" 1331047419 1564966004000 1570247940000
"Art" 1303299859 1565114928000 1572824160000
"Art"  242607068 1565313089000 1568769360000
"Art"  654539635 1565744244000 1569211140000
"Art"  994711737 1565898541000 1569957720000
"Art" 2077234434 1566769666000 1570136460000
"Art"  546326967 1567046149000 1569988740000
"Art"  600922270 1567066462000 1.5712632e+12
"Art" 1704281050 1567303359000  1.572642e+12
"Art" 1054364934 1567343405000 1.5712416e+12
"Art" 1613913308 1567435152000  1.572642e+12
"Art"  611017496 1567612308000 1573919040000
"Art"  518209887 1567910303000  1.575306e+12
"Art"  662389409 1568149343000 1.5758424e+12
"Art"  149233916 1568234691000 1574132820000
"Art" 2098335561 1568449452000  1.575738e+12
"Art"  558687555 1568483157000 1.5762132e+12
"Art" 1982126934 1568486739000 1576181820000
"Art" 1686482513 1568499141000 1.5712416e+12
"Art"  574627229 1568522713000 1572407940000
"Art" 1748013708 1568560802000 1572757140000
"Art" 1896639590 1568569063000 1.5711609e+12
"Art" 1048291575 1568583712000 1.5763545e+12
"Art"  255262442 1.5685866e+12 1576358580000
"Art" 2063543996 1568588410000 1.5754248e+12
"Art"  134305743 1568597970000 1576289220000
"Art"  604752938 1568604717000 1574571540000
"Art"  475787700 1568604896000 1.5762564e+12
"Art"  578708004 1568605178000 1576299540000
"Art"     637867 1568633765000 1576348080000
"Art"  357969158 1568637301000 1576396260000
"Art"  295233939 1568670496000 1576363020000
"Art"  745788330 1568679605000   1.56987e+12
"Art" 1486569394 1568741077000 1573019940000
"Art"  440417064 1568862128000 1572929940000
"Art" 1159543817 1568914135000 1575659340000
"Art" 1325701851 1569605291000 1.5773364e+12
"Art" 1921156210 1.5696193e+12 1.5773925e+12
"Art" 1117882630 1569639019000 1576617420000
end
format %tc launched_at
format %tc deadline
Thank you all very much for your help.
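One approach that does not depend on sort order is to pair every project with every project in the same category (-joinby-) and flag the pairs where the other project was launched after this project's deadline but no more than 24 hours later. This is a sketch under two assumptions: "launched 24h after the deadline" is read as the window (deadline, deadline + 24h], and the toy timestamps are made up. The pairwise join grows with the square of the category size, so it may be memory-hungry for large categories. Separately, the posted timestamps look like Unix-epoch milliseconds (counted from 1970), whereas Stata's %tc counts from 1960, so `format %tc` will display dates ten years early; adding `clock("01jan1970 00:00:00", "DMYhms")` would shift them. Differences, and hence the 24-hour comparison, are unaffected.

```stata
* Toy data (made up); times in milliseconds, as in the posted sample
clear
input str11 category long project_id double(launched_at deadline)
"Art"   1         0 100000000
"Art"   2 100000001 300000000
"Art"   3 150000000 400000000
"Art"   4 200000000 500000000
"Music" 5 100000002 200000000
end

* Copy of the launch times, to be paired with every project in the same category
preserve
keep category launched_at
rename launched_at other_launch
tempfile launches
save `launches'
restore

* All pairs within category (each project is also paired with itself)
joinby category using `launches'

* Assumed window: launched after this deadline, at most 24 hours later
generate byte hit = other_launch > deadline & other_launch <= deadline + msofhours(24)

* Back to one row per project
collapse (sum) campaigns_launched_24 = hit, by(category project_id launched_at deadline)
list
```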

Need Help

Hey everybody, I found this website while looking for Stata help online. I am currently enrolled in an analytics class and we have a Stata project due; until two weeks ago, I had never heard of Stata. Other than showing us how to upload files, the class gave us no instruction on how to use it. My project is to find the data that lie between the 5th and 95th percentiles and then draw a histogram of them. Any help you all can provide is GREATLY appreciated.
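A minimal sketch using the auto dataset that ships with Stata, with `price` standing in for whatever variable the project uses (an assumption). `summarize, detail` stores the 5th and 95th percentiles in `r(p5)` and `r(p95)`, and `inrange()` flags the observations between them, endpoints included.

```stata
* Example data shipped with Stata; -price- stands in for the project variable
sysuse auto, clear

* -summarize, detail- stores the 5th and 95th percentiles in r(p5) and r(p95)
summarize price, detail
generate byte in_5_95 = inrange(price, r(p5), r(p95))

* Histogram of the observations between those percentiles (inclusive)
histogram price if in_5_95, frequency
```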

import excel

I want to import data from an Excel sheet, and I want all the variables to be stored as doubles. I wonder whether this is possible with the command import excel. I know it is possible with the command import delimited; however, the Excel file I am working with has multiple sheets, and import delimited doesn't have a sheet() option.

Looping through column names as opposed to content of columns

Hi! My goal is to find out how many questions received non-null responses before a particular question of interest. This varies by submission in my survey, since different respondents give null responses to (or skip) different questions. To do this, I have the name of the variable stored in a local, as well as the unique ID of each submission stored in a local. I want to go into another dataset, narrow it down to the submission with that unique ID, further narrow it down by removing the columns with null responses, and find out where this particular variable lies in the order of all variables with non-null responses. This means I need to loop through the column names themselves rather than the contents of the columns. How do I do that? Thank you very much for your help!
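A sketch, assuming one row per submission and that "null" means missing: `foreach v of varlist _all` iterates over variable names in dataset order, and `!missing()` treats both numeric `.` and empty strings as null. Instead of physically dropping columns, it counts the answered questions that come before the target. The toy data and the locals `target`, `this_id`, and `n_before` are hypothetical.

```stata
* Toy survey data (made up): one row per submission, questions q1-q5
clear
input id q1 q2 q3 q4 q5
1 5 . 3 . 2
2 . 4 . 1 3
end

* Hypothetical inputs: the question of interest and the submission ID
local target q4
local this_id 1

* Walk the variable names in dataset order until the target is reached
local n_before = 0
foreach v of varlist _all {
    if "`v'" == "id" continue
    if "`v'" == "`target'" continue, break
    quietly count if id == `this_id' & !missing(`v')
    if r(N) > 0 local ++n_before
}
display "`target' comes after `n_before' answered question(s) for id `this_id'"
scalar n_before = `n_before'
```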

Formatting cells putdocx table

According to the "putdocx table" documentation, it should be possible to format cells in the table, as in line 10 of the script below. For example, for means I get 7.861702, which I want to reduce to two decimal places. The loop works well, but when I try to format cells by adding an nformat() option (line 10), I get the error message "option not allowed".
Does anybody know how to reduce the number of decimals in this example?

I am using Stata SE 15.0


.
. forvalues i = 3/150{
2. putdocx paragraph, halign(left)
3. putdocx text ("Kommune `: label (K) `i''"), bold
4. foreach var in Fornøydhet Meningsfylt Ensom{
5.
. putdocx paragraph, halign(center)
6. putdocx text ("`var'")
7. preserve
8. statsby n=r(N) mean=r(mean) se=r(se) lb=r(lb) ub=r(ub), by(aldergr3) clear: ci mean `var' if K==`i'
9. putdocx table `var' = data("n mean se lb ub aldergr3"), varnames border(start, nil) border(insideV, nil) border(end, nil)
10. putdocx table `var'(2,3), nformat(%5.2f)
11.
.
. restore
12. }
13. }
(running ci on estimation sample)

command: ci mean Fornøydhet if K==3
n: r(N)
mean: r(mean)
se: r(se)
lb: r(lb)
ub: r(ub)
by: aldergr3

Statsby groups
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
...
option nformat(%5.2f) not allowed

Name groups of consecutive number series

Dear all,

We are trying to create a new variable that gives every number within a consecutive series of numbers a "name", which is the highest number within that series. Each consecutive series starts with a 1, and we need to attach the highest number of the series to every number within it. To make this clearer, see our example below:

Var 1 - NewVar
1 4
2 4
3 4
4 4
1 8
2 8
3 8
4 8
5 8
6 8
7 8
8 8
1 4
2 4
3 4
4 4

This might be a very basic question but we somehow can't figure out how to do it. We would be very thankful if anyone could help!

Kind regards,

Jonathan
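A sketch using the example from the post: since every series starts with a 1, a running sum of "is this a 1?" numbers the series, and `egen max()` within each series gives the highest value. The helper variables `obs` and `series` are made up; `obs` only restores the original order at the end.

```stata
* The example from the post
clear
input var1
1
2
3
4
1
2
3
4
5
6
7
8
1
2
3
4
end

generate long obs = _n
* Each 1 starts a new series, so a running count of 1s numbers the series
generate long series = sum(var1 == 1)
* Highest value within each series
bysort series (obs): egen newvar = max(var1)
sort obs
list var1 newvar, sep(0)
```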


Count max of events in any 3 year period, in set of 20 years of data?!?

Hi there,

I need to find the maximum number of deals a company completed within any three-year period, in a dataset containing data from 1990-2011.

For each company, I have the announcement date of each deal (data_announced).

Each company has multiple deals.

I was thinking about a loop, but not sure about that.

Could anyone help me out?

Thanks a lot
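A sketch without an explicit loop, assuming "any three-year period" means three consecutive calendar years. Pair each deal with every deal of the same company (-joinby-), count the deals falling in the three years starting with that deal's year, and take the maximum per company. Checking only windows that start in a deal's year is enough: shifting any window's start to the first deal inside it never loses a deal. The toy data are made up; `date_announced` stands in for the posted date variable.

```stata
* Toy data (made up): announcement dates of each company's deals
clear
input int company str9 announced
1 "15jan1990"
1 "01mar1991"
1 "20dec1992"
1 "05jan1993"
2 "10jun2000"
2 "10jun2005"
end
generate date_announced = date(announced, "DMY")
format date_announced %td
generate long deal_id = _n

* Copy of all deal dates, to be paired with every deal of the same company
preserve
keep company date_announced
rename date_announced other_date
tempfile deals
save `deals'
restore
joinby company using `deals'

* Assumed definition: three calendar years starting in this deal's year
generate byte in_window = inrange(year(other_date), year(date_announced), year(date_announced) + 2)

* Deals per window, then the maximum per company
collapse (sum) n_deals = in_window, by(company deal_id)
collapse (max) max_deals_3yr = n_deals, by(company)
list
```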




difference in difference in stata

I'm doing a difference-in-differences analysis, and as a first step I'm making a graph. I entered the following command:
twoway (line students year if university==1) (line students year if university==2, xline(2011)), xlabel(2003(2)2014) legend(lab(1 "Erasmus") lab(2 "Tilburg") )

However, I get the error:
xlabel(2003(2)2014) is not a twoway plot type

Then I tried:
twoway (line students year if university==1) (line students year if university==2, xline(2011)) legend(lab(1 "Erasmus") lab(2 "Tilburg") )

but then I get the error:
option ) not allowed

how do I get past this error to get my plot?
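The post does not show enough to pin down the cause of either error. As a point of comparison, here is a sketch that runs on simulated data (made up), with plot-specific options inside each set of parentheses and all overall options (`xline()`, `xlabel()`, `legend()`) after a single comma following the last plot. The `///` continuation works only in a do-file; typed interactively, the command must be on one line.

```stata
* Simulated data (made up): yearly student counts for two universities
clear
set obs 24
generate university = cond(_n <= 12, 1, 2)
bysort university: generate year = 2002 + _n
generate students = 1000 + 50*(year - 2003) + 200*(university == 2)*(year >= 2011)

* Plot-specific options stay inside each (...); overall options follow one comma
twoway (line students year if university == 1, sort)    ///
       (line students year if university == 2, sort),   ///
       xline(2011) xlabel(2003(2)2014)                  ///
       legend(label(1 "Erasmus") label(2 "Tilburg"))
```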

Combining two parents education variables to one variable for both

Dear Stataforumists,

I am currently having some trouble combining two variables into one. My dataset has high non-response on the father's education variable, and using both the mother's and the father's education in analyses reduces the number of observations. The aim is to increase N by combining the two variables.

My dataset has N=208. The mom variable has n=161 recorded obs, while dad n=114.
The two variables (momeducation, dadeducation) are distributed as follows:

                       Mom (morutd)   Dad (farutd)
Short education (1)         17             26
Upper secondary (2)         93             56
Long education (3)          51             32
Missing                     47             64
To combine them I have tried three combinations in Stata, but none give me the desired result.

Code:
g edu=.
replace edu=farutd_spes if morutd_spes==.
replace edu=morutd_spes if farutd_spes==.

replace edu=morutd_spes if morutd_spes>farutd_spes & morutd_spes<.
replace edu=farutd_spes if farutd_spes>morutd_spes & farutd_spes <.
Here the end result is 64% missing: category 1 has n=2, category 2 n=39, and category 3 n=33, which makes no sense.

The other code I tried is

Code:
g edu=.


replace edu=farsutd if morsutd==.

replace edu=morsutd if farsutd==.

replace edu=morsutd if morsutd>farsutd & farsutd <. & morsutd<.

replace edu=morsutd if morsutd>farsutd & farsutd <. & morsutd<.
This has the same effect, but with even more missing values.


Any ideas or thoughts on how to increase N by combining the two variables?

Thanks in advance.
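A minimal sketch, assuming the goal is the higher of the two parents' education levels. `egen rowmax()` ignores missing values, so the result is missing only when both parents are missing. It also covers ties: when mother and father have the same code, neither `>` condition in the posted code is true, so those observations stay missing. (In the second attempt, the last line also repeats the one before it, so the father-higher case is never filled in.) The toy data are made up.

```stata
* Toy data (made up); codes 1 = short, 2 = upper secondary, 3 = long
clear
input morutd farutd
1 .
2 3
3 3
. 2
. .
end

* Highest education of the two parents; missing only if both are missing
egen edu = rowmax(morutd farutd)
list
```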

SUR vs GMM, 3SLS vs GMM

Hello,

I'm trying to understand how GMM works and noticed that GMM produces the same estimates as OLS when there's no instrument as it should. For instance,

clear

set obs 10000


generate y1 = runiform()
generate y2 = runiform()
generate y3 = runiform()

generate x1 = runiform()
generate x2 = runiform()
generate x3 = runiform()

generate z1 = runiform()
generate z2 = runiform()
generate z3 = runiform()


reg y1 x1 z1
gmm (y1-{b1}*x1 - {b2}*z1 - {a}), inst (x1 z1)

The two models above produce the exact same results.

However when I estimate a system of equations using SUR and GMM, the two models produce similar, but different results.

* different
sureg (y1 x1) (y2 x2)
gmm (y1-{beta1}*x1- {a1}) (y2-{beta2}*x2 - {a2}), inst (x1 x2) winitial(unadjusted, independent) quickderivatives

The results are also different when I use instruments and run 3sls and GMM,
reg3 (y1 x1) (y2 x2), exog(z1 z2) endog(x1 x2) 3sls
gmm (y1-{beta1}*x1- {a1}) (y2-{beta2}*x2 - {a2}), inst (z1 z2) winitial(unadjusted, independent) quickderivatives

Although from this post (https://www.statalist.org/forums/for...-fit-the-model), it seems I should get the exact same estimates.

Can anyone see what's wrong? I'd appreciate any advice!

Best,
Ara

Expand spell data into a panel with specific reoccuring observation dates

Dear Statalisters,

I have an administrative dataset that is updated on the 1st and 15th day of every month.
It looks somewhat like this:
Code:
clear
input long id  str10( var1 begin end spell )
1 x 01.01.2006 14.01.2006 1
1 y 15.01.2006 28.02.2006 2
1 z 01.03.2006 31.12.2006 3
2 a 01.01.2006 29.02.2008 1
end
gen SIFbegin=date(begin, "DMY")
gen SIFend=date(end, "DMY")
format SIFbegin %td
format SIFend %td
However, unlike a panel, if nothing changes (e.g. var1 stays the same), the end date of the spell is simply updated to the day before the next update and no new entry is made (i.e. there is no separate observation for the preceding period). If something has changed within the ~15-day period since the previous update, the end date is set to the day before the update, and the change in var1 gets a new entry (a new spell) with its begin date set to the date of the update and its end date set to the day before the next update. I now want to bring this into panel form, including the update cycles where "nothing happened"; ideally it would look like this:

Code:
clear
input long id  str10( var1 begin end  spell)
1 x 01.01.2006 14.01.2006 1
1 y 15.01.2006 31.01.2006 2
1 y 01.02.2006 14.02.2006 2
1 y 15.02.2006 28.02.2006 2
1 z 01.03.2006 14.03.2006 3
1 z 15.03.2006 31.03.2006 3
1 z 01.04.2006 14.04.2006 3
1 z 15.04.2006 30.04.2006 3
. . . . .
. . . . .
. . . . .
1 z 15.12.2006 31.12.2006 4
2 a 01.01.2006 14.01.2006 1
. . . . .
. . . . .
. . . . .
2 a 15.02.2008 29.02.2008 1
end
gen SIFbegin=date(begin, "DMY")
gen SIFend=date(end, "DMY")
format SIFbegin %td
format SIFend %td
NOTE: the dots are not missing values; they just mark where I skipped over the intervening cycles.

My first (admittedly somewhat naive) attempt at this transformation relied on Stata's weekly date format, expanding by the difference (in weeks) between the begin and end dates:
Code:
clear
input long id  str10( var1 begin end spell )
1 x 01.01.2006 14.01.2006 1
1 y 15.01.2006 28.02.2006 2
1 z 01.03.2006 31.12.2006 3
2 a 01.01.2006 29.02.2008 1
end
gen SIFbegin=date(begin, "DMY")
gen SIFend=date(end, "DMY")
format SIFbegin %td
format SIFend %td

gen WEEK_begin=wofd(SIFbegin)
gen WEEK_end=wofd(SIFend)
format WEEK_begin %tw
format WEEK_end %tw
gen duration=WEEK_end-WEEK_begin
replace duration=duration/2
expand duration
But at this point I realized that this would give me panel dates that do not coincide with my update cycles, as calendar weeks do not line up with them, and that the update cycles vary in length between 14 and 17 days (depending on the month and on whether there is a leap day):

length 14: 1st-14th of any month, and 15th-28th of a regular February
length 15: 15th-29th of February in a leap year
length 16: 15th-30th of a month
length 17: 15th-31st of a month

And now I am at a loss for a straightforward, viable solution. The problem is that my panel dates have to fit the update cycle exactly, because I want to merge this data with another such dataset (which has the same problem), and the id only uniquely identifies observations within one update cycle (i.e. between one begin and end date).
I have a feeling that my problem is trivial and the solution is just around the corner, but I am not seeing it.

I would greatly appreciate any help and feedback!

Cheers,

Franz
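A sketch, assuming every spell begins on a 1st or 15th and ends on a 14th or a month-end, as in the example. Numbering the half-month cycles consecutively (two per month, so index = 2 × monthly date + 1 if the day is the 15th or later) sidesteps the varying 14-to-17-day lengths: the number of cycles in a spell is then a simple difference, and each cycle's dates can be rebuilt from its index with `dofm()`. The names `hm_begin`, `hm_end`, `hm`, `m_index`, `cycle_begin`, and `cycle_end` are made up.

```stata
* Example spells from the post
clear
input long id str1 var1 str10(begin end) byte spell
1 x 01.01.2006 14.01.2006 1
1 y 15.01.2006 28.02.2006 2
1 z 01.03.2006 31.12.2006 3
2 a 01.01.2006 29.02.2008 1
end
generate SIFbegin = date(begin, "DMY")
generate SIFend = date(end, "DMY")
format SIFbegin SIFend %td

* Half-month cycle index: 2 per month, the second starting on the 15th
generate long hm_begin = 2*mofd(SIFbegin) + (day(SIFbegin) >= 15)
generate long hm_end   = 2*mofd(SIFend)   + (day(SIFend)   >= 15)

* One row per update cycle covered by the spell
expand hm_end - hm_begin + 1
bysort id spell: generate long hm = hm_begin + _n - 1

* Cycle dates: 1st-14th, or 15th to the last day of the month
generate long m_index = floor(hm/2)
generate cycle_begin = dofm(m_index) + 14*mod(hm, 2)
generate cycle_end = cond(mod(hm, 2) == 0, dofm(m_index) + 13, dofm(m_index + 1) - 1)
format cycle_begin cycle_end %td
sort id spell hm
list id var1 spell cycle_begin cycle_end in 1/6, sep(0)
```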

Remove space from file name

Hello all,

I am working with a number of Excel files and wish to re-save them without any spaces in the file names. I have tried the following code, but I get a "type mismatch" error. I am not sure how to fix this. Please advise.

Below is a picture of the folder and filenames that Stata is reading


Code:
set more off
clear
local myfiles: dir "C:\Users\Eprifellow\Dropbox (EPRI)\EPRIProject2019UNICEFMontenegro\Deliverables\Part4\SWIS" files "*.xlsx"

foreach file of local myfiles {
    local subfile = substr("`file'"," ", "")
    !rename "`file'" "`subfile'"
}

Below is the error the code runs into:

Code:
. local myfiles: dir "C:\Users\Eprifellow\Dropbox (EPRI)\EPRIProject2019UNICE
> FMontenegro\Deliverables\Part4\SWIS" files "*.xlsx"

. 
. foreach file of local myfiles {
  2.         local subfile = substr("`file'"," ", "")
  3.         !rename "`file'" "`subfile'"
  4. }
- foreach file of local myfiles {
- local subfile = substr("`file'"," ", "")
= local subfile = substr("1  Number of all users by CSW-DEC-18.xlsx"," ", "")
type mismatch
  !rename "`file'" "`subfile'"
  }
r(109);

end of do-file

r(109);
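The type mismatch comes from `substr()`, which extracts a substring by position and expects numbers as its second and third arguments; replacing text is the job of `subinstr()`, where a last argument of `.` means "every occurrence". Note also that `dir` returns bare file names, so the renaming needs the folder path (or a prior `cd`). The sketch below swaps the shell's `!rename` for Stata's own `copy` and `erase`, and runs in a hypothetical demo folder under Stata's temporary directory with a dummy file; for real use, point `folder` at the directory holding the .xlsx files.

```stata
* Hypothetical demo folder under Stata's temporary directory
local folder "`c(tmpdir)'/rename_demo"
capture mkdir "`folder'"

* Create a dummy file whose name contains spaces
tempname fh
file open `fh' using "`folder'/1  Number of users.xlsx", write replace
file write `fh' "demo" _n
file close `fh'

local myfiles : dir "`folder'" files "*.xlsx"
foreach file of local myfiles {
    * subinstr(): replace every " " with "" (the final "." means all occurrences)
    local subfile = subinstr(`"`file'"', " ", "", .)
    if `"`subfile'"' != `"`file'"' {
        * copy to the new name, then delete the original
        copy "`folder'/`file'" "`folder'/`subfile'", replace
        erase "`folder'/`file'"
    }
}
```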

Add line to sts graph (kaplan meier)

I'm having trouble overlaying two graphs.


I have 3208 observations for my survival graph (curve1).
I have 16 coordinates for curve2
Code:
 *bagX
set obs 3224
gen bagX = .
replace bagX = 0 if _n == 3209
replace bagX = 1 if _n == 3210
replace bagX = 2 if _n == 3211
replace bagX = 3 if _n == 3212
replace bagX = 4 if _n == 3213
replace bagX = 5 if _n == 3214
replace bagX = 6 if _n == 3215
replace bagX = 7 if _n == 3216
replace bagX = 8 if _n == 3217
replace bagX = 9 if _n == 3218
replace bagX = 10 if _n == 3219
replace bagX = 11 if _n == 3220
replace bagX = 12 if _n == 3221
replace bagX = 13 if _n == 3222
replace bagX = 14 if _n == 3223
replace bagX = 15 if _n == 3224

*bagY
gen bagY = .
replace bagY = 100 if bagX == 0
replace bagY = 96.5 if bagX == 1
replace bagY = 92.9 if bagX == 2
replace bagY = 89 if bagX == 3
replace bagY = 85 if bagX == 4
replace bagY = 80.8 if bagX == 5
replace bagY = 76.3 if bagX == 6
replace bagY = 71.7 if bagX == 7
replace bagY = 66.9 if bagX == 8
replace bagY = 62 if bagX == 9
replace bagY = 57 if bagX == 10
replace bagY = 52 if bagX == 11
replace bagY = 46.9 if bagX == 12
replace bagY = 41.9 if bagX == 13
replace bagY = 37 if bagX == 14
replace bagY = 32.3 if bagX == 15


sts graph, by(ES2)ylabel(1 "100" .8 "80" .6 "60" .4 "40" .2 "20" .0 "0") ytitle("Mortality") xlabel(0(2.5)15) tmax(15) addplot(line bagY bagX)



The result is an abnormal graph, all over the place.
What am I missing?
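One likely issue, assuming bagY holds percentages: sts graph draws survival on a 0-1 scale (the posted ylabel() only relabels 0-1 as percentages), so added points as high as 100 stretch the y axis. A sketch on simulated survival data (made up), dividing the reference values by 100 and adding the extra rows before -stset- so they have no analysis time and are excluded from the Kaplan-Meier estimate.

```stata
* Simulated survival data (made up), two groups of ES2
clear
set seed 1
set obs 200
generate byte ES2 = runiform() < .5
generate t = -ln(1 - runiform()) * 10 * (1 + ES2)
generate byte died = runiform() < .8

* Reference curve: 16 points, converted from percent to the 0-1 survival scale
local yvals 100 96.5 92.9 89 85 80.8 76.3 71.7 66.9 62 57 52 46.9 41.9 37 32.3
local N = _N
set obs `=`N' + 16'
generate bagX = _n - `N' - 1 if _n > `N'
generate bagY = .
forvalues k = 1/16 {
    quietly replace bagY = `: word `k' of `yvals'' / 100 in `=`N' + `k''
}

* stset after adding the rows: they have missing time, so stset excludes them
stset t, failure(died)
sts graph, by(ES2) tmax(15) xlabel(0(2.5)15)                  ///
    ylabel(0 "0" .2 "20" .4 "40" .6 "60" .8 "80" 1 "100")     ///
    addplot(line bagY bagX, sort)
```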