How to fill in missing values across variables with last observed 0 value?

February 28, 2017, 6:56 am

Dear all

I have several variables (wide format) and would like to fill in missing values going forward based on the last observed zero. Here is what my data looks like. So, for observation 4, I would like to fill in the last two columns with zero. I would appreciate any advice. Many thanks!

Code:

     +--------------------------------------------------------------------------------------+
     | T1_0_25   T2_30_55   T3_60_85   T4_9~110   T5_1~140   T6_1~175   T7_1~205   T9_2~265 |
     |--------------------------------------------------------------------------------------|
  1. |       .          .          .          .          .          .          .          . |
  2. |       .          .          .          .          .          .          .          . |
  3. |       .          .          .          .          .          .          .          . |
  4. |       6       6.75          .       3.25          .          0          .          . |
  5. |   6.625          .      2.625          .          0          .          .          . |
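
One possible approach, sketched on toy data: loop over the variables from left to right and set a value to zero whenever it is missing and the variable immediately before it is zero. Earlier fills are already in place when the loop reaches later variables, so a zero propagates through a run of missing values. The names t1-t8 are hypothetical stand-ins, since the listing above abbreviates the real variable names.

```stata
* Toy data: rows 2 and 3 mimic observations 4 and 5 from the listing
clear
input t1 t2 t3 t4 t5 t6 t7 t8
.     .    .     .    . . . .
6     6.75 .     3.25 . 0 . .
6.625 .    2.625 .    0 . . .
end

* Walk through the variables in order; fill a missing value with 0
* only when the preceding variable is (or has just been set to) 0
local prev
foreach v of varlist t1-t8 {
    if "`prev'" != "" {
        quietly replace `v' = 0 if missing(`v') & `prev' == 0
    }
    local prev `v'
}
list
```

Missing values that follow a non-zero value are left untouched, because only an exact zero in the preceding variable triggers the fill.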

↧

Help: caterpillar plot or ranked box plot using Stata 14

February 28, 2017, 7:16 am

Dear all,

I would like to plot mean serum concentrations and confidence intervals of a metabolite for 10 different cohorts. All data needs to appear on the same graph.

My colleagues have used SAS or R, both of which have ready-made code for "caterpillar plots".
The caterpillar plots give a nice ranking to the data and automatically adjust the display of the cohorts (or grouping variable) from lowest metabolite mean to highest metabolite mean (for example).

I do not wish to change software, so any help with how to accomplish this using Stata would be most appreciated.

I tried a horizontal box plot, which gave me the mean metabolite concentration and confidence intervals for all cohorts on the same graph (cohort is my grouping variable 1). This gave me an output similar to a "caterpillar plot", except that the cohort means were not ranked from lowest to highest metabolite mean.

I then tried to edit the graph hoping I could drag and drop the cohorts to change the order that they appear on the graph.

I must need to "rank" the data somehow, but it is not obvious to me how to do this.

Please advise.
Thank you!
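
A minimal sketch of one way to rank the cohorts, assuming a cohort identifier and a metabolite variable (here the hypothetical names cohort and metab, on simulated data): collapse to one row per cohort, compute t-based 95% limits, sort by the mean, and plot against the rank.

```stata
* Toy data: 10 hypothetical cohorts, 20 subjects each
clear
set seed 12345
set obs 200
gen cohort = ceil(_n/20)
gen metab  = rnormal(5 + mod(7*cohort, 10)/4, 1)

* One row per cohort with mean and t-based 95% confidence limits
collapse (mean) mbar=metab (sd) s=metab (count) n=metab, by(cohort)
gen lb = mbar - invttail(n - 1, 0.025)*s/sqrt(n)
gen ub = mbar + invttail(n - 1, 0.025)*s/sqrt(n)

* Rank cohorts by mean and label each rank with its cohort number
sort mbar
gen rank = _n
forvalues i = 1/`=_N' {
    label define ranklbl `i' "Cohort `=cohort[`i']'", add
}
label values rank ranklbl

* Horizontal caterpillar-style plot: CI spikes plus a marker at the mean
twoway (rcap lb ub rank, horizontal) (scatter rank mbar), ///
    ylabel(1/10, valuelabel angle(0)) ytitle("") legend(off) ///
    xtitle("Mean metabolite concentration (95% CI)")
```

The ordering comes entirely from the sort on the cohort mean, so the same idea works for any other ranking statistic. If box plots are enough, the over() option of graph hbox also accepts a sort() suboption; the help for graph box describes how it orders the boxes.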

↧

Time ranges as time variable when using the Stata xtset command with panel data?

February 28, 2017, 7:29 am

I have panel data with which I would ultimately like to do a fixed effects regression with Stata. My panels are countries and I have a multi-level structure (observations are different military operations within countries). My time variable - if I use one - is not simply years like 2001, 2002 etc. but time ranges, for example 1.5.2003-1.4.2005 etc., with some years included multiple times and others not included at all.

Is it advisable to simply exclude the time factor or is there a way to specify the time range using xtset?
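
For what it is worth, xtset accepts a panel identifier on its own, and a fixed-effects regression does not need a time variable; the time variable matters mainly for time-series operators such as L. and D. A minimal sketch on simulated data, with hypothetical variable names country, x and y:

```stata
* Toy data: 3 hypothetical countries with 4 operations each
clear
set seed 42
set obs 12
gen country = ceil(_n/4)
gen x = rnormal()
gen y = 2*x + country + rnormal()

* Declare only the panel identifier; no time variable is given
xtset country

* Fixed-effects (within) regression with country as the panel
xtreg y x, fe
```

Whether leaving time out is substantively appropriate depends on the research question; this only shows that the syntax allows it.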

↧

Beginner question: assign values to a variable according to a dataset with an index

February 28, 2017, 7:32 am

Hi all,

I have a dataset with 3 variables, and 1 index (I don't have the formula with which someone calculated the index).

In this database I have all possible combinations of variables (83 obs), and their corresponding index value.

var1   var2   var3   index
1      1      1      -0.15
1      2      1      -0.12
1      1      2      -0.11
.
.
.
2      .n     2      1.32
2      1      m      1.06
.
.
.
.m     m      m      m

I have another database with 3000 observations, but it contains only the 3 variables; I don't have the index value.

var1   var2   var3
1      1      1
1      2      1
2      m      1
.m     n      1
.
.
.
I need to create an index variable, and assign the corresponding value according to the combination of the three variables. Can someone give me a hand, please? I have no idea how to do it.

Best regards,
Nicolas
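
This looks like a lookup that merge can handle: treat the 83-row file as a lookup table keyed on the three variables and merge it into the large file many-to-one. A minimal sketch on toy data, assuming the key variables are named var1, var2 and var3 in both files:

```stata
* Lookup table: one row per combination, with its index value
clear
input var1 var2 var3 index
1 1 1 -0.15
1 2 1 -0.12
1 1 2 -0.11
end
tempfile lookup
save `lookup'

* Large file: combinations may repeat and carry no index yet
clear
input var1 var2 var3
1 1 1
1 2 1
1 1 1
2 2 2
end

* Many rows in memory match one row in the lookup table
merge m:1 var1 var2 var3 using `lookup', keep(master match)
list
```

Rows whose combination is absent from the lookup table keep a missing index and have _merge equal to 1, which makes them easy to inspect.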

↧

"Variable not found" upon attempting to test, after running regression model.

February 28, 2017, 7:38 am

Hello, I'm using Stata 13 and working with complex survey data (DHS 2013). I'm having problems testing my predictor variables after running my model:

Please see document attached for the output/response.

Same thing happens even when I attempt to test the predictors individually. (NB: I tested the outcome variables successfully though).

Thank you for your help.
Som

↧

Type mismatch and outreg2

February 28, 2017, 7:40 am

Dear All,

I am trying to obtain the results of a regression using the following commands in Stata 14.0:

quietly xtreg dm d.iad d.rmp nber_crisis, fe
estimates store fixediad0
outreg2 using nolag.xls, e(sigma_u sigma_e rho N rmse theta r2_o r2_b r2_w) se dec(8)
quietly xtreg dm d.iad d.rmp nber_crisis, re
estimates store randomiad0
outreg2 using nolag.xls, e(sigma_u sigma_e rho N rmse theta r2_o r2_b r2_w) se dec(8)
hausman fixediad0 randomiad0

I am able to obtain the results of the first outreg2, but when I run the second outreg2, which is meant to capture the results of the random-effects model, I get the following error message:

type mismatch
r(109);

I double-checked for typing mistakes, but there seem to be none.

I ran two similar regressions:

quietly xtreg dm L.dm d.gdpv L.d.gdpv d.rmp L.d.rmp nber_crisis, fe
estimates store fixedgdpv
outreg2 using onelag.xls, e(sigma_u sigma_e rho N rmse theta r2_o r2_b r2_w) se dec(8)
quietly xtreg dm L.dm d.gdpv L.d.gdpv d.rmp L.d.rmp nber_crisis, re
estimates store randomgdpv
outreg2 using onelag.xls, e(sigma_u sigma_e rho N rmse theta r2_o r2_b r2_w) se dec(8)
hausman fixediad randomgdpv

but these produce no such error.

I suspect, though I am probably wrong, that the problem has to do with the variables.

Any suggestions would be greatly appreciated.

Thank you.

↧

testing for equality of regression coefficients for separate samples - growth curve models and mi estimation

February 28, 2017, 7:52 am

Hi everyone. I've run separate growth curve models on four different ethnic groups in my sample (sample syntax for one of the groups below). I'd like to test the equality of regression coefficients across these models.

1. Is there a way to test for equality in Stata combining both the complex survey AND the multiple imputation?
2. Given that I already have the coefficients for each model, is there a way I could test for equality by hand?

mi estimate, cmdok errorok: meglm viol wvage female i.imm_gene familyses adultses i.w1dfamilism w1adolnsat w1soccoh nresstab neconwel i.pctwhite i.pctblack i.pctforeign || psuscid:, pweight(mlmexwt2) || aid:, pweight(mlmexwt1) family(binomial) link(logit)

Lorena
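
On question 2, a rough by-hand check is a Wald-type z statistic built from the two coefficients and their standard errors. The sketch below uses made-up numbers and assumes the two estimates are independent, which may not hold if the groups share sampling units; it is an approximation, not a substitute for a joint model.

```stata
* Hypothetical coefficients and standard errors from two group-specific models
scalar b1  = 0.50
scalar se1 = 0.10
scalar b2  = 0.20
scalar se2 = 0.15

* z statistic for H0: b1 = b2, assuming the two estimates are independent
scalar z = (b1 - b2)/sqrt(se1^2 + se2^2)
scalar p = 2*normal(-abs(z))
display "z = " %6.3f z "   two-sided p = " %6.4f p
```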

↧

Problems with merging datasets - unique ID repeats for twins/triplets for some datasets but not others

February 28, 2017, 8:07 am

Hello everyone,

I have run into a problem when attempting to merge datasets in Stata 14. I am using panel data with three cohorts, called the “Child of the new millennium”, which contains data on children born in 2000. I am currently attempting to merge the datasets within each cohort. The problem is that the ID does not uniquely identify each response in some datasets, in particular those asking specific questions about the child. This is because the sample includes twins and triplets, so for about 300-400 observations the same ID appears more than once. A dummy variable allows me to identify twins and triplets, but apart from that there is no way to tell them apart.

But in other datasets, the ID appears only once so I am able to successfully merge in these.

Unfortunately, I need to keep the twins and triplets in my sample. I was wondering if anyone could help? I'd really appreciate any advice.

I am new to both Stata and this forum, so please let me know if any more information is needed!

Thanks in advance, Kishan
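
If the files where the ID is unique describe the family and the files where it repeats describe individual children, a one-to-many merge may be all that is needed. A minimal sketch on toy data; the variable names id, childnum, score and income are hypothetical:

```stata
* Child-level file: the two twins in family 102 share one ID
clear
input id childnum score
101 1 12
102 1 15
102 2 9
end
tempfile child
save `child'

* Family-level file: one row per ID
clear
input id income
101 20
102 35
end
isid id                      // stops with an error if id is not unique here

* Each family row is matched to every child row with the same ID
merge 1:m id using `child'
list, sepby(id)
```

Merging two child-level files with each other is a different matter: that needs a within-family child identifier present in both files, and numbering children by row order is only safe if the order is known to be consistent across files.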

↧

Graph - Rarea made on PC invisible on Mac.

February 28, 2017, 8:15 am

I create an area graph on my Windows machine:

Code:

twoway (rarea np5 np95 dist, sort fcolor(gray) fintensity(inten20) lcolor(white)) (line zero dist, sort msymbol(none) clcolor(black) clpat(dash) clwidth(thin)) (line yhat dist, sort msymbol(none) clcolor(black) clpat(solid) clwidth(medium)) (scatter dfact dist if sig5, sort msymbol(+) msize(medium) mcolor(red)) (scatter dfact dist if sig10, sort msymbol(O) msize(medium) mcolor(red)) (scatter dfact dist if (!sig5 & !sig10), sort msymbol(Oh) msize(medium) mcolor(black)), legend(off) graphregion(color(white)) xtitle("Distance to Factory") ytitle("Estimated Effect") xlabel(0(1)20) xsc(r(0 20)) ylab(, nogrid) title("")

graph export myPlot.pdf

On my machine, a PNG screenshot of this PDF looks like:
[screenshot not included]

However, my collaborator on a Mac cannot see the shaded region; it appears white. Have others faced this problem? How would I fix it?

I've tried other settings for the plot, but no luck.

↧

Display narrow studies with narrow confidence interval in Forest plot (meta-analysis).

February 28, 2017, 8:18 am

I have 28 studies, some of which have very wide confidence intervals (1.2 to 98), so the lines for the other studies are barely visible. I tried options such as xlabel(0.01, 0.05, 0.1, 0.5, 1, 5, 10, 100), but that did not help. Please advise.

↧

Crash when using xtabond and/or xtdpdsys

February 28, 2017, 9:33 am

Hey,

I am trying to estimate a dynamic panel model using xtabond (I've also tried xtdpdsys). Unfortunately, Stata (and, a few seconds later, the whole computer) crashes without giving an error message. The task manager shows Stata's memory use climbing past 10 GB within a few seconds. I ran xtset before running the commands (in case they cause problems when this hasn't been done). Any suggestions as to what could be wrong? Thanks in advance!

Best,
Thomas

↧

Measuring distance between control and treatment villages

February 28, 2017, 10:05 am

Hey,
I have a dataset containing the names of the villages, their coordinates (longitude and latitude) and a dummy variable indicating assignment to a program (1 = treatment, 0 = control). I want to calculate the shortest distance from each "control" village to a "treated" one.
I tried to use the geonear command in the following way:

geonear Village Latitude Longitude using "file", n(Village Latitude Longitude) ignoreself

But of course it calculates the shortest distance to ANY village, not just from a control village to a treated one. I wonder if there is an option that can be added to the command above, or if there is another way to reach my goal.

Thank you in advance for your help,

Matteo
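
One way to restrict the search is to split the data: keep only control villages in memory and put only treated villages in the second file. Presumably the same split would work with geonear by passing the treated-only file as the using file. The sketch below instead uses only built-in commands (cross plus a haversine formula) on toy data; the variable names village, lat, lon and treat and the Earth radius of 6371 km are assumptions.

```stata
* Toy data: two control and two treated villages on the equator
clear
input str8 village lat lon treat
"C1" 0 0 0
"C2" 0 4 0
"T1" 0 1 1
"T2" 0 5 1
end

* Save the treated villages under new names
preserve
keep if treat == 1
rename (village lat lon) (t_village t_lat t_lon)
keep t_village t_lat t_lon
tempfile treated
save `treated'
restore

* Pair every control village with every treated village
keep if treat == 0
cross using `treated'

* Great-circle (haversine) distance in km
gen double dist_km = 2*6371*asin(sqrt( ///
    sin((t_lat - lat)*_pi/360)^2 + ///
    cos(lat*_pi/180)*cos(t_lat*_pi/180)*sin((t_lon - lon)*_pi/360)^2))

* Keep the nearest treated village for each control village
bysort village (dist_km): keep if _n == 1
list village t_village dist_km
```

cross forms every control-treated pair, so it is only practical when the number of villages is moderate.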

↧

diagnostic methods for GEE models

February 28, 2017, 10:32 am

Hi Stata team,
I have a continuous outcome that is measured repeatedly over time. I wanted to analyse it with a random-effects model, but the outcome is not normally distributed. So I used the ladder command to check which transformation would be best, and according to the Q-Q plot, the log of the outcome was the closest to normal.
I then used the xtgee command to build the model with a log link. However, I am not sure of my model because I don't know how to run post-estimation diagnostics for GEE in Stata.
I was wondering if there is a way to check the residuals with the GEE model. I am not sure whether my question provides all the elements needed for an answer.

Thank you very much
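
A rough sketch of one informal residual check, assuming a Gaussian family with a log link: obtain the fitted means with predict after xtgee, form raw residuals by hand, and plot them against the fitted values. The data are simulated and the names id, t, x and y are hypothetical.

```stata
* Toy panel: 60 subjects, 5 occasions each, positive outcome
clear
set seed 1
set obs 300
gen id = ceil(_n/5)
bysort id: gen t = _n
gen x = rnormal()
gen y = exp(0.5 + 0.3*x + rnormal(0, 0.3))
xtset id t

* GEE with log link and exchangeable working correlation
xtgee y x, family(gaussian) link(log) corr(exchangeable)

* Fitted means and raw residuals
predict double muhat, mu
gen double res = y - muhat

* Informal check: residuals against fitted values
scatter res muhat, yline(0)
```

This is only a visual check of the mean structure; it says nothing about whether the working correlation is well chosen.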

↧

Probit/Logit with Panel Data. Should I use probit or xtprobit?

February 28, 2017, 11:08 am

Hi everyone,

I am using Stata 14 to work with a panel data set of the United States from 2007 to 2015. I want to estimate a discrete choice model, but I am not sure whether I should use:

probit dep indep ..., vce(cluster stateid)

or:

xtprobit dep indep ..., pa vce(robust)

I am concerned about serial correlation in my data, which is why I am shying away from a logit model with fixed effects; using vce(bootstrap) with it doesn't seem to work:

xtlogit dep indep ..., fe vce(bootstrap)

Essentially, my question is which estimation method to use. probit and xtprobit give very different results. Any suggestions would be greatly appreciated!

↧

Drop Records Missing Majority of Observations of Variables

February 28, 2017, 11:38 am

I have a data set in which several records are missing observations for the majority of the variables (in Excel, several rows would be blank for the majority of the columns). Is there a command to remove these particular records? I don't want to haphazardly drop records that might be missing the occasional observation, just those that are mostly missing (195 missing out of 210 variables).
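
One way to do this is to count the missing values in each record with egen's rowmiss() function and drop on a threshold. A minimal sketch on toy data with four hypothetical variables a-d and a threshold of 3; for the data described above the threshold would be 195, and the variable list would span the 210 variables.

```stata
* Toy data: the second record is mostly missing
clear
input a b c d
1 2 3 4
. . . 4
1 . 3 4
end

* Number of missing values per record across the listed variables
egen nmiss = rowmiss(a-d)

* Drop records at or above the chosen threshold
drop if nmiss >= 3
list
```

Records with only the occasional missing value stay in the data because their count falls below the threshold.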

↧

Problem with rxridge and rxrmaxl

February 28, 2017, 1:49 pm

I am trying to do ridge regression as penalised logistic regression in Stata to deal with multicollinearity. I have established that the most likely Q-shape is -0.5. My dependent variable is cascontrbl and the independent variables are V20 V30 V40 V50 V60 V65 V70. When I type:

rxridge cascontrbl V20 V30 V40 V50 V60 V65 V70, qshape(-0.5)

I get the following:

RXridge: Shrinkage Path has Qshape =-0.50
RXridge: Adjusted response sum-of-squares = 537
RXridge: OLS Residual Variance = .94723762
RXridge: Variance of Principal Correlations = .00176394
MCAL = 0.000 ... True OLS Summed SMSE = .16116564
.25 invalid name
r(198);

I don't understand where the invalid name comes from. Any help would be much appreciated.

I get a similar problem when I type:

rxrmaxl cascontrbl V20 V30 V40 V50 V60 V65 V70, qshape(-0.5)

I get the following:

RXrmaxl: Shrinkage Path has Qshape =-0.50

RXrmaxl: Estimated Sigma = .97326133

RXrmaxl: Uncorrelated Components... Number of obs = 538
------------------------------------------------------------------------------
cascontrbl | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
c1 | .1000702 .0193059 5.18 0.000 .0621447 .1379957
c2 | .0171066 .0360435 0.47 0.635 -.053699 .0879122
c3 | -.1284715 .0589139 -2.18 0.030 -.2442049 -.0127381
c4 | -.0515993 .0942976 -0.55 0.584 -.2368422 .1336435
c5 | .0149306 .1247974 0.12 0.905 -.2302277 .2600888
c6 | .1938187 .1567292 1.24 0.217 -.1140679 .5017054
c7 | .5631838 .3138296 1.79 0.073 -.0533187 1.179686
------------------------------------------------------------------------------

RXrmaxl: 3 Normal, Maximum-Likelihood Shrinkage Criteria...
(Classical, Empirical Bayes, and Random Coefficients)
MCAL = 0.250
.25 invalid name
r(198);

I can't seem to find a reason for this invalid name appearing with both rxridge and rxrmaxl.

Thank you for any help here.

Kind regards
Anna

↧

Panel data: How to replace values with the first value?

February 28, 2017, 1:55 pm

Dear all

My data are in panel format. I wish to create a new variable TT0_2 in which all values of TT0 are replaced with the first value. So TT0_2 will simply be 6 from row 63 to 76. I tried the following but received an error message saying invalid syntax:

Code:

bys ID : replace tt0_2= min(TT0)

I'd be grateful for any help. Many thanks.

Code:

    ID    TT0    
            
58.    2008    .    
59.    2008    .    
60.    2008    .    
61.    2008    .    
62.    2008    .    
63.    2008    6    
64.    2008    7    
65.    2008    9    
66.    2008    10    
67.    2008    11    
68.    2008    12    
69.    2008    13    
70.    2008    14    
71.    2008    15    
72.    2008    16    
73.    2008    17    
74.    2008    18    
75.    2008    19    
76.    2008    20
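
A sketch of one way to get the first non-missing value within each ID, on toy data shaped like the listing above. It sorts non-missing rows to the top of each ID while preserving their original order, takes the first one, and then restores the original order. The helper variables obsno and miss are hypothetical names introduced here.

```stata
* Toy data in the same shape as the listing
clear
input ID TT0
2008 .
2008 .
2008 6
2008 7
2008 9
2009 .
2009 3
2009 4
end

* Remember the original order and flag missing rows
gen long obsno = _n
gen byte miss  = missing(TT0)

* Within ID, non-missing rows come first in original order;
* copy the first of them to every non-missing row
bysort ID (miss obsno): gen TT0_2 = TT0[1] if !missing(TT0)

* Restore the original order and clean up
sort obsno
drop miss obsno
list, sepby(ID)
```

If the first value is also the smallest, as in the listing, a shorter alternative is to create the variable with egen and its min() function under bysort ID. The replace command in the post cannot work as written in any case, because it needs a variable that already exists.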

↧

stata ml estimation

March 1, 2017, 5:55 am

Hello everyone,

I am trying to create a clogit equivalent. Below is my program.

Note that:
1. x1, x2, x3, id and choice are hard-coded variables. They have the same names as the ones I loaded into the dataset.
2. I am trying to estimate beta1 to beta3, which have nonlinear effects.

program myconditional_logit
args todo beta1 beta2 beta3 beta4 lnL
version 11
tempvar den p last xb

gen double `xb' = `beta1' * x1^`beta1' + `beta2' * x2^`beta2' +`beta3' * x3 ^`beta3'
local y choice
local by1 id
sort `by1'
quietly{
by `by1': egen double `den' = sum(exp(`xb'))
gen double `p' = exp(`xb')/`den'
mlsum `lnL' = `y' *log(`p') if `y'==1
if (`todo'==0 | `lnL' > =.) exit
}
end

My function call is
ml model d0 myconditional_logit () () () ()

However, when I try to run the program, it issues the following error:

myconditional_logit 0 __000009 __00000A __00000B __00000C
- `begin'
= capture noisily version 13: myconditional_logit 0 __000009 __00000A __0
> 0000B __00000C
----------------------------------------- begin myconditional_logit ---
- args todo beta1 beta2 beta3 beta4 lnL
- version 11
- tempvar den p last xb
- gen double `xb' = `beta1' * x1 + `beta2' * x2 +`beta3' * x3 +`beta4'
> * x4
= gen double __00000G = __000009 * x1 + __00000A * x2 +__00000B * x3 +_
> _00000C * x4
matrix operators that return matrices not allowed in this context
------------------------------------------- end myconditional_logit ---
- `end'

I think this is because Stata treats x1 as a full vector. Is there any way I can make x1 observation-specific?

Thanks everyone!
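
The trace suggests a different cause. With method d0, ml calls the evaluator with todo, the name of the coefficient vector, and the name of the log-likelihood scalar, so `beta1' receives the whole coefficient vector (a matrix) and not a single parameter, which is consistent with the "matrix operators" message. Below is a minimal sketch of a d0 evaluator for a conditional logit with a linear index, run on simulated data. The program name myclogit_d0 and the data are hypothetical, and the nonlinear x^beta specification from the post is not reproduced here.

```stata
capture program drop myclogit_d0
program myclogit_d0
    version 11
    args todo b lnf
    tempvar xb den
    * Linear index for each observation from the coefficient vector
    mleval `xb' = `b'
    * Denominator: sum of exp(xb) within each choice set (data sorted by id)
    quietly by id: egen double `den' = total(exp(`xb')) if $ML_samp
    * Log likelihood: contribution of the chosen alternative in each set
    mlsum `lnf' = `xb' - ln(`den') if $ML_y1 == 1
end

* Simulated data: 300 choice sets of 3 alternatives, true betas 1 and -0.5
clear
set seed 2017
set obs 900
gen id = ceil(_n/3)
gen x1 = rnormal()
gen x2 = rnormal()
gen u  = x1 - 0.5*x2 - ln(-ln(runiform()))
bysort id (u): gen choice = (_n == _N)
sort id

ml model d0 myclogit_d0 (choice = x1 x2, noconstant)
ml maximize
```

For parameters that enter nonlinearly, one common pattern is to give each parameter its own constant-only equation and extract it inside the evaluator with mleval and its eq() and scalar options; that variant is not shown here.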

↧

xmlsave consumers

March 1, 2017, 6:21 am

Are there any tools/packages/applications that readily consume XML files produced with Stata's xmlsave? (other than Excel for doctype(excel))

Thank you, Sergiy

↧

Graph combine

March 1, 2017, 7:23 am

Hi everyone,

I have two separate graphs which I want to plot together in one graph.
Graph 1 is

Code:

graph twoway line valuecost demand, name(g1,replace) sort || line valuecost supply, ytitle( "Price" ) xtitle( "Quantity" ) yline(60, lpattern(dash)) legend(label(1 "Demand") label(2 "Supply"))

Graph 2 is

Code:

graph twoway scatter price tradenumber,name(g2,replace) by(period,  row(1) compact) xlabel(minmax) yline(60, lpattern(dash)) ytitle( "" ) xtitle( "Trade number" ) yscale(off)

My first question: although I specified in my command that I want the y axis to be turned off, I still get a graph with a y axis. Is there something I am missing?

Next I try to combine them as follows:

Code:

graph combine g1 g2,name(g3, replace) imargin(0 0 0 0)

My goal is to combine them without any space between them. That is the reason I want to suppress the y axis in graph 2 above. Any ideas on how I could achieve this?

Thank you

P.S. I am using Stata 11.1.
My dataset looks like this:

Code:

	* Example generated by -dataex-. To install: ssc install dataex
clear
input byte(tradenumber price period valuecost demand supply equilibriumprice equilibriumquantity)
 1 100 1 100  0 30 60 20
 2  25 1 100  8 30 60 20
 3 100 1  90  8 30 60 20
 4  99 1  90  8 30 60 20
 5 100 1  80  8 30 60 20
 6 100 1  80 16 30 60 20
 7  78 1  70 16 30 60 20
 8  80 1  70 16 20 60 20
 9  85 1  60 16 20 60 20
10 100 1  60 24 20 60 20
11  80 1  50 24 20 60 20
12 100 1  50 24 20 60 20
13  60 1  40 24 20 60 20
14  47 1  40 24 20 60 20
15  50 1  30 24 20 60 20
16  55 1  30 24 10 60 20
17  35 1  20 24 10 60 20
18  75 1  20 24 10 60 20
19  40 1  20 24 10 60 20
20  60 1  10 24 10 60 20
end

↧