Channel: Statalist
Viewing all 73354 articles
Browse latest View live

-zipuse- unable to recognize .dta files inside zip folders

Dear All,

I am trying to use -zipuse- to read and append a number of files stored inside zip archives in a specific directory. Each archive contains one .dta file, and the names of the .dta files inside the archives are random strings of digits and characters (e.g. aa48229snfjka.dta). Before giving further details, I'd like to specify that I am using Stata 13 on Windows 10, and that both Stata and the operating system are up to date.

The code I have been trying to use is the following:

Code:
local sourcedir "C:/DocumentiRV/ERIM/research/writing/ownership characteristics and tax inversions/2020/data/shareholder information"

local fls : dir "`sourcedir'" files "*people.zip", respectcase
local flsC : dir "`sourcedir'" files "*companies.zip", respectcase

tempfile people companies

save `people', emptyok
save `companies', emptyok

    foreach f of local fls {
            zipuse "`sourcedir'/`f'", clear
            append using `people'
            save `people', replace        
    }


    foreach fC of local flsC {
            zipuse "`sourcedir'/`fC'", clear
            drop annualreportdate rowtype
            duplicates drop
            append using `companies'
            save `companies', replace        
    }

merge m:1 directorid using `people', nogen

zipsave "`sourcedir'/BoardEx All"
When I run this code, as soon as -zipuse- tries to use the first zip file, Stata gives the following error--this is what the trace shows. (I am just reporting the part of the log where the error is generated.)

Code:
----------------------------------------------------------------------- end zipuse._ok2use ---
  - tempfile tmpdat
  - if "`dtafile'" == "" {
  = if "" == "" {
  - shell unzip -p "`zipfile'" > `tmpdat'
  = shell unzip -p "C:/DocumentiRV/ERIM/research/writing/ownership characteristics and tax inversi
> ons/2020/data/shareholder information/BoardEx Europe people.zip" > C:\Users\RICCAR~1\AppData\Loc
> al\Temp\ST_02000003.tmp
  - }
  - else {
    shell unzip -p "`zipfile'" "`dtafile'" > `tmpdat'
    }
  - use `initlist' `usind' `tmpdat', clear `options'
  = use   C:\Users\RICCAR~1\AppData\Local\Temp\ST_02000003.tmp, clear 
file C:\Users\RICCAR~1\AppData\Local\Temp\ST_02000003.tmp not Stata format
    if "`dtafile'" == "" {
    global S_FN = "`zipfile'"
    }
    else {
    global S_FN = "`dtafile'"
    }
    }
  --------------------------------------------------------------------------------- end zipuse ---
In other words, Stata claims that the file inside the zip archive is not in Stata format, but I can guarantee that it is, since I can open it manually. The developer of -zipsave- suggests that, to verify the package works at all, one should try to zip a random Stata file using -zipsave-. I tried that and, unfortunately, it also fails. Specifically, -zipsave- creates a zip file, but when I open it manually in Windows File Explorer, it contains some nested folders and a file that cannot be opened with any app. Another check of whether -zipsave-/-zipuse- work is to enter:

Code:
shell which zip
This should return a meaningful path. Needless to say, it doesn't: no meaningful path is returned. What the developer suggests, if -zipsave- doesn't work, is to install Info-ZIP (http://infozip.sourceforge.net/). Yet Info-ZIP does not seem to be supported on Windows 10; I tried to install it anyway, but that did not change the situation.

I hope someone can offer advice or a workaround to avoid using zipuse altogether.
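For what it's worth, a possible workaround that avoids -zipuse- altogether is Stata's built-in -unzipfile- command, which extracts an archive into the current working directory so the .dta file can then be read with -use-. A minimal sketch for the people files, assuming the working directory is writable and contains no other .dta files (the inner file name is recovered from a directory listing, since it is not known in advance):

Code:
local sourcedir "C:/DocumentiRV/ERIM/research/writing/ownership characteristics and tax inversions/2020/data/shareholder information"
local fls : dir "`sourcedir'" files "*people.zip", respectcase

tempfile people
save `people', emptyok

foreach f of local fls {
    unzipfile "`sourcedir'/`f'", replace   // extract into the working directory
    local dta : dir . files "*.dta"        // the inner .dta name is random
    foreach d of local dta {
        use `"`d'"', clear
        append using `people'
        save `people', replace
        erase `"`d'"'                      // clean up the extracted file
    }
}
The companies loop can be handled the same way.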

RDD graph help

Hello everyone,

I am currently writing my thesis and running an RDD. It is very interesting, but I am now stuck on a strange problem: when I try to plot my data, the regression line is not fitted to the dots from the data.
I wonder why that is happening. I don't have many data points; might that be the reason? When I run this command without any covariates, the regression line does fit the dots.


Used code:

Code:
rdplot $Y $running, c($cutoff) nbins(10 3) p(2) binselect(esmv) kernel($kernel) covs($cov)

Background: I am investigating the effect of voting system on corruption. In Brazil, municipalities with more than 200 000 inhabitants use the dual ballot system, while cities with less than 200 000 citizens use the single ballot system. Exploiting this threshold leads to a sharp RDD. My theory is that municipalities with over 200 000 inhabitants have better politicians, more competition and hence this system leads to less corruption.

I am sorry if the format of this question does not meet your standards; it is my very first time asking a question.


Ignoring missing value in egen max

Dear all,

I have panel data and would like to create a dummy variable using the following:

Code:
bys bvd_id: egen filter=max(X>16 & year==2018)
However, if the observation of X in 2018 is missing (.), egen also generates 1 instead of 0, because Stata treats a missing value as larger than any number, so X>16 is true when X is missing. How can I treat that missing value as zero? I tried the following:

Code:
bys bvd_id: egen filter=max(missing(X>16 & year==2018))
This almost works, but now if I have a missing value for a year other than 2018, the result is also 0.
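A sketch of one possible fix: exclude missings explicitly inside the comparison, so a missing X in 2018 contributes 0 rather than 1:

Code:
* X>16 is true for missing X (missing sorts above all numbers), so rule it out
bys bvd_id: egen filter = max(X > 16 & !missing(X) & year == 2018)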

Really appreciate your help.


Best,

Abdan

Keep values in between a range (dates and times)

Hello everyone,
I am quite new to Stata, so I would really appreciate your help.
I have date-time variables which I converted from string; they are in this form:
09jan2013 13:43:18 (together in one cell).
I would like to keep observations from the same day but within a specific time range, e.g. all observations on 09jan2013 between 19:00:00 and 19:30:00.
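A sketch using the -clock()- function, assuming the converted variable is a double named datetime (the name is hypothetical); datetime values must be stored as doubles, or precision is lost:

Code:
* keep observations on 09jan2013 between 19:00:00 and 19:30:00 inclusive
keep if datetime >= clock("09jan2013 19:00:00", "DMYhms") & ///
        datetime <= clock("09jan2013 19:30:00", "DMYhms")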
Thank you in advance,
Tim

Consider matched pairs when dropping study subjects

Hi Statalist,

I have a large matched cohort data set where exposed have been matched 1:m with an unexposed group (siblings). One exposed can have multiple matched siblings (e.g. 1-10 siblings).

My matching variable indicator is called set_id.

The set_id variable has 9,859 unique values.
- No. of exposed = 9,859
- No. of unexposed/siblings = 17,220

How do I keep this matching when I drop some study subjects that are not relevant for my specific study (e.g. exclude individuals with observations before a given year)?
- If I drop any exposed individuals, all of their matched siblings should also be dropped.
- Conversely, if I drop any siblings, cases should only be dropped if there is no other matched sibling for this matched pair.

How do I write Stata code that conditions on this?
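One possible sketch, assuming variables set_id and exposed (1 = exposed, 0 = sibling) and a hypothetical exclusion criterion; all variable names other than set_id are assumptions:

Code:
* flag subjects to exclude on substantive grounds (hypothetical criterion)
gen byte drop_flag = (first_obs_year < 2000)

* if any exposed subject in a set is dropped, drop the entire set
bysort set_id: egen byte set_loses_exposed = max(drop_flag & exposed)
drop if set_loses_exposed

* drop flagged siblings, then drop sets left with no siblings at all
drop if drop_flag & !exposed
bysort set_id: egen byte n_sib = total(!exposed)
drop if n_sib == 0
The last two lines implement the second rule: an exposed case survives only if at least one matched sibling remains in its set.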

Thanks a lot in advance.

Missing Variables in Panel Data Across the Waves

Hello!

I have a rather practical question. If you have panel data with T = 3, but some of the variables are only measured in the first two waves, what would be the best strategy to account for these variables, if discarding them is not an option?

I estimate a within-between model; that is, certain variables are treated as fixed-effects specifications, and some other variables are treated as random-effects specifications.

Is it best to add them as they are (FE specifications) for the first two waves? And what would the interpretation be in the three-wave setting?

Thanks!

panel data analysis

Hi,
Should we check for multicollinearity in panel data using the VIF method? If we find multicollinearity, how do we remove it?
is there any easy command or calculation for the process

Is there an easier command or calculation for a long condition like the following?

Code:
keep if citycode_s==1100|citycode_s==1200|citycode_s==1301|citycode_s==1303|citycode_s==1304|citycode_s==1305|citycode_s==1307|citycode_s==1309|citycode_s==1310|citycode_s==1311|citycode_s==1401|citycode_s==1402|citycode_s==1404|citycode_s==1408|citycode_s==1409|citycode_s==1410|citycode_s==2101|citycode_s==2102|citycode_s==2103|citycode_s==2105|citycode_s==2106|citycode_s==2107|citycode_s==2108
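A sketch using the -inlist()- function, which shortens this considerably; for numeric arguments -inlist()- accepts up to 255 values per call, so a long list can be split across several calls joined by |:

Code:
keep if inlist(citycode_s, 1100, 1200, 1301, 1303, 1304, 1305, 1307, 1309) | ///
        inlist(citycode_s, 1310, 1311, 1401, 1402, 1404, 1408, 1409, 1410) | ///
        inlist(citycode_s, 2101, 2102, 2103, 2105, 2106, 2107, 2108)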

many thanks!!!!!

ATE coefficients comparison (SUEST) between multiple psmatches

I am trying to compare two coefficients between two years after psmatch. Here are my commands:

1. teffects psmatch (outcome) (treat variable) if YEAR==2009, vce(robust)
estimate store nine

2. teffects psmatch (outcome) (treat variable) if YEAR==2018, vce(robust)
estimate store eighteen

3. suest nine eighteen

Stata showed this error:
"nine was estimated with a nonstandard vce (robust)"

4. When I tried vce(iid) instead of vce(robust), Stata showed this error:
"unable to generate scores for model nine
suest requires that predict allow the score option"


What can I change for the suest command to work? Any advice is greatly appreciated. Thanks in advance.

Foreach Loop with Use/Save Commands

Hi. I am trying to run a loop in order to use and save a number of data files, all of which are identical except for the year in which the data were collected.

At present, I have written (obscuring the path details with ...):

foreach x in 2012 2013 2014 2015 2016 2017 2018{
use `"C:/Users/.../.../.../.../.../.../.../`x'/dct_hd`x'.dta"', clear
keep if pset4flg ==1
gen year = `x'
keep unitid inst year
save `"C:/Users/.../.../.../.../.../.../.../`x'/inst.dta"', replace
}

This generates the following error:

invalid `"C:/Users/.../.../..../.../.../.../2012/dct_hd2012.dta'
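One common cause of that "invalid" error is that the compound double quotes around the path were pasted as curly "smart quotes" from a word processor rather than typed as plain quotes, so Stata cannot parse the file name. Since the path contains no embedded quotes, plain double quotes suffice; a sketch with the directory held in a local (the path shown is hypothetical, standing in for the obscured one):

Code:
local base "C:/Users/me/data"             // hypothetical path
forvalues x = 2012/2018 {
    use "`base'/`x'/dct_hd`x'.dta", clear
    keep if pset4flg == 1
    gen year = `x'
    keep unitid inst year
    save "`base'/`x'/inst.dta", replace
}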

Thanks in advance for your consideration.

Should I remove extreme groups/panels (so called outliers) from data for testing interaction?

I am testing interactions between variables of interest using data for 23 countries over 11 years, using xtreg and ivregress 2sls. I got significant interactions between several variables. The results are as follows:

Code:
(1) (2) (3) (4) (5)
VARIABLES 2sls 2sls 2sls 2sls 2sls
lnGDPGR 0.434 0.763*** 0.460 0.507 0.169
(0.404) (0.201) (0.379) (0.425) (0.303)
lnINDPR 0.637 -1.194 0.886 1.193 -0.409
(1.296) (0.917) (1.207) (1.490) (0.960)
lnTAXREV -0.203 -0.197 0.102 0.032 0.216
(0.254) (0.301) (0.357) (0.354) (0.306)
lnREX 0.357 0.540 4.133*** 2.550 2.417**
(2.196) (1.824) (1.559) (1.736) (1.233)
lnCPI -5.096*** -1.905 -10.080*** -9.003** -4.246***
(1.833) (1.278) (3.847) (3.648) (1.144)
lnINTEXP 0.767*** 0.668** 0.318 0.713*** 0.289**
(0.171) (0.335) (0.209) (0.206) (0.140)
lnINT 7.334*** -48.531*** 4.916*** 8.035*** 2.124**
(1.271) (18.679) (1.586) (2.120) (0.887)
lnFMD -14.770*** -20.536*** -11.843***
(2.626) (5.247) (2.788)
lnEF -105.751** -0.053 -2.863
(41.992) (4.581) (4.049)
c.lnINT#c.lnFMD 3.562***
(0.578)
c.lnINT#c.lnEF 26.433***
(9.963)
c.lnEF#c.lnFMD 10.286***
(2.541)
c.lnINT#c.lnFMD#c.lnEF 1.410***
(0.300)
lnRD -8.950**
(4.316)
c.lnINT#c.lnRD 2.190**
(0.969)
Year Dummies Yes Yes Yes Yes Yes
#Observations 228 228 228 228 228
#Countries 22 22 22 22 22
R Squared 0.4324 0.4166 0.4375 0.3371 0.5375
Wald chi2 4881.97 1305.74 176798.56 13074.42 7730.59
Prob > chi2 0.0000 0.0000 0.0000 0.0000 0.0000
Robust standard errors in parentheses *** p<0.01, ** p<0.05, * p<0.10
However, when I remove the five countries with the lowest values of the dependent variable, where the phenomenon under investigation is very weak, I get different results, which are as follows (the number of countries is now 18):

Code:
(30) (31) (32) (33)
VARIABLES 2sls 2sls 2sls 2sls
lnGDPGR -4.473 -0.589 -0.615 -0.696
(7.285) (0.597) (0.547) (0.871)
lnINDPR 0.595 -0.214 1.642*** 1.944**
(4.749) (0.867) (0.527) (0.967)
lnTAXREV 0.403 0.072 0.095 0.113
(0.971) (0.249) (0.275) (0.306)
lnREX -7.097 3.865* 2.694* 2.792*
(15.375) (2.337) (1.377) (1.468)
lnINT -828.858 6.251*** 3.884*** 4.414***
(1,194.398) (2.370) (1.090) (1.405)
lnFMD -13.395 -4.875
(11.655) (5.366)
lnEF -1,796.775 -1.602 -3.366
(2,583.638) (4.087) (4.346)
lnRD 16.776*
(8.975)
c.lnINT#c.lnEF 414.039
(594.320)
c.lnINT#c.lnRD -3.902*
(2.130)
c.lnEF#c.lnFMD 6.944
(5.766)
c.lnINT#c.lnFMD#c.lnEF 0.637
(0.632)
Year Dummies Yes Yes Yes Yes
Observations 190 190 190 190
R-squared Negative 0.356 0.5345 0.5099
Wald chi2(16) 1225.67 2991.56 3631.55 2471.9
Instrument used INS6 INS7 INS8 INS9
Just to be clear, I removed some control variables from the models in the second table because otherwise the number of regressors would approach the number of groups, which I did not want, as it would affect the validity of the regressions. Removing the control variables is not the problem; I have checked that.
My question is: should I go with the interaction results reported in the first table? Should I include both tables in my article? Or should I drop the idea of interactions from this article altogether? Is it a good idea to remove extreme values (outliers) when testing interactions?

Using coefplot to plot estimates (relative risk ratios) from mlogit models

Hi All

I'm trying to use the coefplot package to plot estimates from a series of mlogit models (which give relative risk ratios). I've used it several times before for plotting odds ratios and never had any issues. However, I'm struggling to plot estimates from mlogit models.

I'm trying to plot estimates from:

1. Six multinomial logistic regression models (using mlogit)
2. The 6 models are stratified by the variable cohort (which is a dichotomous variable indicating 1==cohort1 and 2==cohort2). I'd like to plot the estimates/RRRs by the variable cohort, so I get two sets of estimates indicated by two different colours.
3. The outcome variables are bmi_mh23, bmi_mh33, & bmi_mh42, which are essentially the same outcome measured at 3 different time points and all coded the same way (all have 4 categories, where category 0 is set as the baseline in all 6 models).
4. I'm dropping the estimates of sexatbirth as I'm not interested in these.

The plot I get seems to have three estimates plotted along the same horizontal line (the 3 estimates from the three non-baseline categories of the outcome variable, I assume). This is not what I want: I'd like each estimate to be plotted separately for each of the three groups (other than the baseline category) of the outcome variables bmi_mh23, bmi_mh33, & bmi_mh42.

Also the graph drops the reference categories of the covariates of interest. How could I retain these if I wanted to?

The code I've used so far:


Code:
mi estimate, rrr: mlogit bmi_mh23 i.sexatbirth i.childsesx i.adultsesx if cohort==1, b(0)
est store est1p
mi estimate, rrr: mlogit bmi_mh23 i.sexatbirth i.childsesx i.adultsesx if cohort==2, b(0)
est store est2p
mi estimate, rrr: mlogit bmi_mh33 i.sexatbirth i.childsesx i.adultsesx ib5.education34 if cohort==1, b(0)
est store est3p
mi estimate, rrr: mlogit bmi_mh33 i.sexatbirth i.childsesx i.adultsesx ib5.education34 if cohort==2, b(0)
est store est4p
mi estimate, rrr: mlogit bmi_mh42 i.sexatbirth i.childsesx i.adultsesx ib5.education34 if cohort==1, b(0)
est store est5p
mi estimate, rrr: mlogit bmi_mh42 i.sexatbirth i.childsesx i.adultsesx ib5.education34 if cohort==2, b(0)
est store est6p

coefplot(est1p\est3p\est5p, label(NCDS58)) ///
(est2p\est4p\est6p, label(BCS70)), ///
ci eform xline(1) labels xtitle(Relative risk ratios (with 95% CIs)) xscale(log) ///
msymbol(d) title(RRRs from regression models for co-occurance of overweight/obesity & poor mental health) ///
scheme(plotplainblind) levels(95) ///
drop(_cons 0.sexatbirth 1.sexatbirth)
Many Thanks!

/Amal



Efficiency of an algorithm when using a forvalues

Hello,

I have been using the code below to calculate the linear distance between several points. Recently I tried to compute it for only a few points, and each run finished surprisingly fast, so I checked the execution time for different sizes. For N = 20 I get an execution time of 0.22 seconds, but for N = 100 it goes up to 3.85 seconds. From this perspective, it would seem more efficient to run the code 5 times with a size of 20 (5 x 0.22 < 3.85).

I don't understand why this happens, or how the code could be improved to avoid such a large increase in execution time. Any help would be highly appreciated.

Code:
use "$data\Database_distance.dta"  , clear
gsort year mont
egen t_id = group(year month)
drop comp1 comp2

*preserve
keep if t_id == 1

duplicates drop location mun depto, force
duplicates drop retail mun, force
g comp1 = .
g comp2 = .

*set rmsg on
sum lat
local N = r(N)              // number of points
forvalues j = 1(1)`N'{

qui g distance`j' = .

local x1 = lat[`j']
local y1 = lng[`j']

forvalues x = 1(1)`N' {

 local x2 = lat[`x']
 local y2 = lng[`x']

qui geodist `x1' `y1' `x2' `y2' , sphere
qui replace distance`j' = r(distance) in `x'
}

qui g comp1`j' = 1 if distance`j' <= 1
qui g comp2`j' = 1 if distance`j' <= 2

drop distance`j'

qui egen tcomp1x`j' = total(comp1`j')
qui egen tcomp2x`j' = total(comp2`j')

local a = tcomp1x`j'[1]
local b = tcomp2x`j'[1]

qui replace comp1 = `a' in `j'
qui replace comp2 = `b' in `j'

drop tcomp1x`j' tcomp2x`j' comp1`j' comp2`j'
dis `j'
}
Thank you so much
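The slowdown is expected: the nested forvalues loops do O(N squared) work, so N = 100 means 10,000 geodist calls versus 400 for N = 20 (and five runs of 20 points would cover only 2,000 of the 10,000 pairs, which is why they look cheaper). Calling geodist on scalars one pair at a time also adds per-call overhead. A sketch of a faster alternative that forms all pairs with -cross- and computes every distance in a single variable-mode -geodist- call (the tempfile mechanics are an assumption; lat, lng, and the 1 km / 2 km counts follow the post):

Code:
keep lat lng
gen long pt = _n                          // point identifier
tempfile pts
preserve
rename (pt lat lng) (pt2 lat2 lng2)
save `pts'
restore
cross using `pts'                         // all N x N pairs
geodist lat lng lat2 lng2, gen(dist) sphere
bysort pt: egen comp1 = total(dist <= 1)  // points within 1 km
bysort pt: egen comp2 = total(dist <= 2)  // points within 2 km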

Low Rsquared in ridge regression

Hi all,

I'm currently working on a strongly balanced panel data set. I have 28 countries and 764 observations. I am looking at what factors influence the level of CO2 emissions in selected countries. All my variables are in log form, as I am working with the STIRPAT model. My dependent variable is the level of CO2 emissions. My independent variables are GDP per capita, total population, petroleum products usage of the transport sector, and total urban population. I first used the fixed-effects model to estimate my coefficients, but due to high levels of multicollinearity I decided to use ridge regression.

Using the ridgereg command I was able to fit the ridge regression model. Preliminary tests found autocorrelation to be an issue, so I used the first-difference method to remove it, which corrected the problem. However, it produced results with a very low R-squared: my original model had an R-squared of 90%, while the first-difference correction gives 17.25%.

Output of "ridgereg dLogCO2 dLogGDPperCapita dLogTotalPop dLogPetrolTransport dLogUrbanPop, model(orr) kr(0.1) "

T test output with descriptives

Hi,
To say I am new to Stata would be an understatement. I need to create a table with the means, SDs, and observations for each group, plus the t-values. The aim is to create a table identical to the attached. The grouping variable is the gender of the employee representative (1 = female, 0 = male). The first column needs to contain the variable labels, not the variable names, and lastly the t-values need to be starred if significant. Any help would be very much appreciated, and I apologise in advance if my questions are simpler than they should be.
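One possible route, assuming the community-contributed estout package by Ben Jann (ssc install estout) and hypothetical variable names: -estpost ttest- stores the group means, standard deviations, counts, and t statistics, and -esttab- can then lay them out using variable labels, with stars attached to the t-values. The cells() syntax below is a sketch and may need adjusting:

Code:
ssc install estout, replace
estpost ttest wage hours tenure, by(repfemale)   // hypothetical variables; repfemale is the grouping variable
esttab, cells("N_1 mu_1(fmt(2)) sd_1(fmt(2)) N_2 mu_2(fmt(2)) sd_2(fmt(2)) t(star pvalue(p) fmt(2))") ///
    label nonumbers
Here groups 1 and 2 correspond to the two values of the by() variable, in ascending order.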


Regression output command for a one-tailed hypothesis?

Good morning,

How might I export regression tables from Stata to Word so that the interpretation of p-value significance reflects that of a one-tailed hypothesis test? Are there manipulations to outreg2 that I should implement or should I use an entirely different command?
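Stata's reported p-values are two-tailed, so one common approach is to compute the one-tailed p-value from the stored results (half the two-tailed value when the estimate has the hypothesized sign) and report it alongside the exported table; I am not aware of a built-in outreg2 option for one-tailed tests. A sketch after -regress-, with x standing in for a hypothetical regressor:

Code:
regress y x z                        // hypothetical model
* one-tailed p-value for H1: coefficient on x > 0
local t  = _b[x] / _se[x]
local p1 = ttail(e(df_r), `t')
display "one-tailed p for x = " %6.4f `p1'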

Thank you.

no space allocated for labeling category axis with graph dot / graph hbar

I am making dot graphs over various categorical variables, and for one of them (provider), Stata doesn't allocate any space for the labels of the different categories. Both provider (doesn't work) and province (does work) are labeled numeric variables. Note that it looks fine with the vertical option. The same thing happens using graph hbar: no space is allocated to label the different values of provider. (If you open the graph with the Graph Editor, you see that no space is allocated for grpaxis.)

Code:
. des provider province

              storage   display    value
variable name   type    format     label      variable label
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
provider_       byte    %14.0g     provider_vax
                                              
province        byte    %10.0g     provs17    Province (2017 frame)

. graph dot paid [pw = _pweight], over(care) asyvar over(provider)

. graph dot paid [pw = _pweight], over(care) asyvar over(province)

. graph dot paid [pw = _pweight], over(care) asyvar over(provider) vertical
[Three attached graphs: output of the three commands above]


Data excerpt:
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str9 care float _pweight byte province float paid_ byte provider_
"ANC"       247.86473 1 . .
"sickchild" 247.86473 1 . .
"vax"       247.86473 1 . .
"ANC"       247.86473 1 1 2
"sickchild" 247.86473 1 . .
"vax"       247.86473 1 . .
"ANC"       247.86473 1 . .
"sickchild" 247.86473 1 . .
"vax"       247.86473 1 . .
"ANC"       247.86473 1 . .
"sickchild" 247.86473 1 . .
"vax"       247.86473 1 . .
"ANC"       247.86473 1 . .
"sickchild" 247.86473 1 . 2
"vax"       247.86473 1 0 2
"ANC"       247.86473 1 . .
"sickchild" 247.86473 1 . .
"vax"       247.86473 1 . .
"ANC"       247.86473 1 . .
"sickchild" 247.86473 1 . .
"vax"       247.86473 1 . .
"ANC"       247.86473 1 . .
"sickchild" 247.86473 1 . .
"vax"       247.86473 1 0 2
"ANC"       247.86473 1 . .
"sickchild" 247.86473 1 . 2
"vax"       247.86473 1 . .
"ANC"       247.86473 1 1 2
"sickchild" 247.86473 1 . 4
"vax"       247.86473 1 . .
"ANC"       247.86473 1 . .
"sickchild" 247.86473 1 1 4
"vax"       247.86473 1 0 3
"ANC"       247.86473 1 . .
"sickchild" 247.86473 1 . .
"vax"       247.86473 1 . .
"ANC"       247.86473 1 . .
"sickchild" 247.86473 1 . .
"vax"       247.86473 1 . .
"ANC"       247.86473 1 . .
"sickchild" 247.86473 1 . .
"vax"       247.86473 1 0 2
"ANC"       247.86473 1 . .
"sickchild" 247.86473 1 . .
"vax"       247.86473 1 0 2
"ANC"       247.86473 1 . .
"sickchild" 247.86473 1 . .
"vax"       247.86473 1 0 2
"ANC"       247.86473 1 . .
"sickchild" 247.86473 1 1 1
"vax"       247.86473 1 0 2
"ANC"       247.86473 1 . .
"sickchild" 247.86473 1 . .
"vax"       247.86473 1 . .
"ANC"       247.86473 1 . .
"sickchild" 247.86473 1 . .
"vax"       247.86473 1 . .
"ANC"       247.86473 1 . .
"sickchild" 247.86473 1 . .
"vax"       247.86473 1 . .
"ANC"       247.86473 1 . .
"sickchild" 247.86473 1 . .
"vax"       247.86473 1 . .
"ANC"       247.86473 1 . .
"sickchild" 247.86473 1 . .
"vax"       247.86473 1 . .
"ANC"       247.86473 1 . .
"sickchild" 247.86473 1 . .
"vax"       247.86473 1 . .
"ANC"       247.86473 1 1 2
"sickchild" 247.86473 1 . .
"vax"       247.86473 1 . .
"ANC"       247.86473 1 . .
"sickchild" 247.86473 1 . .
"vax"       247.86473 1 . .
"ANC"       247.86473 1 . .
"sickchild" 247.86473 1 1 1
"vax"       247.86473 1 0 2
"ANC"       247.86473 1 . .
"sickchild" 247.86473 1 1 1
"vax"       247.86473 1 0 2
"ANC"       247.86473 1 . .
"sickchild" 247.86473 1 1 1
"vax"       247.86473 1 0 2
"ANC"       247.86473 1 . .
"sickchild" 247.86473 1 . .
"vax"       247.86473 1 . .
"ANC"       247.86473 1 1 2
"sickchild" 247.86473 1 . .
"vax"       247.86473 1 . .
"ANC"       247.86473 1 . .
"sickchild" 247.86473 1 . .
"vax"       247.86473 1 . .
"ANC"       247.86473 1 . .
"sickchild" 247.86473 1 . .
"vax"       247.86473 1 . .
"ANC"       247.86473 1 . .
"sickchild" 247.86473 1 . .
"vax"       247.86473 1 . .
"ANC"       247.86473 1 . .
end
label values province provs17
label def provs17 1 "East", modify
label values provider_ provider_vax
label def provider_vax 1 "gov't hospital", modify
label def provider_vax 2 "gov't clinic", modify
label def provider_vax 3 "mobile team", modify
label def provider_vax 4 "other", modify

MI impute using stored estimation results

Hello!

I am trying to find a way to generate imputations using a stored imputation model, without re-estimating the imputation model in the dataset. To make it a bit clearer, my situation is as follows:
  1. We have already generated imputations in the full dataset using the standard mi impute command.
    • I am also hoping that there is a way to retrieve the estimated imputation model and save it to a .ster file.
  2. There is a restricted-access subset of the same dataset that cannot be linked to the original dataset by design. But all the original variables that were used in the imputation in step 1 are present.
So, I was hoping that there is a way to load the imputation model from point 1 into the smaller dataset and use the stored coefficients and standard errors to generate imputed values in the smaller dataset from point 2.

I've been trying to skim the .ado files and search in the forum and google, but could not really find my way around.

If anyone has any suggestions, I'd be grateful!

Thank you!

egen by is equivalent to bysort egen?

Hi, I am confused about whether egen with the by() option is equivalent to egen under a bysort prefix.

For example, I have panel data for multiple years and I want to get the mean income for each individual across waves.

I have two command:

1. bysort id: egen incomeMean = mean(Income)
2. egen incomeMean = mean(Income), by(id)

Do these two commands produce the same thing? I tried them on my own sample and the two results were the same, but I just want to make sure these two are really equivalent.
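Yes: for egen, the by() option and the bysort prefix are equivalent. A quick check with the auto dataset shipped with Stata:

Code:
sysuse auto, clear
bysort rep78: egen m1 = mean(price)
egen m2 = mean(price), by(rep78)
assert m1 == m2       // no output means the two variables are identical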

FE Model- Different Results

Hello,

I am running an FE regression model with 5 independent variables. The results show that 2 of the variables are statistically significant. However, when I run the regressions using each of the variables separately, again with FE, I get different results. More specifically, variables that were not significant in the first model, i.e. when all the variables were tested together, turn out to be significant when tested separately.

On the other hand, the within R-squared is considerably low for all of the regressions.

My question in this case is where I should focus the interpretation of my study, and which of the two models is better, given the information above.

