Difference in Differences Result

Hey everyone! My name is Jesita. I am currently writing a paper regarding the impact of education decentralization on students' standardized test scores using IFLS panel data from 1993 to 2017 with 9,141 observations.
This is my result:

Code:
xtreg std_score1 i.MoEC##i.Post, fe i(fcode) vce(cluster commid)

Fixed-effects (within) regression               Number of obs      =      9336
Group variable: fcode                           Number of groups   =      4800

R-sq:  within  = 0.0062                         Obs per group: min =         1
       between = 0.0114                                        avg =       1.9
       overall = 0.0031                                        max =         5

                                                F(3,313)           =      6.55
corr(u_i, Xb)  = -0.1643                        Prob > F           =    0.0003

                              (Std. Err. adjusted for 314 clusters in commid)
------------------------------------------------------------------------------
             |               Robust
  std_score1 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      1.MoEC |   .0119933   .0575029     0.21   0.835    -.1011477    .1251343
      1.Post |   .2059144   .0610805     3.37   0.001     .0857341    .3260946
             |
   MoEC#Post |
        1 1  |  -.2950334   .0673567    -4.38   0.000    -.4275625   -.1625043
             |
       _cons |  -.0060922   .0395492    -0.15   0.878     -.083908    .0717237
-------------+----------------------------------------------------------------
     sigma_u |  .94483298
     sigma_e |  .82235053
         rho |  .56897805   (fraction of variance due to u_i)
------------------------------------------------------------------------------

The conclusion is that "decentralization negatively affects students' test scores."
I ran the analysis on both the balanced panel and the full sample, with and without controls, and the results are consistent.
My biggest question: how do I know that the result is robust and correct? I know I cannot use the -diff- command because my data are panel.
Please help me. Thank you so much.
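
One common robustness check, offered here as a hedged sketch: replace the single Post dummy with survey-wave dummies and inspect the pre-treatment interaction terms, which should be close to zero if the parallel-trends assumption behind the DiD holds. The variable year below is a hypothetical wave identifier; substitute your own.

Code:
* event-study-style check, assuming a wave/year variable (hypothetical name)
xtreg std_score1 i.MoEC##ib(first).year, fe i(fcode) vce(cluster commid)
* MoEC#year coefficients in pre-treatment waves near zero support parallel trends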

Question on interaction terms

Hello everyone, I am running two multiple linear regression models. The first needs an interaction term between the dummy variable female (1 = female, 0 = male) and a variable beauty, and the second needs the interaction between male and beauty. I've read that i.female##c.beauty will give me the interaction term between the two variables, but how should I write the -regress- command so that one model has the interaction between female and beauty and the other between male and beauty? For the first model I used 1.female##c.beauty, and 0.female##c.beauty for the second, but I don't know whether this is the correct way to write it or whether it gives me the results I want. I also don't really understand the difference between # and ##.
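
A minimal sketch of the distinction (the outcome y is hypothetical): ## is shorthand for the main effects plus the interaction, while # adds only the interaction term itself. With the full model fit once, -margins- returns the beauty slope separately for each sex, which may be what the two separate models are after.

Code:
* i.female##c.beauty expands to i.female c.beauty i.female#c.beauty
regress y i.female##c.beauty
* the same model, spelled out with # (interaction term only):
regress y i.female c.beauty i.female#c.beauty
* slope of beauty for each sex from the single interacted model:
margins female, dydx(beauty)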

Question: how to count occurrence with Regex grouped

Dear all!

I want to count the occurrences of a pattern in a variable, matched with a regular expression, within groups defined by other variables.

E.g.:
ID1   ID2   text    result
12    23    Hello   1
12    23    Bye     1
99    23    Hello   1
I have two IDs, "ID1" and "ID2", that together define a group. I want to compare the variable text against "Hello" and create a new column (result) with the number of "Hello"s in each group.

My first idea was:
egen result=count(regexm(text, "Hello")), by(ID1 ID2)

or

egen result=count(text == "*Hello*"), by(ID1 ID2)

but neither works ...

Can you please help me?

Kind Regards
Simon
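
A minimal sketch of a working variant: -egen, count()- counts nonmissing values, so the usual idiom is to sum the 0/1 indicator that regexm() returns, using total() instead.

Code:
* regexm() returns 1 when text contains "Hello" and 0 otherwise;
* total() sums that indicator within each (ID1, ID2) group
egen result = total(regexm(text, "Hello")), by(ID1 ID2)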

How to replicate this simple (Bayesian) calculation in Stata


Dear Forum Members,

I came across a quite simple example of Bayesian analysis (from the book Think Bayes, written by Allen Downey for Python users) which can be done directly, I mean, just by using Bayes' theorem.

In short, there are 2 baskets with cookies. In basket 1, 30 vanilla cookies and 10 chocolate cookies. In basket 2, 20 vanilla cookies and 20 chocolate cookies.

Below, I generate a dataset accordingly:

Code:
input basket str20 cookie freq
1 vanilla 30
1 chocolate 10
2 vanilla 20
2 chocolate 20
end
expand freq
encode cookie, gen(vanilla)
drop freq
The question is: what is the probability that the cookie came from basket 1, given that I've got a vanilla cookie?

Using the Bayes formula, we get: Posterior = (3/4 × 1/2) / (50/80) = 0.375/0.625 = 0.6 = 60%

That said, I wish to perform the estimation in Stata.

I tried hard with -bayesmh-, to no avail so far.

Thank you in advance for any advice.
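
For what it's worth, with one row per cookie (as the -expand- above creates), the posterior P(basket 1 | vanilla) is simply the share of basket-1 cookies among the vanilla ones, so a plain tabulation reproduces the 60% without any MCMC. A minimal sketch:

Code:
* among the 50 vanilla cookies, 30 come from basket 1: 30/50 = 60%
tabulate basket if cookie == "vanilla"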

replace values: type mismatch

Dear,

I am trying to replace all the "n.a." values with zeroes for my variable c20_28. However, when I try to do this, I get a type mismatch. The storage type of the variable I am trying to change is double. Could you help me out? This is my code:

Code:
replace c20_28 = 0 if c20_28 == "n.a."
type mismatch
r(109);
thank you very much!
Timea
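
A hedged sketch of the usual diagnosis: a numeric (double) variable cannot contain the string "n.a.", so the comparison itself is what triggers the type mismatch. Check the storage type first; the fix differs by case.

Code:
describe c20_28                      // confirm the storage type

* if c20_28 is actually a string variable:
replace c20_28 = "0" if c20_28 == "n.a."
destring c20_28, replace

* if c20_28 is numeric, the "n.a." entries were likely read in as missing (.):
replace c20_28 = 0 if missing(c20_28)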

To replicate Gabaix and Landier (2007 QJE)

Hi

Now I am replicating Gabaix and Landier (2007 QJE). They run the following regression using ExecuComp and Compustat data:

ln(CEO pay) = c + a ln(CEO's firm size) + b ln(reference firm size, such as the Top 250 firm size)

After obtaining estimates of a and b, they test the null hypothesis that a + b = 1. Using the same data set and Stata, I estimated a and b, but I have no idea what to do next. Please give me any suggestions and tips for further steps. Thanks in advance.
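
A minimal sketch of the next step (variable names are hypothetical): after -regress-, a Wald test of the linear restriction a + b = 1 is a one-liner with -test-, and -lincom- gives the estimate and confidence interval of a + b - 1.

Code:
* lnpay, lnsize, lnrefsize are hypothetical names for the three logged series
regress lnpay lnsize lnrefsize
test lnsize + lnrefsize = 1          // Wald test of H0: a + b = 1
lincom lnsize + lnrefsize - 1        // estimate and CI of a + b - 1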

Interpreting coefficients in latent profile analysis with a mixture of continuous and dummy variables

Dear Stata users,
I am running a latent profile analysis using a mixture of continuous and dummy variables. Looking at the SEM examples (50 and 52), it seems that for latent class analysis (only categorical variables) the coefficients are interpreted as coefficients of a multinomial logistic regression (so not very informative), while for latent profile analysis the coefficients are interpreted as the estimated mean value for a given latent class.
It is not clear to me why, when using both types of variables, the estimated coefficients for the dummy variables can't be interpreted in terms of the proportion in a given class. Can anyone provide some references for this case of a mixture of continuous and dummy variables? What are the underlying equations in this case: a mixture of linear and logistic regressions?
Thank you for your attention
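
A hedged sketch of the mixed-variable setup (hypothetical variables: y1 continuous, d1 binary, two classes): -gsem- fits a Gaussian equation for the continuous indicator and a logit equation for the dummy, which is why the dummy's coefficient arrives on the logit scale rather than as a proportion; -estat lcmean- converts it back to class-specific means and probabilities.

Code:
gsem (y1 <- _cons) (d1 <- _cons, logit), lclass(C 2)
estat lcmean        // class-specific means (y1) and probabilities (d1)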

Calculate number of firms based on condition

Code:
 +----------------------------------------------------+
  |           id   year     country   sales   countr~s |
  |----------------------------------------------------|
  | PS2011111775   2017   Palestine       2       3023 |
  | PS4001111948   2017   Palestine      45       3023 |
  +----------------------------------------------------+
I am trying to find out how many firms (the largest ones, ranked by sales) are required to account for 50% of country_sales, by year and country.
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str40 id double year str33 country float(sales country_sales)
"PS2011111775" 2017 "Palestine"    2 3023
"PS4001111948" 2017 "Palestine"   45 3023
"PS5002111951" 2017 "Palestine"  848 3023
"PS4004111952" 2017 "Palestine"   71 3023
"PS2003111643" 2017 "Palestine"   11 3023
"PS2002111909" 2017 "Palestine"   13 3023
"PS2004111535" 2017 "Palestine"   72 3023
"PS3002112921" 2017 "Palestine"   40 3023
"PS1003112957" 2017 "Palestine"  132 3023
"PS3007112108" 2017 "Palestine"   13 3023
"PS5011111455" 2017 "Palestine"   16 3023
"PS2016112695" 2017 "Palestine"   59 3023
"PS2007111979" 2017 "Palestine"   50 3023
"PS1002112958" 2017 "Palestine"  145 3023
"PS1006112053" 2017 "Palestine"  138 3023
"PS5006112997" 2017 "Palestine"   78 3023
"PS2015112738" 2017 "Palestine"  100 3023
"PS5010112959" 2017 "Palestine"   13 3023
"PS1004112600" 2017 "Palestine"  530 3023
"PS5008111955" 2017 "Palestine"    7 3023
"PS2013111914" 2017 "Palestine"    6 3023
"PS5012112072" 2017 "Palestine"  268 3023
"PS3008112065" 2017 "Palestine"   31 3023
"PS4009111049" 2017 "Palestine"    5 3023
"PS1001112942" 2017 "Palestine"   87 3023
"PS4008112055" 2017 "Palestine"   22 3023
"PS1007112953" 2017 "Palestine"  148 3023
"PS3005112951" 2017 "Palestine"   45 3023
"PS3003112946" 2017 "Palestine"   10 3023
"PS3006112943" 2017 "Palestine"   18 3023
"SI0031102690" 2017 "Slovenia"    19 6195
"SI0031102153" 2017 "Slovenia"   861 6195
"SI0031100843" 2017 "Slovenia"    13 6195
"SI0031100637" 2017 "Slovenia"    22 6195
"SI0031104076" 2017 "Slovenia"   149 6195
"SI0031102120" 2017 "Slovenia"  2213 6195
"SI0031107079" 2017 "Slovenia"     1 6195
"SI0021111396" 2017 "Slovenia"     1 6195
"SI0031108994" 2017 "Slovenia"    58 6195
"SI0031100090" 2017 "Slovenia"    40 6195
"SI0031110164" 2017 "Slovenia"    15 6195
"SI0021110513" 2017 "Slovenia"   294 6195
"SI0031101346" 2017 "Slovenia"   510 6195
"SI0031100082" 2017 "Slovenia"   146 6195
"SI0021111313" 2017 "Slovenia"     1 6195
"SI0031105396" 2017 "Slovenia"     2 6195
"SI0031101304" 2017 "Slovenia"    27 6195
"SI0031104290" 2017 "Slovenia"   646 6195
"SI0031103805" 2017 "Slovenia"   211 6195
"SI0031110453" 2017 "Slovenia"    61 6195
"SI0031101296" 2017 "Slovenia"     8 6195
"SI0021111651" 2017 "Slovenia"   787 6195
"SI0021113111" 2017 "Slovenia"     3 6195
"SI0031108200" 2017 "Slovenia"    18 6195
"SI0031110461" 2017 "Slovenia"    23 6195
"SI0031107459" 2017 "Slovenia"     2 6195
"SI0031108655" 2017 "Slovenia"    36 6195
"SI0021113855" 2017 "Slovenia"     2 6195
"SI0031103706" 2017 "Slovenia"    26 6195
"PS3005112951" 2018 "Palestine"   56 2901
"PS2002111909" 2018 "Palestine"   14 2901
"PS4004111952" 2018 "Palestine"   65 2901
"PS1004112600" 2018 "Palestine"  464 2901
"PS1006112053" 2018 "Palestine"  127 2901
"PS3007112108" 2018 "Palestine"   17 2901
"PS3003112946" 2018 "Palestine"   16 2901
"PS5011111455" 2018 "Palestine"   16 2901
"PS2003111643" 2018 "Palestine"   12 2901
"PS3008112065" 2018 "Palestine"   21 2901
"PS3006112943" 2018 "Palestine"   28 2901
"PS1002112958" 2018 "Palestine"  157 2901
"PS5010112959" 2018 "Palestine"   14 2901
"PS2007111979" 2018 "Palestine"   49 2901
"PS2013111914" 2018 "Palestine"    8 2901
"PS2015112738" 2018 "Palestine"   98 2901
"PS3002112921" 2018 "Palestine"   58 2901
"PS2004111535" 2018 "Palestine"   79 2901
"PS5006112997" 2018 "Palestine"   82 2901
"PS2011111775" 2018 "Palestine"    2 2901
"PS5008111955" 2018 "Palestine"    7 2901
"PS4009111049" 2018 "Palestine"    8 2901
"PS1007112953" 2018 "Palestine"  137 2901
"PS5012112072" 2018 "Palestine"  240 2901
"PS5002111951" 2018 "Palestine"  801 2901
"PS4008112055" 2018 "Palestine"   19 2901
"PS1003112957" 2018 "Palestine"  118 2901
"PS1001112942" 2018 "Palestine"   91 2901
"PS4001111948" 2018 "Palestine"   39 2901
"PS2016112695" 2018 "Palestine"   58 2901
"SI0031100090" 2018 "Slovenia"    50 1778
"SI0031103706" 2018 "Slovenia"    27 1778
"SI0031105396" 2018 "Slovenia"     2 1778
"SI0031110461" 2018 "Slovenia"   141 1778
"SI0031104290" 2018 "Slovenia"   440 1778
"SI0031107079" 2018 "Slovenia"     1 1778
"SI0021110513" 2018 "Slovenia"   272 1778
"SI0031108655" 2018 "Slovenia"    44 1778
"SI0031102153" 2018 "Slovenia"   731 1778
"SI0031108994" 2018 "Slovenia"    51 1778
"SI0031102690" 2018 "Slovenia"    19 1778
end
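
A minimal sketch using the variables from the excerpt above: sort firms by descending sales within country-year, accumulate, and count how many firms are needed before the running total reaches half of country_sales.

Code:
gen double negsales = -sales                     // sort key: descending sales
bysort year country (negsales): gen double cumsales = sum(sales)
bysort year country (negsales): gen byte below50 = cumsales < country_sales/2
bysort year country (negsales): egen nfirms50 = total(below50)
replace nfirms50 = nfirms50 + 1                  // the firm crossing the 50% mark
drop negsales below50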

Title on second y-axis when using by-option

Hi,

Any suggestions on how to put a title on the second y-axis when combining multiple y-axes with the by-option?

Code:
sysuse auto, clear
twoway (scatter price rep78, yaxis(1)) (scatter mpg rep78, yaxis(2)), by(foreign) ytitle("title1", axis(1)) ytitle("title2", axis(2))
There are no problems if I delete by(foreign).

Thanks

GLS estimation in Eaton-Kortum (2002): two-component error and importer and exporter FE interpretation

Dear Stata users,

I am using the Eaton and Kortum (2002) general equilibrium model, so I have to estimate a model by generalized least squares (GLS) to recover a set of structural parameters. For your reference, the paper is included; my purpose is to estimate equation (30) on page 1761. Two issues arise, and I describe them as presented in the title.

The data I am using is the following:

Code:
clear
input str14(exporter importer) double contiguity float(dist_1 dist_2) double value
"japan" "japan"       0 0 0  1.872360972809597
"japan" "uk"      0 0 1  72.76820000000001
"japan" "usa"        0 0 0 157.77328270894705
"japan" "france" 0 1 0                  0
"japan" "brasil"  0 0 1  6.602928528463685
end
1) Because of the heteroskedasticity associated with trade data, EK employ a GLS estimator. I understand that with a small number of observations, GLS differs from OLS beyond the standard errors (SE). EK assume an orthogonal error that consists of two components: one country-pair specific, affecting two-way trade, and another affecting one-way trade. Each component has its own variance. EK further discuss the implication of these assumptions for the variance-covariance matrix.

I would like some clarification on what this means. The model I have run is the following. While this is perhaps the simplest form of GLS, I do not completely understand what EK mean by their assumption about the structure of the errors described above. How would the assumptions translate into code?

Code:
reg value contiguity dist_* i.exporterID i.importerID
predict e, residual
gen e_sq = e^2
reg e_sq contiguity dist_* i.exporterID i.importerID
2) I post this question at the end since I expect fewer people will know about this topic. To estimate equation (30) on page 1761, importer and exporter fixed effects are used, as in the model estimated in the code above, but the estimation is more convoluted. My actual estimation of equation (30), without the GLS estimator and using linear restrictions, is shown below. Notice that the constant is dropped to be able to estimate all fixed effects, and the -collinear- option is specified.

Code:
constraint 1 EXPO_FE1 + EXPO_FE2 + EXPO_FE3 + EXPO_FE4 + EXPO_FE5 + EXPO_FE6 + EXPO_FE7 + EXPO_FE8 + EXPO_FE9 + EXPO_FE10 + EXPO_FE11 + EXPO_FE12 + EXPO_FE13 + EXPO_FE14 + EXPO_FE15 + EXPO_FE16 + EXPO_FE17 = -EXPO_FE18

constraint 2 IMPO_FE1 + IMPO_FE2 + IMPO_FE3 + IMPO_FE4 + IMPO_FE5 + IMPO_FE6 + IMPO_FE7 + IMPO_FE8 + IMPO_FE9 + IMPO_FE10 + IMPO_FE11 + IMPO_FE12 + IMPO_FE13 + IMPO_FE14 + IMPO_FE15 + IMPO_FE16 + IMPO_FE17  = - IMPO_FE18

cnsreg dep_value contiguity dist_* EXPO_FE* IMPO_FE* , constraints(1-2) nocons collinear
Equation (30) shows the S's for the exporters, which were previously defined as "source-country dummies". Later, the importer fixed effect is included as "an overall destination effect". Despite this distinction, EK kept the S term for the importer in equation (30) even though it is not estimated. Why is that?
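
As a hedged sketch of the generic feasible-GLS recipe (not necessarily EK's exact two-component variance structure): model the squared residuals, predict the fitted variance, and reweight the original regression by its inverse. Implementing EK's specific structure would instead require building sigma2_hat from the estimated two-way and one-way variance components.

Code:
* generic two-step FGLS sketch; EK's error components would change how
* sigma2_hat is constructed
reg value contiguity dist_* i.exporterID i.importerID
predict e, residual
gen e_sq = e^2
reg e_sq contiguity dist_* i.exporterID i.importerID
predict double sigma2_hat, xb
gen double w = 1/sigma2_hat if sigma2_hat > 0
reg value contiguity dist_* i.exporterID i.importerID [aweight=w]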


Lincom

Hello All,
I have a panel dataset of 15 countries and I have interacted a variable of governance with expenditure using two-staged least-square
ivregress 2sls lexp lnUPop Corp c.lgexp#c.Corp (lnGDP= laglexp laglnUPop)
I used Lincom to obtain the overall effect of expenditure
lincom lexp + c.lexp#c.Corp
lincom lghexp + c.lghexp#c.Corp
( 1) lghexp + c.lghexp#c.Corp = 0
lnLE Coef. Std. Err. z P>z [95% Conf. Interval]
(1) .0383044 .0107792 3.55 0.000 .0171774 .0594313
However, I need help getting the overall effect one standard deviation below the sample mean score and above it.

Thanks.
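
A minimal sketch (reusing the lghexp/Corp names from the output above): grab the mean and SD of Corp with -summarize-, then evaluate the linear combination at mean ± 1 SD.

Code:
summarize Corp
local lo = r(mean) - r(sd)
local hi = r(mean) + r(sd)
lincom lghexp + `lo'*c.lghexp#c.Corp    // effect at one SD below the mean
lincom lghexp + `hi'*c.lghexp#c.Corp    // effect at one SD above the mean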

a better method of generating a variable?

I have an education variable, q15_educ, in my data set, and I need to create a new education variable, educ, based on it. I have a method, but is there an easier one?


Code:
gen educ = 1 if (q15_educ == "High school graduate, diploma or equivalent") | (q15_educ == "Some high school") | (q15_educ == "Trade/Technical/vocational training")
replace educ = 2 if (q15_educ == "Associate degree") | (q15_educ == "Bachelor's degree") | (q15_educ == "Some college credit, no degree") 
replace educ = 3 if (q15_educ == "Doctorate degree") | (q15_educ == "Master's degree")
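
A hedged alternative sketch: inlist() shortens each condition and keeps every mapping on one line, which is arguably easier to read and extend (it accepts up to nine string comparisons per call).

Code:
gen educ = .
replace educ = 1 if inlist(q15_educ, "High school graduate, diploma or equivalent", "Some high school", "Trade/Technical/vocational training")
replace educ = 2 if inlist(q15_educ, "Associate degree", "Bachelor's degree", "Some college credit, no degree")
replace educ = 3 if inlist(q15_educ, "Doctorate degree", "Master's degree")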

Creating a group ID based on multiple variables: no bysort & no weights allowed

Hello Statalist,

I am attempting to create a group ID which identifies each unique combination of the variables "investorid", "AnnounceDate2", and "uniqueinvestmentid" in the example data below.

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float(investorid AnnounceDate2) str11 uniqueinvestmentid
1 17995 "16x3307" 
1 17995 "16x3307" 
1 17995 "16x3307" 
1 17995 "16x3307" 
1 18443 "16x13109"
1 18567 "16x3990" 
1 18604 "16x12503"
1 18736 "16x6502" 
1 19153 "16x7257" 
1 19450 "16x2982" 
1 19801 "16x11283"
1 19844 "16x11559"
1 19927 "16x11810"
1 19996 "16x4271" 
1 20032 "16x6827" 
1 20187 "16x6907" 
1 20206 "16x5897" 
1 20502 "16x6593" 
1 20545 "16x6809" 
1 20683 "16x4722" 
1 20698 "16x4745" 
1 20759 "16x9229" 
1 20836 "16x6372" 
1 20957 "16x12705"
1 20978 "16x2978" 
1 21154 "16x103"  
1 21172 "16x2978" 
1 21202 "16x4231" 
1 21535 "16x5931" 
2 20226 "17x4326" 
end
format %td AnnounceDate2
I can use the following code:

Code:
egen investmentid = group(investorid AnnounceDate2 uniqueinvestmentid)
but this just makes a continuing count all the way to the end of the data, whereas I want the investmentid to start again at 1 when it moves to investorid==2, and so on. Unfortunately, I cannot use the following code:

Code:
bysort investorid: egen investmentid = group(AnnounceDate2 uniqueinvestmentid)
because the group() function of egen cannot be combined with by() or bysort.

I tried to do the following as well:

Code:
bysort investorid: gen investmentid=_n
replace investmentid[_n]=investmentid [_n-1] if AnnounceDate2[_n]==AnnounceDate2[_n-1] & uniqueinvestmentid[_n]==uniqueinvestmentid[_n-1]
But unfortunately this gives me the error "weights not allowed", an issue that, based on my research, is fairly well documented: Stata interprets the [_n] in the initial statement as a weight and refuses to combine it with the replace command.

Given all of the above - does anyone have any suggestions as to how I might achieve what I'd like to do here? It seems like a fairly easy issue in terms of logic but I cannot get there. Thanks in advance for any help you can provide!
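
A minimal sketch of one way through: build the overall group ID first, then take a running sum within each investor that increments whenever the combination changes, so the count restarts at 1 for every investorid.

Code:
egen overall = group(investorid AnnounceDate2 uniqueinvestmentid)
bysort investorid (overall): gen investmentid = sum(overall != overall[_n-1])
drop overall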



Correlated Random Effects Goodness of Fit

Dear all,
I have a panel with 345 observations and six variables. My cross-sectional variable is Panel_bland (15 groups) and my time variable is Panel_year (23 years).
I fit a correlated random effects (Mundlak) model with -xtreg-. While analysing the results, I saw that no R-squared was produced.
I know Stata must have a good reason for not displaying it, as R-squared might not be a good measure here.
What might be a good alternative for assessing how much of the variation is explained by the chosen variables?
Thank you in advance.
Caroline
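
A hedged sketch of one simple alternative (the outcome name y is hypothetical): the squared correlation between the observed and fitted values serves as an overall pseudo-R-squared.

Code:
predict yhat, xb
correlate y yhat
display "pseudo R-squared = " r(rho)^2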

How to do sensitivity analysis?

Dear,

I am new to using Stata. I did a -metaprop- analysis using the command below:

metaprop Mort N, random by(StentType) ftt cimethod(score) label(namevar=Study, yearvar=Year)

The outcome is attached

I would like to do a sensitivity analysis by excluding one study at a time; what command should I add?

Thank you in advance

Hytham
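
I'm not aware of a built-in leave-one-out option for -metaprop- (unlike -metaninf- for -metan-), so a hedged sketch is a loop that re-runs the analysis excluding each study in turn (it assumes Study uniquely identifies studies; adjust to your identifier).

Code:
levelsof Study, local(studies)
foreach s of local studies {
    display as text _newline "Excluding study: `s'"
    metaprop Mort N if Study != "`s'", random by(StentType) ftt cimethod(score)
}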

How to extract the n largest values in a month

Hello researchers and experts, good evening.
I want to calculate the 5 maximum returns in a month for different firms. I am using the code below for the single maximum value in a month, but I am unable to extract the 5 largest observations.
code:
egen max_ret = max(r), by(id date)
bys dscd date: egen max2=max(r)

Thanks !!!!

Wahab Ahmed
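
A minimal sketch, assuming r is the return, id the firm, and date the month: rank the returns within firm-month in descending order and keep the top five.

Code:
gen double negr = -r
bysort id date (negr): gen rank = _n
gen top5_ret = r if rank <= 5      // the 5 largest returns per firm-month
drop negr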

Merge and sum several dataset

Hello World !

I am working with several (almost 100) databases, which all have the same configuration.
I have data for each EU country from jan2000 to dec2019 (some have missing values, written ":").
Since all my databases cover the same period and frequency, and since they all use the same variable names, I would like to know how to merge them all into one file containing the sum of all values for each country.

Illustration:

Database 1
m y AUSTRIA BELGIUM (and LUXBG -> 1998) BULGARIA CYPRUS CZECHIA (CS->1992) GERMANY (incl DD from 1991) DENMARK ESTONIA
1 2000 : : : : : 37328 148550 :
2 2000 : : : : : 38461 185092 67
3 2000 : : : : : : 250220 :
4 2000 : : : : : 54536 181326 :
5 2000 : : : : : 150693 370858 :
6 2000 : : : : : 371919 1067027 :
7 2000 : : : : : 575746 1157469 56
8 2000 : : : : : 501796 1219928 47
9 2000 : : : : : 478317 903141 57
10 2000 : : : : : 569846 1190308 235
11 2000 : : : : : 541034 1096933 443
12 2000 : : : : : 807289 566551 388

Database 2
m y AUSTRIA BELGIUM (and LUXBG -> 1998) BULGARIA CYPRUS CZECHIA (CS->1992) GERMANY (incl DD from 1991) DENMARK ESTONIA
1 2000 : : : : : 418 : 243
2 2000 : : 465 : : : : 366
3 2000 : : 38 : : : : :
4 2000 : : 49 : : : : :
5 2000 : : : : : : : :
6 2000 : : : : : : : :
7 2000 : : : : : 64581 : 54
8 2000 : : 91 : : : : 299
9 2000 : : : : : : : 366
10 2000 : : 147 : : : : 553
11 2000 : : : : : 15646 : 249
12 2000 : : : : : 65956 22545 2021

Database 3
m y AUSTRIA BELGIUM (and LUXBG -> 1998) BULGARIA CYPRUS CZECHIA (CS->1992) GERMANY (incl DD from 1991) DENMARK ESTONIA
1 2000 : : 30 : : 3954918 : 1252
2 2000 : : : : : 638856 83229 818
3 2000 : 36212 : : : 2184877 136363 126
4 2000 : 17648 : : : 1097291 91441 585
5 2000 : : : : : 432368 235023 456
6 2000 : : 100 : : 493020 471125 196
7 2000 : : : : : 2828800 304038 148
8 2000 : : 96 : : 2440080 453412 :
9 2000 : 20655 : : : 2355697 565428 :
10 2000 : : : : : 4057272 747554 241
11 2000 : : 232 : : 2707837 471183 :
12 2000 : : : : : 3514004 353058 344

and so on.

What I want is a database that merges them all and sums the values; in this example, the three databases above would combine (under the name database4, for example) into:


Database 4
m y AUSTRIA BELGIUM (and LUXBG -> 1998) BULGARIA CYPRUS CZECHIA (CS->1992) GERMANY (incl DD from 1991) DENMARK ESTONIA
1 2000 0 0 30 0 0 3992664 148550 1495
2 2000 0 0 465 0 0 677317 268321 1251
3 2000 0 36212 38 0 0 2184877 386583 126
4 2000 0 17648 49 0 0 1151827 272767 585
5 2000 0 0 0 0 0 583061 605881 456
6 2000 0 0 100 0 0 864939 1538152 196
7 2000 0 0 0 0 0 3469127 1461507 258
8 2000 0 0 187 0 0 2941876 1673340 346
9 2000 0 20655 0 0 0 2834014 1468569 423
10 2000 0 0 147 0 0 4627118 1937862 1029
11 2000 0 0 232 0 0 3264517 1568116 692
12 2000 0 0 0 0 0 4387249 942154 2753


I hope that I have correctly explained my problem.

I know that I can't keep these ":" and that the weird variable names are a problem, so first of all I will run something like this for each database:

Code:
// Gen date from "y" and "m" columns
gen edate = ym(y, m)
format edate %tm
drop y m
rename edate date

// rename 
rename belgium~1998 belgium
rename czechia~1992 czechia
rename germany~1991 germany
//and so on ...

// replace ":" by "." for Stata to understand that these are missing values
replace austria = "." if austria == ":"
replace belgium = "." if belgium == ":"
replace bulgaria = "." if bulgaria == ":"
//and so on ...
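
A hedged sketch of the full pipeline: clean every country column in one loop (destring with ignore(" ") also handles numbers typed with space thousands separators, such as "3 954 918"), then stack the cleaned files and sum by date. File names below are hypothetical.

Code:
foreach v of varlist austria-estonia {
    replace `v' = "" if `v' == ":"
    destring `v', replace ignore(" ")
}
save db1_clean, replace
* ... repeat (or loop) the cleaning for each of the ~100 files ...

use db1_clean, clear
append using db2_clean db3_clean
collapse (sum) austria-estonia, by(date)
save database4, replace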
Thank you for your help !

principal component analysis

$
0
0
Hello Stata users,
I'm a student working on my thesis, and I have to construct a food security indicator based on principal component analysis in Stata.
Food security has four dimensions, so I chose an indicator for each:
-To measure availability, I chose the variable food availability, which takes into account the availability of food in sufficient quantity and of appropriate quality.
-To measure access, I chose gross domestic product per capita (in purchasing power equivalent).
-To measure utilization, I chose the variable people using at least basic health services.
-To measure dietary stability, I chose the variable variability of food production per capita.
I have two questions:
1) Should I choose only one variable for each dimension, or would it be better, if possible, to choose two or more variables per dimension for the principal component analysis?
2) How is principal component analysis done in Stata, i.e. which commands to use and what the process is?
Please help me.

Yours sincerely
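
A minimal sketch of the mechanics (the variable names are hypothetical placeholders for the four indicators): -pca- extracts the components, -screeplot- helps decide how many to keep, and -predict- stores the component scores, the first of which is often used as the composite index.

Code:
pca food_avail gdp_pc basic_health food_var
screeplot                     // eigenvalues: how many components to keep
predict fs_index, score       // score on the first principal component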

Handling missing data in Stata: imputation and likelihood based approaches

Dear All,

I have a question on the presentation entitled "Handling missing data in Stata: imputation and likelihood based approaches", which I found here. Here are my two questions:
  1. On slide #18 it is suggested that one could use mi impute mvn also for categorical variables (Lee and Carlin, 2010). Could you please suggest the Stata command for doing that for the example discussed on the slide?
  2. Also, is it possible to run FIML estimation in Stata with categorical variables?
Currently I am doing research where I have missing data, much of it on categorical variables. I used the mi impute chained command, but convergence is not achieved (something I have learnt is observed quite often). Thus, I decided to pursue alternative approaches (especially after reading Lee and Carlin, 2010), and I would highly appreciate your help!

Thank you in advance.

Best,
Artak
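
On question 1, a hedged sketch of the approach discussed by Lee and Carlin (2010): -mi impute mvn- treats all registered variables as jointly normal, so categorical variables are imputed as if continuous and then rounded or recoded back to valid categories afterwards. Variable names below are hypothetical.

Code:
mi set wide
mi register imputed x_cont x_cat        // x_cat is imputed as if continuous
mi impute mvn x_cont x_cat = z1 z2, add(20)
* afterwards, round or recode the imputed x_cat values back to valid categories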