Difference in Differences Result

Hey everyone! My name is Jesita. I am currently writing a paper regarding the impact of education decentralization on students' standardized test scores using IFLS panel data from 1993 to 2017 with 9,141 observations.
This is my result:

Code:
xtreg std_score1 i.MoEC##i.Post, fe i(fcode) vce(cluster commid)

Fixed-effects (within) regression               Number of obs      =      9336
Group variable: fcode                           Number of groups   =      4800

R-sq:  within  = 0.0062                         Obs per group: min =         1
       between = 0.0114                                        avg =       1.9
       overall = 0.0031                                        max =         5

                                                F(3,313)           =      6.55
corr(u_i, Xb)  = -0.1643                        Prob > F           =    0.0003

                              (Std. Err. adjusted for 314 clusters in commid)
------------------------------------------------------------------------------
             |               Robust
  std_score1 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      1.MoEC |   .0119933   .0575029     0.21   0.835    -.1011477    .1251343
      1.Post |   .2059144   .0610805     3.37   0.001     .0857341    .3260946
             |
   MoEC#Post |
        1 1  |  -.2950334   .0673567    -4.38   0.000    -.4275625   -.1625043
             |
       _cons |  -.0060922   .0395492    -0.15   0.878     -.083908    .0717237
-------------+----------------------------------------------------------------
     sigma_u |  .94483298
     sigma_e |  .82235053
         rho |  .56897805   (fraction of variance due to u_i)
------------------------------------------------------------------------------

The conclusion is that "decentralization negatively affects students' test scores."
I ran the analysis on both the balanced panel and the full sample, with and without controls, and the results are consistent.
My biggest question: how do I know that the result is robust and correct? I know I cannot use the -diff- command because my data are panel.
Please help me. Thank you so much.
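
One common robustness check, offered here as a hedged sketch: replace the single Post dummy with survey-wave dummies and inspect the pre-treatment interaction terms, which should be close to zero if the parallel-trends assumption behind the DiD holds. The variable year below is a hypothetical wave identifier; substitute your own.

Code:
* event-study-style check, assuming a wave/year variable (hypothetical name)
xtreg std_score1 i.MoEC##ib(first).year, fe i(fcode) vce(cluster commid)
* MoEC#year coefficients in pre-treatment waves near zero support parallel trends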

Question on interaction terms

Hello everyone, I am running two multiple linear regression models. The first needs an interaction term between the dummy variable female (1 = female, 0 = male) and a variable beauty, and the second needs the interaction between male and beauty. I've read that i.female##c.beauty will give me the interaction term between the two variables, but how should I write the -regress- command so that one model has the interaction between female and beauty and the other between male and beauty? For the first model I used 1.female##c.beauty, and 0.female##c.beauty for the second, but I don't know whether this is the correct way to write it or whether it gives me the results I want. I also don't really understand the difference between # and ##.
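
A minimal sketch of the distinction (the outcome y is hypothetical): ## is shorthand for the main effects plus the interaction, while # adds only the interaction term itself. With the full model fit once, -margins- returns the beauty slope separately for each sex, which may be what the two separate models are after.

Code:
* i.female##c.beauty expands to i.female c.beauty i.female#c.beauty
regress y i.female##c.beauty
* the same model, spelled out with # (interaction term only):
regress y i.female c.beauty i.female#c.beauty
* slope of beauty for each sex from the single interacted model:
margins female, dydx(beauty)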

Question: how to count occurrence with Regex grouped

Dear all!

I want to count the occurrences of a pattern in a variable, matched with a regular expression, within groups defined by other variables.

E.g.:
ID1   ID2   text    result
12    23    Hello   1
12    23    Bye     1
99    23    Hello   1
I have two IDs, "ID1" and "ID2", that together define a group. I want to compare the variable text against "Hello" and create a new column (result) with the number of "Hello"s in each group.

My first idea was:
egen result=count(regexm(text, "Hello")), by(ID1 ID2)

or

egen result=count(text == "*Hello*"), by(ID1 ID2)

but neither works ...

Can you please help me?

Kind Regards
Simon
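
A minimal sketch of a working variant: -egen, count()- counts nonmissing values, so the usual idiom is to sum the 0/1 indicator that regexm() returns, using total() instead.

Code:
* regexm() returns 1 when text contains "Hello" and 0 otherwise;
* total() sums that indicator within each (ID1, ID2) group
egen result = total(regexm(text, "Hello")), by(ID1 ID2)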

How to replicate this simple (Bayesian) calculation in Stata


Dear Forum Members,

I came across a quite simple example of Bayesian analysis (from the book Think Bayes, written by Allen Downey for Python users) which can be done directly, I mean, just by using Bayes' theorem.

In short, there are 2 baskets with cookies. In basket 1, 30 vanilla cookies and 10 chocolate cookies. In basket 2, 20 vanilla cookies and 20 chocolate cookies.

Below, I generate a dataset accordingly:

Code:
input basket str20 cookie freq
1 vanilla 30
1 chocolate 10
2 vanilla 20
2 chocolate 20
end
expand freq
encode cookie, gen(vanilla)
drop freq
The question is: what is the probability that the cookie came from basket 1, given that I've got a vanilla cookie?

Using the Bayes formula, we get: Posterior = (3/4 × 1/2) / (50/80) = 0.375/0.625 = 0.6 = 60%

That said, I wish to perform the estimation in Stata.

I tried hard with -bayesmh-, to no avail so far.

Thank you in advance for any advice.
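
For what it's worth, with one row per cookie (as the -expand- above creates), the posterior P(basket 1 | vanilla) is simply the share of basket-1 cookies among the vanilla ones, so a plain tabulation reproduces the 60% without any MCMC. A minimal sketch:

Code:
* among the 50 vanilla cookies, 30 come from basket 1: 30/50 = 60%
tabulate basket if cookie == "vanilla"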

replace values: type mismatch

Dear,

I am trying to replace all the "n.a." values with zeroes for my variable c20_28. However, when I try to do this, I get a type mismatch. The storage type of the variable I am trying to change is double. Could you help me out? This is my code:

Code:
replace c20_28 = 0 if c20_28 == "n.a."
type mismatch
r(109);
thank you very much!
Timea
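
A hedged sketch of the usual diagnosis: a numeric (double) variable cannot contain the string "n.a.", so the comparison itself is what triggers the type mismatch. Check the storage type first; the fix differs by case.

Code:
describe c20_28                      // confirm the storage type

* if c20_28 is actually a string variable:
replace c20_28 = "0" if c20_28 == "n.a."
destring c20_28, replace

* if c20_28 is numeric, the "n.a." entries were likely read in as missing (.):
replace c20_28 = 0 if missing(c20_28)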

To replicate Gabaix and Landier (2007 QJE)

Hi

Now I am replicating Gabaix and Landier (2007 QJE). They run the following regression using ExecuComp and Compustat data:

ln(CEO pay) = c + a ln(CEO's firm size) + b ln(reference firm size, such as the Top 250 firm size)

After obtaining estimates of a and b, they test the null hypothesis that a + b = 1. Using the same data set and Stata, I estimated a and b, but I have no idea what to do next. Please give me any suggestions and tips for further steps. Thanks in advance.
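
A minimal sketch of the next step (variable names are hypothetical): after -regress-, a Wald test of the linear restriction a + b = 1 is a one-liner with -test-, and -lincom- gives the estimate and confidence interval of a + b - 1.

Code:
* lnpay, lnsize, lnrefsize are hypothetical names for the three logged series
regress lnpay lnsize lnrefsize
test lnsize + lnrefsize = 1          // Wald test of H0: a + b = 1
lincom lnsize + lnrefsize - 1        // estimate and CI of a + b - 1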

Interpreting coefficients in latent profile analysis with a mixture of continuous and dummy variables

Dear Stata users,
I am running a latent profile analysis using a mixture of continuous and dummy variables. Looking at the SEM examples (50 and 52), it seems that for latent class analysis (only categorical variables) the coefficients are interpreted as coefficients of a multinomial logistic regression (so not very informative), while for latent profile analysis the coefficients are interpreted as the estimated mean value for a given latent class.
It is not clear to me why, when using both types of variables, the estimated coefficients for the dummy variables can't be interpreted in terms of the proportion in a given class. Can anyone provide some references for this case of a mixture of continuous and dummy variables? What are the underlying equations in this case: a mixture of linear and logistic regressions?
Thank you for your attention
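
A hedged sketch of the mixed-variable setup (hypothetical variables: y1 continuous, d1 binary, two classes): -gsem- fits a Gaussian equation for the continuous indicator and a logit equation for the dummy, which is why the dummy's coefficient arrives on the logit scale rather than as a proportion; -estat lcmean- converts it back to class-specific means and probabilities.

Code:
gsem (y1 <- _cons) (d1 <- _cons, logit), lclass(C 2)
estat lcmean        // class-specific means (y1) and probabilities (d1)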

Calculate number of firms based on condition

Code:
 +----------------------------------------------------+
  |           id   year     country   sales   countr~s |
  |----------------------------------------------------|
  | PS2011111775   2017   Palestine       2       3023 |
  | PS4001111948   2017   Palestine      45       3023 |
  +----------------------------------------------------+
I am trying to find out how many firms (the largest ones, ranked by sales) are required to account for 50% of country_sales, by year and country.
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str40 id double year str33 country float(sales country_sales)
"PS2011111775" 2017 "Palestine"    2 3023
"PS4001111948" 2017 "Palestine"   45 3023
"PS5002111951" 2017 "Palestine"  848 3023
"PS4004111952" 2017 "Palestine"   71 3023
"PS2003111643" 2017 "Palestine"   11 3023
"PS2002111909" 2017 "Palestine"   13 3023
"PS2004111535" 2017 "Palestine"   72 3023
"PS3002112921" 2017 "Palestine"   40 3023
"PS1003112957" 2017 "Palestine"  132 3023
"PS3007112108" 2017 "Palestine"   13 3023
"PS5011111455" 2017 "Palestine"   16 3023
"PS2016112695" 2017 "Palestine"   59 3023
"PS2007111979" 2017 "Palestine"   50 3023
"PS1002112958" 2017 "Palestine"  145 3023
"PS1006112053" 2017 "Palestine"  138 3023
"PS5006112997" 2017 "Palestine"   78 3023
"PS2015112738" 2017 "Palestine"  100 3023
"PS5010112959" 2017 "Palestine"   13 3023
"PS1004112600" 2017 "Palestine"  530 3023
"PS5008111955" 2017 "Palestine"    7 3023
"PS2013111914" 2017 "Palestine"    6 3023
"PS5012112072" 2017 "Palestine"  268 3023
"PS3008112065" 2017 "Palestine"   31 3023
"PS4009111049" 2017 "Palestine"    5 3023
"PS1001112942" 2017 "Palestine"   87 3023
"PS4008112055" 2017 "Palestine"   22 3023
"PS1007112953" 2017 "Palestine"  148 3023
"PS3005112951" 2017 "Palestine"   45 3023
"PS3003112946" 2017 "Palestine"   10 3023
"PS3006112943" 2017 "Palestine"   18 3023
"SI0031102690" 2017 "Slovenia"    19 6195
"SI0031102153" 2017 "Slovenia"   861 6195
"SI0031100843" 2017 "Slovenia"    13 6195
"SI0031100637" 2017 "Slovenia"    22 6195
"SI0031104076" 2017 "Slovenia"   149 6195
"SI0031102120" 2017 "Slovenia"  2213 6195
"SI0031107079" 2017 "Slovenia"     1 6195
"SI0021111396" 2017 "Slovenia"     1 6195
"SI0031108994" 2017 "Slovenia"    58 6195
"SI0031100090" 2017 "Slovenia"    40 6195
"SI0031110164" 2017 "Slovenia"    15 6195
"SI0021110513" 2017 "Slovenia"   294 6195
"SI0031101346" 2017 "Slovenia"   510 6195
"SI0031100082" 2017 "Slovenia"   146 6195
"SI0021111313" 2017 "Slovenia"     1 6195
"SI0031105396" 2017 "Slovenia"     2 6195
"SI0031101304" 2017 "Slovenia"    27 6195
"SI0031104290" 2017 "Slovenia"   646 6195
"SI0031103805" 2017 "Slovenia"   211 6195
"SI0031110453" 2017 "Slovenia"    61 6195
"SI0031101296" 2017 "Slovenia"     8 6195
"SI0021111651" 2017 "Slovenia"   787 6195
"SI0021113111" 2017 "Slovenia"     3 6195
"SI0031108200" 2017 "Slovenia"    18 6195
"SI0031110461" 2017 "Slovenia"    23 6195
"SI0031107459" 2017 "Slovenia"     2 6195
"SI0031108655" 2017 "Slovenia"    36 6195
"SI0021113855" 2017 "Slovenia"     2 6195
"SI0031103706" 2017 "Slovenia"    26 6195
"PS3005112951" 2018 "Palestine"   56 2901
"PS2002111909" 2018 "Palestine"   14 2901
"PS4004111952" 2018 "Palestine"   65 2901
"PS1004112600" 2018 "Palestine"  464 2901
"PS1006112053" 2018 "Palestine"  127 2901
"PS3007112108" 2018 "Palestine"   17 2901
"PS3003112946" 2018 "Palestine"   16 2901
"PS5011111455" 2018 "Palestine"   16 2901
"PS2003111643" 2018 "Palestine"   12 2901
"PS3008112065" 2018 "Palestine"   21 2901
"PS3006112943" 2018 "Palestine"   28 2901
"PS1002112958" 2018 "Palestine"  157 2901
"PS5010112959" 2018 "Palestine"   14 2901
"PS2007111979" 2018 "Palestine"   49 2901
"PS2013111914" 2018 "Palestine"    8 2901
"PS2015112738" 2018 "Palestine"   98 2901
"PS3002112921" 2018 "Palestine"   58 2901
"PS2004111535" 2018 "Palestine"   79 2901
"PS5006112997" 2018 "Palestine"   82 2901
"PS2011111775" 2018 "Palestine"    2 2901
"PS5008111955" 2018 "Palestine"    7 2901
"PS4009111049" 2018 "Palestine"    8 2901
"PS1007112953" 2018 "Palestine"  137 2901
"PS5012112072" 2018 "Palestine"  240 2901
"PS5002111951" 2018 "Palestine"  801 2901
"PS4008112055" 2018 "Palestine"   19 2901
"PS1003112957" 2018 "Palestine"  118 2901
"PS1001112942" 2018 "Palestine"   91 2901
"PS4001111948" 2018 "Palestine"   39 2901
"PS2016112695" 2018 "Palestine"   58 2901
"SI0031100090" 2018 "Slovenia"    50 1778
"SI0031103706" 2018 "Slovenia"    27 1778
"SI0031105396" 2018 "Slovenia"     2 1778
"SI0031110461" 2018 "Slovenia"   141 1778
"SI0031104290" 2018 "Slovenia"   440 1778
"SI0031107079" 2018 "Slovenia"     1 1778
"SI0021110513" 2018 "Slovenia"   272 1778
"SI0031108655" 2018 "Slovenia"    44 1778
"SI0031102153" 2018 "Slovenia"   731 1778
"SI0031108994" 2018 "Slovenia"    51 1778
"SI0031102690" 2018 "Slovenia"    19 1778
end
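
A minimal sketch using the variables from the excerpt above: sort firms by descending sales within country-year, accumulate, and count how many firms are needed before the running total reaches half of country_sales.

Code:
gen double negsales = -sales                     // sort key: descending sales
bysort year country (negsales): gen double cumsales = sum(sales)
bysort year country (negsales): gen byte below50 = cumsales < country_sales/2
bysort year country (negsales): egen nfirms50 = total(below50)
replace nfirms50 = nfirms50 + 1                  // the firm crossing the 50% mark
drop negsales below50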

Title on second y-axis when using by-option

Hi,

Any suggestions on how to put a title on the second y-axis when combining multiple y-axes with the by-option?

Code:
sysuse auto, clear
twoway (scatter price rep78, yaxis(1)) (scatter mpg rep78, yaxis(2)), by(foreign) ytitle("title1", axis(1)) ytitle("title2", axis(2))
There are no problems if I delete by(foreign).

Thanks

GLS estimation in Eaton-Kortum (2002): two-component error and importer and exporter FE interpretation

Dear Stata users,

I am using the Eaton and Kortum (2002) general equilibrium model, so I have to estimate a model by generalized least squares (GLS) to recover a set of structural parameters. For your reference, the paper is included; my purpose is to estimate equation (30) on page 1761. Two issues arise, and I describe them as presented in the title.

The data I am using is the following:

Code:
clear
input str14(exporter importer) double contiguity float(dist_1 dist_2) double value
"japan" "japan"       0 0 0  1.872360972809597
"japan" "uk"      0 0 1  72.76820000000001
"japan" "usa"        0 0 0 157.77328270894705
"japan" "france" 0 1 0                  0
"japan" "brasil"  0 0 1  6.602928528463685
end
1) Because of the heteroskedasticity associated with trade data, EK employ a GLS estimator. I understand that with a small number of observations, GLS differs from OLS beyond the standard errors (SE). EK assume an orthogonal error that consists of two components: one country-pair specific, affecting two-way trade, and another affecting one-way trade. Each component has its own variance. EK further discuss the implication of these assumptions for the variance-covariance matrix.

I would like some clarification on what this means. The model I have run is the following. While this is perhaps the simplest form of GLS, I do not completely understand what EK mean by their assumption about the structure of the errors described above. How would the assumptions translate into code?

Code:
reg value contiguity dist_* i.exporterID i.importerID
predict e, residual
gen e_sq = e^2
reg e_sq contiguity dist_* i.exporterID i.importerID
2) I post this question at the end since I expect fewer people will know about this topic. To estimate equation (30) on page 1761, importer and exporter fixed effects are used, as in the model estimated in the code above, but the estimation is more convoluted. My actual estimation of equation (30), without the GLS estimator and using linear restrictions, is shown below. Notice that the constant is dropped to be able to estimate all fixed effects, and the -collinear- option is specified.

Code:
constraint 1 EXPO_FE1 + EXPO_FE2 + EXPO_FE3 + EXPO_FE4 + EXPO_FE5 + EXPO_FE6 + EXPO_FE7 + EXPO_FE8 + EXPO_FE9 + EXPO_FE10 + EXPO_FE11 + EXPO_FE12 + EXPO_FE13 + EXPO_FE14 + EXPO_FE15 + EXPO_FE16 + EXPO_FE17 = -EXPO_FE18

constraint 2 IMPO_FE1 + IMPO_FE2 + IMPO_FE3 + IMPO_FE4 + IMPO_FE5 + IMPO_FE6 + IMPO_FE7 + IMPO_FE8 + IMPO_FE9 + IMPO_FE10 + IMPO_FE11 + IMPO_FE12 + IMPO_FE13 + IMPO_FE14 + IMPO_FE15 + IMPO_FE16 + IMPO_FE17  = - IMPO_FE18

cnsreg dep_value contiguity dist_* EXPO_FE* IMPO_FE* , constraints(1-2) nocons collinear
Equation (30) shows the S's for the exporters, which were previously defined as "source-country dummies". Later, the importer fixed effect is included as "an overall destination effect". Despite this distinction, EK kept the S term for the importer in equation (30) even though it is not estimated. Why is that?
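
As a hedged sketch of the generic feasible-GLS recipe (not necessarily EK's exact two-component variance structure): model the squared residuals, predict the fitted variance, and reweight the original regression by its inverse. Implementing EK's specific structure would instead require building sigma2_hat from the estimated two-way and one-way variance components.

Code:
* generic two-step FGLS sketch; EK's error components would change how
* sigma2_hat is constructed
reg value contiguity dist_* i.exporterID i.importerID
predict e, residual
gen e_sq = e^2
reg e_sq contiguity dist_* i.exporterID i.importerID
predict double sigma2_hat, xb
gen double w = 1/sigma2_hat if sigma2_hat > 0
reg value contiguity dist_* i.exporterID i.importerID [aweight=w]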


Lincom

Hello All,
I have a panel dataset of 15 countries and I have interacted a variable of governance with expenditure using two-staged least-square
ivregress 2sls lexp lnUPop Corp c.lgexp#c.Corp (lnGDP= laglexp laglnUPop)
I used Lincom to obtain the overall effect of expenditure
lincom lexp + c.lexp#c.Corp
lincom lghexp + c.lghexp#c.Corp
( 1) lghexp + c.lghexp#c.Corp = 0
lnLE Coef. Std. Err. z P>z [95% Conf. Interval]
(1) .0383044 .0107792 3.55 0.000 .0171774 .0594313
However, I need help getting the overall effect one standard deviation below the sample mean score and above it.

Thanks.
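
A minimal sketch (reusing the lghexp/Corp names from the output above): grab the mean and SD of Corp with -summarize-, then evaluate the linear combination at mean ± 1 SD.

Code:
summarize Corp
local lo = r(mean) - r(sd)
local hi = r(mean) + r(sd)
lincom lghexp + `lo'*c.lghexp#c.Corp    // effect at one SD below the mean
lincom lghexp + `hi'*c.lghexp#c.Corp    // effect at one SD above the mean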

a better method of generating a variable?

I have an education variable, q15_educ, in my data set, and I need to create a new education variable, educ, based on it. I have a method, but is there an easier one?


Code:
gen educ = 1 if (q15_educ == "High school graduate, diploma or equivalent") | (q15_educ == "Some high school") | (q15_educ == "Trade/Technical/vocational training")
replace educ = 2 if (q15_educ == "Associate degree") | (q15_educ == "Bachelor's degree") | (q15_educ == "Some college credit, no degree") 
replace educ = 3 if (q15_educ == "Doctorate degree") | (q15_educ == "Master's degree")
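
A hedged alternative sketch: inlist() shortens each condition and keeps every mapping on one line, which is arguably easier to read and extend (it accepts up to nine string comparisons per call).

Code:
gen educ = .
replace educ = 1 if inlist(q15_educ, "High school graduate, diploma or equivalent", "Some high school", "Trade/Technical/vocational training")
replace educ = 2 if inlist(q15_educ, "Associate degree", "Bachelor's degree", "Some college credit, no degree")
replace educ = 3 if inlist(q15_educ, "Doctorate degree", "Master's degree")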

Creating a group ID based on multiple variables: no bysort & no weights allowed

Hello Statalist,

I am attempting to create a group ID which identifies each unique combination of the variables "investorid", "AnnounceDate2", and "uniqueinvestmentid" in the example data below.

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float(investorid AnnounceDate2) str11 uniqueinvestmentid
1 17995 "16x3307" 
1 17995 "16x3307" 
1 17995 "16x3307" 
1 17995 "16x3307" 
1 18443 "16x13109"
1 18567 "16x3990" 
1 18604 "16x12503"
1 18736 "16x6502" 
1 19153 "16x7257" 
1 19450 "16x2982" 
1 19801 "16x11283"
1 19844 "16x11559"
1 19927 "16x11810"
1 19996 "16x4271" 
1 20032 "16x6827" 
1 20187 "16x6907" 
1 20206 "16x5897" 
1 20502 "16x6593" 
1 20545 "16x6809" 
1 20683 "16x4722" 
1 20698 "16x4745" 
1 20759 "16x9229" 
1 20836 "16x6372" 
1 20957 "16x12705"
1 20978 "16x2978" 
1 21154 "16x103"  
1 21172 "16x2978" 
1 21202 "16x4231" 
1 21535 "16x5931" 
2 20226 "17x4326" 
end
format %td AnnounceDate2
I can use the following code:

Code:
egen investmentid = group(investorid AnnounceDate2 uniqueinvestmentid)
but this just makes a continuing count all the way to the end of the data, whereas I want the investmentid to start again at 1 when it moves to investorid==2, and so on. Unfortunately, I cannot use the following code:

Code:
bysort investorid: egen investmentid = group(AnnounceDate2 uniqueinvestmentid)
because the group() function of egen cannot be combined with by() or bysort.

I tried to do the following as well:

Code:
bysort investorid: gen investmentid=_n
replace investmentid[_n]=investmentid [_n-1] if AnnounceDate2[_n]==AnnounceDate2[_n-1] & uniqueinvestmentid[_n]==uniqueinvestmentid[_n-1]
But unfortunately this gives me the error "weights not allowed", an issue that, based on my research, is fairly well documented: Stata interprets the [_n] in the initial statement as a weight and refuses to combine it with the replace command.

Given all of the above - does anyone have any suggestions as to how I might achieve what I'd like to do here? It seems like a fairly easy issue in terms of logic but I cannot get there. Thanks in advance for any help you can provide!
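
A minimal sketch of one way through: build the overall group ID first, then take a running sum within each investor that increments whenever the combination changes, so the count restarts at 1 for every investorid.

Code:
egen overall = group(investorid AnnounceDate2 uniqueinvestmentid)
bysort investorid (overall): gen investmentid = sum(overall != overall[_n-1])
drop overall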



Correlated Random Effects Goodness of Fit

Dear all,
I have a panel with 345 observations and six variables. My cross-sectional variable is Panel_bland (15 groups) and my time variable is Panel_year (23 years).
I fit a correlated random effects (Mundlak) model with -xtreg-. While analysing the results, I saw that no R-squared was produced.
I know Stata must have a good reason for not displaying it, as R-squared might not be a good measure here.
What might be a good alternative for assessing how much of the variation is explained by the chosen variables?
Thank you in advance.
Caroline
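
A hedged sketch of one simple alternative (the outcome name y is hypothetical): the squared correlation between the observed and fitted values serves as an overall pseudo-R-squared.

Code:
predict yhat, xb
correlate y yhat
display "pseudo R-squared = " r(rho)^2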

How to do sensitivity analysis?

Dear,

I am new to using Stata. I did a -metaprop- analysis using the command below:

metaprop Mort N, random by(StentType) ftt cimethod(score) label(namevar=Study, yearvar=Year)

The outcome is attached

I would like to do a sensitivity analysis by excluding one study at a time; what command should I add?

Thank you in advance

Hytham
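
I'm not aware of a built-in leave-one-out option for -metaprop- (unlike -metaninf- for -metan-), so a hedged sketch is a loop that re-runs the analysis excluding each study in turn (it assumes Study uniquely identifies studies; adjust to your identifier).

Code:
levelsof Study, local(studies)
foreach s of local studies {
    display as text _newline "Excluding study: `s'"
    metaprop Mort N if Study != "`s'", random by(StentType) ftt cimethod(score)
}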

How to extract the n largest values in a month

Hello researchers and experts, good evening.
I want to calculate the 5 maximum returns in a month for different firms. I am using the code below for the single maximum value in a month, but I am unable to extract the 5 largest observations.
code:
egen max_ret = max(r), by(id date)
bys dscd date: egen max2=max(r)

Thanks !!!!

Wahab Ahmed
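
A minimal sketch, assuming r is the return, id the firm, and date the month: rank the returns within firm-month in descending order and keep the top five.

Code:
gen double negr = -r
bysort id date (negr): gen rank = _n
gen top5_ret = r if rank <= 5      // the 5 largest returns per firm-month
drop negr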

Merge and sum several dataset

Hello World !

I am working with several (almost 100) databases, which all have the same configuration.
I have data for each EU country from jan2000 to dec2019 (some have missing values, written ":").
Since all my databases cover the same period and frequency, and since they all use the same variable names, I would like to know how to merge them all into one file containing the sum of all values for each country.

Illustration:

Database 1
m y AUSTRIA BELGIUM (and LUXBG -> 1998) BULGARIA CYPRUS CZECHIA (CS->1992) GERMANY (incl DD from 1991) DENMARK ESTONIA
1 2000 : : : : : 37328 148550 :
2 2000 : : : : : 38461 185092 67
3 2000 : : : : : : 250220 :
4 2000 : : : : : 54536 181326 :
5 2000 : : : : : 150693 370858 :
6 2000 : : : : : 371919 1067027 :
7 2000 : : : : : 575746 1157469 56
8 2000 : : : : : 501796 1219928 47
9 2000 : : : : : 478317 903141 57
10 2000 : : : : : 569846 1190308 235
11 2000 : : : : : 541034 1096933 443
12 2000 : : : : : 807289 566551 388

Database 2
m y AUSTRIA BELGIUM (and LUXBG -> 1998) BULGARIA CYPRUS CZECHIA (CS->1992) GERMANY (incl DD from 1991) DENMARK ESTONIA
1 2000 : : : : : 418 : 243
2 2000 : : 465 : : : : 366
3 2000 : : 38 : : : : :
4 2000 : : 49 : : : : :
5 2000 : : : : : : : :
6 2000 : : : : : : : :
7 2000 : : : : : 64581 : 54
8 2000 : : 91 : : : : 299
9 2000 : : : : : : : 366
10 2000 : : 147 : : : : 553
11 2000 : : : : : 15646 : 249
12 2000 : : : : : 65956 22545 2021

Database 3
m y AUSTRIA BELGIUM (and LUXBG -> 1998) BULGARIA CYPRUS CZECHIA (CS->1992) GERMANY (incl DD from 1991) DENMARK ESTONIA
1 2000 : : 30 : : 3954918 : 1252
2 2000 : : : : : 638856 83229 818
3 2000 : 36212 : : : 2184877 136363 126
4 2000 : 17648 : : : 1097291 91441 585
5 2000 : : : : : 432368 235023 456
6 2000 : : 100 : : 493020 471125 196
7 2000 : : : : : 2828800 304038 148
8 2000 : : 96 : : 2440080 453412 :
9 2000 : 20655 : : : 2355697 565428 :
10 2000 : : : : : 4057272 747554 241
11 2000 : : 232 : : 2707837 471183 :
12 2000 : : : : : 3514004 353058 344

and so on.

What I want is a database that merges them all and sums the values; in this example, the three databases above would combine (under the name database4, for example) into:


Database 4
m y AUSTRIA BELGIUM (and LUXBG -> 1998) BULGARIA CYPRUS CZECHIA (CS->1992) GERMANY (incl DD from 1991) DENMARK ESTONIA
1 2000 0 0 30 0 0 3992664 148550 1495
2 2000 0 0 465 0 0 677317 268321 1251
3 2000 0 36212 38 0 0 2184877 386583 126
4 2000 0 17648 49 0 0 1151827 272767 585
5 2000 0 0 0 0 0 583061 605881 456
6 2000 0 0 100 0 0 864939 1538152 196
7 2000 0 0 0 0 0 3469127 1461507 258
8 2000 0 0 187 0 0 2941876 1673340 346
9 2000 0 20655 0 0 0 2834014 1468569 423
10 2000 0 0 147 0 0 4627118 1937862 1029
11 2000 0 0 232 0 0 3264517 1568116 692
12 2000 0 0 0 0 0 4387249 942154 2753


I hope that I have correctly explained my problem.

I know that I can't keep these ":" and that the weird variable names are a problem, so first of all I will run something like this for each database:

Code:
// Gen date from "y" and "m" columns
gen edate = ym(y, m)
format edate %tm
drop y m
rename edate date

// rename 
rename belgium~1998 belgium
rename czechia~1992 czechia
rename germany~1991 germany
//and so on ...

// replace ":" by "." for Stata to understand that these are missing values
replace austria = "." if austria == ":"
replace belgium = "." if belgium == ":"
replace bulgaria = "." if bulgaria == ":"
//and so on ...
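
A hedged sketch of the full pipeline: clean every country column in one loop (destring with ignore(" ") also handles numbers typed with space thousands separators, such as "3 954 918"), then stack the cleaned files and sum by date. File names below are hypothetical.

Code:
foreach v of varlist austria-estonia {
    replace `v' = "" if `v' == ":"
    destring `v', replace ignore(" ")
}
save db1_clean, replace
* ... repeat (or loop) the cleaning for each of the ~100 files ...

use db1_clean, clear
append using db2_clean db3_clean
collapse (sum) austria-estonia, by(date)
save database4, replace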
Thank you for your help !

principal component analysis

$
0
0
Hello Stata users,
I'm a student working on my thesis, and I have to construct a food security indicator based on principal component analysis in Stata.
Food security has four dimensions, so I chose an indicator for each:
-To measure availability, I chose the variable food availability, which takes into account the availability of food in sufficient quantity and of appropriate quality.
-To measure access, I chose gross domestic product per capita (in purchasing power equivalent).
-To measure utilization, I chose the variable people using at least basic health services.
-To measure dietary stability, I chose the variable variability of food production per capita.
I have two questions:
1) Should I choose only one variable for each dimension, or would it be better, if possible, to choose two or more variables per dimension for the principal component analysis?
2) How is principal component analysis done in Stata, i.e. which commands to use and what the process is?
Please help me.

Yours sincerely
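
A minimal sketch of the mechanics (the variable names are hypothetical placeholders for the four indicators): -pca- extracts the components, -screeplot- helps decide how many to keep, and -predict- stores the component scores, the first of which is often used as the composite index.

Code:
pca food_avail gdp_pc basic_health food_var
screeplot                     // eigenvalues: how many components to keep
predict fs_index, score       // score on the first principal component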

Handling missing data in Stata: imputation and likelihood based approaches

Dear All,

I have a question on the presentation entitled "Handling missing data in Stata: imputation and likelihood based approaches", which I found here. Here are my two questions:
  1. On slide #18 it is suggested that one could use mi impute mvn also for categorical variables (Lee and Carlin, 2010). Could you please suggest the Stata command for doing that for the example discussed on the slide?
  2. Also, is it possible to run FIML estimation in Stata with categorical variables?
Currently I am doing research where I have missing data, much of it on categorical variables. I used the mi impute chained command, but convergence is not achieved (something I have learnt is observed quite often). Thus, I decided to pursue alternative approaches (especially after reading Lee and Carlin, 2010), and I would highly appreciate your help!

Thank you in advance.

Best,
Artak
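
On question 1, a hedged sketch of the approach discussed by Lee and Carlin (2010): -mi impute mvn- treats all registered variables as jointly normal, so categorical variables are imputed as if continuous and then rounded or recoded back to valid categories afterwards. Variable names below are hypothetical.

Code:
mi set wide
mi register imputed x_cont x_cat        // x_cat is imputed as if continuous
mi impute mvn x_cont x_cat = z1 z2, add(20)
* afterwards, round or recode the imputed x_cat values back to valid categories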