
Looping too many variables

I did the following, separately, for a large number of variables:

Variable 1:
Code:
bysort year Industry: egen Industry_Mean_ROA = mean(ROA)
bysort year firm: gen Industry_Adjusted_ROA = ROA - Industry_Mean_ROA
Variable 2:
Code:
bysort year Industry: egen Industry_Mean_ROE = mean(ROE)
bysort year firm: gen Industry_Adjusted_ROE = ROE - Industry_Mean_ROE
However, since I have so many variables, I would like to do this with a loop. The following did not work:
Code:
foreach v of varlist ROA  ROE {
    bysort year firm: gen Industry_Adjusted_`v' = `v' - { 
    bysort year Industry: mean(`v')
}
}
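(For clarity, what I am aiming for is a single loop that simply repeats the two working steps above for each variable; a rough sketch of my intent:)

Code:
foreach v of varlist ROA ROE {
    bysort year Industry: egen Industry_Mean_`v' = mean(`v')
    generate Industry_Adjusted_`v' = `v' - Industry_Mean_`v'
}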
Thanks in advance

Testing for validity in IV Poisson

Dear Statalisters,

Is there a way one can test for the validity of instruments to be included in IV Poisson?

Mixed model with margins

Hey all

I have a mixed-model equation that I want to graph with margins. There are two groups, two visits, and five steps, and I want to show the effect on the outcome variable step__co.

Code:
mixed step__co i.group#i.visit c.max_load i.group || randomization_no:

I can't figure out how to make a marginsplot with a mixed model in a long dataset.
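A rough sketch of the direction I have been trying (the margins specification and the xdimension() choice are guesses on my part):

Code:
mixed step__co i.group#i.visit c.max_load i.group || randomization_no:
margins group#visit              // predicted means for each group-by-visit cell
marginsplot, xdimension(visit)   // visits on the x-axis, one profile per group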

Best Regards,
Massar

Changing from days to months on the x-axis - Kaplan-Meier survival analysis

Hello, I have searched but not found the answer to this. I am running a Kaplan-Meier survival analysis and my time to failure is in days. I do not want to change the variable to months, as that would collapse events, but I would like the x-axis to show months instead of days. How can I do this? Thanks!
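(What I have in mind, roughly, is to keep the analysis time in days and only relabel the axis, assuming about 30.4 days per month; a sketch:)

Code:
* analysis time stays in days; only the axis title and labels change
sts graph, xtitle("Months since entry") ///
    xlabel(0 "0" 91 "3" 183 "6" 274 "9" 365 "12")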

Sign of coefficients change after first-differencing

Good evening,

I'm working with a dynamic panel data model. I first ran a fixed-effects regression (using unit dummies) with xtpcse and the c(a) option in order to account for autocorrelation:

Code:
xtpcse gini100 l.gini100 findex flab gdpgrowth unemployment uniondensity trade socx d1 d2 d3 d4 d5, c(a)
However, in order to correct for Nickell bias, I tried running a first-differenced model with the reg command:

Code:
reg D.(gini100 l.gini100 findex flab gdpgrowth unemployment trade uniondensity socx), noconstant cluster(countrycode)
In the first regression, the coefficient for the lagged dependent variable as well as my main variable of interest (findex) were both significant and positive, as expected from the literature. However, in the second one they are both negative and only the lagged dependent variable is significant. The R-squared also drops dramatically from 0.9797 to 0.1198.

Does the interpretation of the coefficients change after first-differencing? Am I doing something wrong with how I'm coding? Or do the coefficients in the second regression represent the actual relationship without the bias?

Thank you.




Generating new variables based on coded data

I am using data from the Luxembourg Income Study, where country and year are coded together in a single string, for example "ie83" for Ireland in 1983. Can anyone show me a way to create a year variable equal to 1983 and a country variable equal to "Ireland" based on the information in that string variable?
Thank you in advance
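One possible approach (a sketch; I am assuming the string variable is called dname, its first two characters are the country code, and the two digits always refer to a 19xx year):

Code:
gen str2 ccode = substr(dname, 1, 2)          // e.g. "ie"
gen year = 1900 + real(substr(dname, 3, 2))   // e.g. 1983
gen country = ""
replace country = "Ireland" if ccode == "ie"  // one line per country code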

Split Sample Estimation with GMM

Hi everyone,

I'm using GMM to evaluate a structural framework. To explore heterogeneity in the results, I ran the model on different (disjoint) subsets of my original sample. Now as I understand it, commands such as suest are not suitable for post estimation using GMM, so I cannot test coefficients from two distinct GMM estimations (or can I?).

So, to be able to test the coefficients, I figured the easiest way would be to include interaction terms. Luckily, after solving the structural model as far as possible, it consists of only one equation, so the easiest route would be to interact that single equation with the heterogeneity variable (which takes on three values corresponding to the terciles of its distribution), something like

Code:
gmm (i.tercile#(equation1)), instruments(XXX)


However, this returns an error message: "could not evaluate equation 1", which from my experience really does not say anything.

Here is the minimal example to replicate the error message:


Code:
gmm (residual - {FB}*saturday_HT_after), instruments(saturday_HT_after, noconstant) vce(cluster vid)
gmm (i.baseline#(residual - {FB}*saturday_HT_after)), instruments(saturday_HT_after, noconstant) vce(cluster vid)
The first command runs through; the second does not and returns the error message mentioned above. Including or excluding the "i." and "c." operators we usually use for interactions does not help either.

Can anyone tell me if I'm overlooking something obvious or if there is an easier way to test coefficients from split sample GMM analysis?
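(One fallback I have been considering, a rough sketch assuming the subsamples are independent and that {FB} is the only parameter of interest: estimate the equation on each subsample and compare the estimates by hand.)

Code:
gmm (residual - {FB}*saturday_HT_after) if baseline == 1, ///
    instruments(saturday_HT_after, noconstant) vce(cluster vid)
scalar b1  = _b[/FB]     // in older Stata this may need to be _b[FB:_cons]
scalar se1 = _se[/FB]

gmm (residual - {FB}*saturday_HT_after) if baseline == 2, ///
    instruments(saturday_HT_after, noconstant) vce(cluster vid)
scalar b2  = _b[/FB]
scalar se2 = _se[/FB]

* simple z-test for equality of the two subsample estimates
scalar z = (b1 - b2)/sqrt(se1^2 + se2^2)
display "z = " z ", two-sided p = " 2*normal(-abs(z))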

Thanks!

Reshape with blank observations

Hi,

I have a dataset where a number of items are included in one variable (string). I need to reshape this so that each item is recorded on an individual row. I provide a dummy dataset below to explain my query.

Currently, I use split to separate the items over individual variables followed by reshape long & drop any missing.

Code:
split product, p("+")              // one new variable per listed item
rename product prodlist            // keep the original combined list
reshape long product, i(id) j(n)   // one row per (id, item slot)
drop if mi(product)                // drop the empty slots
replace product=trim(product)      // strip spaces left around the "+" separator
The number of items differs across observations, so after the split many of the new cells are blank. My actual data has thousands of observations, so the reshape is intensive and generates many unnecessary blank rows that I then drop.

This method works, but I'm wondering if there's a way to make this more efficient. Do you have any suggestions?
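(One alternative I have been toying with avoids the wide step entirely; a sketch that relies on each item being a single word, which is true of the example below:)

Code:
* count the items, make that many copies of each row, then pick out the n-th item
gen nitems = 1 + length(product) - length(subinstr(product, "+", "", .))
expand nitems
bysort id: gen n = _n
replace product = word(subinstr(product, "+", " ", .), n)
drop nitems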

Thank you!
Bryony


Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float id str34 product
 1 "oranges"                           
 2 "oranges"                           
 3 "apples"                            
 4 "oranges"                           
 5 "apples + pears"                    
 6 "bananas"                           
 7 "oranges"                           
 8 "oranges"                           
 9 "apples + bananas"                  
10 "apples"                            
11 "apples"                            
12 "apples + bananas"                  
13 "apples"                            
14 "bananas"                           
15 "oranges + apples + pears + bananas"
16 "pears"                             
17 "pears"                             
18 "pears"                             
19 "pears"                             
20 "apples + pears + bananas"          
end

How to select variables using * in the middle of the variable name

Hi Statalist,

I have a tricky problem with some variables.
I want to work with all income variables (names starting with y) except the income-time variables, i.e. the variables that record for how much time during the considered period an individual receives a certain income. These income-time variables also start with y but end in my. Suppose I want to set all income variables to 0 for children (dag<18), except the income-time variables.
This is my code:

Code:
capture describe y*
if _rc == 0 {
foreach var of varilist y* {
replace `var'=0 if dag<18 & `var'!=y*my
}
}

It doesn't work. I think this is because Stata reads y*my as y multiplied by my (and obviously this is not possible with my data).
What could I do?
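(A sketch of the kind of workaround I am looking for, assuming the income-time variables are exactly those y* variables whose names end in "my":)

Code:
capture describe y*
if _rc == 0 {
    foreach var of varlist y* {
        // skip the income-time variables, i.e. names ending in "my"
        if substr("`var'", -2, .) != "my" {
            replace `var' = 0 if dag < 18
        }
    }
}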

Thank you!

If missing, fill in cells with values/text from a selected column

My data looks like this:

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str2 c byte optionA str7 labelA int optionB str7 labelB byte optionC str7 labelC byte final str7 source
"AA" 100 "optionA" 101 "optionB" 99 "optionC" 100 "optionA"
"BB"   . ""         45 "optionB" 42 "optionC"  45 "optionB"
"CC"  53 "optionA"  42 "optionB"  . ""         53 "optionA"
"DD"   . ""          . ""        12 "optionC"  12 "optionC"
end
I would like to generate the two columns labeled 'final' and 'source' with Stata code.

The logic is the following: 'final' and 'source' take their values from 'optionA' and 'labelA'; if those are missing, the gap is filled first from 'optionB' and 'labelB' and, if those are also missing, from 'optionC' and 'labelC'.

I managed to do this as follows, but wonder if there's a more efficient way to go about it. Also, I wonder if my approach is foolproof enough that I won't run into issues (hard to say, but you get the point):

Code:
gen final2 = optionA
replace final2 = optionB if missing(final2)
replace final2 = optionC if missing(final2)

gen source2 = labelA
replace source2 = labelB if missing(source2)
replace source2 = labelC if missing(source2)
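(For comparison, a sketch of the same logic written as a loop over the three pairs, which would scale if more option/label columns were added; I use new names so it does not clash with the version above:)

Code:
gen final3  = .
gen source3 = ""
foreach s in A B C {
    replace final3  = option`s' if missing(final3)
    replace source3 = label`s'  if missing(source3)
}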
Thank you in advance!

Exporting coefficients and p values from a loop containing a factor variable

I have the following code that I used to get an output file containing coefficients, SEs and p-values in a CSV file. This worked fine when DAA was modeled as a continuous exposure variable. I have now changed it to a categorical variable, DAA_Cat, with 3 levels coded as 1, 2 and 3 (which needs to be entered as i.DAA_Cat in the model below). While I can easily change the variable in the regression model, I cannot seem to alter my code so that I get output (coefficient, SE and p-value) for both categories 2 and 3, each compared with 1, the reference.

Code:
capture postutil clear
tempfile wholebloodresults
postfile handle str32 metabolite float DAA se_DAA  ///
    Sex se_Sex Age se_Age using `wholebloodresults'
    
    
foreach var of varlist Lactate-Pyruvate {

    // LOG TRANSFORM
    gen ln`var' = ln(`var')
    
    // RUN MIXED EFFECTS REGRESSION
    mixed ln`var' DAA Sex Age || SampleID:
    
    // LOOP OVER REGRESSORS TO CREATE THE MATERIAL TO POST
    local topost ("`var'")
    foreach x in DAA Sex Age  {
        local topost `topost' (_b[ln`var':`x']) (_se[ln`var':`x'])
    }
    
    // POST IT
    post handle `topost'
}

postclose handle

use `wholebloodresults', clear

foreach v of varlist DAA Sex Age {
   gen t_`v' = `v'/se_`v'
   gen p_`v' = 2*normal(-abs(t_`v'))
   order t_`v' p_`v', after(se_`v')
}
export delimited using my_wholebloodresults, replace
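What I have been trying to get to, roughly, is the sketch below: refer to the factor levels explicitly (2.DAA_Cat and 3.DAA_Cat) and give the postfile matching columns (the names DAA2/DAA3 are just placeholders; the later t/p loop would then run over DAA2 DAA3 Sex Age). I am not sure this is the cleanest way:

Code:
capture postutil clear
tempfile wholebloodresults
postfile handle str32 metabolite ///
    float DAA2 se_DAA2 DAA3 se_DAA3 Sex se_Sex Age se_Age ///
    using `wholebloodresults'

foreach var of varlist Lactate-Pyruvate {
    gen ln`var' = ln(`var')
    mixed ln`var' i.DAA_Cat Sex Age || SampleID:

    // the two non-reference levels, then the other regressors
    local topost ("`var'")
    foreach x in 2.DAA_Cat 3.DAA_Cat Sex Age {
        local topost `topost' (_b[ln`var':`x']) (_se[ln`var':`x'])
    }
    post handle `topost'
}
postclose handle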
Many thanks,
Sandi

Putting median into esttab

Hi all,
I'm trying to get summary statistics that include the mean, median and SD, produced with estpost sum and then exported to a Word document with esttab. The following code should work, but it only prints the number of observations and no variable rows. Any ideas?

Code:
    eststo sum1: estpost sum $sumstats2 if data_round==1, d
    eststo sum2: estpost sum $sumstats2 if data_round==2, d
    eststo sum3: estpost sum $sumstats2 if data_round==3, d
    eststo sum4: estpost sum $sumstats2, d

esttab sum1 sum2 sum3 sum4 using "$replication\Chapter 5_Table 1.rtf", label append cells("mean(fmt(2)) p50(fmt(2)) sd(par)") ///
    mlabels("2010" "2013" "2016/17" "Overall") title("Table 1. Summary Statistics") ///
    nonumbers wide collabels("mean" "median" "sd")
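(To check whether the problem is in my globals or in the esttab call itself, here is a minimal self-contained version of the same pattern on the auto data; it assumes the estout package from SSC is installed:)

Code:
sysuse auto, clear
eststo clear
eststo s1: estpost summarize price mpg weight, detail
esttab s1, cells("mean(fmt(2)) p50(fmt(2)) sd(par)") ///
    collabels("mean" "median" "sd") nonumbers wide label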
Thanks for the help,
Kate

Elapsed time in minutes

Hello,

I am looking for a way to convert my elapsed-time variable ("time"), which is currently stored as a string in an hours:minutes format, to a numeric variable giving the total number of minutes. I have included a dataex example below.

Thank you!

Sarah


Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str4 time
"1:15"
"1:20"
"1:05"
"2:33"
"1:30"
"0:47"
"0:30"
"1:00"
"1:33"
"0:57"
"0:52"
"1:07"
"1:20"
"1:00"
"1:02"
"0:47"
"1:31"
"0:50"
"0:15"
"1:59"
"1:14"
"0:27"
"0:42"
"1:02"
"1:25"
"1:28"
"1:11"
"1:05"
"0:50"
"2:13"
"1:55"
"1:17"
"1:25"
"1:15"
"1:15"
"1:00"
"0:43"
"0:29"
"3:30"
"0:40"
"0:20"
"0:40"
"1:41"
"0:43"
"0:55"
"2:20"
"0:38"
"0:55"
"0:42"
"1:03"
"0:27"
"1:17"
"2:02"
"1:35"
"0:51"
"2:40"
"0:40"
"1:12"
"1:13"
"0:45"
"1:20"
"0:30"
"0:39"
"0:33"
"0:51"
"0:58"
"1:03"
"0:37"
"0:47"
"0:59"
"0:32"
"1:19"
"1:48"
"2:12"
"0:26"
"0:37"
"1:51"
"0:32"
"0:39"
"2:21"
"0:17"
"0:28"
"1:05"
"0:26"
"1:07"
"0:42"
"1:03"
"0:09"
"1:00"
"1:10"
"0:30"
"0:51"
"0:42"
"1:10"
"0:30"
"0:52"
"0:54"
"0:32"
"0:50"
"1:20"
end
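One way this might be done (a sketch, assuming every value has the hours:minutes form shown above):

Code:
gen hours   = real(substr(time, 1, strpos(time, ":") - 1))
gen mins    = real(substr(time, strpos(time, ":") + 1, .))
gen minutes = 60*hours + mins   // e.g. "1:15" becomes 75
drop hours mins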

Fixed Effects Regression with Categorical Variable - Hausman and Spearman

Hi all,

I have panel data consisting of multiple flight legs that can occur at any time within the month of January. I group the flight legs by arrival airport: for example, all flights arriving at Boston are group 1, all flights arriving at JFK are group 2, and so on, giving around 500 groups. I have a number of predictors that I want to use in a fixed-effects regression. Most of my predictors are continuous, but one variable, the size of the departure airport, is encoded as categorical. I have two questions.

1) I want to run the Hausman test to verify that I should use fixed effects instead of random effects (the workflow I have in mind is sketched below these questions). Can I run Hausman when I have a categorical variable among my predictors? When I do, I get the following message: "the rank of the differenced variance matrix (14) does not equal the number of coefficients being tested (15); be sure this is what you expect, or there may be problems computing the test. Examine the output of your estimators for anything unexpected and possibly consider scaling your variables so that the coefficients are on a similar scale."

2) Can I run a Spearman correlation for all my variables to check for correlation, even though some of my variables are continuous and some are categorical?
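For question 1, the workflow I have in mind is the standard one below (a sketch; y, x1, x2 and depsize stand in for my own variable names, and I assume the data are already xtset):

Code:
xtreg y x1 x2 i.depsize, fe
estimates store fixed
xtreg y x1 x2 i.depsize, re
hausman fixed ., sigmamore   // sigmamore sometimes helps with the rank warning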



Thank you!

i: operator is invalid

Hi Statalisters,

I am using an old version, Stata/SE 10. I am trying to include an interaction term between two categorical variables in a negative binomial regression model.
The first variable is age, with four categories (<5, 5-19, 20-64, 65+), and the second variable is season, another categorical variable with categories such as 2014-2015, 2015-2016, 2016-2017, 2017-2018 and 2018-2019.
Whenever I run the code gen ageseason=i.age*i.season I get two different messages.
First, "season may not use time-series operators on string variables", so I thought I could transform season into a variable with 5 numeric categories (1, 2, 3, 4, 5), still treated as categorical.
Then I get the message "i: operator is invalid r(109);".
I have a feeling that this might be related to the old version of Stata. However, I would kindly ask anyone to help me make this work.
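If it is indeed a version issue (as far as I know, factor-variable i. notation only arrived in Stata 11, and in any case it belongs in the estimation command rather than in gen), then the pre-11 route would presumably be the xi: prefix, roughly:

Code:
* a sketch; "outcome" stands in for the dependent variable, and nbreg for the
* count model being fit; xi: builds the age, season and age-by-season dummies
xi: nbreg outcome i.age*i.season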
Thank you very much,

Adriana Peci


Age-standardized incidence rates

Hello,

I am trying to calculate age-standardized incidence rates of TIA events per 1,000 population per year, per age stratum and overall.

I am using the following file as weights:
age    popn
15-54  54697
55-64  10731
65-74   8538
>=75    7585
This is my Stata code:

dstdize tia popn age, by(ckd) using("..../standard population.dta")

and this is my output:
[the dstdize output tables were attached as images in the original post]

I have two questions:

1. Unfortunately, the event rate here is over a 5-year period, and I need rates per 1,000 population per year, so I understand that I can multiply by 1,000, but do I then divide by 5? How do I then calculate the 95% CIs?
2. Is there any code that will report the 95% CI for the adjusted incidence rates per strata (i.e. per age group) instead of only for the summary adjusted estimate?
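For question 1, my working assumption (which I would be glad to have corrected) is that the adjusted rate and both CI bounds can simply be rescaled by the same factor, x1,000 for the population scale and /5 for the annual scale; the arithmetic would be:

Code:
* placeholders: substitute the adjusted rate and CI bounds from the dstdize output
local adj .0123
local lb  .0101
local ub  .0145
display "per 1,000 per year: " %6.2f `adj'*1000/5 ///
    "  (95% CI " %6.2f `lb'*1000/5 " to " %6.2f `ub'*1000/5 ")"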

Thanks for your help!

Dearbhla

Multiple fixed effects - bootstrapping standard errors

Dear Statalist,

I am looking at the relationship between house price shocks and divorce rates. I have annual data for 8 time periods. I have time and county fixed effects, so I am using the reghdfe command (installed from SSC) with absorb().

My main regression looks like such:

Code:
reghdfe divorce_rate HPSHOCK femalelf_num unemployment_rate_num degree_num GCSE_num a_level_num noGCSE_num under10_num i10to34hours_num i35to44hours_num over45hours_num white_num weekly_pay_num, absorb(countynum d) vce(cluster countynum)
My supervisor has told me that I instead need to bootstrap-cluster my standard errors. However, I believe the reghdfe command is not compatible with the bootstrap prefix. How can I solve this issue?
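One route I have come across (an assumption on my part that this is the kind of thing meant): wild cluster bootstrap inference via boottest (ssc install boottest), which is advertised as working after reghdfe and gives bootstrap p-values and confidence intervals rather than bootstrapped standard errors:

Code:
reghdfe divorce_rate HPSHOCK femalelf_num unemployment_rate_num degree_num ///
    GCSE_num a_level_num noGCSE_num under10_num i10to34hours_num ///
    i35to44hours_num over45hours_num white_num weekly_pay_num, ///
    absorb(countynum d) vce(cluster countynum)
boottest HPSHOCK, reps(9999) seed(1234)   // wild cluster bootstrap for the main coefficient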

I have a subset of my dataset below to illustrate.

Thanks,
Jamie

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str9 mnemonic str46 county float(degree_num under10_num i10to34hours_num i35to44hours_num over45hours_num unemployment_rate_num white_num femalelf_num weekly_pay_num d) int year float(divorce_rate HPSHOCK countynum ln_divorce_rate)
"" "Bath and North East Somerset UA"      29.4        4        30     41.7      24.2      5.7      90.4      71.7     500.1 1 2011 1.1543634  -.04652044 1   .14354903
"" "Bath and North East Somerset UA"      33.3      5.6      28.3       44      22.1      4.7      90.1      70.4     481.1 2 2012 1.1265371 -.015247965 1    .1191484
"" "Bath and North East Somerset UA"      35.4      3.5      29.4       43      24.1      6.1      88.4      76.6     509.2 3 2013  .9782556  .037154894 1 -.021984324
"" "Bath and North East Somerset UA"      34.8      5.7      26.9     39.2      28.2      4.3      90.2      70.7     542.5 4 2014 1.1008123   .07657316 1   .09604838
"" "Bath and North East Somerset UA"      35.1      6.3      28.9     39.1      25.7      5.6      89.7      73.6     514.7 5 2015 1.0057058   -.1214285 1  .005689617
"" "Bath and North East Somerset UA"      40.7      3.7      32.9     39.9      23.5      4.2      88.1      76.8     522.3 6 2016 1.1473192   -.2731231 1   .13742809
"" "Bath and North East Somerset UA"        40      5.9      31.5     41.4      21.1      3.8      86.4      74.7     564.1 7 2017  .9564212   .13611344 1  -.04455688
"" "Bath and North East Somerset UA"      40.2      4.6      31.7     41.9      21.8      2.2      88.1      73.6     544.2 8 2018 1.0258412    .2302647 1  .025512993
"" "Bedford UA"                             25      4.2      29.3     42.7      23.8      6.1        76      73.7     494.7 1 2011  2.184815   -.3184305 2    .7815311
"" "Bedford UA"                           24.8      6.5      24.4     43.7      25.4        7      72.3      73.4     499.6 2 2012 2.3504815  -.50932974 2    .8546202
"" "Bedford UA"                           33.4      2.2      28.1     41.4      28.4      6.4      76.1      82.3     517.5 3 2013 2.1145895  -.44200635 2    .7488607
"" "Bedford UA"                           32.9      2.5      27.1     37.9      32.6      8.8      73.6      81.1     555.7 4 2014 2.1417122   -.3783329 2    .7616056
"" "Bedford UA"                           30.8      2.9      26.3     40.9      29.9      4.1      73.4      76.3     530.9 5 2015 2.1768427  -.28063366 2    .7778755
"" "Bedford UA"                           30.6      4.6      20.5     47.7      27.2      5.7      72.2      74.5     556.5 6 2016 2.1397316   .13171378 2    .7606804
"" "Bedford UA"                           35.5      5.3      24.2     40.8      29.7      3.8        76      73.4       584 7 2017 1.9548142     .787049 2    .6702951
"" "Bedford UA"                           32.1      4.2      24.3     44.6      26.9      2.6      66.3      82.1     565.7 8 2018  2.048119    .8929451 2    .7169219
"" "Blackburn with Darwen UA"        18.445915      2.5  28.38741 50.77852   18.3037 8.956295 75.005165  64.02518  432.8548 1 2011   .224107    .1946874 3  -1.4956317
"" "Blackburn with Darwen UA"        17.114372 2.652533  28.99493 50.20765 18.079626  8.99576  75.97445 64.377045  427.8786 2 2012  .2287373    .4383854 3  -1.4751812
"" "Blackburn with Darwen UA"        16.584013 3.080531  29.31168 48.76189 18.811268 9.111681  75.94077  64.06885   426.417 3 2013  .1985797    .3540271 3  -1.6165646
"" "Blackburn with Darwen UA"        17.216702 3.248777  28.34053 49.57016 18.775053  9.80601    74.055  65.21558   460.467 4 2014 .18671684  .064940214 3   -1.678162
"" "Blackburn with Darwen UA"        17.618029 2.654105 29.636055 47.70984        20 6.754105 74.137665  64.84753 448.56235 5 2015  .1806629  -.29422763 3  -1.7111225
"" "Blackburn with Darwen UA"        20.826815 2.537642  29.17738  47.6538  20.63118 7.206462  75.09224  67.34717  449.4172 6 2016  .2001491   -.4519042 3  -1.6086926
"" "Blackburn with Darwen UA"         23.01605 3.696901 29.456335 44.34311  22.50365 5.828168  73.17913  67.30703  458.5236 7 2017   .198256   -.3384984 3   -1.618196
"" "Blackburn with Darwen UA"         24.12458 3.065688  29.70092 46.90881 20.258894 5.740184  71.50806  66.80696  466.9022 8 2018 .17811483   -.1549169 3  -1.7253268
end








Pooled OLS vs. fixed effects; omitted variables

Dear Statalist members,

I use Stata 15.1 and I have several issues with my data. First I'd like to explain what my research is about as I think it's important to understand where I am heading to.

In my study I am looking at German companies from 2013 to 2018 (number of observations: 725). These companies are sorted into different stock exchange segments. From 2013 to 2016 all companies were within one segment. Due to regulatory changes this segment was closed, and all companies had to decide, more or less (depending on whether they could fulfil certain KPIs), which of the two new stock exchange segments they wanted to be sorted into.


What I want to do now is examine whether companies in one of the new segments enjoy positive economic consequences, such as higher liquidity and higher valuation, because of the new segment they are in (the segments differ in their regulatory level: from 2013 to 2016 it was at an intermediate level, and in 2017 and 2018 one segment is higher and one is lower). That is why I want to run regressions with the following structure, where segment is a dummy variable (1 = 2013-2016 segment, intermediate level; 2 = 2017-2018 segment, high level; 3 = 2017-2018 segment, low level) and enters the model as i.segment.


Liq_it = α0 + α1*SEGMENT_it + α2*LN_SIZE_it + α3*LN_VOLUME_it + α4*LN_VOLATILITY_it + α5*GDP_t + ε_it

1) Is there a test to determine whether I have panel data or whether my data fulfill the requirements of panel data?
2) Is there any reason why pooled OLS with White standard errors (reg, robust) would be more suitable than a fixed-effects regression (xtreg, fe)?

I am asking these questions because the results from "reg, robust" better match my theoretical assumptions. I am aware that I cannot just use the results I like more instead of the model that fits my data structure. But when I use time fixed effects, the years 2017 and 2018 are omitted.

When I use "xtreg, fe" just for the two new segments in 2017 and 2018, one of the segments also gets omitted (but only when I run the regression pairwise for segments 2 and 3).

This is why I am thinking, that the fixed effects model might not be suitable for my study as the segment variables might be time invariant (2013 - 2016 there is only segment 1; 2017-2018 there is either segment 2 or 3).
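For concreteness, the two specifications I keep comparing look roughly like this (a sketch; variable names follow the equation above, firm_id is a placeholder for my company identifier, and I assume the data are declared as a panel first):

Code:
xtset firm_id year

* pooled OLS with White (robust) standard errors
reg   liq i.segment ln_size ln_volume ln_volatility gdp, robust

* firm fixed effects
xtreg liq i.segment ln_size ln_volume ln_volatility gdp, fe robust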

I hope my thoughts are fairly understandable.

Thanks a lot.
Martin.

Stata Econometric model.

Hi,

I am doing an econometrics project on the effect of urbanization on the gender gap, using the 2015 Labour Force Survey of Armenia, but I am struggling to construct the econometric model. My aim is to investigate the gender income gap in the returns to secondary education and in the returns to tertiary education, in urban and rural areas, and to see whether the income gap (in log wages) narrows as women obtain a tertiary degree. I then want to compare urban and rural areas.

In other words, I would like to investigate if tertiary education narrows the gender income gap and if there is a difference between urban and rural areas.

But I am really struggling to construct the model, as there is also a serious problem with multicollinearity among the educational qualifications.

I reduced the categorical education variable to four categories ("No education", "Primary", "Secondary", "Tertiary"), but this did not solve the issue.

I would really appreciate any advice on how to construct the model and how to deal with the multicollinearity.

(My variables are: log(wage), sex, age, age squared, education, urban/rural, profession, private/public sector.)
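The kind of specification I have been sketching looks like this (a rough sketch; the variable names are only guesses at how the LFS variables are coded, and the triple interaction is meant to let the gender gap differ by education level and by urban/rural area):

Code:
* one education category is automatically omitted as the base, which avoids the dummy trap
regress lnwage i.female##i.education##i.urban c.age c.age#c.age ///
    i.profession i.public, vce(robust)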