Generating a varlist using loops

March 5, 2020, 11:54 am

Dear all,

I have several variables whose names follow the same structure (ratio_XXXXXXXXXX_co_LLL, where X and L are digits). Some variables share the same XXXXXXXXXX number but have different LLL numbers, and vice versa. I need to define a varlist containing all of these variables.
Is it possible to define the varlist using a loop instead of doing it manually?

Thanks!
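
A minimal sketch, assuming the variables are already in memory: a wildcard pattern does the job, so a loop is optional. `unab` expands `ratio_*_co_*` to every matching name, and the `foreach` version builds the same list one name at a time. The three `ratio_...` variables and `other_var` below are hypothetical stand-ins.

```stata
* Hypothetical stand-in data: three names following ratio_XXXXXXXXXX_co_LLL, one that does not
clear
set obs 1
generate ratio_0000000001_co_001 = 1
generate ratio_0000000001_co_002 = 2
generate ratio_0000000002_co_001 = 3
generate other_var = 4

* Option 1: unab expands the wildcard pattern into the full list of matching names
unab ratiovars : ratio_*_co_*
display "`ratiovars'"

* Option 2: build the same list explicitly, one variable per loop iteration
local ratiovars2
foreach v of varlist ratio_*_co_* {
    local ratiovars2 `ratiovars2' `v'
}
display "`ratiovars2'"

* Count how many variables matched the pattern
local nratio : word count `ratiovars'
display "`nratio' variables found"
```

The local `ratiovars` can then be used wherever a varlist is expected.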

↧

Storing marginal effects after logistic regression in a loop

March 5, 2020, 2:01 pm

Dear all,

I am rather desperate to get the marginal effect of father’s education on the probability of college graduation after running a logistic regression for a number of countries that participated in the PIAAC survey carried out by OECD.

The model is quite simple: the dependent variable is binary, my main independent variable is father’s education (three categories) and I have two controls (age and gender). In principle, it should not be difficult. But I have to account for the complex sample design of PIAAC, which means using a number of weights provided in PIAAC data.

A Stata module (repest) was specifically created for this purpose: “repest estimates statistics using replicate weights (…) thus accounting for complex survey designs in the estimation of sample variances”. It is especially designed for databases like IELS, PIAAC, PISA, TALIS…

‘Repest’ basically works as follows:

Code:


repest svyname [if] [in] , estimate(cmd [,cmd_options]) [options]

Next, there is one of the examples provided by the authors in the corresponding help file of repest:

Code:


repest PIAAC, estimate(stata: reg lnwage pvlit@ yrsqual) by(cnt)

Since I want to run the same model for a number of countries in PIAAC, I intend to create a loop that includes repest. But I also want to generate the marginal effect of father’s education after the logistic regression for each country, storing these marginal effects and then saving them in a different Stata file (dta).

At the end of the repest help file, the authors provide a program precisely for logit postestimation:

Code:

    User-defined estimation command: 2. logit postestimation

        cap program drop mylogitmargins
        program define mylogitmargins, eclass
        syntax [if] [in] [pweight], logit(string) [margins(string) loptions(string) moptions(string)]
        tempname b m
        // compute logit regressions, store results in vectors
                logit `logit' [`weight' `exp'] `if' `in', `loptions'
                matrix `b'= e(b)
        // compute logit postestimation, store results in vectors
                if "`margins'" != "" | "`moptions'" != ""{
                        margins `margins', post `moptions'
                        matrix `m' = e(b)
                        matrix colnames `m' =  margins:
                        matrix `b'= [`b', `m']
                        }
        // post results
                ereturn post `b' 
        end
    . repest PISA, estimate(stata: mylogitmargins, logit(repeat pv@math escs ib1.st04q01) margins(st04q01) moptions(atmeans))

Yet I do not know how to replicate this with my data and, in particular, how to make sure that the marginal effects of father’s education are stored for each country after each logit.

I have succeeded in making ‘repest’ work with my logit model. Below I show a program that makes the name of each country appear in the output, a replica of the program for logit postestimation offered by the authors of repest and, finally, the loop where I call repest to estimate the logit model for each country:

Code:

egen cntryid3_group=group(cntryid3), label

capture program drop pe
program define pe
        if `"`0'"' != "" {
        display as text `"`0'"'
        `0'
        display("")
    }
end

        cap program drop mylogitmargins
        program define mylogitmargins, eclass
        syntax [if] [in] [pweight], logit(string) [margins(string) loptions(string) moptions(string)]
        tempname b m
        // compute logit regressions, store results in vectors
                logit `logit' [`weight' `exp'] `if' `in', `loptions'
                matrix `b'= e(b)
        // compute logit postestimation, store results in vectors
                if "`margins'" != "" | "`moptions'" != ""{
                        margins `margins', post `moptions'
                        matrix `m' = e(b)
                        matrix colnames `m' =  margins:
                        matrix `b'= [`b', `m']
                        }
        // post results
                ereturn post `b' 
        end


foreach i of numlist 1/24 {
       display "`: label (cntryid3_group) `i''"
       pe capture noisily repest PIAAC, estimate(stata: mylogitmargins, logit(univ i.edufath female age if cntryid3_group==`i' & egresados==1) margins(r.edufath))
       }

But I have not succeeded in generating the marginal effects of father’s education and storing them after the logistic regression for each country.

Next, I show the results (output) for the second country in the list. The last two lines of the output are precisely the contrasts of marginal effects across the three categories of father's education (second versus first, third versus first). That is what I want; yet I do not know how to store them for each country, or how to retrieve them afterwards.

Code:

capture noisily repest PIAAC, estimate(stata: mylogitmargins, logit(univ i.edufath 
> female age if cntryid3_group==2 & egresados==1) margins(r.edufath))
(note: file C:\Users\LOrti\AppData\Local\Temp\ST_00000005.tmp not found)
file C:\Users\LOrti\AppData\Local\Temp\ST_00000005.tmp saved

_pooled.
 : _pooled
----------------------------------------------------------------------------------
                 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-----------------+----------------------------------------------------------------
univ_1b_edufat~r |          0  (omitted)
univ_2_edufather |   1.384978   .1785063     7.76   0.000     1.035112    1.734844
univ_3_edufather |   2.719534   .2327599    11.68   0.000     2.263333    3.175735
     univ_female |   .0569396   .1659537     0.34   0.732    -.2683237     .382203
        univ_age |  -.0004378   .0151277    -0.03   0.977    -.0300876    .0292121
      univ__cons |  -2.912376   .5037946    -5.78   0.000    -3.899795   -1.924956
margins_r2vs1_~r |   .1281268   .0182265     7.03   0.000     .0924035    .1638501
margins_r3vs1_~r |   .4029726   .0477158     8.45   0.000     .3094514    .4964938
----------------------------------------------------------------------------------

Could you help me with this?

Thanks for your attention

Luis Ortiz

PS: In case it is of any use, I include a sample of my data, extracted from my dataset using dataex:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input double edufather float(age female univ cntryid3_group)
1 99 1 1 1
2 99 0 1 1
3 99 1 1 1
1 99 1 0 1
1 99 1 0 1
2 99 0 0 1
3 99 0 1 1
3 99 0 0 1
3 99 1 0 1
1 99 1 0 1
3 99 0 0 1
1 99 1 0 1
2 99 1 0 1
3 99 1 1 1
3 99 1 1 1
2 99 0 0 1
1 99 0 0 1
3 99 1 1 1
2 99 1 0 1
2 99 0 1 1
3 99 0 0 1
1 99 0 0 1
3 99 1 1 1
3 99 1 0 1
1 99 1 0 1
2 99 0 0 1
2 99 1 1 1
1 99 0 0 1
. 99 0 0 1
1 99 0 0 1
1 99 1 1 1
1 99 1 0 1
1 99 0 0 1
2 99 1 0 1
1 99 0 0 1
2 99 0 1 1
3 99 1 0 1
3 99 0 0 1
2 99 0 0 1
1 99 1 0 1
. 99 0 0 1
2 99 0 0 1
3 99 0 0 1
1 99 1 0 1
2 99 0 1 1
3 99 0 1 1
3 99 0 1 1
3 99 1 1 1
3 99 0 0 1
2 99 0 0 1
3 99 1 0 1
1 99 0 0 1
2 99 0 0 1
2 99 1 0 1
1 99 0 0 1
. 99 0 0 1
. 99 0 0 1
2 99 0 0 1
1 99 1 0 1
1 99 1 1 1
1 99 0 0 1
1 99 1 0 1
2 99 0 0 1
2 99 1 0 1
1 99 0 0 1
1 99 1 0 1
3 99 0 1 1
1 99 0 0 1
2 99 1 0 1
1 99 0 1 1
2 99 0 1 1
1 99 0 0 1
1 99 0 0 1
2 99 1 0 1
1 99 1 0 1
2 99 1 0 1
1 99 0 0 1
2 99 0 1 1
3 99 0 0 1
3 99 1 0 1
3 99 1 0 1
1 99 0 0 1
3 99 1 1 1
3 99 0 1 1
3 99 0 0 1
2 99 0 0 1
3 99 0 0 1
2 99 0 0 1
2 99 1 0 1
1 99 0 0 1
3 99 1 0 1
1 99 0 0 1
2 99 1 1 1
3 99 1 0 1
2 99 0 0 1
. 99 0 0 1
1 99 0 0 1
1 99 0 0 1
1 99 0 0 1
2 99 0 0 1
end
label values edufather edu_fat
label def edu_fat 1 "ISCED 1/2/3sh", modify
label def edu_fat 2 "ISCED 3/4", modify
label def edu_fat 3 "ISCED 5/6", modify
label values female gndr
label def gndr 0 "Male", modify
label def gndr 1 "Female", modify
label values univ univ_lab
label def univ_lab 0 "No uni", modify
label def univ_lab 1 "Univ", modify
label values cntryid3_group cntryid3_group
label def cntryid3_group 1 "124. Canada", modify
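
A minimal sketch of one way to store the contrasts, using `postfile` to write one row per country to a results file. It runs on simulated stand-in data (3 hypothetical groups instead of 24 countries) with plain `logit` and `margins, post` in place of `repest`. In the real loop, the `logit`/`margins` lines would be replaced by the `repest` call, assuming `repest` leaves its estimates in `e(b)` with the two margins contrasts as the last two columns, as the output above suggests; checking `matrix list e(b)` after one `repest` run would confirm the column positions.

```stata
* Simulated stand-in data (hypothetical): 3 groups instead of 24 countries
clear
set seed 12345
set obs 3000
generate byte country   = runiformint(1, 3)
generate byte edufather = runiformint(1, 3)
generate byte female    = runiform() < .5
generate age            = runiformint(25, 64)
generate byte univ      = runiform() < invlogit(-2 + .7*edufather)

* Open a results file: one row per group with the two contrasts
tempname memhold
tempfile results
postfile `memhold' byte country double(r2vs1 r3vs1) using `results'

forvalues i = 1/3 {
    * Stand-in for the repest call in the real application
    quietly logit univ i.edufather female age if country == `i'
    quietly margins r.edufather, post
    * e(b) now holds the contrasts 2 vs 1 and 3 vs 1
    matrix m = e(b)
    post `memhold' (`i') (m[1,1]) (m[1,2])
}
postclose `memhold'

* Load the stored contrasts (they could also be saved to a .dta file with -save-)
use `results', clear
list
```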

↧

generating a composite variable with a continuous and a categorical variable

March 5, 2020, 3:11 pm

Dear Statalist,

To measure my dependent variable, I would like to generate a composite variable comprising two variables: a continuous variable and an ordinal categorical variable. The ordinal categorical variable ranges from 0 to 1. What is the right way of doing this?

Thank you.

Ikenna

↧

Suppressing Overall marker in Meta-analysis forest plots

March 5, 2020, 6:07 pm

Hi David. I am new to Stata and would like to draw forest plots of hazard ratios, but I would like to suppress the overall marker (the overall effect should be omitted from the output forest plot). I am using the command below because I have hazard ratios and their lower and upper confidence limits (lci and uci).
admetan loghr loglci loguci, noomarker hr fe study(PLHIV) forestplot(xlabel(.25 .5 1 2.0 4.0))

But this command does not seem to accept the "noomarker" option.

Any help on what else I could use?

Thanks

Dathan

↧

panel data with three equations

March 5, 2020, 6:14 pm

Hi all,

I am trying to solve the following equation system. The data are panel data, and I will include fixed effects (id, time).

depvarA = z x1 x2 i.id i.time (eq.1)
depvarB = depvarA x1 x2 i.id i.time (eq.2)
depvarC = depvarB x3 i.id i.time (eq.3)

If there were only the first two equations, I would try the following:
xi:xtivreg depvarB x1 x2 i.time (depvarA = z), fe cluster(id)

However, I have no idea how to solve three equations with panel data.
The only thing I know of is reg3, which does not allow robust standard errors.

One thing I can think of is combining eq. 1 and eq. 2, like:
depvarB = z x1 x2 i.id i.time

However, I do want to see how variable z impacts depvarA, which again impacts depvarB, which impacts depvarC.

↧

replace multiple variables at the same time

March 5, 2020, 6:27 pm

Hi, I have 10 variables. All of them are dummies (0 or 1). In the original dataset, however, they contain only 1 and . (missing); in other words, 0 values are recorded as missing (no input).

I know a basic and silly method is:

Code:

replace var1 = 0 if var1 == .

And repeating this line 10 times, once for each variable.

Is there any cool method?
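
A loop over a varlist avoids repeating the line ten times. The sketch below uses three hypothetical variables (var1-var3); with the real data the varlist would be var1-var10, assuming the variables are adjacent in the dataset. The built-in `mvencode var1-var10, mv(0)` should do the same in one line.

```stata
* Small stand-in dataset: dummies stored as 1 and . (missing)
clear
input var1 var2 var3
1 . 1
. 1 .
end

* Loop over the variables and recode missing values to 0
foreach v of varlist var1-var3 {
    replace `v' = 0 if missing(`v')
}
list
```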

↧

I want my string values to become numeric values

March 5, 2020, 7:33 pm

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str26(q11r1_facebook q11r2_instagram q11r3_twitter q11r4_snapchat q11r5_pinterest q11r6_tiktok q11r7_linkedin q11r8_strava)
"I use frequently"           "I use frequently"       "I would consider using"     "I use frequently"    "I use frequently"           "I use frequently"           "I would consider using"     "I've never heard of"       
"I would NOT consider using" "I use frequently"       "I would NOT consider using" "I use occassionally" "I would NOT consider using" "I would NOT consider using" "I would consider using"     "I've never heard of"       
"I use frequently"           "I would consider using" "I would consider using"     "I use frequently"    "I would consider using"     "I would consider using"     "I use occassionally"        "I've never heard of"       
"I use frequently"           "I use frequently"       "I would NOT consider using" "I use frequently"    "I use occassionally"        "I would NOT consider using" "I would NOT consider using" "I've never heard of"       
"I use frequently"           "I use occassionally"    "I would NOT consider using" "I use occassionally" "I use occassionally"        "I would NOT consider using" "I would NOT consider using" "I would NOT consider using"
end

I have these variables, and they all share the same set of values:
I use frequently
I use occassionally
I would NOT consider using
I would consider using
I've never heard of

I want them to be, say, 5, 4, 3, 2, 1 for future regressions.

What's the easiest code?
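
One way, sketched below on a two-variable, three-row stand-in for the data above: define a single value label with the desired codes, then let `encode` reuse it through its `label()` option. The codes follow the order of the list above (5 for "I use frequently" down to 1 for "I've never heard of"); reorder them if a different ranking is intended. The label name `usage` and the `_n` suffix are hypothetical choices.

```stata
* Small stand-in for the dataex example (two of the eight variables)
clear
input str26(q11r1_facebook q11r2_instagram)
"I use frequently"           "I would consider using"
"I would NOT consider using" "I use occassionally"
"I've never heard of"        "I use frequently"
end

* One value label fixing the numeric code of every response
* (the strings must match the data exactly, including "occassionally")
label define usage 5 "I use frequently"
label define usage 4 "I use occassionally", add
label define usage 3 "I would NOT consider using", add
label define usage 2 "I would consider using", add
label define usage 1 "I've never heard of", add

* encode reuses that mapping; noextend stops with an error if a response
* is not in the label instead of silently adding a new code
* (with the full data: foreach v of varlist q11r1_facebook-q11r8_strava)
foreach v of varlist q11r1_facebook q11r2_instagram {
    encode `v', generate(`v'_n) label(usage) noextend
}
list
```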

↧

convert GPX file to CSV file in Stata

March 5, 2020, 8:10 pm

Hi all,

I have many GPX files. I want to show the tracking points using spmap, which needs lon and lat. Is there a way to extract lon and lat from the GPX files into CSV using Stata?

Thank you in advance.

↧

Backing out the level variable from a first-differenced log

March 5, 2020, 11:15 pm

Hi
I have a variable that is only available in my dataset as a first-differenced log variable (dtfp, the first difference of log total factor productivity).
I need this variable in level form (or log level).
How can I do this in Stata?

Thanks
Mike
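
Since dtfp_t = ln TFP_t - ln TFP_(t-1), the log level is the running sum of the differences plus an unknown starting value, so the level can only be recovered up to that constant (unless the level in some base year is known from another source). A minimal sketch with hypothetical panel identifiers `id` and `year`, assuming no gaps in the years (a gap would silently be treated as zero growth):

```stata
* Hypothetical two-unit panel: dtfp = ln(TFP_t) - ln(TFP_t-1), missing in the first year
clear
input id year double dtfp
1 2000 .
1 2001 .02
1 2002 .03
2 2000 .
2 2001 -.01
2 2002 .04
end

* Running sum of the differences within each panel; sum() treats missing as 0,
* so each panel's first year is normalized to ln TFP = 0
bysort id (year): generate double lntfp = sum(dtfp)

* TFP index relative to each panel's first year (first year = 1)
generate double tfp_index = exp(lntfp)
list
```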

↧

Stata Course

March 6, 2020, 1:51 am

Hi,
I'm a physician and I am interested in attending a biostatistics course based on the Stata software. Can you suggest an entry-level course? (I'm from Italy, so a course in Europe would be preferable.)

↧

Roc analysis?

March 6, 2020, 2:05 am

Hi,
I have a dataset of nearly 500 observations (and more than 100 variables for each observation).
For each observation I have a dichotomous variable ("diagnosis": defined or not defined), and to obtain a definite diagnosis I explored four other features. I have measured empirically (yes, case by case) the contribution of each variable and noticed a remarkable difference among them. I'd like to translate my observations into "statistical language". Probably I should perform an ROC analysis and, for each variable, measure the number of cases in which it is sufficient to make a definite diagnosis.
Is this correct?

Nevertheless, I have two problems:
- First of all, not all variables have been tested in every patient (in some patients only two, in others only three). Must I select only the patients in whom all of my variables were tested? Or can I differentiate them based on the proportion (number of diagnoses made using that variable / number of patients in whom that variable was tested) and compare these proportions? How can I translate this into "statistical language"?
- Secondly, when I try to import my dataset into Stata, it fails: Unable to load excel data "Error: Unexpected attribute"

I hope I was clear.
Thank you in advance for help.

↧

Time series with seasonality

March 6, 2020, 2:19 am

Dear Statalists,

Please, I would really appreciate your help in my analysis.

I am working with hourly electricity prices in the Nordpool market. I have hourly electricity prices from 2007 to 2016 and I want to see the effect of a policy intervention in the market in November 2011.
Electricity prices are characterized by daily (more consumption around 11am and 4pm), weekly (less consumption on Saturdays and Sundays) and monthly (more consumption in winter) seasonality.
In order to check for white noise, I first used two different techniques and investigated what happened to the (logged) prices in a narrow window around the policy intervention.
I used a 96-hour moving average (4 days, in order to take into account both hourly and weekly seasonality) and, using a reference system price, I also regressed both variables on hourly and weekly dummies and then regressed the DV residuals on the IV residuals.
Both of these analyses showed that the volatility of prices decreased after the intervention.
Now I am trying to model my data with an ARMA process (the time series is stationary).
However, there seems to be (and there actually is) a sinusoidal pattern that I do not know how to address.
According to my AC and PAC plots, I should fit something like ARMA(1, 120), which seems a little weird.
Could you please tell me what I am missing?
Please find attached the graphs of the AC and PAC.

Thank you
Luisa Loiacono

↧

Multivariate probit model-->constant term missing standard errors

March 6, 2020, 2:58 am

Hi,

I am running a multivariate probit model to assess the music consumption habits of a sample of individuals (about 230 people). I have 4 equations with mainly dummy explanatory variables. When I run the multivariate model with all 4 equations using the cmp command, the output is fine. However, I removed one of my equations because it was not significant overall when run as an individual regression. When I run the model with only three equations, the standard error of the constant term in the equation for one of my dependent variables (PIRACY) is missing.

Why might this happen? Is it a big issue?

Many thanks

↧

standard deviation xtpoisson

March 6, 2020, 3:29 am

Hi all,

Is there a way to recover the standard deviations of the coefficients from xtpoisson, fe?
In particular, I have:

Code:

quietly xtpoisson trials y residuals average_age_prodbyatc3 avg_prd_sq mean_agefirm_byatc mean_agefirm_squared hhi share_expired share_patented i.Year, fe vce(robust)

and would like to recover and save the standard deviations of the coefficients. Have I provided too little information? Please let me know.

Thanks for the help in advance
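
If "standard deviations of the coefficients" means their standard errors, they are stored in e(V) after estimation: `_se[varname]` returns a single one, and the square roots of the diagonal of e(V) give all of them. A sketch on simulated data (the variables `id`, `year`, `x` and `y` are hypothetical):

```stata
* Hypothetical simulated panel: 50 units observed for 10 years
clear
set seed 2020
set obs 500
generate id = ceil(_n/10)
bysort id: generate year = 2000 + _n
xtset id year
generate x = rnormal()
generate y = rpoisson(exp(.5 + .3*x))

quietly xtpoisson y x, fe vce(robust)

* Standard error of a single coefficient
display _se[x]

* All standard errors at once: square roots of the diagonal of e(V)
matrix V = e(V)
local k = colsof(V)
matrix SE = J(1, `k', .)
forvalues j = 1/`k' {
    matrix SE[1, `j'] = sqrt(V[`j', `j'])
}
local names : colfullnames V
matrix colnames SE = `names'
matrix list SE
```

To keep the full results, including e(V), `estimates save` can write them to disk.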

↧

PSM and DID

March 6, 2020, 3:54 am

Can PSM and DID produce the same coefficients?

↧

calculating weights for each year (panel data)

March 6, 2020, 4:21 am

Hi,

I am currently writing my thesis but unfortunately ran into some problems calculating a variable in Stata.

My panel dataset consists of 48,000 observations of firms from 2006 to 2014.

The variable I am trying to create is efwamb, which is defined by the following formula: [Image: efwamb formula]

e and d are equity issues and debt issues, respectively, and M/B is the market-to-book ratio in a given year. For every year, the M/B ratio is multiplied by a weight: that year's equity and debt issues divided by the total equity and debt issues from the starting year up to the year for which I am calculating efwamb. For example, for 2006 and 2008:
[Image: efwamb worked out for 2006 and 2008]

I have calculated the e and d issues in every year, as well as the M/B ratios and the cumulative sums of e and d for every year. So I now have the numerator and denominator available, but I am really struggling with calculating these weights.

Does anyone know an easy way to calculate this? It would be much appreciated!

Bob
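
Reading the description above, efwamb for year t is a ratio of two running sums within a firm: the sum over years s up to t of (e_s + d_s) × (M/B)_s, divided by the sum over years s up to t of (e_s + d_s). A minimal sketch, assuming the weights run from the firm's first year through year t inclusive and that the variables are called `firm`, `year`, `e`, `d` and `mb` (hypothetical names); if the formula instead stops at t-1, the result would need to be lagged one year.

```stata
* Hypothetical firm panel: e = equity issues, d = debt issues, mb = market-to-book
clear
input firm year e d mb
1 2006 1 1 2
1 2007 2 0 3
1 2008 0 4 1
end

* Running sums within firm (sum() treats missing terms as 0)
bysort firm (year): generate double num = sum((e + d)*mb)
bysort firm (year): generate double den = sum(e + d)

* Issue-weighted average M/B from the first year through the current year
* (missing whenever no issues have occurred yet, since den == 0)
generate double efwamb = num/den
list
```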

↧

Create a box plot with two left y axes to capture different ranges #data visualisation

March 6, 2020, 5:11 am

Hi folks, I have been mulling over this but can't quite figure out a good solution. I wondered if anyone had suggestions.

Basically, I have a box plot of accelerometer counts, and I want to show time spent at different intensities across the PA spectrum. The problem is that there is quite a large range at the lower end (in particular for 0-49 cpm; see image below), which makes it hard to make out the data at the right end.
[Image]

One option is to remove the 0-49 cpm category so that I can glean more insight from the others (e.g. below). However, I do need to show all of them somehow (and ideally on the same graph).
[Image]

I wondered whether a good option might be to have a second axis on the left for that category, superimposed onto the same graph for perspective (an example of pretty much what I am after is below).
[Image]

I wondered if anyone had any thoughts on how to do something like this in Stata. I think the example above is possible in Excel. I'd also like to have the graph in the same format by sex (as above, with bars side by side as opposed to two separate graphs), but I am unsure how to do that.
[Image]

Does anyone have any insights on how to 1) get sex side by side in the same single plot, and 2) get something very similar to the Excel version above, with a separate axis for the 0-49 cpm category? Thanks, and I hope this is clear.

This is my current Stata code.

Code:

graph box cpm_0_49-cpm_5000plus if incl_main==1 , ///
    nooutsides ytitle("Time (min/day)") note("") leg(off) graphregion(fc(white) ifc(white) lc(white) ilc(white)) ///
    showyvars yvar(label(labsize(tiny) angle(vertical))) yscale(range(0 1000)) ylabel(0 (250) 1000)

    graph export "$OUT_DATASET/Box_whisker/time-intensity_`epoch's_ALL_nonnorm.png", height(`plotexportheight') width(`plotexportwidth') replace
    
graph box cpm_0_49-cpm_5000plus if incl_main==1 , ///
    over(sex) ///
    nooutsides ytitle("Time (min/day)") note("") leg(off) graphregion(fc(white) ifc(white) lc(white) ilc(white)) ///
    showyvars yvar(label(labsize(tiny) angle(vertical))) yscale(range(0 1000)) ylabel(0 (250) 1000)

    graph export "$OUT_DATASET/Box_whisker/time-intensity_`epoch's_ALL_SEX_nonnorm.png", height(`plotexportheight') width(`plotexportwidth') replace

↧

Indirect effects with zero-inflated Poisson regression / SEM

March 6, 2020, 6:12 am

Dear all,

I have a causal chain X -> M -> Y and am interested in the indirect effect of X on Y. This can usually be easily estimated using SEM and nlcom.

My problem is that the X -> M part in my data requires a zero-inflated Poisson regression (whereas the M -> Y part does not require special treatment). How would you estimate the indirect effect of X on Y in this case?

I tried specifying the following SEM:

gsem (1: m <- , family(pointmass 0)) (2: m <- x, family(poisson)) (C <- x t)(y <- m x), lclass(C 2) lcinvariant(none) vce(cluster id)

However, I encounter two issues. First, the entire model is estimated in two latent classes, including the latter part (M -> Y), whereas I would like to get a single indirect effect at the end. Second, I don't think I am allowed to simply multiply the two coefficients (b[X->M] * b[M->Y]), because the zero-inflated Poisson model includes a logistic (inflation) part.

How would you proceed? I would be deeply grateful for some guidance.

Best,

Johannes

↧

Determining best fe-model with paneldata (u-test)

March 6, 2020, 6:33 am

Hello! I am new to Stata and am investigating the relationship between paid leave (maternal and parental) and female labour force participation rates with OECD panel data. The literature on this relationship uses an FE model with year dummies and country-specific time trends and finds an inverse U-shaped relationship. Unfortunately, I find different results and am struggling with the fit of my model.

I have run multiple tests showing that I should include year dummies, trends and clustered SEs (due to heteroskedasticity and autocorrelation), and that I should rely on an FE rather than an RE model.

When doing:

xtset cou_1 year
xtreg flfpr2554 moth_leave moth_leave_2 i.year i.cou_1#c.t, fe i(cou_1) vce(cluster cou_1)
nlcom _b[moth_leave]/(2*-_b[moth_leave_2])

adopath ++ "m:\p_arbeid\p_betaald_verlof\SSC install"
utest moth_leave moth_leave_2
(129 missing values generated)

Specification: f(x)=x^2
Extreme point:  538.5394

Test:
     H1: Inverse U shape
 vs. H0: Monotone or U shape

-------------------------------------------------
                 |   Lower bound      Upper bound
-----------------+-------------------------------
Interval         |           0              198
Slope            |    .0351557         .0222303
-------------------------------------------------

Extremum outside interval - trivial failure to reject H0

I get a weird extreme point, and graphing the result also shows that it is probably not the best fit.

Code:

graph twoway (scatter flfpr2554 moth_leave) (function y=[44.7195]+x*[.0351557]+x^2*[-.0000326], range(0 800))

[Image: scatter plot of flfpr2554 against moth_leave with the fitted quadratic]

When not including year dummies and trends, my fit seems better.

Code:

use "m:\p_arbeid\p_betaald_verlof\Data\Stata files\7-1-regressions-2018", clear

xtset cou_1 year

gen moth_leave_2 = moth_leave*moth_leave
xtreg flfpr2554 moth_leave moth_leave_2, fe i(cou_1)
nlcom _b[moth_leave]/(2*-_b[moth_leave_2])

adopath ++ "m:\p_arbeid\p_betaald_verlof\SSC install\"
utest moth_leave moth_leave_2


Specification: f(x)=x^2
Extreme point:   138.853

Test:
     H1: Inverse U shape
 vs. H0: Monotone or U shape

-------------------------------------------------
                 |   Lower bound      Upper bound
-----------------+-------------------------------
Interval         |           0              198
Slope            |    .4256187        -.1813002
t-value          |    12.05258        -2.726574
P>|t|            |    3.16e-31         .0032678
-------------------------------------------------

Overall test of presence of a Inverse U shape:
     t-value =      2.73
     P>|t|   =    .00327

The u-test (presence of a U shape) becomes insignificant when clustering at the country level.

Code:

use "m:\p_arbeid\p_betaald_verlof\Data\Stata files\7-1-regressions-2018", clear

xtset cou_1 year

gen moth_leave_2 = moth_leave*moth_leave
xtreg flfpr2554 moth_leave moth_leave_2, fe i(cou_1) vce(cluster cou_1)
nlcom _b[moth_leave]/(2*-_b[moth_leave_2])

adopath ++ "m:\p_arbeid\p_betaald_verlof\SSC install\"
utest moth_leave moth_leave_2

(129 missing values generated)

Specification: f(x)=x^2
Extreme point:   138.853

Test:
     H1: Inverse U shape
 vs. H0: Monotone or U shape

-------------------------------------------------
                 |   Lower bound      Upper bound
-----------------+-------------------------------
Interval         |           0              198
Slope            |    .4256187        -.1813002
t-value          |    3.384718        -.6482774
P>|t|            |    .0008667         .2604597
-------------------------------------------------

Overall test of presence of a Inverse U shape:
     t-value =      0.65
     P>|t|   =       .26

However, the tests did indicate that I should include those, and my R-squared goes down drastically when doing so.
Model 1 is a naive OLS model, 2 a fixed effects model without trends and year dummies, 3 a fixed effects model with year dummies, 4 a fixed effects model with year dummies and trends, and 5 the same as 4 but with SEs clustered at the country level.

Code:

                 (1)          (2)          (3)                  (4)                  (5)
VARIABLES        flfpr2554    flfpr2554    flfpr2554            flfpr2554            flfpr2554
-------------------------------------------------------------------------------------------------------
moth_leave_div   16.1989***   42.5619***   -13.2572***          3.5156**             3.5156
                 (3.0404)     (3.5313)     (2.7618)             (1.5361)             (3.0979)
moth_leave_div_2 -7.3050***   -15.3262***  7.1199***            -0.3264              -0.3264
                 (1.4861)     (2.3920)     (1.6326)             (0.9886)             (1.7249)
Constant         46.6809***   52.0218***   47.3923***           44.7195***           44.7195***
                 (4.8106)     (1.1491)     (1.9779)             (0.9649)             (2.0572)
-------------------------------------------------------------------------------------------------------
Observations     863          863          863                  863                  863
R-squared        0.3828       0.2037       0.7314               0.9618               0.9618
Specification    Quadratic    Quadratic    Quadratic            Quadratic            Quadratic
Method           LPM          FE           FE and year dummies  FE and year dummies  FE and year dummies
Controls         NO           NO           NO                   NO                   NO
Clustering       NO           NO           NO                   NO                   YES
Number of cou_1  37           37           37                   37                   37
-------------------------------------------------------------------------------------------------------
Standard errors in parentheses
*** p<0.01, ** p<0.05, * p<0.1

When including controls (GDP, female unemployment rate, childcare costs, birth rate), the problem stays the same, and I lose a lot of observations because the data are not available for the same number of countries/years.

What do you think when seeing this? I do not know which model to go for or how to explain these results.

Thank you in advance

↧

IV variable option in Stata program syntax

March 6, 2020, 6:46 am

Hello

I am trying to write a program to run a regression based on some criteria and I want to add the possibility of adding an instrumental variable. So I want my customized program "myreg" to work like this:

Code:

myreg Yvar X1var (X2var = Zvar)

Where X1var and X2var are independent variables and X2var is instrumented with Zvar. My question is, in the "syntax" command, how should I introduce the option of adding the (X2var = Zvar) part?
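
`syntax` alone does not split the parenthesized part, but `anything(equalok)` accepts the whole list (including "=") and `gettoken` with its `match()` option can then pull the parenthesized group out as one token. A minimal sketch; the custom criteria are left as a placeholder comment, and using `ivregress 2sls` as the estimator (with `regress` when no instrument is given) is an assumption. If the IV part is always present and no custom parsing is needed, passing `anything` straight to `ivregress 2sls` would also work.

```stata
capture program drop myreg
program define myreg
    * anything(equalok) accepts the whole "y x1 (x2 = z)" list, including "="
    syntax anything(equalok) [if] [in]

    * The first token is the dependent variable
    gettoken depvar rest : anything

    * Walk through the remaining tokens; match(paren) returns a parenthesized
    * group as one token (without its parentheses) and sets paren to "("
    local exog
    local ivpart
    while `"`rest'"' != "" {
        gettoken tok rest : rest, match(paren)
        if "`paren'" == "(" {
            local ivpart `tok'
        }
        else {
            local exog `exog' `tok'
        }
    }

    * Split "x2 = z" at the equals sign
    gettoken endog inst : ivpart, parse("=")
    gettoken equals inst : inst, parse("=")

    * (custom criteria would go here)
    if "`ivpart'" != "" {
        ivregress 2sls `depvar' `exog' (`endog' = `inst') `if' `in'
    }
    else {
        regress `depvar' `exog' `if' `in'
    }
end

* Usage on Stata's example data
sysuse auto, clear
myreg price weight (mpg = displacement)
```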

↧