
Regression to examine the impact of monetary policy (MP) on bank performance in two different countries (panel data)

Dear group,

I'm new to Stata, so please allow me to ask several questions about my steps when running regressions.

Within a single country (say Country A), I know how to run OLS, FEM, REM, GLS, and then GMM to examine the impact of monetary indicators, macroeconomic variables, and bank-level determinants on bank performance (ROA, the dependent variable); the dataset comprises 25 banks over a 10-year timeframe (250 observations). I can easily check collinearity, autocorrelation, and heteroskedasticity before running the regressions to find the most suitable method among OLS, REM, FEM, GLS, and GMM.

However, my recent challenge arises when I combine the dataset of another country (Country B), which contains 30 banks over the same 10-year timeframe, with the dataset of Country A (a total of 55 banks, 550 observations) to conduct an overall regression analysis. My study objective is to examine the impact of monetary policy on bank performance in Country A and Country B and to provide a comparative analysis. So, when running the regression, I use "bysort country : reg ..." to see and compare the regression results between A and B. But within this consolidated dataset, how can I test collinearity, autocorrelation, and heteroskedasticity for each country? Or do I need to create separate datasets for Country A and for Country B to run the tests, so that in sum I have three Stata files? Is that reliable?

My last question is: am I applying the right approach when using "bysort country: ..." for the regression and comparison, or would you suggest another approach? Could you please suggest the most popular one? I only have two countries, not many.
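
For concreteness, here is a minimal sketch of running the diagnostics separately for each country without splitting the file. All variable names (roa, mp_rate, gdp_growth, inflation, bank_size) are placeholders, it assumes the panel has been declared with xtset, and xtserial is Wooldridge's community-contributed serial-correlation test (see search xtserial):

Code:
* per-country diagnostics on the combined dataset
levelsof country, local(clist)
foreach c of local clist {
    display as txt _n "---- diagnostics for country `c' ----"
    * collinearity and heteroskedasticity after pooled OLS
    quietly regress roa mp_rate gdp_growth inflation bank_size if country == `c'
    estat vif
    estat hettest
    * Wooldridge test for serial correlation in panel data
    xtserial roa mp_rate gdp_growth inflation bank_size if country == `c'
}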

Any help with this struggle would be much appreciated.

Thank you very much for your kind help.

Regards

Huy Ngo

Trouble with 'esttab' and stats() in probit model ... N remains blank in the table

Hello everyone,

Using Stata 14.1, I'm running three probit regressions that differ only in their dependent variables. I'd like to present the marginal effects of the interaction variables from each model in a single table with three side-by-side columns. However, the number of observations and the chi-squared statistic for each model remain blank, even though I specify them in the esttab command.
I'd really appreciate any insights on how to fix this. Is something wrong in my commands? Thanks in advance!

Code:
probit X1 $attitude $info $edu $country, vce(robust)
local pR2 = e(r2_p)
eststo margins_1: margins , dydx(*) post

probit X1 $attitude $info $edu $country c.env##i.income, vce(robust)
margins , dydx(env) at(income = (1 2 3 4)) post
estimates store het_income
eststo margins_2: xlincom income1vsincome2 = (2._at - 1._at) , post
estimates restore het_income
eststo margins_3: xlincom income1vsincome3 = (3._at - 1._at) , post
estimates restore het_income
eststo margins_4: xlincom income1vsincome4 = (4._at - 1._at) , post

* ... the same commands are run for X2 and X3 ...

eststo outcome1: appendmodels margins_1 margins_2 margins_3 margins_4
eststo outcome2: appendmodels margins_5 margins_6 margins_7 margins_8
eststo outcome3: appendmodels margins_9 margins_10 margins_11 margins_12

esttab outcome1 outcome2 outcome3 using "C:\Users\.rtf", se(%4.3f) b(%4.3f) mtitles("x" "y" "z") stats(N chi2, labels("N" "Chi-squared") fmt(%4.0f %4.2f)) star(* 0.10 ** 0.05 *** 0.01) varlabels(, elist(weight:_cons "{break}{hline u/width}")) nonotes addnotes("") compress replace
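
One hedged guess at the cause: appendmodels (Ben Jann's example program from the estout FAQ) rebuilds e(b) and e(V) for the combined model but, as far as I can tell, does not copy e() scalars such as e(N) and e(chi2), so stats() finds nothing to report. A sketch of saving the scalars after each probit and re-attaching them with estadd (part of the estout package) before calling esttab:

Code:
* right after each probit, save the scalars you want to report
quietly probit X1 $attitude $info $edu $country, vce(robust)
local N1    = e(N)
local chi21 = e(chi2)

* ... margins / appendmodels as above ...

* re-attach the scalars to the combined results so stats(N chi2) finds them
estadd scalar N    = `N1',    replace : outcome1
estadd scalar chi2 = `chi21', replace : outcome1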



P Chart

Hi!

I would like to create a p chart that recalculates its limits at a designated point; I'm attaching an image to illustrate. I realize the image wasn't created by Stata, but this feels like something Stata should be able to do.
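
Stata does ship quality-control charts (pchart, in [R] qc), but as far as I can tell pchart has no option to recalculate the limits partway through the series. A minimal twoway sketch under assumed variable names (defects = nonconforming count, n = sample size, t = period), recalculating the limits from period 13 on:

Code:
* phase-specific centre line and 3-sigma limits, drawn by hand
gen byte phase = t >= 13        // 13 = designated recalculation point (assumed)
gen double p = defects/n
bysort phase: egen double dtot = total(defects)
bysort phase: egen double ntot = total(n)
gen double pbar = dtot/ntot
gen double ucl = pbar + 3*sqrt(pbar*(1 - pbar)/n)
gen double lcl = max(0, pbar - 3*sqrt(pbar*(1 - pbar)/n))
sort t
twoway (connected p t) (line pbar t) (line ucl t) (line lcl t), ///
    legend(order(1 "proportion" 2 "centre line" 3 "UCL" 4 "LCL"))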

Thanks


Obtaining current value of by-group variable

Within a byable program, I'd like to use the current value of the by-group variable, assuming the simplest situation of a byable program with just one by-variable. I would have thought this would be stored in some macro, but I didn't find it in the documentation. The best I can come up with is the following, which seems clumsy:
Code:
program test, byable(recall)
    marksample touse
    levelsof `_byvars' if `touse', local(byval)
    ... now do whatever with `byval'
end
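
For what it's worth, a hedged alternative that avoids building the full list of levels, assuming a single numeric by-variable (inside a byable program the local _byvars holds the by-variable's name):

Code:
program test2, byable(recall)
    marksample touse
    quietly count if `touse'
    if r(N) == 0 exit
    summarize `_byvars' if `touse', meanonly
    local byval = r(min)    // all in-sample values are identical within a by-group
    display as txt "current by-group value: `byval'"
end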


xtpmg in Stata/MP 18.0

Why doesn't this command work?

Code:
xtpmg d.lnci d(1/2).lnci d.lnreer d.ir d.inf d.lngdp dummy_time if quarter > tq(2015q1), ec(ec) lr(l.lnci lnreer ir inf lngdp) mg replace
invalid new variable name;
variable name ec is in the list of predictors
r(110);

With pmg and dfe estimation it ran fine; it fails only when estimating mg.
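
A hedged guess: ec(ec) asks xtpmg to create a new variable named ec, and a variable by that name may already be in memory from the earlier pmg/dfe runs, so the mg estimator finds it among the variables it is about to use. Dropping it first (or choosing a fresh name) may get around the error:

Code:
capture drop ec
xtpmg d.lnci d(1/2).lnci d.lnreer d.ir d.inf d.lngdp dummy_time ///
    if quarter > tq(2015q1), ec(ec) lr(l.lnci lnreer ir inf lngdp) mg replace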

Stata: Handling Missing Data & Winsorizing Large Variables

I have data on companies that report various employment and financial metrics. The dataset includes:
  • Staffing Information:
    • Total staff
    • Part-time staff
    • Full-time staff
    • Female staff
  • Financial Metrics:
    • Monthly revenue
    • Annual revenue
    • Annual profit
    • Revenue aspirations for the next year
    • Profit aspirations for the next year
  • Customer Information:
    • Current number of customers
    • Expected number of customers in the next year
Missing Data Challenge

As expected, some firms have missing values across these variables. For example, a firm might report total staff but not specify female staff. Similarly, financial and customer-related variables have missing entries.

Imputation & Winsorization Approach

To handle missing values:
  1. Imputation Using Sector-Specific Medians:
    • Instead of imputing using percentiles across all firms, I now replace missing values with the median within the same sector to ensure more realistic figures.
    • This approach significantly reduces unrealistic observations.
  2. Winsorization for Large Values:
    • For large numerical variables (e.g., revenue, profit, customers), I winsorize at the 2.5th and 97.5th percentiles to cap extreme values while preserving overall trends.
    • This is done after imputation to avoid distorting the sector-based replacements (a sketch of the full pipeline follows this list).
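
For reference, a minimal sketch of the pipeline just described, with hypothetical variable names; winsor2 is community-contributed (ssc install winsor2):

Code:
* 1) sector-specific median imputation
foreach v of varlist total_staff female_staff revenue_annual profit_annual customers {
    bysort sector: egen double med_`v' = median(`v')
    replace `v' = med_`v' if missing(`v')
    drop med_`v'
}

* 2) winsorize the large variables at the 2.5th/97.5th percentiles
winsor2 revenue_annual profit_annual customers, cuts(2.5 97.5) replace

* 3) flag remaining logical inconsistencies for manual review
gen byte flag_staff = female_staff > total_staff ///
    if !missing(female_staff, total_staff)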
Remaining Issues

Initially, I imputed missing values using a random number between the 25th and 75th percentiles, ignoring sector differences. This produced many illogical observations (e.g., firms with unrealistically high or low staff/customer numbers).

After switching to sector-specific median imputation, the number of unrealistic observations dropped significantly, but I still have about 3 problematic cases per variable. These include:
  • Firms reporting more female staff than total staff
  • Unrealistic revenue-to-customer ratios
  • Firms with profit aspirations far below current profits
Question

Given my current approach (sector-specific median imputation before winsorizing), what additional steps can I take to resolve the remaining inconsistencies? Would a different method (e.g., regression imputation) help?

Would appreciate any insights from the community!

Can I save the last graph's legend as a local?

Dear Stata users,

Suppose I'm producing two graphs, say plot1 and plot2. I want to suppress plot2's default legend and use plot1's legend instead. So, can I save the legend of plot1 as a local? The code would be something like:
Code:
twoway scatter weight price mpg, sort
local alegend ...
twoway whatever..., legend(`alegend')
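
As far as I know a finished graph's legend cannot be captured after the fact, but the legend() suboptions themselves can live in a local that both graphs share, which may get you the same effect. A hedged sketch (labels made up):

Code:
local alegend order(1 "Weight" 2 "Price") rows(1) position(6)
twoway scatter weight price mpg, sort legend(`alegend')
twoway line weight price mpg, sort legend(`alegend')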

Project / File management, workflow

I am trying to use cdsave and related programs from the workflow package at https://jslsoc.sitehost.iu.edu/stata but I get the error message:

Code:
package name:  workflow.pkg
        from:  https://jslsoc.sitehost.iu.edu/stata/

checking workflow consistency and verifying not already installed...
file https://jslsoc.sitehost.iu.edu/stata/workflow/cddrop1.ado not found
could not copy https://jslsoc.sitehost.iu.edu/stata/workflow/cddrop1.ado
(no action taken)
How can I install these programs - or is there a more 'modern' alternative?
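
A hedged workaround while the package manifest is broken: net install aborts as soon as any file listed in workflow.pkg is missing from the server, but files that do exist can still be fetched one at a time with copy (the file name and target path below are assumptions; adjust to what you need). As for a more modern alternative, the community-contributed project command (Robert Picard, SSC) covers similar project-management ground:

Code:
* fetch a single ado-file directly into the PLUS directory
copy "https://jslsoc.sitehost.iu.edu/stata/workflow/cdsave.ado" ///
    "`c(sysdir_plus)'c/cdsave.ado"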

Eddy

character limitations of "view browse" command

The Stata command

Code:
view browse "http://statalist.org"
opens the given URL in the operating system's default web browser.

However, when the given URL is longer than 246 characters, Stata (version 18.0) does nothing and produces no error message.

Code:
view browse "http://statalist.org/sssssssssss/sssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssss"
Putting part of the URL in a local, and expanding that local in the view browse line, doesn't fix the problem.

Does anyone know how to fix this? Is this a Stata issue (intended or unintended), or a limitation of the operating system (Windows 11) or browser (Firefox)?

Background: I am using an ado-file that retrieves values from a dataset and adds them as parameters to a URL.
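
In case it helps while the limit stands, a hedged Windows-only workaround is to hand the URL to the operating system yourself instead of going through view browse (the URL below is a placeholder):

Code:
local url "http://statalist.org/some/very/long/url"
* rundll32's FileProtocolHandler opens the URL in the default browser
winexec rundll32 url.dll,FileProtocolHandler `url'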

Stata output with trace on (first for the short URL, then for the long one):

Code:
. view browse "http://statalist.org/ssssssssssssssssssss"

------------------------------------------------------ begin _view_helper ---
- version 12
- syntax [anything(everything)] [, noNew name(name) *]
- if (index(`"`anything'"', "|") == 0) {
= if (index(`"browse "http://statalist.org""', "|") == 0) {
- if ("`new'" == "" | "`new'"=="new") & "`name'" == "" {
= if ("" == "" | ""=="new") & "" == "" {
- local name _new
- }
- if ("`new'" == "nonew") & "`name'" == "" {
= if ("" == "nonew") & "_new" == "" {
local name _nonew
}
- if "`name'" != "" {
= if "_new" != "" {
- local suffix "##|`name'"
= local suffix "##|_new"
- }
- }
- if `"`anything'"' == "" {
= if `"browse "http://statalist.org""' == "" {
local anything "help contents"
}
- if `"`options'"' == "" {
= if `""' == "" {
- _view `anything'`suffix'
= _view browse "http://statalist.org"##|_new
- }
- else {
_view `anything', `options' `suffix'
}

. view browse "http://statalist.org/sssss[...long run of s's...]sssss"

------------------------------------------------------ begin _view_helper ---
  (the trace is identical to the one above, with the long URL substituted
   throughout; it ends by handing the full string to the viewer:)
= _view browse "http://statalist.org/sssss[...]sssss"##|_new
-------------------------------------------------------- end _view_helper ---

Merge from a long format to a wide format

I have two datasets. One dataset has an individual identifier, and some of the variables in that dataset point to an identifier in the other dataset. I want to bring the entries from the second dataset in as variables in the first dataset. Here are the data examples:

DATASET 1:
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input str15(prim_key hhid) byte stateid int ssuid byte(fs203_coreside_child1 fs203_coreside_child2 fs203_coreside_child3 fs203_coreside_child4 fs203_coreside_child5 fs203_coreside_child6)
"101000100040101" "101000100040100" 1 1 .e .e .e  .  .  .
"101000100040102" "101000100040100" 1 1 .e .e .e  .  .  .
"101000100130101" "101000100130100" 1 1  3 .e .e .e  6  7
"101000100130102" "101000100130100" 1 1  3 .e .e .e  6  7
"101000100130109" "101000100130100" 1 1 .e  .  .  .  .  .
"101000100250106" "101000100250100" 1 1 .e .e .e .e  1  .
"101000100320101" "101000100320100" 1 1 .e  3  6  7  8  .
"101000100320102" "101000100320100" 1 1 .e  3  6  7  8  .
"101000100320109" "101000100320100" 1 1  1 .e .e .e  .  .
"101000100370102" "101000100370100" 1 1  3  4  5  .  .  .
"101000100370111" "101000100370100" 1 1 .e  1 .e  6  .  .
"101000100590101" "101000100590100" 1 1 .e .e  3 .e  4  8
"101000100590102" "101000100590100" 1 1 .e .e  3 .e  4  8
"101000100760101" "101000100760100" 1 1  4  5  6  .  .  .
"101000100760102" "101000100760100" 1 1 .e  3 .e  4  5  6
"101000101040101" "101000101040100" 1 1 .e .e .e .e  3 .e
"101000101040102" "101000101040100" 1 1 .e .e .e .e  3 .e
"101000101330101" "101000101330100" 1 1  .  .  .  .  .  .
"101000101720101" "101000101720100" 1 1 .e  2 .e  5  6  7
"101000101890107" "101000101890100" 1 1 .e .e  1  6 .e  .
"101000102490101" "101000102490100" 1 1 .e .e  3 .e  .  .
"101000102490102" "101000102490100" 1 1 .e .e  3 .e  .  .
"101000103350101" "101000103350100" 1 1 .e  3  4  5  6  7
"101000103350102" "101000103350100" 1 1 .e  3  4  5  6  7
"101000103550101" "101000103550100" 1 1  2 .e  .  .  .  .
"101000103550102" "101000103550100" 1 1  3  5  6  7  8  .
"101000103690101" "101000103690100" 1 1 .e  3 .e .e  .  .
"101000103690102" "101000103690100" 1 1 .e  3 .e .e  .  .
"101000104030101" "101000104030100" 1 1  .  .  .  .  .  .
"101000104030102" "101000104030100" 1 1  3 .e  4  5  6  .
"101000104240101" "101000104240100" 1 1  3 .e .e  8  .  .
"101000104240102" "101000104240100" 1 1  3 .e .e  8  .  .
"101000200050101" "101000200050100" 1 2 .e .e .e  3 .e .e
"101000200050102" "101000200050100" 1 2 .e .e .e  3 .e .e
"101000200160109" "101000200160100" 1 2  2  .  .  .  .  .
"101000200210101" "101000200210100" 1 2 .e .e .e  3 .e  4
"101000200210102" "101000200210100" 1 2 .e .e .e  3 .e  4
"101000200210108" "101000200210100" 1 2 .e .e  1 .e .e .e
"101000200270101" "101000200270100" 1 2  2  3  4  5  6  .
"101000200290108" "101000200290100" 1 2 .e .e .e  1 .e .e
"101000200290109" "101000200290100" 1 2  .  .  .  .  .  .
"101000200340101" "101000200340100" 1 2  3  4  .  .  .  .
"101000200340102" "101000200340100" 1 2  3  4  .  .  .  .
"101000200470101" "101000200470100" 1 2 .e .e .e .e  3 .e
"101000200470102" "101000200470100" 1 2 .e .e .e .e .e  3
"101000200740101" "101000200740100" 1 2  2  .  .  .  .  .
"101000200860102" "101000200860100" 1 2  3  4  5  6  7  8
"101000201190106" "101000201190100" 1 2 .e .e .e .e .e .e
"101000201360101" "101000201360100" 1 2 .e .e .e .e .e  3
"101000201360102" "101000201360100" 1 2 .e .e .e .e .e  3
"101000201360103" "101000201360100" 1 2  .  .  .  .  .  .
"101000201560108" "101000201560100" 1 2 .e .e .e .e .e  1
"101000201670110" "101000201670100" 1 2  .  .  .  .  .  .
"101000202130102" "101000202130100" 1 2 .e .e  3  4  5  6
"101000202440101" "101000202440100" 1 2 .e .e .e  3  .  .
"101000202440102" "101000202440100" 1 2 .e .e .e  3  .  .
"101000203010101" "101000203010100" 1 2 .e .e  3 .e  5 .e
"101000203010102" "101000203010100" 1 2 .e .e  3 .e  5 .e
"101000203460101" "101000203460100" 1 2 .e .e  3  4  5  .
"101000203460102" "101000203460100" 1 2  .  .  .  .  .  .
"101000300030201" "101000300030200" 1 3 .e .e .e  4  5  .
"101000300030202" "101000300030200" 1 3 .e  3 .e  4  5  .
"101000300070105" "101000300070100" 1 3 .e .e  2 .e  .  .
"101000300140106" "101000300140100" 1 3 .e .e  1 .e .e .e
"101000300220101" "101000300220100" 1 3  6  2 .e  3  4  5
"101000300260101" "101000300260100" 1 3  3  4  5  6  7  8
"101000300260102" "101000300260100" 1 3  3  4  5  6  7  8
"101000300380205" "101000300380200" 1 3 .e .e .e .e .e  1
"101000300550105" "101000300550100" 1 3 .e .e .e  .  .  .
"101000300920101" "101000300920100" 1 3 .e  3  4  5  .  .
"101000300920102" "101000300920100" 1 3 .e  3  4  5  .  .
"101000301080101" "101000301080100" 1 3 .e  3  .  .  .  .
"101000301080102" "101000301080100" 1 3 .e  3  .  .  .  .
"101000301450201" "101000301450200" 1 3 .e .e .e .e  3  .
"101000301450202" "101000301450200" 1 3 .e .e .e .e  3  .
"101000301720106" "101000301720100" 1 3 .e .e .e  1  .  .
"101000301840101" "101000301840100" 1 3 .e .e  3  4 .e  .
"101000301840102" "101000301840100" 1 3 .e .e .e .e .e  .
"101000302140101" "101000302140100" 1 3 .e  3  4  .  .  .
"101000302140102" "101000302140100" 1 3 .e  3  4  .  .  .
"101000302250106" "101000302250100" 1 3  .  .  .  .  .  .
"101000302720104" "101000302720100" 1 3 .e  1 .e .e  .  .
"101000303240201" "101000303240200" 1 3 .e .e  2  3  .  .
"101000400040101" "101000400040100" 1 4 .e .e  3 .e  .  .
"101000400040102" "101000400040100" 1 4 .e .e  3 .e  .  .
"101000400070107" "101000400070100" 1 4 .e .e .e  1 .e .e
"101000400070108" "101000400070100" 1 4 .e .e .e .e .e .e
"101000400120101" "101000400120100" 1 4 .e  3  6  7  8  .
"101000400120102" "101000400120100" 1 4 .e  3  6  7  8  .
"101000400120109" "101000400120100" 1 4  1 .e  .  .  .  .
"101000400140101" "101000400140100" 1 4  3 .e .e  5  6 .e
"101000400140102" "101000400140100" 1 4  3 .e .e  5  6 .e
"101000400230102" "101000400230100" 1 4  3 .e  8  9 10  .
"101000400230111" "101000400230100" 1 4  1 .e .e .e .e .e
"101000400430201" "101000400430200" 1 4 .e .e .e .e  3  5
"101000400430202" "101000400430200" 1 4 .e .e .e .e  3  5
"101000400610101" "101000400610100" 1 4  3  9 .e 11  .  .
"101000400610102" "101000400610100" 1 4  3 .e  9 11  .  .
"101000400930101" "101000400930100" 1 4 .e  3  4  5  6  7
"101000400930102" "101000400930100" 1 4 .e  3  4  5  6  7
end
label values stateid stateid_cv
label def stateid_cv 1 "1 Jammu and Kashmir", modify
label values fs203_coreside_child1 _vl8884_ind
label values fs203_coreside_child2 _vl8885_ind
label values fs203_coreside_child3 _vl8886_ind
label values fs203_coreside_child4 _vl8887_ind
label values fs203_coreside_child5 _vl8888_ind
label values fs203_coreside_child6 _vl8889_ind

DATASET 2:

Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input str15(prim_key hhid) byte(cv007 cv008 cv009 cv010)
"101000100040201" "101000100040200" 3 1  8 3
"101000100040202" "101000100040200" 3 1  9 3
"101000100040203" "101000100040200" 4 1  0 .
"101000100040204" "101000100040200" . .  . .
"101000100040205" "101000100040200" . .  . .
"101000100130103" "101000100130100" 3 1 15 7
"101000100130104" "101000100130100" 3 1 12 5
"101000100130105" "101000100130100" . .  . .
"101000100130106" "101000100130100" 3 1 12 5 
"101000100130107" "101000100130100" 3 1 12 5
"101000100130108" "101000100130100" 3 1  9 3
"101000100250101" "101000100250100" 3 1 10 4
"101000100250102" "101000100250100" 3 1 15 7
"101000100250103" "101000100250100" 4 1  0 .
"101000100250104" "101000100250100" . .  . .
"101000100250105" "101000100250100" . .  . .
"101000100320103" "101000100320100" 3 1 12 5
"101000100320104" "101000100320100" 3 1  9 3
"101000100320105" "101000100320100" . .  . .
"101000100320106" "101000100320100" 3 1 11 4
"101000100320107" "101000100320100" 3 1 10 4
"101000100320108" "101000100320100" 3 1 12 4
"101000100370101" "101000100370100" 3 1  1 5
"101000100370103" "101000100370100" 3 1 17 8
"101000100370104" "101000100370100" 3 1 16 5
"101000100370105" "101000100370100" 3 1 11 4
"101000100370106" "101000100370100" 3 1 15 7
"101000100370107" "101000100370100" 3 1 15 5
"101000100370108" "101000100370100" 4 1  1 1
"101000100370109" "101000100370100" . .  . .
"101000100370110" "101000100370100" . .  . .
"101000100590103" "101000100590100" 4 2  . .
"101000100590104" "101000100590100" 3 1  8 3
"101000100590105" "101000100590100" 3 1  6 2
"101000100590106" "101000100590100" . .  . .
"101000100590107" "101000100590100" . .  . .
"101000100590108" "101000100590100" 4 2  . .
"101000100760103" "101000100760100" 3 1  9 3
"101000100760104" "101000100760100" 3 1  9 3
"101000100760105" "101000100760100" 3 1  4 1
"101000100760106" "101000100760100" 3 1  6 2
"101000100760107" "101000100760100" 4 2  . .
"101000101040103" "101000101040100" 3 1 10 4
"101000101040104" "101000101040100" 3 1 10 3
"101000101040105" "101000101040100" 3 1  8 3
"101000101040106" "101000101040100" 3 1  7 2
"101000101040107" "101000101040100" 3 1  4 2
"101000101040108" "101000101040100" 3 1  2 1
"101000101040109" "101000101040100" . .  . .
"101000101040110" "101000101040100" . .  . .
"101000101190201" "101000101190200" 3 1  9 3
"101000101190202" "101000101190200" 3 1 12 5
"101000101190203" "101000101190200" . .  . .
"101000101190204" "101000101190200" . .  . .
"101000101330102" "101000101330100" 4 2  . .
"101000101330103" "101000101330100" 3 1  9 3
"101000101330104" "101000101330100" 3 1  4 1
"101000101330105" "101000101330100" 1 1  1 1
"101000101330106" "101000101330100" . .  . .
"101000101490101" "101000101490100" 3 1 10 4
"101000101490102" "101000101490100" 3 1  8 3
"101000101490103" "101000101490100" 3 1  6 2
"101000101490104" "101000101490100" 3 1  4 1
"101000101490105" "101000101490100" 3 1  4 1
"101000101720102" "101000101720100" 3 1 10 4
"101000101720103" "101000101720100" 3 1 10 4
"101000101720104" "101000101720100" . .  . .
"101000101720105" "101000101720100" 3 1  8 3
"101000101720106" "101000101720100" 3 1  8 3
"101000101720107" "101000101720100" 3 1  9 3
"101000101720108" "101000101720100" 3 1  8 3
"101000101890101" "101000101890100" 3 1 17 9
"101000101890102" "101000101890100" 3 1 13 5
"101000101890103" "101000101890100" 3 1  2 1
"101000101890104" "101000101890100" 4 1  0 .
"101000101890105" "101000101890100" . .  . .
"101000101890106" "101000101890100" 4 2  . .
"101000102090101" "101000102090100" 3 1 10 4
"101000102090102" "101000102090100" 3 1 10 4
"101000102090103" "101000102090100" 4 1  2 1
"101000102090104" "101000102090100" 4 1  1 1
"101000102090105" "101000102090100" . .  . .
"101000102210101" "101000102210100" 3 1  9 3
"101000102210102" "101000102210100" 3 1  9 3
"101000102210103" "101000102210100" . .  . .
"101000102210104" "101000102210100" . .  . .
"101000102210105" "101000102210100" . .  . .
"101000102490103" "101000102490100" 3 1 11 4
"101000102660101" "101000102660100" 3 1 10 3
"101000102660102" "101000102660100" 3 1 10 3
"101000102660103" "101000102660100" 4 1  1 1
"101000102660104" "101000102660100" . .  . .
"101000102660105" "101000102660100" . .  . .
"101000102870201" "101000102870200" 3 1 15 8
"101000102870202" "101000102870200" 3 1 12 5
"101000102870203" "101000102870200" 4 1  1 1
"101000102870204" "101000102870200" . .  . .
"101000102870205" "101000102870200" . .  . .
"101000102970201" "101000102970200" 3 1  9 3
"101000102970202" "101000102970200" 3 1 10 4
end
label values cv007 lasi_vl216_cv
label def lasi_vl216_cv 1 "1 Can read only", modify
label def lasi_vl216_cv 3 "3 Can both read and write", modify
label def lasi_vl216_cv 4 "4 Cannot read or write", modify
label values cv008 lasi_vl251_cv
label def lasi_vl251_cv 1 "1 Yes", modify
label def lasi_vl251_cv 2 "2 No", modify
label values cv010 lasi_vl321_cv
label def lasi_vl321_cv 1 "1 Less than Primary school (Standard 1-4)", modify
label def lasi_vl321_cv 2 "2 Primary school completed (Standard 5-7)", modify
label def lasi_vl321_cv 3 "3 Middle school completed (Standard 8- 9)", modify
label def lasi_vl321_cv 4 "4 Secondary school/Matriculation completed", modify
label def lasi_vl321_cv 5 "5 Higher secondary/intermediate/senior secondary completed", modify
label def lasi_vl321_cv 7 "7 Graduate degree (B.A., B.Sc., B. Com.) completed", modify
label def lasi_vl321_cv 8 "8 Post-graduate degree or (M.A., M.Sc., M. Com.) above (M.Phil, Ph.D., Post-Doc) completed", modify
label def lasi_vl321_cv 9 "9 Professional course/degree (B.Ed, BE, B.Tech, MBBS, BHMS, BAMS, B.Pharm, BCS, BCA, BBA, LLB) (BVSc., B. Arch, M.Ed, ME, M.Tech, MD, M.Pharm, MCS, MCA, MBA,LLM, MVSc., M. Arch, MS, CA, CS, CWA) completed", modify


So the fs203_coreside_child* variables in the first dataset give me the last digits of a prim_key in the second dataset. I want the cv007 values as separate variables for the different prim_keys that the coreside variables point to. For example, prim_key 101000100130101 has fs203_coreside_child5 = 6 in the first dataset, meaning that 101000100130106 is my target observation in the second dataset, and I want the cv007 value of that observation in a new variable (say cv007_child5). Similarly, the same prim_key 101000100130101 has fs203_coreside_child6 = 7, so 101000100130107 in the second dataset is the target observation, and its cv007 value goes into another variable (say cv007_child6). This needs to be done for all prim_keys in the first dataset.

I guess there is a reshape problem here, but I am clueless about how to approach it. Any help would be appreciated.
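
For what it's worth, a hedged sketch of one way to do this without reshaping: rebuild each child's prim_key as the first 13 characters of hhid plus the two-digit person number in fs203_coreside_child*, then merge cv007 in once per child slot. The file names dataset1/dataset2 are placeholders, and it assumes the second dataset is saved with prim_key and cv007:

Code:
use dataset1, clear
forvalues k = 1/6 {
    * person number -> the child's 15-character prim_key
    gen str15 key`k' = substr(hhid, 1, 13) ///
        + string(fs203_coreside_child`k', "%02.0f") ///
        if fs203_coreside_child`k' < .      // skips ., .e, and other missings
    rename prim_key prim_key_main
    rename key`k' prim_key
    merge m:1 prim_key using dataset2, keepusing(cv007) ///
        keep(master match) nogenerate
    rename cv007 cv007_child`k'
    rename prim_key prim_key_child`k'
    rename prim_key_main prim_key
}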

Using IV

Hi,

I need some advice on using instrumental variables (IV). In my case, I have three first-stage regressions using probit models, and the second-stage regression is an OLS.

Could you please guide me on how to perform this correctly?

Thank you!
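
For context, one textbook route for a single binary endogenous regressor is Wooldridge's fitted-probabilities-as-instruments procedure; a hedged sketch with placeholder names (d = endogenous dummy, z = instrument, y = outcome), not a full recipe for three first stages:

Code:
* step 1: probit of the endogenous dummy on the instrument and controls
probit d z x1 x2
predict double dhat, pr
* step 2: 2SLS using the fitted probability as the instrument
ivregress 2sls y x1 x2 (d = dhat), vce(robust)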

Count frequency of two observations being the same and equal to a certain value

Hello, I am trying to count how often two people who are paired up both choose the number 6 in q.
In the example below, subject 73 and subject 116 are paired up for 10 rounds. I generated variable y to indicate that they are paired up (one value per pair-round). I would like to find the number of times they both chose 6 in variable q, for example in rounds 2, 3, 7, 8, and 9. How can I generate a dummy variable indicating that this is the case, please?

Many thanks!

Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input float id byte(round id_in_group_cournot q) int y
73 1 1 7 1
116 1 2 6 1
73 2 1 6 2
116 2 2 6 2
73 3 1 6 3
116 3 2 6 3
73 4 1 6 4
116 4 2 7 4
73 5 1 7 5
116 5 2 6 5
73 6 1 6 6
116 6 2 7 6
73 7 1 6 7
116 7 2 6 7
73 8 1 6 8
116 8 2 6 8
73 9 1 6 9
116 9 2 6 9
73 10 1 7 10
116 10 2 8 10
end
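
A hedged sketch, assuming y uniquely identifies a pair-round (as it appears to in the example):

Code:
* 1 on both rows of a pair-round in which both members chose 6
bysort y: egen byte both6 = min(q == 6)
* count each pair-round once, not once per member
egen byte tagged = tag(y)
count if both6 & tagged

In the example this returns 5 (rounds 2, 3, 7, 8, and 9).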

Lag length selection for panel unit root tests (unbalanced)

Hi all. I am testing for unit roots in my unbalanced panel using:

Code:
 
 xtunitroot fisher varname, dfuller lags(0)
I was wondering if anyone has advice on how to choose the lag length. If it helps, my T is 30 and N is 17. Thank you.
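
One hedged note: fisher needs a fixed lag order, but xtunitroot ips can choose the lag length per panel by information criterion up to a cap, which is one common way to let the data decide (worth checking that the IPS variant is appropriate for your unbalanced panel):

Code:
xtunitroot ips varname, lags(aic 4)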

Instrumental Variables Decomposition

Hello Statalisters,

I need clarification regarding an IV decomposition model. I have separately estimated an instrumental variables (IV) regression, and now I want to decompose the results using the Oaxaca-Blinder technique to examine wage differentials between men and women.

To estimate the IV decomposition, I first obtained the predicted values from the IV regression and used school_hat in the Oaxaca-Blinder decomposition, as follows:

Code:
ivregress 2sls logwage (school = ube_north ube_south) age agesq p_educ educ_qual female, vce(cluster hhid) first
predict school_hat, xb
oaxaca logwage school_hat age agesq p_educ educ_qual, by(female) vce(cluster hhid)

However, I have concerns regarding the sample size. I expected it to remain the same as in the original IV estimation (5,400 observations), but the IV decomposition appears to use the full sample (8,000 observations). I suspect this discrepancy arises because I grouped the Oaxaca-Blinder model by female, which is an explanatory variable in the original IV model. Could this be introducing additional observations that were not included in the IV estimation?

Despite this, the results appear statistically and economically reasonable. However, I am concerned about the sample size. Am I missing something in how the sample is selected for the Oaxaca-Blinder decomposition after IV estimation? Any insight into why this is happening, and whether it affects the validity of my results, would be greatly appreciated.
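
A hedged diagnosis: predict computes school_hat for every observation with nonmissing covariates, not just for the IV estimation sample, so oaxaca sees more observations than ivregress used. Restricting the decomposition to e(sample) should reproduce the 5,400:

Code:
ivregress 2sls logwage (school = ube_north ube_south) age agesq p_educ educ_qual female, vce(cluster hhid)
gen byte insample = e(sample)
predict school_hat, xb
oaxaca logwage school_hat age agesq p_educ educ_qual if insample, by(female) vce(cluster hhid)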

Many Thanks

Loss of observations when using eregress/heckman but not when using probit - differences in first stage

Hi all,

I've been searching this forum furiously and consulting the Stata manuals, but cannot find an answer to this question. Something happening under the hood of heckman/eregress in my selection model is causing a loss of observations, and I cannot figure out what it is.

The problem in a nutshell: I have a Heckman selection model that I can replicate in eregress with a selection equation. The selection equation models instances of political violence, i.e., whether political violence occurs in a place or not, so Violence = a set of covariates x1-x8. This selects the instances of political violence for the second stage, where I put them against a financial indicator.

When I do this, the probit uses 985 observations, which results in a selected n of about 231.

HOWEVER, when I do the Heckman "by hand," running the exact same probit (Violence = x1-x8), I get 104 more observations. This of course changes the second stage considerably when I run the OLS by hand.

I have pared down the covariates to the absolute minimum; there is still a loss of observations between heckman/eregress and probit.

I have summarized the variables, and they all have similar availability.

I have tried everything I can think of but cannot figure out what heckman/eregress does under the hood to consistently drop 104 observations that probit retains. Is there any diagnostic I can run (I've already studied the Heckman first stage to death) to figure out which observations are dropped and why?
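
One diagnostic that may help, sketched with placeholder names (violence, findicator, id): save e(sample) from both commands and list the observations where they disagree.

Code:
probit violence x1-x8
gen byte in_probit = e(sample)
heckman findicator x1 x2 x3, select(violence = x1-x8)
gen byte in_heckman = e(sample)
tabulate in_probit in_heckman
* inspect the observations that probit keeps but heckman drops
list id x1-x8 findicator if in_probit & !in_heckman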

Thanks!!!

ib. operator won't omit correct category

I am trying to run a difference-in-differences regression and see the coefficients relative to the control group before the time of treatment. The DID has three categories: RGGI, Leaker, and Control.

Code:
reghdfe log_netgen b1.category_num#i.after_RGGI, absorb(plantstate obsyear) vce(cluster plantstate)

where category_num represents a type of state (RGGI, Leaker, or Control) and after_RGGI is a dummy variable equal to 1 when the date in the data is after 2009.

My aim is to see the coefficients for Leaker#1 and RGGI#1, so I specified b1 as the base category, since 1 = Control in my category_num variable. Stata gives the following output:

Code:
HDFE Linear regression                           Number of obs   =    183,543
Absorbing 2 HDFE groups                          F(   2,     50) =      13.59
Statistics robust to heteroskedasticity         Prob > F        =     0.0000
                                                 R-squared       =     0.0364
                                                 Adj R-squared   =     0.0360
                                                 Within R-sq.    =     0.0011
Number of clusters (plantstate_num) = 51         Root MSE        =     3.1621

                       (Std. err. adjusted for 51 clusters in plantstate_num)
-----------------------------------------------------------------------------
                            |             Robust
                 log_netgen | Coefficient std. err.   t     P>|t|  [95% conf. interval]
----------------------------+------------------------------------------------
category_numeric#after_RGGI |
                  Control#1 |  .6075423   .1186053   5.12   0.000   .3693165   .8457681
                   Leaker#0 |  -.697489   .2695983  -2.59   0.013  -1.238993  -.1559848
                   Leaker#1 |         0  (omitted)
                     RGGI#0 |         0  (omitted)
                     RGGI#1 |         0  (omitted)
                            |
                      _cons |  9.601487   .0604883 158.73   0.000   9.479993   9.722982
-----------------------------------------------------------------------------

Why is the regression omitting Leaker#1, RGGI#0, and RGGI#1 instead of omitting the Control and 0 cells (since i.after_RGGI would usually make 0 the base category)?

Basically, how can I make my regression output give me RGGI#1 and Leaker#1, omitting all other combinations? Thank you!
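
A hedged reading of the output: with state and year fixed effects absorbed, only (3-1)x(2-1) = 2 interaction cells are separately identified, and which cells get dropped is essentially arbitrary. Spelling out the two treatment-by-post dummies forces exactly the layout you want (the numeric codes for Leaker and RGGI are assumptions; check your value labels):

Code:
gen byte leaker_post = (category_num == 2) * after_RGGI   // 2 = Leaker (assumed)
gen byte rggi_post   = (category_num == 3) * after_RGGI   // 3 = RGGI (assumed)
reghdfe log_netgen leaker_post rggi_post, absorb(plantstate obsyear) vce(cluster plantstate)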

Breusch-Pagan test for heteroskedasticity

What is the long way to do the Breusch-Pagan test for heteroskedasticity?
I know of "hettest", but I've seen there's another way that involves obtaining the squared residuals after estimating the model by OLS (I don't know what comes after that).
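
For reference, a sketch of that textbook version: regress the squared OLS residuals on the regressors and form the LM statistic, N times the R-squared of the auxiliary regression, which is chi-squared with as many degrees of freedom as there are regressors (variable names are placeholders):

Code:
regress y x1 x2 x3
predict double uhat, residuals
gen double uhat2 = uhat^2
regress uhat2 x1 x2 x3
scalar LM = e(N) * e(r2)
display "LM = " LM "   p-value = " chi2tail(e(df_m), LM)

(hettest's default variant uses the fitted values of y instead of the full regressor list; hettest, rhs matches the version above.)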

Any help is appreciated, thanks.

Selection bias

Hi everyone,

I'm a beginner in econometrics, working on a study of the association between CO₂ exposure and infant mortality using a pooled cross-sectional dataset. Please tell me if my question is flawed.

For my first study, I run a regression using reghdfe with fixed effects (assuming the exogeneity of CO₂ exposure) to estimate infant mortality. This stage includes several controls and fixed effects (e.g., country and year) to account for unobserved heterogeneity. I get significant results here.

For my second study, I regress children's weight-for-age on CO₂ exposure. Here, I assume that the unobservables (the error term) from the infant-mortality regression drive the selection of children (i.e., only survivors are observed in the second stage). As I understand it, by conditioning on survival, the sample for the second study is selected in a non-random manner.

I attempted to address this using a Heckman selection model, but I'm finding it extremely difficult to construct a valid instrument that affects survival (the selection process) without directly affecting weight-for-age.

Are there alternative methods or strategies you can recommend to address or rationalize selection bias in this context, especially when a valid instrument for the Heckman model is hard to come by? Is there any way to mathematically model the bias and show how much my estimates shift because of it? I'd appreciate any insights, alternative suggestions, or relevant literature that could help me move forward.
Thanks in advance for your help!
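
On modelling the bias: under the joint-normality assumption of the textbook Heckman setup, the mean outcome among survivors is

E[w | x, survived = 1] = x'β + ρ σ_u λ(z'γ)

where λ(·) is the inverse Mills ratio from the survival equation. The second term is exactly the selection bias, so one instrument-free sensitivity check is to trace how the estimate of interest moves as ρ is varied over plausible values. Lee (2009) bounds are another instrument-free option; a community-contributed implementation exists (leebounds, by Tauchmann, on SSC), though whether its monotonicity assumption suits a survival-selection design is worth verifying.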

find previous nonmissing value by group

I have a dataset with: i) date, ii) firm, and iii) price. Sometimes price is missing. I need the previous price from the last time there was a (nonmissing) price, but I'm having difficulty locating it, as the windows of missing observations are random. I searched and found some related posts, but I could not get their code to work, and none matched my exact situation.

Code:
clear
input float date float firm float price
1 1 4
2 1 .
3 1 6
4 1 .
5 1 .
6 1 .
7 1 4
8 1 .
9 1 .
10 1 6
1 2 .
2 2 5
3 2 4
4 2 .
5 2 .
6 2 .
7 2 6
8 2 .
9 2 3
10 2 .
end
On the days when there is a price, I would like to include in the same row/observation the last previous nonmissing price, as well as the gap (in dates) between the current and the previous price, all of this within group (firm).

The solution would look like:

Code:
clear
input float date float firm float price float lastprice float dategap
1 1 4 . .
2 1 . . .
3 1 6 4 2
4 1 . . .
5 1 . . .
6 1 . . .
7 1 4 6 4
8 1 . . .
9 1 . . .
10 1 6 4 3
1 2 . . .
2 2 5 . .
3 2 4 5 1
4 2 . . .
5 2 . . .
6 2 . . .
7 2 6 4 4
8 2 . . .
9 2 3 6 2
10 2 . . .
end
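
A hedged sketch of one way to get there: carry the latest nonmissing price and its date forward within firm, then on days with a price read the previous row's carried values.

Code:
bysort firm (date): gen double cfprice = price
by firm: replace cfprice = cfprice[_n-1] if missing(price)
by firm: gen double cfdate = date if !missing(price)
by firm: replace cfdate = cfdate[_n-1] if missing(cfdate)
* on price days, the previous row holds the last earlier price and its date
by firm: gen double lastprice = cfprice[_n-1] if !missing(price)
by firm: gen dategap = date - cfdate[_n-1] if !missing(price)
drop cfprice cfdate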
Thank you in advance.
