Are marginal structural models used for repeated/panel data?

July 13, 2018, 7:45 pm

I've been studying this paper from the Stata Journal by Fewell et al. on implementing MSMs (https://ageconsearch.umn.edu/bitstre...art_st0075.pdf). I've only found reference material for marginal structural models in the context of time-to-event analyses. Most published studies in epidemiology and the social sciences seem to apply them in Cox or pooled logistic models.

What about panel / longitudinal repeated-measures data: can the principles of MSMs be applied to mixed models or GEE, and are there any examples of this in the literature?

↧

How to draw a graph like this?

July 13, 2018, 8:03 pm

Dear all, does anyone know how to draw a graph like this? [image not preserved]
However, no data are available at the moment.

↧

Trying to merge two datasets that have already been merged and getting error 101 (factor-variable and time-series operators not allowed)

July 13, 2018, 10:20 pm

Hello Everyone,

I am trying to merge two different datasets. Each dataset has two columns (example dataset 1: countryID and fdi_out_1970m.dta; example dataset 2: countryID and aid_out_1970m.dta).
The dataset aid_out_1970m.dta has 12 more variables than fdi_out_1970m.dta because it includes summaries for different world regions, whereas fdi_out_1970m.dta has only the country names and no regions.

I previously merged countryID with aid_out_1970 and fdi_out_1970 and got the results (fdi_out_1970m.dta and aid_out_1970m.dta). Now I am trying to merge aid_out_1970m.dta with fdi_out_1970m.dta, but every time I try to merge the two datasets I get the error "factor-variable and time-series operators not allowed".

Here is the code I tried:

use "/Users/an-nourazen-nabcompaore/Dropbox/AnNoura/build/code/STEP 2 AID MERGE/aid_out_1970m.dta", clear
generate id = _n
tempfile aid
use "/Users/an-nourazen-nabcompaore/Dropbox/AnNoura/build/code/STEP 2 FDI MERGE/fdi_out_1970m.dta", clear
generate id = _n
merge 1:1 aid_out_1970m.dta using fdi_out_1970m.dta

I also previously tried this code:

merge 1:1 "/Users/an-nourazen-nabcompaore/Dropbox/AnNoura/build/code/STEP 2 AID MERGE/aid_out_1970m.dta" using /Users/an-nourazen-nabcompaore/Dropbox/AnNoura/build/code/STEP 2 FDI MERGE/fdi_out_1970m.dta, clear

Any help would be more than appreciated! Thank you for your time.
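
A minimal sketch of one likely fix, assuming countryID uniquely identifies rows in both files. merge expects the key variable name(s) between 1:1 and using, so a file name in that position is parsed as a variable list, and the period in it is read as an operator, which would explain error 101. The toy values and the tempfile below are hypothetical stand-ins for the two .dta files.

```stata
* Toy stand-in for fdi_out_1970m.dta (values are made up)
clear
input str3 countryID float fdi_out_1970
"AAA" 1
"BBB" 2
end
tempfile fdi
save `fdi'

* Toy stand-in for aid_out_1970m.dta, with one extra region row
clear
input str3 countryID float aid_out_1970
"AAA" 10
"BBB" 20
"WLD" 30
end

* The key variable goes between 1:1 and using; the file goes after using
merge 1:1 countryID using `fdi'
tabulate _merge
```

With real files, the path after using should be wrapped in double quotes because it contains spaces. Region rows that exist only in the aid file show up with _merge == 1.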

↧

Identifying variables not present in multiple datasets and then creating them

July 13, 2018, 10:56 pm

Hi all,

I have 100 datasets, each with a similar set of variables var1 to var60. I would like to write a generic recode do file using a loop to recode var1 to var60 in each dataset to create new datasets with variables newvar1 to newvar5. I would then be able to append these 100 datasets together for my analysis. The loop is fine, but the problem is that some datasets are missing one of the original variables, so my generic recode do file does not run. e.g.

egen newvar1 = rowtotal(var1 var2 var3)

But if var2 is missing from one of the datasets, the code stops with the r(111) variable var2 not found error. I wanted to include in the loop a first step to identify variables that are not present, then create the variables with missing values in the original dataset, so that the code can run through. e.g.

gen var2=.
egen newvar1 = rowtotal(var1 var2 var3)

I got as far as using lookfor to identify the variables present in each dataset, but I'm not sure how to return a varlist containing the variables that are not present, and then to use this varlist returned to create them.

Can anyone help with this? Or suggest another way of doing it?

Thanks,

Sonia
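
One possible way to do the "create it if it is absent" step is capture confirm variable, which sets _rc to a nonzero value when the variable does not exist. The sketch below uses a hypothetical three-variable toy dataset in place of var1 to var60; the exact option guards against var2 being matched as an abbreviation of, say, var20.

```stata
* Toy dataset in which var2 is absent (hypothetical values)
clear
set obs 3
generate var1 = _n
generate var3 = 10 * _n

* Create any of var1-var3 that does not exist, filled with missing values
forvalues i = 1/3 {
    capture confirm variable var`i', exact
    if _rc {
        generate var`i' = .
    }
}

* rowtotal() treats missing as zero, so the recode now runs
egen newvar1 = rowtotal(var1 var2 var3)
list
```

In the real do-file the loop bound would be 60, and the block would sit inside the loop over the 100 datasets, before the recodes.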

↧

Escape compound double quotes that occur in a macro?

July 13, 2018, 11:38 pm

Hello,

I have some code that uses the file command to read in lines from a text file, perform some cleaning, and write them back out. The problem I am running into is that occasionally the line contains the characters "' , which Stata of course interprets as a closing compound double quote. When I try to output the line with file write `"`line'"' _n, the closing compound quote within the macro prematurely terminates the quoted string and the remainder of the line triggers a syntax error. The following code illustrates the problem:

Code:

local line = char(34) + char(39) + char(34)
macro list _line
file open test using test.txt, text write replace
file write test `"`line'"' _n
file close test

The file write `"`line'"' _n command in the above code triggers an invalid syntax r(198) error.

What I would ideally like here is some way of telling Stata to ignore any quotes that happen to occur within the macro when determining where the line ends. (Something kind of like macval() but for preventing interpretation of quotes rather than macros.) Does such a thing exist? Can anyone think of a good workaround for this problem?

Thanks very much!
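
A possible workaround, offered as a sketch rather than a definitive answer: write the line from Mata instead of with file write. Mata's st_local() returns the macro's contents as an ordinary string, so the contents never pass through Stata's compound-quote parsing. The file name test.txt is taken from the post; the rest is an assumption about how the cleaning loop could be restructured.

```stata
* Build a macro containing the problematic characters, as in the post
local line = char(34) + char(39) + char(34)

* Write it with Mata: no quote parsing is applied to the macro contents
mata:
unlink("test.txt")                 // fopen(..., "w") will not overwrite a file
fh = fopen("test.txt", "w")
fput(fh, st_local("line"))         // writes the string plus a newline
fclose(fh)
end

type test.txt
```

The same idea extends to reading: fget() in Mata returns each line as a string, so the whole read-clean-write loop could live in Mata if the quoting problem also affects other steps.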

↧

twoway lfit - model with covariates

July 14, 2018, 12:16 am

Hello everyone,
I hope I could use your help with the command twoway lfit.
I am estimating the following model:

reg y m d d*m X

where y and m are two continuous variables, d is a binary indicator =1 if m>0 (0 otherwise), d*m is an interaction term and X includes several covariates.
I run the following command to get the graph of the regression line:

two (scatter y m if !d) (scatter y m if d) (lfit y m if !d) (lfit y m if d)

but I believe lfit is graphing the regression line from the model without controls. Is there a way to tell lfit to take into account the covariates? I tried the alternative command binscatter but it didn't really capture the exact relation between y and m I have estimated.

Regards,

Egidio
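
twoway lfit always fits its own bivariate regression, so one alternative is to plot predictions from the full model with margins and marginsplot. The sketch below uses the auto dataset as a stand-in: price plays the role of y, weight of m, foreign of d, and mpg of X. These mappings are assumptions for illustration only, and in the poster's case the grid of m values would need to respect the fact that d is defined from m.

```stata
sysuse auto, clear

* Interaction written with factor-variable notation instead of a hand-made d*m
regress price c.weight##i.foreign mpg

* Predicted price over a grid of weight, by foreign, averaging over mpg
margins foreign, at(weight = (2000(500)4500))
marginsplot, noci
```

The plotted lines have the slopes estimated in the model with covariates, which is what lfit cannot provide.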

↧

Post-stratification weights, calibrated weights, and sampling design weights: How to combine them?

July 14, 2018, 1:08 am

Dear all,

I want to calculate weighted means of variable x and don't know how to combine the weights provided in the data set with post-stratification weights that I calculated on my own.

I am working with cross-sectional individual-level survey data in Stata 15.

The data set comes with two different weights: (i) a sampling design weight that accounts for unequal selection probabilities of the sample units (inverse of the probability to be in the sample) and (ii) calibrated weights that also consider calibration margins based on gender and regions.

Because the age distribution in the sample is not the same as the age distribution in the population, I want to further apply post-stratification weights considering the age structure (in addition to gender and region) when calculating the weighted means of x.

I know that I could calculate post-stratification weights by dividing the share of each gender-region-age group in the population (N) by the share of the same gender-region-age group in the sample (n) and then use these weights as pweights (pweight = N/n) when calculating means.

My question is: How do I combine these weights with the calibrated weights provided in the sample? Or do I need to combine them with the sampling design weights somehow?

I do have information on strata, psu and ssu - but (i) this information is missing for 1/3 of my observations and (ii) I do not know how this information relates to my problem. The information on the share of gender-region-age groups in the population (N) comes from census data.

I know this is a very specific problem, but if you could at least lead me to some applied readings on "combining sampling design weights with post-stratification weights", I would be very grateful.

Best regards,
Stephanie

*----------------------------------------------------------

I also tried the following:

1. Collapse the dataset using the calibrated weights provided in the dataset:

Code:

collapse (mean) x [pweight = calibrated_weight], by(age)

2. Merge shares of age groups from census data

3. Calculate means manually

Code:


gen help = x * N //  e.g. mean of x in age group 30-34 * the share of 30-34-year-olds in the population
egen x_mean = total(help)
drop help

But (i) I am not sure if this is a valid option and (ii) it makes it hard to compare the means with and without the post-stratification weights so I am not that happy with this approach.
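
For reference, Stata's svyset can apply a post-stratification adjustment itself through its poststrata() and postweight() options, which keeps the weighted and post-stratified means easy to compare. The sketch below is a toy illustration only: all variable names and numbers are hypothetical, a single age grouping stands in for the gender-region-age cells, and it does not settle whether the design weights or the calibrated weights are the right starting point for this particular survey.

```stata
* Toy data: all names and numbers below are hypothetical
clear
set seed 12345
set obs 200
generate byte agegrp = cond(_n <= 150, 1, 2)   // sample is 75% group 1
generate double designwt = 50                  // sampling design weight
generate double poptotal = 5000                // census count in each age group
generate double x = 10 + 5 * (agegrp == 2) + rnormal()

* Design weight plus post-stratification on agegrp
svyset [pweight = designwt], poststrata(agegrp) postweight(poptotal)
svy: mean x

* For comparison: design weights only
svyset [pweight = designwt]
svy: mean x
```

With real data, the post-stratum identifier could be built with egen cell = group(gender region agegrp), and poptotal would hold the census count for each cell.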

↧

How to add lines to scatter graph

July 14, 2018, 3:57 am

Hello,

I would like to make a twoway graph showing fitted values and a scatter plot, with additional upper and lower lines around the fit line that show the proportion of points located near the fit line. What should I add to the following command so that there are upper and lower lines as shown in the example graph? I do not want CIs for the fit line; I just need the two lines below and above the CIs.

graph twoway (lfit y x) (scatter y x)

↧

Adding multiply imputed data using Rubin's rules into registered multiple imputation variables

July 14, 2018, 4:01 am

Hello,

I am currently performing a survival analysis project for melanoma (a form of skin cancer). I am reasonably new to Stata, having only started using it in the past 4 months.
I have been using a Cox proportional hazards model thus far in my analyses.
Within the dataset of approximately 3,600 observations, up to 20% of values are missing for some variables.
I have explored exclusion and other missing-data methods; however, too many of my failures would be lost for my analysis (currently 400 failures in total, which are melanoma-specific deaths).
I have ended up choosing multiple imputation using chained equations (MICE), given that some of the key prognostic variables are not normally distributed and are heavily skewed.
To begin with I have selected key prognostic values recorded within the dataset for melanoma being Breslow thickness of melanoma (continuous), ulceration status (binary) and mitotic rate (classified as ordinal categorical variable). I have selected independent variables where data is complete (no missing observations) - age, melanoma subtype, sex, subsite location as well as outcome indicator and survival hazard function.

Below is my code thus far for imputation. I am fairly happy with the mi estimate coefficients, which very closely mirror the coefficients estimated from the non-imputed dataset.
My question to the forum is: what would be the appropriate process/syntax to incorporate the imputed values into the incomplete/missing data points, to allow continuation of my survival analysis models with a 'complete' dataset? (Apologies if I have not worded this correctly and if this is a basic question. I have trawled through the Statalist forums and other useful sites such as UCLA and various MI lectures, as well as the Stata manual, but could not find this process described; I have also found the MI menu interface tricky to follow.)

Code:

 mi stset timem, failure(censor2==1) scale(1)
mi set mlong
mi register imputed breslow ulcer mitosescat4
mi impute chained (regress) breslow (logit) ulcer (ologit) mitosescat4 = agecat2 subtype sex subsitecat4 matthews_haz censor2, add(10)
mi estimate: regress breslow i.ulcer i.mitosescat4

Many thanks in advance,

↧

How can I include year-fixed effects into monthly panel data?

July 14, 2018, 4:31 am

Hi,

I have monthly panel data on 350 mutual funds over the period January 2007 to December 2015. Here is a small extract of my data:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input int FundID str7 Date double(TotalAssets Flows Return Expenses)
1 "oct2012"  75.4    .95325         -8.2476               1.35
1 "nov2012"  76.2   3.49890        -29.2548               1.349
1 "dec2012"  75.7    .46943        -13.5072               1.35
1 "jan2013"    77  -5.46289         86.1624               1.35
1 "feb2013"  98.9  28.90355         -5.544                1.35
1 "mar2013" 107.7   6.25727         31.6872               1.35
end
label var FundID "FundID" 
label var Date "Date" 
label var TotalAssets "FundSize" 
label var Flows "Flow" 
label var Return "SiReturn" 
label var Expenses "Expense"

I am using xtreg and fe to regress Flows on Return and some controls. In the literature, year-fixed effects are used in many of these models. How can I implement year-fixed effects when I have monthly data?

Kind regards,
Stephan
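
A short sketch of one common approach: derive a calendar-year variable from the monthly date and add it as a set of indicator variables. It assumes the poster's full panel is in memory with the variable names shown in the dataex extract; mdate and year are new, hypothetical names.

```stata
* Assumes the panel from the post is in memory (FundID, Date, Flows, ...)
generate mdate = monthly(Date, "MY")      // "oct2012" -> Stata monthly date
format mdate %tm
generate year = yofd(dofm(mdate))         // calendar year of each month

xtset FundID mdate

* Fund fixed effects via fe, year fixed effects via i.year
xtreg Flows Return Expenses i.year, fe
```

Because the year dummies vary only across years, monthly variation within a year is still used to estimate the coefficient on Return.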

↧

Export to excel formatting with changing cells ranges

July 14, 2018, 4:44 am

Dear all,
I have a question about how to export different datasets (specifically, resultssets) to Excel while formatting a set of cells whose range changes in every dataset. I am using Stata 13 (although my Uni has access to Stata 14) on Windows 10.

To give some background, I have a comparable (sampling design-wise) dataset for each of 30 countries. For each of these, I am estimating proportions and associated standard errors for the same set of K=20 categorical variables. My goal is to create, for every country, an Excel sheet (I must stick to this program for this project) with 3 columns (category, estimated proportion, estimated standard error) and 20 consecutive blocks of rows (one block for each of the 20 variables). Each of these 20 blocks is made up of 1+q_k rows:
1) 1 row indicating the label of the k-th variable (column 1) and nothing in columns 2 and 3
2) q_k additional rows showing the names of the categories of this k-th variable (column 1) and their associated proportions (column 2) and standard errors (column 3).
I am sticking to the resultssets approach from Newson, R. (2004) From datasets to resultssets in Stata (http://www.rogernewsonresources.org....4/overhed2.pdf) which, basically, consists of turning the dataset into the desired statistical table. So, for each of the 30 countries, I apply a nested loop through the 20 variables, which gives me the variable-specific proportions and standard errors for each category using -parmby- (the q_k rows above). After this, I use -insob- (from SSC) to insert a new observation that holds the variable label at the beginning of the block (the first row of the block). Iterating this procedure, I get each of the 20 blocks described above. After appending these, I export them with -export excel- with the sheet() option. A MWE of this is

Code:

local countries_list country_1 country_2 ... country_30
local vars_list var_1 var_2 ... var_20
foreach country of local countries_list {
    use "Dataset `country'.dta", clear

    foreach var of local vars_list {
        local var_label: variable label `var'
        preserve
        parmby "proportion `var'", label rename(parm category estimate proportion stderr se) norestore
        insob 1 1
        replace category="Variable `var': `var_label'" in 1

        save "`country' `var'.dta", replace
        restore    
    }
    clear
    foreach var of local vars_list {
        append using "`country' `var'.dta"
    }

    export excel category proportion se using "Descriptives.xlsx", sheet("`country'") firstrow(variables) sheetmodify        
}

As expected, the structure of the resultsset for each country looks like this: [image not preserved]

My problem is that I want to apply, for each block of each of these country-specific sheets, italics to the first row (which contains the label of the k-th variable) and indentation to the q_k rows (which show the categories of the k-th variable) in the first column. The complication arises because the range of the cells that I want to format changes in a non-uniform way between countries. This is because the number of categories for a given variable (say, var_k) might be different for every country, given a country-specific codification (e.g. geographical regions differ between countries) or, even when a variable theoretically has a unique codification, a given category might be absent in particular countries. Hence, row i on a given sheet (country) might not refer to the same thing (label or categories) on different sheets. I am pretty sure that specifying these cells manually is not the way to go.

So, my question is: is there any way to apply the desired formatting to an Excel file (using export excel or another command) from within Stata that accommodates this resultssets approach when the range of cells to be formatted changes? More specifically, I was wondering if there is a solution similar to what can be done in LaTeX, namely to apply between the parmby and insob commands
replace category="\hspace{3bp} " + category
which would give me what I want. Is there a similar way to do this in Excel? Just for the record, I am aware that the -putexcel- command allows formatting cells. However, this solution turns out to be very inefficient since it requires me to specify the range of the cells that I wish to format.

Thanks for your time
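
One way around specifying ranges by hand is to let the data say which rows are label rows and compute the cell address inside a loop. The sketch below is an assumption-laden illustration: it relies on putexcel's formatting options (italic, txtindent()) as documented for Stata 15, which are not available in Stata 13, and it identifies label rows by a missing proportion, which is what an inserted blank observation would have. The toy resultsset and its values are made up.

```stata
* Toy resultsset: label rows have missing proportion (values are made up)
clear
input str30 category double(proportion se)
"Variable var_1: Sex"    .   .
"Female"                 .52 .01
"Male"                   .48 .01
"Variable var_2: Region" .   .
"North"                  .30 .02
"South"                  .70 .02
end

export excel category proportion se using "Descriptives.xlsx", ///
    sheet("country_1") firstrow(variables) replace

* Observation i sits in Excel row i + 1 because of the header row
putexcel set "Descriptives.xlsx", sheet("country_1") modify
forvalues i = 1/`=_N' {
    local r = `i' + 1
    if missing(proportion[`i']) {
        putexcel A`r', italic          // label row of a block
    }
    else {
        putexcel A`r', txtindent(1)    // category row
    }
}
```

Placed right after export excel inside the country loop, the same block would adapt to however many categories each country has, because the row numbers come from the data in memory.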

↧

Grouping observations into four dummy variables

July 14, 2018, 6:58 am

Hello, I've been struggling lately with how to create/group observations into four dummy variables in the attached data set. I want to group the programs into four dummy variables called LCT, Information, Lenient and Strict. I've watched countless videos and tried different commands and syntax but can't seem to figure it out. I would be incredibly grateful for some guidance and expertise. Kind regards, Jon. Please comment if you need more information.

	LCT	Information	Lenient	Strict
	Kenya: CT-OVC	Zimbabwe: Manicaland	Argentina: AUHPC	Cambodia: ESSS
	Indonesia: JPS	Burkina Faso: OVC	Malawi: CCT for Schooling	Philippines: Pantawid
	Honduras: PRAF	Nicaragua: SAC	Tanzania: CCT	Mexico: Progresa
*Programs*	Dominican Republic: PS	Paraguay: Tekopor	Colombia: Familias en Accion	Mexico: Oportunidades
	Morocco: Tayssir		Cambodia: JFPRS	Indonesia: KH
	Bangladesh: Shombhob		Cambodia: Scholarship Pilot	Jamaica: PATH
			Brazil: PETI	Nicaragua: RPS
				Colombia: SCB (x 3)

↧

egen help: quintiles

July 14, 2018, 9:37 am

Hello, I am trying to generate a new variable, alcohol use (in quintiles). I would like to take my variable and cut it into 5 equal parts.
I have demonstrated an association with another variable when alcohol use is measured continuously, but the magnitude is very small. Thus I would like to rescale.

I have tried various permutations of the egen command, but I cannot quite find the right syntax to divide it up into 5ths.

Any help would be greatly appreciated.
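
For what it is worth, the official command for this is xtile rather than an egen function. A minimal sketch on the auto dataset, where price stands in for the alcohol-use variable (an assumption for illustration):

```stata
sysuse auto, clear

* Five groups of roughly equal size, coded 1 (lowest) to 5 (highest)
xtile price_q5 = price, nquantiles(5)
tabulate price_q5
```

If the variable has many ties (for example, many zeros), the groups can be noticeably unequal in size, which the tabulate output makes visible.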

↧

Measuring the average of variables in subsequent periods

July 14, 2018, 10:10 am

Dear reader,

I have a large panel dataset on a few thousand firms' stock returns and share issuance. I'd like to compare the stock returns of firms that issue a lot of shares with those of firms that don't.
In order to do so, I've sorted my data by my variable 'netissue' and created decile rankings. For each year between 1980 and 2018, my variable netissue_decile thus gives a rank between 1 and 10 denoting whether that specific firm issued few or many shares in the relevant year.

Thus, I want to compare the average return for the next 5 periods of a portfolio consisting of stocks where netissue_decile == 1 versus one consisting of only firms in the tenth decile.
A major complication is that it's possible for a firm to be in the first decile in e.g. 1990 and then again in 2010. When I ask what the mean return is three years after a firm was in the first decile of share issuers, both the return from 1993 and the return from 2013 need to be included.

What I've tried to do so far is use bysorts and loops, but to no avail. My best bet yet was to create a variable called years_since that measures the number of years after a firm was in the first or tenth decile. Then, my plan was to take the mean of all returns for years_since 1 till 5. I tried to do this:

Code:

generate years_since = .
local i = 1
local N = _N

while `i' <`N' {
    if netissue_annual_decile == 1 | netissue_annual_decile == 10 {
    replace years_since = 1
    replace years_since = 2 in `i'+1
    replace years_since = 3 in `i'+2
    replace years_since = 4 in `i'+3
    replace years_since = 5 in `i'+4
    }
`i' = `i' + 1
}

But unfortunately, Stata throws the error ''1+1' invalid observation number'. So it doesn't recognize 'i + 1' as a valid observation number.
Secondly, I think the approach with a loop will ignore that the [i + 4] observation might actually belong to a different ID within the panel.

Does anyone know of a good way to tackle this problem? A code example would of course be nice, but is not necessary at all. If I can just find a good strategy to use.
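
One strategy that avoids looping over observations is to declare the panel with xtset and use the lead operator, which never crosses firm boundaries. (As an aside on the loop above: in needs an evaluated number such as `=`i'+1', and the if command only looks at the first observation, so that route is hard to rescue.) The sketch below runs on a hypothetical toy panel; firm, year, ret and the simulated values are assumptions.

```stata
* Toy panel: 3 firms, 1980-1990, hypothetical returns and decile ranks
clear
set seed 2018
set obs 33
generate firm = ceil(_n / 11)
bysort firm: generate year = 1979 + _n
generate ret = rnormal(0.08, 0.2)
generate byte netissue_decile = 1 + floor(10 * runiform())

xtset firm year

* F#.ret is the return # years ahead within the same firm; it is missing
* when that year is not observed for the firm, so panels never mix
forvalues h = 1/5 {
    generate ret_f`h' = F`h'.ret
}

* Mean return 1 to 5 years after each firm-year in decile 1 or decile 10;
* a firm that enters a decile twice contributes both events
tabstat ret_f1-ret_f5 if inlist(netissue_decile, 1, 10), ///
    by(netissue_decile) statistics(mean n)
```

Each column of the tabstat output is the average return h years after the ranking year, so a firm in the first decile in both 1990 and 2010 contributes its 1993 and 2013 returns to ret_f3.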

↧

Accommodating arguments of an option in the -syntax- command

July 14, 2018, 10:10 am

Dear all,
I have a question related to the -syntax- command, namely how to accommodate a varlist specified as part of an option so that it can be used later within the program. I am using Stata 13 on Windows 10.

In particular, I am defining a program (called descriptives) which requires a varlist. For the subset of continuous variables (specified in the option continuous() ), I apply a set of commands; for the subset of discrete variables (the remaining variables of the varlist), I apply another set of commands. A MWE of what I have is

Code:

capture: program drop descriptives
program define descriptives
version 13
syntax varlist(numeric min=1), CONTinuous(varlist numeric min=1)

foreach var of local varlist {
    if SOMETHING {
       ....
       [A set of commands here for the continuous variables]
       ....    
    }
    else  {
       ....
       [Another set of commands here for the discrete variables]
       ....
    }    
}

end

My question is how to turn SOMETHING into a condition that uses the content of the continuous() option to identify the continuous variables in the varlist. E.g. say that I have 5 variables: x, y and z, which are continuous, and w1 and w2, which are discrete. Thus, I would like to specify

Code:

descriptives x w1 y z w2, cont(x y z)

so that after accommodating "x y z", SOMETHING turns into
"`var'"=="x" | "`var'"=="y" | "`var'"=="z"
which would tell the program that these are the continuous variables and run the appropriate analysis. Maybe there is a more efficient way to do this that does not involve conditional branching, which I would appreciate knowing about.

Thank you
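
A sketch of one way to fill in SOMETHING: after syntax runs, the option's contents are in the local macro continuous, and the macro list operator in tests membership. The summarize and tabulate calls are placeholders for the two sets of commands, and the auto dataset is used only to make the example runnable.

```stata
capture program drop descriptives
program define descriptives
    version 13
    syntax varlist(numeric min=1), CONTinuous(varlist numeric min=1)

    foreach var of local varlist {
        * iscont is 1 if `var' appears in the continuous() list, else 0
        local iscont : list var in continuous
        if `iscont' {
            summarize `var'        // placeholder for continuous variables
        }
        else {
            tabulate `var'         // placeholder for discrete variables
        }
    }
end

sysuse auto, clear
descriptives price foreign mpg rep78, cont(price mpg)
```

Because syntax unabbreviates both varlists, abbreviated names in the call still match. To avoid branching altogether, local discrete : list varlist - continuous gives the remaining variables as a separate list, so each group can have its own loop.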

↧

How to create a variable with consecutive numbers that follow a certain rule of when to start counting consecutively?

July 14, 2018, 4:30 pm

Dear experts,

I am working with a panel dataset (from 1946 to 2015) that contains among other variables a dummy equal to 1 for each year a country is at war.
I am trying to create a variable that records peace. In essence, the peace variable is supposed to start with 1 when a war ends (thus the war dummy would be 0), and continue to have consecutive values (2, 3, 4 etc.) until a new war starts (war = 1), and then peace gets values of 0 until that new war ends, and so on. Thus, it measures the years of peace from the time a war stops, and up until a new war starts.

I have created the peace variable itself, but I have some issues with it. Below is the code I used to create the peace variable.

Code:

//Generating Peace Variable
gen peace = 1
replace peace = 0 if owar==1
by country, sort: replace peace = peace + peace[_n-1] if lowar==0
replace peace = 0 if peace!=0 & owar==1

peace - peace variable
owar - war dummy
lowar - lag of war dummy

The problem with this peace variable is that if a country, say Afghanistan, begins in the dataset (at 1946) without a war, I want it to have values of 0 for the peace variable until the first war of that country starts, and then when the war ends I want the peace variable to continue normally. However the variable I created counts the first years of any country (1946 and up) as "post-war" peace years and assigns consecutive values starting from 1 until a war starts.
An example to make it a bit clearer:

country	year	owar	my "peace" variable	new "peace" variable I want
Afghanistan	1946	0	1	0
Afghanistan	1947	0	2	0
Afghanistan	1948	1	0	0
Afghanistan	1949	0	1	1
Angola	1946	0	1	0
Angola	1947	1	0	0
Angola	1948	0	1	1
Angola	1949	0	2	2

Unfortunately the first wars of all the countries start at different times, so I can't specify a certain time that would encompass all first wars of each country.
Essentially, I want the countries and years before the first wars start to have 0 values for peace, and then once the first wars start the peace variable continues normally (consecutively).
However, I can't figure out how to make that distinction in Stata.
In the worst case scenario, I would have to fix them manually.
I hope I've made myself clear enough, and if not, I apologize.

Thank you in advance!
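
A possible approach is to flag, within each country, whether any war has occurred so far, and to count peace years only once that flag is on. The sketch below uses the example rows from the post; hadwar is a new, hypothetical helper variable.

```stata
* Example rows from the post
clear
input str11 country int year byte owar
"Afghanistan" 1946 0
"Afghanistan" 1947 0
"Afghanistan" 1948 1
"Afghanistan" 1949 0
"Angola"      1946 0
"Angola"      1947 1
"Angola"      1948 0
"Angola"      1949 0
end

* hadwar turns 1 in the first war year of a country and stays 1 afterwards
bysort country (year): generate byte hadwar = sum(owar) > 0

* Count consecutive peace years, but only after a first war has occurred;
* replace works through the observations in order, so peace[_n-1] is current
generate peace = 0
by country: replace peace = peace[_n-1] + 1 if owar == 0 & hadwar

list, sepby(country)
```

Years before a country's first war keep the value 0 because hadwar is still 0 there, and war years keep 0 because of the owar == 0 condition.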

↧

xtpcse estimation with AR(1) option

July 15, 2018, 12:20 am

Dear all:

I have a time-series cross-section dataset. When I use xtpcse with the AR(1) option "c(a)", I get the following message. I understand that I get the message because there are gaps in my dataset (it is unbalanced), and the AR(1) option assumes balanced data. But what does it mean for the results that Stata's xtpcse produces? Can the results not be trusted? Or can they be trusted, although the estimated rho may not be appropriate?

I thank you in advance for your kind help.

Best,

Taka Sakamoto

-----------------------------------------
. xtpcse mfppwt l4.redisp l4.lfamtotppp l4.lsecondarygdppop l4.lalmptotppp l4.igerdgov l4.popgrow l4.lcgdpopc if year
> <2009,c(a) p

Number of gaps in sample: 28
(note: computations for rho restarted at each gap)
(note: estimates of rho outside [-1,1] bounded to be in the range [-1,1])
(note: at least one disturbance covariance assumed 0, no common time periods
between panels)

Prais-Winsten regression, correlated panels corrected standard errors (PCSEs)

Group variable:   id                            Number of obs      =       215
Time variable:    year                          Number of groups   =        17
Panels:           correlated (unbalanced)       Obs per group: min =         2
Autocorrelation:  common AR(1)                                 avg = 12.647059
Sigma computed by pairwise selection                           max =        22
Estimated covariances      =       153          R-squared          =    0.1254
Estimated autocorrelations =         1          Wald chi2(7)       =     19.00
Estimated coefficients     =         8          Prob > chi2        =    0.0082

----------------------------------------------------------------------------------
                 |           Panel-corrected
          mfppwt |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-----------------+----------------------------------------------------------------
        redispov |
             L4. |  -.0512275   .0269678    -1.90   0.057    -.1040834    .0016284
                 |
      lfamtotppp |
             L4. |   .3697768    .365409     1.01   0.312    -.3464116    1.085965
                 |
lsecondarygdppop |
             L4. |   .1587017   .1843462     0.86   0.389    -.2026102    .5200137
                 |
     lalmptotppp |
             L4. |   .4643055   .3519623     1.32   0.187    -.2255279    1.154139
                 |
       igerdgov2 |
             L4. |   1.657015   .7539443     2.20   0.028     .1793118    3.134719
                 |
         popgrow |
             L4. |  -.2262848   .3279123    -0.69   0.490    -.8689812    .4164116
                 |
        lcgdpopc |
             L4. |  -3.314369   1.592007    -2.08   0.037    -6.434645   -.1940926
                 |
           _cons |   32.64075   13.82515     2.36   0.018      5.54396    59.73755
-----------------+----------------------------------------------------------------
             rho |   .4644425
----------------------------------------------------------------------------------

↧

Kernel propensity score matching d-i-d technique in an unbalanced panel with different treatment years for each id

July 15, 2018, 12:31 am

Hi,

I am working with firm-level panel data (an unbalanced panel). I want to estimate the impact of entering the import market on firms' total factor productivity using the user-written command "diff" in Stata. My treatment group contains all firms that enter the import market for the first time in a particular year and remain importers for a minimum of two years. The firms should also have an observed history of at least two years in the data prior to import entry. To get a value for the counterfactual, I am considering a control group consisting of firms that never entered the import market. However, I am unsure how to create the variable "time". Ideally, this variable takes the value 1 in the follow-up period and 0 in the baseline (i.e. before treatment). I know how to assign the value for the treated firms in this case, but I cannot figure out how I should assign the values for the control group. Please help me clarify this. I would be very grateful.

Thank You so much.

↧

How to identify city names appeared in all years

July 15, 2018, 2:27 am

I am analyzing panel data, but I don't know how to get the list of city names that have values in all years.

For example, the cities that appear in all years in the following data are A, B, and C. Does anyone know how to do this?

Code:

Year	City
1999	A
1999	B
1999	C
1999	D
1999	E
1999	F
1999	G
2000	A
2000	B
2000	C
2000	D
2000	E
2001	A
2001	B
2001	C
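
One possible way to get that list is to count the distinct years in the data, count the distinct years each city appears in, and keep the cities where the two counts agree. The sketch below re-enters the example data from the post; the helper variable names are hypothetical.

```stata
* Example data from the post
clear
input int Year str1 City
1999 "A"
1999 "B"
1999 "C"
1999 "D"
1999 "E"
1999 "F"
1999 "G"
2000 "A"
2000 "B"
2000 "C"
2000 "D"
2000 "E"
2001 "A"
2001 "B"
2001 "C"
end

* Number of distinct years in the whole dataset
egen byte tag_year = tag(Year)
egen n_years = total(tag_year)

* Number of distinct years in which each city appears
egen byte tag_cy = tag(City Year)
bysort City: egen n_city_years = total(tag_cy)

* Cities present in every year
levelsof City if n_city_years == n_years
```

Tagging city-year pairs first means that duplicate rows for the same city and year do not inflate the count.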