Standard error of 8280 in multinomial logit

Hi,

I am analysing my data using multinomial logit. First, sorry that I cannot post my data and full results here.

Let's call the dependent variable "P3", and I have several independent variables: "treatment", "P1", "age", "iq", "female", "mistakes", "major". The one I'm interested in is "treatment", and I think that "P1" has to be included in the regression as a control. "P3" and "P1" measure the same thing before and after the treatment, and they have 7 categories. The sample size is small, 157, with two missing values in "female", so N = 155.

I am running into a problem of getting very large standard errors for some coefficients, such as 8280 for one category of P1. Almost every such large standard error occurs with one of the categories of P1.

Code:
                   |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
P1                 |
                1  |   2.514913   .7466968     3.37   0.001     1.051414    3.978412
                2  |   1.361554   1.327665     1.03   0.305    -1.240622    3.963729
                3  |   .9366774   8280.394     0.00   1.000    -16228.34    16230.21
                4  |  -.3395546   4124.867    -0.00   1.000     -8084.93    8084.251
               -1  |  -.6942527   1.267946    -0.55   0.584    -3.179382    1.790877
               -2  |   19.35054   11956.97     0.00   0.999    -23415.88    23454.58
I looked at the cross-table of P1 and P3, and found there are some empty cells. The partial table looks like this.

Code:

P1         |                                P3
           |        -2         -1          0          1          2          3          4 |     Total
-----------+-----------------------------------------------------------------------------+----------
         3 |         0          0          0          0          1          2          2 |         5
         4 |         0          0          0          0          0          3         13 |        16
-----------+-----------------------------------------------------------------------------+----------
I am wondering if these empty cells cause the enormous standard errors. I know that the sample size is very small and the number of independent variables is relatively large for the sample size. Should I switch to -firthlogit-?
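
For what it's worth, a minimal sketch of one diagnostic and one pragmatic workaround, assuming the variable names above (the recoding is hypothetical; choose groupings that make substantive sense):

Code:
* Inspect the P1 x P3 cross-tabulation: empty cells can produce
* quasi-complete separation, which inflates standard errors in -mlogit-.
tab P1 P3

* Factor variables need nonnegative codes, so shift P1 up before using i.
* (hypothetical: P1 runs from -2 to 4 here)
generate P1s = P1 + 2

* then collapse whichever categories of P1s are sparse, e.g. the old 3 and 4
recode P1s (5 6 = 5), generate(P1c)
mlogit P3 i.treatment i.P1c age iq female mistakes i.major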

Thanks for any help!!

Using anymatch in a forvalues loop to detect if each value in v1 matches ANY value in v2

I'm struggling to come up with a solution for finding whether each observation in variable 1 matches ANY of the specified observations in v2. I'm trying to narrow the data to focus on passengers that have arrived on time at least once in the data. That way I can look at those passengers' data, even for points when they weren't on time.
I'm trying to pass anymatch a numlist of the IDs of the passengers that have arrived on time at least once, but I'm getting an error.
"values() invalid -- invalid numlist"

This is my code:

g on_time= passengers if timely==1; // limiting to timely arrivals.
levelsof on_time;
g on_time_levels= r(levels); //unique numlist of passengers with timely arrivals (unsure of this)
g on_time_ever=.;
forvalues i =1/6939 {;
egen tempvariable = anymatch(passengers) if _n==`i',values(on_time_levels);
replace on_time_ever=tempvariable if _n==`i';
drop tempvariable;
};

I am unsure whether the levels variable I generated is really a numlist. How else can I get a numlist from this variable so I can pass it to anymatch? Or am I going about this completely wrong?
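
For reference, a minimal sketch of a loop-free alternative, assuming the variable names above: -egen, max()- by passenger flags everyone who was ever on time, and no numlist is needed at all.

Code:
* 1 on every row of a passenger who was on time at least once, else 0
bysort passengers: egen on_time_ever = max(timely == 1)

* keep all rows for those passengers, including their late arrivals
keep if on_time_ever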
Thanks!

Time stamps in forum software

Is there any chance the forum software could be changed to specify the time zone in the time stamps? Right now (for me at least) it displays Central Time, which always confuses me a bit, since I'm on the east coast (and I assume it is even more confusing for people in more distant time zones). So right now, it's 15:34 where I am, but the time stamp says 14:34. Even better would be the option to change which time zone the time stamps are displayed in (if that option doesn't already exist - apologies if it does!).

Assistance on Statistical analysis

Can anyone help me out? I am investigating the coping strategies used among women, using the Brief COPE scale with a 4-point Likert scale. I want to see whether there is any association between the coping strategies and socio-demographic characteristics and medical variables. Which test is appropriate, and which command should I use? I have attached a dummy table for clarity.

Problem with merging multiple csv files using merge 1:1

Hi,

I am a beginner in Stata (using Stata 16) and after going through many of the posts regarding merging multiple files from a folder, I tried to write the following code but I received an error. I will describe the data, folder structure, code and error messages below:

Data: I have quarterly bank data from FDIC where each csv file corresponds to one quarter and, within each file, different banks are identified using a variable called 'cert'. Every file also has a column named 'repdte' which lists the quarter for that particular file (so, for example, a file named All_Reports_20170930_U.S. Government Obligations.csv will have many columns of data regarding US Govt Obligations plus two additional columns, cert and repdte, listing the bank ID and 20170930 respectively for the entire file).
Sample csv files may be downloaded from: https://www7.fdic.gov/sdi/download_l...st_outside.asp For my testing, I am using the 2018 and 2017 files for quarters 1231 and 0930 for the files "Unused Commitments Securitization" and "U.S. Government Obligations".
What I want to do: I want to merge all the bank data across banks and quarters (panel data), and to do this, I figured I should use the command: merge 1:1 cert repdte using filename
Code:
clear all
pwd
cd "C:\Users\HP\Dropbox\Data\Test2"

tempfile mbuild
clear
save `mbuild', emptyok

foreach year in 2018 2017{
foreach dm in 1231 0930 {
foreach name in "Unused Commitments Securitization" "U.S. Government Obligations"{
import delimited "C:\Users\HP\Dropbox\Data\Test2\All_Reports_`year'`dm'_`name'", clear
gen source = "C:\Users\HP\Dropbox\Data\Test2\All_Reports_`year'`dm'_`name'"
merge 1:1 cert repdte using `mbuild'
save `mbuild', replace

}
}
}
Error:
.
. foreach year in 2018 2017{
2. foreach dm in 1231 0930 {
3. foreach name in "Unused Commitments Securitization" "U.S. Governmen
> t Obligations"{
4. import delimited "C:\Users\HP\Dropbox\Data\Test2\All_Report
> s_`year'`dm'_`name'", clear
5. gen source = "C:\Users\HP\Dropbox\Data\Test2\All_Reports_`y
> ear'`dm'_`name'"
6. merge 1:1 cert repdte using `mbuild'
7. save `mbuild', replace
8.
. }
9. }
10. }
(52 vars, 5,415 obs)
no variables defined
r(111);


Could someone please help me understand what I am doing wrong and how I can achieve what I am trying to do? Additionally, I also want to be able to retrieve the merged file to do further analysis in Stata and to export it to a folder on my computer. How should I do that?
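
For reference, a minimal sketch of one common pattern, assuming the file layout described above: merge the two report types within each quarter on cert (repdte is constant within a file), then append the quarters. The r(111) error most likely arises because the first merge runs against the still-empty mbuild file.

Code:
clear all
cd "C:\Users\HP\Dropbox\Data\Test2"

tempfile panel
save `panel', emptyok        // collects the finished quarters

foreach year in 2018 2017 {
    foreach dm in 1231 0930 {
        * the first report type starts the quarter
        import delimited "All_Reports_`year'`dm'_Unused Commitments Securitization.csv", clear
        tempfile quarter
        save `quarter'

        * the second report type adds columns for the same banks
        import delimited "All_Reports_`year'`dm'_U.S. Government Obligations.csv", clear
        merge 1:1 cert using `quarter', nogenerate

        * stack this quarter under the earlier ones
        append using `panel'
        save `panel', replace
    }
}

use `panel', clear
save "fdic_panel.dta", replace                    // merged file for later analysis
export delimited using "fdic_panel.csv", replace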

Multilevel Panel Data with CPS Data

Good evening,

Using the Current Population Survey variables below, I need to figure out the change in percent Latino for each metarea from year to year, as well as the actual percent for each given year. I plan to use the actual percent for that year and the change in percent as IVs for my model.

-year (2010-2019 in one year increments; sample per year)
-metarea (about 370 metropolitan areas that households are assigned to)
-household
-person in household
-Latino (binary variable at the person level)

I attached a preview of my dataset.
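
For reference, a minimal sketch, assuming hypothetical lowercase variable names (latino a 0/1 person-level indicator, metarea numeric) and ignoring survey weights, which you would normally apply in the collapse:

Code:
preserve

* percent Latino per metarea-year
collapse (mean) pct_latino = latino, by(metarea year)
replace pct_latino = 100 * pct_latino

* year-to-year change within each metarea
xtset metarea year
gen chg_pct_latino = pct_latino - L.pct_latino

save "metarea_latino.dta", replace   // hypothetical filename; merge back as needed
restore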

Thank you!

Identification of Treatment and Control Group

Respected members,

I am trying to employ DID as a means of analysis. In my dataset of 287 firms between 2001 and 2016, there was a policy reform in 2010 requiring at least 10 percent female directors. After reading some articles on DID, I have developed the following alternatives for identifying the treatment and control groups.

Option 1
Treatment group: Firms that did not have 10% of female directors before 2010.
Control group: Firms that had 10% or more female directors before 2010.

Option 2
Treatment group: Firms that did not have 10% of female directors before 2010 and had at least 10% of female directors from 2010 onwards.
Control group: Firms that did not have 10% of female directors before 2010 and did not have at least 10% of female directors even after 2010.
The firms that already had at least 10% female directors before 2010 are excluded from the analysis.

Could you please advise me as to which of the above options (1 or 2) is appropriate?

Thanks in anticipation.

Generate var using sequential variable names

Hi everyone,


I have data with these sequential variable names:

p1_1 p1_2 p1_3 p2_1 p2_2 p2_3 p3_1 p3_2 p3_3
1 1 1 2 2 2 3 3 3
1 1 1 2 2 2 3 3 3
1 1 1 2 2 2 3 3 3
1 1 1 2 2 2 3 3 3
1 1 1 2 2 2 3 3 3
1 1 1 2 2 2 3 3 3
1 1 1 2 2 2 3 3 3


I would like to generate a sum of the values with sequential variable names, using a loop, like this:

generate p1= p1_1 + p1_2 + p1_3
generate p2= p2_1 + p2_2 + p2_3
generate p3= p3_1 + p3_2 + p3_3

Thank you....
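
A minimal sketch of the loop, assuming the variable names shown above:

Code:
* one pass per prefix p1, p2, p3; each sums its three components
forvalues i = 1/3 {
    generate p`i' = p`i'_1 + p`i'_2 + p`i'_3
}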



Generate sum of variables with sequential variable names

Hi everyone,

I have data with these sequential variable names.
p1_1_1_1 p1_1_1_2 p1_1_1_3 p1_1_1_4 p1_1_2_1 p1_1_2_2 p1_1_2_3 p1_1_2_4 p1_1_3_1 p1_1_3_2 p1_1_3_3 p1_1_3_4
1 1 1 1 2 2 2 2 3 3 3 3
1 1 1 1 2 2 2 2 3 3 3 3
1 1 1 1 2 2 2 2 3 3 3 3

I would like to sum the sequential variable names as follows:

generate p1_1_1 = p1_1_1_1 + p1_1_1_2 + p1_1_1_3 + p1_1_1_4
generate p1_1_2 = p1_1_2_1 + p1_1_2_2 + p1_1_2_3 + p1_1_2_4
...

Can somebody help me do the same using loops?
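
A minimal sketch, assuming the pattern above extends over the third index (add outer loops over the first two indices if those vary as well):

Code:
* sum each quadruple p1_1_`j'_1 ... p1_1_`j'_4 into p1_1_`j'
forvalues j = 1/3 {
    generate p1_1_`j' = p1_1_`j'_1 + p1_1_`j'_2 + p1_1_`j'_3 + p1_1_`j'_4
}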

Thanks a lot...



Removing NA across variables

I have string data as shown below:

data1 data2
NA NA
NA NA
NA NA
NA 8415739
NA 10024002
N 12057882
N 10759322
N 11305650
N 10937087
N 11463371
N 11287917
N 12720750
N 14849447
N 15542380
N 17368642
N 20738561

I want to replace the NA observations with missing (.).
I tried this command:
replace data1=. if data1==NA
but Stata returns the error "NA not found".

Can anybody help me with this, please?
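
For reference, a minimal sketch: data1 and data2 are strings, so the NA markers have to be compared as quoted text, blanked out, and then converted with -destring-.

Code:
* blank out the NA/N markers, then convert the strings to numeric
replace data1 = "" if inlist(data1, "NA", "N")
replace data2 = "" if data2 == "NA"
destring data1 data2, replace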

Split string variable

Dear Experts,

I want to split a string variable. Please advise.

The issue is: I have the responses "a" "adc" "acfj" "cde" "adfghj". I want to split these responses into single characters, e.g. "a" "b" "c" "d" "e". Can it be done? I am looking forward to your advice.
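
For reference, a minimal sketch, assuming the responses live in a string variable called resp (hypothetical name): -substr()- peels off one character per new variable.

Code:
* the longest response determines how many character variables are needed
gen len = strlen(resp)
summarize len, meanonly
forvalues i = 1/`r(max)' {
    gen char`i' = substr(resp, `i', 1)
}
drop len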

Thanking you


Yours faithfully

Cheda Jamtsho

Question about the synth command

Hello,
I have a question. My Stata version is 14.
When I run the command below, it always displays an error message.
What can I do?

Code:
xtset state year
replace age15to24 = 100*age15to24
synth cigsale cigsale(1988) cigsale(1980) cigsale(1975) lnincome retprice ///
age15to24 beer(1984(1)1988), trunit(3) trperiod(1989)

file synthopt.plugin not found <-------error message
(error occurred while loading synth.ado)
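
For reference, this error usually means the compiled plugin that ships with synth (synthopt.plugin) is missing from the installation; a commonly suggested fix is to reinstall the package from SSC together with its ancillary files:

Code:
* the -all- option also downloads ancillary files such as synthopt.plugin
ssc install synth, replace all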


Code:
clear
input long state float(year cigsale lnincome beer age15to24 retprice cigsale_cal cigsale_rest)
1 1970 89.8 . . 1788.618 39.6 . 120.08421
1 1971 95.4 . . 1799.2784 42.7 . 123.86316
1 1972 101.1 9.498476 . 1809.939 42.3 . 129.17896
1 1973 102.9 9.550107 . 1820.5994 42.1 . 131.53947
1 1974 108.2 9.537163 . 1831.26 43.1 . 134.66843
1 1975 111.7 9.540031 . 1841.9207 46.6 . 136.93158
1 1976 116.2 9.591908 . 1852.581 50.4 . 141.26053
1 1977 117.1 9.617496 . 1863.242 50.1 . 141.08948
1 1978 123 9.654072 . 1873.9023 55.1 . 140.47368
1 1979 121.4 9.64918 . 1884.563 56.8 . 138.08684
1 1980 123.2 9.612194 . 1895.2234 60.6 . 138.08948
1 1981 119.6 9.609594 . 1858.4222 68.8 . 137.98685
1 1982 119.1 9.59758 . 1821.621 73.1 . 136.29474
1 1983 116.3 9.626769 . 1784.8202 84.4 . 131.25
1 1984 113 9.671621 18 1748.019 90.8 . 124.90263
1 1985 114.5 9.703193 18.7 1711.218 99 . 123.1158
1 1986 116.3 9.74595 19.3 1674.4167 103 . 120.59473
1 1987 114 9.762092 19.4 1637.6157 110 . 117.58685
1 1988 112.1 9.78177 19.4 1600.8146 114.4 . 113.82368
1 1989 105.6 9.802527 19.4 1564.0134 122.3 . 109.66315
1 1990 108.6 9.81429 20.1 1527.2124 139.1 . 105.66579
1 1991 107.9 9.81926 20.1 . 144.4 . 104.3421
1 1992 109.1 9.845286 20.4 . 172.2 . 103.39474
1 1993 108.5 9.85216 20.3 . 176.2 . 102.69473
1 1994 107.1 9.879334 21 . 154.6 . 102.11842
1 1995 102.6 9.924404 20.6 . 155.1 . 103.1579
1 1996 101.4 9.940027 21 . 158.3 . 101.18421
1 1997 104.9 9.93727 20.8 . 167.4 . 101.78947
1 1998 106.2 . . . 180.5 . 100.9579
1 1999 100.7 . . . 195.6 . 97.59473
1 2000 96.2 . . . 270.7 . 92.13421
2 1970 100.3 . . 1690.0676 36.7 . 120.08421
2 1971 104.1 . . 1699.5386 38.8 . 123.86316
2 1972 103.9 9.464514 . 1709.0095 44.1 . 129.17896
2 1973 108 9.55683 . 1718.4805 45.1 . 131.53947
2 1974 109.7 9.542286 . 1727.9513 45.5 . 134.66843
2 1975 114.8 9.514094 . 1737.4224 48.6 . 136.93158
2 1976 119.1 9.558153 . 1746.8933 50.9 . 141.26053
2 1977 122.6 9.590923 . 1756.364 52.6 . 141.08948
2 1978 127.3 9.657238 . 1765.835 56.5 . 140.47368
2 1979 126.5 9.633533 . 1775.306 58.4 . 138.08684
2 1980 131.8 9.573803 . 1784.777 61.5 . 138.08948
2 1981 128.7 9.593041 . 1750.1112 64.7 . 137.98685
2 1982 127.4 9.5737 . 1715.4453 72.1 . 136.29474
2 1983 128 9.593053 . 1680.7794 82 . 131.25
2 1984 123.1 9.65044 17.9 1646.1138 93.6 . 124.90263
2 1985 125.8 9.675527 18.1 1611.448 98.5 . 123.1158
2 1986 126 9.705939 18.7 1576.782 103.6 . 120.59473
2 1987 122.3 9.705574 19 1542.1163 113 . 117.58685
2 1988 121.5 9.721532 18.9 1507.4504 119.9 . 113.82368
2 1989 118.3 9.73737 19 1472.7847 127.7 . 109.66315
2 1990 113.1 9.736311 19.9 1438.119 141.2 . 105.66579
2 1991 116.8 9.743068 19.9 . 146.5 . 104.3421
2 1992 126 9.788629 20 . 177.3 . 103.39474
2 1993 113.8 9.785142 19.7 . 179.9 . 102.69473
2 1994 108.8 9.813631 19.7 . 168.1 . 102.11842
2 1995 113 9.86446 19.5 . 167.3 . 103.1579
2 1996 110.7 9.885234 20.1 . 167.1 . 101.18421
2 1997 108.7 9.883107 19.8 . 181.3 . 101.78947
2 1998 109.5 . . . 187.3 . 100.9579
2 1999 104.8 . . . 206.9 . 97.59473
2 2000 99.4 . . . 279.3 . 92.13421
3 1970 123 . . 1781.5833 38.8 123 .
3 1971 121 . . 1792.9636 39.7 121 .
3 1972 123.5 9.930814 . 1804.344 39.9 123.5 .
3 1973 124.4 9.955092 . 1815.724 39.9 124.4 .
3 1974 126.7 9.947999 . 1827.1044 41.9 126.7 .
3 1975 127.1 9.937167 . 1838.4847 45 127.1 .
3 1976 128 9.976858 . 1849.865 48.3 128 .
3 1977 126.4 10.0027 . 1861.2454 49 126.4 .
3 1978 126.1 10.045565 . 1872.6255 58.7 126.1 .
3 1979 121.9 10.054688 . 1884.0057 60.1 121.9 .
3 1980 120.2 10.03784 . 1895.386 62.1 120.2 .
3 1981 118.6 10.028626 . 1855.3705 66.4 118.6 .
3 1982 115.4 10.01253 . 1815.355 72.8 115.4 .
3 1983 110.8 10.031737 . 1775.3394 84.9 110.8 .
3 1984 104.8 10.07536 25 1735.324 94.9 104.8 .
3 1985 102.8 10.099703 24 1695.3083 98 102.8 .
3 1986 99.7 10.127267 24.7 1655.2927 104.4 99.7 .
3 1987 97.5 10.1343 24.1 1615.277 103.9 97.5 .
3 1988 90.1 10.141663 23.6 1575.2615 117.4 90.1 .
3 1989 82.4 10.142313 23.7 1535.246 126.4 82.4 .
3 1990 77.8 10.141623 23.8 1495.2303 163.8 77.8 .
3 1991 68.7 10.110714 22.3 . 186.8 68.7 .
3 1992 67.5 10.11494 21.3 . 201.9 67.5 .
3 1993 63.4 10.098497 20.8 . 205.1 63.4 .
3 1994 58.6 10.099508 20.1 . 190.3 58.6 .
3 1995 56.4 10.155916 19.7 . 195.1 56.4 .
3 1996 54.5 10.178637 19.1 . 197.9 54.5 .
3 1997 53.8 10.17519 19.5 . 200.3 53.8 .
3 1998 52.3 . . . 207.8 52.3 .
3 1999 47.2 . . . 224.9 47.2 .
3 2000 41.6 . . . 351.2 41.6 .
4 1970 124.8 . . 1909.5022 29.4 . 120.08421
4 1971 125.5 . . 1916.476 31.1 . 123.86316
4 1972 134.3 9.805548 . 1923.4497 31.2 . 129.17896
4 1973 137.9 9.848413 . 1930.4232 32.7 . 131.53947
4 1974 132.8 9.840451 . 1937.397 38.1 . 134.66843
4 1975 131 9.828461 . 1944.3706 41.7 . 136.93158
4 1976 134.2 9.858913 . 1951.344 44.8 . 141.26053
end

Discriminant analysis using Stata

Hello everyone,
I need to run a discriminant analysis in Stata. How can I do it and get both the standardized and unstandardized discriminant function coefficients along with the structure matrix?
I'm supposed to produce output like the attached picture.
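
For reference, a minimal sketch using -discrim lda- (hypothetical grouping variable group and predictors x1-x3; I believe the postestimation commands below report the requested matrices, but check -help discrim lda postestimation-):

Code:
* linear discriminant analysis
discrim lda x1 x2 x3, group(group)

* canonical discriminant function coefficients
estat loadings, standardized unstandardized

* structure matrix: correlations of predictors with the discriminant functions
estat structure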

How to estimate individual betas of coefficients in a province-sector-year panel (with two cross-sectional identifiers)

Hello everyone:
I'm trying to estimate production functions for panel data on manufacturing with two identifiers (province, sector), so that each sector has observations for the different provinces. The first thing I do is to -egen- a new ID by group(province sector), but this apparently ignores the unobservable common trends within each province or sector.

I am considering a fixed-effects (LSDV) approach or a semi-parametric one (e.g. Levinsohn and Petrin). The problems are:
(1) for the former, how to correctly set the factor variables;
(2) for the latter, how to correctly get the betas of K and L for every province-sector cell.

The attachment dataex.txt is a part of my data file. The models I had in mind were:

Code:
* (1) lnYL and lnKL are not in the attachment; they are simply ln(Y/L) etc., assuming CRS
reg lnYL_go lnKL i.prov_sec_id i.prov_sec_id#c.lnKL i.actual_year, vce(cluster prov_sec_id)

* (2)
prodest lnY_va, free(lnL) state(lnK) proxy(lnInt) met(lp) va acf id(prov_sec_id) t(actual_year)

I'm not trying to be a free rider; it's just that related references are rare. Any opinion or suggestion would be appreciated, and happy new year!

Timevar for survival analysis

Dear All,

This might be a silly question, but it is driving me crazy.

I am managing data which were not recorded for survival analysis and I am trying to put them in a proper format.

For the purpose of my question, here are my data (I have more variables, but they behave like Var1 and Var2, namely varying over time):
ID  Visit  Date       DOsp1      DOsp2      Sex  Var1  Var2
1   0      1mar2002   .          .          M    0     .
1   1      3jun2005   .          .          M    .     .
1   2      4feb2007   .          .          M    .     .
2   0      9feb2002   21dec2000  22jun2001  F    1     18.9
2   1      7sep2002   .          .          F    2     9999
3   0      25mar2003  .          .          M    0     20
3   1      13oct2004  .          .          M    2     9999
4   0      4oct2002   .          .          F    1     23.5
4   1      03may2004  4jan2003   24jun2003  F    .     .
4   2      13jan2006  .          .          F    .     .
4   3      25aug2007  .          .          F    2     9999

ID is my person identifier; each person can be visited several times (Visit; 0 is the baseline) on different dates (Date is when the visit took place). At each visit, a person could report up to 9 dates (I do have DOsp1-DOsp9, but for the sake of this question I just show the first two) indicating if and when they were hospitalized between visits.

I will use snapspan to convert my data to time-span data, but first I guess I need to slightly change my time variable (and the dataset overall).

I want to have a timevar like Time (see table below) in order to run snapspan ID Time.

ID  Visit  Date       DOsp1      DOsp2      Sex  Var1  Var2  Time
1   0      1mar2002   .          .          M    0     .     1mar2002
1   1      3jun2005   .          .          M    .     .     3jun2005
1   2      4feb2007   .          .          M    .     .     4feb2007
2   .      .          .          .          .    .     .     21dec2000
2   .      .          .          .          .    .     .     22jun2001
2   0      9feb2002   21dec2000  22jun2001  F    1     18.9  9feb2002
2   1      7sep2002   .          .          F    2     9999  7sep2002
3   0      25mar2003  .          .          M    0     20    25mar2003
3   1      13oct2004  .          .          M    2     9999  13oct2004
4   0      4oct2002   .          .          F    1     23.5  4oct2002
4   .      .          .          .          .    .     .     4jan2003
4   .      .          .          .          .    .     .     24jun2003
4   1      03may2004  4jan2003   24jun2003  F    .     .     03may2004
4   2      13jan2006  .          .          F    .     .     13jan2006
4   3      25aug2007  .          .          F    2     9999  25aug2007

This is the final dataset I want to obtain:
ID  Datestarts  Dateends   Sex  Var1  Var2  Event    Event_recode
1   .           1mar2002   M    0     .     Visit 0  0
1   1mar2002    3jun2005   M    .     .     Visit 1  0
1   3jun2005    4feb2007   M    .     .     Visit 2  0
2   .           9feb2002   F    1     18.9  Visit 0  0
2   9feb2002    7sep2002   F    2     9999  Visit 1  2
3   .           25mar2003  M    0     20    Visit 0  0
3   25mar2003   13oct2004  M    2     9999  Visit 1  2
4   .           4oct2002   F    1     23.5  Visit 0  0
4   4oct2002    4jan2003   F    .     .     Osp 1    1
4   4jan2003    24jun2003  F    .     .     Osp 2    1
4   24jun2003   03may2004  F    .     .     Visit 1  0
4   03may2004   13jan2006  F    .     .     Visit 2  0
4   13jan2006   25aug2007  F    2     9999  Visit 3  2
As you may notice, any date recorded in DOsp1-DOsp9 that happened before Visit 0 is not taken into account. Event_recode will then be built as the failure variable for my stset (Event_recode is 0 if the row concerns a visit, 1 if it concerns a hospitalization, 2 if the person dies, namely if Var1==2, and 3 if censored).

All of that, in order to run the following code:

stset Dateends, id(ID) time0(Datestarts) origin(time Datestarts) failure(Event_recode==1 2)
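
For reference, a minimal, untested sketch of how the extra hospitalization records could be created before -snapspan-, assuming daily date variables and the names above:

Code:
* stack the DOsp dates as their own records, dropping any before baseline
preserve
keep ID Visit Date DOsp*
reshape long DOsp, i(ID Visit) j(k)
drop if missing(DOsp)
bysort ID: egen baseline = min(cond(Visit == 0, Date, .))
drop if DOsp < baseline
keep ID DOsp
rename DOsp Time
tempfile osp
save `osp'
restore

* visits contribute their own dates; append the hospitalization records
gen Time = Date
append using `osp'
sort ID Time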

Thank you to anyone who can help; feel free to ask me for clarifications.
Best

Time-varying covariate in a Cox regression model

Hi all. After a thorough search online I can't seem to find a solution to my problem, which is why I'm now asking the experts.

I'm doing a Cox regression on 1175 subjects where I want to assess the effect of the dichotomous baseline variable X on the outcome Z. All subjects have variable X, which is present since birth, so basically all have X=1. In addition, I have another dichotomous variable Y (which is more like an intervention effect) which is not present at baseline for any of the subjects; however, some subjects experience event Y during their follow-up at different dates, and this variable is known to be connected with outcome Z. I'm trying to find out whether the occurrence of Y increases the chance of outcome Z (Z=1) in the study subjects (so basically comparing those with Y=1 to those with Y=0) among all subjects who have X=1.

So the "known" chain of events is X --> Y --> Z, and I want to test X --> Z. But I still want to include the effect of Y in my model, as some of the subjects will follow X --> Y --> Z.
So I thought: how can I include Y as a time-varying covariate, so as not to underestimate the effect of Y but still assess whether there is a direct correlation between X and Z?
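
For reference, a minimal sketch of the usual episode-splitting approach, with hypothetical variable names (ydate = analysis time at which Y occurred, missing if it never did; failtime and failed define outcome Z):

Code:
* one record per subject to start
stset failtime, failure(failed) id(id)

* split each subject's record at the time Y occurred, if it did;
* stsplit should code the new variable -1 before that time and 0 after
stsplit everY, after(time = ydate) at(0)
replace everY = everY + 1       // 0 before Y, 1 from Y onwards

* Y now enters as a time-varying covariate; add baseline covariates as
* needed (X itself has no variation here, since essentially all have X=1)
stcox i.everY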

I hope the question isn't too cryptic; I'll be happy to elaborate.

Panel data regression

Hello everyone,

I'm writing my thesis and I'm struggling with the processing of my data. First of all, my research question is: "What is the effect of environmental controversies on the profitability of Chinese and European firms?" and I want to check for moderation of corporate environmental performance, press freedom of the country of origin of the firm and ownership structure (concentration and state ownership). My dependent variables are ROA, ROE and Tobin's Q. My independent variables are environmental controversies (EC), corporate environmental performance (CEP), press freedom (PF), ownership concentration (Independence), and state ownership (GUO). My control variables are firm size, leverage and industry.

I have collected my data from Eikon and Orbis. I opted for a balanced dataset (so there are no missing values), and this dataset consists of 314 firms (64 Chinese, 250 European).
My variables are:
- id (1 until 314)
- Year (2013-2018)
- Country (Europe or China)
- Industry (10 categories)
- Independence (A+ until D)
- GUO (e.g. Public authority)
- EC (dummy --> 0: no controversy in that year; 1: controversy in that year)
- CEP (score out of 100)
- PF (score out of 100)
- ROA
- ROE
- Tobin's Q
- Firm size
- Leverage

I made dummy variables for Country (DummyChina and DummyEurope), BvDIndependenceIndicator (DummyLowConcentration, DummyMediumLowConcentration, DummyMediumHighConcentration and DummyHighConcentration), GUO Type (DummyStateOwnership), Industry (DummyIndustry1, DummyIndustry2, DummyIndustry3, DummyIndustry4, DummyIndustry5, DummyIndustry6, DummyIndustry7, DummyIndustry8, DummyIndustry9 and DummyIndustry10).

Also, the variables EC, CEP and PF are lagged, as I want to measure the effect of the occurrence of an environmental controversy on the profitability of the next year.

When I first started my regression, I used SPSS. However, I read that Stata is a much better alternative for panel data. I was able to upload my data in Stata, and did some tests to check whether I need: pooled OLS model, fixed effects model or random effects model. The result pointed out that I need to use REM. I was able to regress my first model, only using ROA as my dependent variable and EC, Firm size, leverage, DummyChina, DummyIndustry2, DummyIndustry3, DummyIndustry4, DummyIndustry5, DummyIndustry6, DummyIndustry7, DummyIndustry8, DummyIndustry9 and DummyIndustry10.

My questions:
- If I want to compare Chinese and European firms, is this the right standard model? Or do I have to start with just ROA, EC and the control variables and then make interaction terms for Country and Industry?
- If I later make interaction terms for Country, Industry, CEP, PF, GUO and Independence, can I add all of these in just one regression? Or do I have to add them separately and run multiple regressions?

Quite frankly, I'm a bit lost. I have never used panel data or Stata, and I have no idea what the right order is to answer my research question and check for moderation. My main struggle is the interaction terms.
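
For what it's worth, a minimal sketch of one way to build interaction terms with factor-variable notation instead of hand-made dummies (hypothetical variable casing; EC is already lagged per the above, and Industry is assumed to be a numeric categorical variable):

Code:
xtset id Year

* ## adds both main effects and the interaction; EC's effect is
* allowed to differ between Chinese and European firms
xtreg ROA i.EC##i.DummyChina c.FirmSize c.Leverage i.Industry, re vce(cluster id)

* average marginal effect of a controversy, separately by country group
margins DummyChina, dydx(EC)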

If anyone has suggestions or could tell me the steps I have to follow, please let me know. Thank you in advance!!

Please help: Importing and merging multiple sheets from an excel file while renaming variables using loops

Hello,
I am quite new to using loops and really want to understand them better.
I have an Excel file with 11 sheets. I only need a couple of sheets from it, and I need to rerun this every day with new data but the same variables, so I am trying to write an efficient script to complete what I need to do. Each of the sheets has patient identifiers, but they come in with different column names, which makes it difficult to merge when importing within one loop.
The column I want to merge on is column_A, but in each sheet it is called "column_AA", "column_AB", "columnAC" respectively for Sheet A, Sheet B, Sheet C.
What I have so far to import the data:

Code:
local sheets "Sheet_A Sheet_B Sheet_C"
foreach y in `sheets'{
    import excel using "data_set.xls", sheet(`y') firstrow clear
    save "`y'.dta",replace 
    }
How might I add a command to rename the column names to a common one so that I can then merge them all?

I was thinking of adding a loop inside it, or a second loop after it, but then the correct column_AB wouldn't match up with the correct sheet.
This is what I was thinking, but it doesn't really work:
Code:
local variable "column_AA column_AB columnAC" 
        foreach t in `variable'{
        use "`y'.dta", clear
        rename `t' column_A
        }
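
For reference, a minimal sketch that walks the sheets and their ID columns in parallel (assuming the sheet and column names above), renames each to a common key before saving, and then merges the saved files:

Code:
local sheets "Sheet_A Sheet_B Sheet_C"
local idvars "column_AA column_AB columnAC"

forvalues i = 1/3 {
    local y : word `i' of `sheets'
    local t : word `i' of `idvars'
    import excel using "data_set.xls", sheet("`y'") firstrow clear
    rename `t' column_A              // common merge key
    save "`y'.dta", replace
}

use "Sheet_A.dta", clear
merge 1:1 column_A using "Sheet_B.dta", nogenerate
merge 1:1 column_A using "Sheet_C.dta", nogenerate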
Thanks for your help/advice/response!

-Ben

Margins: trouble with continuous interactions under simultaneous fixed effects

Sometimes I wish to control for variable X via fixed effects (say, year fixed effects) but also allow the marginal effect of a second variable to vary continuously with variable X (say, the effect of adopting a new technology might vary linearly or non-linearly with year). In these situations, I am NOT interested in allowing the marginal effect of that second variable to change with *every* value of variable X: that would waste power, as I believe the marginal effect of the second variable varies smoothly with variable X.

Stata can run this regression: reg Y i.X1 i.X1#c.X2 i.X2. However, while coefficients are calculated for both X1 and i.X1#c.X2, margins is for some reason unable to obtain the marginal effects of X1 over X2.

I have had this problem several times, and right now I'm having it in a situation where other fixed effects are accounted for, so I am using xtreg. However, the problem generalizes to situations using reg alone. I have replicated the problem in the auto dataset, below, and would be incredibly grateful for thoughts on what's going on.

Code:
sysuse auto, clear
xtset foreign
gen lprice=log(price)
gen HIGHmpg=mpg>25

** Reg 1: This works fine
xtreg lprice i.HIGHmpg i.turn
margins, dydx(i.HIGHmpg)

    ** Works fine w/ no interaction

** Reg 2: This does not work
xtreg lprice i.HIGHmpg i.HIGHmpg#c.turn i.turn
margins, dydx(i.HIGHmpg) at(c.turn=(32 36 40 44 48 52))

    /* Command does not run. Error returned:
        c.turn ambiguous abbreviation
        r(111); */

** Reg 3: This "trick" also doesn't work
gen test=turn
xtreg lprice i.HIGHmpg i.HIGHmpg#c.test i.turn
margins, dydx(i.HIGHmpg) at(c.test=(32 36 40 44 48 52))

    /* Command runs, but interactions deemed "not estimable" */

** Reg 2 w/ reg instead of xtreg
reg lprice i.HIGHmpg i.HIGHmpg#c.turn i.turn
margins, dydx(i.HIGHmpg) at(c.turn=(32 36 40 44 48 52))

    ** Same error given
    
** Reg 3 w/ reg instead of xtreg
reg lprice i.HIGHmpg i.HIGHmpg#c.test i.turn
margins, dydx(i.HIGHmpg) at(c.test=(32 36 40 44 48 52))

    ** Interactions still deemed "not estimable"

** Note: it is possible to allow an interaction between i.HIGHmpg and EVERY
** value of test, as below, but this is not what I want to do, as it wastes power.
** In my own examples, it is helpful to do this because I can see a linear or
** non-linear pattern in the marginal effects, but then I ultimately want to run
** the model allowing only a continuous change in the marginal effects.
xtreg lprice i.HIGHmpg i.HIGHmpg#i.turn i.turn
margins, dydx(i.HIGHmpg) over(turn)
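
For reference, one hedged workaround for the cloned-variable version: margins deems the results "not estimable" only because it cannot know that test equals turn, and that check can be switched off (use with care, since margins will then no longer protect against genuinely inestimable quantities):

Code:
reg lprice i.HIGHmpg i.HIGHmpg#c.test i.turn
margins, dydx(i.HIGHmpg) at(c.test=(32 36 40 44 48 52)) noestimcheck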

Panel data - Creating a date variable from year and weeknumber as string

Stata listers,

I am writing with a query relating to panel data on historical prices. I am trying to create a date variable from an Excel file which contains the year and week number as a string. Is there a way to convert the available information (year and week numbers as strings) into Stata- or Excel-recognisable dates? Thanks very much.

year  weeknum            Price 1  Price 2  date
1890  2nd week in Jan       76       90
1890  3rd week in Jan       76       90
1890  4th week in Jan       76       90
1890  2nd week in Feb       76       90
1890  3rd week in Feb       76       90
1890  4th week in Feb       76       90
1890  2nd week in March     76       90
1890  3rd week in March     80       94
1890  4th week in March     80       94
1890  5th week in March     80       94
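
For reference, a minimal sketch, assuming every weeknum string follows the pattern above ("2nd week in Jan", "5th week in March", ...): parse the ordinal and the month word, then anchor each week to an approximate day of that month.

Code:
* ordinal digit ("2nd" -> 2) and month word ("Jan", "March", ...)
gen wk  = real(substr(weeknum, 1, 1))
gen mon = word(weeknum, -1)

* month number via a throwaway parse, then an approximate daily date
gen m      = month(date("1 " + mon + " 2000", "DMY"))
gen date_d = mdy(m, 1 + 7*(wk - 1), year)
format date_d %td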

I am not able to attach this data in .dta format for some reason. I am using Stata MP 16.