
Obtain village name from longitude and latitude coordinates

Hi Statalist!

This is my first time dealing with a GPS dataset.
What I would like to do is get the name of the village for each observation.
I have longitude and latitude in my dataset.

In another dataset, I have village codes and their coordinates (UTM).

What I would like to do is,
first, to convert the longitude and latitude in the first dataset to UTM coordinates (or vice versa),
then to obtain the name of the village by finding the nearest village by distance. (Is that the right approach?)

I am not very familiar with the code or this kind of method.
I would greatly appreciate your help!

Thank you in advance.

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input long data_id str9 event_date int year float(longitude latitude)
4965162 "24-Jan-10" 2010  104.916 11.5624
4965805 "10-Jan-10" 2010  104.916 11.5624
4965858 "08-Jan-10" 2010  104.916 11.5624
4965422 "18-Jan-10" 2010  104.916 11.5624
4965365 "19-Jan-10" 2010 103.9192 12.5387
4965768 "11-Jan-10" 2010  104.916 11.5624
5152519 "10-Jan-10" 2010 104.9455 11.4464
4965107 "25-Jan-10" 2010  103.782 10.6301
4966027 "03-Jan-10" 2010 102.9736 13.5858
4965166 "24-Jan-10" 2010 104.6253  14.207
4965240 "22-Jan-10" 2010  104.916 11.5624
4966035 "03-Jan-10" 2010 104.8265 11.4135
4964886 "29-Jan-10" 2010 103.1344 12.2423
4964990 "27-Jan-10" 2010 103.6591 13.4444
4965634 "14-Jan-10" 2010  104.916 11.5624
4965758 "11-Jan-10" 2010 104.8795 11.3871
4965277 "21-Jan-10" 2010 102.5636  13.658
4965726 "12-Jan-10" 2010  104.916 11.5624
4965958 "05-Jan-10" 2010  104.916 11.5624
4965767 "11-Jan-10" 2010  104.916 11.5624
4965068 "26-Jan-10" 2010  103.782 10.6301
4965838 "09-Jan-10" 2010  104.916 11.5624
4964991 "27-Jan-10" 2010  104.916 11.5624
4965163 "24-Jan-10" 2010  103.782 10.6301
4965666 "13-Jan-10" 2010  104.916 11.5624
4965568 "15-Jan-10" 2010  104.916 11.5624
4953444 "31-Oct-10" 2010 103.8605 13.3617
4953734 "24-Oct-10" 2010  104.916 11.5624
4953567 "27-Oct-10" 2010  104.916 11.5624
4953674 "25-Oct-10" 2010  104.916 11.5624
4953673 "25-Oct-10" 2010 102.9736 13.5858
4954122 "13-Oct-10" 2010  104.916 11.5624
4953860 "20-Oct-10" 2010  104.916 11.5624
4953665 "25-Oct-10" 2010 102.9736 13.5858
5156265 "26-Oct-10" 2010   104.61   12.21
4954086 "14-Oct-10" 2010  104.916 11.5624
4952940 "17-Nov-10" 2010  104.916 11.5624
4953189 "10-Nov-10" 2010 104.2099 12.5325
4953007 "15-Nov-10" 2010  104.916 11.5624
4953160 "11-Nov-10" 2010 103.5295 10.6093
4953292 "06-Nov-10" 2010 103.5295 10.6093
4953219 "09-Nov-10" 2010 103.5295 10.6093
4953360 "03-Nov-10" 2010 105.4635 11.9933
4953271 "07-Nov-10" 2010 103.5295 10.6093
5451707 "17-Nov-10" 2010 103.1982 13.1027
4953270 "07-Nov-10" 2010  104.916 11.5624
4953191 "10-Nov-10" 2010 103.5295 10.6093
4953006 "15-Nov-10" 2010 102.9736 13.5858
4953137 "12-Nov-10" 2010 103.5295 10.6093
4953088 "13-Nov-10" 2010 103.7652 12.5758
4952891 "19-Nov-10" 2010 102.3708 13.5663
4952876 "20-Nov-10" 2010  104.916 11.5624
4952966 "16-Nov-10" 2010  104.916 11.5624
4953238 "08-Nov-10" 2010 103.5295 10.6093
5156266 "02-Nov-10" 2010  102.373 13.2971
4953405 "01-Nov-10" 2010 103.1344 12.2423
4953326 "04-Nov-10" 2010 103.5295 10.6093
4952965 "16-Nov-10" 2010  104.916 11.5624
4953190 "10-Nov-10" 2010  104.916 11.5624
4953311 "05-Nov-10" 2010 103.5295 10.6093
4953361 "03-Nov-10" 2010  104.916 11.5624
4951617 "28-Dec-10" 2010  104.916 11.5624
5156270 "15-Dec-10" 2010  102.373 13.2971
4951563 "29-Dec-10" 2010 104.1231 13.1025
4951818 "23-Dec-10" 2010 104.4274 11.7537
4952018 "16-Dec-10" 2010  104.916 11.5624
4951860 "22-Dec-10" 2010 104.4274 11.7537
4952350 "06-Dec-10" 2010  104.916 11.5624
5156268 "05-Dec-10" 2010 103.1982 13.1027
4952181 "12-Dec-10" 2010 106.4571  12.054
5156272 "20-Dec-10" 2010 102.3757 13.4319
4952234 "10-Dec-10" 2010 103.5295 10.6093
4951777 "24-Dec-10" 2010 104.4274 11.7537
4951616 "28-Dec-10" 2010 104.2099 12.5325
4952512 "01-Dec-10" 2010 102.9736 13.5858
4952393 "05-Dec-10" 2010  104.916 11.5624
4952357 "06-Dec-10" 2010  104.916 11.5624
4952097 "14-Dec-10" 2010  104.916 11.5624
4952356 "06-Dec-10" 2010 104.2099 12.5325
4951564 "29-Dec-10" 2010  104.916 11.5624
5156273 "29-Dec-10" 2010 102.7531 12.2088
4951914 "20-Dec-10" 2010  104.916 11.5624
5156269 "15-Dec-10" 2010 102.3558 13.3316
5156271 "19-Dec-10" 2010 102.3757 13.4319
4951819 "23-Dec-10" 2010  104.916 11.5624
4952067 "15-Dec-10" 2010  104.916 11.5624
5451703 "02-Feb-10" 2010 103.6957 14.4247
4964716 "02-Feb-10" 2010  104.916 11.5624
4963844 "18-Feb-10" 2010 103.6479 13.6838
5152531 "05-Feb-10" 2010  105.712   11.12
5156248 "17-Feb-10" 2010 104.9411 14.2154
4964625 "04-Feb-10" 2010 104.8265 11.4135
5156247 "10-Feb-10" 2010 103.2551 12.7558
4963349 "27-Feb-10" 2010 104.4274 11.7537
4964062 "15-Feb-10" 2010 104.6078 11.5175
4963968 "16-Feb-10" 2010 103.8605 13.3617
4963324 "28-Feb-10" 2010 104.4274 11.7537
4964061 "15-Feb-10" 2010  103.053 13.9398
4962244 "22-Mar-10" 2010  104.916 11.5624
4963178 "03-Mar-10" 2010 103.6957 14.4247
end

Code:
* Example generated by -dataex-. To install: ssc    install    dataex
clear
input long villcode float(vp_x_coord vp_y_coord)
1020101 287775.88 1494766.4
1020102  285878.9 1494760.4
1020103 285878.94 1494361.6
1020104  284880.7   1493561
1020105  284980.4 1494159.3
1020106 284081.97   1492960
1020107  285479.9   1491868
1020108  286877.6   1491972
1020109 286578.16   1491672
1020110 287376.78   1492173
1020111 288075.63   1492574
1020112 288674.66 1492775.3
1020113 285679.78   1489376
1020114 286378.66 1489378.3
1020115  287077.6 1489480.3
1020116 288575.13 1489584.6
1020117 289673.38   1489588
1020118 290571.97   1489591
1020119 291670.16 1489793.8
1020201 282783.97   1493654
1020202 282983.72 1493754.3
1020203 282684.13   1494551
1020204 282484.38   1494650
1020205 282883.63   1495748
1020206 283083.28 1495748.5
1020207 282983.34 1496645.5
1020208 282484.16   1496943
1020209  282783.7 1497043.6
1020210  282683.7   1497841
1020211 282284.25 1498637.3
1020301 279885.28   1488042
1020302 277992.38   1487956
1020303  279090.6   1487860
1020304 276694.53   1486955
1020305 276794.44 1486855.5
1020306 276494.97   1486057
1020307 276495.03 1485857.6
1020308    276495 1485558.5
1020309 276395.22   1485259
1020310  276095.9 1483862.5
1020311 278591.94 1483471.6
1020312 280089.56   1483177
1020313 281587.03 1483979.5
1020314    279690 1484671.4
1020315  280488.5   1485970
1020316  279190.8   1484670
1020317  280688.2   1486569
1020401 292369.03 1489895.6
1020402  293367.4   1489799
1020403 292868.16 1489797.6
1020404 294265.97 1490001.4
1020405 296761.84   1490408
1020406 295663.63 1490105.5
1020407  297760.2 1490610.5
1020408  301154.7   1491319
1020501 294264.94 1498874.4
1020502 292667.53   1498371
1020503  293765.8 1498673.5
1020504 293665.94 1498872.5
1020505  295762.5 1499178.3
1020506 296261.63 1499578.6
1020507 294863.94   1498976
1020508 293166.75 1498472.3
1020601  285680.2   1485787
1020602  285979.6 1486884.6
1020603 285080.97 1487280.6
1020604 284082.63 1486978.3
1020605 285679.94 1487980.4
1020606 286378.84   1487484
1020607 287676.78 1487288.8
1020608 287077.75 1487386.5
1020609  285080.8 1489274.5
1020610  283782.8 1489569.5
1020611  284182.2   1489870
1020612 284781.44 1487877.8
1020613 284681.25 1490768.8
1020614    284182 1491863.8
1020701 285980.13 1482597.6
1020702 285880.16 1483195.5
1020703 285680.34 1484889.6
1020704 286079.97 1481700.8
1020705  284082.8 1485582.5
1020706 284582.25 1483390.8
1020707 288196.66   1481468
1020708 287477.34 1485094.8
1020709  285081.5   1482794
1020710 285680.44   1484092
1020801 282284.97 1492555.8
1020802 282285.03 1491558.6
1020803 282085.47 1491059.5
1020804 281286.72 1490957.4
1020805 280987.34   1490059
1020806  279589.6   1489955
1020807 279589.66 1489556.3
1020808  279789.4 1488958.6
1020809  282085.7   1488268
1020810 281486.72   1488067
1020811 281486.84   1487369
1020812 283383.84 1486278.3
1020813 280288.44 1489857.5
end
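
A minimal sketch of the nearest-distance step, assuming hypothetical file names (events.dta for the GPS events, villages.dta for the village list) and assuming the village UTM coordinates have already been converted to decimal-degree latitude/longitude (hypothetical variables vill_lat and vill_lon): every event is paired with every village, great-circle distances come from the haversine formula, and only the closest village is kept per event.

Code:
* Hypothetical file and variable names; villages.dta holds villcode vill_lat vill_lon.
use events, clear
cross using villages                       // all event-village pairs
gen double dlat = (vill_lat - latitude) * _pi/180
gen double dlon = (vill_lon - longitude) * _pi/180
gen double a    = sin(dlat/2)^2 + ///
    cos(latitude*_pi/180)*cos(vill_lat*_pi/180)*sin(dlon/2)^2
gen double dist_km = 2 * 6371 * asin(sqrt(a))
bysort data_id (dist_km): keep if _n == 1  // nearest village for each event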

How to dissect multiple variables in one column into separate columns

Hello good Sir/Madam,

I am currently working with a list of corporations and their CashFlowTypes, and I wish to see the effect of a certain type of cash flow on a dependent variable. At the moment, each CashFlowType makes up a new row per corporation, with an adjacent column stating the Amount of that CashFlowType; so if there are 6 types of cash flow, I have 6 rows per corporation. What I would want is 1 row per corporation with 6 different columns, each holding its respective cash flow variable.

Is this possible? If so, with what code could this be accomplished?

I included a picture so you can better understand what I am trying to convey with my limited programming knowledge. The string variable is in Dutch, in case you were wondering.

My sincerest thanks in advance!

Jan Paul
[Attached image: screenshot of the dataset]
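
A minimal sketch of the reshaping step, using hypothetical variable names (CorporationID, CashFlowType, Amount): each cash flow type becomes its own column after reshape wide.

Code:
* Hypothetical names: one row per corporation-CashFlowType pair with the amount
* in Amount. If CashFlowType holds long Dutch descriptions, a short numeric
* code avoids illegal variable names after reshaping.
encode CashFlowType, generate(cf_code)
drop CashFlowType
reshape wide Amount, i(CorporationID) j(cf_code)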

Testing and addressing potential attrition bias in panel data

Dear all,

I am using household panel data across three waves (2011, 2013, and 2015). Although the vast majority of households surveyed in 2011 appeared in 2013 (an attrition rate of < 5%), there is a very large attrition rate (about 61%) between 2013 and 2015. The data are most likely Missing Not At Random (MNAR), because households in some regions of the study country were affected by political unrest in 2015. A closer look at the data indicated that most of the sample households left in the final round were in locations affected by the political unrest in the country.

Currently, I am focusing on the households that remained in the sample, but I am still worried that a potential attrition bias could affect my analyses. So far, I have tried to deal with attrition bias using the Inverse Probability Weighting (IPW) approach. Before calculating the weights, I tested whether attrition in my panel is random using a probit in which the dependent variable takes the value one for households that drop out of the sample after the first wave and zero otherwise. The test indicated that attrition is nonrandom.
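
A minimal sketch of that weighting step, with hypothetical variable names (attrit = 1 if the household dropped out after the first wave, x1-x3 = baseline covariates): the weight for retained households is the inverse of their predicted probability of being retained.

Code:
* Hypothetical names: probit of attrition on baseline covariates, then the
* inverse probability of retention as the weight for households that stayed.
probit attrit x1 x2 x3
predict double p_attrit, pr
gen double ipw = 1/(1 - p_attrit) if attrit == 0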

Yet, I am not quite sure whether the IPW approach really addresses an issue arising from such a large attrition rate (>60%). Can anyone advise whether the IPW approach addresses this attrition bias? And is this an acceptable rate of attrition, or is there a statistically recommended/reasonable rate of attrition in general?

Many Thanks for your help!
Abebayehu

Limit replace observations for n years

Dear All,

I have the following data:

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str11 country byte ISO3N int YEAR byte(BEGINrrs BEGINrrs_3)
"Afghanistan" 4 1800 . .
"Afghanistan" 4 1801 . .
"Afghanistan" 4 1802 1 .
"Afghanistan" 4 1803 . .
"Afghanistan" 4 1804 . .
"Afghanistan" 4 1805 . .
"Afghanistan" 4 1806 . .
"Afghanistan" 4 1807 . .
"Afghanistan" 4 1808 . .
"Afghanistan" 4 1809 . .
"Afghanistan" 4 1810 . .
"Afghanistan" 4 1811 . .
"Afghanistan" 4 1812 . .
"Afghanistan" 4 1813 . .
"Afghanistan" 4 1814 1 .
"Afghanistan" 4 1815 . .
"Afghanistan" 4 1816 . .
"Afghanistan" 4 1817 . .
"Afghanistan" 4 1818 . .
end
I am trying to set BEGINrrs_3 = 1 when BEGINrrs == 1 and for the following 3 years, since I want this dummy to clearly mark a 3-year post-event window.
For instance, BEGINrrs_3 should equal 1 in 1802 and in the following 3 years: 1803, 1804, and 1805.

I have tried replace with [_n] and [_n-1] subscripts, and carryforward (to be installed from SSC), but I can't figure out how to limit these commands to my 3-year time window.
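
A minimal sketch of one way to do this, assuming one row per country-year: within each country, flag the event year and any year whose previous three rows contain an event.

Code:
* Flag the event year and the three following years within each country.
bysort country (YEAR): replace BEGINrrs_3 = 1 if BEGINrrs == 1 ///
    | BEGINrrs[_n-1] == 1 | BEGINrrs[_n-2] == 1 | BEGINrrs[_n-3] == 1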

I would really appreciate your help.
Many thanks!

Best
Giovanni

Problem with generating and replacing variable

Hi,

I have written the following code but I am facing a lot of problems.

Code:
gen tetns_inj=.
replace tetns_inj=1 if (Tetns_inj_before_birth==0 | Tetns_inj_before_birth==8)
replace tetns_inj=1 if (tetns_inj==.) & (Tetns_inj_before_preg==0 | Tetns_inj_before_preg==8)

Alternatively,
Code:
gen tetns_inj=1 if (Tetns_inj_before_birth==0 | Tetns_inj_before_birth==8) | (Tetns_inj_before_preg==0 | Tetns_inj_before_preg==8)
Each time I get the result shown in the picture: line 44 should have . in the first column, but instead there is a 1.

Calculate median collecting observations from different variables in wide format

Dear all,


my data structure looks like this.
good1    price1    good2    price2    ...
milk     43        egg      43        ...
egg      53        milk     22        ...
egg      21        coffee   4         ...
coffee   10        bread    55        ...
I was wondering how I can calculate the median price for each good, picking up its observations from the different columns, without having to reshape?
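
A minimal sketch of one way to do this without a permanent reshape, assuming the pairs are named good1 price1 good2 price2, and so on: stack the pairs into two long variables inside preserve/restore and summarise by good.

Code:
* Hypothetical wide variable names good1 price1 good2 price2 ...
preserve
stack good1 price1 good2 price2, into(good price) clear
tabstat price, by(good) statistics(median)
restore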

Thank you!

Expanding Observations on Panel data

I have the following data:

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str157 id str18 region float(t1 t2 ten_dur_monthly)
"AL1 1AJ-14--ORIENT CLOSE-ST ALBANS-ST ALBANS-HERTFORDSHIRE" "East" 606 691 85
"AL1 1AJ-18--ORIENT CLOSE-ST ALBANS-ST ALBANS-HERTFORDSHIRE" "East" 605 664 59
"AL1 1AJ-22--ORIENT CLOSE-ST ALBANS-ST ALBANS-HERTFORDSHIRE" "East" 617 706 89
"AL1 1AJ-26--ORIENT CLOSE-ST ALBANS-ST ALBANS-HERTFORDSHIRE" "East" 606 655 49
"AL1 1AJ-28--ORIENT CLOSE-ST ALBANS-ST ALBANS-HERTFORDSHIRE" "East" 614 701 87
end
format %tm t1
format %tm t2
To explain the context of the data, these are all different housing transactions, where:
- ID = Address
- region = the region of the UK where the house is located
- t1 = Date Bought
- t2 = Date Sold

The data ranges from 2000-M1 to 2019-M6.

My final aim is a panel dataset with a month-year time variable (covering the full range observed in the data) and the average tenure duration for each region at each point in time. To do this, I was thinking that I first need to create the time variable and then, for each address, a variable indicating whether the tenure has commenced, based on t1; the months would then be counted until t2 is reached, at which point the house is sold and the tenure ends.

I'm struggling with the code to do this. Do you have any suggestions?
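
A minimal sketch of the expansion step described above: create one row for every month each address is held, then average the tenure duration by region and month.

Code:
* t1 and t2 are monthly dates, so t2 - t1 + 1 is the number of months from
* purchase to sale inclusive.
gen long n_months = t2 - t1 + 1
expand n_months
bysort id t1: gen mdate = t1 + _n - 1      // one row per month the house is held
format %tm mdate
collapse (mean) ten_dur_monthly, by(region mdate)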

Simpson's paradox: how to visualize a positive group tendency when its coefficient doesn't outweigh the absolute coefficient of the negative general tendency?

Hi everyone,

I would like your opinion on the following. I am regressing weekly return (dependent variable) on the change in news risk (independent variable). I would like to know how to visualize a positive underlying tendency for groups in the data (Simpson's paradox) when this positive tendency does not outweigh the negative general tendency.

Dummies were included for ext1-ext11, with ext1 representing the group with the highest absolute changes in news risk and ext11 the lowest (each dummy equals 1 when the group equals i, for i from 1 to 11). When the dummies are interacted with the change in news risk variable, the interaction coefficients are positive, and more positive for higher absolute changes in news risk than for lower ones, as can be seen in the table. However, because the overall change-in-news-risk variable has a larger absolute coefficient than any of these positive interaction coefficients, the net effect and slope for these groups remain negative (unlike some graphs I found online). This is why it is hard for me to visualize the positive group tendency behind the graph: the effect simply does not outweigh the negative general tendency. For example, the change in news risk has a coefficient of -30 and the first group (ext1) has a positive coefficient of 27; the positive coefficient does not outweigh the absolute value of the negative one, and hence the positive tendency is not visible in my graphs (when I graph a particular group).

Any ideas on how I can visualize this positive tendency? And how can I come up with a formula that captures these positive group tendencies?

[Attached image: regression output]

** Note: I also included the dummies themselves in the regression shown in the figure (without the interaction); since my question is only about the interaction terms, I show only those here so they are easier to compare.
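
A minimal sketch of one way to recover and plot the group-specific net slopes, with hypothetical variable names (return = weekly return, d_newsrisk = change in news risk, ext_group = group 1-11): the net slope for each group is the overall coefficient plus that group's interaction term (e.g. -30 + 27 = -3 for ext1), which margins computes directly.

Code:
* Hypothetical variable names. dydx() returns the slope of return with respect
* to d_newsrisk within each group, i.e. main coefficient plus the interaction.
regress return c.d_newsrisk##i.ext_group
margins ext_group, dydx(d_newsrisk)
marginsplot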

Identifying coworkers in spell data

I have a dataset of worker employment spells, and I would like a general way to identify an individual's co-workers when the individual starts a job. Consider the following example dataset:
Code:
input worker_id firm_id start_date end_date
1 1 1 4
1 2 5 10
2 1 2 8
2 2 9 10
3 1 6 7
4 3 2 7
end
When worker 1 starts work in firm 1, she has no co-workers, and the same when she starts in firm 2. When worker 2 starts in firm 1, worker 1 is her co-worker until time period 4, and worker 3 is her co-worker from time period 6 on. When worker 3 starts, worker 2 is her co-worker for the whole time worker 3 is in the firm. When worker 4 starts, she has no co-workers.

The following code illustrates something like what I'm after
Code:
levelsof firm_id, local(firms)

tempfile worker_ds
save "`worker_ds'"
clear
gen worker_id = .
tempfile coworker_ds
save "`coworker_ds'"

foreach j of local firms {

    di ""
    di "results for firm `j'"
    di ""
    
    use "`worker_ds'", clear
    keep if firm_id == `j'
    count
    local obs = r(N)
    gen total_coworkers = .   // count of each worker's coworkers in firm j
    
    * identify set of coworkers for each worker in firm j
    forvalues i = 1/`obs' {
        local coworkers`i' ""
        local n_coworkers = 0
        forvalues ii = 1/`obs' {
            if start_date[`ii'] < end_date[`i'] & end_date[`ii'] > start_date[`i'] & `i' != `ii' {
                local nextworker = worker_id[`ii']
                local coworkers`i' `coworkers`i'' `nextworker'
                local n_coworkers = `n_coworkers' + 1
            }
        }
        replace total_coworkers = `n_coworkers' if _n == `i'
    }
    
    * create a worker-coworker level dataset
    expand total_coworkers, generate(expand_obs)   // one observation per coworker
    gen coworker_id = .
    sort worker_id
    local n = 1
    
    * fill in values for coworker_id variable
    forvalues i = 1/`obs' {
        di "`i''s coworkers are `coworkers`i''"
        foreach id of local coworkers`i' {
            replace coworker_id = `id' if _n == `n'
            local n = `n' + 1
        }
    }
    
    append using "`coworker_ds'"
    save "`coworker_ds'", replace

}
I say "general way" because while this code would do what I want on test data, it would not in general work. I have 40 million spells, so looping, replacing, or sorting repeatedly will take far too long. The code could also fail in various ways: (i) I could easily hit the size limit for the macro -coworkers`i'-; (ii) the code assumes that each worker works a single spell in a firm; (iii) I'm sure people can come up with other reasonble cases in which this code would fail.

I realise this is a broad, somewhat vague question, but it would already be great if anyone has suggestions for commands or features of Stata's syntax I could exploit to radically speed this up and make it more robust, or could point me towards references that might suggest how to go about solving this. Even improving individual steps would be helpful.
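
A minimal sketch of an alternative to the explicit double loop, assuming each worker has at most one spell per firm and the spell data are saved as spells.dta (a hypothetical file name): a within-firm self-join creates every worker pair, and overlapping spells identify co-workers.

Code:
* Hypothetical file spells.dta holding worker_id firm_id start_date end_date.
use spells, clear
rename (worker_id start_date end_date) (coworker_id co_start co_end)
tempfile coworkers
save "`coworkers'"

use spells, clear
joinby firm_id using "`coworkers'"                   // all worker pairs within a firm
drop if worker_id == coworker_id                     // not her own co-worker
keep if co_start < end_date & co_end > start_date    // overlapping spells only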

There are many things I would like to do with these data subsequently. Two important ones are calculating the number of coworkers with certain characteristics a worker has during a given spell, and seeing whether a worker has former coworkers in a firm in a subsequent spell with a different firm. I am using Stata 14.1.

Cosinor analysis on iodine dispersion

Hello Stata users,

I am trying to perform a cosinor analysis to determine the effect of season on iodine dispersion.
Here is my code:
[Attached image: code]
This code produces this graph:
[Attached image: resulting graph]

The problem is that I am trying to produce a graph similar to the one pictured here:
[Attached image: example graph from the linked thread]
(plucked from https://www.statalist.org/forums/forum/general-stata-discussion/general/1223983-trigonometric-regression)

but I cannot seem to get this code to produce a graph like this. Does anyone have any suggestions?
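
Since the posted code is only an image, here is a minimal sketch of a basic cosinor-type fit, with hypothetical variable names (iodine = iodine measure, doy = day of year): regress on one annual sine/cosine pair and overlay the fitted seasonal curve on the data.

Code:
* Hypothetical variable names. One annual harmonic; further sin/cos pairs can
* be added for higher harmonics.
gen double s1 = sin(2*_pi*doy/365.25)
gen double c1 = cos(2*_pi*doy/365.25)
regress iodine s1 c1
predict double fit, xb
sort doy
twoway (scatter iodine doy, msize(tiny)) (line fit doy), ///
    ytitle("Iodine") xtitle("Day of year")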


Thank you for your time and help!!

Psmatch2 standard errors

I have conducted PSM using kernel density matching. Since this produces multiple matches, I need to adjust for non-independence, as described here: https://www.google.com/url?sa=t&sour...=1563368902329. I'm not at my computer so I can't refresh my memory, but as I recall, altvariance is not available for kernel density matching. I can't remember whether ai(int) is, but even if it is, the integer specification cannot be a constant since the number of matches varies. Any ideas on how to adjust the SEs? Bootstrapping seems to be contraindicated.

Problem with margins and vce(unconditional)

Dear All,

I am running svy: probit and then typing margins, vce(unconditional).

However, I am getting the following error:


. margins, vce(unconditional)
unconditional standard errors derived assuming full estimation sample;
indepvars dropped observations from the estimation sample
r(459);


I understand this is because my covariates have some missing data, and svy: probit automatically drops those observations when running the regression.

Is there any way to get around this problem? I'm not using the subpop() option, but there is an if qualifier at the end of the regression:

svy: probit lfcat c.age##c.age i.female i.educ i.training i.married if WApop==1, allbase
margins, vce(unconditional)
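
A minimal sketch of one common workaround: mark the complete-case target group with an indicator and pass it to subpop() instead of using an if qualifier, so that margins, vce(unconditional) still sees the full survey design.

Code:
* The indicator is 1 only for observations in the target group with no missing
* covariates, so every observation stays in the design while estimation uses
* the intended subpopulation.
gen byte insample = WApop == 1 & !missing(lfcat, age, female, educ, training, married)
svy, subpop(insample): probit lfcat c.age##c.age i.female i.educ i.training i.married
margins, vce(unconditional)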

Any help would be highly appreciated.

Ipdmetan: how can I get p-values?

Dear all,

I am using ipdmetan with glm for different types of outcomes: count and continuous.
This is an example of my syntax for the 7 continuous outcomes I have:
Code:
foreach i in 01 02 03 04 05 06 07 {
quietly ipdmetan, nograph study(study) saving(Data\Temp\contin`i'ga0.dta, replace): glm contin`i' arm if ga==0
  }
Then I append the datasets created by the saving() option in order to compile the results into an Excel table later. Below is an example of the data I get after ipdmetan.
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input byte(_USE _STUDY) str27 _LABELS double(_ES _seES _LCI _UCI _WT) int _NN str19 _EFFECT
1 1 "Trial1"                         .0010613075941062952  .06632984378750287 -.12894279732956715 .13106541251777976  42.59066359681475 1715 "0.00 (-0.13, 0.13)" 
1 2 "Trial2"                        .24214579475676037   .0960527743520479   .0538858164115939 .43040577310192685 20.310121853648674  659 "0.24 (0.05, 0.43)"  
1 3 "Trial3"                         -.0695742444152431   .2143070457804755 -.48960833577815155  .3504598469476653  4.079993362829599  175 "-0.07 (-0.49, 0.35)"
1 4 "Trial4"                 -.03980224575512546  .07533256676238938 -.18745136347236774 .10784687196211683  33.01922118670698 1084 "-0.04 (-0.19, 0.11)"
5 . "Overall (I-squared = 51.1%)"  .033651107808174985 .043287866134241654 -.05119155078252974 .11849376639887971                100 3633 "0.03 (-0.05, 0.12)" 
end
label values _STUDY _STUDY
label def _STUDY 1 "Trial1", modify
label def _STUDY 2 "Trial2", modify
label def _STUDY 3 "Trial3", modify
label def _STUDY 4 "Trial4", modify
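
A minimal sketch for deriving two-sided p-values from these saved results, assuming a normal reference distribution (the same one behind the saved confidence limits):

Code:
* z statistic and two-sided p-value from the saved effect size and its SE.
gen double z    = _ES/_seES
gen double pval = 2*normal(-abs(z))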
Thanks for your help

Possible bug in "label save" in Stata 16?

Hi everyone,

I wanted to bring this up in the forum in case I'm missing something obvious, and it's not actually a bug. The following code will create a do file with the value labels in Stata 15, but will create a blank do file in Stata 16.

Code:
sysuse auto, clear
label save using formatfile, replace
Am I missing something here?

Thanks,
John

existing variable values as new variables

Hi, my data goes like this:

var1 var2
10000 a
10000 b
10000 c
10001 a
10001 b

I want my data to look like this:

var1 a b c
10000 1 1 1
10001 0 1 0


Please help!
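
A minimal sketch with the variables shown: tabulate creates one 0/1 indicator per value of var2, and collapsing by var1 keeps a 1 wherever that value ever occurs.

Code:
* One indicator per value of var2 (d_1 = a, d_2 = b, d_3 = c), then one row per
* var1 with the maximum of each indicator; rename the d_* columns afterwards
* if you want them called a, b, c.
tabulate var2, generate(d_)
collapse (max) d_*, by(var1)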

Compare coefficients across models for panel

I want to test the difference between the coefficients on "huaf" and "lauf", which are the ratios of high uncertainty avoiding individuals and low uncertainty avoiding individuals in a group, respectively.

xtreg risk huaf i.year, re vce(cluster msacode)

xtreg risk lauf i.year, re vce(cluster msacode)

How can this be done after xtreg, and also with xtivreg, when I will need to include instrumental variables?
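
A minimal sketch of one common approach, assuming both ratios can enter the same model: estimate a single regression containing both and test the equality of the two coefficients.

Code:
* Joint model; -test- reports a Wald test of huaf = lauf using the
* cluster-robust covariance matrix.
xtreg risk huaf lauf i.year, re vce(cluster msacode)
test huaf = lauf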

Thank you,

Running regressions in a loop

Hi all,

I have a large dataset with 14 yearly data points on several independent variables for a large number of firms.
I am trying to build code that loops over a number of regressions for each firm. Please see attached a data file with the variables for two firms.

Precisely, I want the code to regress the data from t-14 up to and including t-5 to forecast t+5 (in the dataset the time is given without the 't', as I couldn't find how to add it). In all of these regressions, the dependent variable is E_Next_T and the independent variables are NegE, E, NegE_Times_E, B and TACC.

Then I want the code to regress the data from t-13 up to and including t-4 to forecast t+4, from t-12 up to and including t-3 to forecast t+3, from t-11 up to and including t-2 to forecast t+2, and from t-10 up to and including t-1 to forecast t+1.

This way I will obtain a large number of estimates for each independent variable, and I would like Stata to store every coefficient obtained from the regressions for every independent variable for t+1, t+2, t+3, t+4 and t+5 (I need them later to backtest whether E_Next_T is similar to the real values). So I need to store the coefficients on the independent variables, and their p-values, in vectors.

Then, I want Stata to do this exact same process for every firm.

Could anyone please help me? I've provided a sample dataset here:

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float(Time Firm NegE E NegE_Times_E B TACC E_Next_T group)
-14 1 0 -50   0 4000 234 200 1
-13 1 1 200 200 4000 452 -40 1
-12 1 0 -40   0 4200 534 400 1
-11 1 1 400 400 4200 123 300 1
-10 1 1 300 300 4200 342 -10 1
 -9 1 0 -10   0 4200 454 430 1
 -8 1 1 430 430 4200 235 520 1
 -7 1 1 520 520 4200 672 -20 1
 -6 1 0 -20   0 3900 312 420 1
 -5 1 1 420 420 3900 213 350 1
 -4 1 1 350 350 3900 673 230 1
 -3 1 1 230 230 4000 456 300 1
 -2 1 1 300 300 4000 345 240 1
 -1 1 1 240 240 4000 734 345 1
-14 2 0 -40   0 7000 741 340 2
-13 2 1 340 340 7000 734 560 2
-12 2 1 560 560 7000 236 700 2
-11 2 1 700 700 7000 452 800 2
-10 2 1 800 800 7000 752 -25 2
 -9 2 0 -25   0 7000 348 300 2
 -8 2 1 300 300 7000 742 350 2
 -7 2 1 350 350 7000 345 360 2
 -6 2 1 360 360 7400 435 400 2
 -5 2 1 400 400 7400 345 450 2
 -4 2 1 450 450 7400 673 430 2
 -3 2 1 430 430 7400 634 470 2
 -2 2 1 470 470 7400 237 500 2
 -1 2 1 500 500 7400 732 470 2
end
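
A minimal sketch of the loop-and-store step with the dataex variables above: for each firm and each forecast horizon h = 1,...,5, the model is fitted on the ten observations from t-(9+h) to t-h, and the coefficients and p-values are posted to a results file (reg_results.dta is a hypothetical name).

Code:
* Hypothetical results file reg_results.dta. The window for horizon h runs
* from Time = -(9+h) to Time = -h (ten yearly observations).
tempname results
postfile `results' firm horizon str16 varname double(b p) using reg_results, replace
levelsof Firm, local(firms)
foreach f of local firms {
    forvalues h = 1/5 {
        quietly regress E_Next_T NegE E NegE_Times_E B TACC ///
            if Firm == `f' & inrange(Time, -(9+`h'), -`h')
        foreach v in NegE E NegE_Times_E B TACC _cons {
            local p = 2*ttail(e(df_r), abs(_b[`v']/_se[`v']))
            post `results' (`f') (`h') ("`v'") (_b[`v']) (`p')
        }
    }
}
postclose `results'
use reg_results, clear
list, sepby(firm horizon)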

Time Series Regression

Hi all,
I'm working on the following sample.

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str6 firm double(fyear earnings)
"000001" 2010  339.84
"000001" 2011 268.608
"000001" 2012  89.147
"000001" 2013 -12.965
"000001" 2014  68.098
"000001" 2015 101.776
"000001" 2016   63.47
"000001" 2017  95.495
"000001" 2018  113.39
"000002" 2010  35.457
"000002" 2011 -37.225
"000002" 2012  14.026
"000002" 2013 -49.064
"000002" 2014 -50.071
"000002" 2015  13.386
"000002" 2016  56.673
"000002" 2017  93.248
"000002" 2018 136.921
end
I would like to estimate the following linear equation using a time-series approach:
earnings(t+1) = a + w*earnings(t)
In particular, I need to estimate this equation using rolling three-year windows for each firm and each year. For example, to forecast earnings for 2018, one would observe earnings for 2017 and estimate the historical persistence of earnings (the coefficient w) using data from 2014 to 2017.
How could I do that?
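
A minimal sketch with the variables above: for each firm and each forecast year Y, the persistence w is estimated from the consecutive-year pairs inside the three-year window preceding Y (for Y = 2018 that is the 2014-2017 data, i.e. the pairs 2014-2015, 2015-2016, and 2016-2017).

Code:
* Next year's earnings as the dependent variable; the estimation window for
* forecast year Y uses observations with fyear from Y-4 to Y-2.
encode firm, generate(firm_id)
tsset firm_id fyear
gen double lead_earn = F.earnings
levelsof firm_id, local(firms)
foreach f of local firms {
    forvalues Y = 2014/2018 {
        quietly regress lead_earn earnings if firm_id == `f' & inrange(fyear, `Y'-4, `Y'-2)
        display "firm " `f' ", forecast year " `Y' ": a = " %9.3f _b[_cons] ///
            ", w = " %9.3f _b[earnings]
    }
}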
Thanks for the attention.

Daniel

Time-series analysis

Hello everyone!

I want to create a graph based on several rounds of ESS survey years. I have an immigration attitude variable for each year, ranging 0-10, and a social class variable with values for lower, middle, and upper class.
My idea is to create a line graph showing, for each year, the mean immigration attitude for each class.
I am struggling to create a proper graph and have tried most of the options I found on the internet.
Could you help me with some advice, please?
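
A minimal sketch with hypothetical variable names (attitude = immigration attitude 0-10, class = 1 lower / 2 middle / 3 upper, year = survey year): collapse to class-year means and draw one line per class.

Code:
* Hypothetical variable names; preserve/restore keeps the original microdata.
preserve
collapse (mean) attitude, by(class year)
twoway (line attitude year if class == 1) ///
       (line attitude year if class == 2) ///
       (line attitude year if class == 3), ///
    legend(order(1 "Lower class" 2 "Middle class" 3 "Upper class")) ///
    ytitle("Mean immigration attitude (0-10)") xtitle("Survey year")
restore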

xtabond2 model specification

The AR statistics kept being significant until I raised the lagged dependent variable to the 4th lag, so I ran the model below.

Code:
xtabond2 proactivity l(1/4)(proactivity) generalcrime feb mar apr may jun jul aug sep oct nov dec, gmm( proactivity generalcrime, lag(5 8)) iv( feb mar apr may jun jul aug sep oct nov dec, equation(level)) twostep robust artests(5)
I tried l.proactivity, l(1/2)(proactivity), and l(1/3)(proactivity); this is the only specification with insignificant AR statistics beyond AR(1). The model output below looks okay, but it appears very sensitive to the model specification. If I add collapse, which I understood shouldn't change the results, the results change. Any thoughts on why it changes, or other ways to specify the model?

Code:
Warning: Two-step estimated covariance matrix of moments is singular.
  Using a generalized inverse to calculate optimal weighting matrix for two-step est
> imation.
  Difference-in-Sargan/Hansen statistics may be negative.

Dynamic panel-data estimation, two-step system GMM
------------------------------------------------------------------------------
Group variable: id                              Number of obs      =     25104
Time variable : week                            Number of groups   =       523
Number of instruments = 470                     Obs per group: min =        48
Wald chi2(16) =  14145.86                                      avg =     48.00
Prob > chi2   =     0.000                                      max =        48
------------------------------------------------------------------------------
             |              Corrected
 proactivity |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
 proactivity |
         L1. |   .4472726   .0460867     9.71   0.000     .3569444    .5376008
         L2. |   .2588113   .0712742     3.63   0.000     .1191166    .3985061
         L3. |   .1298111   .0540169     2.40   0.016     .0239399    .2356823
         L4. |   .1308565   .0247562     5.29   0.000     .0823352    .1793778
             |
generalcrime |   .0577832   .0373932     1.55   0.122    -.0155061    .1310726
         feb |  -.5542773   .1725027    -3.21   0.001    -.8923764   -.2161782
         mar |  -.7677247   .1638914    -4.68   0.000    -1.088946   -.4465035
         apr |  -.6148198   .1516915    -4.05   0.000    -.9121296     -.31751
         may |  -.6942948   .1564604    -4.44   0.000    -1.000952   -.3876381
         jun |  -.7967815    .156679    -5.09   0.000    -1.103867   -.4896964
         jul |   -.696932   .1484477    -4.69   0.000    -.9878841   -.4059799
         aug |  -.8511846   .1570522    -5.42   0.000    -1.159001    -.543368
         sep |  -.6681694   .1469222    -4.55   0.000    -.9561316   -.3802071
         oct |  -.6614347   .1601904    -4.13   0.000    -.9754021   -.3474672
         nov |  -.8454832   .1686919    -5.01   0.000    -1.176113   -.5148532
         dec |   -.536543   .1588977    -3.38   0.001    -.8479767   -.2251092
       _cons |   .6442327   .2119702     3.04   0.002     .2287787    1.059687
------------------------------------------------------------------------------
Instruments for first differences equation
  GMM-type (missing=0, separate instruments for each period unless collapsed)
    L(5/8).(proactivity generalcrime)
Instruments for levels equation
  Standard
    feb mar apr may jun jul aug sep oct nov dec
    _cons
  GMM-type (missing=0, separate instruments for each period unless collapsed)
    DL4.(proactivity generalcrime)
------------------------------------------------------------------------------
Arellano-Bond test for AR(1) in first differences: z =  -6.31  Pr > z =  0.000
Arellano-Bond test for AR(2) in first differences: z =   0.71  Pr > z =  0.477
Arellano-Bond test for AR(3) in first differences: z =  -0.81  Pr > z =  0.416
Arellano-Bond test for AR(4) in first differences: z =   0.96  Pr > z =  0.335
Arellano-Bond test for AR(5) in first differences: z =  -0.97  Pr > z =  0.334
------------------------------------------------------------------------------
Sargan test of overid. restrictions: chi2(453)  =1776.52  Prob > chi2 =  0.000
  (Not robust, but not weakened by many instruments.)
Hansen test of overid. restrictions: chi2(453)  = 473.71  Prob > chi2 =  0.242
  (Robust, but weakened by many instruments.)

Difference-in-Hansen tests of exogeneity of instrument subsets:
  GMM instruments for levels
    Hansen test excluding group:     chi2(359)  = 398.55  Prob > chi2 =  0.074
    Difference (null H = exogenous): chi2(94)   =  75.16  Prob > chi2 =  0.923
  iv(feb mar apr may jun jul aug sep oct nov dec, eq(level))
    Hansen test excluding group:     chi2(442)  = 463.10  Prob > chi2 =  0.235
    Difference (null H = exogenous): chi2(11)   =  10.62  Prob > chi2 =  0.476
Code:
. xtabond2 proactivity l(1/4)(proactivity) generalcrime feb mar apr may jun jul aug
> sep oct nov dec, gmm( proactivity generalcrime, lag(5 8) collapse) iv( feb mar apr
>  may jun jul aug sep oct nov dec, equation(level)) twostep robust artests(5)
Favoring space over speed. To switch, type or click on mata: mata set matafavor spee
> d, perm.

Dynamic panel-data estimation, two-step system GMM
------------------------------------------------------------------------------
Group variable: id                              Number of obs      =     25104
Time variable : week                            Number of groups   =       523
Number of instruments = 22                      Obs per group: min =        48
Wald chi2(16) =    133.97                                      avg =     48.00
Prob > chi2   =     0.000                                      max =        48
------------------------------------------------------------------------------
             |              Corrected
 proactivity |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
 proactivity |
         L1. |   .0661376   .4625995     0.14   0.886    -.8405409     .972816
         L2. |   1.067306   .2863359     3.73   0.000     .5060984    1.628514
         L3. |   .3243474   .4095168     0.79   0.428    -.4782908    1.126986
         L4. |  -.0056103   .0693738    -0.08   0.936    -.1415804    .1303599
             |
generalcrime |  -.2164601   .2403091    -0.90   0.368    -.6874574    .2545372
         feb |  -.1236117   .9449195    -0.13   0.896     -1.97562    1.728396
         mar |  -.5567039   .8190546    -0.68   0.497    -2.162021    1.048613
         apr |  -.3190208   .8371222    -0.38   0.703     -1.95975    1.321709
         may |  -.2762736   .7620428    -0.36   0.717     -1.76985    1.217303
         jun |    -.41769   .7011003    -0.60   0.551    -1.791821    .9564412
         jul |   -.212813   .7252157    -0.29   0.769     -1.63421    1.208584
         aug |   -.437525   .7145739    -0.61   0.540    -1.838064    .9630141
         sep |   -.309892   .8271875    -0.37   0.708     -1.93115    1.311366
         oct |  -.3654707   .8021538    -0.46   0.649    -1.937663    1.206722
         nov |  -.7637196   .7350004    -1.04   0.299    -2.204294    .6768547
         dec |  -.2051388   .8173252    -0.25   0.802    -1.807067    1.396789
       _cons |  -.8427515   1.483277    -0.57   0.570     -3.74992    2.064417
------------------------------------------------------------------------------
Instruments for first differences equation
  GMM-type (missing=0, separate instruments for each period unless collapsed)
    L(5/8).(proactivity generalcrime) collapsed
Instruments for levels equation
  Standard
    feb mar apr may jun jul aug sep oct nov dec
    _cons
  GMM-type (missing=0, separate instruments for each period unless collapsed)
    DL4.(proactivity generalcrime) collapsed
------------------------------------------------------------------------------
Arellano-Bond test for AR(1) in first differences: z =  -0.25  Pr > z =  0.800
Arellano-Bond test for AR(2) in first differences: z =  -2.32  Pr > z =  0.020
Arellano-Bond test for AR(3) in first differences: z =   0.22  Pr > z =  0.823
Arellano-Bond test for AR(4) in first differences: z =   1.03  Pr > z =  0.305
Arellano-Bond test for AR(5) in first differences: z =   1.22  Pr > z =  0.224
------------------------------------------------------------------------------
Sargan test of overid. restrictions: chi2(5)    =  29.50  Prob > chi2 =  0.000
  (Not robust, but not weakened by many instruments.)
Hansen test of overid. restrictions: chi2(5)    =  10.78  Prob > chi2 =  0.056
  (Robust, but weakened by many instruments.)

Difference-in-Hansen tests of exogeneity of instrument subsets:
  GMM instruments for levels
    Hansen test excluding group:     chi2(3)    =   4.08  Prob > chi2 =  0.253
    Difference (null H = exogenous): chi2(2)    =   6.70  Prob > chi2 =  0.035