Channel: Statalist

date string: extraction of correct dates

Dear community,

I have date strings exported from SurveyMonkey in the following format:

Code:
. list date1 in 1

     +---------------------+
     |               date1 |
     |---------------------|
  1. | 10/11/2015 13:04:11 |
     +---------------------+
All I am trying to achieve is to convert this string into a Stata datetime variable, i.e. one formatted as %tc.
Here's my strategy and the description of the problems I encounter:

1. I am extracting the months, days, years, hours, minutes, and seconds via

Code:
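* substr() returns strings; real() converts each piece to a number for mdyhms()
gen        month    = real(substr(date1,1,2))
gen        day        = real(substr(date1,4,2))
gen        year    = real(substr(date1,7,4))
gen        hour    = real(substr(date1,12,2))
gen        minute    = real(substr(date1,15,2))
gen        second    = real(substr(date1,18,2))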
So far, this works pretty well, as documented in this example (first observation):

Code:
.    list date1 month-second in 1

     +-------------------------------------------------------------------+
     |               date1   month   day   year   hour   minute   second |
     |-------------------------------------------------------------------|
  1. | 10/11/2015 13:04:11      10    11   2015     13        4       11 |
     +-------------------------------------------------------------------+
2. I am trying to bring all of this together via the mdyhms() function, and the following happens (again using the first observation as an example):

Code:
. gen             datefull = mdyhms(month, day, year, hour, minute, second)

. format  datefull        %tc

. list    date1 datefull in 1

     +------------------------------------------+
     |               date1             datefull |
     |------------------------------------------|
  1. | 10/11/2015 13:04:11   11oct2015 13:05:08 |
     +------------------------------------------+
As you can see, I am getting the correct date but incorrect minutes and seconds. More precisely, the resulting time is 57 seconds later than the original time. However, this offset is not constant: in the 1000th observation, for example, the resulting time is 38 seconds before the original time. I therefore do not believe this is related to leap seconds, because such an offset should be more or less constant across all observations (the time span is less than 2 months).
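
One possibility worth ruling out, sketched here under the assumption that datefull was created with Stata's default storage type (float), is rounding. A %tc value for October 2015 is roughly 1.76e12 milliseconds, and a float has a 24-bit significand, so in that range neighbouring floats are 2^17 = 131,072 ms (about 131 seconds) apart. Rounding to the nearest float could therefore shift a time by up to about 65 seconds in either direction, which would fit offsets such as +57 and -38 seconds. The built-in float() function makes the effect visible without touching the data:

```stata
* Exact millisecond value (display evaluates the expression in double precision)
display %20.0f mdyhms(10, 11, 2015, 13, 4, 11)

* The same value rounded to float precision, as a plain -generate- would store it
display %20.0f float(mdyhms(10, 11, 2015, 13, 4, 11))

* Rounding error in seconds
display (float(mdyhms(10, 11, 2015, 13, 4, 11)) - mdyhms(10, 11, 2015, 13, 4, 11)) / 1000
```

If the last line reports a value close to the discrepancy seen for the first observation, storage precision rather than the extraction step would be the likely culprit.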

3. I suspect that the problem is related to the day not being extracted accurately for some reason. If I extract the date directly via the date() function and then convert the result into milliseconds via cofd(), I do not get exactly midnight of that day:

Code:
. **      a) transform date1 into days since 01/01/1960:
. gen             d1      = date(date1, "MDYhms")

. **      b) transform days into milliseconds
. gen             d2      = cofd(d1)

. format  d2      %tc

. list    d2      in 1, notrim    

                       d2  
  1.   11oct2015 00:00:53
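
A sketch of a more direct route, assuming the discrepancies come from the new variables being stored as float: clock() parses the whole string in one step, and declaring the result as double keeps the millisecond value exact. The variable names datefull2 and d2b are made up for illustration.

```stata
* Parse the full string directly into a %tc value, stored as double
gen double datefull2 = clock(date1, "MDYhms")
format datefull2 %tc

* Same idea for the date()/cofd() route: store the millisecond value as double
gen double d2b = cofd(date(date1, "MDYhms"))
format d2b %tc

list date1 datefull2 d2b in 1, notrim
```

If the first observation then lists as 11oct2015 13:04:11 and 11oct2015 00:00:00, the storage type was the cause.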
I hope I have provided enough detail for you to understand my problem, but I'd be happy to provide more information if necessary.

Thanks in advance,
Jakob
