indexing - Detecting jumps on pandas index dates
I managed to load historical data as data series on a large set of financial instruments, indexed by date.
I am plotting volume and price information without any issue.
What I want to achieve is to determine whether there is any big jump in the dates, to see if I am missing large chunks of data.
The idea I had in mind was to somehow plot the difference between two consecutive dates in the index, and if the number is greater than 3 or 4 (which is bigger than a weekend plus a bank holiday on a Friday or Monday), then there is an issue.
The problem is that I can't figure out how to compute df[next day] - df[day], where df is indexed by day.
You can use the shift Series method (note that the DatetimeIndex method of the same name shifts by freq):
    In [11]: rng = pd.DatetimeIndex(['20120101', '20120102', '20120106'])  # a DatetimeIndex like df.index

    In [12]: s = pd.Series(rng)  # use df.index instead of rng

    In [13]: s - s.shift()
    Out[13]:
    0                NaT
    1   1 days, 00:00:00
    2   4 days, 00:00:00
    dtype: timedelta64[ns]

    In [14]: s - s.shift() > pd.offsets.Day(3).nanos  # newer pandas versions may need pd.Timedelta(days=3) here
    Out[14]:
    0    False
    1    False
    2     True
    dtype: bool
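The session above was recorded on an older pandas. As far as I know, recent versions refuse to compare a timedelta Series with a plain integer count of nanoseconds, so here is a minimal sketch of the same check using diff() and pd.Timedelta. The frame df and its price column are hypothetical stand-ins for your own data, and the 3-day threshold is just the figure from the question.

```python
import pandas as pd

# Hypothetical stand-in for the real frame; only the DatetimeIndex matters here.
df = pd.DataFrame(
    {"price": [10.0, 10.5, 11.0]},
    index=pd.DatetimeIndex(["2012-01-01", "2012-01-02", "2012-01-06"]),
)

# Turn the index into a Series so the differences stay aligned with the dates.
dates = df.index.to_series()
gaps = dates.diff()  # equivalent to dates - dates.shift()

# Compare against a Timedelta instead of a raw nanosecond count.
too_long = gaps > pd.Timedelta(days=3)
print(too_long.tolist())
```

The first entry of gaps is NaT because there is no previous date, and it compares as False, so it never triggers the check.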
Depending on what you want, perhaps you could either use any, or find the problematic values...
    In [15]: (s - s.shift() > pd.offsets.Day(3).nanos).any()
    Out[15]: True

    In [16]: s[s - s.shift() > pd.offsets.Day(3).nanos]
    Out[16]:
    2   2012-01-06 00:00:00
    dtype: datetime64[ns]
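If you want to see both ends of each jump rather than just the date where it lands, something like the following might help. It is only a sketch: find_gaps is a hypothetical helper name, the sample dates are made up, and the default threshold is the 3 days mentioned in the question.

```python
import pandas as pd

# Made-up dates standing in for df.index.
idx = pd.DatetimeIndex(
    ["2012-01-01", "2012-01-02", "2012-01-06", "2012-01-09", "2012-01-20"]
)

def find_gaps(index, threshold=pd.Timedelta(days=3)):
    """Return one row per jump strictly larger than `threshold`."""
    dates = pd.Series(index).sort_values().reset_index(drop=True)
    report = pd.DataFrame(
        {
            "previous": dates.shift(),  # last date before the jump
            "current": dates,           # first date after the jump
            "gap": dates.diff(),        # size of the jump
        }
    )
    return report[report["gap"] > threshold].reset_index(drop=True)

gaps = find_gaps(idx)
print(gaps)
```

With these dates the 3-day step from 2012-01-06 to 2012-01-09 is not reported, because the comparison is strict; only the 4-day and 11-day jumps are.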
Or perhaps you could find the maximum jump (and where it is):
    In [17]: (s - s.shift()).max()  # it's weird that this returns a Series...
    Out[17]:
    0   4 days, 00:00:00
    dtype: timedelta64[ns]

    In [18]: (s - s.shift()).idxmax()
    Out[18]: 2
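For what it's worth, on the recent pandas versions I have used, max() on a timedelta Series gives back a single Timedelta rather than a one-element Series. A small sketch of finding the largest jump and the date it ends on, using the same three sample dates:

```python
import pandas as pd

s = pd.Series(pd.DatetimeIndex(["2012-01-01", "2012-01-02", "2012-01-06"]))
jumps = s.diff()

largest = jumps.max()   # the leading NaT is skipped by default
where = jumps.idxmax()  # label of the row where the largest jump ends
print(largest, s.loc[where])
```

Here where is a label of s, so s.loc[where] is the first date after the gap and s.loc[where - 1] is the last date before it (this relies on the default integer index).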
If you wanted to plot this, simply plotting the difference would work:
    (s - s.shift()).plot()
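Since the question is really about trading days, a variation is to count business days between consecutive dates instead of calendar days, so that ordinary weekends never show up as jumps. This is a sketch with NumPy's busday_count, not part of the approach above; the dates are made up, and it assumes a Monday to Friday week with no holidays supplied.

```python
import numpy as np
import pandas as pd

# Made-up dates: Thu, Fri, Mon, then a Tuesday one week later.
idx = pd.DatetimeIndex(["2012-01-05", "2012-01-06", "2012-01-09", "2012-01-17"])

# busday_count works on day-resolution datetime64 values.
days = idx.values.astype("datetime64[D]")

# Business days in [previous, current) for each consecutive pair of dates.
steps = np.busday_count(days[:-1], days[1:])

# A step of 1 means no business day was skipped in between.
missing = steps - 1
suspicious = idx[1:][missing > 0]
print(steps.tolist(), list(suspicious))
```

busday_count also takes a holidays argument, so bank holidays could be excluded too if you have a list of them for your market.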