Cluster analysis: clustering timestamps with time zone from Twitter data
I have a Postgres database of downloaded tweets, and I use a timestamp with time zone column to store current_timestamp. I want to cluster the tweets the way this great guy did here:
https://gis.stackexchange.com/questions/11567/spatial-clustering-with-postgis
But instead of geo-clustering I want to do time-clustering. I mean I want to cluster the tweets into groups by the current_timestamp column. For example, I have 10 tweets:
time                       | text  | tweet_id
2013-07-29 11:17:08.153+03 | text  | 12345600bsa9
2013-07-29 11:19:08.153+03 | text  | ang698f4s8s4
..
2013-07-29 16:41:00.968+03 | hello | 6546448965445
2013-07-29 16:43:00.968+03 | world | w9087ol0930j3
So from these 4 tweets I want to make 2 clusters (the clustering checks the distance in hours): 1 cluster for the 11:.. hour and 1 for the 16:.. hour. Of course I want to extend this to day clusters, month clusters, etc. Any assistance, guys? Thanks in advance.
Sort the data.
Define a temporal threshold, e.g. 1 hour. If the gap to the previous time is larger than this, split into 2 clusters.
Time is 1-dimensional; this is not really cluster analysis. 1-dimensional data can be sorted and then processed as a series, which is much easier.
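The steps above can be sketched in a few lines of Python. This is only an illustration, assuming the timestamps have already been fetched from the database as timezone-aware values; the function name split_by_gap and the variable names are made up for this example and are not from the original answer.

```python
from datetime import datetime, timedelta, timezone

def split_by_gap(timestamps, threshold=timedelta(hours=1)):
    """Group timestamps into runs separated by gaps larger than `threshold`."""
    clusters = []
    for ts in sorted(timestamps):
        # Start a new cluster for the first item, or when the gap to the
        # previous (sorted) timestamp exceeds the threshold.
        if not clusters or ts - clusters[-1][-1] > threshold:
            clusters.append([ts])
        else:
            clusters[-1].append(ts)
    return clusters

# The four timestamps shown in the question, at UTC+03:00.
tz = timezone(timedelta(hours=3))
tweets = [
    datetime(2013, 7, 29, 16, 41, 0, 968000, tzinfo=tz),
    datetime(2013, 7, 29, 11, 17, 8, 153000, tzinfo=tz),
    datetime(2013, 7, 29, 16, 43, 0, 968000, tzinfo=tz),
    datetime(2013, 7, 29, 11, 19, 8, 153000, tzinfo=tz),
]

clusters = split_by_gap(tweets)
print([len(c) for c in clusters])  # [2, 2]
```

Passing a larger threshold, e.g. timedelta(days=1), gives coarser groups in the same way. Inside Postgres, the same gap test could probably be written with the lag() window function over the sorted column, but that is not shown here.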