Cassandra Hadoop MapReduce with wide rows ignores slice predicate
I have a wide-row column family that I'm trying to run a MapReduce job against. The CF is a time-ordered collection of events, where the column names are timestamps. I need to run the MR job against a specific date range in the CF.
When I run the job with the widerow property set to false, the expected slice of columns is passed to the mapper class. When I set widerow to true, the entire column family is processed, ignoring the slice predicate.
The problem is that I have to use widerow support, because the number of columns in the slice can grow very large and would consume too much memory if loaded in one go.
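To illustrate the memory concern, here is a minimal plain-Java sketch of what I mean by reading a date range in bounded chunks rather than in one go. It does not use any Cassandra API: the TreeMap is a hypothetical stand-in for one wide row (timestamp column name mapped to an event payload), and WideRowPager, nextPage and countInRange are names made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical stand-in for a wide row: column name (timestamp) -> event payload.
class WideRowPager {

    /** Returns at most pageSize columns whose timestamps lie in [from, to]. */
    static List<Map.Entry<Long, String>> nextPage(NavigableMap<Long, String> row,
                                                  long from, long to, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        List<Map.Entry<Long, String>> page = new ArrayList<>();
        if (from > to) {
            return page; // empty range, nothing to read
        }
        for (Map.Entry<Long, String> column : row.subMap(from, true, to, true).entrySet()) {
            if (page.size() == pageSize) {
                break; // never hold more than one page in memory
            }
            page.add(column);
        }
        return page;
    }

    /** Walks [from, to] page by page and counts the columns seen. */
    static int countInRange(NavigableMap<Long, String> row, long from, long to, int pageSize) {
        int seen = 0;
        long cursor = from;
        while (true) {
            List<Map.Entry<Long, String>> page = nextPage(row, cursor, to, pageSize);
            seen += page.size();
            if (page.size() < pageSize) {
                return seen; // short page means the range is exhausted
            }
            long last = page.get(page.size() - 1).getKey();
            if (last >= to) {
                return seen; // reached the end of the requested range
            }
            cursor = last + 1; // resume just after the last column read
        }
    }

    public static void main(String[] args) {
        NavigableMap<Long, String> row = new TreeMap<>();
        for (long ts = 1; ts <= 10; ts++) {
            row.put(ts, "event-" + ts);
        }
        // Timestamps 3..8 are read in pages of two columns each.
        System.out.println(countInRange(row, 3, 8, 2));
    }
}
```

This is the behaviour I would like from the job: only the columns in the requested range are visited, and never more than one page of them is held at a time.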
I've found a JIRA task that outlines the issue, but it has been closed off as "cannot reproduce": https://issues.apache.org/jira/browse/cassandra-4871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel
I'm running Cassandra 1.2.6 and using cassandra-thrift 1.2.4 and hadoop-core 1.1.2 in my jar. The CF was created using CQL3.
It's worth noting that this occurs regardless of whether I use a SliceRange or specify the columns using setColumn_names(); it still processes all of the columns.
Any help would be massively appreciated.
So it seems this is by design. In the word_count example on GitHub, the following comment exists:
// this will cause the predicate to be ignored in favor of scanning the wide row
ConfigHelper.setInputColumnFamily(job.getConfiguration(), KEYSPACE, COLUMN_FAMILY, true);
Urrrrgh. Fair enough then. It seems crazy that there is no way to limit the columns when using wide rows, though.
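One possible stopgap, which is not from the word_count example and is only a sketch, would be to let the wide-row scan deliver everything and discard columns outside the date range inside the mapper. The plain-Java snippet below shows just the filtering step. It assumes that column names arrive as ByteBuffers holding 8-byte big-endian long timestamps, which may not match your comparator; TimestampRangeFilter and its methods are hypothetical names, and the String values stand in for whatever column type the mapper really receives. Note that this still scans the whole column family, so it saves work in the mapper but not on the read side.

```java
import java.nio.ByteBuffer;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical helper: keeps only the columns whose timestamp name is in range.
class TimestampRangeFilter {

    /** Decodes an 8-byte big-endian column name without moving the buffer position. */
    static long toTimestamp(ByteBuffer columnName) {
        return columnName.getLong(columnName.position());
    }

    /** Encodes a timestamp the same way (used here only to build sample data). */
    static ByteBuffer fromTimestamp(long timestamp) {
        ByteBuffer buffer = ByteBuffer.allocate(8);
        buffer.putLong(timestamp);
        buffer.flip();
        return buffer;
    }

    /** Returns the columns with from <= timestamp <= to, keyed by decoded timestamp. */
    static <V> SortedMap<Long, V> filter(SortedMap<ByteBuffer, V> columns, long from, long to) {
        SortedMap<Long, V> kept = new TreeMap<>();
        for (Map.Entry<ByteBuffer, V> column : columns.entrySet()) {
            long timestamp = toTimestamp(column.getKey());
            if (timestamp >= from && timestamp <= to) {
                kept.put(timestamp, column.getValue());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Stand-in for one chunk of columns handed to the mapper.
        SortedMap<ByteBuffer, String> chunk = new TreeMap<>();
        chunk.put(fromTimestamp(100L), "a");
        chunk.put(fromTimestamp(200L), "b");
        chunk.put(fromTimestamp(300L), "c");
        // Only the columns at 200 and 300 fall inside [150, 300].
        System.out.println(filter(chunk, 150L, 300L).keySet());
    }
}
```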
Update
Apparently the solution is to use the new org.apache.cassandra.hadoop.cql3 library. See the new example on GitHub for reference: https://github.com/apache/cassandra/blob/trunk/examples/hadoop_cql3_word_count/src/wordcount.java