mapreduce - Cassandra Hadoop map reduce with wide rows ignores slice predicate -

May 15, 2012

I have a wide-row column family that I'm trying to run a MapReduce job against. The CF is a time-ordered collection of events, where the column names are timestamps. I need to run the MR job against a specific date range in the CF.

When I run the job with the widerow property set to false, the expected slice of columns is passed to the mapper class. When I set widerow to true, the entire column family is processed and the slice predicate is ignored.

The problem is that I have to use wide-row support, because the number of columns in the slice can grow large and consume all the memory if loaded in one go.

I've found a JIRA task that outlines the issue, but it has been closed off as "cannot reproduce" - https://issues.apache.org/jira/browse/cassandra-4871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel

I'm running Cassandra 1.2.6 and using cassandra-thrift 1.2.4 and hadoop-core 1.1.2 in my jar. The CF was created using CQL3.

It's worth noting that this occurs regardless of whether I use a SliceRange or specify the columns using setColumn_names() - it still processes all of the columns.

Any help is massively appreciated.

So it seems this is by design. In the word_count example on GitHub, the following comment exists:

    // this will cause the predicate to be ignored in favor of scanning the wide row
    ConfigHelper.setInputColumnFamily(job.getConfiguration(), KEYSPACE, COLUMN_FAMILY, true);

Urrrrgh. Fair enough then. It seems crazy that there is no way to limit the columns when using wide rows, though.
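Since the predicate is ignored in wide-row mode, one possible stopgap is to drop out-of-range columns inside the mapper itself. The sketch below is only an illustration, not something taken from the Cassandra examples: the class name ColumnRangeFilter is hypothetical, and it assumes the column names are 8-byte big-endian long timestamps (for example, epoch milliseconds). It does not avoid scanning the whole column family; it only limits which columns the mapper goes on to process.

```java
import java.nio.ByteBuffer;

/**
 * Hypothetical helper (not part of Cassandra): accepts only columns whose
 * name, read as a timestamp, falls in the half-open range [startMillis, endMillis).
 */
final class ColumnRangeFilter {
    private final long startMillis;
    private final long endMillis;

    ColumnRangeFilter(long startMillis, long endMillis) {
        if (startMillis > endMillis) {
            throw new IllegalArgumentException("startMillis must not exceed endMillis");
        }
        this.startMillis = startMillis;
        this.endMillis = endMillis;
    }

    /**
     * Assumption: the column name holds exactly one 8-byte big-endian long.
     * Names of any other length are rejected rather than guessed at.
     */
    boolean accepts(ByteBuffer columnName) {
        if (columnName.remaining() != Long.BYTES) {
            return false;
        }
        // Absolute read, so the buffer's position is left untouched for later use.
        long timestamp = columnName.getLong(columnName.position());
        return timestamp >= startMillis && timestamp < endMillis;
    }
}
```

In a mapper, this could be applied to each column name handed to map() before doing any real work, skipping the columns it rejects. The job still reads every column from Cassandra, so this saves mapper work rather than I/O.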

Update

Apparently the solution is to use the new org.apache.cassandra.hadoop.cql3 library. See the new example on GitHub for reference: https://github.com/apache/cassandra/blob/trunk/examples/hadoop_cql3_word_count/src/wordcount.java
