java - How to set up output files based on key names?
How do I set up output files based on key names?
For example, take the popular word count example in MapReduce. If I give it an arbitrary file with the correct syntax, it should find the keys (words) and their frequency of appearance. How do I output each key as a filename, with the value inside that file?
(I'm asking because my current understanding is that with MultipleOutputs you still need to specify a specific filename to use.)
I'm using Hadoop 0.20.205.0.
(Also, can anyone point me to tutorials for this Hadoop version?)
With this function in MultipleOutputs, you don't need to pre-specify file names anywhere when initializing the job.
Use it in the reducer:
void write(K key, V value, String baseOutputPath); baseOutputPath can be the string representation of the key.
e.g. write(key, value, getFileName(key));
String getFileName(K key) { return key.toString(); } Please have a look at the examples in the links; you'll get the idea.
Moreover, you don't need to use context.write() in the reducer. Rather, use only MultipleOutputs' write() function.
That makes it dynamic, in my view, if that's what you want.
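Since the key's string form becomes part of a path, getFileName() may need to do a bit more than toString(). Here is a minimal sketch of such a helper, assuming you want a flat set of files and that replacing unusual characters with '_' is acceptable. The class name KeyFileNames, the character whitelist, and the "_key" fallback are my own assumptions, not anything Hadoop prescribes. It uses only the JDK, so you can try it outside a job.

```java
import java.util.regex.Pattern;

/** Hypothetical helper: turns a reducer key into a safe base output path. */
final class KeyFileNames {
    // Assumption: only letters, digits, '.', '_' and '-' are kept as-is.
    private static final Pattern UNSAFE = Pattern.compile("[^A-Za-z0-9._-]");

    private KeyFileNames() {}

    static String getFileName(Object key) {
        // Replace every character outside the whitelist (including '/') with '_'.
        String name = UNSAFE.matcher(key.toString()).replaceAll("_");
        // Avoid empty names and the special directory entries "." and "..".
        if (name.isEmpty() || name.equals(".") || name.equals("..")) {
            return "_key";
        }
        return name;
    }

    public static void main(String[] args) {
        System.out.println(getFileName("hello")); // prints "hello"
        System.out.println(getFileName("a/b c")); // prints "a_b_c"
        // In a reducer you would pass the result as baseOutputPath, e.g.
        // write(key, value, KeyFileNames.getFileName(key));
    }
}
```

Note that two different keys can map to the same name after sanitizing ("a/b" and "a b" both become "a_b"), so this only fits if your keys are already mostly plain words, as in word count.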
Note (after comment):
Since you said you cannot use MultipleOutputs, here is another way you can do that.
- Since each call to the reduce function deals with one key, instead of doing context.write(key, value) you can use the Hadoop FileSystem API to create a file (named after the key) in HDFS.
Something like:
Path file = new Path(key.toString()); FileSystem fs = file.getFileSystem(context.getConfiguration()); FSDataOutputStream fileOut = fs.create(file); The create() function returns an FSDataOutputStream object. Use its write() function to write to the file.
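Close the output stream after you are done. Close the FileSystem object as well only if nothing else in the task uses it, because Hadoop caches and shares FileSystem instances.
fileOut.close(); // and fs.close() only if the FileSystem is not shared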
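If you want to try the "one file per key" idea without a cluster, here is a rough stand-in that uses the local file system through java.nio instead of the Hadoop FileSystem API. It is not HDFS code: the class PerKeyFileWriter and its methods are hypothetical names, and writeKeyFile() only mirrors what a single reduce() call would do with the key and its summed count.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical local-file-system stand-in for writing one file per key. */
final class PerKeyFileWriter {
    private PerKeyFileWriter() {}

    /** Creates outputDir/key and writes the value into it. */
    static Path writeKeyFile(Path outputDir, String key, long value) throws IOException {
        Files.createDirectories(outputDir);
        Path file = outputDir.resolve(key);
        // try-with-resources closes the stream when the write is done.
        try (Writer out = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            out.write(Long.toString(value));
        }
        return file;
    }

    /** Writes the word "hello" with count 3 to a temp directory and reads it back. */
    static String demo() {
        try {
            Path dir = Files.createTempDirectory("wordcount-out");
            Path file = writeKeyFile(dir, "hello", 3);
            String content = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            return file.getFileName() + " -> " + content;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "hello -> 3"
    }
}
```

In the real reducer the shape is the same: build the path from the key, create the file, write the value, close the stream.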