Java - Reading and writing a file that contains UTF-8 characters (a different language)
I have a file that contains characters like: " joh 1:1 ஆதியிலே வார்த்தை இருந்தது, அந்த வார்த்தை தேவனிடத்திலிருந்தது, அந்த வார்த்தை தேவனாயிருந்தது. "
(These are Tamil characters; see the Unicode chart at www.unicode.org/charts/pdf/u0b80.pdf)
When I use the following code:

    BufferedWriter bufferedWriter = new BufferedWriter(new OutputStreamWriter(System.out, "UTF8"));

the output is boxes and other weird characters like this:
"�p�^����o֛���;�<�ayՠ؛"
Can anyone help?
Here is the complete code:
    import java.io.*;
    import java.nio.charset.StandardCharsets;

    File f = new File("e:\\bible.docx");
    Reader decoded = new InputStreamReader(new FileInputStream(f), StandardCharsets.UTF_8);
    BufferedWriter bufferedWriter = new BufferedWriter(new OutputStreamWriter(System.out, StandardCharsets.UTF_8));
    char[] buffer = new char[1024];
    int n;
    StringBuilder build = new StringBuilder();
    while (true) {
        n = decoded.read(buffer);
        if (n < 0) {
            break;
        }
        build.append(buffer, 0, n);
        bufferedWriter.write(buffer, 0, n); // write only the n chars read this time
    }
    bufferedWriter.flush(); // push buffered output to System.out
The StringBuilder value shows the UTF characters, but when it is displayed in a window it shows boxes.
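Since the title asks about reading and writing a file, here is a minimal sketch of a UTF-8 round trip using `java.nio.file.Files`. It assumes Java 11 or later (for `Files.readString`/`Files.writeString`) and a plain-text input file; a .docx file is a ZIP archive rather than plain text, so it cannot be decoded this way. The class and method names are hypothetical, and temporary files stand in for real paths.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class Utf8Copy {
    // Reads the whole source file as UTF-8 and writes it to the target as UTF-8.
    // Files.readString throws an IOException if the bytes are not valid UTF-8.
    static void copyUtf8(Path source, Path target) throws IOException {
        String text = Files.readString(source, StandardCharsets.UTF_8);
        Files.writeString(target, text, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Temporary files stand in for real input and output paths.
        Path source = Files.createTempFile("verse", ".txt");
        Path target = Files.createTempFile("verse-copy", ".txt");
        // Tamil "aadhi" (U+0B86 U+0BA4 U+0BBF), escaped so the .java file encoding does not matter.
        Files.writeString(source, "\u0B86\u0BA4\u0BBF", StandardCharsets.UTF_8);
        copyUtf8(source, target);
        // Compare strings instead of printing Tamil, so the console font is irrelevant.
        String copied = Files.readString(target, StandardCharsets.UTF_8);
        System.out.println(copied.equals("\u0B86\u0BA4\u0BBF")); // true
    }
}
```

Comparing strings rather than printing them separates "was the file read correctly?" from "can this console render Tamil?".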
I found the answer to the problem! The encoding was correct (i.e. UTF-8): Java reads the file as UTF-8 and the String holds the right characters. The problem was that there was no font able to display them in NetBeans' output panel. After changing the font of the output panel (NetBeans -> Tools -> Options -> Misc -> Output tab), I got the expected result. The same applies when the text is displayed in a JTextArea (its font needs to be changed). I can't change the font in the Windows cmd prompt, though.
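One way to confirm that the String holds the right characters, independent of whether the output panel or console has a Tamil font, is to dump each code point with its Unicode block via `Character.UnicodeBlock.of`. This is a small sketch; the class and method names are hypothetical. If the dump shows `U+FFFD SPECIALS` instead of `TAMIL`, the text was already damaged when it was read, and no font change will fix it.

```java
public class CodePointDump {
    // Formats each code point as "U+XXXX BLOCK" on its own line,
    // so the text can be checked without any font that renders Tamil.
    static String describe(String text) {
        StringBuilder out = new StringBuilder();
        text.codePoints().forEach(cp -> {
            Character.UnicodeBlock block = Character.UnicodeBlock.of(cp);
            out.append(String.format("U+%04X %s%n", cp, block));
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // Tamil "aadhi" (U+0B86 U+0BA4 U+0BBF), escaped so the .java file encoding does not matter.
        String word = "\u0B86\u0BA4\u0BBF";
        System.out.print(describe(word));
    }
}
```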
Because the output is encoded in UTF-8 but still contains the replacement character (U+FFFD, �), I believe the problem occurs when you read the data.
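By default, an `InputStreamReader` built from a `Charset` silently replaces undecodable bytes with U+FFFD, which hides exactly this kind of read-side problem. A sketch that makes the read fail loudly instead, by passing a `CharsetDecoder` configured with `CodingErrorAction.REPORT`; the class and method names are hypothetical. The bad input mimics the start of a ZIP file (as in a .docx) followed by a byte that is never valid in UTF-8.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictUtf8 {
    // Decodes the stream as UTF-8 but throws on malformed bytes instead of
    // substituting U+FFFD, so a wrong input encoding is caught at read time.
    static String readStrict(InputStream in) throws IOException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        StringBuilder text = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, decoder)) {
            char[] buffer = new char[1024];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                text.append(buffer, 0, n);
            }
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] good = "\u0B86\u0BA4\u0BBF".getBytes(StandardCharsets.UTF_8);
        // "PK\3\4" ZIP signature followed by 0xFF, which is never valid UTF-8.
        byte[] bad = {0x50, 0x4B, 0x03, 0x04, (byte) 0xFF};
        System.out.println(readStrict(new ByteArrayInputStream(good)).length()); // 3
        try {
            readStrict(new ByteArrayInputStream(bad));
        } catch (CharacterCodingException e) {
            System.out.println("not valid UTF-8: " + e);
        }
    }
}
```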
Make sure you know which encoding your input stream uses, and set the encoding of the InputStreamReader accordingly. If the text is Tamil, I'd guess it's in UTF-8. I don't know if Java supports TACE-16. Something like this:
    StringBuilder text = new StringBuilder();
    try (InputStream encoded = ...) {
        Reader decoded = new InputStreamReader(encoded, StandardCharsets.UTF_8);
        char[] buffer = new char[1024]; // chars decoded per read call
        while (true) {
            int n = decoded.read(buffer);
            if (n < 0) break;
            text.append(buffer, 0, n);
        }
    }
    String verse = text.toString();
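To act on "set the encoding accordingly" and on the TACE-16 question, the same read loop can take the charset name as a parameter and check `Charset.isSupported` first, which reports whether the running JVM provides that charset. A sketch with hypothetical class and method names; whether a "TACE-16" charset exists depends on the JVM, so no result is assumed here.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DecodeWithCharset {
    // Reads the whole stream using the named charset and closes it afterwards.
    // Fails early if this JVM does not provide that charset.
    static String readAll(InputStream encoded, String charsetName) throws IOException {
        if (!Charset.isSupported(charsetName)) {
            throw new IllegalArgumentException("Charset not supported by this JVM: " + charsetName);
        }
        Charset charset = Charset.forName(charsetName);
        StringBuilder text = new StringBuilder();
        try (Reader decoded = new InputStreamReader(encoded, charset)) {
            char[] buffer = new char[1024];
            int n;
            while ((n = decoded.read(buffer)) != -1) {
                text.append(buffer, 0, n);
            }
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        // Tamil "aadhi" (U+0B86 U+0BA4 U+0BBF) encoded as UTF-8 bytes.
        byte[] bytes = "\u0B86\u0BA4\u0BBF".getBytes(StandardCharsets.UTF_8);
        String verse = readAll(new ByteArrayInputStream(bytes), "UTF-8");
        System.out.println(verse.equals("\u0B86\u0BA4\u0BBF")); // true
        // Prints whether this particular JVM ships a charset named TACE-16.
        System.out.println(Charset.isSupported("TACE-16"));
    }
}
```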