How to read Unicode text files
Sep 28th, 2007, 4:04pm
 
Hi

I've developed an applet on my Mac, and now on Windows machines some of the displayed texts (read from a file with loadStrings()) contain wrong characters (the umlauts). I think I have to put my texts in a Unicode text file.
Is there an easy way to read text from a Unicode text file?

Thanks for any help.
Adi
Re: How to read Unicode text files
Reply #1 - Sep 28th, 2007, 4:52pm
 
i think the problem is that java reads and writes text in the computer's default encoding. so if the file was made on a mac and is read on windows, java assumes it was encoded with the windows default. saving the file as unicode won't help by itself, because java does not attempt to detect a file's encoding (detection would slow down every read and might not even be possible in every case). i think java already uses unicode internally for strings.
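
To illustrate the point about encodings: the same bytes decode to different text depending on the charset the reader assumes. This is only a minimal sketch in plain Java (7 or later), not Processing code. The class name MojibakeDemo and its helper are made up for illustration, and UTF-8 and ISO-8859-1 are used only because every Java runtime is required to support them; they are not necessarily the encodings involved in the original problem.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Processing.
class MojibakeDemo {
  // Encode text with one charset, then decode the bytes with another,
  // as happens when a file is written on one platform and read on another.
  static String reinterpret(String text, Charset writtenAs, Charset readAs) {
    byte[] bytes = text.getBytes(writtenAs);
    return new String(bytes, readAs);
  }

  public static void main(String[] args) {
    String original = "\u00e4"; // a-umlaut

    // Matching charsets round-trip cleanly.
    String same = reinterpret(original, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
    System.out.println("round trip ok: " + original.equals(same));

    // Mismatched charsets turn the single umlaut into two wrong characters.
    String wrong = reinterpret(original, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
    System.out.println("characters after mismatch: " + wrong.length());
  }
}
```

The fix is therefore not a particular file format but making sure the reader is told which encoding the writer used.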

anyway, here's what might work (more or less directly copied from processing's source):

Code:


void setup() {
  // Pass the file's actual encoding as the second argument.
  String[] mytest = loadStrings("myMacFile.txt", "ISO-8859-1");
}

void draw() {
}

// Variant that takes a File; returns null if the file cannot be opened.
static public String[] loadStrings(File file, String _enc) {
  InputStream is = openStream(file);
  if (is != null) return loadStrings(is, _enc);
  return null;
}

// Variant that takes a file name (or URL) and reports a missing file.
public String[] loadStrings(String filename, String _enc) {
  InputStream is = openStream(filename);
  if (is != null) return loadStrings(is, _enc);

  System.err.println("The file \"" + filename + "\" " +
                     "is missing or inaccessible, make sure " +
                     "the URL is valid or that the file has been " +
                     "added to your sketch and is readable.");

  return null;
}

// Does the actual work: decodes the stream with the given encoding
// and returns one array entry per line.
static public String[] loadStrings(InputStream input, String _enc) {
  try {
    BufferedReader reader =
      new BufferedReader(new InputStreamReader(input, _enc));

    String lines[] = new String[100];
    int lineCount = 0;
    String line = null;
    while ((line = reader.readLine()) != null) {
      // double the array size whenever it is full
      if (lineCount == lines.length) {
        String temp[] = new String[lineCount << 1];
        System.arraycopy(lines, 0, temp, 0, lineCount);
        lines = temp;
      }
      lines[lineCount++] = line;
    }
    reader.close();

    if (lineCount == lines.length) {
      return lines;
    }

    // resize array to appropriate amount for these lines
    String output[] = new String[lineCount];
    System.arraycopy(lines, 0, output, 0, lineCount);
    return output;

  } catch (IOException e) {
    // also reached when the encoding name is not supported
    e.printStackTrace();
    //throw new RuntimeException("Error inside loadStrings()");
  }
  return null;
}


all you should have to do is supply the correct encoding of the .txt file to the encoding-savvy loadStrings() ..
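
For readers outside Processing, the same idea (hand the reader an explicit encoding instead of relying on the platform default) can be sketched in plain Java 7 or later. This is an illustration under assumptions, not Processing's implementation: the class EncodedLines and the method readLines are hypothetical names, and the temporary file exists only to make the example self-contained.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Processing's API.
class EncodedLines {
  // Read all lines from a stream, decoding the bytes with the named charset.
  // I/O failures (including an unsupported charset name) are rethrown
  // unchecked, so callers do not need a throws clause.
  static String[] readLines(InputStream input, String encoding) {
    List<String> lines = new ArrayList<>();
    // try-with-resources closes the reader even if readLine() throws.
    try (BufferedReader reader =
             new BufferedReader(new InputStreamReader(input, encoding))) {
      String line;
      while ((line = reader.readLine()) != null) {
        lines.add(line);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return lines.toArray(new String[0]);
  }

  public static void main(String[] args) throws IOException {
    // Write a small ISO-8859-1 file, then read it back with the same encoding.
    Path file = Files.createTempFile("umlauts", ".txt");
    try {
      Files.write(file, "gr\u00fcn\nK\u00e4se".getBytes(Charset.forName("ISO-8859-1")));
      try (InputStream in = Files.newInputStream(file)) {
        for (String line : readLines(in, "ISO-8859-1")) {
          System.out.println(line);
        }
      }
    } finally {
      Files.delete(file);
    }
  }
}
```

Collecting lines in a list avoids the manual array doubling of the code above; the result is the same array of lines.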

/F
Re: How to read Unicode text files
Reply #2 - Sep 28th, 2007, 5:28pm
 
Thank you very much for this immediate and perfect answer!

I tried to expand the solution to another encoding scheme: to Unicode (when I saved the file in my editor I had to check the "include byte-order mark" checkbox). I get internally the same strings with:

void setup () {
 String[] mytest = loadStrings("myMacFile.txt", "ISO-8859-1");
 String[] mytestU = loadStrings("myUnicodeFile.txt", "Unicode");
 for (int i = 0; i < mytest.length; i++) {
   println(mytest[i] + " " + mytestU[i]);
 }
}

Thanks again! Adi


Re: How to read Unicode text files
Reply #3 - Sep 28th, 2007, 6:28pm
 
that's because java does not recognize "Unicode" as an encoding .. it uses the default mac-encoding instead. have a look at the types:

http://java.sun.com/j2se/1.4.2/docs/api/java/nio/charset/Charset.html

i guess you should try "UTF-8" (without the byte order mark) if you need to use unicode at all.

you can find the underlying system's default charset with:
println( System.getProperty("file.encoding") );
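
Along the same lines, plain Java can be asked whether it knows a given encoding name before the name is passed to a reader. The following is a small sketch; the class CharsetCheck and the method isKnown are hypothetical, and which names beyond the standard ones are reported as known depends on the Java runtime in use.

```java
import java.nio.charset.Charset;

// Hypothetical helper for checking encoding names.
class CharsetCheck {
  // True if this JVM supports the charset name; false for unknown
  // names and for names that are not even syntactically legal.
  static boolean isKnown(String name) {
    try {
      return Charset.isSupported(name);
    } catch (IllegalArgumentException e) {
      // Charset.isSupported throws for illegal names (and for null).
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println("default charset: " + Charset.defaultCharset());
    String[] names = {"UTF-8", "UTF-16", "ISO-8859-1", "x-not-a-real-charset"};
    for (String name : names) {
      System.out.println(name + " known: " + isKnown(name));
    }
  }
}
```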

F
Re: How to read Unicode text files
Reply #4 - Sep 28th, 2007, 8:38pm
 
Thanks. Just to clarify: InputStreamReader does understand "Unicode" (but the program only works when the file includes the byte-order mark). With "UTF-16" it works correctly in either case (with or without the mark), and with "UTF-8" I get "?" instead of the non-ASCII characters.
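
The byte-order mark behaviour can be tried out in plain Java (7 or later) without any files. This sketch does not reproduce the files from this thread: the class Utf16BomDemo is a made-up name, the byte arrays are hand-written encodings of a single a-umlaut, and the legacy name "Unicode" is deliberately left out because only the standard charsets are used here. Java's "UTF-16" decoder honours a byte-order mark if one is present and assumes big-endian order otherwise.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: one a-umlaut (U+00E4) in several byte layouts.
class Utf16BomDemo {
  static final byte[] BIG_ENDIAN_WITH_BOM = {(byte) 0xFE, (byte) 0xFF, 0x00, (byte) 0xE4};
  static final byte[] LITTLE_ENDIAN_WITH_BOM = {(byte) 0xFF, (byte) 0xFE, (byte) 0xE4, 0x00};
  static final byte[] BIG_ENDIAN_NO_BOM = {0x00, (byte) 0xE4};
  static final byte[] LITTLE_ENDIAN_NO_BOM = {(byte) 0xE4, 0x00};

  // Decode with "UTF-16": the mark, if present, selects the byte order
  // and is not part of the resulting string.
  static String decodeUtf16(byte[] bytes) {
    return new String(bytes, StandardCharsets.UTF_16);
  }

  // Decode the same bytes as UTF-8; invalid sequences become U+FFFD.
  static String decodeUtf8(byte[] bytes) {
    return new String(bytes, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    String expected = "\u00e4";
    System.out.println("BE with mark:    " + expected.equals(decodeUtf16(BIG_ENDIAN_WITH_BOM)));
    System.out.println("LE with mark:    " + expected.equals(decodeUtf16(LITTLE_ENDIAN_WITH_BOM)));
    System.out.println("BE without mark: " + expected.equals(decodeUtf16(BIG_ENDIAN_NO_BOM)));
    // Without a mark, little-endian bytes are misread as big-endian.
    System.out.println("LE without mark: " + expected.equals(decodeUtf16(LITTLE_ENDIAN_NO_BOM)));
    // UTF-16 bytes are not valid UTF-8, so replacement characters appear.
    System.out.println("as UTF-8 has U+FFFD: "
        + (decodeUtf8(BIG_ENDIAN_WITH_BOM).indexOf('\uFFFD') >= 0));
  }
}
```

Replacement characters are often displayed as "?", which may be what happened when the UTF-16 file was read as "UTF-8", though the sketch cannot confirm that for the original files.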
Adi
Re: How to read Unicode text files
Reply #5 - Aug 13th, 2008, 6:20pm
 
Just an update: the above doesn't work anymore with Processing 0144 (at least), maybe because of the move to Java 1.5.

static public String[] loadStrings(File file, String _enc) {
   InputStream is = openStream(file);
   if (is != null) return loadStrings(is, _enc);
   return null;
}

Anyway, just comment out that part and you should still be able to use the rest :]
Re: How to read Unicode text files
Reply #6 - Aug 13th, 2008, 8:22pm
 
No, *please* read revisions.txt for changes. All files are now treated as UTF-8 by default, to deal with this issue.