Processing Forum
How do I read a text file with Arabic characters encoded in UTF-8 in Processing?
I use InputStreamReader and I am working on Windows.

code:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.Charset;


String[] lines = new String[0];

void setup() {
  size(300,300);
}

void draw() {
}

void keyPressed() {
  lines = new String[0];
  String filename = "arabe1.txt";
  File f = new File(dataPath(filename));
 
  if(f.exists()) {
    //lines = loadStrings(filename);

    try {
      /* Open a stream to a File (in your data Folder) here */
      InputStream fi = createInput(filename);  
      /* get a reader with your encoding */
      InputStreamReader input = new InputStreamReader( fi, Charset.forName("UTF-8") );
      BufferedReader reader = new BufferedReader(input);

      // read the file line by line
      String line;
      int counter = 0;
      while ((line = reader.readLine()) != null) {
        lines = append(lines, line);
        counter++;
      }
      reader.close();
    }
    catch (IOException e) {
      e.printStackTrace();
    }
  }
 
  for ( int i = 0 ; i < lines.length;i++) println(lines[i]);
}

Replies(3)

Processing's official default encoding is UTF-8, which is why `loadStrings()` already supports UTF-8. Under the hood, `loadStrings()` already does the same thing you did in your code with InputStreamReader.
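To make that point concrete, here is a minimal sketch in plain Java of reading a file line by line with explicit UTF-8 decoding. It is not the actual Processing source; the class name `Utf8Lines` and the helper `readLines` are hypothetical names chosen for illustration, and the sample file is a temporary file created only so the example runs on its own.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class Utf8Lines {
    // Hypothetical helper: reads every line of a file, decoding its bytes as UTF-8.
    static List<String> readLines(Path file) throws IOException {
        List<String> lines = new ArrayList<>();
        // try-with-resources closes the reader even if reading fails
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Write a small Arabic sample so the example is self-contained.
        Path sample = Files.createTempFile("arabic", ".txt");
        Files.write(sample, "\u0645\u0631\u062D\u0628\u0627\n".getBytes(StandardCharsets.UTF_8));

        List<String> lines = readLines(sample);
        System.out.println(lines.size() + " line(s), first has " + lines.get(0).length() + " chars");

        Files.delete(sample);
    }
}
```

Inside a sketch, a single call to `loadStrings()` should give you the same lines without any of this plumbing.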

However, println() does not render Arabic characters correctly in the console. In other words, the Processing console doesn't support non-English scripts (e.g. Arabic, Russian, Korean, Chinese).
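If you want to confirm that a string is intact in memory even though the console cannot draw it, one option is to print its Unicode code points using ASCII characters only. The following is a minimal plain-Java sketch of that idea; the class name `CodePointDump` and the helper `toHex` are hypothetical, and the hard-coded string stands in for a line loaded from your file.

```java
public class CodePointDump {
    // Hypothetical helper: formats each Unicode code point as U+XXXX,
    // so the output contains only ASCII and any console can show it.
    static String toHex(String text) {
        StringBuilder out = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(String.format("U+%04X", cp));
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // Stand-in for a line read from the file (four Arabic letters).
        String line = "\u0633\u0644\u0627\u0645";
        System.out.println(toHex(line)); // prints "U+0633 U+0644 U+0627 U+0645"
    }
}
```

Values between U+0600 and U+06FF belong to the Unicode Arabic block. If the dump instead shows U+003F (a literal question mark) or U+FFFD (the replacement character), the text was probably damaged while being decoded, not while being printed.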

Let's look at an example. Suppose I use Arabic text, encoded in UTF-8 without a BOM, as the source.




If I try to load and display this file with the following sketch, I get a result similar to yours.

String[] lines;

void setup() {
  size(800, 600);
  // To display Arabic (or any other script) properly, make sure
  // the font you create here supports those characters.
  // Arial usually includes Arabic characters.
  PFont f = createFont("Arial", 72);
  textFont(f);

  // If your file is UTF-8, it will be loaded into memory correctly.
  lines = loadStrings("arabic01.txt");

  for (String line : lines) {
    // This println() outputs only question marks (?????) to the console,
    // even though the variable line itself contains valid Arabic characters.
    // This is because the Processing console doesn't support
    // non-English alphabets.
    println(line);
  }
}

void draw() {
  background(0);
  // This call, however, draws the correct Arabic text on screen
  // (assuming the font you are using supports Arabic characters).
  // To check whether a font supports the characters you need: on Windows,
  // choose "Start / Run", type CHARMAP, and when the window with fonts opens,
  // choose your font and see whether it contains the characters you need.
  text(lines[0], 10, 100);
}


You can see that I got question marks in the Processing console when trying to display the first line of the file.




But as I said before, this is because the Processing console doesn't support Arabic fonts. In memory, the sketch contains a valid sequence of Arabic characters. So if we load a proper font (as shown in the sketch above) and output the line with `text()`, we get correct rendering.


If you are interested in how to load fonts properly in Processing, you may find my other post on displaying Japanese/Chinese characters useful: https://forum.processing.org/topic/japanese-characters-in-processing#25080000002303153

Hope this helps.
Thank you for the answer. The text is displayed on the screen.