I'm looking to pull some JSONArrays from the web via HTTP GET requests, and this works fine with English text.
However, when it comes to East Asian languages I get a bunch of weird-looking values that look like 한글메세지
I tried doing the same with both PHP and SQL pages, but the results are the same. The Processing reference says: "All files loaded and saved by the Processing API use UTF-8 encoding." So it looks like it should be OK, but it's not ^^;
I'd be so grateful if someone can nudge me in the right direction... Thank you!!
Answers
OK, that is so weird: the Asian characters appear to be fine when I paste the jumbled words into this text area.
The original distorted text looks like this: &#54620 &#44544 &#47700 &#49464 &#51648, with a ; where each of the spaces is.
That's fine, it's just HTML-encoded (which is why it looks fine when pasted into the HTML forum).
Each of the numbers after the # is the decimal Unicode value of the character in question. For instance, #54620 (U+D55C in hex):
http://www.fileformat.info/info/unicode/char/D55C/index.htm
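To make the number-to-character mapping concrete, here is a minimal sketch in plain Java. The class and method names (`CodePointDemo`, `fromCodePoint`) are made up for illustration; only the value 54620 comes from the thread.

```java
class CodePointDemo {
    // Hypothetical helper: turn the decimal number that follows "&#"
    // into the character (or surrogate pair) it stands for.
    static String fromCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        int decimal = 54620; // the number from the entity in the thread

        // The same value in hex, which is how Unicode charts list it.
        System.out.println(Integer.toHexString(decimal).toUpperCase()); // prints D55C

        // The character itself (shown correctly only if the console can display Korean).
        System.out.println(fromCodePoint(decimal));
    }
}
```

So the entity is just the character's code point written out in decimal; nothing is wrong with the UTF-8 handling itself.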
In PHP you can try html_entity_decode: http://php.net/manual/en/function.html-entity-decode.php
In Java you might have to write something that parses the string and replaces each encoded bit with the character in question (although I'm sure it's been done a million times by people already).
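A rough sketch of what that Java-side parsing could look like, using only the standard library. The names `EntityDecoder` and `decodeNumericEntities` are hypothetical, and this is an assumption-laden sketch rather than a full HTML decoder: it only handles decimal references like &#54620; and leaves hex references (&#xD55C;) and named entities (&amp;) untouched.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EntityDecoder {
    // Matches decimal numeric character references such as &#54620;
    private static final Pattern DECIMAL_ENTITY = Pattern.compile("&#(\\d{1,7});");

    // Replaces each decimal reference with its character in a single pass.
    static String decodeNumericEntities(String input) {
        Matcher m = DECIMAL_ENTITY.matcher(input);
        StringBuilder out = new StringBuilder(input.length());
        int last = 0;
        while (m.find()) {
            // Copy the untouched text before this match.
            out.append(input, last, m.start());
            int codePoint = Integer.parseInt(m.group(1));
            if (Character.isValidCodePoint(codePoint)) {
                out.appendCodePoint(codePoint);
            } else {
                // Not a real Unicode value: keep the original text as-is.
                out.append(m.group());
            }
            last = m.end();
        }
        // Copy whatever follows the last match.
        out.append(input, last, input.length());
        return out.toString();
    }

    public static void main(String[] args) {
        String encoded = "&#54620;&#44544;&#47700;&#49464;&#51648;";
        // Displays the five Korean characters if the console supports them.
        System.out.println(decodeNumericEntities(encoded));
    }
}
```

The work is one linear scan per string, so it should stay cheap even for long text. Inside a Processing sketch, the method (minus `static`) and the two imports could probably be pasted in directly and called on each string pulled out of the JSONArray.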
Hi Koogs, thank you for the response!
I'm assuming it'd be more efficient for the data to be converted on the PHP side rather than decoded in Processing? We have lots of Chinese sentences being collected over a period of time, and decoding millions of different characters would be resource intensive for the sketch.
Or is there a way to do this cheaply in Processing? I'm waiting for the team doing the PHP side of things to get back to me, but if it's not too hard to do in Processing I might try that route.
I'm quite new to Processing, so it'd be great if you could point me in the right direction. Thank you so much!!
FYI: this is what I used to decode the text within my Processing sketch, with my prof's help :) Apache Commons Lang: http://commons.apache.org/proper/commons-lang/download_lang.cgi