Reaching above \uFFFF

edited April 2015 in How To...

I need to access every glyph with a code point from \u0000 to \u10FFFF (all of them, that is; I don't actually need to reach the private use area, but never mind). I've managed to write the first 65,536 characters (up to \uFFFF), but once I reach \u10000 it's not as smooth any more. I am not really into programming, but I've experimented on my computer and found this:

  • Processing treats (as an example) char(1AF19) as char(AF19): it drops the leading digits of values bigger than FFFF.

  • GoToLoop says: "Java's char type is 16-bit. Anything past it gonna need 2 of it in order to get 1 character! Look for surrogate UTF16 codes! Warning: Advanced stuff!" >>> Is this the only way?

  • I've tried to find something on surrogates. I get the concept and how they work mathematically, but I can't really figure out how to write the syntax in Processing.

  • GoToLoop also says: "2^16 = 1<<16 = 1<<020 = 1<<0x10 = '\uFFFF' + 1 = Character.MAX_VALUE + 1. Full Unicode UTF-16!" >>>;)

  • Can I rewrite PFont to help myself? I've found more documentation on UTF-16 in Java, in its Character class, and I really don't want to go there;).

  • Or should I learn, for example, Python to make this easier for myself?

What I am making is a matrix of the 113 021 Unicode characters that are currently in use, super cute. Thanks in advance! H
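For anyone landing here with the same question, here is a minimal sketch of the surrogate arithmetic mentioned above, in plain Java (which Processing is built on). The class name `SurrogateDemo` and the helpers `toSurrogates` and `glyph` are made up for illustration; `Character.toChars` is the standard library call that does the same split for you. It also shows why char(1AF19) turns into char(AF19): a cast to the 16-bit char type simply keeps the low 16 bits.

```java
class SurrogateDemo {
    // Hypothetical helper: splits a code point above U+FFFF into its
    // UTF-16 surrogate pair by hand, to show the arithmetic.
    static char[] toSurrogates(int codePoint) {
        int offset = codePoint - 0x10000;                // leaves a 20-bit value
        char high = (char) (0xD800 + (offset >> 10));    // top 10 bits
        char low  = (char) (0xDC00 + (offset & 0x3FF));  // bottom 10 bits
        return new char[] { high, low };
    }

    // Hypothetical helper: builds a String for any code point.
    // Character.toChars returns one char for the BMP and two above it.
    static String glyph(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        int cp = 0x1F600; // an assigned code point above U+FFFF
        char[] pair = toSurrogates(cp);
        // prints "U+1F600 -> D83D DE00"
        System.out.printf("U+%X -> %04X %04X%n", cp, (int) pair[0], (int) pair[1]);

        // A plain cast keeps only the low 16 bits, so 0x1AF19 becomes 0xAF19.
        // prints "AF19"
        System.out.printf("%04X%n", (int) (char) 0x1AF19);

        // The String is two chars long but holds a single code point.
        String s = glyph(cp);
        System.out.println(s.length() + " chars, "
                + s.codePointCount(0, s.length()) + " code point");
    }
}
```

In a Processing sketch, the resulting String could presumably be handed to `text()` instead of a single char; whether the glyph actually shows up still depends on the font in use.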


  • 1AF19 is in no-man's-land, a reserved code point, so it is a bad example.

    Note that in order to display a high code point, you must use a font that supports it: not all fonts support all defined code points! Arial Unicode, for example, supports a lot of them, but for more specialized characters, like Cherokee, you will need specialized fonts.
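One way to find out in advance which code points a given font covers is sketched below, assuming plain Java AWT fonts rather than Processing's PFont. The class `GlyphCoverage` and its methods are hypothetical names for this example; `java.awt.Font.canDisplay(int)` is the standard call that accepts a full code point, not just a 16-bit char.

```java
import java.awt.Font;
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

class GlyphCoverage {
    // Hypothetical helper: collects the code points in [first, last]
    // that the given check accepts.
    static List<Integer> supported(int first, int last, IntPredicate canShow) {
        List<Integer> result = new ArrayList<>();
        for (int cp = first; cp <= last; cp++) {
            if (canShow.test(cp)) {
                result.add(cp);
            }
        }
        return result;
    }

    // Same check against a real AWT font: Font.canDisplay(int) reports
    // whether the font has a glyph for that code point.
    static List<Integer> supportedBy(Font font, int first, int last) {
        return supported(first, last, cp -> font.canDisplay(cp));
    }
}
```

With several fonts at hand, something like `GlyphCoverage.supportedBy(new Font("Dialog", Font.PLAIN, 12), 0x10000, 0x1FFFF)` could be run once per font, and each code point drawn with the first font that reports it. The result depends entirely on the fonts installed on the machine.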

  • edited April 2015

    Thanks for the answers, this was everything I needed!

    I've managed to write 113 021 different glyphs, which should be everything included in Unicode 7.0. I am having some minor issues with fonts (I use several different ones to reach every glyph): some contain glyphs that are only a box, or the wrong glyph, but I'll look into this tomorrow. There are also long stretches that are pretty boring to look at, but that is not my fault.

    And sorry about the bad example, @PhiLho, I just wrote random letters that looked unicode-ish to me;).

    Once again, many thanks for helping out.
