Listed below are links to weblogs that reference Names of encodings built in to Ruby 1.9:
Why not simply link to IBM's ICU? You'll get all the encodings you will ever need... Why reinvent the will over and over again? :)
Posted by: Dmitrii 'Mamut' Dimandt | December 11, 2008 at 02:40 AM
... the wheel ...
of course, silly me
Posted by: Dmitrii 'Mamut' Dimandt | December 11, 2008 at 02:41 AM
Hmm, afaik UTF-16 != UCS-2. The first one can use 2 or 4 bytes per character, while the latter always uses 2 bytes. Can anyone confirm this?
Posted by: Marius | December 11, 2008 at 04:01 AM
I concur: UCS-2 is a fixed-width encoding with two bytes per character, so it can only cover the Basic Multilingual Plane (U+0000 to U+FFFF); UTF-16, like UTF-8, is a variable-length encoding for the full Unicode range (the range UCS-4 covers in four bytes per character), just a bit bulkier as it's multiples of two bytes rather than one... so they are different, but UTF-16 is effectively a superset of UCS-2: every code point in the Basic Multilingual Plane outside the surrogate range (U+D800 to U+DFFF) is encoded identically in both, and only code points above U+FFFF need a four-byte surrogate pair in UTF-16.
Confusingly, it seems that UTF-32 is defined as being the same as UCS-4, but I think the Unicode consortium are pretty confident that they'll never need more than 4 billion code points, even with it segmented into chunks.
Posted by: Jim Driscoll | December 15, 2008 at 06:20 PM
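For anyone who wants to check the 2-versus-4-byte point from the comments above in Ruby 1.9 or later, here is a minimal sketch. The helper name `utf16_units` is made up for this example and is not part of Ruby; it only relies on `Array#pack`, `String#encode` and `String#unpack`. The two sample characters (U+20AC and U+1D11E) are arbitrary picks, one inside the Basic Multilingual Plane and one above it.

```ruby
# Hypothetical helper (not a Ruby built-in): returns the 16-bit code units
# that UTF-16 uses for a single Unicode code point.
def utf16_units(codepoint)
  char = [codepoint].pack("U")            # build a one-character UTF-8 string
  char.encode("UTF-16BE").unpack("n*")    # big-endian 16-bit units as integers
end

# U+20AC EURO SIGN sits inside the Basic Multilingual Plane:
# one 16-bit unit, so 2 bytes, the same as UCS-2 would use.
euro = utf16_units(0x20AC)
puts euro.map { |u| format("%04X", u) }.join(" ")   # 20AC
puts euro.size * 2                                  # 2

# U+1D11E MUSICAL SYMBOL G CLEF is above U+FFFF:
# UTF-16 needs a surrogate pair, so 4 bytes. UCS-2 cannot represent it at all.
clef = utf16_units(0x1D11E)
puts clef.map { |u| format("%04X", u) }.join(" ")   # D834 DD1E
puts clef.size * 2                                  # 4
```

So the two encodings agree on the first character and only part ways once a code point goes past U+FFFF.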