
December 10, 2008

Listed below are links to weblogs that reference Names of encodings built in to Ruby 1.9:

Comments

Dmitrii 'Mamut' Dimandt

Why not simply link to IBM's ICU? You'll get all the encodings you will ever need... Why reinvent the wheel over and over again? :)

Marius

Hmm, afaik UTF-16 != UCS-2. The former uses 2 or 4 bytes per character, while the latter always uses 2 bytes. Can anyone confirm this?

Jim Driscoll

I concur: UCS-2 is a fixed-width encoding that stores each code point in exactly two bytes, so it can only represent the Basic Multilingual Plane (U+0000 to U+FFFF). UTF-16, like UTF-8, is a variable-length encoding for the full Unicode range (up to U+10FFFF), just a bit bulkier because it works in units of two bytes rather than one: a character takes either two bytes or four (a surrogate pair). So they are different, but UTF-16 is effectively a superset of UCS-2: every code point in the Basic Multilingual Plane outside the reserved surrogate range (U+D800 to U+DFFF) is encoded identically in both.

Confusingly, UTF-32 is effectively the same as UCS-4, which was originally specified as a 31-bit space. In practice Unicode is capped at U+10FFFF (about 1.1 million code points), so nobody expects to need anything close to the roughly 2 billion values that UCS-4 could hold.
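
The size difference discussed in these comments is easy to check from Ruby 1.9 itself. The following is only a minimal sketch: the helper name `utf16_bytes` and the two sample characters are chosen for illustration and do not come from the original post.

```ruby
# Hypothetical helper: byte length of a string once transcoded to
# UTF-16 (big-endian, no byte order mark).
def utf16_bytes(str)
  str.encode("UTF-16BE").bytesize
end

bmp_char    = "A"          # U+0041, inside the Basic Multilingual Plane
astral_char = "\u{1D11E}"  # U+1D11E MUSICAL SYMBOL G CLEF, outside the BMP

puts utf16_bytes(bmp_char)     # 2
puts utf16_bytes(astral_char)  # 4

# The astral character is stored as a surrogate pair of two 16-bit units.
units = astral_char.encode("UTF-16BE").unpack("n*")
puts units.map { |u| format("%04X", u) }.join(" ")  # D834 DD1E

# UTF-32 always uses four bytes per code point.
puts astral_char.encode("UTF-32BE").bytesize        # 4
```

A character such as U+1D11E has no UCS-2 representation at all, which is the practical difference between the two encodings.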

The comments to this entry are closed.
