April 08, 2008

Fun with Ruby 1.9 File Encodings

Ruby 1.9 allows you to specify the character encodings of I/O streams, strings, regexps, symbols, and so on. It also lets you specify the encoding of individual source files (and a complete application can be built from many files, each with different character encodings). Expect to start seeing a rash of obscure source code, at least until the initial excitement abates and cooler thinking prevails.
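To make "the encoding of a string" a bit more concrete, here is a minimal sketch. The `describe` helper is a hypothetical name introduced only for this illustration; the rest uses standard `String` and `Encoding` methods. It shows that relabelling a string as binary keeps the bytes but changes how they are counted, while transcoding changes the bytes themselves.

```ruby
# encoding: utf-8

# Hypothetical helper: report how Ruby currently sees a string.
def describe(str)
  { encoding: str.encoding, chars: str.length, bytes: str.bytesize }
end

celsius = "℃"   # U+2103; a literal takes the source file's encoding

# One character, three bytes in UTF-8.
p describe(celsius)

# Same bytes, relabelled as binary: now each byte counts as a character.
p describe(celsius.dup.force_encoding(Encoding::BINARY))

# Transcoding rewrites the bytes: "é" is a single byte in ISO-8859-1.
p describe("é".encode("ISO-8859-1"))
```

`force_encoding` only changes the label on the bytes, whereas `encode` converts them, which is usually the distinction that matters when mixing files in different encodings.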

In the meantime, we can get away with this:


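# encoding: utf-8
require 'mathn'   # exact (Rational) integer division instead of truncation

class Numeric
  # Treat the receiver as degrees Fahrenheit; return degrees Celsius.
  def ℃
    (self - 32) * 5/9
  end

  # Treat the receiver as degrees Celsius; return degrees Fahrenheit.
  def ℉
    self * 9/5 + 32
  end
end

puts 212.℃
puts 100.℉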

Or, for those who'd like a peek at the start of a road that eventually leads to madness:


alias ✎ puts 
 
✎ 212.℃
✎ 100.℉

I'm betting this post displays badly on about 50% of the machines that are used to view it. Which is reason enough to tread very lightly down this path…

Comments

s/class/module/

Works in 1.8 with -Ku as well.

*Searches Craigslist for an APL keyboard*

Unfortunately, file encodings also mean that "binary mode" now matters on non-Windows machines too, because Ruby 1.9 will sometimes decide your file is UTF-8 rather than a plain string of bytes.

open filename, "rb:ascii-8bit" do |io| ... end

This forces the file's contents to be read as a plain string of bytes.
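A small sketch of the difference, assuming a throwaway file created just for the demonstration. The `read_with` helper and the file name `demo.txt` are hypothetical; the mode strings are the same kind shown in the comment above.

```ruby
# encoding: utf-8
require 'tmpdir'

# Hypothetical helper: read a whole file under the given open mode.
def read_with(path, mode)
  File.open(path, mode) { |io| io.read }
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "demo.txt")
  File.binwrite(path, "100 ℃")   # 4 ASCII bytes plus 3 bytes for U+2103

  text = read_with(path, "r:utf-8")        # bytes interpreted as characters
  raw  = read_with(path, "rb:ascii-8bit")  # plain string of bytes

  puts "#{text.encoding}: #{text.length} characters"
  puts "#{raw.encoding}: #{raw.length} bytes"
end
```

Both reads return the same seven bytes; only the encoding label, and therefore what `length` counts, differs.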

Hah! Awesome. Now I just need to figure out how to make those special characters.. :(

I actually use this in Ruby 1.8 to dynamically build accessor methods from a CSV file's headers, which are in German, so some of them have umlauts (äöü) or even ß. This way I can still use accessor methods, so I found this to be quite useful.
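The commenter's code is not shown, so the following is only a sketch of the general idea. The `row_class_for` helper, the `Row` constant, and the sample data are all hypothetical, and the lines are split on ";" rather than parsed with a real CSV library to keep the example short.

```ruby
# encoding: utf-8

# Build an anonymous class with one reader method per header field.
def row_class_for(headers)
  Class.new do
    def initialize(values)
      @values = values
    end

    headers.each_with_index do |name, index|
      define_method(name) { @values[index] }
    end
  end
end

# Hypothetical sample data standing in for the commenter's CSV file.
lines = ["Größe;Straße", "42;Hauptstraße 1"]
Row = row_class_for(lines.first.split(";"))
row = Row.new(lines.last.split(";"))

puts row.Größe
puts row.Straße
```

Because the header names are used as method names directly, the accessors keep their umlauts and ß without any transliteration step.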

A few months ago (September, maybe?) this was being discussed in #ruby-lang, and you got

* Kernel#√ to get a nice sqrt_2 = √ 2
* Enumerable#⊂ as an alias to Enumerable#include?
* Kernel#Σ: Σ(1, 2, 3, 4) #=> 10
* and a couple others.

You could become pretty evil if you want to ^_^
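For the curious, the three methods named in that list could be sketched roughly as follows. These are guesses at what was discussed, not the original definitions from #ruby-lang. `Σ` is called with parentheses so that it reads as a method call even in Ruby versions that treat a leading non-ASCII capital as the start of a constant name.

```ruby
# encoding: utf-8

module Kernel
  # Square root, so that `√ 2` reads like the mathematical notation.
  def √(x)
    Math.sqrt(x)
  end

  # Sum of all arguments.
  def Σ(*numbers)
    numbers.inject(0) { |sum, n| sum + n }
  end
end

module Enumerable
  # Same behaviour as Enumerable#include?, under a symbolic name.
  alias_method :⊂, :include?
end

sqrt_2 = √ 2
puts sqrt_2
puts Σ(1, 2, 3, 4)   #=> 10
puts [1, 2, 3].⊂(2)
```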

I was just thinking the other day that this would be a great idea. Finally, dot and cross matrix multiplication can be realised nicely. This will certainly position Ruby better in the mathematical and scientific communities.
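One caveat worth sketching: Ruby's set of infix operators is fixed, so `⋅` and `×` can only be ordinary method names, called with a dot. The `Vec3` struct below is a hypothetical type made up for this illustration, not anything from Ruby's standard library.

```ruby
# encoding: utf-8

# Hypothetical 3-D vector type, used only for illustration.
Vec3 = Struct.new(:x, :y, :z) do
  # Dot product, named with U+22C5 DOT OPERATOR.
  def ⋅(other)
    x * other.x + y * other.y + z * other.z
  end

  # Cross product, named with U+00D7 MULTIPLICATION SIGN.
  def ×(other)
    Vec3.new(y * other.z - z * other.y,
             z * other.x - x * other.z,
             x * other.y - y * other.x)
  end
end

a = Vec3.new(1, 0, 0)
b = Vec3.new(0, 1, 0)

puts a.⋅(b)   # method call syntax; `a ⋅ b` would not parse as an operator
p a.×(b)
```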

Actually, browsers do a pretty good job of displaying non-ASCII Unicode characters. In fact, the ease of transmitting non-ASCII data is about 1000 times better than it was a decade ago and character transmission problems are becoming relatively rare. Google "Japanese characters" on any modern computer. Then cut and paste it almost anywhere, even into a "vi" window in a Mac OS X terminal.

Of course there will always be exceptions of newly invented characters, or family-name characters or other odd cases. But standard symbols and accents? No problem.
