Ruby & Rails

April 08, 2009

Twitter Should Move Away from Ruby

Oh dear. The chattering classes are at it, talking about how the Twitter folks are dissing Ruby by announcing the replacement of some Ruby code with Scala code.

Please stop.

At the kinds of volumes that Twitter handles (and with what I assume is a somewhat scary growth curve), Twitter needs  to improve concurrency—it needs an environment/language with low memory overhead, incredible performance, and super-efficient threading. I don't know if Scala fits that particular bill, but I know that current Ruby implementations don't. It isn't what Ruby's intended to be. So the move away is just sound thinking. (I suspect it also took some courage.) I applaud Alex and the team for this.

Instead of defending Ruby when it's clearly not an appropriate solution, let's think about things the other way around.

The good folks at Twitter started off with Ruby because they wanted to get something running quickly, and they wanted to experiment. And Ruby gave them that. And, what's more, Ruby saw them through at least two rounds of phenomenal  growth. Could they have done it in another language? Sure. But I suspect Ruby, despite the occasional headache, helped them get where they are now. 

And now they've reached the status of world-wide wunderkind, it's time to move on. 

I for one wish them luck. I look forward to the day when our online store reaches the kind of size where we have to move away from Rails. I'll tweet the fact with a tear in my eye, while my yacht sails me off to the sunset.

January 20, 2009

So I need a symmetric coroutine example

I'm getting close to wrapping up the new PickAxe. One of the last things I need is an example of symmetric coroutines for the standard library section on the Fiber class.

I have lots of asymmetric examples, but I'm struggling to come up with something decent for symmetric coroutines (which use transfer to pass control between themselves). I've coded up Conway's line squeezer, but it works better using asymmetric coroutines. I've tried to come up with puzzles that are best solved with symmetric coroutines, but without luck. I've coded up a simple blackjack game, but again it works better with asymmetric coroutines. Although, for fun, here's the dealer, which shows some of the new Ruby 1.9 array methods:


But, I'm still stuck. So... can any of you clever folk come up with something compelling that uses fewer than 35 lines of code?
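For anyone unsure what the distinction looks like in code, here is a minimal sketch of the mechanics of transfer. It is not the compelling example being asked for, just two fibers handing control directly to each other. The names ping, pong, and log are made up for illustration.

```ruby
require 'fiber'  # needed for transfer and Fiber.current on older Rubies; harmless on newer ones

log  = []
main = Fiber.current
ping = pong = nil

ping = Fiber.new do
  3.times do |i|
    log << "ping #{i}"
    pong.transfer        # hand control straight to pong (no resume/yield pairing)
  end
  main.transfer(:done)   # explicitly hand control back to the main fiber
end

pong = Fiber.new do
  loop do
    log << "pong"
    ping.transfer        # hand control straight back to ping
  end
end

result = ping.transfer   # start the exchange from the main fiber
```

With asymmetric coroutines, pong would have to yield to whoever resumed it. Here each fiber names the fiber it passes control to, which is the defining feature of the symmetric style.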

December 29, 2008

Rails Studio Early Registration ends soon

I love doing the Pragmatic Studios. Nicole, Mike, and Chad are all good friends, and that creates a really relaxed and fun atmosphere for what could otherwise be a pretty intense three days.

The next studio is in Denver at the end of January, and I'm really looking forward to it—living in Texas, I relish being able to see mountains and snow. I just wish I was a skier so I could have an excuse to take an extra day or two to head on up to a resort.

If you are planning on going, though, don't put it off—early registration (a $300 savings) ends in a week.

There's one bad side to the studios, though. Mike put together a video showing what studios are like. Notice anything? Food. Lots of it. Scarily good food. Must resist. Must…



December 17, 2008

Ruby 1.9 can check your indentation

All Ruby programmers regularly encounter the mystical error “syntax error, unexpected $end, expecting keyword_end.” We know what it means: we left off an end somewhere in the code. As Ruby compiled our source, it kept track of nesting, and when it reached the end of file ($end), it was expecting to see one more end keyword, and none was there.

So, we trundle back through the source, and after a while discover we'd deleted just one too many lines during that last edit.

Ruby 1.9 makes that easier. For example, here's a source file:

class Example
  def meth1
    if Time.now.hour > 12
      puts "Afternoon"
  end
  def meth2
    # ...
  end
end

Run it through Ruby 1.9, and you'll get the same old error message:

dave[RUBY3/Book 8:26:48*] ruby t.rb  
t.rb:10: syntax error, unexpected $end, expecting keyword_end

But add the -w flag, and things get more interesting.

dave[RUBY3/Book 8:26:51*] ruby -w t.rb
t.rb:5: warning: mismatched indentations at 'end' with 'if' at 3
t.rb:9: warning: mismatched indentations at 'end' with 'def' at 2
t.rb:10: syntax error, unexpected $end, expecting keyword_end

It's the small things in life...

October 16, 2008

Fun with Ruby 1.9 Regular Expressions

I've reorganized the regular expression content in the new Programming Ruby, and added some cool new advanced examples. This one's fairly straightforward, but I love the fact that I can now start refactoring my more complex patterns, removing duplication.

The stuff below is an extract from the unedited update. It'll appear in the next beta. It follows a discussion of named groups, \k and related stuff.


There’s a trick which allows us to write subroutines inside regular expressions. Recall that we can invoke a named group using \g<name>, and we define the group using (?<name>...). Normally, the definition of the group is itself matched as part of executing the pattern. However, if you add the suffix {0} to the group, it means “zero matches of this group,” so the group is not executed when first encountered.

sentence = %r{ 
    (?<subject>   cat   | dog   | gerbil    ){0} 
    (?<verb>      eats  | drinks| generates ){0} 
    (?<object>    water | bones | PDFs      ){0} 
    (?<adjective> big   | small | smelly    ){0} 

    (?<opt_adj>   (\g<adjective>\s)?     ){0} 

    The\s\g<opt_adj>\g<subject>\s\g<verb>\s\g<opt_adj>\g<object> 
}x

md = sentence.match("The cat drinks water") 
puts "The subject is #{md[:subject]} and the verb is #{md[:verb]}"
 
md = sentence.match("The big dog eats smelly bones") 
puts "The adjective in the second sentence is #{md[:adjective]}" 

sentence =~ "The gerbil generates big PDFs" 
puts "And the object in the last is #{$~[:object]}" 

produces:

The subject is cat and the verb is drinks 
The adjective in the second sentence is smelly 
And the object in the last is PDFs 

Cool, eh?
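The same trick helps whenever a sub-pattern is repeated. As a small sketch (the pattern name ipv4 and the group name octet are my own, and the pattern only checks the shape of an address, not the numeric range of each part):

```ruby
ipv4 = %r{
    (?<octet> \d{1,3} ){0}   # define the group, but match it zero times here

    \A \g<octet> \. \g<octet> \. \g<octet> \. \g<octet> \z
}x

good = ipv4.match("192.168.0.1")
bad  = ipv4.match("192.168.0")

puts good[:octet]    # the named group holds the text from its most recent invocation
puts bad.inspect
```

Without the {0} definition, the \d{1,3} fragment would have to be written out four times.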

September 09, 2008

Fun with Procs in Ruby 1.9

Ruby 1.9 adds a lot of features to Proc objects.

Currying is the ability to take a function that accepts n parameters and generate from it one or more functions with some parameter values already filled in. In Ruby 1.9, you create a curry-able proc by calling the curry method on it. If you subsequently call this curried proc with fewer parameters than it expects, it will not execute. Instead, it returns a new proc with those parameters already bound.

Let's look at a trivial example. Here's a proc that simply adds two values:

plus = lambda {|a,b| a + b}
puts plus[1,2]

I'm using the [ ] syntax to invoke the proc with arguments, in this case 1 and 2. The code will print 3.

Now let's have some fun.

curried_plus = plus.curry
# create two procs based on plus, but with the first parameter 
# already set to a value
plus_two = curried_plus[2]
plus_ten = curried_plus[10]

puts plus_two[3]
puts plus_ten[3]

On line 1, I create a curried version of the plus proc. I then call it twice, but both times I only pass it one parameter. This means it cannot execute the body. Instead, each time it returns a new proc which is like the original, but which has the first parameter preset to either 2 or 10. In the last two lines, I call these two new procs, supplying the missing parameter. This means they can execute normally, and the code outputs 5 and 13.
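The same thing works with more than two parameters, and arguments can be supplied one at a time or in groups. A small sketch, using a made-up volume lambda:

```ruby
volume = lambda { |length, width, height| length * width * height }.curry

one_at_a_time = volume[2][3][4]   # each call binds one more parameter
in_groups     = volume[2, 3][4]   # or bind several at once

base = volume[2, 3]               # a proc still waiting for the height
tall = base[10]

puts one_at_a_time, in_groups, tall
```

The body only runs once all three parameters have been supplied.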

You can have a lot of fun with currying, but that's not why we're here today.

Over the weekend, Matz added a new method to the Proc class. You can now use Proc#=== as an alias for Proc#call. So, why on earth would you want to do that? Well, remember that === is used to match terms in a case statement. Over on the AimRed blog, they noted that this feature could be used to make the matching in case statements actually execute code. In their example, they manually added the === method to class Proc:

class Proc
  def ===( *parameters )
    self.call( *parameters )
  end
end

Then you can write something like

sunday = lambda{ |time| time.wday == 0 }
monday = lambda{ |time| time.wday == 1 }
# and so on...

case Time.now
when sunday
  puts "Day of rest"
when monday
  puts "work"
# ...
end

See how that works? As Ruby executes the case statement, it looks at each of the parameters of the when clauses in turn. For each, it invokes its === method, passing that method the original case discriminator (Time.now in this example). But with the new === method in class Proc, this will now execute the proc, passing it Time.now as a parameter.

While updating the PickAxe, I noticed that Matz liked this so much that it is now part of 1.9. And it means we can combine this trick with currying to write some fun code:

is_weekday = lambda {|day_of_week, time| time.wday == day_of_week}.curry

sunday    = is_weekday[0]
monday    = is_weekday[1]
tuesday   = is_weekday[2]
wednesday = is_weekday[3]
thursday  = is_weekday[4]
friday    = is_weekday[5]
saturday  = is_weekday[6]

case Time.now
when sunday 
  puts "Day of rest"
when monday, tuesday, wednesday, thursday, friday
  puts "Work"
when saturday
  puts "chores"
end

Is this incredibly efficient? Not really :) But it opens up quite an interesting set of possibilities.
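Because the code above depends on the day you run it, here is a sketch of the same idea with fixed dates, so the behavior is repeatable. The describe lambda is my own wrapper, and the dates are chosen from April 2009 (the 5th was a Sunday, the 8th a Wednesday, and the 4th a Saturday).

```ruby
is_weekday = lambda { |day_of_week, time| time.wday == day_of_week }.curry

# Build the seven matchers in one go: index 0 is Sunday, 6 is Saturday
sunday, monday, tuesday, wednesday, thursday, friday, saturday =
  (0..6).map { |n| is_weekday[n] }

describe = lambda do |time|
  case time
  when sunday then "Day of rest"
  when monday, tuesday, wednesday, thursday, friday then "Work"
  when saturday then "chores"
  end
end

puts describe[Time.utc(2009, 4, 5)]
puts describe[Time.utc(2009, 4, 8)]
puts describe[Time.utc(2009, 4, 4)]
```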

June 17, 2008

Two New Metaprogramming Episodes

I'm about to head off for a couple of weeks vacation, but I didn't want to leave folks waiting for new episodes in the Ruby Object Model and Metaprogramming screencast series, so Mike and I decided to release this week's and next week's episodes together.

Episode 4 looks at instance_ and class_eval, and lays the foundation for a whole bunch of metaprogramming to come. Episode 5 is probably my favorite so far (although I'm rather partial to the Public Service Announcement in #2). In #5, we take a fairly simple programming task and code it up nine different ways, using all of the metaprogramming techniques we've learned to date. It's pretty much pure coding for 36 minutes.

People asked for chapter markers, so we now have chapters in this series. People asked for Ogg support, so we've now got experimental Theora Ogg versions available alongside the Quicktime and iPhone/iPod formats.

I'm really liking this series—it's fun researching it, and fun learning the ins and outs of producing it.

Enjoy!


Dave

June 10, 2008

Screencasting Ruby Metaprogramming


I've been teaching Ruby (and in particular, metaprogramming Ruby) for almost 7 years now. And, in that time, I've gradually found ways of cutting through all the confusing stuff to the actual essentials. And when you do that, suddenly things get a lot simpler. I've always known that Ruby didn't really have class methods and singleton methods, for example, but until recently I didn't have a simple way to explain that.

Then, when preparing to give an Advanced Ruby Studio, my thinking crystalized. Metaprogramming in Ruby becomes simple to explain if you focus on four things:

  • Objects, not classes.
  • There is only one kind of method call in Ruby. The "right-then-up" rule covers everything.
  • Understanding that self can only be changed by a method call with a receiver or by a class or module definition makes it easy to keep track of what's going on when metaprogramming.
  • Knowing that Ruby keeps an internal concept of “the current class” which is where def defines its methods. Knowing what changes this makes it easier to know what's going on.
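The last two points can be seen in a small sketch. The Greeter class and its methods are invented for illustration. Both class_eval and instance_eval set self to Greeter, but they set the current class differently, so the same def lands in different places.

```ruby
class Greeter
end

# class_eval: self is Greeter and the current class is Greeter,
# so def creates an ordinary instance method.
Greeter.class_eval do
  def hello
    "hello from #{self.class}"
  end
end

# instance_eval: self is still Greeter, but the current class is
# Greeter's singleton class, so def creates what we usually call a class method.
Greeter.instance_eval do
  def create
    new
  end
end

puts Greeter.new.hello
puts Greeter.create.class
```

Seen this way, create is just a method defined on one particular object, which happens to be a class.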

I tried this approach in a number of Studios, and refined it during some talks for RubyFools in Copenhagen and Oslo.

So Mike Clark, who's producing our new series of screencasts, started pushing me to put this description into video. Last week I finally cleared the decks enough to record the first three episodes.

First, I have to say it was a blast. I'd never recorded this many minutes of screencast before, and I was blown away by the amount of time it takes. I was also surprised at the level of detail involved. From microphone setup (which I messed up for a couple of segments) to color matching between codecs, it was fun to learn a whole new set of technologies.

I was also surprised at how hard it was to talk to a microphone. When we write books, we always try to write as if the reader was sitting there next to us. I tried to take the same approach with the screencasts, but it takes a whole new set of skills...

What I really liked was the way that I could live code examples to illustrate points. The first episode has maybe 50/50 code and exposition, and the second and third episodes are mostly code. And the code acts as a great skeleton on which to hang the concepts. Apple-R also keeps me honest.

So, if you're interested in how the Ruby object model really works, and want to improve your metaprogramming chops, why not check them out?

May 23, 2008

New lambda syntax in Ruby 1.9

I'm slowly getting used to the new -> way of specifying lambdas in Ruby 1.9. I still feel that, as a notation, it could be clearer. (I'd personally like just plain backslash, because that looks pretty close to a real lambda character, but that's not going to happen.) But having punctuation, rather than the word lambda, makes a surprising difference to the way my eyes read code.

For example, you could write a method that acts like a while loop.

def my_while(cond, &body)
  while cond.call
    body.call
  end
end   

In Ruby 1.8 and 1.9, you could call this as

a = 0
my_while lambda { a < 5 } do
   puts a
   a += 1
end

But my brain finds that seriously hard to scan. The Ruby 1.9 -> syntax makes it slightly (just slightly, mind you) better:

a = 0
my_while  -> { a < 5 }  do
   puts a
   a += 1
end

I suspect this is just a question of time. In a year or so, we'll parse the -> syntax in our heads without thinking twice. Once it does become natural, I suspect we'll find all sorts of new uses for procs.
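For reference, the -> form also takes a parameter list, including defaults, before the braces. A short sketch (the add lambda is just an illustration):

```ruby
add = ->(a, b = 1) { a + b }   # parameters go in parentheses after the arrow

with_call     = add.call(2, 3)
with_brackets = add[2]         # b falls back to its default of 1
with_dot      = add.(4)

puts with_call, with_brackets, with_dot
puts add.lambda?               # -> always produces a lambda, not a plain proc
```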

May 14, 2008

Silly Useful shoulda trick


Playing with the shoulda testing framework, I came across a small but useful trick. Because the tests are written inside closures, local variables defined in the class body are available inside should blocks. They're only evaluated once, so they don't take the place of setup blocks, but they are a nice way of storing test-wide values. Somehow, I like the look of this better than using instance variables or constants: the tests seem to be more uniform and balanced.

require 'test/unit'
require 'shoulda'
require 'date' 

require 'csv_to_html/age_calculator'

class AgeCalculatorTest < Test::Unit::TestCase
                                  
  birth = Date.parse("2003-05-02")

  should "be year difference if now later in year than birth" do
    now = Date.parse("2008-06-15")
    assert_equal 5, AgeCalculator.age_given_dates(now, birth)
  end

  should "be year difference if now later in month than birth" do
    now = Date.parse("2008-05-15")
    assert_equal 5, AgeCalculator.age_given_dates(now, birth)
  end
         
  should "be year difference minus 1 if now earlier in year than birth" do
    now = Date.parse("2008-04-15")
    assert_equal 4, AgeCalculator.age_given_dates(now, birth)
  end

  should "be year difference minus 1 if now earlier in month than birth" do
    now = Date.parse("2008-05-01")
    assert_equal 4, AgeCalculator.age_given_dates(now, birth)
  end

end
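The reason this works has nothing to do with shoulda itself. It is ordinary closure behavior, which the following sketch shows using define_method in place of should. The class and method names here are made up.

```ruby
class BirthdayExample
  birth_year = 2003             # a local variable, evaluated once as the class body runs

  define_method(:age_in) do |year|
    year - birth_year           # blocks are closures, so birth_year is visible here
  end

  def broken_age_in(year)
    year - birth_year           # def starts a fresh scope, so this raises NameError
  end
end

example = BirthdayExample.new
puts example.age_in(2008)

begin
  example.broken_age_in(2008)
rescue NameError => e
  puts "def cannot see it: #{e.class}"
end
```

Anything defined with a block (as should tests are) can see the local; anything defined with def cannot.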

Test-Driven Rails Studio


I have a blast teaching at the Pragmatic Studios. Mike and Nicole run excellent courses, the format is incredible, and the students enthusiastic. I love presenting with two or three other instructors: it keeps the energy level high, and I always manage to learn a lot from them.

Which is why it's a shame I won't be able to be at two of the upcoming Studios, Advanced Rails in Denver on June 12–14 and Test-Driven Development with Rails on June 9–11. I'm doubly bummed because the latter is taught by Jim Weirich and Joe O'Brien, both old friends from the Ruby community. I've never heard Joe teach, but I've often sat and admired Jim as he takes some complex subject and reduces it to its basics in the most incredibly entertaining ways. If you're into Rails, and want to take your testing to the next level, I believe there are still some seats available. And if you do get there, tell 'em I said hi! I'm jealous.

April 28, 2008

Shoulda used this earlier

In many ways, testing software is like going out and getting exercise. You know you should do it, and you know it does you good, but it's also pretty easy to find an excuse to skip it (I'll make it up tomorrow).

So anything that makes testing easier is good, because it cuts down on the excuses not to do it.

One thing I've never really liked about the conventional xUnit-style testing frameworks was the setup and teardown structure. In these frameworks, a test case is a class, and setup and teardown are implemented by methods in that class. Each test is also a method, so the basic flow is

  for each test method in the class
    run setup
    run the test method
    run teardown
  end

Nice and simple. Each test method got the benefit of a standard environment created by the setup method, and the teardown method got the job of tidying up after.
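As a rough sketch of that flow in plain Ruby (TinyTestCase and StackTest are hypothetical names, and this is far simpler than what Test::Unit really does):

```ruby
# A toy xUnit-style runner: a fresh object per test, wrapped in setup/teardown.
class TinyTestCase
  def setup; end
  def teardown; end

  def self.run
    test_names = instance_methods.grep(/\Atest_/).sort
    test_names.each do |name|
      test_case = new
      test_case.setup
      begin
        test_case.send(name)
      ensure
        test_case.teardown    # tidy up even if the test raised
      end
    end
    test_names
  end
end

class StackTest < TinyTestCase
  LOG = []

  def setup
    @stack = [1, 2]
    LOG << :setup
  end

  def teardown
    LOG << :teardown
  end

  def test_pop
    LOG << :test_pop
    raise "pop failed" unless @stack.pop == 2
  end

  def test_push
    LOG << :test_push
    raise "push failed" unless @stack.push(3).size == 3
  end
end

ran = StackTest.run
```

Every test sees the same two-element stack, because setup runs again before each one.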

Except… when I'm writing tests, I typically want to set up lots of different scenarios. I'll want A and B and C, then A and B but not C, then A and not B, then A and D, and so on. I had two choices: write lots of test case classes, using subclassing to inherit common setup behavior, or write per-test method setup code (often factored out into helpers). In the end, I almost always did the latter. And that was tedious, and it made it harder to see the tests for the setup code.

I flirted with RSpec. Its spec framework seemed to have what I wanted. But I just couldn't get myself to enjoy using it. (I think it's a cat people/dog people kind of thing.)

Enter shoulda

Then, a couple of weeks back, Mike Clark and Chad Fowler introduced me to shoulda. Shoulda isn't a testing framework. Instead, it extends Ruby's existing Test::Unit framework with the idea of test contexts. A context is a section of your test case where all the test methods have something in common. At its simplest, a context could be used simply as an annotation device (and, yes, this is a silly example):

context "My factorial method" do
  should "return 1 when passed 0" do
    assert_equal 1, fact(0)
  end
  should "return 1 when passed 1" do
    assert_equal 1, fact(1)
  end
  should "return 6 when passed 3" do
    assert_equal 6, fact(3)
  end
end    

The stuff in a context can share common setup code—just write a setup block.

class CartTest < Test::Unit::TestCase

  context "An empty cart" do
    setup do
      @cart = orders(:wilmas_empty_cart)
    end

    should "have no line items" do
      assert_equal 0, @cart.line_items.size
    end

    should "have a zero price" do
      assert_equal 0, @cart.price
    end
  end

  context "Some other context..." do
    # ...
  end
end

So now, within a single test case I can set up multiple contexts, and each context can have its own environment.

But, take it back to my original problem. I often want to set up hierarchies of related environments for my tests. The shoulda code handles this wonderfully, because it lets me nest contexts. For example, I'm adding a feature to our store that gives customers some additional information if, during checkout, their credit card transaction was initially rejected because the address was wrong, and was then accepted when they fixed the address. I wanted two tests, one without the prior address error, and one with.

To set up this environment, I needed to set up a shopping cart, create a dummy response from our payment gateway, and post that response to the application. In the case of the prior address error, I also wanted to inject an entry containing that error into the transactions associated with the order prior to generating the response.

With shoulda, I simply created some nested contexts. The top level context did the shared setup, and the inner contexts then set up appropriate environments for their tests. It looked like this:

  context "Checking out"  do
    setup do
      @cart = cart_named(:freds_full_cart)
      @cart.prepare_for_store_authorize!
      @params = approved_authnet_response(@cart)
    end                  
    
    context "with no AVS errors in CC transaction history" do
      setup do
        post :post_from_authnet_authorize, @params
      end

      should_redirect_to "{:action => :receipt}"
    end 
    
    context "with AVS errors in CC transaction history" do
      setup do
        avs_error = CcTransaction.new(:response_code => 2, :response_reason_code => 27)
        @cart.cc_transactions << avs_error
        post :post_from_authnet_authorize, @params
      end

      should_redirect_to "{:action => :explain_avs_mismatch}"
    end
  end 

The outer setup gets run before the execution of each of the inner contexts. And the setup in the inner contexts gets run when running that context. And shoulda keeps track of it all, so I get very natural error messages if an assertion fails. For example, if the test in the second context above fails, I'd get

Checking out with AVS errors in CC transaction history should 
redirect to "{:action => :explain_avs_mismatch}". 
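To make the ordering concrete, here is a toy sketch of nested contexts in plain Ruby. TinyContext is entirely hypothetical and is not how shoulda is implemented. It only shows outer setups running before inner ones, and test names being built from the chain of context names.

```ruby
class TinyContext
  attr_reader :results

  def initialize(name, parent = nil, &block)
    @name    = name
    @parent  = parent
    @setups  = []
    @results = parent ? parent.results : []   # all contexts share one list
    instance_eval(&block)
  end

  def setup(&block)
    @setups << block
  end

  def context(name, &block)
    TinyContext.new(name, self, &block)
  end

  # Run inherited setups (outermost first), then the test body, in a fresh object.
  def should(description, &block)
    env = Object.new
    all_setups.each { |setup_block| env.instance_eval(&setup_block) }
    env.instance_eval(&block)
    @results << "#{full_name} should #{description}"
  end

  def full_name
    [@parent && @parent.full_name, @name].compact.join(" ")
  end

  def all_setups
    (@parent ? @parent.all_setups : []) + @setups
  end
end

seen = []
top = TinyContext.new("Checking out") do
  setup { @steps = ["build cart"] }

  context "with no AVS errors" do
    setup { @steps << "post response" }
    should("redirect to receipt") { seen << @steps }
  end

  context "with AVS errors" do
    setup { @steps << "inject AVS error" << "post response" }
    should("redirect to explanation") { seen << @steps }
  end
end

puts top.results
```

Each should gets a fresh environment, so the AVS error injected in the second context never leaks into the first.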

So, now, I can finally set up my hierarchies of test environments in a natural way. It isn't revolutionary. It's just one less excuse for not testing…

April 11, 2008

Ruby 1.9 Standard Library Changes

Here's a top-level overview of some of the changes to date in the standard library that comes with Ruby 1.9. (These are the libraries that you get preinstalled with Ruby, but that you have to require into your code.)

  • The base64 library has been removed. Use Array#pack and String#unpack instead.
  • Much of the Complex and Rational libraries are now built in to the interpreter. However, requiring the external libraries adds additional functionality. In the case of Rational, this functionality is minimal.
  • The CMath library has been added.
  • The Enumerator library is now built in.
  • Added Fiber library (adds coroutine support to fibers).
  • Removed ftools (replaced by fileutils).
  • The Generator library has been removed (use Fibers).
  • Added notes on using irb from inside applications.
  • jcode is removed in favor of built-in encoding support.
  • The json library is added.
  • The matrix library no longer requires that you include mathn.
  • The mutex library is now built in.
  • parsedate has been removed. The Date class handles most of its functionality.
  • readbytes has been removed. IO now supports the method directly.
  • require_relative added.
  • Add description of Ripper.
  • Add description of SecureRandom.
  • I've omitted the shell library, as it seems more like a curiosity than something folks would use (and it's broken under 1.9).
  • The soap library is removed.
  • I've omitted the sync library. It is broken under 1.9, and the monitor library seems to be cleaner.
  • Win32API is now deprecated in favor of using the DL library.
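A few of these replacements are easy to show in a line or two. This is a sketch only; the variable names are arbitrary.

```ruby
require 'securerandom'

# Base64 without the base64 library: the "m" directive of pack/unpack
encoded = ["hello"].pack("m")          # note: pack("m") appends a newline
decoded = encoded.unpack("m").first

# A Generator-style external iterator written directly with a Fiber
fib = Fiber.new do
  a, b = 0, 1
  loop do
    Fiber.yield a
    a, b = b, a + b
  end
end
first_five = 5.times.map { fib.resume }

# SecureRandom: 8 random bytes rendered as 16 hex characters
token = SecureRandom.hex(8)

puts encoded, decoded, first_five.inspect, token.length
```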

It's interesting to me just how much is still changing in Ruby 1.9. But, as I use it more and more, it's also gratifying to see how some of the new idioms make coding just that little sweeter.

I just pushed a new beta of the PickAxe Third Edition with all the library changes.

April 09, 2008

BabyDoc

One of the fun things about updating the PickAxe is getting to come up with examples to show the various APIs in action. Here's a very silly example of using Ripper's event-based API to extract comments that are associated with basic class definitions. It clearly has holes (it doesn't handle class A::B::C, for instance) but it's fairly easy to see how to add a proper state machine and produce something that might be interesting to play with...

require 'ripper'

# This class handles parser events, extracting
# comments and attaching them to class definitions
class BabyRDoc < Ripper::Filter
  def initialize(*)
    super
    reset_state
  end

  def on_default(event, token, output)
    reset_state
    output
  end

  def on_sp(token, output) output end
  alias on_nil on_sp

  def on_comment(comment, output)
    @comment << comment.sub(/^\s*#\s*/, "    ")
    output
  end

  def on_kw(name, output)
    @expecting_class_name = (name == 'class')
    output
  end

  def on_const(name, output)
    if @expecting_class_name
      output << "#{name}:\n"
      output <<  @comment
    end
    reset_state
    output
  end

  private

  def reset_state
    @comment = ""
    @expecting_class_name = false
  end
end

BabyRDoc.new(File.read(__FILE__)).parse(STDOUT)

Run this with Ruby 1.9 (or, I guess, 1.8 with Ripper installed), and you'll see

BabyRDoc: 
    This class handles parser events, extracting 
    comments and attaching them to class definitions

April 08, 2008

Fun with Ruby 1.9 File Encodings

Ruby 1.9 allows you to specify the character encodings of I/O streams, strings, regexps, symbols, and so on. It also lets you specify the encoding of individual source files (and a complete application can be built from many files, each with different character encodings). Expect to start seeing a rash of obscure source code, at least until the initial excitement abates and cooler thinking prevails.

In the meantime, we can get away with


# encoding: utf-8
require 'mathn'
class Numeric
   def ℃
     (self - 32) * 5/9
   end
   def ℉
     self * 9/5 + 32
   end
end
 
puts 212.℃
puts 100.℉

Or, for those who'd like a peek at the start of a road that eventually leads to madness:


alias ✎ puts 
 
✎ 212.℃
✎ 100.℉

I'm betting this post displays badly on about 50% of the machines that are used to view it. Which is reason enough to tread very lightly down this path…
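On the less frivolous side, every string now carries its encoding around with it. A minimal sketch, assuming the source file is saved as UTF-8:

```ruby
# encoding: utf-8
celsius = "℃"

puts celsius.encoding     # the string inherits the source file's encoding
puts celsius.length       # one character...
puts celsius.bytesize     # ...stored as three bytes in UTF-8

# Transcoding changes the bytes, not the characters
latin1 = "café".encode("ISO-8859-1")
puts latin1.encoding
puts latin1.bytesize      # é is a single byte in ISO-8859-1

round_trip = latin1.encode("UTF-8")
```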

March 19, 2008

Ruby 1.9 Built-in Library--Finished First Pass

One of the scary things about revving the PickAxe for Ruby 1.9 is updating the reference section where I describe all the built-in classes and methods. It involves working through the interpreter source, looking for all the rb_define_method calls (and their friends) and then reading the C implementation of the corresponding methods. Many methods are unchanged from 1.8. But, at the same time, many have changed. Often they take an additional parameter, or return an Enumerator where previously they required a block. Then there are the new classes (something like 6 of them) and new methods. (It looks like there are over 200 new built-in methods in the current Ruby 1.9).
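The "return an Enumerator where previously they required a block" change is probably the one you'll notice first. A quick sketch:

```ruby
# Called without a block, many iterators now hand back an Enumerator
enum = %w[a b c].each_with_index
puts enum.class

# ...which can then be chained with other Enumerable methods
labelled = enum.map { |letter, index| "#{index}:#{letter}" }
chars    = "abc".each_char.to_a
pairs    = [1, 2, 3].each_slice(2).to_a

puts labelled.inspect, chars.inspect, pairs.inspect
```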

All in all, I count something like 300 [1.9] flags in the new library reference. Some flag stuff as trivial as a change of a default return type, while others flag entire new classes.

It's incredibly time-consuming work, and I'm constantly grumbling while doing it. But I come out the other end knowing a whole bunch about the library, and with a deeper respect for the folks who maintain it.

March 18, 2008

Complex and Rational are now built-in to 1.9

Just when I thought I'd finished documenting the standard library for the new PickAxe, I did one last svn up of the Ruby interpreter source and discovered that the Complex and Rational classes are now builtins—no need to require the library to get the basic functionality. The change also affects a number of other built-in classes (you can now say nil.to_c, for example). I'm not 100% sure I agree with rolling in Complex, but the addition of rational numbers is a welcome change.
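A short sketch of what that looks like with no require at all:

```ruby
# Rational arithmetic stays exact
half = Rational(1, 3) + Rational(1, 6)

# Complex multiplication: (1+2i)(3+4i) = -5+10i
product = Complex(1, 2) * Complex(3, 4)

# The new conversion methods reach into other built-in classes too
from_nil = nil.to_c
from_int = 3.to_r

puts half, product, from_nil, from_int
```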

Unfortunately it's back to the drawing board on my plans to release a new beta today...

January 08, 2008

A loud "Huzzah!" was heard throughout the land

Eric Hodel is giving RDoc some love. You can't imagine how happy that makes me.

When I first wrote RDoc, I was trying to find a way of solving two problems:

1. Adding comments to the largely uncommented C source of Ruby, and
2. Providing a means for library writers easily to document their creations.

I'd just finished the PickAxe, and I wanted to take the work Andy and I had done reverse engineering the Ruby API and add it back into the interpreter source code.

I set myself constraints with RDoc and ri:

* it should produce at least some documentation even on totally uncommented source files
* it should extract tacit information from the program source (for example guessing good names for block parameters by looking for yield statements inside methods)
* the markup in the source files should be unobtrusive. In the typical case, someone reading the source should not even notice that the comments follow markup conventions
* it should only use libraries that come pre-installed with Ruby
* the documentation it produced should be portable across machines and architectures
* it should allow incremental documentation. Libraries that you install over time can add methods to existing classes. As you add these libraries, the method lists in the classes you extend should grow to reflect the changes
* it should be secure. People pushed many times to add the ability to execute code during the documentation process. I didn't want to have code run on an end user's machine during a process that ostensibly was simply installing documentation (particularly as these installations often ran as root)
* it should be throw-away

The last one might be a surprise, but the real objective of RDoc wasn't the tool. The real objective was to set a standard that meant that future libraries would get documented in a consistent and usable way. And so RDoc and ri compromised like crazy. Rather than a database or some complex binary format, they used a set of directory trees in the user's filesystem to store documentation. This documentation, which is basically a set of Ruby objects, was stored using YAML, rather than marshaled objects or Ruby source. Even though YAML is slow, it is more portable than marshaled objects, and more secure than Ruby source. The parser in RDoc was a wild hack on the parser in irb. This means it performs a static, not dynamic, analysis and that it is sometimes confused by edge cases in Ruby syntax. So be it.

But the very worst part of RDoc/ri is the output side. I wanted to be able to produce output in a variety of formats: HTML, plain text, XML, chm, LaTeX, and so on. So the analysis side of RDoc produces a data structure, and passes it to the output side. Here I made a stupid design decision. What RDoc generates internally is basically nested hashes. This has a couple of major advantages. In particular, there's a kind of fractal property when traversing it: it doesn't matter how deep you are in the structure—all you pass to the next routine down is a hash. But it has a major downside—it's a bitch to work with. If I were doing it again, I'd use Structs.
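To illustrate the tradeoff, here is a sketch with made-up field names (these are not RDoc's real data structures). A mistyped hash key quietly returns nil, while a mistyped Struct member fails loudly.

```ruby
# Nested hashes: uniform to traverse, but typos go unnoticed
as_hash = {
  "name"    => "Example",
  "methods" => [{ "name" => "meth1", "comment" => "Does something" }]
}
typo_result = as_hash["methods"].first["nmae"]   # misspelled key, silently nil

# Structs: the same shape, with named accessors
MethodDoc = Struct.new(:name, :comment)
ClassDoc  = Struct.new(:name, :method_docs)

as_struct = ClassDoc.new("Example", [MethodDoc.new("meth1", "Does something")])

typo_error = begin
  as_struct.method_docs.first.nmae               # misspelled accessor
rescue NoMethodError => e
  e.class
end

puts typo_result.inspect, typo_error
```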

Finally, there's the generation of the output itself. I needed a templating system and, for what seemed like good reasons at the time, I wrote my own. It was only a handful of lines of code initially. It's still only a couple of hundred. It did a few things well, but ultimately it was ugly as sin. But now, as Erb has become something of a standard, it is definitely the right time to replace it.

RDoc and ri are, in a way, the ultimate stone soup. The code itself is not the output of the project. The real output is the thousands of libraries that are now self-documenting. Eric and the crew are busy on the stew, replacing the stones with real and tasty ingredients. When they are finished, we'll be able to use all that library documentation in remarkable new ways. So, a big thank you to Eric and Seattle.rb, and to all the Ruby coders who've created such a great base of documentation for us all.

Here's to RDoc 2.0.

January 02, 2008

Pipelines Using Fibers in Ruby 1.9--Part II

In the previous post, I developed a class called PipelineElement. This made it relatively easy to create elements that act as producers and filters in a programmatic pipeline. Using it, we could write Ruby 1.9 code like:

    10.times do
      puts (evens | multiples_of_three | multiples_of_seven).resume
    end

The construct in the loop is a pipeline containing three chunks of code: a generator of even numbers, a filter that only passes multiples of three, and another filter that passes multiples of seven. Numbers are passed from the producer to the first filter, and then from that filter to the next, until finally popping out and being made available to puts.

However, creating these pipeline elements is still something of a pain. It turns out that we can simplify things when it comes to creating filters. In the implementation I'll show here, we'll only handle the case of simple transforming filters—filters that take an input, do something to it, and write the result to the filter chain.

Let's revisit the PipelineElement class:

    class PipelineElement

      attr_accessor :source

      def initialize
        @fiber_delegate = Fiber.new do
          process
        end
      end

      def |(other)
        other.source = self
        other
      end

      def resume
        @fiber_delegate.resume
      end

      def process
        while value = input
          handle_value(value)
        end
      end

      def handle_value(value)
        output(value)
      end

      def input
        source.resume
      end

      def output(value)
        Fiber.yield(value)
      end
    end

The process method is the driving loop. It reads the next input from the pipeline, then calls handle_value to deal with it. In the base class, handle_value simply echoes the input to the output. Real filters subclass PipelineElement and override this method.

Let's make a small change to the handle_value method.

    def handle_value(value)
      output(transform(value))
    end

    def transform(value)
      value
    end

By doing this, we've split the transformation of the incoming value into a separate method. And the work done by this method no longer uses any of the state in the PipelineElement object, which means we can also do it in a block in the caller's context. Let's change our PipelineElement class to allow this. We'll have the constructor take an optional block, and we'll use that block in preference to the transform. Here's another listing, showing just the changed methods.

    class PipelineElement

      def initialize(&block)
        @transformer = block || method(:transform)
        @fiber_delegate = Fiber.new do
          process
        end
      end

      # ...

      def handle_value(value)
        output(@transformer.call(value))
      end
    end

This illustrates a cool (and underused) feature of Ruby. Method objects (created with the method(...) call) are duck-typed with proc objects: we can use .call(params) on both. This is a great way of letting users of a class change its behavior either by subclassing and overriding a method, or by simply passing in a block.
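Here is the same idea in isolation, as a minimal sketch. The Shouter and Whisperer classes are made up for this example and have nothing to do with the pipeline code.

```ruby
# A hypothetical class (not from the pipeline code) whose behavior can be
# changed either by overriding #convert in a subclass or by passing a block.
class Shouter
  def initialize(&block)
    # A Proc and a Method object both respond to #call.
    @converter = block || method(:convert)
  end

  def convert(text)
    text.upcase
  end

  def say(text)
    @converter.call(text)
  end
end

class Whisperer < Shouter
  def convert(text)
    text.downcase
  end
end

puts Shouter.new.say("Hello")                          # uses Shouter#convert
puts Whisperer.new.say("Hello")                        # uses the override
puts Shouter.new { |text| text.reverse }.say("Hello")  # uses the block
```

Because method(:convert) is looked up on the receiving object, the subclass override is picked up without any extra work in the constructor.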

With this change in place, we can now write transforming filters using blocks. This is a lot more compact than the previous subclassing approach.

    class Evens < PipelineElement
      def process
        value = 0
        loop do
          output(value)
          value += 2
        end
      end
    end

    evens = Evens.new

    tripler     = PipelineElement.new {|val| val * 3}
    incrementer = PipelineElement.new {|val| val + 1}

    5.times do
      puts (evens | tripler | incrementer ).resume
    end

This outputs 1, 7, 13, 19, and 25.

Different Kinds of Filter

This approach works well if all we want is transforming filters. But what if we would also like to simplify filters that either pass or don't pass values based on some criterion? A block would seem like a great way of specifying the condition, but we've already used our one block parameter up. Subclassing to the rescue. We can create two subclasses, Transformer and Filter. One sets the @transformer instance variable to any block it is passed. The other sets @filter. Here's the relevant code:

    class PipelineElement

      attr_accessor :source

      def initialize(&block)
        @transformer  ||= method(:transform)
        @filter       ||= method(:filter)
        @fiber_delegate = Fiber.new do
          process
        end
      end

      # ...

      def handle_value(value)
        output(@transformer.call(value)) if @filter.call(value)
      end

      def transform(value)
        value
      end

      def filter(value)
        true
      end
    end

    class Transformer < PipelineElement
      def initialize(&block)
        @transformer = block
        super
      end
    end

    class Filter < PipelineElement
      def initialize(&block)
        @filter = block
        super
      end
    end

Thus equipped, we can write:

    tripler          = Transformer.new {|val| val * 3}
    incrementer      = Transformer.new {|val| val + 1}
    multiple_of_five = Filter.new {|val| val % 5 == 0}

    5.times do
      puts (evens | tripler | incrementer | multiple_of_five ).resume
    end

Moving The Blocks Inline

Our final hack lets us move the blocks directly into the pipeline.

Let's look at the actual pipeline code:

    puts (evens | tripler | incrementer | multiple_of_five ).resume

Those pipe characters are simply calls to the | method in class PipelineElement. And methods can take block arguments, right? So what stops us writing

    puts (evens | {|v| v*3} | {|v| v+1} | multiple_of_five ).resume

It turns out that Ruby stops us. The brace characters are taken to be hash parameters, not blocks, so Ruby gets its knickers in a twist. Fortunately, that's easily fixed by making the method calls explicit.

    puts (evens .| {|v| v*3} .| {|v| v+1} .| multiple_of_five ).resume

Now we just need to make the | method accept an optional block. If the block is present, we use it to create a new transformer.

    def |(other=nil, &block)
      other = Transformer.new(&block) if block
      other.source = self
      other
    end

Ruby 1.9 lets you chain method calls across lines, so we can tidy up our pipeline visually.

    5.times do
      puts (evens 
            .| {|v| v*3}
            .| {|v| v+1}
            .| multiple_of_five 
           ).resume
    end
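For anyone who wants to run the inline-block version, here is one way the pieces shown so far might fit together in a single file. It assumes the base-class methods are unchanged from the earlier listing, and first_five_results is a helper name added purely for illustration.

```ruby
class PipelineElement
  attr_accessor :source

  def initialize(&block)
    # Subclasses may have set these already; fill in defaults otherwise.
    @transformer ||= method(:transform)
    @filter      ||= method(:filter)
    @fiber_delegate = Fiber.new { process }
  end

  # Connect self to the next element; a block becomes a Transformer.
  def |(other = nil, &block)
    other = Transformer.new(&block) if block
    other.source = self
    other
  end

  def resume
    @fiber_delegate.resume
  end

  def process
    while value = input
      handle_value(value)
    end
  end

  def handle_value(value)
    output(@transformer.call(value)) if @filter.call(value)
  end

  def transform(value)
    value
  end

  def filter(value)
    true
  end

  def input
    source.resume
  end

  def output(value)
    Fiber.yield(value)
  end
end

class Transformer < PipelineElement
  def initialize(&block)
    @transformer = block
    super
  end
end

class Filter < PipelineElement
  def initialize(&block)
    @filter = block
    super
  end
end

class Evens < PipelineElement
  def process
    value = 0
    loop do
      output(value)
      value += 2
    end
  end
end

# Hypothetical helper: build a fresh pipeline and pull five values from it.
def first_five_results
  multiple_of_five = Filter.new { |v| v % 5 == 0 }
  pipeline = Evens.new
    .| { |v| v * 3 }
    .| { |v| v + 1 }
    .| multiple_of_five
  Array.new(5) { pipeline.resume }
end

p first_five_results
```

The tripled-and-incremented evens are 1, 7, 13, 19, 25, and so on, so the first five values that survive the filter are 25, 55, 85, 115, and 145.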

A Palindrome Finder

Let's finish with another trivial example. We'll create a generic producer class that takes a collection and passes it, one element at a time, into the pipeline.

    class Pump < PipelineElement
      def initialize(source)
        @source = source
        super()
      end
      def process
        @source.each {|item| Fiber.yield item}
        nil
      end
    end

Now we can write a simple palindrome finder (a palindrome is a word which is the same when spelled backwards).

    words = Pump.new %w{Madam, the civic radar rotator is not level.}
    is_palindrome = Filter.new {|word| word == word.reverse}

    pipeline = words .| {|word| word.downcase.tr("^a-z", '') } .| is_palindrome

    while word = pipeline.resume
      puts word
    end

This outputs: madam, civic, radar, rotator, level.

But what if we instead want to show each word in the input stream, and flag it if it is a palindrome? That's easily done, but we won't do it the easy way. Instead, let's show a more convoluted method, because it might be useful in the general case.

There's no law to say that a transformer that receives a string as input has to write a string as output. It could, if it wanted to, write an array. Or a structure. So we could write:

    WordInfo = Struct.new(:original, :forwards, :backwords)

    words = Pump.new %w{Madam, the civic radar rotator is not level.}

    normalize = Transformer.new {|word| [word, word.downcase.tr("^a-z", '')] }

    to_word_info = Transformer.new do |word, normalized|
      reversed = normalized.reverse
      WordInfo.new(word, normalized, reversed)
    end

    formatter = Transformer.new do |word_info|
      if word_info.forwards == word_info.backwords
        "'#{word_info.original}' is a palindrome"
      else
        "'#{word_info.original}' is not a palindrome"
      end
    end

    pipeline = words | normalize | to_word_info | formatter

    while word = pipeline.resume
      puts word
    end

This outputs

    'Madam,' is a palindrome
    'the' is not a palindrome
    'civic' is a palindrome
    'radar' is a palindrome
    'rotator' is a palindrome
    'is' is not a palindrome
    'not' is not a palindrome
    'level.' is a palindrome
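One detail worth spelling out: the to_word_info block declares two parameters, yet the previous stage hands it a single two-element array. That works because a block (a non-lambda proc) spreads a lone array argument across its parameters. The sketch below shows the behavior on its own; the constant names are invented for the example.

```ruby
# A non-lambda Proc that declares two parameters.
PAIR_BLOCK = proc { |word, normalized| "#{word} -> #{normalized}" }

# The same body as a lambda, which checks its argument count strictly.
PAIR_LAMBDA = lambda { |word, normalized| "#{word} -> #{normalized}" }

pair = ["Madam,", "madam"]

puts PAIR_BLOCK.call(pair)      # the array is spread across both parameters

begin
  PAIR_LAMBDA.call(pair)        # one argument supplied for two parameters
rescue ArgumentError => e
  puts "lambda refused: #{e.class}"
end

puts PAIR_LAMBDA.call(*pair)    # an explicit splat satisfies the lambda
```

So a stage written as a lambda, or as a method taking two arguments, would need the array to be splatted explicitly.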

So, What's the Point?

Is this a great way of writing a palindrome finder? Not really. But...

What we've done here is turned the way a program works on its head. We've written chunks of isolated code, each of which either filters or transforms an input. We've then independently knitted these chunks together. That's a high degree of decoupling. We can also leave it until runtime to determine what gets put into the pipeline (and the order that it appears in the pipeline), which means we can move more power into the hands of our users.
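As a rough illustration of the runtime point, the sketch below assembles a pipeline from a list of step names, which could just as easily come from a config file or the command line. It uses bare fibers rather than the PipelineElement classes to stay self-contained, and the STEPS table, stage, counter, and build_pipeline are all names invented for this example.

```ruby
# Hypothetical table of named steps a user could choose from.
STEPS = {
  "double" => ->(v) { v * 2 },
  "inc"    => ->(v) { v + 1 },
  "square" => ->(v) { v * v },
}

# A producer fiber that yields 0, 1, 2, ...
def counter
  Fiber.new do
    n = 0
    loop do
      Fiber.yield n
      n += 1
    end
  end
end

# A fiber that pulls a value from +source+ and yields the transformed value.
def stage(source, &transform)
  Fiber.new do
    loop { Fiber.yield transform.call(source.resume) }
  end
end

# Chain the named steps, in the order given, onto a fresh counter.
def build_pipeline(names)
  names.inject(counter) { |source, name| stage(source, &STEPS.fetch(name)) }
end

pipeline = build_pipeline(%w[double inc])
3.times { puts pipeline.resume }
```

Changing the list to %w[inc square] gives a different pipeline without touching any of the step code.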

Could we have done all this without Fibers? Of course. Could we do it without Ruby 1.9? Absolutely. But sometimes factors come together which lead us to experiment with new ways of thinking about our code.

This pipeline stuff is not revolutionary, and it isn't generally applicable. But it's fun to play with. And, for me, that's the main thing.

A Wee Postscript

All this content is stuff that I decided not to include in the third edition of the PickAxe. It didn't work in the section on fibers, because it uses programming techniques not yet covered. It didn't work later because, as an example of various programming techniques, it is just too long.

December 31, 2007

Pipelines Using Fibers in Ruby 1.9

Users of the command line are familiar with the idea of building pipelines: a chain of simple commands strung together so that the output of one becomes the input of the next. Using pipelines and a basic set of primitives, shell users can accomplish some sophisticated tasks. Here's a basic Unix shell pipeline that reports the ten longest .tip files in the current directory, based on the number of lines in each file:

    wc -l *.tip | grep '\.tip' | sort -n | tail -10
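For comparison, here is roughly the same report written with ordinary Ruby method chaining. This is only a sketch for orientation, not the fiber-based approach these articles build; longest_files is a name made up for the example.

```ruby
# Return [line_count, path] pairs for the +count+ longest files matching
# +pattern+, shortest first, mirroring `sort -n | tail`.
def longest_files(pattern, count = 10)
  Dir.glob(pattern)
     .map { |path| [File.foreach(path).count, path] }
     .sort
     .last(count)
end

longest_files("*.tip").each do |lines, path|
  puts format("%8d %s", lines, path)
end
```

Each method call plays the part of one command in the shell pipeline, but every step here works on a complete array before the next one starts.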

Let's see how to add something similar to Ruby. By the end of this set of two articles, we'll be able to write things like

    puts (even_numbers | tripler | incrementer | multiple_of_five ).resume

and a palindrome finder using blocks:

    words         = Pump.new %w{Madam, the civic radar rotator is not level.}
    is_palindrome = Filter.new {|word| word == word.reverse}

    pipeline = words .| {|word| word.downcase.tr("^a-z", '') } .| is_palindrome

    while word = pipeline.resume
      puts word
    end

Great code? Nope. But getting there is fun. And, who knows? The techniques might well be useful in your next project.

A Daily Dose of Fiber

Ruby 1.9 adds support for Fibers. At their most basic, they let you create simple generators (much as you could do previously with blocks). Here's a trivial example: a fiber that generates successive Fibonacci numbers:

      fib = Fiber.new do
        f1 = f2 = 1
        loop do
          Fiber.yield f1
          f1, f2 = f2, f1 + f2
        end
      end

      10.times { puts fib.resume }

A fiber is somewhat like a thread, except you have control over when it gets scheduled. Initially, a fiber is suspended. When you resume it, it runs the block until the block finishes, or it hits a Fiber.yield. This is similar to a regular block yield: it suspends the fiber and passes control back to the resume. Any value passed to Fiber.yield becomes the value returned by resume.
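The sequence of events is easier to see with a fiber that records what it does. This is a small sketch; traced_fiber is a name invented for the example.

```ruby
# Build a fiber that records its progress in +log+ (a plain array).
def traced_fiber(log)
  Fiber.new do
    log << "started"
    Fiber.yield :first
    log << "resumed"
    Fiber.yield :second
    :finished               # the block's own value is what the last resume returns
  end
end

log = []
fiber = traced_fiber(log)

puts log.inspect            # still empty: creating a fiber does not run it
puts fiber.resume.inspect   # runs up to the first Fiber.yield
puts fiber.resume.inspect   # continues to the second Fiber.yield
puts fiber.resume.inspect   # runs to the end of the block
puts log.inspect
```

Once the block has finished, a further resume raises a FiberError, which is why the generators in this article loop forever or return nil to signal the end.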

By default, a fiber can only yield back to the code that resumed it. However, if you require the "fiber" library, Fibers get extended with a transfer method that allows one fiber to transfer control to another. Fibers then become fully fledged coroutines. However, we won't be needing all that power today.
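Just for reference, here is a minimal sketch of what transfer looks like: two fibers that hand control directly to each other. The ping_pong method and its variable names are invented for the example, and it relies on control returning to the main program when a transferred-to fiber finishes.

```ruby
require "fiber"   # needed for transfer on Ruby 1.9; harmless on later versions

# Two fibers pass control back and forth, recording each turn in a log.
def ping_pong(rounds)
  log = []
  ping = pong = nil       # declared first so each block can see the other

  ping = Fiber.new do
    rounds.times do
      log << :ping
      pong.transfer       # hand control to pong
    end
    # ping ends here, and control goes back to the main program
  end

  pong = Fiber.new do
    loop do
      log << :pong
      ping.transfer       # hand control back to ping
    end
  end

  ping.transfer           # start the exchange
  log
end

p ping_pong(2)
```

Neither fiber yields to its caller; each names the fiber that should run next, which is what makes these symmetric coroutines.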

Instead, let's get back to the idea of creating pipelines of functionality in code, much as you can create pipelines in the shell.

As a starting point, let's write two fibers. One's a generator: it creates a list of even numbers. The second is a consumer. All it does is accept values from the generator and print them. We'll make the consumer stop after printing 10 numbers.

    evens = Fiber.new do
      value = 0
      loop do
        Fiber.yield value
        value += 2
      end
    end

    consumer = Fiber.new do
      10.times do
        next_value = evens.resume
        puts next_value
      end
    end

    consumer.resume

Note how we had to use resume to kick off the consumer. Technically, the consumer doesn't have to be a Fiber, but, as we'll see in a minute, making it one gives us some flexibility.

As a next step, notice how we've created some coupling in this code. Our consumer fiber has the name of the evens generator coded into it. Let's wrap each fiber in its own method, and pass the generator into the consumer method as a parameter.

    def evens
      Fiber.new do
        value = 0
        loop do
          Fiber.yield value
          value += 2
        end
      end
    end

    def consumer(source)
      Fiber.new do
        10.times do
          next_value = source.resume
          puts next_value
        end
      end
    end

    consumer(evens).resume

OK. Let's add one more fiber to the weave. We'll create a filter that only passes on numbers that are multiples of three. Again, we'll wrap it in a method.

    def evens
      Fiber.new do
        value = 0
        loop do
          Fiber.yield value
          value += 2
        end
      end
    end

    def multiples_of_three(source)
      Fiber.new do
        loop do
          next_value = source.resume
          Fiber.yield next_value if next_value % 3 == 0
        end
      end
    end

    def consumer(source)
      Fiber.new do
        10.times do
          next_value = source.resume
          puts next_value
        end
      end
    end

    consumer(multiples_of_three(evens)).resume

Running this, we get the output:

    0
    6
    12
    18
    . . .

This is getting cool. We write little chunks of code, and then combine them to get work done. Just like a pipeline. Except...

We can do better. First, the composition looks backwards. Because we're passing methods to methods, we write

    consumer(multiples_of_three(evens))

Instead, we'd like to write

    evens | multiples_of_three | consumer

Also, there's a fair amount of duplication in this code. Each of our little pipeline methods has the same overall structure, and each is coupled to the implementation of fibers. Let's see if we can fix this.

Wrapping Fibers

As is usual when we're refactoring towards a solution, we're about to get really messy. Don't worry, though. It will all wash off, and we'll end up with something a lot neater.

First, let's create a class that represents something that can appear in our pipeline. At its heart is the process method. This reads something from the input side of the pipe, then "handles" that value. The default handling is to write that value to the output side of the pipeline, passing it on to the next element in the chain.

    class PipelineElement

      attr_accessor :source

      def initialize
        @fiber_delegate = Fiber.new do
          process
        end
      end

      def resume
        @fiber_delegate.resume
      end

      def process
        while value = input
          handle_value(value)
        end
      end

      def handle_value(value)
        output(value)
      end

      def input
        source.resume
      end

      def output(value)
        Fiber.yield(value)
      end
    end

When I first wrote this, I was tempted to make PipelineElement a subclass of Fiber, but that leads to coupling. In the end, the pipeline elements delegate to a separate Fiber object.
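The same delegation can also be expressed with the standard library's Forwardable module. This is just an alternative sketch, not what the listing above does; the Countdown class is made up for the example.

```ruby
require "forwardable"

# Hypothetical element (not from the article) that counts down from +start+.
class Countdown
  extend Forwardable

  # Expose only resume; callers never touch the Fiber itself.
  def_delegator :@fiber_delegate, :resume

  def initialize(start)
    @fiber_delegate = Fiber.new do
      start.downto(1) { |n| Fiber.yield n }
      nil   # signal the end of the stream
    end
  end
end

countdown = Countdown.new(3)
while value = countdown.resume
  puts value
end
```

Either way, the element has a Fiber rather than being one, so the rest of Fiber's interface stays out of the pipeline's public API.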

The first element of the pipeline doesn't receive any input from prior elements (because there are no prior elements), so we need to override its process method.

    class Evens < PipelineElement
      def process
        value = 0
        loop do
          output(value)
          value += 2
        end
      end
    end

    evens = Evens.new

Just to make things more interesting, we'll create a generic MultiplesOf filter, so we can filter based on any number, and not just 3:

    class MultiplesOf < PipelineElement
      def initialize(factor)
        @factor = factor
        super()
      end
      def handle_value(value)
        output(value) if value % @factor == 0
      end
    end

    multiples_of_three = MultiplesOf.new(3)
    multiples_of_seven = MultiplesOf.new(7)

Then we just knit it all together into a pipeline:

    multiples_of_three.source = evens
    multiples_of_seven.source = multiples_of_three

    10.times do
      puts multiples_of_seven.resume
    end

We get 0, 42, 84, 126, 168, and so on as output. (Any output stream that contains 42 must be correct, so no need for any unit tests here.)

But we're still a little way from our ideal of being able to pipe these puppies together. It's a good thing that Ruby lets us override the "|" operator. Up in class PipelineElement, define a new method:

    def |(other)
      other.source = self
      other
    end

This allows us to write:

    10.times do
      puts (evens | multiples_of_three | multiples_of_seven).resume
    end

or even:

    pipeline = evens | multiples_of_three | multiples_of_seven

    10.times do
      puts pipeline.resume
    end

Cool, or what?
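The chaining works because | returns its right-hand operand, and Ruby evaluates a | b | c as (a | b) | c. The sketch below traces that with a stripped-down stand-in for PipelineElement; the Link class and its chain method are invented for the example.

```ruby
# A stripped-down stand-in for PipelineElement, with a name for tracing.
class Link
  attr_accessor :source
  attr_reader :name

  def initialize(name)
    @name = name
  end

  # Same shape as PipelineElement#|: wire up the source, return the right side.
  def |(other)
    other.source = self
    other
  end

  # Walk back through the sources to list the chain from first to last.
  def chain
    (source ? source.chain : []) + [name]
  end
end

a = Link.new("evens")
b = Link.new("threes")
c = Link.new("sevens")

last = a | b | c          # evaluated as (a | b) | c

puts last.name
puts last.chain.join(" | ")
```

The value of the whole expression is the last element, so resuming it pulls values through every element upstream.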

In The Next Thrilling Installment

The next post will take these basic ideas and tart them up a bit, allowing us to use blocks directly in pipelines. We'll also reveal why the PipelineElement class I just wrote is somewhat more complicated than might seem necessary. In the meantime, here's the full source of the code so far.

    class PipelineElement

      attr_accessor :source

      def initialize
        @fiber_delegate = Fiber.new do
          process
        end
      end

      def |(other)
        other.source = self
        other
      end

      def resume
        @fiber_delegate.resume
      end

      def process
        while value = input
          handle_value(value)
        end
      end

      def handle_value(value)
        output(value)
      end

      def input
        source.resume
      end

      def output(value)
        Fiber.yield(value)
      end
    end

    ##
    # The classes below are the elements in our pipeline
    #
    class Evens < PipelineElement
      def process
        value = 0
        loop do
          output(value)
          value += 2
        end
      end
    end

    class MultiplesOf < PipelineElement
      def initialize(factor)
        @factor = factor
        super()
      end
      def handle_value(value)
        output(value) if value % @factor == 0
      end
    end

    evens = Evens.new
    multiples_of_three = MultiplesOf.new(3)
    multiples_of_seven = MultiplesOf.new(7)

    pipeline = evens | multiples_of_three | multiples_of_seven

    10.times do
      puts pipeline.resume
    end
