Twitter Should Move Away from Ruby
Oh dear. The chattering classes are at it, talking about how the Twitter folks are dissing Ruby by announcing the replacement of some Ruby code with Scala code.
I love doing the Pragmatic Studios. Nicole, Mike, and Chad are all good friends, and that creates a really relaxed and fun atmosphere for what could otherwise be a pretty intense three days.
The next studio is in Denver at the end of January, and I'm really looking forward to it—living in Texas, I relish being able to see mountains and snow. I just wish I was a skier so I could have an excuse to take an extra day or two to head on up to a resort.
If you are planning on going, though, don't put it off—early registration (a $300 savings) ends in a week.
There's one bad side to the studios, though. Mike put together a video showing what studios are like. Notice anything? Food. Lots of it. Scarily good food. Must resist. Must…
All Ruby programmers regularly encounter the mystical error “syntax error, unexpected $end, expecting keyword_end.” We know what it means: we left off an end somewhere in the code. As Ruby compiles our source, it keeps track of nesting, and when it reaches the end of file ($end), it expects to see one more end keyword, and none is there.
So, we trundle back through the source, and after a while discover we'd deleted just one too many lines during that last edit.
Ruby 1.9 makes that easier. For example, here's a source file:
class Example
  def meth1
    if Time.now.hours > 12
      puts "Afternoon"
  end
  def meth2
    # ...
  end
end
Run it through Ruby 1.9, and you'll get the same old error message:
dave[RUBY3/Book 8:26:48*] ruby t.rb
t.rb:10: syntax error, unexpected $end, expecting keyword_end
But add the -w flag, and things get more interesting.
dave[RUBY3/Book 8:26:51*] ruby -w t.rb
t.rb:5: warning: mismatched indentations at 'end' with 'if' at 3
t.rb:9: warning: mismatched indentations at 'end' with 'def' at 2
t.rb:10: syntax error, unexpected $end, expecting keyword_end
It's the small things in life...
I've reorganized the regular expression content in the new Programming Ruby, and added some cool new advanced examples. This one's fairly straightforward, but I love the fact that I can now start refactoring my more complex patterns, removing duplication.
The stuff below is an extract from the unedited update. It'll appear in the next beta. It follows a discussion of named groups, \k and related stuff.
There’s a trick which allows us to write subroutines inside regular expressions. Recall that we can invoke a named group using \g<name>, and we define the group using (?<name>...). Normally, the definition of the group is itself matched as part of executing the pattern. However, if you add the suffix {0} to the group, it means “zero matches of this group,” so the group is not executed when first encountered.
sentence = %r{
(?<subject> cat | dog | gerbil ){0}
(?<verb> eats | drinks| generates ){0}
(?<object> water | bones | PDFs ){0}
(?<adjective> big | small | smelly ){0}
(?<opt_adj> (\g<adjective>\s)? ){0}
The\s\g<opt_adj>\g<subject>\s\g<verb>\s\g<opt_adj>\g<object>
}x
md = sentence.match("The cat drinks water")
puts "The subject is #{md[:subject]} and the verb is #{md[:verb]}"
md = sentence.match("The big dog eats smelly bones")
puts "The adjective in the second sentence is #{md[:adjective]}"
sentence =~ "The gerbil generates big PDFs"
puts "And the object in the last is #{$~[:object]}"
produces:
The subject is cat and the verb is drinks
The adjective in the second sentence is smelly
And the object in the last is PDFs
Cool, eh?
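The same {0} trick is handy when one sub-pattern has to be repeated several times. Here is a minimal sketch, assuming a dotted-quad address as the thing being matched; the constant DOTTED_QUAD and the group name octet are made up for this illustration and are not part of the book extract.

```ruby
# Hypothetical example: match a dotted-quad address such as 192.168.0.1.
# The octet group is defined once with {0}, so the definition itself
# matches nothing, and is then invoked four times with \g<octet>.
DOTTED_QUAD = %r{
  (?<octet> 25[0-5] | 2[0-4]\d | 1\d\d | [1-9]?\d ){0}
  \A \g<octet> (?: \. \g<octet> ){3} \z
}x

["192.168.0.1", "256.1.1.1", "1.2.3"].each do |candidate|
  verdict = DOTTED_QUAD.match?(candidate) ? "looks valid" : "rejected"
  puts "#{candidate}: #{verdict}"
end
```

Without the subroutine, the octet alternation would have to be written out four times, which is exactly the duplication the named group removes.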
Ruby 1.9 adds a lot of features to Proc objects.
Currying is the ability to take a function that accepts n parameters and generate from it one or more functions with some parameter values already filled in. In Ruby 1.9, you create a curry-able proc by calling the curry method on it. If you subsequently call this curried proc with fewer parameters than it expects, it will not execute. Instead, it returns a new proc with those parameters already bound.
Let's look at a trivial example. Here's a proc that simply adds two values:
plus = lambda {|a,b| a + b}
puts plus[1,2]
I'm using the [ ] syntax to invoke the proc with arguments, in this case 1 and 2. The code will print 3.
Now let's have some fun.
curried_plus = plus.curry
# create two procs based on plus, but with the first parameter
# already set to a value
plus_two = curried_plus[2]
plus_ten = curried_plus[10]
puts plus_two[3]
puts plus_ten[3]
On line 1, I create a curried version of the plus proc. I then call it twice, but both times I only pass it one parameter. This means it cannot execute the body. Instead, each time it returns a new proc which is like the original, but which has the first parameter preset to either 2 or 10. In the last two lines, I call these two new procs, supplying the missing parameter. This means they can execute normally, and the code outputs 5 and 13.
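The same idea extends past two parameters. As a small sketch (the VOLUME lambda is an invented example, not something from the PickAxe), a curried three-parameter lambda can receive its arguments one at a time or in batches:

```ruby
# A three-parameter lambda, curried so that arguments can arrive in stages.
VOLUME = lambda { |length, width, height| length * width * height }.curry

with_base   = VOLUME[2, 3]   # length and width bound, still waiting for height
with_length = VOLUME[2]      # only length bound

puts with_base[4]            # all three present, so the body runs
puts with_length[3][4]       # same values, supplied one by one
puts VOLUME[2, 3, 4]         # nothing left to bind, so it executes immediately
```

Each of the three calls ends up multiplying 2, 3, and 4, so each prints 24.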
You can have a lot of fun with currying, but that's not why we're here today.
Over the weekend, Matz added a new method to the Proc class. You can now use Proc#=== as an alias for Proc#call. So, why on earth would you want to do that? Well, remember that === is used to match terms in a case statement. Over on the AimRed blog, they noted that this feature could be used to make the matching in case statements actually execute code. In their example, they manually added the === method to class Proc:
class Proc
def ===( *parameters )
self.call( *parameters )
end
end
Then you can write something like
sunday = lambda{ |time| time.wday == 0 }
monday = lambda{ |time| time.wday == 1 }
# and so on...
case Time.now
when sunday
puts "Day of rest"
when monday
puts "work"
# ...
end
See how that works? As Ruby executes the case statement, it looks at each of the parameters of the when clauses in turn. For each, it invokes its === method, passing that method the original case discriminator (Time.now in this example). But with the new === method in class Proc, this will now execute the proc, passing it Time.now as a parameter.
While updating the PickAxe, I noticed that Matz liked this so much that it is now part of 1.9. And it means we can combine this trick with currying to write some fun code:
is_weekday = lambda {|day_of_week, time| time.wday == day_of_week}.curry
sunday = is_weekday[0]
monday = is_weekday[1]
tuesday = is_weekday[2]
wednesday = is_weekday[3]
thursday = is_weekday[4]
friday = is_weekday[5]
saturday = is_weekday[6]
case Time.now
when sunday
puts "Day of rest"
when monday, tuesday, wednesday, thursday, friday
puts "Work"
when saturday
puts "chores"
end
Is this incredibly efficient? Not really :) But it opens up quite an interesting set of possibilities.
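The trick is not tied to dates. Here is a sketch along the same lines, assuming a made-up BETWEEN helper and some arbitrary temperature bands; it relies only on Proc#curry and Proc#===, both of which are in 1.9.

```ruby
# Hypothetical helper: a curried three-argument lambda used as a range test.
BETWEEN = lambda { |low, high, value| value >= low && value <= high }.curry

# Binding the first two arguments leaves one-argument procs,
# which is exactly what a when clause will call via ===.
FREEZING = BETWEEN[-100, 0]
MILD     = BETWEEN[1, 25]
HOT      = BETWEEN[26, 60]

def describe_temperature(celsius)
  case celsius
  when FREEZING then "freezing"
  when MILD     then "mild"
  when HOT      then "hot"
  else               "off the scale"
  end
end

puts describe_temperature(18)
```

A plain Range would do this particular job just as well; the point is only that any predicate that can be curried down to one argument can sit in a when clause.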
I'm about to head off for a couple of weeks vacation, but I didn't want to leave folks waiting for new episodes in the Ruby Object Model and Metaprogramming screencast series, so Mike and I decided to release this week's and next week's episodes together.
Episode 4 looks at instance_ and class_eval, and lays the foundation for a whole bunch of metaprogramming to come. Episode 5 is probably my favorite so far (although I'm rather partial to the Public Service Announcement in #2). In #5, we take a fairly simple programming task and code it up nine different ways, using all of the metaprogramming techniques we've learned to date. It's pretty much pure coding for 36 minutes.
People asked for chapter markers, so we now have chapters in this series. People asked for Ogg support, so we've now got experimental Theora Ogg versions available alongside the Quicktime and iPhone/iPod formats.
I'm really liking this series—it's fun researching it, and fun learning the ins and outs of producing it.
Enjoy!
Dave
I've been teaching Ruby (and in particular, metaprogramming Ruby) for almost 7 years now. And, in that time, I've gradually found ways of cutting through all the confusing stuff to the actual essentials. And when you do that, suddenly things get a lot simpler. I've always known that Ruby didn't really have class methods and singleton methods, for example, but until recently I didn't have a simple way to explain that.
Then, when preparing to give an Advanced Ruby Studio, my thinking crystallized. Metaprogramming in Ruby becomes simple to explain if you focus on four things:
* Knowing that self can only be changed by a method call with a receiver or by a class or module definition makes it easy to keep track of what's going on when metaprogramming.
* [...] def defines its methods. Knowing what changes this makes it easier to know what's going on.
I tried this approach in a number of Studios, and refined it during some talks for RubyFools in Copenhagen and Oslo.
So Mike Clark, who's producing our new series of screencasts, started pushing me to put this description into video. Last week I finally cleared the decks enough to record the first three episodes.
First, I have to say it was a blast. I'd never recorded this many minutes of screencast before, and I was blown away by the amount of time it takes. I was also surprised at the level of detail involved, from microphone setup (which I messed up for a couple of segments) to color matching between codecs. It was fun to learn a whole new set of technologies.
I was also surprised at how hard it was to talk to a microphone. When we write books, we always try to write as if the reader was sitting there next to us. I tried to take the same approach with the screencasts, but it takes a whole new set of skills...
What I really liked was the way that I could live code examples to illustrate points. The first episode has maybe 50/50 code and exposition, and the second and third episodes are mostly code. And the code acts as a great skeleton on which to hang the concepts. Apple-R also keeps me honest.
So, if you're interested in how the Ruby object model really works, and want to improve your metaprogramming chops, why not check them out?
I'm slowly getting used to the new -> way of specifying lambdas in Ruby 1.9. I still feel that, as a notation, it could be clearer. (I'd personally like just plain backslash, because that looks pretty close to a real lambda character, but that's not going to happen.) But having punctuation, rather than the word lambda, makes a surprising difference to the way my eyes read code.
For example, you could write a method that acts like a while loop.
def my_while(cond, &body)
while cond.call
body.call
end
end
In Ruby 1.8 and 1.9, you could call this as
a = 0
my_while lambda { a < 5 } do
puts a
a += 1
end
But my brain finds that seriously hard to scan. The Ruby 1.9 -> syntax makes it slightly (just slightly, mind you) better:
a = 0
my_while -> { a < 5 } do
puts a
a += 1
end
I suspect this is just a question of time. In a year or so, we'll parse the -> syntax in our heads without thinking twice. Once it does become natural, I suspect we'll find all sorts of new uses for procs.
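For comparison, here is a short sketch of the -> form when the lambda takes parameters, which go in parentheses between the arrow and the body. The squares_below method is invented for this illustration; my_while is repeated from above so that the snippet stands alone.

```ruby
def my_while(cond, &body)
  while cond.call
    body.call
  end
end

# Hypothetical helper: collect the squares smaller than the given limit.
def squares_below(limit)
  square  = ->(x) { x * x }   # parameters sit between the arrow and the body
  results = []
  n = 0
  my_while -> { square[n] < limit } do
    results << square[n]
    n += 1
  end
  results
end

p squares_below(30)
```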
Playing with the shoulda testing framework, I came across a small but useful trick. Because the tests are written inside closures, local variables defined in the enclosing context block are available inside should blocks. They're only evaluated once, so they don't take the place of setup blocks, but they are a nice way of storing test-wide values. Somehow, I like the look of this better than using instance variables or constants: the tests seem to be more uniform and balanced.
context "..." do
  birth = Date.parse("2003-05-02")
  should "..." do
    now = Date.parse("2008-06-15")
    assert_equal 5, AgeCalculator.age_given_dates(now, birth)
  end
  should "..." do
    now = Date.parse("2008-06-15")
    assert_equal 5, AgeCalculator.age_given_dates(now, birth)
  end
  should "..." do
    now = Date.parse("2008-04-15")
    assert_equal 4, AgeCalculator.age_given_dates(now, birth)
  end
  should "..." do
    now = Date.parse("2008-05-01")
    assert_equal 4, AgeCalculator.age_given_dates(now, birth)
  end
end
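The closure behavior itself can be seen without shoulda. What follows is only a sketch: TinySpec is a made-up stand-in that records blocks, and age_given_dates is a guess at the kind of calculation being tested, not the real AgeCalculator.

```ruby
require "date"

# Hypothetical stand-in for a test framework: should just records the
# block under its name so that it can be run later.
module TinySpec
  TESTS = {}

  def self.should(name, &block)
    TESTS[name] = block
  end

  def self.run_all
    TESTS.map { |name, block| [name, block.call] }.to_h
  end
end

# Hypothetical age calculation, standing in for AgeCalculator.
def age_given_dates(now, birth)
  age = now.year - birth.year
  before_birthday = now.month < birth.month ||
                    (now.month == birth.month && now.day < birth.day)
  before_birthday ? age - 1 : age
end

birth = Date.parse("2003-05-02")   # evaluated once, when the file is loaded

TinySpec.should("be 5 after the birthday") do
  age_given_dates(Date.parse("2008-06-15"), birth) == 5   # birth comes from the closure
end

TinySpec.should("be 4 the day before the birthday") do
  age_given_dates(Date.parse("2008-05-01"), birth) == 4
end

TinySpec.run_all.each { |name, passed| puts "#{name}: #{passed ? 'ok' : 'FAILED'}" }
```

The blocks run long after birth was assigned, yet both still see it, which is all the trick depends on.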

I have a blast teaching at the Pragmatic Studios. Mike and Nicole run excellent courses, the format is incredible, and the students enthusiastic. I love presenting with two or three other instructors: it keeps the energy level high, and I always manage to learn a lot from them.
Which is why it's a shame I won't be able to be at two of the upcoming Studios, Advanced Rails in Denver on June 12–14 and Test-Driven Development with Rails on June 9–11. I'm doubly bummed because the latter is taught by Jim Weirich and Joe O'Brien, both old friends from the Ruby community. I've never heard Joe teach, but I've often sat and admired Jim as he takes some complex subject and reduces it to its basics in the most incredibly entertaining ways. If you're into Rails, and want to take your testing to the next level, I believe there are still some seats available. And if you do get there, tell 'em I said hi! I'm jealous.
In many ways, testing software is like going out and getting exercise. You know you should do it, and you know it does you good, but it's also pretty easy to find an excuse to skip it (I'll make it up tomorrow).
So anything that makes testing easier is good, because it cuts down on the excuses not to do it.
One thing I've never really liked about the conventional xUnit-style testing frameworks was the setup and teardown structure. In these frameworks, a test case is a class, and setup and teardown are implemented by methods in that class. Each test is also a method, so the basic flow is
for each test method in the class
run setup
run the test method
run teardown
end
Nice and simple. Each test method got the benefit of a standard environment created by the setup method, and the teardown method got the job of tidying up after.
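That flow is small enough to sketch in a few lines of Ruby. This is a toy, not how Test::Unit is implemented: TinyTestCase and StackTest are invented names, and a real framework would typically build a fresh object for every test rather than reusing one.

```ruby
# Hypothetical miniature of the xUnit flow described above.
class TinyTestCase
  attr_reader :log

  def initialize
    @log = []
  end

  def setup;    @log << :setup;    end
  def teardown; @log << :teardown; end

  # Run every public test_* method, wrapped in setup and teardown.
  def run
    self.class.public_instance_methods(false).grep(/\Atest_/).sort.each do |test|
      setup
      begin
        send(test)
      ensure
        teardown   # tidy up even if the test raised
      end
    end
    @log
  end
end

class StackTest < TinyTestCase
  def test_push; @log << :test_push; end
  def test_pop;  @log << :test_pop;  end
end

p StackTest.new.run
```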
Except… when I'm writing tests, I typically want to set up lots of different scenarios. I'll want A and B and C, then A and B but not C, then A and not B, then A and D, and so on. I had two choices: write lots of test case classes, using subclassing to inherit common setup behavior, or write per-test method setup code (often factored out into helpers). In the end, I almost always did the latter. And that was tedious, and it made it harder to see the tests for the setup code.
I flirted with RSpec. Its spec framework seemed to have what I wanted. But I just couldn't get myself to enjoy using it. (I think it's a cat people/dog people kind of thing)
Then, a couple of weeks back, Mike Clark and Chad Fowler introduced me to shoulda. Shoulda isn't a testing framework. Instead, it extends Ruby's existing Test::Unit framework with the idea of test contexts. A context is a section of your test case where all the test methods have something in common. At its simplest, a context could be simply used as an annotation device (and, yes, this is a silly example):
context "My factorial method" do
should "return 1 when passed 0" do
assert_equal 1, fact(0)
end
should "return 1 when passed 1" do
assert_equal 1, fact(1)
end
should "return 6 when passed 3" do
assert_equal 6, fact(3)
end
end
The stuff in a context can share common setup code—just write a setup block.
class CartTest < Test::Unit::TestCase
context "An empty cart" do
setup do
@cart = orders(:wilmas_empty_cart)
end
should "have no line items" do
assert_equal 0, @cart.line_items.size
end
should "have a zero price" do
assert_equal 0, @cart.price
end
end
context "Some other context..." ...
end
end
So now, within a single test case I can set up multiple contexts, and each context can have its own environment.
But, take it back to my original problem. I often want to set up hierarchies of related environments for my tests. The shoulda code handles this wonderfully, because it lets me nest contexts. For example, I'm adding a feature to our store that gives customers some additional information if, during checkout, their credit card transaction was initially rejected because the address was wrong, and was then accepted when they fixed the address. I wanted two tests, one without the prior address error, and one with.
To set up this environment, I needed to set up a shopping cart, create a dummy response from our payment gateway, and post that response to the application. In the case of the prior address error, I also wanted to inject an entry containing that error into the transactions associated with the order prior to generating the response.
With shoulda, I simply created some nested contexts. The top level context did the shared setup, and the inner contexts then set up appropriate environments for their tests. It looked like this:
context "Checking out" do
setup do
@cart = cart_named(:freds_full_cart)
@cart.prepare_for_store_authorize!
@params = approved_authnet_response(@cart)
end
context "with no AVS errors in CC transaction history" do
setup do
post :post_from_authnet_authorize, @params
end
should_redirect_to "{:action => :receipt}"
end
context "with AVS errors in CC transaction history" do
setup do
avs_error = CcTransaction.new(:response_code => 2, :response_reason_code => 27)
@cart.cc_transactions << avs_error
post :post_from_authnet_authorize, @params
end
should_redirect_to "{:action => :explain_avs_mismatch}"
end
end
The outer setup gets run before the execution of each of the inner contexts. And the setup in the inner contexts gets run when running that context. And shoulda keeps track of it all, so I get very natural error messages if an assertion fails. For example, if the test in the second context above fails, I'd get
Checking out with AVS errors in CC transaction history should
redirect to "{:action => :explain_avs_mismatch}".
So, now, I can finally set up my hierarchies of test environments in a natural way. It isn't revolutionary. It's just one less excuse for not testing…
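To make the ordering of nested setups concrete, here is a toy version of the idea. It is a sketch under my own assumptions (TinyContext is invented, tests run as soon as they are defined, and a test passes if its block returns true); shoulda's real implementation is different.

```ruby
# Hypothetical miniature of nested contexts. Each context remembers its
# parent so that the setups can be chained from the outside in.
class TinyContext
  attr_reader :results

  def initialize(name, parent = nil, results = {}, &definition)
    @name, @parent, @results = name, parent, results
    @setups = []
    instance_eval(&definition)
  end

  def setup(&block)
    @setups << block
  end

  def context(name, &definition)
    TinyContext.new(name, self, @results, &definition)
  end

  def should(description, &test)
    env = Object.new                       # fresh environment for every test
    run_setups(env)
    @results["#{full_name} should #{description}"] = env.instance_eval(&test)
  end

  def full_name
    [@parent && @parent.full_name, @name].compact.join(" ")
  end

  protected

  def run_setups(env)
    @parent.run_setups(env) if @parent     # outer setups run first
    @setups.each { |block| env.instance_eval(&block) }
  end
end

CHECKOUT = TinyContext.new("Checking out") do
  setup { @steps = [:build_cart] }

  context "with no AVS errors" do
    setup { @steps << :post_response }
    should("run outer then inner setup") { @steps == [:build_cart, :post_response] }
  end

  context "with AVS errors" do
    setup { @steps << :inject_avs_error << :post_response }
    should("start from a fresh outer setup") do
      @steps == [:build_cart, :inject_avs_error, :post_response]
    end
  end
end

CHECKOUT.results.each { |name, passed| puts "#{name}: #{passed ? 'ok' : 'FAILED'}" }
```

The result names are built by joining the context names, which is the same mechanism that gives the readable failure messages shown above.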
Here's a top-level overview of some of the changes to date in the standard library that comes with Ruby 1.9. (These are the libraries that you get preinstalled with Ruby, but that you have to require into your code.)
* base64 library has been removed. Use Array#pack and String#unpack instead.
* Complex and Rational libraries are now built in to the interpreter. However, requiring the external libraries adds additional functionality. In the case of Rational, this functionality is minimal.
* CMath library has been added.
* Enumerator library is now built in.
* Fiber library (adds coroutine support to fibers).
* ftools (replaced by fileutils).
* Generator library has been removed (use Fibers).
* irb from inside applications.
* jcode is removed in favor of built-in encoding support.
* json library is added.
* matrix library no longer requires that you include mathn.
* mutex library is now built in.
* parsedate has been removed. The Date class handles most of its functionality.
* readbytes has been removed. IO now supports the method directly.
* require_relative added.
* Ripper.
* SecureRandom.
* shell library, as it seems more like a curiosity than something folks would use (and it's broken under 1.9).
* soap library is removed.
* sync library. It is broken under 1.9, and the monitor library seems to be cleaner.
* Win32API is now deprecated in favor of using the DL library.
It's interesting to me just how much is still changing in Ruby 1.9. But, as I use it more and more, it's also gratifying to see how some of the new idioms make coding just that little sweeter.
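A few of these changes are easy to try out. The sketch below assumes nothing beyond the interpreter and the securerandom library; the helper method names are invented for the example.

```ruby
require "securerandom"

# Base64 by way of Array#pack and String#unpack ("m" is the base64 directive).
def base64_encode(text)
  [text].pack("m")
end

def base64_decode(encoded)
  encoded.unpack("m").first
end

encoded = base64_encode("Pragmatic Programmers")
puts encoded
puts base64_decode(encoded)

# Rational and Complex are available without requiring anything.
puts Rational(1, 3) + Rational(1, 6)
puts Complex(0, 1) * Complex(0, 1)

# SecureRandom: 8 random bytes rendered as 16 hex characters.
puts SecureRandom.hex(8)
```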
I just pushed a new beta of the PickAxe Third Edition with all the library changes.
One of the fun things about updating the PickAxe is getting to come up with examples to show the various APIs in action. Here's a very silly example of using Ripper's event-based API to extract comments that are associated with basic class definitions. It clearly has holes (it doesn't handle class A::B::C, for instance) but it's fairly easy to see how to add a proper state machine and produce something that might be interesting to play with...
require 'ripper'
# This class handles parser events, extracting
# comments and attaching them to class definitions
class BabyRDoc < Ripper::Filter
def initialize(*)
super
reset_state
end
def on_default(event, token, output)
reset_state
output
end
def on_sp(token, output) output end
alias on_nil on_sp
def on_comment(comment, output)
@comment << comment.sub(/^\s*#\s*/, " ")
output
end
def on_kw(name, output)
@expecting_class_name = (name == 'class')
output
end
def on_const(name, output)
if @expecting_class_name
output << "#{name}:\n"
output << @comment
end
reset_state
output
end
private
def reset_state
@comment = ""
@expecting_class_name = false
end
end
BabyRDoc.new(File.read(__FILE__)).parse(STDOUT)
Run this with Ruby 1.9 (or, I guess, 1.8 with Ripper installed), and you'll see
BabyRDoc:
This class handles parser events, extracting
comments and attaching them to class definitions
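If all you want is the comments, without tying them to class definitions, Ripper's token list is enough. This is a small sketch using Ripper.lex; the comments_in method and the SAMPLE source are invented for the example.

```ruby
require 'ripper'

# Returns the text of every comment token in the given source string.
# Each entry from Ripper.lex starts with [position, event, token].
def comments_in(source)
  Ripper.lex(source)
        .select { |_position, event, _token| event == :on_comment }
        .map    { |_position, _event, token| token.sub(/\A#\s?/, "").chomp }
end

SAMPLE = <<~RUBY
  # Adds things
  class Adder
    def add(a, b) # inline note
      a + b
    end
  end
RUBY

p comments_in(SAMPLE)
```

Unlike the filter above, this keeps no state, so it cannot tell which comment belongs to which class.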
Ruby 1.9 allows you to specify the character encodings of I/O streams, strings, regexps, symbols, and so on. It also lets you specify the encoding of individual source files (and a complete application can be built from many files, each with different character encodings). Expect to start seeing a rash of obscure source code, at least until the initial excitement abates and cooler thinking prevails.
In the meantime, we can get away with
# encoding: utf-8
require 'mathn'
class Numeric
def ℃
(self - 32) * 5/9
end
def ℉
self * 9/5 + 32
end
end
puts 212.℃
puts 100.℉
Or, for those who'd like a peek at the start of a road that eventually leads to madness:
alias ✎ puts
✎ 212.℃
✎ 100.℉
I'm betting this post displays badly on about 50% of the machines that are used to view it. Which is reason enough to tread very lightly down this path…
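On the less frivolous side, every string now carries its encoding around with it. A short sketch (the describe helper is made up for the example) shows the same characters occupying different numbers of bytes after transcoding:

```ruby
# encoding: utf-8

# Hypothetical helper: summarize a string's encoding, length, and byte count.
def describe(string)
  "#{string.encoding.name}: #{string.length} characters, #{string.bytesize} bytes"
end

utf8   = "r\u00e9sum\u00e9"          # \u escapes always produce a UTF-8 string
latin1 = utf8.encode("ISO-8859-1")   # transcode: same characters, different bytes

puts describe(utf8)
puts describe(latin1)
```

The two é characters take two bytes each in UTF-8 and one byte each in ISO-8859-1, so the byte counts come out as 8 and 6 for the same six characters.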
rb_define_method calls (and their friends) and then reading the C implementation of the corresponding methods. Many methods are unchanged from 1.8. But, at the same time, many have changed. Often they take an additional parameter, or return an Enumerator where previously they required a block. Then there are the new classes (something like 6 of them) and new methods. (It looks like there are over 200 new built-in methods in the current Ruby 1.9).
All in all, I count something like 300 [1.9] flags in the new library reference. Some flag stuff is as trivial as a change of a default return type, while others flag entire new classes.
It's incredibly time-consuming work, and I'm constantly grumbling while doing it. But I come out the other end knowing a whole bunch about the library, and with a deeper respect for the folks who maintain it.
Just when I thought I'd finished documenting the standard library for the new PickAxe, I did one last svn up of the Ruby interpreter source and discovered that the Complex and Rational classes are now builtins—no need to require the library to get the basic functionality. The change also affects a number of other built-in classes (you can now say nil.to_c, for example). I'm not 100% sure I agree with rolling in Complex, but the addition of rational numbers is a welcome change.
Unfortunately it's back to the drawing board on my plans to release a new beta today...
Eric Hodel is giving RDoc some love. You can't imagine how happy that makes me.
When I first wrote RDoc, I was trying to find a way of solving two problems:
1. Adding comments to the largely uncommented C source of Ruby, and
2. Providing a means for library writers easily to document their creations.
I'd just finished the PickAxe, and I wanted to take the work Andy and I had done reverse engineering the Ruby API and add it back into the interpreter source code.
I set myself constraints with RDoc and ri:
* it should produce at least some documentation even on totally uncommented source files
* it should extract tacit information from the program source (for example guessing good names for block parameters by looking for yield statements inside methods)
* the markup in the source files should be unobtrusive. In the typical case, someone reading the source should not even notice that the comments follow markup conventions
* it should only use libraries that come pre-installed with Ruby
* the documentation it produced should be portable across machines and architectures
* it should allow incremental documentation. Libraries that you install over time can add methods to existing classes. As you add these libraries, the method lists in the classes you extend should grow to reflect the changes
* it should be secure. People pushed many times to add the ability to execute code during the documentation process. I didn't want to have code run on an end user's machine during a process that ostensibly was simply installing documentation (particularly as these installations often ran as root)
* it should be throw-away
The last one might be a surprise, but the real objective of RDoc wasn't the tool. The real objective was to set a standard that meant that future libraries would get documented in a consistent and usable way. And so RDoc and ri compromised like crazy. Rather than a database or some complex binary format, they used a set of directory trees in the user's filesystem to store documentation. This documentation, which is basically a set of Ruby objects, was stored using YAML, rather than marshaled objects or Ruby source. Even though YAML is slow, it is more portable than marshaled objects, and more secure than Ruby source. The parser in RDoc was a wild hack on the parser in irb. This means it performs a static, not dynamic, analysis and that it is sometimes confused by edge cases in Ruby syntax. So be it.
But the very worst part of RDoc/ri is the output side. I wanted to be able to produce output in a variety of formats: HTML, plain text, XML, chm, LaTeX, and so on. So the analysis side of RDoc produces a data structure, and passes it to the output side. Here I made a stupid design decision. What RDoc generates internally is basically nested hashes. This has a couple of major advantages. In particular, there's a kind of fractal property when traversing it: it doesn't matter how deep you are in the structure—all you pass to the next routine down is a hash. But it has a major downside—it's a bitch to work with. If I were doing it again, I'd use Structs.
Finally, there's the generation of the output itself. I needed a templating system and, for what seemed like good reasons at the time, I wrote my own. It was only a handful of lines of code initially. It's still only a couple of hundred. It did a few things well, but ultimately it was ugly as sin. But now, as Erb has become something of a standard, it is definitely the right time to replace it.
RDoc and ri are, in a way, the ultimate stone soup. The code itself is not the output of the project. The real output is the thousands of libraries that are now self-documenting. Eric and the crew are busy on the stew, replacing the stones with real and tasty ingredients. When they are finished, we'll be able to use all that library documentation in remarkable new ways. So, a big thank you to Eric and Seattle.rb, and to all the Ruby coders who've created such a great base of documentation for us all.
Here's to RDoc 2.0.
In the previous post, I developed a class called PipelineElement. This made it relatively easy to create elements that act as producers and filters in a programmatic pipeline. Using it, we could write Ruby 1.9 code like:
10.times do
puts (evens | multiples_of_three | multiples_of_seven).resume
end
The construct in the loop is a pipeline containing three chunks of code: a generator of even numbers, a filter that only passes multiples of three, and another filter that passes multiples of seven. Numbers are passed from the producer to the first filter, and then from that filter to the next, until finally popping out and being made available to puts.
However, creating these pipeline elements is still something of a pain. It turns out that we can simplify things when it comes to creating filters. In the implementation I'll show here, we'll only handle the case of simple transforming filters—filters that take an input, do something to it, and write the result to the filter chain.
Let's revisit the PipelineElement class
class PipelineElement
attr_accessor :source
def initialize
@fiber_delegate = Fiber.new do
process
end
end
def |(other)
other.source = self
other
end
def resume
@fiber_delegate.resume
end
def process
while value = input
handle_value(value)
end
end
def handle_value(value)
output(value)
end
def input
source.resume
end
def output(value)
Fiber.yield(value)
end
end
The process method is the driving loop. It reads the next input from the pipeline, then calls handle_value to deal with it. In the base class, handle_value simply echoes the input to the output. Real filters subclass PipelineElement and override this method.
Let's make a small change to the handle_value method.
def handle_value(value)
output(transform(value))
end
def transform(value)
value
end
By doing this, we've split the transformation of the incoming value into a separate method. And the work done by this method no longer uses any of the state in the PipelineElement class, so it could just as easily be supplied from outside. To allow this, we'll have the constructor take an optional block, and we'll use that block in preference to the transform method. Here's another listing, showing just the changed methods.
class PipelineElement
def initialize(&block)
@transformer = block || method(:transform)
@fiber_delegate = Fiber.new do
process
end
end
# ...
def handle_value(value)
output(@transformer.call(value))
end
end
This illustrates a cool (and underused) feature of Ruby. Method objects (created with the method(...) call) are duck-typed with proc objects: we can use .call(params) on both. This is a great way of letting users of a class change its behavior either by subclassing and overriding a method, or by simply passing in a block.
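Stripped of the pipeline machinery, the pattern looks like this. The Shouter and Whisperer classes are invented for the illustration; the only thing taken from the listing above is the block || method(...) idiom.

```ruby
# Hypothetical example: behavior can be changed either by overriding
# #render in a subclass or by passing a block to the constructor.
class Shouter
  def initialize(&block)
    @renderer = block || method(:render)
  end

  def say(text)
    @renderer.call(text)   # works the same for Method and Proc objects
  end

  def render(text)
    text.upcase
  end
end

class Whisperer < Shouter
  def render(text)
    text.downcase
  end
end

puts Shouter.new.say("Hello")                          # default method
puts Whisperer.new.say("Hello")                        # overridden method
puts Shouter.new { |text| text.reverse }.say("Hello")  # block wins
```

Because method(:render) is looked up on the actual object, the subclass override is picked up without any extra code in the constructor.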
With this change in place, we can now write transforming filters using blocks. This is a lot more compact than the previous subclassing approach.
class Evens < PipelineElement
def process
value = 0
loop do
output(value)
value += 2
end
end
end
evens = Evens.new
tripler = PipelineElement.new {|val| val * 3}
incrementer = PipelineElement.new {|val| val + 1}
5.times do
puts (evens | tripler | incrementer ).resume
end
This outputs 1, 7, 13, 19, and 25.
This approach works well if all we want is transforming filters. But what if we would also like to simplify filters that either pass or don't pass values based on some criteria? A block would seem like a great way of specifying the condition, but we've already used our one block parameter up. Subclassing to the rescue. We can create two subclasses, Transformer and Filter. One sets the @transformer instance variable to any block it is passed. The other sets @filter. Here's the relevant code:
class PipelineElement
attr_accessor :source
def initialize(&block)
@transformer ||= method(:transform)
@filter ||= method(:filter)
@fiber_delegate = Fiber.new do
process
end
end
# ...
def handle_value(value)
output(@transformer.call(value)) if @filter.call(value)
end
def transform(value)
value
end
def filter(value)
true
end
end
class Transformer < PipelineElement
def initialize(&block)
@transformer = block
super
end
end
class Filter < PipelineElement
def initialize(&block)
@filter = block
super
end
end
Thus equipped, we can write:
tripler = Transformer.new {|val| val * 3}
incrementer = Transformer.new {|val| val + 1}
multiple_of_five = Filter.new {|val| val % 5 == 0}
5.times do
puts (evens | tripler | incrementer | multiple_of_five ).resume
end
Our final hack lets us move the blocks directly into the pipeline.
Let's look at the actual pipeline code:
puts (evens | tripler | incrementer | multiple_of_five ).resume
Those pipe characters are simply calls to the | method in class PipelineElement. And methods can take block arguments, right? So what stops us writing
puts (evens | {|v| v*3} | {|v| v+1} | multiple_of_five ).resume
It turns out that Ruby stops us. The brace characters are taken to be hash parameters, not blocks, so Ruby gets its knickers in a twist. Fortunately, that's easily fixed by making the method calls explicit.
puts (evens .| {|v| v*3} .| {|v| v+1} .| multiple_of_five ).resume
Now we just need to make the | method accept an optional block. If the block is present, we use it to create a new transformer.
def |(other=nil, &block)
  other = Transformer.new(&block) if block
  other.source = self
  other
end
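If the explicit-dot form looks odd, it may help to see it in isolation. Operators are ordinary methods, so they can be called with a dot, and once they are called that way they can take a block like any other method. In this sketch the Recorder class and record_demo method are hypothetical; Recorder just notes what each | call received.

```ruby
# The explicit-dot form works on built-in operators too.
puts 1.+(2)   # same as 1 + 2

# Hypothetical stand-in that records what each | call was given.
class Recorder
  attr_reader :calls

  def initialize
    @calls = []
  end

  def |(other = nil, &block)
    @calls << (block ? [:block, block.call(10)] : [:object, other])
    self
  end
end

def record_demo
  r = Recorder.new
  r = r | :plain                # infix form: an operand, no way to pass a block
  r = r .| { |v| v * 3 }        # explicit call: no operand, just a block
  r = r.|(:both) { |v| v + 1 }  # explicit call: operand and block together
  r.calls
end

p record_demo
```

The third call shows why the block check comes first in our | method: when both an operand and a block arrive, the block wins.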
Ruby 1.9 lets you chain method calls across lines, so we can tidy up our pipeline visually.
5.times do
  puts (evens
        .| {|v| v*3}
        .| {|v| v+1}
        .| multiple_of_five
       ).resume
end
Let's finish with another trivial example. We'll create a generic producer class that takes a collection and passes it, one element at a time, into the pipeline.
class Pump < PipelineElement
  def initialize(source)
    @source = source
    super()
  end
  def process
    @source.each {|item| Fiber.yield item}
    nil
  end
end
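That trailing nil in process is doing real work. When a fiber's block finishes, its final value is what the last resume returns, and the downstream elements treat nil as the end of the stream. Here's a sketch using a bare fiber rather than the Pump class; the pump_fiber and drain methods are names invented for the example.

```ruby
# Build a fiber that hands out each item, then finishes with nil.
def pump_fiber(items)
  Fiber.new do
    items.each { |item| Fiber.yield item }
    nil  # without this, the last resume would return the items collection itself
  end
end

# Resume until the end-of-stream marker (nil) arrives.
def drain(fiber)
  results = []
  while (value = fiber.resume)
    results << value
  end
  results
end

p drain(pump_fiber(%w[a b c]))

# Resuming a fiber whose block has already finished raises FiberError.
finished = pump_fiber([])
finished.resume
begin
  finished.resume
rescue FiberError => e
  puts "second resume raised #{e.class}"
end
```

One consequence of using nil as the marker: a collection that itself contains nil or false would end the stream early.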
Now we can write a simple palindrome finder (a palindrome is a word which is the same when spelled backwards).
words = Pump.new %w{Madam, the civic radar rotator is not level.}
is_palindrome = Filter.new {|word| word == word.reverse}
pipeline = words .| {|word| word.downcase.tr("^a-z", '') } .| is_palindrome
while word = pipeline.resume
  puts word
end
This outputs: madam, civic, radar, rotator, level.
But what if we instead want to show each word in the input stream, and flag it if it is a palindrome? That's easily done, but we won't do it the easy way. Instead, let's show a more convoluted method, because it might be useful in the general case.
There's no law to say that a transformer that receives a string as input has to write a string as output. It could, if it wanted to, write an array. Or a structure. So we could write:
WordInfo = Struct.new(:original, :forwards, :backwards)
words = Pump.new %w{Madam, the civic radar rotator is not level.}
normalize = Transformer.new {|word| [word, word.downcase.tr("^a-z", '')] }
to_word_info = Transformer.new do |word, normalized|
  reversed = normalized.reverse
  WordInfo.new(word, normalized, reversed)
end
formatter = Transformer.new do |word_info|
  if word_info.forwards == word_info.backwards
    "'#{word_info.original}' is a palindrome"
  else
    "'#{word_info.original}' is not a palindrome"
  end
end
pipeline = words | normalize | to_word_info | formatter
while word = pipeline.resume
  puts word
end
This outputs
'Madam,' is a palindrome
'the' is not a palindrome
'civic' is a palindrome
'radar' is a palindrome
'rotator' is a palindrome
'is' is not a palindrome
'not' is not a palindrome
'level.' is a palindrome
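The to_word_info block above declares two parameters even though the transformer before it emits a single array. That works because a block captured with &block is a proc, and a proc with several parameters spreads a lone array argument across them. A small sketch of the behavior (call_with_pair is a made-up helper that calls its block the way a Transformer would):

```ruby
# Calls the block with one argument: a two-element array.
def call_with_pair(&block)
  block.call(["Madam,", "madam"])
end

# Two parameters: the array is unpacked across them.
unpacked = call_with_pair { |word, normalized| "#{word} -> #{normalized}" }
puts unpacked   # Madam, -> madam

# One parameter: the block sees the whole array.
whole = call_with_pair { |pair| pair.length }
puts whole      # 2

# A lambda is stricter: it checks the argument count instead of unpacking.
strict = lambda { |word, normalized| "#{word} -> #{normalized}" }
begin
  strict.call(["Madam,", "madam"])
rescue ArgumentError => e
  puts "lambda raised #{e.class}"
end
```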
Is this a great way of writing a palindrome finder? Not really. But...
What we've done here is turn the way a program works on its head. We've written chunks of isolated code, each of which either filters or transforms an input. We've then independently knitted these chunks together. That's a high degree of decoupling. We can also leave it until runtime to determine what gets put into the pipeline (and the order in which it appears), which means we can move more power into the hands of our users.
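Here's one way that runtime assembly might look. This is a sketch, not part of the original code: the classes are a condensed copy of the ones in this article (input, output, and handle_value are folded into process), while the STAGES registry and the build_pipeline and drain methods are hypothetical names added for the example.

```ruby
class PipelineElement
  attr_accessor :source

  def initialize(&block)
    @transformer ||= method(:transform)
    @filter      ||= method(:filter)
    @fiber_delegate = Fiber.new { process }
  end

  def |(other = nil, &block)
    other = Transformer.new(&block) if block
    other.source = self
    other
  end

  def resume
    @fiber_delegate.resume
  end

  def process
    while (value = source.resume)
      Fiber.yield(@transformer.call(value)) if @filter.call(value)
    end
  end

  def transform(value)
    value
  end

  def filter(value)
    true
  end
end

class Transformer < PipelineElement
  def initialize(&block)
    @transformer = block
    super
  end
end

class Filter < PipelineElement
  def initialize(&block)
    @filter = block
    super
  end
end

class Pump < PipelineElement
  def initialize(items)
    @items = items
    super()
  end

  def process
    @items.each { |item| Fiber.yield item }
    nil
  end
end

# Hypothetical registry: stage names a user might choose, each mapped to a
# factory so every pipeline gets fresh elements (and fresh fibers).
STAGES = {
  "triple"    => -> { Transformer.new { |v| v * 3 } },
  "increment" => -> { Transformer.new { |v| v + 1 } },
  "fives"     => -> { Filter.new { |v| v % 5 == 0 } }
}

# Fold the chosen stages onto a Pump using the | method.
def build_pipeline(items, stage_names)
  stage_names
    .map { |name| STAGES.fetch(name).call }
    .reduce(Pump.new(items)) { |pipeline, stage| pipeline | stage }
end

# Pull values until the pipeline signals end of stream with nil.
def drain(pipeline)
  results = []
  while (value = pipeline.resume)
    results << value
  end
  results
end

p drain(build_pipeline(0..20, %w[triple increment fives]))  # [10, 25, 40, 55]
p drain(build_pipeline(0..20, %w[fives triple]))            # [0, 15, 30, 45, 60]
```

The list of stage names could just as easily come from a config file or the command line, which is the kind of power-to-the-user move described above.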
Could we have done all this without Fibers? Of course. Could we do it without Ruby 1.9? Absolutely. But sometimes factors come together which lead us to experiment with new ways of thinking about our code.
This pipeline stuff is not revolutionary, and it isn't generally applicable. But it's fun to play with. And, for me, that's the main thing.
All this content is stuff that I decided not to include in the third edition of the PickAxe. It didn't work in the section on fibers, because it uses programming techniques not yet covered. It didn't work later because, as an example of various programming techniques, it is just too long.
Users of the command line are familiar with the idea of building pipelines: a chain of simple commands strung together so that the output of one becomes the input of the next. Using pipelines and a basic set of primitives, shell users can accomplish some sophisticated tasks. Here's a basic Unix shell pipeline that reports the ten longest .tip files in the current directory, based on the number of lines in each file:
wc -l *.tip | grep \.tip | sort -n | tail -10
Let's see how to add something similar to Ruby. By the end of this set of two articles, we'll be able to write things like
puts (even_numbers | tripler | incrementer | multiple_of_five ).resume
and a palindrome finder using blocks:
words = Pump.new %w{Madam, the civic radar rotator is not level.}
is_palindrome = Filter.new {|word| word == word.reverse}
pipeline = words .| {|word| word.downcase.tr("^a-z", '') } .| is_palindrome
while word = pipeline.resume
  puts word
end
Great code? Nope. But getting there is fun. And, who knows? The techniques might well be useful in your next project.
Ruby 1.9 adds support for Fibers. At their most basic, they let you create simple generators (much as you could do previously with blocks). Here's a trivial example: a fiber that generates successive Fibonacci numbers:
fib = Fiber.new do
  f1 = f2 = 1
  loop do
    Fiber.yield f1
    f1, f2 = f2, f1 + f2
  end
end
10.times { puts fib.resume }
A fiber is somewhat like a thread, except you have control over when it gets scheduled. Initially, a fiber is suspended. When you resume it, it runs the block until the block finishes, or it hits a Fiber.yield. This is similar to a regular block yield: it suspends the fiber and passes control back to the resume. Any value passed to Fiber.yield becomes the value returned by resume.
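To make that ordering concrete, here's a minimal sketch that records which side is running at each step. The trace_fiber method and its log array exist only for this illustration.

```ruby
# Records the order in which the fiber and its caller run.
def trace_fiber
  log = []
  fiber = Fiber.new do
    log << "fiber: started"
    Fiber.yield :first   # suspend; :first becomes the value of the first resume
    log << "fiber: resumed again"
    :done                # the block's final value is returned by the last resume
  end

  log << "main: fiber created, nothing has run yet"
  log << "main: got #{fiber.resume.inspect}"
  log << "main: got #{fiber.resume.inspect}"
  log
end

puts trace_fiber
```

The log shows the fiber's first line appearing only after the first resume, and the block's last expression coming back as the value of the second.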
By default, a fiber can only yield back to the code that resumed it. However, if you require the "fiber" library, Fibers get extended with a transfer method that allows one fiber to transfer control to another. Fibers then become fully fledged coroutines. However, we won't be needing all that power today.
Instead, let's get back to the idea of creating pipelines of functionality in code, much as you can create pipelines in the shell.
As a starting point, let's write two fibers. One's a generator: it creates a list of even numbers. The second is a consumer. All it does is accept values from the generator and print them. We'll make the consumer stop after printing 10 numbers.
evens = Fiber.new do
  value = 0
  loop do
    Fiber.yield value
    value += 2
  end
end
consumer = Fiber.new do
  10.times do
    next_value = evens.resume
    puts next_value
  end
end
consumer.resume
Note how we had to use resume to kick off the consumer. Technically, the consumer doesn't have to be a Fiber, but, as we'll see in a minute, making it one gives us some flexibility.
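To see that the consumer really doesn't need to be a fiber, here's a sketch where the consuming side is an ordinary method that calls resume in a loop. The names even_numbers and take_from are made up for the example; wrapping the generator in a method simply gives each caller a fresh fiber.

```ruby
# Hypothetical wrapper: each call returns a fresh even-number generator.
def even_numbers
  Fiber.new do
    value = 0
    loop do
      Fiber.yield value
      value += 2
    end
  end
end

# The consumer as a plain method: no Fiber.new, just repeated resume calls.
def take_from(source, count)
  Array.new(count) { source.resume }
end

puts take_from(even_numbers, 10)
```

What the plain version can't do is be resumed by something further down a chain, which is presumably the flexibility the fiber version buys us.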
As a next step, notice how we've created some coupling in this code. Our consumer fiber has the name of the evens generator coded into it. Let's wrap both fibers in a method, and pass the name of the generator into the consumer method.
def evens
  Fiber.new do
    value = 0
    loop do
      Fiber.yield value
      value += 2
    end
  end
end
def consumer(source)
  Fiber.new do
    10.times do
      next_value = source.resume
      puts next_value
    end
  end
end
consumer(evens).resume
OK. Let's add one more fiber to the weave. We'll create a filter that only passes on numbers that are multiples of three. Again, we'll wrap it in a method.
def evens
  Fiber.new do
    value = 0
    loop do
      Fiber.yield value
      value += 2
    end
  end
end
def multiples_of_three(source)
  Fiber.new do
    loop do
      next_value = source.resume
      Fiber.yield next_value if next_value % 3 == 0
    end
  end
end
def consumer(source)
  Fiber.new do
    10.times do
      next_value = source.resume
      puts next_value
    end
  end
end
consumer(multiples_of_three(evens)).resume
Running this, we get the output
0
6
12
18
. . .
This is getting cool. We write little chunks of code, and then combine them to get work done. Just like a pipeline. Except...
We can do better. First, the composition looks backwards. Because we're passing methods to methods, we write
consumer(multiples_of_three(evens))
Instead, we'd like to write
evens | multiples_of_three | consumer
Also, there's a fair amount of duplication in this code. Each of our little pipeline methods has the same overall structure, and each is coupled to the implementation of fibers. Let's see if we can fix this.
As is usual when we're refactoring towards a solution, we're about to get really messy. Don't worry, though. It will all wash off, and we'll end up with something a lot neater.
First, let's create a class that represents something that can appear in our pipeline. At its heart is the process method. This reads something from the input side of the pipe, then "handles" that value. The default handling is to write that value to the output side of the pipeline, passing it on to the next element in the chain.
class PipelineElement
  attr_accessor :source
  def initialize
    @fiber_delegate = Fiber.new do
      process
    end
  end
  def resume
    @fiber_delegate.resume
  end
  def process
    while value = input
      handle_value(value)
    end
  end
  def handle_value(value)
    output(value)
  end
  def input
    source.resume
  end
  def output(value)
    Fiber.yield(value)
  end
end
When I first wrote this, I was tempted to make PipelineElement a subclass of Fiber, but that leads to coupling. In the end, the pipeline elements delegate to a separate Fiber object.
The first element of the pipeline doesn't receive any input from prior elements (because there are no prior elements), so we need to override its process method.
class Evens < PipelineElement
  def process
    value = 0
    loop do
      output(value)
      value += 2
    end
  end
end
evens = Evens.new
Just to make things more interesting, we'll create a generic MultiplesOf filter, so we can filter based on any number, and not just 3:
class MultiplesOf < PipelineElement
  def initialize(factor)
    @factor = factor
    super()
  end
  def handle_value(value)
    output(value) if value % @factor == 0
  end
end
multiples_of_three = MultiplesOf.new(3)
multiples_of_seven = MultiplesOf.new(7)
Then we just knit it all together into a pipeline:
multiples_of_three.source = evens
multiples_of_seven.source = multiples_of_three
10.times do
  puts multiples_of_seven.resume
end
We get 0, 42, 84, 126, 168, and so on as output. (Any output stream that contains 42 must be correct, so no need for any unit tests here.)
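For the skeptical, here's a quick brute-force cross-check that involves no fibers at all: an even number divisible by both 3 and 7 is a multiple of their least common multiple with 2. The method name is made up for the example.

```ruby
# Even numbers up to the limit that are also multiples of both 3 and 7.
def even_multiples_of_3_and_7(limit)
  (0..limit).step(2).select { |n| n % 3 == 0 && n % 7 == 0 }
end

p even_multiples_of_3_and_7(168)  # [0, 42, 84, 126, 168]
p 2.lcm(3).lcm(7)                 # 42
```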
But we're still a little way from our ideal of being able to pipe these puppies together. It's a good thing that Ruby lets us override the "|" operator. Up in class PipelineElement, define a new method:
def |(other)
  other.source = self
  other
end
This allows us to write:
10.times do
  puts (evens | multiples_of_three | multiples_of_seven).resume
end
or even:
pipeline = evens | multiples_of_three | multiples_of_seven
10.times do
  puts pipeline.resume
end
Cool, or what?
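Why does the chaining work? The | operator is left-associative and our method returns its right-hand operand, so a | b | c evaluates as (a | b) | c, and the value of the whole expression is c, with the sources wired up behind it. Here's a sketch of just the wiring, with no fibers involved. The Stage class and upstream_names method are stand-ins invented for the illustration.

```ruby
# Stand-in for PipelineElement with only the wiring.
class Stage
  attr_accessor :source
  attr_reader :name

  def initialize(name)
    @name = name
  end

  # Hook the left-hand stage up as the source of the right-hand one,
  # and return the right-hand one so the chain can continue.
  def |(other)
    other.source = self
    other
  end
end

# Walk back from the last stage to the first, collecting names.
def upstream_names(stage)
  names = []
  while stage
    names.unshift(stage.name)
    stage = stage.source
  end
  names
end

a, b, c = Stage.new("a"), Stage.new("b"), Stage.new("c")
pipeline = a | b | c

puts pipeline.equal?(c)                    # true: the expression's value is the last stage
puts upstream_names(pipeline).join(" | ")  # a | b | c
```

That's why calling resume on the result of the whole expression pulls values through every element: we're resuming the last one, and each asks its source in turn.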
The next post will take these basic ideas and tart them up a bit, allowing us to use blocks directly in pipelines. We'll also reveal why the PipelineElement class I just wrote is somewhat more complicated than might seem necessary. In the meantime, here's the full source of the code so far.
class PipelineElement
  attr_accessor :source
  def initialize
    @fiber_delegate = Fiber.new do
      process
    end
  end
  def |(other)
    other.source = self
    other
  end
  def resume
    @fiber_delegate.resume
  end
  def process
    while value = input
      handle_value(value)
    end
  end
  def handle_value(value)
    output(value)
  end
  def input
    source.resume
  end
  def output(value)
    Fiber.yield(value)
  end
end
##
# The classes below are the elements in our pipeline
#
class Evens < PipelineElement
  def process
    value = 0
    loop do
      output(value)
      value += 2
    end
  end
end
class MultiplesOf < PipelineElement
  def initialize(factor)
    @factor = factor
    super()
  end
  def handle_value(value)
    output(value) if value % @factor == 0
  end
end
evens = Evens.new
multiples_of_three = MultiplesOf.new(3)
multiples_of_seven = MultiplesOf.new(7)
pipeline = evens | multiples_of_three | multiples_of_seven
10.times do
  puts pipeline.resume
end