March 14, 2008

Source Code for that Testing Library

Ring the bells that still can ring
Forget your perfect offering
There is a crack in everything
That's how the light gets in.
  —Leonard Cohen

Yesterday, I posted on a trivial little testing library I hacked together. I've put the source online. Get the source through Rob's git repository (see below).

In the meantime, I discovered a problem with the idea of intercepting comparison operators, the technique used by the expect method. Ruby doesn't really have != and !~ methods. Instead, the parser maps (a != b) into !(a == b). This means that the ComparisonProxy cannot intercept calls to either of these. This is a problem because

   expect(1) != 1
actually passes, because it becomes !(expect(1) == 1), and the expect method is happy with that.
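
A minimal sketch of the effect, using a hypothetical RecordingProxy class (a stand-in, not the library's ComparisonProxy). It defines only ==, yet a != comparison still lands in that method, and the caller just sees the negated result.

```ruby
# Hypothetical stand-in for illustration; not the library's ComparisonProxy.
class RecordingProxy
  attr_reader :calls

  def initialize(value)
    @value = value
    @calls = []
  end

  # The only comparison defined here; a != b ends up in this method too.
  def ==(other)
    @calls << [:==, other]
    @value == other
  end
end

proxy = RecordingProxy.new(1)
negated = (proxy != 1)     # evaluated as the negation of proxy == 1

puts proxy.calls.inspect   # prints [[:==, 1]]
puts negated               # prints false
```

The proxy has no way of telling, from inside ==, that the caller wrote != rather than ==.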

I'm betting there's a way around this...

Update: 14:26 CDT.

  • Rob Sanheim has set up a Git repository for the code. He says

    I've put this up on github to watch what forks or releases develop around it.

    
    git clone git://github.com/rsanheim/prag_dave_testing.git
    
  • Michael Neumann suggested a way around the negated == and =~ tests using source inspection:

    
     class ComparatorProxy
       def ==(obj)
         # try to get the source code position of the call
         # and see if it's a != or a ==
       end
     end
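
One way that suggestion might be fleshed out, as a rough sketch only. The names operator_in and SourceInspectingProxy are made up for this example, and the check is deliberately naive: it just looks for the text != on the calling line, so a comment or string containing != would fool it.

```ruby
# Hypothetical helper: guess which operator the caller wrote by looking
# at the text of the calling line. Falls back to == when there is no text.
def operator_in(source_line)
  source_line.to_s.include?("!=") ? "!=" : "=="
end

class SourceInspectingProxy
  attr_reader :seen_operator

  def initialize(value)
    @value = value
  end

  def ==(other)
    text = begin
      where = caller_locations(1, 1).first
      File.readlines(where.path)[where.lineno - 1]
    rescue StandardError
      nil # source is not a readable file (for example, code run with ruby -e)
    end
    @seen_operator = operator_in(text)
    @value == other
  end
end

puts operator_in("expect(1) != 1")   # prints !=
puts operator_in("expect(1) == 1")   # prints ==
```

Once the proxy knows which operator was really written, it can decide for itself whether the expectation passed, instead of trusting the value that Ruby negates afterwards.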
    

March 13, 2008

Playing with a Testing Library

  • I'd like to be able to express my unit tests fairly naturally, using the conditional operators built into the language. So, for example, I'd want to write:

    
          expect(factorial(5)) == 120
          expect(factorial(10)) > 10000
        
  • I'd like the error messages to show both the code that caused the error and the values that caused the error. So, for example, I'd want the following (incorrect) test

    
          expect(factorial(6)) == 600
        

    to output something like

          /Users/dave/tmp/tmc/blog_tests.rb:16
              the code was: expect(factorial(6)) == 600,
              but 720 != 600
        

    and

    
          expect(1) > 2
        

    should say

    
          /Users/dave/Play/tmc/blog_tests.rb:11
          	the code was: expect(1) > 2,
          	but 1 <= 2
        

    (Note how the expression showing the actual values negates the comparison operator to make it easier to read.)

  • I annotate my code with comments, so I'd like to be able to annotate my tests the same way.
    
        expect(factorial(6)) == 600  # Deliberate bad test
      

    should produce something like

        /Users/dave/tmp/tmc/blog_tests.rb:17
           Deliberate bad test
           the code was: expect(factorial(6)) == 600,
           but 720 != 600  
      

    Sometimes I write longer comments.

    
       # The factorial of 6 is a special case,
       # because of the labor laws in Las Vegas
       expect(factorial(6)) == 600
       

    So the resulting errors are longer, too.

               
       /Users/dave/Play/tmc/blog_tests.rb:21
          The factorial of 6 is a special case, because of the labor laws in Las Vegas
          the code was: expect(factorial(6)) == 600,
          but 720 != 600
       
  • I like to be able to group my tests.

          testing("positive factorials") do
            expect(factorial(1)) == 1
            expect(factorial(2)) == 2
            expect(factorial(5)) == 120
          end
    
          testing("factorial of zero") do
            expect(factorial(0)) == 1
          end
    
          testing("negative factorials") do
            expect(factorial(-1)) == 1
            expect(factorial(-5)) == 1
          end
        
  • I like the description of the group to appear along with any individual test annotation if a test fails.

    
          testing("factorial of zero") do
            # this test is deliberately wrong
            expect(factorial(0)) == 0
          end
        
    will produce
          /Users/dave/Play/tmc/blog_tests.rb:31--while testing factorial of zero
              this test is deliberately wrong
              the code was: expect(factorial(0)) == 0,
              but 1 != 0
        
  • I like to have the flexibility to set up the environment for a group of tests. I also like to have the idea of a global environment which doesn't get messed up by the running of tests (so that subsequent tests can run in that environment). I don't see why I should have to package things into methods with magic names to have that happen. Instead, why not just have transactional instance variables? That way, I can use regular methods to set up the state for a test.

          @order = Order.new("Dave Thomas", "Ruby Book")
    
          testing("normal case") do
            expect(@order.valid?) == true
          end
    
          testing("missing name in order") do
            @order.name = nil
            expect(@order.valid?) == false
            expect(@order.error)  == "missing name"
          end
    
          # Check that order is reset to valid state here
          expect(@order.valid?) == true
        

    So, in the preceding case, the second testing block changed the @order object. However, once the block terminated, the object was restored to its initial (valid) state.
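
A minimal sketch of how transactional instance variables could work, assuming a snapshot-and-restore approach built on Marshal. This is an illustration, not the code behind the output above; the Order struct and the deep_copy helper are invented for the example.

```ruby
# Hypothetical helper: deep-copy a value so in-place changes can be undone.
def deep_copy(value)
  Marshal.load(Marshal.dump(value))
rescue TypeError
  value # not marshalable (procs, IO objects): keep the original reference
end

# Snapshot the instance variables of the current object, run the block,
# then put the snapshot back. The description is unused in this sketch.
def testing(description)
  snapshot = instance_variables.map do |name|
    [name, deep_copy(instance_variable_get(name))]
  end
  begin
    yield
  ensure
    snapshot.each { |name, value| instance_variable_set(name, value) }
  end
end

# Made-up Order class, just enough to exercise the idea.
Order = Struct.new(:name, :item) do
  def valid?
    !name.nil?
  end
end

@order = Order.new("Dave Thomas", "Ruby Book")

testing("missing name in order") do
  @order.name = nil
  puts @order.valid?   # prints false
end

puts @order.valid?     # prints true
```

Copying everything on entry is crude and would be slow for large object graphs, but it keeps the test code free of setup methods with magic names.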

So, while waiting for the last day of the Rails Studio to start, I hacked together a quick proof of concept. It's less than 100 lines of code. All the output shown here was generated by it. Is this worth developing into something usable?
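
To make the wish list concrete, here is a minimal sketch of the expect half. It is not the proof of concept itself: this ComparisonProxy is a guess at the shape, the FAILURES array is invented for the example, and the file name, line number, source text, and comment annotations are all left out.

```ruby
# Hypothetical sketch; not the proof of concept described in the post.
class ComparisonProxy
  # Each operator, paired with the one to show when the test fails.
  NEGATED = {
    "==" => "!=", ">" => "<=", "<" => ">=", ">=" => "<", "<=" => ">"
  }

  def initialize(actual, failures)
    @actual = actual
    @failures = failures
  end

  NEGATED.each do |operator, negated|
    define_method(operator) do |expected|
      passed = @actual.send(operator, expected)
      unless passed
        @failures << "but #{@actual.inspect} #{negated} #{expected.inspect}"
      end
      passed
    end
  end
end

FAILURES = []

def expect(actual)
  ComparisonProxy.new(actual, FAILURES)
end

def factorial(n)
  n <= 1 ? 1 : n * factorial(n - 1)
end

expect(factorial(5)) == 120   # passes, records nothing
expect(factorial(6)) == 600   # records "but 720 != 600"
expect(1) > 2                 # records "but 1 <= 2"

puts FAILURES
```

The table of negated operators is what lets the failure message read naturally.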

March 11, 2008

The 'Language' in Domain-Specific Language Doesn't Mean English (or French, or Japanese, or ...)

I'm a really big fan of Domain-Specific Languages. Andy and I plugged them back in '98 when writing The Pragmatic Programmer. I've written my share of them over the years, and I've used even more. Which is why it is distressing to see that a whole group of developers are writing DSLs (and discussing DSLs) without seeming to get one of the fundamental principles behind good DSL design.

Domain experts don't speak a natural language

Let's say that another way. Whenever domain experts communicate, they may seem to be speaking in English (or French, or whatever). But they are not. They are speaking jargon, a specialized language that they've invented as a shorthand for communicating effectively with their peers. Jargon may use English words, but these words have been warped into having very different meanings—meanings that you only learn through experience in the field.

Let's look at some successful domain-specific languages before turning our attention to the way that some DSLs are trying just a little too hard.

Success Story 1: Dependency Management in Make

The Make utility has been a mainstay of Unix software development for over 30 years. You can complain about some strange syntax rules (some of which involve the invisible difference between tabs and spaces), but it would be hard to argue that Make hasn't had a major impact in the open source world.

At its heart, Make addresses the building of systems from components in the presence of dependencies. Make lets me express the dependencies between header files, source files, object files, libraries, and executable images. It also lets me specify the commands to execute to resolve those dependencies when certain items are missing. For example, I could say


my_prog.o: my_prog.c common.h

extras.o:   extras.c common.h

my_prog:    my_prog.o extras.o
            cc -o my_prog -lc my_prog.o extras.o

This example of the Make DSL says that my_prog.o depends on my_prog.c and common.h, and that extras.o depends on extras.c and also depends on common.h. The final program, my_prog, depends on the two object files. To build the program, we have to execute the cc command on the line that follows the dependency line. No build command is needed for the object files: in this case Make knows what to do implicitly.

People who build software from source are domain experts in the area of dependencies and build commands. They need concise ways of expressing that expertise, of saying things like "if I ask you to ensure my program is up-to-date, and the common header file has been changed, then I want you to rebuild all the dependent object files before then rebuilding the main program". Make is by no means perfect, but its longevity shows that it goes a long way as a DSL to meeting its experts' needs.

Success Story 2: Active Record Declarations

Love it or loathe it, you have to admit that Rails has changed the game. And one reason is its extensive use of DSLs. For example, when you are writing model classes, you are claiming to be an expert on your application's domain, and in the relationships between objects in that domain. And Rails has a nifty DSL to let you express those relationships.


class Post < ActiveRecord::Base
  has_many :comments
  ...
end

class Comment < ActiveRecord::Base
  belongs_to :post
  ...
end

The two lines containing has_many and belongs_to are part of a data modeling DSL provided by Rails. Behind the scenes, this simple-looking code creates a whole heap of supporting infrastructure in the application, infrastructure that allows the programmer to easily navigate and manage the relationships between (in this case) posts and comments.
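
To give a feel for how a declaration like this can generate infrastructure, here is a toy version in plain Ruby. It is emphatically not how Rails implements has_many: the Associations module is invented for the example and the collection is just an in-memory array.

```ruby
# Hypothetical, in-memory stand-in for a has_many style declaration.
module Associations
  def has_many(name)
    # Generate a reader that lazily creates an empty collection
    define_method(name) do
      @associations ||= {}
      @associations[name] ||= []
    end
  end
end

class Post
  extend Associations
  has_many :comments   # reads like a declaration, but it's a method call
end

post = Post.new
post.comments << "First!"
puts post.comments.size   # prints 1
```

The declaration is an ordinary class-level method call that runs while the class body is being defined.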

At first blush, this might seem like an English-language DSL. But, despite appearances, has_many and belongs_to are not English phrases. They are jargon from the world of modeling. They have a specific meaning in that context, a meaning that is clear to developers using Rails (because those developers take on the role of domain modeler when they start writing the application).

Success Story 3: Groovy Builders

The Groovy language has a wonderful way of expressing data in code. The builder concept lets you construct a set of nodes as a side effect of code execution. You can then express those nodes as (for example) XML, or JSON, or Swing user interfaces. Here's a trivial example that constructs some nodes describing a person which we can then output as XML.


  result = new StringWriter()
  xml = new groovy.xml.MarkupBuilder(result)
  xml.person(category: 'employee') {
    name('dave')
    likes('programming')
  }
  println result

This would generate something like


  <person category="employee">
    <name>dave</name>
    <likes>programming</likes>
  </person>

(Jim Weirich took this idea and created the wonderful Ruby Builder library, the basis of Rails' XML-generating templates.)

Again, we have a DSL aimed squarely at someone who knows what they are doing. If you're creating XML, then you know that the elements can be nested, that they can have textual content, and that elements have optional attributes. The Builder DSL takes care of all the details for you (the angle brackets, any quoting, and so on) but you still have to know the underlying concepts. Again, the language of the DSL is the language of the domain.
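
For the curious, the builder trick can be sketched in a few lines of Ruby using method_missing. TinyBuilder is a made-up toy, not the Builder library: it does no escaping or quoting of content, and every unknown method name is treated as a tag.

```ruby
# Hypothetical toy builder; the real Builder library does far more.
class TinyBuilder
  attr_reader :target

  def initialize
    @target = "".dup   # unfrozen string that collects the output
    @level = 0
  end

  # Any unknown method becomes an element with that name.
  def method_missing(tag, content = nil, **attrs)
    attr_text = attrs.map { |key, value| %( #{key}="#{value}") }.join
    indent = "  " * @level
    if block_given?
      @target << "#{indent}<#{tag}#{attr_text}>\n"
      @level += 1
      yield                     # nested calls add the child elements
      @level -= 1
      @target << "#{indent}</#{tag}>\n"
    else
      @target << "#{indent}<#{tag}#{attr_text}>#{content}</#{tag}>\n"
    end
    self
  end
end

xml = TinyBuilder.new
xml.person(category: "employee") do
  xml.name("dave")
  xml.likes("programming")
end
puts xml.target
```

Nesting in the output simply follows the nesting of the blocks.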

Seduced by Language

Over the years, people have looked at DSLs and wondered just how far they can be taken. Would it be possible to create a DSL that could be used by someone who wasn't a domain expert? So far, the answer is “no.” The problem is that abstractions leak: to do things in a domain, you need to know the domain. The folks who brought us Star Trek TNG pretended otherwise. Jean-Luc Picard used an English language DSL to talk to his food dispenser. It worked every time. But, in the real world, you know that the first time someone said "Earl Grey, hot" to this magic box, they'd be surprised when a naked English peer covered in baby oil popped out.

The reality is that languages such as English, French, and so on, are imprecise. That ambiguity makes them powerful. Because of this, whenever we try to create a DSL that looks like a natural language, we fall short. Take AppleScript as an example. On the face of it, it looks nice and expressive—we're writing something that looks very natural. Here's an example from the Apple example scripts.


  set this_file to choose file without invisibles
  try
  	tell application "Image Events"
  		launch
  		set this_image to open this_file
  		scale this_image to size 640
  		save this_image with icon
  		close this_image
  	end tell
  on error error_message
  	display dialog error_message
  end try

Kind of makes sense, doesn't it? I thought so too. So, for years, I've been trying to get into AppleScript. I keep trying, and I keep failing. Because the language is deceptive. They try to make it English-like. But it isn't English. It's a programming language. And it has rules and a syntax that are very un-English-like. There's a major cognitive dissonance: I have to take ideas expressed in a natural language (the problem), then map them into an artificial language (the AppleScript programming model), but then write something that is a kind of faux natural language. (Piers Cawley calls these kinds of DSLs domain-specific pidgin, but my understanding is that pidgins are full languages, and our code hasn't got that far.)

What's the point? When you're writing logic like this, with exception handling, command sequencing, and (in more advanced examples) conditionals and loops, then what you're doing is programming. The domain is the world of code. If you're not up to programming, then you shouldn't be writing AppleScript. And if you are up to programming, then AppleScript just gets in your way.

But this isn't a discussion of AppleScript. That's just an example of the kind of trouble you get into when you forget what the domain is and try to create natural language DSLs.

Testing Times

Here's a little code from a test written using the test/spec framework.


  specify "should be a string" do
    @result.should.be.a.kind_of String
  end
  specify "value should be 'cat'" do
    @result.should.equal "cat"
  end

It's an elegant example of what can be done with Ruby. And, don't get me wrong. I'm not picking on Chris here. I think he's created a clever framework, and one that is likely to become quite popular.

But let's look at it from a DSL point of view. What is the domain? I'm thinking it is the specification of the correct behavior of programs. And who are the domain experts? That's a trickier question to answer. In an ideal world, it would be the business users. But, the reality is that if the business users had the time, patience, and inclination to write things at this level, they wouldn't need programmers. Don't kid yourselves: writing these specs is programming, and the domain experts are programmers.

As a programmer, a couple of things leap out at me from these tests. First, there's the duplication. The specify lines are a form of grouping, and each contains a string documenting what that group tests. But the whole point of the DSL part of the exercise is to make that blindingly obvious anyway. Now the BDD folks say that you write the specifications first, without any content, and then gradually add the tests in the blocks as you add supporting application code. I'd suggest that you might want to look at ways of removing the eventual duplication by transforming the specification into the test.

But for me the really worrying thing is the syntax. @result.should.be.a.kind_of String. It reads like English. But it isn't. The words are separated by periods, except the last two, where we have a space. As a programmer, I know why. But as a user, I worry about it. In the first example, we write @result.should.be.a.kind_of. Why not kind.of? If I want to test that floats are roughly equal, I'd have said @result.should.be.close value. Why not close.to value?

Trivial details, but it means that I can't just write tests using my knowledge of English: I have to look things up. And if I have to do that, why not just use a language/API that is closer to the domain of specifications and testing? Chris's work is great, but it illustrates how a DSL that pretends to be English can never really get there. The domain of his language is software development--it would be perfectly OK to produce a DSL that makes sense in that domain.

RSpec is another behavior-driven testing framework. Here's part of a specification (or should it be test?).


  describe "(empty)" do

    it { @stack.should be_empty }

    it_should_behave_like "non-full Stack"

    it "should complain when sent #peek" do
      lambda { @stack.peek }.should raise_error(StackUnderflowError)
    end

    it "should complain when sent #pop" do
      lambda { @stack.pop }.should raise_error(StackUnderflowError)
    end

  end

Another nice, readable piece of code, full of clever Ruby tricks. But, again, the attempt to create a natural language feel in the DSL leads to all sorts of leaks in the abstraction. Look at the use of should. We have should be_empty. Here, the actual assertion is (somewhat surprisingly) "should be_". That's right: the be_ part is really part of the should, indicating that what follows the underscore is a predicate method to be called (after adding a question mark, so we'd call @stack.empty? in this case). Then we have another way of using should in the phrase it_should_behave_like, all one word. Then there's a third way of using should when we reach should raise_error. And, of course, all these uses of should differ from the use in test/spec, even though both strive for an English-like interface. The same kinds of dissonance occur with the use of it in the first three lines (it {...} vs. it_should_... vs. it "...").
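
The be_ convention can be sketched with method_missing. This is only an illustration of the naming trick, not RSpec's implementation; the BePredicates module and the Spec class with its check method are invented for the example.

```ruby
# Hypothetical sketch of the be_ naming trick; not RSpec's code.
module BePredicates
  # be_empty becomes a matcher that calls empty? on the value under test.
  def method_missing(name, *args, &block)
    if name.to_s =~ /\Abe_(\w+)\z/
      predicate = "#{$1}?"
      lambda { |actual| actual.public_send(predicate, *args) }
    else
      super
    end
  end

  def respond_to_missing?(name, include_private = false)
    name.to_s.start_with?("be_") || super
  end
end

class Spec
  include BePredicates

  def check(actual, matcher)
    matcher.call(actual)
  end
end

spec = Spec.new
puts spec.check([], spec.be_empty)    # prints true
puts spec.check([1], spec.be_empty)   # prints false
```

Nothing named be_empty exists anywhere. The method name is parsed at run time, which is exactly the kind of thing you have to know as a programmer rather than as an English speaker.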

It's a Domain Language

Just to reiterate, I'm not bashing either of these testing frameworks—they are popular and I'm in favor of anything that brings folks to the practices of testing.

However, I am concerned that the popularity of these frameworks, and other similar uses of English-as-a-DSL, may lead developers astray. Martin Fowler writes about fluent interfaces. I think his work might have been misunderstood—the fluency here is programmer fluency, not English fluency. It's writing succinct, expressive code (and, in particular, using method chaining where appropriate).

The language in a DSL should be the language of the domain, not the natural language of the developer. Resist the temptation to use cute tricks to make the DSL more like a natural, human language. By doing so you might add to its readability, but I can guarantee that you'll be taking away from its writability, and you'll be adding uncertainty and ambiguity (the strengths of natural languages). The second you find yourself writing


  def a
    self
  end

so that you can use "a" as a connector in

  
    add.a.diary.entry.for("Lunch").for(August.10.at(3.pm))

you know you've crossed a line. This is no longer a DSL. It's broken English.

March 06, 2008

They Shoot Presenters, Don't They?


Remember the Sydney Pollack film They Shoot Horses, Don't They? We got to watch a set of desperate dancers turn themselves into near corpses as they compete in a grueling dance marathon. Who can dance continuously the longest?

Well, it looks like Mike and Nicole's copy must have arrived from Netflix. It's clearly inspired their latest offering: back-to-back Advanced Rails and Advanced Ruby Studios in Reston this May.

Actually, it's a really clever idea. Many people want to do both, and by piggybacking them like this, these folks will be able to save a couple of days travel time. (Of course, you don't have to do both--they're open for individual enrollment too.)

And, here's a hint. Once you've attended any Studio, you're an alumnus. And alumni get a discount. So, if you did happen to want to attend both, it seems to me that for the second, you're an alum. So you should probably choose the alumni rate for that second Studio...

It should be fun betting on which of the presenters drops first.

January 28, 2008

QCon Interview Online

Jim Coplien chatted with me on video during last year's QCon in London. They've just put the result online.

January 14, 2008

The Canary Benefit

I haven't done production work on a Windows machine for a long, long time. My desktops have been Linux since 0.99pl11 (was that '93?). My laptops were Windows until Linux started working on them, and then I switched.

It was good. I put up with the hassles: the upgrades, the incompatibilities, the laptops that would talk to some video projectors but not others. I got behind in my patching, and had a server root-kitted once (well before we had an online store, in case you're concerned).

As the business grew, I found myself spending more and more time admining boxes. So, somewhat late, I made the switch maybe 3 or 4 years ago, first with an Apple laptop, then with a Mac Pro. As I grew more and more confident in the decision, I switched more and more of what I did to OSX. A bunch of our externally facing code runs on Linux and BSD, but it is all administered by third parties—folks whose job is to keep up with all the stuff that needs doing. Everything else runs on Macs.

And I've never really regretted the switch. I still don't.

But, like miners keeping an eye on the canary, I monitor the one real thing that makes my switching viable. The key benefit of switching for me is the lack of hassle. I spend a bit extra for stuff that just works. .Mac syncing lets me move from desktop to laptop without a thought, but now I have a syncing loop where each machine tries to override information that is identical in the other. Things just aren't as smooth as they were.

Am I regretting the switch? No. OSX works well for what I do, and it gives me access to tools I need (such as InDesign for covers, Sibelius for scoring, and so on). But I'm also not such an acolyte that I'll never move off the Mac if I start seeing the kind of hassle I used to experience with Linux reentering my life. Modern Linux distros have come a long way since I last used one on a laptop, and I know that I could probably switch if I needed to with little regret.

So, what do I want from Apple, both tomorrow at MacWorld and then over the coming months? Easy. I want to see fewer cool features (features which seem to add problems) and a refocusing on what made Apple the machine of choice for a certain kind of developer. I want my Mac to be as hassle free, secure, and reliable as it was when I first started using OSX.

Right now, the hassle-free canary seems to be somewhat distressed. I'm monitoring its health closely.

January 08, 2008

A loud "Huzzah!" was heard throughout the land

Eric Hodel is giving RDoc some love. You can't imagine how happy that makes me.

When I first wrote RDoc, I was trying to find a way of solving two problems:

1. Adding comments to the largely uncommented C source of Ruby, and
2. Providing a means for library writers easily to document their creations.

I'd just finished the PickAxe, and I wanted to take the work Andy and I had done reverse engineering the Ruby API and add it back into the interpreter source code.

I set myself constraints with RDoc and ri:

* it should produce at least some documentation even on totally uncommented source files
* it should extract tacit information from the program source (for example guessing good names for block parameters by looking for yield statements inside methods)
* the markup in the source files should be unobtrusive. In the typical case, someone reading the source should not even notice that the comments follow markup conventions
* it should only use libraries that come pre-installed with Ruby
* the documentation it produced should be portable across machines and architectures
* it should allow incremental documentation. Libraries that you install over time can add methods to existing classes. As you add these libraries, the method lists in the classes you extend should grow to reflect the changes
* it should be secure. People pushed many times to add the ability to execute code during the documentation process. I didn't want to have code run on an end user's machine during a process that ostensibly was simply installing documentation (particularly as these installations often ran as root)
* it should be throw-away

The last one might be a surprise, but the real objective of RDoc wasn't the tool. The real objective was to set a standard that meant that future libraries would get documented in a consistent and usable way. And so RDoc and ri compromised like crazy. Rather than a database or some complex binary format, they used a set of directory trees in the user's filesystem to store documentation. This documentation, which is basically a set of Ruby objects, was stored using YAML, rather than marshaled objects or Ruby source. Even though YAML is slow, it is more portable than marshaled objects, and more secure than Ruby source. The parser in RDoc was a wild hack on the parser in irb. This means it performs a static, not dynamic, analysis and that it is sometimes confused by edge cases in Ruby syntax. So be it.

But the very worst part of RDoc/ri is the output side. I wanted to be able to produce output in a variety of formats: HTML, plain text, XML, chm, LaTeX, and so on. So the analysis side of RDoc produces a data structure, and passes it to the output side. Here I made a stupid design decision. What RDoc generates internally is basically nested hashes. This has a couple of major advantages. In particular, there's a kind of fractal property when traversing it: it doesn't matter how deep you are in the structure—all you pass to the next routine down is a hash. But it has a major downside—it's a bitch to work with. If I were doing it again, I'd use Structs.

Finally, there's the generation of the output itself. I needed a templating system and, for what seemed like good reasons at the time, I wrote my own. It was only a handful of lines of code initially. It's still only a couple of hundred. It did a few things well, but ultimately it was ugly as sin. But now, as Erb has become something of a standard, it is definitely the right time to replace it.

RDoc and ri are, in a way, the ultimate stone soup. The code itself is not the output of the project. The real output is the thousands of libraries that are now self-documenting. Eric and the crew are busy on the stew, replacing the stones with real and tasty ingredients. When they are finished, we'll be able to use all that library documentation in remarkable new ways. So, a big thank you to Eric and Seattle.rb, and to all the Ruby coders who've created such a great base of documentation for us all.

Here's to RDoc 2.0.

January 03, 2008

Two New Groovy Titles


Just to prove we're not totally Ruby-centric, we just took two books on Groovy into beta.

Venkat has written Programming Groovy: Dynamic Productivity for the Java Developer, a wonderful introduction to the language. And Scott Davis complements it with Groovy Recipes: Greasing the Wheels of Java.

January 02, 2008

Pipelines Using Fibers in Ruby 1.9--Part II

In the previous post, I developed a class called PipelineElement. This made it relatively easy to create elements that act as producers and filters in a programmatic pipeline. Using it, we could write Ruby 1.9 code like:

    10.times do
      puts (evens | multiples_of_three | multiples_of_seven).resume
    end

The construct in the loop is a pipeline containing three chunks of code: a generator of even numbers, a filter that only passes multiples of three, and another filter that passes multiples of seven. Numbers are passed from the producer to the first filter, and then from that filter to the next, until finally popping out and being made available to puts.

However, creating these pipeline elements is still something of a pain. It turns out that we can simplify things when it comes to creating filters. In the implementation I'll show here, we'll only handle the case of simple transforming filters—filters that take an input, do something to it, and write the result to the filter chain.

Let's revisit the PipelineElement class

    class PipelineElement

       attr_accessor :source

       def initialize
         @fiber_delegate = Fiber.new do
           process
         end
       end

       def |(other)
         other.source = self
         other
       end

       def resume
         @fiber_delegate.resume
       end

       def process
         while value = input
           handle_value(value)
         end
       end

       def handle_value(value)
         output(value)
       end

       def input
         source.resume
       end

       def output(value)
         Fiber.yield(value)
       end
     end

The process method is the driving loop. It reads the next input from the pipeline, then calls handle_value to deal with it. In the base class, handle_value simply echoes the input to the output; real filters subclass PipelineElement and override this method.

Let's make a small change to the handle_value method.

    def handle_value(value)
      output(transform(value))
    end

    def transform(value)
      value
    end

By doing this, we've split the transformation of the incoming value into a separate method. And the work done by this method no longer uses any of the state in the PipelineElement object, which means we can also do it in a block in the caller's context. Let's change our PipelineElement class to allow this. We'll have the constructor take an optional block, and we'll use that block in preference to the transform. Here's another listing, showing just the changed methods.

    class PipelineElement

      def initialize(&block)
        @transformer = block || method(:transform)
        @fiber_delegate = Fiber.new do
          process
        end
      end

      # ...

      def handle_value(value)
        output(@transformer.call(value))
      end
    end

This illustrates a cool (and underused) feature of Ruby. Method objects (created with the method(...) call) are duck-typed with proc objects: we can use .call(params) on both. This is a great way of letting users of a class change its behavior either by subclassing and overriding a method, or by simply passing in a block.
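
A small standalone illustration of that duck typing. The double method and the variable names are made up for the example.

```ruby
# A plain method, used only for this illustration
def double(value)
  value * 2
end

as_method = method(:double)                # a Method object
as_block  = proc { |value| value * 2 }     # a Proc object

# Both respond to call, so the calling code doesn't care which it has
[as_method, as_block].each do |callable|
  puts callable.call(21)                   # prints 42 both times
end
```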

With this change in place, we can now write transforming filters using blocks. This is a lot more compact than the previous subclassing approach.

    class Evens < PipelineElement
      def process
        value = 0
        loop do
          output(value)
          value += 2
        end
      end
    end

    evens = Evens.new

    tripler     = PipelineElement.new {|val| val * 3}
    incrementer = PipelineElement.new {|val| val + 1}

    5.times do
      puts (evens | tripler | incrementer ).resume
    end

This outputs 1, 7, 13, 19, and 25.

Different Kinds of Filter

This approach works well if all we want is transforming filters. But what if we would also like to simplify filters that either pass or don't pass values based on some criteria? A block would seem like a great way of specifying the condition, but we've already used our one block parameter up. Subclassing to the rescue. We can create two subclasses, Transformer and Filter. One sets the @transformer instance variable to any block it is passed. The other sets @filter. Here's the relevant code:

    class PipelineElement

      attr_accessor :source

      def initialize(&block)
        @transformer  ||= method(:transform)
        @filter       ||= method(:filter)
        @fiber_delegate = Fiber.new do
          process
        end
      end

      # ...

      def handle_value(value)
        output(@transformer.call(value)) if @filter.call(value)
      end

      def transform(value)
        value
      end

      def filter(value)
        true
      end
    end

    class Transformer < PipelineElement
      def initialize(&block)
        @transformer = block
        super
      end
    end

    class Filter < PipelineElement
      def initialize(&block)
        @filter = block
        super
      end
    end

Thus equipped, we can write:

    tripler          = Transformer.new {|val| val * 3}
    incrementer      = Transformer.new {|val| val + 1}
    multiple_of_five = Filter.new {|val| val % 5 == 0}

    5.times do
      puts (evens | tripler | incrementer | multiple_of_five ).resume
    end
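For anyone who wants to run this, here is one way the pieces might be assembled into a single file. It is a sketch: the parts elided by "# ..." above (|, resume, process, input, and output) are assumed to be unchanged from the earlier versions of PipelineElement, and the first_values helper is a hypothetical addition that just collects results.

```ruby
class PipelineElement
  attr_accessor :source

  def initialize(&block)
    # ||= keeps whatever a subclass assigned before calling super.
    @transformer ||= method(:transform)
    @filter      ||= method(:filter)
    @fiber_delegate = Fiber.new { process }
  end

  def |(other)
    other.source = self
    other
  end

  def resume
    @fiber_delegate.resume
  end

  def process
    while value = input
      handle_value(value)
    end
  end

  def handle_value(value)
    output(@transformer.call(value)) if @filter.call(value)
  end

  def input
    source.resume
  end

  def output(value)
    Fiber.yield(value)
  end

  def transform(value)
    value
  end

  def filter(value)
    true
  end
end

class Transformer < PipelineElement
  def initialize(&block)
    @transformer = block
    super
  end
end

class Filter < PipelineElement
  def initialize(&block)
    @filter = block
    super
  end
end

class Evens < PipelineElement
  def process
    value = 0
    loop do
      output(value)
      value += 2
    end
  end
end

# Hypothetical helper: builds a fresh pipeline and returns its first values.
def first_values(count)
  evens            = Evens.new
  tripler          = Transformer.new { |val| val * 3 }
  incrementer      = Transformer.new { |val| val + 1 }
  multiple_of_five = Filter.new { |val| val % 5 == 0 }

  pipeline = evens | tripler | incrementer | multiple_of_five
  Array.new(count) { pipeline.resume }
end

puts first_values(5)   # prints 25, 55, 85, 115, 145
```

The incrementer emits 1, 7, 13, 19, 25, 31, and so on, and the filter lets through only every fifth value.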

Moving The Blocks Inline

Our final hack lets us move the blocks directly into the pipeline.

Let's look at the actual pipeline code:

    puts (evens | tripler | incrementer | multiple_of_five ).resume

Those pipe characters are simply calls to the | method in class PipelineElement. And methods can take block arguments, right? So what stops us writing

    puts (evens | {|v| v*3} | {|v| v+1} | multiple_of_five ).resume

It turns out that Ruby stops us. The brace characters are taken to be hash parameters, not blocks, so Ruby gets its knickers in a twist. Fortunately, that's easily fixed by making the method calls explicit.

    puts (evens .| {|v| v*3} .| {|v| v+1} .| multiple_of_five ).resume

Now we just need to make the | method accept an optional block. If the block is present, we use it to create a new transformer.

    def |(other=nil, &block)
      other = Transformer.new(&block) if block
      other.source = self
      other
    end
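One thing to be aware of: called with neither an element nor a block, the method above fails with a NoMethodError on nil. A slightly more defensive variant might look like the sketch below. The Element class is a hypothetical stand-in for PipelineElement, reduced to just what | needs, and the ArgumentError is my assumption about how we'd want to fail.

```ruby
# Hypothetical stand-in for PipelineElement, reduced to what | needs.
class Element
  attr_accessor :source
  attr_reader :block

  def initialize(&block)
    @block = block
  end

  def |(other = nil, &block)
    other = Element.new(&block) if block
    raise ArgumentError, "expected a pipeline element or a block" unless other
    other.source = self
    other
  end
end

head   = Element.new
triple = head.|() { |v| v * 3 }   # explicit call, block only
tail   = triple | Element.new     # ordinary operator form

puts triple.source.equal?(head)   # true
puts tail.source.equal?(triple)   # true
puts triple.block.call(2)         # 6

begin
  head.|()                        # neither an element nor a block
rescue ArgumentError => e
  puts e.message
end
```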

Ruby 1.9 lets you chain method calls across lines, so we can tidy up our pipeline visually.

    5.times do
      puts (evens 
            .| {|v| v*3}
            .| {|v| v+1}
            .| multiple_of_five 
           ).resume
    end

A Palindrome Finder

Let's finish with another trivial example. We'll create a generic producer class that takes a collection and passes it, one element at a time, into the pipeline.

    class Pump < PipelineElement
      def initialize(source)
        @source = source
        super()
      end
      def process
        @source.each {|item| Fiber.yield item}
        nil
      end
    end

Now we can write a simple palindrome finder (a palindrome is a word which is the same when spelled backwards).

    words = Pump.new %w{Madam, the civic radar rotator is not level.}
    is_palindrome = Filter.new {|word| word == word.reverse}

    pipeline = words .| {|word| word.downcase.tr("^a-z", '') } .| is_palindrome

    while word = pipeline.resume
      puts word
    end

This outputs: madam, civic, radar, rotator, level.

But what if we instead want to show each word in the input stream, and flag it if it is a palindrome? That's easily done, but we won't do it the easy way. Instead, let's show a more convoluted method, because it might be useful in the general case.

There's no law to say that a transformer that receives a string as input has to write a string as output. It could, if it wanted to, write an array. Or a structure. So we could write:

    WordInfo = Struct.new(:original, :forwards, :backwords)

    words = Pump.new %w{Madam, the civic radar rotator is not level.}

    normalize = Transformer.new {|word| [word, word.downcase.tr("^a-z", '')] }

    to_word_info = Transformer.new do |word, normalized|
      reversed = normalized.reverse
      WordInfo.new(word, normalized, reversed)
    end

    formatter = Transformer.new do |word_info|
      if word_info.forwards == word_info.backwords
        "'#{word_info.original}' is a palindrome"
      else
        "'#{word_info.original}' is not a palindrome"
      end
    end

    pipeline = words | normalize | to_word_info | formatter

    while word = pipeline.resume
      puts word
    end

This outputs

    'Madam,' is a palindrome
    'the' is not a palindrome
    'civic' is a palindrome
    'radar' is a palindrome
    'rotator' is a palindrome
    'is' is not a palindrome
    'not' is not a palindrome
    'level.' is a palindrome
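The to_word_info block above quietly relies on a property of blocks: when a proc with several parameters is called with a single array, Ruby spreads the array across the parameters. A Method object is stricter, which matters if a transformer is supplied by overriding transform rather than by passing a block. A small sketch of the difference (the describe method and variable names are hypothetical):

```ruby
pair = ["Madam,", "madam"]

# A proc created from a block spreads a single array over its parameters.
as_block = proc { |word, normalized| "#{word} -> #{normalized}" }
puts as_block.call(pair)

def describe(word, normalized)
  "#{word} -> #{normalized}"
end

# A Method object checks its argument count strictly.
as_method = method(:describe)

begin
  as_method.call(pair)        # one argument for a two-parameter method
rescue ArgumentError => e
  puts "Method object raised #{e.class}"
end

puts as_method.call(*pair)    # an explicit splat works
```

So a two-parameter transform method would need to take the pair as one argument and unpack it itself.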

So, What's the Point?

Is this a great way of writing a palindrome finder? Not really. But...

What we've done here is turn the way a program works on its head. We've written chunks of isolated code, each of which either filters or transforms an input. We've then independently knitted these chunks together. That's a high degree of decoupling. We can also leave it until runtime to determine what gets put into the pipeline (and the order in which it appears), which means we can move more power into the hands of our users.
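To make the runtime point concrete, here is a sketch in which the list of stages arrives as plain strings. The STAGES registry and the build_pipeline and run_pipeline helpers are hypothetical names introduced for this example, and Pump uses output rather than calling Fiber.yield directly. The rest is the PipelineElement code from above.

```ruby
class PipelineElement
  attr_accessor :source

  def initialize(&block)
    @transformer ||= method(:transform)
    @filter      ||= method(:filter)
    @fiber_delegate = Fiber.new { process }
  end

  def |(other)
    other.source = self
    other
  end

  def resume
    @fiber_delegate.resume
  end

  def process
    while value = input
      handle_value(value)
    end
  end

  def handle_value(value)
    output(@transformer.call(value)) if @filter.call(value)
  end

  def input
    source.resume
  end

  def output(value)
    Fiber.yield(value)
  end

  def transform(value)
    value
  end

  def filter(value)
    true
  end
end

class Transformer < PipelineElement
  def initialize(&block)
    @transformer = block
    super
  end
end

class Filter < PipelineElement
  def initialize(&block)
    @filter = block
    super
  end
end

class Pump < PipelineElement
  def initialize(source)
    @source = source
    super()
  end

  def process
    @source.each { |item| output(item) }
    nil
  end
end

# Hypothetical registry: stage names a user might supply, mapped to
# lambdas that build a fresh element each time.
STAGES = {
  "downcase"    => lambda { Transformer.new { |word| word.downcase } },
  "letters"     => lambda { Transformer.new { |word| word.tr("^a-z", "") } },
  "palindromes" => lambda { Filter.new { |word| word == word.reverse } }
}

# Chains the named stages onto the source, in the order given.
def build_pipeline(source, stage_names)
  stage_names.map { |name| STAGES.fetch(name).call }.inject(source, :|)
end

def run_pipeline(words, stage_names)
  pipeline = build_pipeline(Pump.new(words), stage_names)
  results = []
  while word = pipeline.resume
    results << word
  end
  results
end

# The stage list could just as well come from ARGV or a config file.
puts run_pipeline(%w{Madam, the civic radar rotator is not level.},
                  %w{downcase letters palindromes})
```

Since | returns its right-hand operand, inject(source, :|) folds the whole list into a pipeline and hands back the last element.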

Could we have done all this without Fibers? Of course. Could we do it without Ruby 1.9? Absolutely. But sometimes factors come together which lead us to experiment with new ways of thinking about our code.

This pipeline stuff is not revolutionary, and it isn't generally applicable. But it's fun to play with. And, for me, that's the main thing.

A Wee Postscript

All this content is stuff that I decided not to include in the third edition of the PickAxe. It didn't work in the section on fibers, because it uses programming techniques not yet covered. It didn't work later because, as an example of various programming techniques, it is just too long.

December 31, 2007

Pipelines Using Fibers in Ruby 1.9

Users of the command line are familiar with the idea of building pipelines: a chain of simple commands strung together so that the output of one becomes the input of the next. Using pipelines and a basic set of primitives, shell users can accomplish some sophisticated tasks. Here's a basic Unix shell pipeline that reports the ten longest .tip files in the current directory, based on the number of lines in each file:

    wc -l *.tip | grep '\.tip' | sort -n | tail -10

Let's see how to add something similar to Ruby. By the end of this set of two articles, we'll be able to write things like

    puts (even_numbers | tripler | incrementer | multiple_of_five ).resume

and a palindrome finder using blocks:

    words         = Pump.new %w{Madam, the civic radar rotator is not level.}
    is_palindrome = Filter.new {|word| word == word.reverse}

    pipeline = words .| {|word| word.downcase.tr("^a-z", '') } .| is_palindrome

    while word = pipeline.resume
      puts word
    end

Great code? Nope. But getting there is fun. And, who knows? The techniques might well be useful in your next project.

A Daily Dose of Fiber

Ruby 1.9 adds support for Fibers. At their most basic, they let you create simple generators (much as you could do previously with blocks). Here's a trivial example: a fiber that generates successive Fibonacci numbers:

      fib = Fiber.new do
        f1 = f2 = 1
        loop do
          Fiber.yield f1
          f1, f2 = f2, f1 + f2
        end
      end

      10.times { puts fib.resume }

A fiber is somewhat like a thread, except you have control over when it gets scheduled. Initially, a fiber is suspended. When you resume it, it runs the block until the block finishes, or it hits a Fiber.yield. This is similar to a regular block yield: it suspends the fiber and passes control back to the resume. Any value passed to Fiber.yield becomes the value returned by resume.
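Here's a small sketch of that life cycle, using a hypothetical step_fiber helper. It also shows two details not spelled out above: when the block finishes, its final value is what the last resume returns, and resuming a fiber whose block has already finished raises a FiberError.

```ruby
# Hypothetical helper that returns a fresh, suspended fiber.
def step_fiber
  Fiber.new do
    Fiber.yield :first    # suspend; the first resume returns :first
    Fiber.yield :second   # suspend again; the second resume returns :second
    :done                 # the block's final value goes to the last resume
  end
end

steps = step_fiber

p steps.resume   # :first
p steps.resume   # :second
p steps.resume   # :done

begin
  steps.resume   # the block has finished, so there is nothing to resume
rescue FiberError => e
  puts "FiberError: #{e.message}"
end
```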

By default, a fiber can only yield back to the code that resumed it. However, if you require the "fiber" library, Fibers get extended with a transfer method that allows one fiber to transfer control to another. Fibers then become fully fledged coroutines. However, we won't be needing all that power today.

Instead, let's get back to the idea of creating pipelines of functionality in code, much as you can create pipelines in the shell.

As a starting point, let's write two fibers. One's a generator: it creates a list of even numbers. The second is a consumer. All it does is accept values from the generator and print them. We'll make the consumer stop after printing 10 numbers.

    evens = Fiber.new do
      value = 0
      loop do
        Fiber.yield value
        value += 2
      end
    end

    consumer = Fiber.new do
      10.times do
        next_value = evens.resume
        puts next_value
      end
    end

    consumer.resume

Note how we had to use resume to kick off the consumer. Technically, the consumer doesn't have to be a Fiber, but, as we'll see in a minute, making it one gives us some flexibility.
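For comparison, here is a sketch of the consumer written without a fiber of its own. The print_first method is a hypothetical name: it is an ordinary method that resumes the generator straight from the main program. The evens method wraps the same generator fiber so that each call produces a fresh one.

```ruby
def evens
  Fiber.new do
    value = 0
    loop do
      Fiber.yield value
      value += 2
    end
  end
end

# A consumer written as an ordinary method rather than a fiber.
def print_first(source, count)
  count.times { puts source.resume }
end

print_first(evens, 10)   # prints 0, 2, 4, ... 18
```

This works, but the consumer can no longer sit in the middle of a chain and be resumed by something further downstream.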

As a next step, notice how we've created some coupling in this code. Our consumer fiber has the name of the evens generator coded into it. Let's wrap each fiber in a method, and pass the generator into the consumer method.

    def evens
      Fiber.new do
        value = 0
        loop do
          Fiber.yield value
          value += 2
        end
      end
    end

    def consumer(source)
      Fiber.new do
        10.times do
          next_value = source.resume
          puts next_value
        end
      end
    end

    consumer(evens).resume

OK. Let's add one more fiber to the weave. We'll create a filter that only passes on numbers that are multiples of three. Again, we'll wrap it in a method.

    def evens
      Fiber.new do
        value = 0
        loop do
          Fiber.yield value
          value += 2
        end
      end
    end

    def multiples_of_three(source)
      Fiber.new do
        loop do
          next_value = source.resume
          Fiber.yield next_value if next_value % 3 == 0
        end
      end
    end

    def consumer(source)
      Fiber.new do
        10.times do
          next_value = source.resume
          puts next_value
        end
      end
    end

    consumer(multiples_of_three(evens)).resume

Running this, we get the output

0
6
12
18
. . .

This is getting cool. We write little chunks of code, and then combine them to get work done. Just like a pipeline. Except...

We can do better. First, the composition looks backwards. Because we're passing methods to methods, we write

    consumer(multiples_of_three(evens))

Instead, we'd like to write

    evens | multiples_of_three | consumer

Also, there's a fair amount of duplication in this code. Each of our little pipeline methods has the same overall structure, and each is coupled to the implementation of fibers. Let's see if we can fix this.

Wrapping Fibers

As is usual when we're refactoring towards a solution, we're about to get really messy. Don't worry, though. It will all wash off, and we'll end up with something a lot neater.

First, let's create a class that represents something that can appear in our pipeline. At its heart is the process method. This reads something from the input side of the pipe, then "handles" that value. The default handling is to write that value to the output side of the pipeline, passing it on to the next element in the chain.

    class PipelineElement

      attr_accessor :source

      def initialize
        @fiber_delegate = Fiber.new do
          process
        end
      end

      def resume
        @fiber_delegate.resume
      end

      def process
        while value = input
          handle_value(value)
        end
      end

      def handle_value(value)
        output(value)
      end

      def input
        source.resume
      end

      def output(value)
        Fiber.yield(value)
      end
    end

When I first wrote this, I was tempted to make PipelineElement a subclass of Fiber, but that leads to coupling. In the end, the pipeline elements delegate to a separate Fiber object.

The first element of the pipeline doesn't receive any input from prior elements (because there are no prior elements), so we need to override its process method.

    class Evens < PipelineElement
      def process
        value = 0
        loop do
          output(value)
          value += 2
        end
      end
    end

    evens = Evens.new

Just to make things more interesting, we'll create a generic MultiplesOf filter, so we can filter based on any number, and not just 3:

    class MultiplesOf < PipelineElement
      def initialize(factor)
        @factor = factor
        super()
      end
      def handle_value(value)
        output(value) if value % @factor == 0
      end
    end

    multiples_of_three = MultiplesOf.new(3)
    multiples_of_seven = MultiplesOf.new(7)

Then we just knit it all together into a pipeline:

    multiples_of_three.source = evens
    multiples_of_seven.source = multiples_of_three

    10.times do
      puts multiples_of_seven.resume
    end

We get 0, 42, 84, 126, 168, and so on as output. (Any output stream that contains 42 must be correct, so no need for any unit tests here.)

But we're still a little way from our ideal of being able to pipe these puppies together. It's a good thing that Ruby lets us override the "|" operator. Up in class PipelineElement, define a new method:

    def |(other)
      other.source = self
      other
    end        

This allows us to write:

    10.times do
      puts (evens | multiples_of_three | multiples_of_seven).resume
    end

or even:

    pipeline = evens | multiples_of_three | multiples_of_seven

    10.times do
      puts pipeline.resume
    end

Cool, or what?

In The Next Thrilling Installment

The next post will take these basic ideas and tart them up a bit, allowing us to use blocks directly in pipelines. We'll also reveal why the PipelineElement class I just wrote is somewhat more complicated than might seem necessary. In the meantime, here's the full source of the code so far.

    class PipelineElement

      attr_accessor :source

      def initialize
        @fiber_delegate = Fiber.new do
          process
        end
      end

      def |(other)
        other.source = self
        other
      end

      def resume
        @fiber_delegate.resume
      end

      def process
        while value = input
          handle_value(value)
        end
      end

      def handle_value(value)
        output(value)
      end

      def input
        source.resume
      end

      def output(value)
        Fiber.yield(value)
      end
    end

    ##
    # The classes below are the elements in our pipeline
    #
    class Evens < PipelineElement
      def process
        value = 0
        loop do
          output(value)
          value += 2
        end
      end
    end

    class MultiplesOf < PipelineElement
      def initialize(factor)
        @factor = factor
        super()
      end
      def handle_value(value)
        output(value) if value % @factor == 0
      end
    end

    evens = Evens.new
    multiples_of_three = MultiplesOf.new(3)
    multiples_of_seven = MultiplesOf.new(7)

    pipeline = evens | multiples_of_three | multiples_of_seven

    10.times do
      puts pipeline.resume
    end