October 16, 2008

Fun with Ruby 1.9 Regular Expressions

I've reorganized the regular expression content in the new Programming Ruby, and added some cool new advanced examples. This one's fairly straightforward, but I love the fact that I can now start refactoring my more complex patterns, removing duplication.

The stuff below is an extract from the unedited update. It'll appear in the next beta. It follows a discussion of named groups, \k and related stuff.


There’s a trick which allows us to write subroutines inside regular expressions. Recall that we can invoke a named group using \g<name>, and we define the group using (?<name>...). Normally, the definition of the group is itself matched as part of executing the pattern. However, if you add the suffix {0} to the group, it means “zero matches of this group,” so the group is not executed when first encountered.

sentence = %r{ 
    (?<subject>   cat   | dog   | gerbil    ){0} 
    (?<verb>      eats  | drinks| generates ){0} 
    (?<object>    water | bones | PDFs      ){0} 
    (?<adjective> big   | small | smelly    ){0} 

    (?<opt_adj>   (\g<adjective>\s)?     ){0} 

    The\s\g<opt_adj>\g<subject>\s\g<verb>\s\g<opt_adj>\g<object> 
}x

md = sentence.match("The cat drinks water") 
puts "The subject is #{md[:subject]} and the verb is #{md[:verb]}"
 
md = sentence.match("The big dog eats smelly bones") 
puts "The adjective in the second sentence is #{md[:adjective]}" 

sentence =~ "The gerbil generates big PDFs" 
puts "And the object in the last is #{$~[:object]}" 

produces:

The subject is cat and the verb is drinks 
The adjective in the second sentence is smelly 
And the object in the last is PDFs 

Cool, eh?
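
To see what the {0} suffix changes in isolation, here is a minimal sketch. The two patterns and the group name digit are made up for illustration and are not taken from the book.

```ruby
# Group defined with {0}: the definition itself consumes nothing,
# so only the later \g<digit> call has to match a digit.
with_zero = /\A(?<digit>\d){0}x\g<digit>\z/

# Same group without {0}: the definition must match a digit in place,
# and then \g<digit> must match a second digit after the x.
without_zero = /\A(?<digit>\d)x\g<digit>\z/

p with_zero =~ "x7"               # 0 (match starts at index 0)
p without_zero =~ "x7"            # nil (no digit before the x)
p without_zero =~ "3x7"           # 0
p with_zero.match("x7")[:digit]   # "7"
```

The {0} form is what lets the definitions sit at the top of the pattern like a list of subroutines, with the real pattern on the last line.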

Comments

Great stuff Dave, can't wait to play around with that. Thanks for sharing.

Interesting, looks a lot like a parser combinator method of parsing e.g. Parsec for Haskell.

Tiny stylistic quibble, but it's become common practice in the Perl community when writing complex patterns like this to do something like %r{(?x) ... } rather than %r{ ... }x. It gets the pragmas up front so you can understand the executing context before you start trying to understand what the expression means.
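
For comparison, the two styles look like this in Ruby. This is a small sketch with a made-up pattern (the group name animal is hypothetical); it assumes the regexp engine's inline (?x) option, which turns on extended mode for the rest of the pattern.

```ruby
# Trailing-flag style, as used in the post.
trailing = %r{ \A (?<animal> cat | dog ) \z }x

# Inline-option style: (?x) at the front switches on extended mode
# for everything that follows it in the pattern.
leading = %r{(?x) \A (?<animal> cat | dog ) \z }

p trailing.match("dog")[:animal]   # "dog"
p leading.match("dog")[:animal]    # "dog"
p leading.match("bird")            # nil
```

Both forms ignore the whitespace inside the pattern; the difference is only where the reader learns that extended mode is in effect.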

You might consider having an example search for Orwell's "verbal false limbs"? http://www.calvinvanhoek.com/articles/2007/04/politics-english-language/

I've written several samples for a German blog (Ruby-Mine) using Oniguruma in Ruby 1.9. Unfortunately the accompanying texts are in German only (I was too lazy to translate them before Ruby 1.9.1 is stable), but maybe some of them would be useful for your book. If you are interested, I will send them to you together with some English remarks.

Here is one example - A pocket calculator. The basic part can be used for verifying input data in GUI fields.

pattern = / (?\g\+\g|\g-\g|\g){0}
(?|\g\*\g|\g\/\g|\g){0}
(?[-+]?\g|\(\g\)){0}
(?\g|\g){0}
(?[a-zA-Z_]\w*){0}
(?\d+(\.\d+)?){0}
^((?\g)=)?(?\g)$
/x

vars = Hash.new(0)
basbind = binding

# print 'input> ' # for interactive usage only
while (!(inp = DATA.gets).chomp.match(/^quit$/i))
  if (md = inp.chomp.gsub(/\s+/,'').match(pattern))
    expr = md[:expr].gsub(/([a-zA-Z_]\w*)/, 'vars["\1"]')
    erg = eval(expr, basbind)
    vars[md[:var]] = erg if md[:var]
    puts "#{inp.chomp}, Result> #{(md[:var])?(md[:var]+'='):''}#{erg}"
  else
    puts "+++++ Incorrect input: '#{inp.chomp}'"
  end
  # print 'input> ' # for interactive usage only
end
puts '***** Show variables *****'
vars.keys.sort.each{|v|puts "#{v}=#{vars[v]}"}
puts '******* End ********'
__END__
30+12
a = 30 + 12
b = 2*a
c = -(a*a+5)
d = (6+5*a)*c
quit
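
As a separate, smaller illustration of the same idea (checking arithmetic input with named groups that call each other), here is a minimal sketch. It is not the pattern from the comment above: the group names number, factor, product and sum are hypothetical, variables are left out, and it only validates input without evaluating it.

```ruby
# Hypothetical grammar; every group name here is illustrative only.
# Each definition ends in {0}, so nothing is matched until the last line.
arith = %r{
  (?<number>  \d+ (?: \. \d+ )?                       ){0}
  (?<factor>  [-+]? (?: \g<number> | \( \g<sum> \) )  ){0}
  (?<product> \g<factor>  (?: [*/] \g<factor>  )*     ){0}
  (?<sum>     \g<product> (?: [-+] \g<product> )*     ){0}

  \A \g<sum> \z
}x

["30+12", "-(4*4+5)", "(6+5*2)*3", "2*(3+"].each do |input|
  puts "#{input.inspect} valid? #{!arith.match(input).nil?}"
end
```

The recursion goes through factor, which can contain a parenthesised sum. Because the opening parenthesis is consumed before \g<sum> is called, the pattern is not left-recursive.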

Dave,

That is an impressive set of examples and I want to understand them better. In the second sentence, md[:adjective] returns the last adjective in the string. Is there a way to get the first one ('big')?

I'm looking forward to the final version of the book! Thanks.

Also, I just tried to run your example in my local build of 1.9 (1.9.0-3_1) and it failed. The error message I get is: 'regex.rb:13:in `': undefined method `[]' for nil:NilClass (NoMethodError)'

Which 1.9 build are you using for this example?

I'm using

ruby 1.9.0 (2008-10-14 revision 15427) [i386-darwin9.5.0]

I'm guessing a simple typo, so the pattern didn't match, and the match returned nil.

Can you access the first named match? Not using the [:adjective] syntax. For some reason, Ruby lets you index into numbered matches, but not named ones.


Dave
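
One possible workaround for the "first adjective" question, sketched under assumptions: wrap each call to \g<adjective> in its own named group, so each position gets a separate capture. The names subject_adj and object_adj are hypothetical, and this variant drops the opt_adj helper from the post in favour of inline optional groups.

```ruby
sentence = %r{
  (?<subject>   cat   | dog   | gerbil    ){0}
  (?<verb>      eats  | drinks| generates ){0}
  (?<object>    water | bones | PDFs      ){0}
  (?<adjective> big   | small | smelly    ){0}

  The\s
  (?: (?<subject_adj> \g<adjective> ) \s )?   # adjective before the subject
  \g<subject> \s \g<verb> \s
  (?: (?<object_adj>  \g<adjective> ) \s )?   # adjective before the object
  \g<object>
}x

md = sentence.match("The big dog eats smelly bones")
puts md[:subject_adj]   # big
puts md[:object_adj]    # smelly
puts md[:adjective]     # smelly (the last call to the group wins)
```

When a sentence has no adjective in one of the positions, the corresponding capture is nil.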
