Fun with Ruby 1.9 Regular Expressions
I've reorganized the regular expression content in the new Programming Ruby, and added some cool new advanced examples. This one's fairly straightforward, but I love the fact that I can now start refactoring my more complex patterns to remove duplication.
The stuff below is an extract from the unedited update. It'll appear in the next beta. It follows a discussion of named groups, \k and related stuff.
There’s a trick that lets us write subroutines inside regular expressions. Recall that we can invoke a named group using \g<name>, and that we define the group using (?<name>...). Normally, the definition of the group is itself matched as part of executing the pattern. However, if you add the suffix {0} to the group, it means “zero matches of this group,” so the group is not executed when first encountered, but it can still be invoked by name later in the pattern.
sentence = %r{
  (?<subject>   cat | dog | gerbil ){0}
  (?<verb>      eats | drinks | generates ){0}
  (?<object>    water | bones | PDFs ){0}
  (?<adjective> big | small | smelly ){0}
  (?<opt_adj>   (\g<adjective>\s)? ){0}
  The\s\g<opt_adj>\g<subject>\s\g<verb>\s\g<opt_adj>\g<object>
}x
md = sentence.match("The cat drinks water")
puts "The subject is #{md[:subject]} and the verb is #{md[:verb]}"
md = sentence.match("The big dog eats smelly bones")
puts "The adjective in the second sentence is #{md[:adjective]}"
sentence =~ "The gerbil generates big PDFs"
puts "And the object in the last is #{$~[:object]}"
produces:
The subject is cat and the verb is drinks
The adjective in the second sentence is smelly
And the object in the last is PDFs
Cool, eh?
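The same {0} trick is what makes it possible to remove duplication from bigger patterns. Here is a small sketch (the IPV4 constant and the octet group name are illustrative choices, not from the book) of a dotted-quad IPv4 matcher where the octet sub-pattern is written once and called four times:

```ruby
# The octet group is defined once; {0} means it matches nothing where it is
# written, and each \g<octet> below calls it as a subroutine.
IPV4 = %r{
  (?<octet> 25[0-5] | 2[0-4]\d | 1\d\d | [1-9]?\d ){0}
  \A \g<octet> (?: \. \g<octet> ){3} \z
}x

%w[192.168.0.1 10.0.0.255 256.1.1.1 1.2.3].each do |addr|
  md = IPV4.match(addr)
  # Like md[:adjective] above, a named group reports the text of its last call.
  puts md ? "#{addr}: valid, last octet #{md[:octet]}" : "#{addr}: invalid"
end
```

Without the subroutine, the four-way alternation for an octet would have to be pasted in four times.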




Great stuff Dave, can't wait to play around with that. Thanks for sharing.
Posted by: Russell Jones | October 17, 2008 at 01:27 AM
Interesting; this looks a lot like the parser-combinator approach to parsing, e.g. Parsec for Haskell.
Posted by: Justin George | October 17, 2008 at 03:03 AM
Tiny stylistic quibble: when writing complex patterns like this, it's become common practice in the Perl community to write something like %r{(?x) ... } rather than %r{ ... }x. It gets the pragmas up front, so you understand the context the pattern executes in before you start trying to work out what the expression means.
Posted by: Piers Cawley | October 17, 2008 at 04:18 AM
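Ruby supports the same inline form: a (?x) at the very start of the pattern turns on extended mode from that point to the end, so the two literals in this sketch should behave identically. The pet-themed patterns and the variable names are made up for illustration:

```ruby
# Modifier at the end: the reader learns about x only after the whole pattern.
trailing = %r{ (cat | dog) \s+ (eats | drinks) }x

# Inline modifier first: extended mode applies from (?x) to the end of the pattern.
leading = %r{(?x) (cat | dog) \s+ (eats | drinks) }

["cat eats", "dog   drinks", "catdrinks"].each do |s|
  puts "#{s.inspect}: trailing=#{trailing.match?(s)} leading=#{leading.match?(s)}"
end
```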
You might consider adding an example that searches for Orwell's "verbal false limbs": http://www.calvinvanhoek.com/articles/2007/04/politics-english-language/
Posted by: Bil Kleb | October 17, 2008 at 01:50 PM
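Bil's suggestion could be sketched along these lines, using only four of the phrases Orwell lists under that heading (give rise to, militate against, render inoperative, make contact with). The FALSE_LIMBS and mark_false_limbs names and the bracket-marking output are illustrative choices. A {0} subroutine for the gap between words lets each phrase match across any run of whitespace, including line breaks:

```ruby
# gap is defined once and called between the words of every phrase.
FALSE_LIMBS = %r{
  (?<gap> \s+ ){0}
  \b (?: give \g<gap> rise \g<gap> to
       | militate \g<gap> against
       | render \g<gap> inoperative
       | make \g<gap> contact \g<gap> with
     ) \b
}xi

# Wrap each phrase found in square brackets; the block receives the whole match.
def mark_false_limbs(text)
  text.gsub(FALSE_LIMBS) { |phrase| "[#{phrase}]" }
end

puts mark_false_limbs("These rules may well militate against reform\nand give\nrise to confusion.")
```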
I've written several samples for a German blog (Ruby-Mine) using Oniguruma in Ruby 1.9. Unfortunately the accompanying texts are in German only (I'm too lazy to translate them before Ruby 1.9.1 is stable), but maybe you can use some of them for your book. If you are interested, I will send them to you together with some English remarks.
Here is one example: a pocket calculator. The basic part can be used to validate input data in GUI fields.
pattern = / (?\g\+\g|\g-\g|\g){0}
(?|\g\*\g|\g\/\g|\g){0}
(?[-+]?\g|\(\g\)){0}
(?\g|\g){0}
(?[a-zA-Z_]\w*){0}
(?\d+(\.\d+)?){0}
^((?\g)=)?(?\g)$
/x
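vars = Hash.new(0)      # variable name => value; unknown names evaluate to 0
basbind = binding       # eval below runs here, where `vars` is visible
# print 'input> '  # for interactive usage only
while (inp = DATA.gets) && !inp.chomp.match(/^quit$/i)
  # remove all whitespace, then validate the line against the grammar
  if (md = inp.chomp.gsub(/\s+/, '').match(pattern))
    # turn each identifier into a lookup in vars, then evaluate
    expr = md[:expr].gsub(/([a-zA-Z_]\w*)/, 'vars["\1"]')
    erg = eval(expr, basbind)
    vars[md[:var]] = erg if md[:var]     # remember assignments such as a = ...
    prefix = md[:var] ? "#{md[:var]}=" : ''
    puts "#{inp.chomp}, Result> #{prefix}#{erg}"
  else
    puts "+++++ Incorrect input: '#{inp.chomp}'"
  end
  # print 'input> '  # for interactive usage only
end
puts '***** Show variables *****'
vars.sort.each { |name, value| puts "#{name}=#{value}" }
puts '******* End ********'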
__END__
30+12
a = 30 + 12
b = 2*a
c = -(a*a+5)
d = (6+5*a)*c
quit
Posted by: Wolfgang Nádasi-Donner (WoNáDo) | October 17, 2008 at 02:06 PM
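Because the group names are missing from the pattern above, here is a hypothetical sketch of the same idea rather than Wolfgang's original: a grammar for sums, products, signed factors and parenthesized sub-expressions, written as {0} subroutines. Every group name (sum, product, factor, operand, num, var, expr) is an assumption, chosen so that md[:var] and md[:expr] line up with the loop above. As in that loop, whitespace is removed before matching, and the regex only validates the input (it nests a-b-c to the right, which is harmless for validation); evaluation is left to eval.

```ruby
# Hypothetical reconstruction: all group names here are assumptions.
CALC = %r{
  (?<sum>     \g<product> (?: [-+] \g<sum> )? ){0}
  (?<product> \g<factor> (?: [*/] \g<product> )? ){0}
  (?<factor>  [-+]? (?: \g<operand> | \( \g<sum> \) ) ){0}
  (?<operand> [a-zA-Z_]\w* | \g<num> ){0}
  (?<num>     \d+ (?: \.\d+ )? ){0}
  \A (?: (?<var> [a-zA-Z_]\w* ) = )? (?<expr> \g<sum> ) \z
}x

# Returns MatchData for a valid line (after removing all whitespace), else nil.
def parse_line(line)
  CALC.match(line.gsub(/\s+/, ''))
end

["30+12", "a = 30 + 12", "c = -(a*a+5)", "d = (6+5*a)*c", "2**3", "(1+2"].each do |line|
  md = parse_line(line)
  puts md ? "#{line}: var=#{md[:var].inspect} expr=#{md[:expr]}" : "#{line}: rejected"
end
```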
Dave,
That is an impressive set of examples and I want to understand them better. In the second sentence, md[:adjective] returns the last adjective in the string. Is there a way to get the first one ('big')?
I'm looking forward to the final version of the book! Thanks.
Posted by: Mario Aquino | October 19, 2008 at 09:54 PM
Also, I just tried to run your example in my local build of 1.9 (1.9.0-3_1) and it failed. The error message I get is: 'regex.rb:13:in `': undefined method `[]' for nil:NilClass (NoMethodError)'
Which 1.9 build are you using for this example?
Posted by: Mario Aquino | October 19, 2008 at 10:13 PM
I'm using
ruby 1.9.0 (2008-10-14 revision 15427) [i386-darwin9.5.0]
I'm guessing a simple typo: the pattern didn't match, so match returned nil.
Can you access the first match of a named group? Not using the [:adjective] syntax. For some reason, Ruby lets you index into numbered matches, but not named ones.
Dave
Posted by: Dave Thomas | October 19, 2008 at 10:34 PM
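One possible workaround for Mario's question, offered as a sketch and not as anything from the book: since a named group remembers only its most recent match, give each occurrence its own name by wrapping the call in a new named group. The names subject_adj and object_adj are hypothetical. The sketch also checks for a nil match before indexing, which is the failure Dave suggests caused the NoMethodError above.

```ruby
sentence = %r{
  (?<subject>   cat | dog | gerbil ){0}
  (?<verb>      eats | drinks | generates ){0}
  (?<object>    water | bones | PDFs ){0}
  (?<adjective> big | small | smelly ){0}
  The \s
  (?: (?<subject_adj> \g<adjective> ) \s )? \g<subject> \s \g<verb> \s
  (?: (?<object_adj>  \g<adjective> ) \s )? \g<object>
}x

# match returns nil when the sentence does not fit, so check before indexing.
if (md = sentence.match("The big dog eats smelly bones"))
  puts "first adjective:  #{md[:subject_adj]}"   # the adjective before the subject
  puts "second adjective: #{md[:object_adj]}"    # the adjective before the object
  puts "md[:adjective]:   #{md[:adjective]}"     # still only the last call
else
  puts "no match"
end
```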