
March 11, 2008

The 'Language' in Domain-Specific Language Doesn't Mean English (or French, or Japanese, or ...)

I'm a really big fan of Domain-Specific Languages. Andy and I plugged them back in '98 when writing The Pragmatic Programmer. I've written my share of them over the years, and I've used even more. Which is why it is distressing to see that a whole group of developers are writing DSLs (and discussing DSLs) without seeming to get one of the fundamental principles behind good DSL design.

Domain experts don't speak a natural language

Let's say that another way. Whenever domain experts communicate, they may seem to be speaking in English (or French, or whatever). But they are not. They are speaking jargon, a specialized language that they've invented as a shorthand for communicating effectively with their peers. Jargon may use English words, but these words have been warped into having very different meanings—meanings that you only learn through experience in the field.

Let's look at some successful domain-specific languages before turning our attention to the way that some DSLs are trying just a little too hard.

Success Story 1: Dependency Management in Make

The Make utility has been a mainstay of Unix software development for over 30 years. You can complain about some strange syntax rules (some of which involve the invisible difference between tabs and spaces), but it would be hard to argue that Make hasn't had a major impact in the open source world.

At its heart, Make addresses the building of systems from components in the presence of dependencies. Make lets me express the dependencies between header files, source files, object files, libraries, and executable images. It also lets me specify the commands to execute to resolve those dependencies when certain items are missing. For example, I could say


my_prog.o: my_prog.c common.h

extras.o:   extras.c common.h

my_prog:    my_prog.o extras.o
            cc -o my_prog -lc my_prog.o extras.o

This example of the Make DSL says that my_prog.o depends on my_prog.c and common.h, and that extras.o depends on extras.c and also depends on common.h. The final program, my_prog, depends on the two object files. To build the program, we have to execute the cc command on the line that follows the dependency line. No build command is needed for the object files: in this case Make knows what to do implicitly.

People who build software from source are domain experts in the area of dependencies and build commands. They need concise ways of expressing that expertise, of saying things like "if I ask you to ensure my program is up-to-date, and the common header file has been changed, then I want you to rebuild all the dependent object files before then rebuilding the main program". Make is by no means perfect, but its longevity shows that it goes a long way as a DSL toward meeting its experts' needs.
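To see how much that short Makefile is saying, here is a rough sketch of the same rule spelled out by hand in Ruby. Everything in it is an assumption made for illustration: the `DEPENDENCIES` table, the `rebuild_order` helper, and the integer timestamps standing in for file modification times. It is a simplification of the idea, not a description of how Make itself is written.

```ruby
# Hypothetical dependency table mirroring the Makefile above.
DEPENDENCIES = {
  "my_prog"   => ["my_prog.o", "extras.o"],
  "my_prog.o" => ["my_prog.c", "common.h"],
  "extras.o"  => ["extras.c", "common.h"]
}.freeze

# Returns the targets that need rebuilding, dependencies first.
# A target is stale if it is missing, if any dependency is newer than it,
# or if any dependency has itself just been rebuilt.
def rebuild_order(target, mtimes, rebuilt = [])
  deps = DEPENDENCIES.fetch(target, [])
  return rebuilt if deps.empty? # a plain source file: nothing to build

  deps.each { |dep| rebuild_order(dep, mtimes, rebuilt) }

  target_time = mtimes[target]
  stale = target_time.nil? || deps.any? do |dep|
    rebuilt.include?(dep) || mtimes.fetch(dep) > target_time
  end
  rebuilt << target if stale && !rebuilt.include?(target)
  rebuilt
end

# Integers stand in for modification times (an assumption for the sketch).
mtimes = {
  "my_prog.c" => 1, "extras.c" => 1,
  "my_prog.o" => 2, "extras.o" => 2,
  "my_prog"   => 3,
  "common.h"  => 4 # edited after the last build
}

p rebuild_order("my_prog", mtimes)
```

With the header edited after the last build, both object files and then the program come back as needing a rebuild. The three dependency lines of the Makefile carry all of that.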

Success Story 2: Active Record Declarations

Love it or loathe it, you have to admit that Rails has changed the game. And one reason is its extensive use of DSLs. For example, when you are writing model classes, you are claiming to be an expert on your application's domain, and in the relationships between objects in that domain. And Rails has a nifty DSL to let you express those relationships.


class Post < ActiveRecord::Base
  has_many :comments
  ...
end

class Comment < ActiveRecord::Base
  belongs_to :post
  ...
end

The two lines containing has_many and belongs_to are part of a data modeling DSL provided by Rails. Behind the scenes, this simple-looking code creates a whole heap of supporting infrastructure in the application, infrastructure that allows the programmer to easily navigate and manage the relationships between (in this case) posts and comments.
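As a rough illustration of the technique (and only that), here is a minimal sketch of class-level declarations that generate methods. The `MiniModel` module is hypothetical and is not how Active Record is implemented: there is no database, no persistence, and none of the real association machinery. It only shows how a one-line declaration can stand in for code the programmer never writes.

```ruby
# Hypothetical stand-in for a modeling DSL. NOT Active Record's implementation.
module MiniModel
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    # has_many :comments defines a reader returning a per-object array.
    def has_many(name)
      define_method(name) do
        @associations ||= {}
        @associations[name] ||= []
      end
    end

    # belongs_to :post defines a reader and a writer for the owning object.
    def belongs_to(name)
      attr_accessor name
    end
  end
end

class Post
  include MiniModel
  has_many :comments
end

class Comment
  include MiniModel
  belongs_to :post
end

post = Post.new
comment = Comment.new
comment.post = post
post.comments << comment

puts post.comments.size          # how many comments the post holds
puts comment.post.equal?(post)   # whether the comment points back at the post
```

The vocabulary (`has_many`, `belongs_to`) comes from data modeling; the Ruby underneath is ordinary method generation.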

At first blush, this might seem like an English-language DSL. But, despite appearances, has_many and belongs_to are not English phrases. They are jargon from the world of modeling. They have a specific meaning in that context, a meaning that is clear to developers using Rails (because those developers take on the role of domain modeler when they start writing the application).

Success Story 3: Groovy Builders

The Groovy language has a wonderful way of expressing data in code. The builder concept lets you construct a set of nodes as a side effect of code execution. You can then express those nodes as (for example) XML, or JSON, or Swing user interfaces. Here's a trivial example that constructs some nodes describing a person which we can then output as XML.


  result = new StringWriter()
  xml = new groovy.xml.MarkupBuilder(result)
  xml.person(category: 'employee') {
    name('dave')
    likes('programming')
  }
  println result

This would generate something like


  <person category="employee">
    <name>dave</name>
    <likes>programming</likes>
  </person>

(Jim Weirich took this idea and created the wonderful Ruby Builder library, the basis of Rails' XML-generating templates.)

Again, we have a DSL aimed squarely at someone who knows what they are doing. If you're creating XML, then you know that the elements can be nested, that they can have textual content, and that elements have optional attributes. The Builder DSL takes care of all the details for you (the angle brackets, any quoting, and so on), but you still have to know the underlying concepts. Again, the language of the DSL is the language of the domain.
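For the curious, the builder idea can be sketched in a few lines of Ruby. The `MiniBuilder` class below is hypothetical, written only for this illustration; it is neither Groovy's MarkupBuilder nor the Ruby Builder library, and it skips things a real builder has to handle (for example, element names that clash with methods every Ruby object already has).

```ruby
# Hypothetical MiniBuilder: a sketch of the builder idea, not a real library.
class MiniBuilder
  ESCAPES = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;", '"' => "&quot;" }.freeze

  def initialize
    @out = String.new
    @depth = 0
  end

  def to_s
    @out
  end

  # Any method the builder does not know becomes an element of that name.
  # Keyword arguments become attributes; a block becomes nested content.
  def method_missing(tag, content = nil, **attrs, &block)
    attr_text = attrs.map { |key, value| " #{key}=\"#{escape(value)}\"" }.join
    indent = "  " * @depth
    if block
      @out << "#{indent}<#{tag}#{attr_text}>\n"
      @depth += 1
      block.call
      @depth -= 1
      @out << "#{indent}</#{tag}>\n"
    else
      @out << "#{indent}<#{tag}#{attr_text}>#{escape(content)}</#{tag}>\n"
    end
    self
  end

  private

  # The quoting detail the DSL hides from its user.
  def escape(text)
    text.to_s.gsub(/[&<>"]/, ESCAPES)
  end
end

xml = MiniBuilder.new
xml.person(category: "employee") do
  xml.name("dave")
  xml.likes("programming")
end
puts xml.to_s
```

Nesting, text content, and attributes are all visible in the calling code. The person writing it still has to know that those are the concepts XML is made of.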

Seduced by Language

Over the years, people have looked at DSLs and wondered just how far they can be taken. Would it be possible to create a DSL that could be used by someone who wasn't a domain expert? So far, the answer is “no.” The problem is that abstractions leak: to do things in a domain, you need to know the domain. The folks who brought us Star Trek: TNG pretended otherwise. Jean-Luc Picard used an English language DSL to talk to his food dispenser. It worked every time. But, in the real world, you know that the first time someone said "Earl Grey, hot" to this magic box, they'd be surprised when a naked English peer covered in baby oil popped out.

The reality is that languages such as English, French, and so on, are imprecise. That ambiguity makes them powerful. Because of this, whenever we try to create a DSL that looks like a natural language, we fall short. Take AppleScript as an example. On the face of it, it looks nice and expressive—we're writing something that looks very natural. Here's an example from the Apple example scripts.


  set this_file to choose file without invisibles
  try
  	tell application "Image Events"
  		launch
  		set this_image to open this_file
  		scale this_image to size 640
  		save this_image with icon
  		close this_image
  	end tell
  on error error_message
  	display dialog error_message
  end try

Kind of makes sense, doesn't it? I thought so too. So, for years, I've been trying to get into AppleScript. I keep trying, and I keep failing. Because the language is deceptive. They try to make it English-like. But it isn't English. It's a programming language. And it has rules and a syntax that are very un-English-like. There's a major cognitive dissonance: I have to take ideas expressed in a natural language (the problem), then map them into an artificial language (the AppleScript programming model), but then write something that is a kind of faux natural language. (Piers Cawley calls these kinds of DSLs domain-specific pidgin, but my understanding is that pidgins are full languages, and our code hasn't got that far.)

What's the point? When you're writing logic like this, with exception handling, command sequencing, and (in more advanced examples) conditionals and loops, then what you're doing is programming. The domain is the world of code. If you're not up to programming, then you shouldn't be writing AppleScript. And if you are up to programming, then AppleScript just gets in your way.

But this isn't a discussion of AppleScript. That's just an example of the kind of trouble you get into when you forget what the domain is and try to create natural language DSLs.

Testing Times

Here's a little code from a test written using the test/spec framework.


  specify "should be a string" do
    @result.should.be.a.kind_of String
  end
  specify "value should be 'cat'" do
    @result.should.equal "cat"
  end

It's an elegant example of what can be done with Ruby. And, don't get me wrong. I'm not picking on Chris here. I think he's created a clever framework, and one that is likely to become quite popular.

But let's look at it from a DSL point of view. What is the domain? I'm thinking it is the specification of the correct behavior of programs. And who are the domain experts? That's a trickier question to answer. In an ideal world, it would be the business users. But, the reality is that if the business users had the time, patience, and inclination to write things at this level, they wouldn't need programmers. Don't kid yourselves: writing these specs is programming, and the domain experts are programmers.

As a programmer, a couple of things leap out at me from these tests. First, there's the duplication. The specify lines are a form of grouping, and each contains a string documenting what that group tests. But the whole point of the DSL part of the exercise is to make that blindingly obvious anyway. Now the BDD folks say that you write the specifications first, without any content, and then gradually add the tests in the blocks as you add supporting application code. I'd suggest that you might want to look at ways of removing the eventual duplication by transforming the specification into the test.

But for me the really worrying thing is the syntax. @result.should.be.a.kind_of String. It reads like English. But it isn't. The words are separated by periods, except the last two, where we have a space. As a programmer, I know why. But as a user, I worry about it. In the first example, we write @result.should.be.a.kind_of. Why not kind.of? If I want to test that floats are roughly equal, I'd have said @result.should.be.close value. Why not close.to value?
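The programmer's answer to "why the space?" is easier to see in code. What follows is a minimal sketch, assuming a hypothetical `Expectation` class; it is not the test/spec implementation, and it patches `Object` globally, which real code should think twice about. Each dotted word is a method call, the connector words simply hand back the same object, and only the final method takes an argument.

```ruby
# Hypothetical Expectation class, loosely imitating the chaining style.
# This is NOT how test/spec is implemented.
class Expectation
  def initialize(actual)
    @actual = actual
  end

  # Connector words: they do nothing except keep the chain going.
  def be
    self
  end

  def a
    self
  end

  # Only the last link takes an argument, so only it can be followed
  # by a space instead of a period.
  def kind_of(klass)
    return true if @actual.is_a?(klass)
    raise "expected #{@actual.inspect} to be a kind of #{klass}"
  end

  def equal(expected)
    return true if @actual == expected
    raise "expected #{@actual.inspect} to equal #{expected.inspect}"
  end
end

# Global monkey patch, used here only to keep the sketch short.
class Object
  def should
    Expectation.new(self)
  end
end

result = "cat"
result.should.be.a.kind_of String
result.should.equal "cat"
puts "both expectations passed"
```

Under these assumptions, `kind.of` would need a `kind` connector plus an `of` method that takes the class, and `close.to` would need the same treatment. Every extra English word is another method somebody has to write and somebody else has to remember.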

Trivial details, but it means that I can't just write tests using my knowledge of English: I have to look things up. And if I have to do that, why not just use a language/API that is closer to the domain of specifications and testing? Chris's work is great, but it illustrates how a DSL that pretends to be English can never really get there. The domain of his language is software development; it would be perfectly OK to produce a DSL that makes sense in that domain.

RSpec is another behavior-driven testing framework. Here's part of a specification (or should it be test?).


  describe "(empty)" do

    it { @stack.should be_empty }

    it_should_behave_like "non-full Stack"

    it "should complain when sent #peek" do
      lambda { @stack.peek }.should raise_error(StackUnderflowError)
    end

    it "should complain when sent #pop" do
      lambda { @stack.pop }.should raise_error(StackUnderflowError)
    end

  end

Another nice, readable piece of code, full of clever Ruby tricks. But, again, the attempt to create a natural language feel in the DSL leads to all sorts of leaks in the abstraction. Look at the use of should. We have should be_empty. Here, the actual assertion is (somewhat surprisingly) "should be_". That's right: the be_ part is really part of the should, indicating that what follows the underscore is a predicate method to be called (after adding a question mark, so we'd call @stack.empty? in this case). Then we have another way of using should in the phrase it_should_behave_like, all one word. Then there's a third way of using should when we reach should raise_error. And, of course, all these uses of should differ from the use in test/spec, even though both strive for an English-like interface. The same kinds of dissonance occur with the use of it in the first three lines (it {...} vs. it_should_... vs. it "...").
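To show what "what follows the underscore is a predicate method" means mechanically, here is a minimal sketch. The names `PredicateMatcher`, `PredicateWords`, and `StackExamples` are all made up for this illustration, and RSpec's real matchers are considerably more elaborate; this only demonstrates the general `method_missing` technique of turning `be_empty` into a call to `empty?`.

```ruby
# Hypothetical sketch of the be_<predicate> idea. NOT RSpec's implementation.
class PredicateMatcher
  attr_reader :predicate

  def initialize(predicate)
    @predicate = predicate
  end

  def matches?(actual)
    actual.public_send(predicate) ? true : false
  end
end

module PredicateWords
  # be_empty is not defined anywhere; it is intercepted here and
  # rewritten into a matcher for the predicate method :empty?.
  def method_missing(name, *args, &block)
    word = name.to_s
    return super unless word.start_with?("be_")
    PredicateMatcher.new("#{word.sub(/\Abe_/, '')}?".to_sym)
  end

  def respond_to_missing?(name, include_private = false)
    name.to_s.start_with?("be_") || super
  end
end

# Global monkey patch, used here only to keep the sketch short.
class Object
  def should(matcher)
    return true if matcher.matches?(self)
    raise "expected #{inspect} to answer true to #{matcher.predicate}"
  end
end

class StackExamples
  include PredicateWords

  def run
    stack = []
    stack.should be_empty # ends up calling stack.empty?
  end
end

puts StackExamples.new.run
```

Note that this `should` takes an argument, while the test/spec-style `should` takes none and is followed by more dotted words. Same English word, two different programming contracts.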

It's a Domain Language

Just to reiterate, I'm not bashing either of these testing frameworks—they are popular and I'm in favor of anything that brings folks to the practices of testing.

However, I am concerned that the popularity of these frameworks, and other similar uses of English-as-a-DSL, may lead developers astray. Martin Fowler writes about fluent interfaces. I think his work might have been misunderstood—the fluency here is programmer fluency, not English fluency. It's writing succinct, expressive code (and, in particular, using method chaining where appropriate).

The language in a DSL should be the language of the domain, not the natural language of the developer. Resist the temptation to use cute tricks to make the DSL more like a natural, human language. By doing so you might add to its readability, but I can guarantee that you'll be taking away from its writability, and you'll be adding uncertainty and ambiguity (the strengths of natural languages). The second you find yourself writing


  def a
    self
  end

so that you can use "a" as a connector in

  
    add.a.diary.entry.for("Lunch").for(August.10.at(3.pm))

you know you've crossed a line. This is no longer a DSL. It's broken English.

Comments

I think there is another problem.
People seem to put together multiple paradigms as a "domain specific language". For example, someone said that BLAST is a domain specific language. BLAST consists only of 20 amino acids arranged in different orders. So you end up with a text file that has something like "PKKRKVSDNN" and so on.

That's it, basically. Sure, this file is specific to the "domain" of biology, but is this a domain specific language? I think not.
It lacks some power. ActiveRecord, however, is a true domain specific language as far as I am concerned.

However, there exists another problem:
on the one hand, we have interface oriented design (with a touch of DSL to it):
- order 3 pizza
or even funnier
- order 3 pizza in 3 hours if lowest_price_a_piece < 5 $

And the various components, what qualifies as a pizza, could be described too. I think for a user, this is pretty understandable, and I think a system should be able to both use such a language, and also present a FLEXIBLE GUI to the user. (Empower the user)

But what do you see in something like ActiveRecord? Everything must be valid Ruby syntax! You can't just continue if you have an error in the syntax ...
Ruby syntax is beautiful, it is the cleanest language as far as I am concerned, especially because you can omit the () at any time, but I don't think that a "perfect" domain specific language should struggle with syntax issues at all, at least not at the language level.

For me, a real DSL is more like those that were used in the game Zak McKracken (although it's ancient, and not a good example for today's world, but back then it was kinda cool).

Take your example of:
add.a.diary.entry.for("Lunch").for(August.10.at(3.pm))

It is quite nice looking, but why would we need to use the . ?
Of course because ruby needs them. And the DSL needs to reside on the ruby-level.

On the one hand you have all of ruby power, but I think it would be better to have a language that lies outside of ruby, and providing the user with a fault-tolerant, terse, human-readable syntax (which btw is my main problem with XML, it tends to get very verbose and ugly IMO)

The focus should be on the D in DSL, whereas some of us are fixating more on the L. I can see it.

Great post. 100% agreement from me.

N.B. A pidgin is a rudimentary proto-language. When a pidgin evolves beyond a simple combination of its parent languages to become a language in its own right, it becomes a "creole". So "domain-specific pidgin" might be a useful term.

As far as RSpec is concerned, there's another issue: While your code might satisfy all the specifications and stories you can put together with RSpec, that might not be enough.
The reason for this is the fact that RSpec changes the tested object by adding methods which end up as methods of Object. This in turn means that you didn't actually test the thing which is used in the production environment, but something (if only slightly) different. But this difference might make, well, the difference. (Yes, I know that I am nitpicking.)
I think that RSpec really is a good tool to specify desired behaviour, but there is a reason that specification and testing are different tasks in software development. Mixing these two up might cause trouble.

How does Jay Fields' Expectations testing DSL compare with the 'specs? http://blog.jayfields.com/2007/12/ruby-expectation-gem.html

Enjoyed the article very much. I have noticed a number of things lately popping up that I think are trying too hard to be English. Your last example is clearly crossing the line but there are some places where it is gray. For example in ActiveSupport we have:

3.days.from_now

On one hand you might argue this is going too far. On the other hand this notation is the easiest method of doing date manipulation I have ever seen. Prior to using it I always had to look up the reference documentation whenever I deal with dates. Now with this syntax I can always do whatever manipulation I need off the top of my head.

The reason I bring this up is that your last example is really just an extreme version of 3.days.from_now. Your example is obviously wrong but the ActiveSupport date manipulation is genuinely useful. So my question is when does it move from genuinely useful to obviously wrong. It seems the line is blurry.

My other complaint with DSL's is that with languages like Ruby most of them have become internal DSLs. Now internal DSLs have some nice advantages. No parsing, full power of a language, etc. But I don't think all DSLs should be internal. Sometimes you can get much more expressive by breaking the confines of your language syntax.

Maybe things like Rubinius will give us a system where we get the ease and integration of creating an internal DSL but the flexibility of an external DSL. But until then people need to really think if their DSL should be internal or external.

"By doing so you might add to its readability, but I can guarantee that you'll be taking away from its writability". So it's a trade-off, and actually it's often not a bad trade-off to err on the side of readability rather than writability. We read code more often than we write it. And this is why Perl takes a lot of flak, as (it's perceived that) ease of writing takes precedence over ease of reading.

In your last example, it's really easy to comprehend the intent of the code. That's a big plus, in my book. A minus is that it's more wearisome to write -- but a more serious issue is the actual action of the code is obscured. Win some, lose some, but I'm not yet convinced of the absolute demerit of fake-English DSLs.

Eric:

I agree that 3.days.from_now is a wonderful little DSL. But, I think it's readable English because good names were chosen for the methods, in the same way that array.size is readable. The intent here was to find a way to express scaling and relative times, and not first and foremost to create an English-like syntax. Having worked out the way they wanted it to work in the domain (that is, they'd use seconds as the unit, seconds from epoch for absolute time, and not worry about checking the usage), the readability came along easily. They didn't have to distort the code to make it English-like.


Dave

Matt:

The tradeoff is two-fold: writability, but also all the additional code that you need to carry around to implement the DSL.


Dave

Hooray! At last somebody well known is saying this too. Natural languages have serious problems for programming - as the COBOL standards committee found in 1959.

I don't like it when people contort their object model - shifting responsibilities around to the wrong places just to make random fragments of code "read more like English". This isn't a subjective thing: this approach can end up with code that runs more slowly because the methods are in the wrong classes.

I discussed this in "When shouldn't you use a DSL?" at http://skillsmatter.com/menu/844

I have been trying to get into AppleScript for years too, and I also fail every time. Around me, other people are making AppleScript do something, and yet I can't make it do anything.

They have told me that I'm just too much of a programmer, but it seems the only way to do AppleScript is to memorise vast passages of existing code like a Bible scholar.

Another issue with DSLs in "Not Quite English" is the fact that many programmers are not very fluent in English or are not native English speakers. Which means that they will struggle with the English part of the DSL.
Now, while most programming languages derive parts of the language from English (keywords, class and method names etc.), these parts are easy to understand even without understanding English very well. (At least they were for me when I started programming quite a few years ago when I didn't know much about the English language.)
In the end I think a DSL shouldn't be designed too close to a natural language (English or not) because it makes it harder to learn - and might interfere with your English learning (which I think you should also do).

A language that suffers heavily from the problem above is the text adventure design language Inform 7 (http://www.inform-fiction.org). It is an amazing effort, and when using it as intended it makes writing an adventure game a pleasure, but when you start using your knowledge of English to accomplish new things, your knowledge interferes and Inform misunderstands.

Still, I think Inform 7 may be the furthest anyone has gone in making an English computer language.

Dave, fantastic post. I've never been able to clearly express why I dislike AppleScript, but you've done it beautifully. I'm still a fairly new Rails programmer, with only a couple of applications completed, and I'm still sticking with Test::Unit. RSpec looks seductive, and I'm glad to read your critiques.

AppleScript is a curious case, and worth a little more analysis, IMHO.

AppleScript's keyword-oriented appearance is in itself a Good Thing, given that one of its major requirements is that non-programmers should be able to understand a program's general purpose and action just from reading it. In order to achieve that goal, most of the usual symbolic cues that describe a language's semantics - dots, braces, etc. - were removed from the AppleScript syntax. This brings us to AppleScript's primary problem: rather than finding some other way to supply this structural information to the user - e.g. via a structure editor such as Carnegie Mellon's Alice - it was *left out completely*. So while it's very easy to read an AppleScript and understand _what_ it does, it's damnably hard to figure out exactly _how_ it does it, because the syntax effectively obfuscates, not clarifies, the language semantics. Hence AppleScript's reputation as a "read-only language".

A secondary problem with AppleScript's appearance is that it accidentally encourages unrealistic user assumptions. With AppleScript looking so English-like, naïve newcomers automatically assume that because it _looks_ like English, it will also _behave_ like it. Doesn't matter how many times the documentation says "English-LIKE syntax" or lists all its formal rules and restrictions, the moment that a newcomer sees that AppleScript code looks quite like something they already know (i.e. written English), they *immediately* form all sorts of very strong associations and conclusions about its nature, which then have to be undone the long, hard way. Again, presenting it within an unfamiliar environment such as a structure editor would have gone a long way to preventing these assumptions being prematurely formed - slowing new users down a bit so that they can grow a more realistic mental model over time, while still retaining the initial high-level readability that makes the language appealing to the end-user audience in the first place.

(There are other problems too, but most of these are actually caused by applications' badly designed/implemented APIs and frequent lack of adequate API documentation - a problem which is hardly unique or specific to AppleScript.)

I do believe AppleScript should serve as a significant object lesson to anyone else thinking of designing their own language, domain-specific or not, and I heartily recommend the following paper by one of the original designers:

The Development of AppleScript (W. Cook, 2006)
http://www.cs.utexas.edu/users/wcook/Drafts/2006/ashopl.pdf

HTH

has
--
http://appscript.sourceforge.net

p.s. & FWIW, here's your AppleScript example in Ruby (with apologies for lame formatting):


require 'osax'; include Appscript, OSAX

this_file = osax.choose_file(:invisibles => false)
begin
  ie = app('Image Events')
  ie.launch
  this_image = ie.open(this_file)
  this_image.scale(:to_size => 640)
  this_image.save(:icon => true)
  this_image.close
rescue => e
  osax.display_dialog(e.to_s)
end


While the extra syntax makes the semantics clear for someone who already knows Ruby, to someone who doesn't, it's just extra noise that provides no help in understanding the script's detailed mechanics while also making it harder to grasp its overall purpose.

So I think AppleScript's original designers had the right overall idea, considering the type of audience they were targeting. It's just the execution that they ultimately flubbed, although this is perhaps understandable given that end-user programming research wasn't nearly as advanced at the time and they were largely working from a blank slate.

(OTOH, it's just a little irksome that after 15 years Apple are still beating the same lame horse painfully onwards, instead of learning from its mistakes and coming up with a better replacement; but that's a separate rant for another time and place...)

Awesome post. I totally agree that the RSpec is not a DSL. I always had to look for the documentation when I had to write the test.

Hi. I try in SqlOrm to define a DSL for dynamic sql queries

sqlorm.sourceforge.net

The main motivation has been to enable programmers to specify the various "bodies" an SQL statement consists of, regardless of the required order of these bodies. The framework takes care of this. You could say that the domain here is just plain SQL.

Nice post, Dave.

I've just finished a Ruby EAI test tool, which has a DSL on top to define parameterized testing scenarios.

I think the most valuable usage of Ruby internal DSL techniques (metaprogramming) is to express relationships between things by syntactical structure (nested blocks) rather than explicit method calls - which are still in place behind the scenes of course.
Also, presenting domain-specific vocabulary as if it was part of the language is a great technique. By vocabulary I mean single words like, in my case, "testsuite", "scenario", or "activity".

The language is targeted at test designers, who need to learn this domain-specific vocabulary. That makes the most sense to me.

The idiom to form sentences with dot-separated methods did not attract me much. I've seen a piece of Java code lately, I think it was part of the Apache ServiceMix, where this idiom has been used. It did not help me to understand anything easier. It was way over the top.

One should not squeeze idioms out of a language that has not been built for it, just because it is currently hot. It's cool with Ruby, but Java has not been designed for this. Even C with its macro preprocessor was closer :-)

I think you have to differentiate with regard to the target audience of your DSL. If it is a business DSL (see Jay Fields) you must put much more effort into simple structures (small vocabulary, simple grammar but expressive/concise) for easy reading AND writing by business people. There an external DSL is almost always preferable (no programming language syntax rules). Another important thing there is editor support. We are all used to our IntelliSense, type-completion IDEs when programming in statically typed languages. So editor support with highlighting, word completion, grammar check and preferably a simulator for the DSL expression just typed in is very helpful in this regard.

On the other hand you have developers. These are used to programming languages and their syntax requirements. So you can build fluent, concise APIs that expose the functionality in a readable, human(developer) friendly way.
Here the advantages of internal DSLs are a real benefit: integrating the DSL into your application is dead easy, having no extra parsing step helps as well, and the users are used to this kind of syntax and will have no problem reading and writing these more program-like DSLs.
You can skip all the "English like" syntactic sugar that adds not so much value in readability but makes writing the expressions harder.

Michael

adding to the examples used by Dave:
* Rake, Ant
* Date & Time API
* JMock
* Jequel - an internal fluent Java-SQL-DSL, using the real tables and fields which allows you to program SQL in a typesafe manner, see http://jequel.de
Example:
select(ARTICLE.OID)
.from(ARTICLE, ARTICLE_COLOR)
.where(ARTICLE.OID.eq(ARTICLE_COLOR.ARTICLE_OID)
.and(ARTICLE.ARTICLE_NO.is_not(NULL)));

More Resources: Martin Fowler's DSL book in progress:
* http://martinfowler.com/dslwip
* Thoughtworks DSL podcasts
