The ActiveRecord component in Rails offers a convenient and powerful interface between the set-oriented world of relational databases and the object-oriented world of Ruby programs. However, there's a potential bug lurking in many (if not most) Rails applications due to a subtle implication of the fact that sets, and hence database result sets, and not ordered.
Take a simple ActiveRecord call such as Post.first. Ask Rails developers what this does, and most will say that it returns the first row from the posts table. And, most of the time for small to medium size tables, on most database engines, it does. But thats purely a coincidence, because SQL does not define the order of rows in an SQL result set—database engines are free to return rows in an order that is convenient for them unless an explicit order by clause is used. But the SQL generated by ActiveRecord for this query is select `posts`.* from `posts` limit 1.
When talking about select statements, the Mysql reference says: You may have noticed in the preceding examples that the result rows are displayed in no particular order. The Oracle documentation says Without an order_by_clause, no guarantee exists that the same query executed more than once will retrieve rows in the same order. And PostgreSQL says If ORDER BY is not given, the rows are returned in whatever order the system finds fastest to produce.
So that innocent select statement is just returning a row at the whim of the database engine. It could be the first. It could be the 42nd. It could be any row. The same applies to queries using limit and offset, often used to paginate results. Call Post.limit(10).offset(10) and ActiveRecord executes select `posts`.* from `posts` limit 10 offset 10. Again, there's no ordering applied, and no guarantee that the same rows will be returned given the same query.
Does this actually affect us? Not often. In fact, probably you're never seen it happen. I have seen the results of a query change when using Oracle. As a table filled, Oracle decided to reorganize an index. As a result, paginating through a set of orders suddenly stopped displaying orders in date order. Adding an explicit order by fixed it.
The moral? Well, first, this isn't a big deal. But, whenever you use finders that assume an ordering in a result set, make sure you make the order explicit—add an order() call to the ARel chain. If you want first() to be compatible with last(), add order("id") to the call to first() (because, somewhat inconsistently, last() currently does add an order by id clause). If you want your paginated result sets to be consistent, make sure you order them (perhaps by id, or by created_at).




Very good info that really needs some education. As the author of the SQL Server adapter, we have had to go one step further to allow rails'ish queries to work and not return nondeterministic results. The biggest is window or pagination result sets. It is very common to do limits and offset (even in test code) without giving an order. SQL Server will just not execute such a query. So in our adapter and Arel visitor, we have to shim in a primary key column if no order was specified. Else "conventional" code examples and ActiveRecord tests would just fail.
Posted by: Metaskills | March 21, 2012 at 10:50 PM
Nice post. I actually have an open pull request on Rails where I try to make `first` more consistent.
If someone is interested: https://github.com/rails/rails/pull/5153
Posted by: Mhfsilveira | March 22, 2012 at 12:02 AM
This has bugged me since it was added. But I have changed the way I read it.
first = The first record you can find (I do not expect it to be the first record inserted).
Posted by: Stjernstrom | March 22, 2012 at 02:41 AM
As Dave noticed the same things will be applied to Post.limit(10) etc. (not only for first/last) and to any pagination you use if you didn't set order by.
Posted by: Alovak | March 22, 2012 at 03:12 AM
Finally somebody speaks it out loud. Thank you, Dave.
I think that no one who has
(1) ever looked at a SQL query log generated by an ActiveRecord-based application, and
(2) knows a bit of the relational model and its implementation by today's off-the-shelf RDBMSs
should be really surprised to see Dave's post.
[plug shameless="true"]
The absence of ordering guarantees is one of the driving forces behind our work on alternative database bindings for Ruby. The result is called "Switch":
http://db.inf.uni-tuebingen.de/files/publications/off-the-beaten-track.pdf
With Switch, last(), first(), etc. all make sense. In fact, many more order-sensitive operations are supported.
[/plug]
Rails and ActiveRecord shouldn't feel too bad, though. The same caveat/bug applies to .NET and LINQ and the dubious semantics of its order-based query operations (Last(), Take(), Drop(), etc.).
Best wishes,
—Torsten
Posted by: Teggy | March 22, 2012 at 04:49 AM
I myself have occasionally wondered why Rails doesn't sensibly default queries to order by primary key absent an explicit ordering.
Posted by: Donaldball | March 22, 2012 at 09:20 AM
@Donaldball Agreed. Or log a warning if there's a LIMIT query without an ORDER BY.
Posted by: StephenBallNC | March 22, 2012 at 10:51 AM
Yes, coders should "see" Model.all.rand whenever they code Model.first.
Posted by: PaulDacus99 | March 22, 2012 at 12:40 PM
Is assumption here to have Rails actually do Top n query? If not, I am not sure this is a bug. There is a difference between .first and Top 1.
Posted by: Greg Piatigorski | March 23, 2012 at 10:38 AM
Greg:
Nothing to do with top n. But the name "first" implies ordering where there is none.
And in the case of "offset", there's genuine potential for incorrect results.
Posted by: Dave Thomas | March 23, 2012 at 10:50 AM
Dave, I am a long time DBA, going back to early '80s, working with mainframe to mid to personal db engines, and perhaps my take on the subject is so different here. .first, to me, implies ANY record, at least to someone who has been a DBA for so long. SQL is a pretty powerful and flexible language and as with any other language out there one has to code for specific results if one wants specific results. Rails is a binder, not an SQL "cop", IMHO. Should Rails be changed to also issue DISTINCT clause, or GROUP BY on every data request when this ability is already built into SQL from the start, as is ORDER BY? I am not sure what the issue is when .first means ANY record from a result set. Absolutely so. There is a reason SQL has WHERE clause, ORDER BY, DISTINCT, GROUP BY and all other tricks and abilities built in. .first with WHERE clause should do the job just fine?
Posted by: Greg Piatigorski | March 23, 2012 at 12:50 PM
This behavior could be changed to force expected behavior simply by adding a "default_scope order(:id)" to a module that is made available to all your activerecord model classes, no? Good find regardless, because it is technically correct ("the best kind of correct")!
Posted by: A Facebook User | March 26, 2012 at 03:16 PM
default_scope is problematic: see http://pragdave.blogs.pragprog.com/pragdave/2012/03/be-careful-using-default_scope-and-order.html
Posted by: Dave Thomas | March 26, 2012 at 06:50 PM