Ruby symbols have always been immediate objects. That means that, inside the interpreter, they are represented as small integers that reference the corresponding symbol text in a lookup table. This makes them fast, but it also leaves code open to denial-of-service attacks (particularly in the context of web applications): malicious clients can force server code to create arbitrary numbers of symbol table entries, and these are never garbage collected.
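Here is a minimal sketch of the problem, assuming an interpreter that provides `Symbol.all_symbols`. The `param_..._from_client` names are hypothetical stand-ins for keys supplied by a client. It shows that a symbol literal always denotes one object, and that converting arbitrary strings to symbols adds entries to the interpreter's table. Later Ruby releases can garbage-collect dynamically created symbols, so treat the reported growth as illustrative rather than guaranteed.

```ruby
# The same symbol literal always refers to one object.
same_object = :status.equal?(:status)

# Two strings built separately have equal content but are distinct objects.
first = String.new("status")
second = String.new("status")
distinct_strings = !first.equal?(second)

# Hypothetical untrusted input: each previously unseen name becomes a new symbol.
before = Symbol.all_symbols.size
created = (1..1000).map { |i| "param_#{i}_from_client".to_sym }
after = Symbol.all_symbols.size

puts "symbol literal is one object: #{same_object}"
puts "separate strings are distinct: #{distinct_strings}"
puts "symbols created from input: #{created.size}"
puts "symbol table size changed by: #{after - before}"
```

The array `created` keeps every new symbol referenced, which is roughly what happens when such symbols end up as hash keys in long-lived server state.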
Some recent changes in Ruby 1.9 point to a transition away from symbols being immediate objects. In particular, they lose their integer representation, and hence the methods Fixnum#id2name, Fixnum#to_sym, and Symbol#to_i have been removed. I'm expecting to see symbols migrate to the heap as 1.9 continues to evolve.
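A quick way to check whether the interpreter you are running has dropped the integer representation is sketched below. It assumes Ruby 1.9 or later and uses only `respond_to?`, so it does not fail when the methods are missing.

```ruby
# On 1.9 and later, symbols no longer convert to integers ...
symbol_has_to_i = :name.respond_to?(:to_i)

# ... and integers no longer convert to symbols.
integer_has_to_sym = 1.respond_to?(:to_sym)
integer_has_id2name = 1.respond_to?(:id2name)

# Conversion between symbols and strings is still available.
round_trip = :name.to_s.to_sym

puts "Symbol#to_i present: #{symbol_has_to_i}"
puts "Integer#to_sym present: #{integer_has_to_sym}"
puts "Integer#id2name present: #{integer_has_id2name}"
puts "round trip through a string: #{round_trip.inspect}"
```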
Hi Dave,
What sort of effect(s) are anticipated with regard to performance? I am assuming that there will be some decrease in performance due to lookup overhead, but is the current thinking that the performance hit will be more than offset by YARV (or some other VM)?
Regards,
Charles McKnight
Posted by: Charles McKnight | May 12, 2008 at 11:52 PM
Charles:
I can't say, because the implementation isn't available to play with. However, it's possible to imagine ways of implementing this that wouldn't be significantly slower than now, as symbols would still be singletons, and therefore tests for equality could just use their object IDs.
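A minimal sketch of that idea, using only standard methods (`equal?`, `object_id`, `to_sym`). The variable names are illustrative. It says nothing about how a future implementation would store symbols, only that the singleton property makes an identity check sufficient.

```ruby
literal = :created_at
converted = "created_at".to_sym

# Both expressions yield the same singleton object ...
same_symbol = literal.equal?(converted)
# ... so comparing object IDs gives the same answer as ==.
ids_match = literal.object_id == converted.object_id

# Strings, by contrast, can be equal in content while being separate objects.
a = String.new("created_at")
b = String.new("created_at")
equal_content = (a == b)
same_string_object = a.equal?(b)

puts "symbols are the same object: #{same_symbol}"
puts "symbol object IDs match: #{ids_match}"
puts "strings have equal content: #{equal_content}"
puts "strings are the same object: #{same_string_object}"
```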
Dave
Posted by: Dave Thomas | May 13, 2008 at 06:17 AM
Since symbols always seem to trip up Ruby beginners, I am in favor of seeing the divide between symbols and strings narrow. I don't really see a problem with how Java handles string interning. If it appears as a literal within your program, that string is automatically interned, making it efficient to use strings as constants.
I know that doesn't handle the problem of people symbolizing user-submitted data (eck), but it seems to have worked well enough up to this point for Java, right?
Posted by: Collin VanDyck | May 13, 2008 at 08:12 AM
This seems like a terrible idea. If symbols won't be interned strings, what will even be the point of them any more?
Posted by: Stephen Touset | May 13, 2008 at 10:53 AM
Although I concur that using symbols to intern user-supplied data is, er, well, not good, I would hate to take any sort of performance hit over it (yes, performance is a concern for one of the projects I'm working on).
Also, I wonder how much existing code will break? I guess we will just have to wait for the actual implementation to be available and bench it.
Posted by: Charles McKnight | May 13, 2008 at 10:54 PM