Why Ruby Has Symbols (dmitrytsepelev.dev)
142 points by dmitrytsepelev 1534 days ago
14 comments

Ruby doesn't have symbols because of AST or VM details.

Ruby has symbols in all probability because Lisp and Smalltalk have symbols.

It could get most of the same practical upside of symbols from interned strings - the important thing is being able to compare using pointer equality and look up hash tables without needing to walk a string. What symbols at the type level do is ensure that these string-like things have already been interned, that is, de-duplicated, when they hit lookup points like member access.

But the implementation could do something very similar behind the scenes by setting a bit on interned string values. Besides, symbols aren't enough for the more advanced dynamic language optimization techniques like you see in V8.
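
For what it's worth, Ruby itself can illustrate the interned-string idea. This is a minimal sketch, assuming CRuby 2.5 or later, where `String#-@` returns a frozen, de-duplicated copy of a string. The helper name `same_object?` is made up for the example.

```ruby
# Hypothetical helper: true when both references point at the same object.
def same_object?(a, b)
  a.equal?(b)
end

# Two strings with equal contents, built at runtime, are distinct objects.
built_one = %w[mem ber].join
built_two = %w[mem ber].join

# String#-@ returns a frozen, de-duplicated (interned) copy.
interned_one = -built_one
interned_two = -built_two

puts same_object?(built_one, built_two)        # separate objects
puts same_object?(interned_one, interned_two)  # one shared object after interning
puts same_object?(:member, "member".to_sym)    # symbols are interned by construction
```

The symbol version gets the guarantee from the type, while the string version only holds if every call site remembers to intern first.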

I'd say that Ruby has symbols because Ruby has mutable strings.

If your strings are immutable and interned, they are as good as symbols; this is why Python does not have symbols.

ECMAScript introduced symbols because JavaScript strings, while immutable, are not necessarily interned. Symbols are much cheaper to compare for equality: you only need to compare the pointers / ids, not actual string bytes.

Lisp has symbols for the same reason: Lisp strings are vectors, which are also mutable.

Lisp has symbols, because they were used in symbolic expressions (s-expressions) as named entities. In the programming language Lisp these symbols serve also as identifiers for functions, variables and other things. Thus a symbol originally had an internal structure made of an association list (a list of keys and values). That association list then had various entries, including a print name -> the thing to print when a symbol gets externalized. Since symbols can serve as function names, these symbols also had functions stored in their association list. Different function types could be stored under different keys.

Since Lisp symbols serve a central role as identifiers and structured objects, they are not like what Ruby uses. Lisp uses symbols also for named interned things, but that is only one purpose.

In Common Lisp symbols have a name, a value, a function, a package and a property list (a list of keys and their values). By default in a call like (mult 1 2 3), the global function will be retrieved from the symbol and the function will be called with the arguments. The property list sometimes will be used by an IDE to store information about the symbol: like where it was defined, what its definition is and similar.

At least in V8 they are last I checked. The symbols feature is a property privacy feature. A symbol can be treated as a private secret owned by a library thus restricting access to a property on a shared object.
This is untrue. Symbols provide no privacy. They provide a mechanism to avoid collisions. You can even ask an object for its symbol properties: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Refe...
JS symbols are the opposite of what you are saying.

If you have five instances of `:foo` in Ruby, you can guarantee they will be IDENTICAL.

If you have five instances of `Symbol('foo')` in JavaScript, you are guaranteed they will be completely DIFFERENT.

JS symbols are like Common Lisp's `gensym` which it uses to guarantee macro variables won't collide with existing variable names.

I feel quite confident that Ruby has Symbols because Strings are mutable, which causes issues when you hold on to something but also give out a reference.
> It could get most of the same practical upside of symbols from interned strings

I don't think so - unless you mean always interning all strings. The point of symbols is you can do a single address comparison. How can you do that if you could have two strings that are the same but have different addresses?

There is also a down-side of symbols - they by definition always escape the compilation unit since they're interned!

Always interning all strings is what Lua does, and the concept of a symbol in Lua is merely a particular string pattern which the parser will recognize. They aren't syntactically identical, you can replace any .field with ["field"] but you can't say `local ["field"] = value`, but there is no distinction in the types.

I get a lot of use out of both of those decisions (immutable strings and string/symbol identity), they work well together, and I'd (much) rather have the problem of string-builders than the problem of tracking references to strings and copying them if I need both the original and revision.

> Always interning all strings is what Lua does

That'd be catastrophic for performance in Ruby - every string allocation would always have to be reified, and would always need to access a shared data structure.

I'll take your word for that. It's the opposite in Lua, which is several times faster than stock Ruby. If your runtime takes it as a given that every string will be interned, there are all sorts of assumptions this enables which mutation invalidates.
> If your runtime takes it as a given that every string will be interned, there are all sorts of assumptions this enables which mutation invalidates.

Yes but if you're regularly creating strings, which is what Ruby web servers do all the time, then your intern table is going to become a white-hot hotspot, contended by all threads all the time.

> I don't think so - unless you mean always interning all strings. The point of symbols is you can do a single address comparison. How can you do that if you could have two strings that are the same but have different addresses?

You intern all the literals (which includes lexical symbols) and are 99% of the way there.

> and are 99% of the way there

I don't understand - if you're not 100% of the way there then you can't rely on address comparison. 1% of your string comparisons would fail!
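
To make the objection concrete, here is a minimal Ruby sketch. A frozen literal and an equal string assembled at runtime are different objects, so a check based only on identity misses the match unless the runtime string is interned too. `identity_match?` is a hypothetical name for such a fast-path check.

```ruby
# Hypothetical fast-path comparison that only checks object identity.
def identity_match?(a, b)
  a.equal?(b)
end

literal = "status".freeze     # a literal, the kind of string that is easy to intern
runtime = %w[sta tus].join    # same characters, assembled at runtime

puts literal == runtime                               # content comparison succeeds
puts identity_match?(literal, runtime)                # identity comparison does not
puts identity_match?(literal.to_sym, runtime.to_sym)  # both sides interned as symbols
```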

That's a better explanation. There is a clear Smalltalk influence in Ruby, especially around the object-oriented aspects of the language. The best example is how the language doesn't call a function, but sends a message to an object. And also how everything is an object. Matz talked quite a bit about the various other languages that influenced the design and Smalltalk and Lisp are part of that list (and Perl).
Prolog also has symbols (called "atoms"). Erlang too (influenced by Prolog).
Erlang has symbols because its strings are ridiculously expensive (and kinda shit), so while it does have immutable strings identifying objects based on that would be ridiculously costly.
Interned strings are fine if you don't have mutable strings, but for one Ruby does have mutable strings and two it's nice having that syntactic sugar! Makes it clear that some value is something programmer-written, or at least programmer endorsed. I don't use python much but I do wish there was an alternative syntax for strings I only plan on using like symbols.
> It could get most of the same practical upside of symbols from interned strings

Aren't symbols and interned strings the same thing? Of course you can get all the upside of symbols by having symbols...?

Yeah—to reference a symbol `:foobar` in MRI's C API, you even call `ID2SYM(rb_intern("foobar"))`
> When AST is built, it is validated to make sure it makes sense (that’s called lexing) and converted it to the bytecode.

I've never heard "lexing" used this way, and I believe it's simply incorrect. Lexing (tokenizing) precedes parsing (parse tree and then syntax tree construction). It isn't syntax tree validation.

Or so I thought. Are there other examples (besides this article) of "lexing" also being used to mean something else?

Thank you for catching that! Not sure where my eyes were, feels like it's a remnant of another version of the sentence. Removed
No problem! Same thing happens to me, where my eyes begin to gloss over things I've written (because I know what I mean to say).

Thanks for the submission!

BTW, a good technique for catching those kinds of mistakes is to read the piece out loud. Engaging more of your nervous system makes the visual elision easier to detect.
I think it's called "elaboration" (in Standard ML) .. into elaborative semantics (an AST that makes sense for the evaluation phase).
> validated to make sure it makes sense

Indeed, I always thought that it's called "syntax checking" :)

To be honest, while I've dabbled in Ruby, I've never understood the difference on an intuitive gut level like other foreign constructs unique to individual languages I've gone deep with.

I don't know what it was about this article but I feel slightly more confused now. It jumps to bytecode before explaining how they're useful at the higher level of abstraction that is the developer.

How do symbols help you think about and find solutions to problems and then implement those solutions? I.e., what is the facility they provide that is not present in some more traditional OOP language (say PHP?)

The article is an extremely needlessly complicated explanation of a relatively simple principle: symbols are stored once and referenced by pointer, strings are stored multiple times and not so easily compared. Ergo, for many use-cases symbols are faster and use less memory.

Some of these use cases are little tokens like single words that are used as values in function arguments, or a switch statement. On the other hand, storing the user's inputted name as a symbol whilst copying that data to your model object is probably not a good idea.
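
A rough sketch of that split, with made-up names (`shipping_cost` and the speed tokens are hypothetical): a small fixed vocabulary goes in as symbols, while free-form input stays a String.

```ruby
# Hypothetical example: a small, fixed vocabulary of tokens passed as symbols.
def shipping_cost(speed)
  case speed
  when :standard then 5
  when :express  then 15
  else raise ArgumentError, "unknown speed: #{speed.inspect}"
  end
end

puts shipping_cost(:express)

# Free-form user input stays a String: it is text, not a name in the program.
user_name = "Ada Lovelace"
puts user_name.class
```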

Yes this explanation leaves out a lot of detail.

They seem similar enough to Clojure keywords:

Those are more focused and simpler than strings in say PHP. And they are first class, unlike members in Java.

They are primarily first class names in your program. Think of them as distinct elements in a set, as opposed to arbitrary text to be transformed and parsed.

If I give you a symbol (or keyword) then you know it is a name. If I give you a string, it could be anything really.

In Scheme you usually use symbols instead of strings because the whole equality story is simpler and much faster. I once changed a tight loop in some code from dispatching on strings to dispatching on symbols and got a 15% speedup. Why? String equality is expensive. A symbol is basically a readable fixnum, where two similar symbols are always the same objects (errr... Don't quote me on that because it isn't strictly true).

Most places where you can use symbols instead of strings you lose nothing and gain speed.

I am not sure how it is in ruby though.

In Ruby `"Foo" == "Foo"` returns true. Whereas `"Foo".object_id == "Foo".object_id` returns false: They are not the same object, but report being equal.

OTOH, symbols return true for both: they are the exact same object.

It's not just about speed though: symbols are somewhat limited in what they can be made of. They follow the same limitations as methods and variables. So often symbols are used when dynamically calling methods or assigning variables.

"Foo".public_send(:strip!) ¹. Which is slightly different from "Foo".public_send('strip!'). Not in outcome, but in calling. Because this is invalid syntax: "Foo".public_send(:one-two three) whereas this isn't: "Foo".public_send('one-two three'). Technically, I guess Ruby can have a method that is named "one-two three" but that would be really nasty to call. Symbols protect a lot against this.

And therefore are used in this context a lot.

¹ The exclamation mark can be a part of a method and symbol in ruby. As can the question-mark and some other sugar-ish stuff like [].
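
A small sketch of the above, assuming a throwaway class (`Greeter` and its methods are hypothetical). `public_send` accepts either a symbol or a string, and `define_method` will accept a name that could never be written as a normal call.

```ruby
# Hypothetical class used only for illustration.
class Greeter
  def hello!
    "hello"
  end

  # A method whose name is not a valid identifier; define_method accepts it.
  define_method(:"one-two three") { "odd but callable" }
end

greeter = Greeter.new
puts greeter.public_send(:hello!)           # symbol argument
puts greeter.public_send("hello!")          # string argument, same dispatch
puts greeter.public_send(:"one-two three")  # only reachable dynamically
```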

> symbols are somewhat limited in what they can be made of

Only in the unquoted literal syntax. The :symbol form follows Ruby's usual identifier rules but there's also a :"quoted symbol" syntax. You can also send :to_sym or :intern to any string and it will be converted to a symbol.

https://ruby-doc.org/core/String.html#method-i-to_sym

> This can also be used to create symbols that cannot be represented using the :xxx notation.

  'cat and dog'.to_sym   #=> :"cat and dog"
Ruby inherited that ?! convention from Scheme. All mutating procedures end with ! and all predicates with !.

Prefix notation has none of those pesky limitations if you can live with it :)

Edit: oh. Scheme is painfully monomorphic. Equality for strings is string=?. Equality for chars is char=?.

Then there is object equality (eq? ...), eqv? ("Normally eq?") and equal? which is a generic equality predicate that works for all objects (including circular data structures).

eq? is the one you would use for symbols. Symbols are always (almost, at least) eq?. A string is only eq? to itself, not to another string with the same content.

I think you meant to say this in your first sentence: "all predicates with ?"
Yup. Thanks. I hate writing on my phone. I make so many mistakes that I frequently wonder if I am literate when I read things I have written.
Yeah, it's an interesting article about symbol implementation, but I think its headline is wrong, it doesn't really discuss why ruby has symbols.
They're largely a holdover from smalltalk and lisps, which use them as a sort of generalized token. I find them handy mostly because you can express things like slot names, enums, or keys in a way that's unambiguously not user provided. In Elixir for instance (which has Ruby's symbol syntax slapped on Erlang's atoms), you'll often report a failure with {:err, "error message"} so you can pattern match on it. In principle, with immutable strings, you could just have {"err", "error message"}, and it would work the same! But that's hard to distinguish from a list which happens to contain two strings.

Of course, the only thing prohibiting :err from being part of the data is convention. But if you're just looking over the code, the atom stands out almost like a syntactic feature, so it's easier to hold to. Plus, since they're interned strings underneath, you can use them in macros to make things like schemas unambiguously, and convert them into migrations with very little magic. So that's it, nothing you couldn't do with interned strings, variable names, and enums, but all in one handy little first class datatype.
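
A loose Ruby analogue of the tagged-tuple convention, as a sketch only. It assumes Ruby 2.7 or later for `case/in` pattern matching, and `parse_port` and `describe` are invented names.

```ruby
# Hypothetical function returning a tagged pair, in the spirit of {:err, "..."}.
def parse_port(text)
  [:ok, Integer(text)]
rescue ArgumentError
  [:err, "not a number: #{text.inspect}"]
end

# The leading symbol is what the pattern matches on; the second slot is data.
def describe(result)
  case result
  in [:ok, value]    then "port #{value}"
  in [:err, message] then "failed: #{message}"
  end
end

puts describe(parse_port("8080"))
puts describe(parse_port("http"))
```

A pair like `["err", "error message"]` would match just as well, but the symbol makes it visually obvious which slot is the tag.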

String equality is linear time in the worst case. Symbol equality is constant time. Symbols are strings that act like numeric constants. They're extremely useful in APIs as arguments, return values and generalized tokens. They're also the obvious choice for hash table keys.
A symbol is your string applied as the argument to an algorithm that returns a deterministic integer. The integer is smaller and easier to sort and compare, making many common operations more efficient.

--andrew

The difference between symbols and strings only exists at the low level. At the high level they are virtually the same thing.
It seems like the difference (as far as I can tell without spending way more time with the article) is deduplication, which other languages already give you with strings.

I think the syntax is slightly nicer than quotes, but it's also more syntax, and there is a limit to how much you can have if you don't want code to look like Perl (which Ruby is approaching).

People are saying it can be used for some kind of better compile time checking, would be interesting to see that as the main focus?

> deduplication, which other languages already give you with strings.

Aren't Ruby's strings mutable? I'm not sure, but I seem to recall they were/are, in which case you can't really intern them. Python and Java have immutable strings, with the ability to optimize allocations being (probably?) one of the reasons.

On the other hand, Symbols in Ruby seem to be immutable, which allows for their interning.

It's optional. By default strings are mutable, but you can freeze them individually, and you can set a directive that makes all string literals as immutable on a file-by-file basis.
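
A minimal sketch of the per-string option, assuming Ruby 2.5 or later (where `FrozenError` exists). `try_append` is a made-up helper. The file-level directive is the `# frozen_string_literal: true` magic comment on the first line of a file; it is not used here, so the sketch behaves the same wherever it is pasted.

```ruby
# Hypothetical helper: reports whether an in-place append was allowed.
def try_append(str, suffix)
  str << suffix
  :mutated
rescue FrozenError
  :rejected
end

mutable = String.new("hello")        # strings are mutable by default
frozen  = String.new("hello").freeze # freezing a single string

puts try_append(mutable, " world")   # the append goes through
puts try_append(frozen, " world")    # the append raises FrozenError internally
puts frozen.frozen?
```
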
You can intern strings without preventing mutability using the copy-on-write principle. PHP does it.
If you copy before you write, you're not mutating, you're making a new thing.

e.g. This pseudo code must stand for mutation to be present.

    a = ...
    b = a
    mutate(a)
    /* b now mirrors a */
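
The same test written as runnable Ruby, as a sketch (`mutate!` is a hypothetical helper): the alias sees the change, while a copy made before writing does not affect the original.

```ruby
# Hypothetical helper: appends to its argument in place.
def mutate!(str)
  str << " v2"
  nil
end

a = String.new("draft")
b = a          # b is another reference to the same object; no copy is made
mutate!(a)
puts b         # b mirrors a, so this is real mutation

c = a.dup      # copying first yields a new object
mutate!(c)
puts a         # a is unaffected by the change to c
```
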
They are constants without having to declare them and assign them a globally unique value. Strings are for text (ui/parsing/values)
Elixir (and Erlang) have atoms which are exactly the same thing. They're useful in any dynamic programming language - in a static one, the equivalent is different values of an enum.

Python notably doesn't, and as such you get functions that take arguments that are strings with special meaning, which I always found a bit clunky even before I discovered Ruby.

I think Erlang has them for a more specific reason: in general it eschews all forms of data definition (records are just a fancy syntax for tuples with an atom at the start), which makes hot code reloading and transparent network communication simpler.

Both cases would require some synchronization of data structures (across time or over the network), and with user-defined types this can get complicated, and atoms make the lack of user-defined types much more pleasant (and more performant than strings).

Python has had enums [1] since Python 3.4 (2014). You can easily convert back and forth between enum values and their string and numeric values.

I don't know anything about Ruby or Erlang so I don't know if that's really relevant, just your comment seems to imply it doesn't.

[1] https://docs.python.org/3/library/enum.html

Yes, that's fair. I suppose I was describing my process of "discovery" and learning with Python which was significantly before 2014. Even now though, enums are not usually the normal way of doing things in public APIs, but that's presumably at least in part because of the history.
Also, unlike Ruby, string literals are usually interned in CPython (I think below a certain size), so they have at least some of the performance benefits of symbols in Ruby.
Languages with native symbol types are very helpful for... you name it: Symbolic programming https://en.wikipedia.org/wiki/Symbolic_programming

In the context of computer algebra systems, which are much about manipulating abstract syntax trees, mathematical variables are usually represented as "symbols".

Beyond that, this page gives databases as an example, which is in fact very nice. Beyond being fast and efficient, using symbols allows certain errors to be compile-time instead of runtime, where typos are only detected on an application level and not on a code level. This is where symbols can play out their advantage. Think a bit of ENUMs in other languages.

the "you name it" idiom in english is usually used to mean "anything you want", as in "you pick it"- so your first sentence reads like "native symbols are useful for anything, because (as everyone knows) symbolic programming is useful for anything"

i think the idiom you were going for was perhaps "you guessed it" or "you called it", as if poking fun at how, obviously, native symbols are helpful for symbolic programming, because it's the same word

Thanks for the tip! In fact I'm not a native speaker and this was some kind of "false friend" from the German "du sagst es" (="you say it") :-)
My opinion is that Ruby has symbols for strings that are static - part of the program - and normal strings for dynamic runtime data.

Separating the two is useful semantically because it lets you differentiate between the two - and because these two kinds of string are better off being implemented and optimised in different ways.

So it's both UX and practical mechanical sympathy.

When I started ruby, I thought symbols were weird and dumb.

After a while, it was one of the things I liked the most. Maybe my favorite part is I didn't have to read and write so many damned " characters!

"Converting" "text" "like" "this" :to :this :format, really helps me read it.

I might be the weird guy on this. Wouldn't be the first time.

While other comments have discussed the technical utility of symbols, I believe symbols can also be seen as useful syntactic sugar that helps communicate intent. Strings used for indexes, named args, and other structural purposes can be represented in a way that is visually distinct from strings used as text.

The technical benefits are nice, but this type of ergonomic feature is why ruby has remained my favorite language for over a decade.

I was on the same page, but now moving away from that.

I more and more dislike how Ruby (arbitrarily) allows omitting brackets, but not always, often making the code harder to read. What is the call-chain in this rspec magic: `expect(something).to be >= 1` (quick: where and how do you add a custom failure message).

And while `attr_accessor :time, :date, :state` are really neat, I more and more dislike constructs like `validates :name, :login, :email, presence: true`. And prefer to write them explicit and unambiguous: `validates_presence_of(:name) etc`. Which is only a very slight improvement over `validates_presence_of('name')`.

And don't get me started on "saving time" by typing fewer characters or shorter lines of code: if this is what makes you Go To Market faster, there's something very wrong with your IDE, editor or typing skills. If anything, those short things have cost me time in Rails codebases living years and years.

> `validates :name, :login, :email, presence: true`. And prefer to write them explicit and unambiguous: `validates_presence_of(:name) etc`. Which is only a very slight improvement over `validates_presence_of('name')`.

That's Rails, not Ruby. Although Ruby allows it because of how flexible it is + metaprogramming.

It indeed is convention in Rails. A bad convention IMO.

But it is enabled by Ruby, as you state, by how flexible Ruby is. It may seem a nice touch that Ruby hands you the freedom to choose to e.g. omit brackets. But I think this is a bad freedom. As Rails shows, it's a freedom that leads to, IMO, harder to read, and harder to reason about code.

With any language design, the limitations, as well as the features, are what make the language. Limitations are an important feature of a language, IMO.

Yeah this threw me off so much that I wrote a blog about this: https://wbk.one/%2Farticle%2Fa463c360%2Fthe-ruby-tutorial-i-... It’s weird how not clear tutorials are about this.
Half agreement on the last part. I never care about typing, but I care a lot about readability.

And fewer characters often is proportionally easier to read.

I wholeheartedly disagree with this correlation.

Intent sometimes becomes clearer with fewer characters. E.g. "attr_accessor" is, IMO, vastly superior to a large list of getter and setter method definitions. Easier to read, clearer in intent. Especially when there is that one getter or setter: you can be confident it's doing more than just setting/getting (which probably is a smell, but I digress).

Details, however, hardly ever become clearer. A single `has_many :tags, :through => :taggings, delete: :cascade` may seem easier to read than explicit method declarations and callback registrations, but it's a faux abstraction. It also rapidly falls apart when you continue developing on this for years and end with things like `has_many :posts, :through => :taggings, :source => :taggable, :source_type => 'Post'`

The abstraction remains intact with a `define_relation(:taggings, DatabaseJoinTable.new(:taggings))`, a `delegate :tags, to: :taggings` and a `register_callback(:delete, InlineTaggingsRemover.new(self.taggings))`. I just made this up. But I tried to design an interface that is explicit rather than implicit. One that uses dependency injection and common Ruby-isms over a framework DSL.

Point is: behind those seemingly "easy to read" lines, there's a large world of black magick, lurking. I've dug through these forbidden forests on numerous occasions when our Rails app started misbehaving, race-conditions popped up, performance degraded, or even random data loss. It's only easy to read on the surface. And while that is where we spend a lot of time reading, readability of the underlying stuff is even more important, because that is where the details matter.

Rails sacrifices the readability and understandability of what happens below the hood for readability and understandability on the surface. This seems a deliberate choice. But I dislike it. Severely.

I'm in the same boat, FWIW. I really did not understand the point at first, but I love symbols now. I only vaguely understand the point, even after reading this thread, but I like using them.
In the early days, symbols weren't garbage collected, while strings were mutable, wasted memory and were slow. So there were tradeoffs.

Now you can use frozen string literals and there's no benefit to symbols. Throwing "# frozen_string_literal: true" in the top of the memory benchmark script I get:

    Calculating -------------------------------------
             strings     0.000  memsize (     0.000  retained)
                         0.000  objects (     0.000  retained)
                         0.000  strings (     0.000  retained)
             symbols     0.000  memsize (     0.000  retained)
                         0.000  objects (     0.000  retained)
                         0.000  strings (     0.000  retained)

    Comparison:
             strings:          0 allocated
             symbols:          0 allocated - same
At this point with no practical difference between them STRINGS AND SYMBOLS BEING DIFFERENT IS A MISTAKE. When you serialize to something like JSON you lose the distinction (the operation is singular and does not have an inverse transform) and you have to pick either symbols or strings to get back. On a long enough timescale this causes enormous confusion, and leads to the creation of hashes with indifferent access (which helps with the problem, but doesn't fix it).

Ideally it would be good at this point to make symbols and frozen strings completely equivalent ("foo".freeze == :foo being true) but that would likely break too much existing code. The differentiation between strings and symbols though only causes code bugs (mostly biting the new and intermediate level programmers). It is just syntactical sugar with a footgun.
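
A short sketch of the round-trip problem using the standard `json` library (`round_trip` is a made-up helper). Symbols go out as JSON strings, and on the way back you choose symbol keys or string keys for the whole document; symbol values do not come back either way.

```ruby
require "json"

# Hypothetical helper: serialize to JSON and parse the result again.
def round_trip(data, symbolize: false)
  JSON.parse(JSON.generate(data), symbolize_names: symbolize)
end

original = { status: :active }

p round_trip(original)                   # string key, string value
p round_trip(original, symbolize: true)  # symbol key, but the value stays a string
p "foo".freeze == :foo                   # a frozen string never equals a symbol
```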

Designing a language from scratch these days, it should have immutable strings by default from the start and should not introduce symbols, unless they are purely syntactic sugar around creating an immutable string.

> At this point with no practical difference between them

Gah no! This isn't true! Even if you turn on frozen string literals, comparing two strings is slower because the comparison still has to handle a non-frozen, non-interned string that happens to have the same contents.

https://twitter.com/ChrisGSeaton/status/1514603665801109508

There's a pointer comparison, but behind it, on the failure side, is a full byte comparison. Atrocious for cache even if the strings are tiny. If they aren't you're checking every byte!

I'm a big fan of explicit symbols, but some syntactic sugar around immutable strings and type inference should let you use "symbols" with little performance penalty. Of course by then you might as well use explicit symbols anyway, but I guess there's some additional flexibility. Of course that hurts macro-writing a bit, but not much.
Okay mutable non-frozen strings shouldn't exist either, and people should use string builders.

Really both features (symbols and mutable strings) aren't worth the literally endless bugs that they cause.

I both like and hate how Rust has `String` and `&str`. Constant juggling between the two (which really is a sign I'm not doing it right). Yet knowing and using the difference is important and powerful.

I somewhat miss this when I go back to Ruby, but then realize that symbols often can be used for `&str`. Often. Not always.

They're like enums but without the risk of colliding across unrelated contexts. And because they're dynamically generated you can have runaway symbol generation that triggers a memory consumption problem.

Newbies try to use strings as enums, because they don't understand enums. Symbols provide tradeoffs compared with both.

This is what you get with a dynamic language where people try to overload functions with "I accept a scalar OR an array!!!"

From the caller's perspective there's little difference between a procedure with variadic arguments and multiple dispatch. There's a lot of difference from the implementor's perspective, though. In Perl 5 you'd see a lot of ref() and wantarray() while in Raku you'd see different signatures for procedures with the same name.
The difference between comparing symbols and strings in ruby.

https://twitter.com/chrisgseaton/status/1514603665801109508?...

This seems to be a dynamic language thing. Could statically typed languages have some uses for it? In Java one might use a lot of constant strings which are keys for property files. You can of course do some dynamic stuff in order to construct them but they are more likely to be used as constant strings. Would using a separate construct for that make optimization easier? Or wouldn’t it make a difference in practice?
Java does use string interning for constants. It can help in any case when you have a lot of instances of the same string (and when you need to compare those strings).
How does this article not mention LISP? Ruby has symbols because it was inspired by LISP which had symbols.
No, Ruby has symbols because it was inspired by Smalltalk which has symbols. Of course, Lisp also has symbols, and it was one of the inspirations for Ruby (and Smalltalk), but the idea of representing message sends (ie. method calls) as symbol + arguments comes from Smalltalk.
It's not a historical article tracing the lineage of the feature.

Ruby took some things the designer liked from Lisp, from Smalltalk, from Perl, from other places. He liked symbols because they're good for performance (and compile-time correctness) at a low cognitive cost, and that's why Ruby has symbols.

Symbols are garbage collected now so it’s just shorter strings

IMHO it’s a style choice

I always thought Ruby has symbols because

   const String ACCOUNT_FUNDS_EXCEEDED = "ACCOUNT_FUNDS_EXCEEDED"
is plain retarded :D
That would be an odd way to do that in a language that doesn't have symbols too though.
Have a look at 90% of corporate codebases.
it makes sure that any typos are caught at compile time, for one