by kungfufrog 1534 days ago
To be honest, while I've dabbled in Ruby, I've never understood the difference between symbols and strings on an intuitive, gut level the way I have with other foreign constructs unique to languages I've gone deep with.

I don't know what it was about this article but I feel slightly more confused now. It jumps to bytecode before explaining how they're useful at the higher level of abstraction that is the developer.

How do symbols help you think about and find solutions to problems and then implement those solutions? I.e., what is the facility they provide that is not present in some more traditional OOP language (say PHP?)

8 comments

The article is an extremely needlessly complicated explanation of a relatively simple principle: symbols are stored once and referenced by pointer, strings are stored multiple times and not so easily compared. Ergo, for many use-cases symbols are faster and use less memory.

Some of these use cases are little tokens, like single words that are used as values in function arguments or in a switch statement. On the other hand, storing the user's inputted name as a symbol whilst copying that data to your model object is probably not a good idea.
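
A minimal sketch of that split, assuming a hypothetical `align` helper (not from the article): the mode is one of a few fixed tokens, so a symbol fits, while the name is arbitrary user data and stays a string.

```ruby
# Hypothetical helper: `mode` is a small fixed token, `text` is arbitrary data.
def align(text, mode, width = 10)
  case mode
  when :left   then text.ljust(width)
  when :right  then text.rjust(width)
  when :center then text.center(width)
  else raise ArgumentError, "unknown mode: #{mode.inspect}"
  end
end

user_name = "Ada"                      # user input stays a String
padded    = align(user_name, :right)   # the option is a Symbol
```

Turning `user_name` into a symbol would add an arbitrary, user-controlled entry to the symbol table for no benefit.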

Yes this explanation leaves out a lot of detail.

They seem similar enough to Clojure keywords:

Those are more focused and simpler than strings in say PHP. And they are first class, unlike members in Java.

They are primarily first class names in your program. Think of them as distinct elements in a set, as opposed to arbitrary text to be transformed and parsed.

If I give you a symbol (or keyword) then you know it is a name. If I give you a string, it could be anything really.

In Scheme you usually use symbols instead of strings because the whole equality story is simpler and much faster. I once changed a tight loop in some code from dispatching on strings to dispatching on symbols and got a 15% speedup. Why? String equality is expensive. A symbol is basically a readable fixnum, where two symbols with the same name are always the same object (errr... don't quote me on that, because it isn't strictly true).

Most places where you can use symbols instead of strings you lose nothing and gain speed.

I am not sure how it is in Ruby though.

In Ruby `"Foo" == "Foo"` returns true. Whereas `"Foo".object_id == "Foo".object_id` returns false: They are not the same object, but report being equal.

OTOH, symbols return true for both: they are the exact same object.
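
A small sketch of the same check. It uses `String.new` instead of bare literals so the result does not depend on whether `# frozen_string_literal: true` is in effect (with that directive, two identical literals can be the same object).

```ruby
# String.new builds a fresh object each time.
a = String.new("Foo")
b = String.new("Foo")

same_value_str  = (a == b)             # true: same characters
same_object_str = a.equal?(b)          # false: two separate objects

same_value_sym  = (:Foo == :Foo)       # true
same_object_sym = :Foo.equal?(:Foo)    # true: one shared object
same_id_sym     = (:Foo.object_id == :Foo.object_id)  # true
```

`equal?` is Ruby's identity check, so it answers the same question as comparing `object_id`.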

It's not just about speed though: symbols are somewhat limited in what they can be made of. They follow the same limitations as methods and variables. So often symbols are used when dynamically calling methods or assigning variables.

`"Foo".public_send(:strip!)` ¹ is slightly different from `"Foo".public_send('strip!')`: not in outcome, but in how the call is written. This is invalid syntax: `"Foo".public_send(:one-two three)`, whereas this isn't: `"Foo".public_send('one-two three')`. Technically, I guess Ruby can have a method that is named "one-two three", but that would be really nasty to call. Symbols protect against a lot of this, and are therefore used in this context a lot.

¹ The exclamation mark can be part of a method name and of a symbol in Ruby. As can the question mark and some other sugar-ish stuff like [].
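
A short sketch of dynamic dispatch with both forms. The `Oddity` class is hypothetical and only there to show that an awkwardly named method can exist but is reachable only dynamically.

```ruby
# public_send accepts the method name as either a Symbol or a String.
via_symbol = " Foo ".public_send(:strip)     # "Foo"
via_string = " Foo ".public_send("strip")    # "Foo"

# Names ending in ? or ! are valid unquoted symbols.
is_empty = "".public_send(:empty?)           # true

# Hypothetical class: a method whose name is not a valid identifier.
class Oddity
  define_method(:"one-two three") { :reached }
end

odd_result = Oddity.new.public_send("one-two three")  # :reached
```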

> symbols are somewhat limited in what they can be made of

Only in the unquoted literal syntax. The :symbol form follows Ruby's usual identifier rules but there's also a :"quoted symbol" syntax. You can also send :to_sym or :intern to any string and it will be converted to a symbol.

https://ruby-doc.org/core/String.html#method-i-to_sym

> This can also be used to create symbols that cannot be represented using the :xxx notation.

  'cat and dog'.to_sym   #=> :"cat and dog"
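
A quick sketch of the forms mentioned above; all of them end up at the same symbol object.

```ruby
plain  = :cat                    # unquoted literal: identifier rules apply
quoted = :"cat and dog"          # quoted literal: spaces and punctuation allowed
from_s = "cat and dog".to_sym    # String#to_sym
from_i = "cat and dog".intern    # String#intern, same result

all_same   = quoted.equal?(from_s) && quoted.equal?(from_i)  # true
round_trip = quoted.to_s                                     # "cat and dog"
```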

Ruby inherited that ?! convention from Scheme. All mutating procedures end with ! and all predicates with !.

Prefix notation has none of those pesky limitations if you can live with it :)

Edit: oh. Scheme is painfully monomorphic. Equality for strings is string=?. Equality for chars is char=?.

Then there is object equality (eq? ...), eqv? ("Normally eq?") and equal? which is a generic equality predicate that works for all objects (including circular data structures).

eq? is the one you would use for symbols. Symbols with the same name are always (almost, at least) eq?. A string is only eq? to itself, not to another string containing the same content.
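
For comparison, Ruby has a loosely similar trio: `equal?` (identity), `eql?` (value equality without type conversion) and `==` (value equality). The mapping to Scheme's predicates is my own rough analogy, not an exact correspondence. A sketch:

```ruby
s1 = String.new("abc")
s2 = String.new("abc")

str_value    = (s1 == s2)          # true: same content
str_strict   = s1.eql?(s2)         # true: same class and content
str_identity = s1.equal?(s2)       # false: different objects
str_self     = s1.equal?(s1)       # true: a string is identical only to itself

num_value    = (1 == 1.0)          # true: == converts between numeric types
num_strict   = 1.eql?(1.0)         # false: Integer vs Float

sym_identity = :abc.equal?(:abc)   # true: same symbol object
```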

I think you meant to say this in your first sentence: "all predicates with ?"
Yup. Thanks. I hate writing on my phone. I make so many mistakes that I frequently wonder if I am literate when I read things I have written.
Yeah, it's an interesting article about symbol implementation, but I think its headline is wrong: it doesn't really discuss why Ruby has symbols.
They're largely a holdover from Smalltalk and Lisps, which use them as a sort of generalized token. I find them handy mostly because you can express things like slot names, enums, or keys in a way that's unambiguously not user provided. In Elixir for instance (which has Ruby's symbol syntax slapped on Erlang's atoms), you'll often report a failure with {:err, "error message"} so you can pattern match on it. In principle, with immutable strings, you could just have {"err", "error message"}, and it would work the same! But that's hard to distinguish from a list which happens to contain two strings.

Of course, the only thing prohibiting :err from being part of the data is convention. But if you're just looking over the code, the atom stands out almost like a syntactic feature, so it's easier to hold to. Plus, since they're interned strings underneath, you can use them in macros to make things like schemas unambiguously, and convert them into migrations with very little magic. So that's it, nothing you couldn't do with interned strings, variable names, and enums, but all in one handy little first class datatype.
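
A rough Ruby analogue of that tagged-result idiom, using `case`/`in` pattern matching (Ruby 2.7 or later). `parse_port` and `describe` are hypothetical names for illustration only.

```ruby
# Hypothetical function returning a tagged pair, mirroring the {:err, "..."} shape.
def parse_port(text)
  [:ok, Integer(text)]
rescue ArgumentError
  [:err, "not a number: #{text.inspect}"]
end

# The symbol tag is matched literally; the second element is bound to a variable.
def describe(result)
  case result
  in [:ok, value]    then "port #{value}"
  in [:err, message] then "failed: #{message}"
  else                    "unrecognized"
  end
end

good = describe(parse_port("8080"))         # "port 8080"
bad  = describe(parse_port("abc"))          # "failed: not a number: \"abc\""
data = describe(["err", "error message"])   # "unrecognized": two plain strings
```

The last call shows the point about ambiguity: a pair of strings is just data, while the symbol tag reads as part of the program.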

String equality is linear time in the worst case. Symbol equality is constant time. Symbols are strings that act like numeric constants. They're extremely useful in APIs as arguments, return values and generalized tokens. They're also the obvious choice for hash table keys.
A symbol is your string applied as the argument to an algorithm that returns a deterministic integer. The integer is smaller and easier to sort and compare, making many common operations more efficient.

--andrew

The difference between symbols and strings only exists at the low level. At the high level they are virtually the same thing.
It seems like the difference (as far as I can tell without spending way more time with the article) is deduplication, which other languages already give you with strings.

I think the syntax is slightly nicer than quotes, but it's also more syntax, and there is a limit to how much you can have if you don't want code to look like Perl (which Ruby is approaching).

People are saying it can be used for some kind of better compile-time checking. It would be interesting to see that as the main focus.

> deduplication, which other languages already give you with strings.

Aren't Ruby's strings mutable? I'm not sure, but I seem to recall they were/are, in which case you can't really intern them. Python and Java have immutable strings, with the ability to optimize allocations being (probably?) one of the reasons.

On the other hand, Symbols in Ruby seem to be immutable, which allows for their interning.

It's optional. By default strings are mutable, but you can freeze them individually, and you can set a directive that makes all string literals immutable on a file-by-file basis.
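
A sketch of the per-string route. The file-level directive is the `# frozen_string_literal: true` magic comment, which is not used here so the snippet behaves the same wherever it is pasted. The deduplication line assumes MRI (CRuby) 2.5 or later, where `String#-@` returns a shared frozen copy.

```ruby
s = String.new("abc")
mutable_before = !s.frozen?     # true: strings are mutable by default
s << "d"                        # in-place append, s is now "abcd"

s.freeze                        # freeze this one string
begin
  s << "e"
  raised = false
rescue FrozenError              # mutation of a frozen object raises
  raised = true
end

# Unary minus returns a frozen, deduplicated string (MRI 2.5+).
x = -String.new("token")
y = -String.new("token")
deduped = x.equal?(y)           # true: both names point at one shared object

symbol_frozen = :token.frozen?  # true: symbols are always immutable
```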
You can intern strings without preventing mutability using the copy-on-write principle. PHP does it.
If you copy before you write, you're not mutating, you're making a new thing.

E.g., this pseudocode must hold for mutation to be present:

    a = ...
    b = a
    mutate(a)
    /* b now mirrors a */
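
For what it's worth, Ruby strings behave the way that pseudocode requires. A minimal sketch:

```ruby
a = String.new("foo")
b = a                 # b names the same object, no copy is made
a << "bar"            # in-place mutation through a
seen_via_b = b        # "foobar": b mirrors a

c = a.dup             # explicit copy: a new, independent object
a << "!"
copy_after = c        # still "foobar": the copy did not change
```

How the runtime shares or copies the underlying bytes is a separate question from this visible behaviour.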
A language runtime can make that work if it wants to. High-level semantics don't need to constrain the implementation.
They are constants without having to declare them and assign them a globally unique value. Strings are for text (UI/parsing/values).