Ruby doesn't have symbols because of AST or VM details.
Ruby has symbols in all probability because Lisp and Smalltalk have symbols.
It could get most of the same practical upside of symbols from interned strings - the important thing is being able to compare using pointer equality and look up hash tables without needing to walk a string. What symbols at the type level do is ensure that these string-like things have already been interned, that is, de-duplicated, when they hit lookup points like member access.
But the implementation could do something very similar behind the scenes by setting a bit on interned string values. Besides, symbols aren't enough for the more advanced dynamic language optimization techniques like you see in V8.
I'd say that Ruby has symbols because Ruby has mutable strings.
If your strings are immutable and interned, they are as good as symbols; this is why Python does not have symbols.
ECMAScript introduced symbols because JavaScript strings, while immutable, are not necessarily interned. Symbols are much cheaper to compare for equality: you only need to compare the pointers / ids, not actual string bytes.
Lisp has symbols for the same reason: Lisp strings are vectors, which are also mutable.
Lisp has symbols, because they were used in symbolic expressions (s-expressions) as named entities. In the programming language Lisp these symbols serve also as identifiers for functions, variables and other things. Thus a symbol originally had an internal structure made of an association list (a list of keys and values). That association list then had various entries, including a print name -> the thing to print when a symbol gets externalized. Since symbols can serve as function names, these symbols also had functions stored in their association list. Different function types could be stored under different keys.
Since Lisp symbols serve a central role as identifiers and structured objects, they are not like what Ruby uses. Lisp uses symbols also for named interned things, but that is only one purpose.
In Common Lisp symbols have a name, a value, a function, a package and a property list (a list of keys and their values). By default in a call like (mult 1 2 3), the global function will be retrieved from the symbol and the function will be called with the arguments. The property list sometimes will be used by an IDE to store information about the symbol: like where it was defined, what its definition is and similar.
At least in V8 they are last I checked. The symbols feature is a property privacy feature. A symbol can be treated as a private secret owned by a library thus restricting access to a property on a shared object.
I feel quite confident that Ruby has Symbols because Strings are mutable, which causes issues when you hold on to something but also give out a reference.
> It could get most of the same practical upside of symbols from interned strings
I don't think so - unless you mean always interning all strings. The point of symbols is you can do a single address comparison. How can you do that if you could have two strings that are the same but have different addresses?
There is also a down-side of symbols - they by definition always escape the compilation unit since they're interned!
Always interning all strings is what Lua does, and the concept of a symbol in Lua is merely a particular string pattern which the parser will recognize. They aren't syntactically identical, you can replace any .field with ["field"] but you can't say `local ["field"] = value`, but there is no distinction in the types.
I get a lot of use out of both of those decisions (immutable strings and string/symbol identity), they work well together, and I'd (much) rather have the problem of string-builders than the problem of tracking references to strings and copying them if I need both the original and revision.
That'd be catastrophic for performance in Ruby - every string allocation would always have to be reified, and would always need to access a shared data structure.
I'll take your word for that. It's the opposite in Lua, which is several times faster than stock Ruby. If your runtime takes it as a given that every string will be interned, there are all sorts of assumptions this enables which mutation invalidates.
> If your runtime takes it as a given that every string will be interned, there are all sorts of assumptions this enables which mutation invalidates.
Yes but if you're regularly creating strings, which is what Ruby web servers do all the time, then your intern table is going to become a white-hot hotspot, contended by all threads all the time.
> I don't think so - unless you mean always interning all strings. The point of symbols is you can do a single address comparison. How can you do that if you could have two strings that are the same but have different addresses?
You intern all the literals (which includes lexical symbols) and are 99% of the way there.
That's a better explanation. There is a clear Smalltalk influence in Ruby, especially around the object-oriented aspects of the language. The best example is how the language doesn't call a function, but sends a message to an object. And also how everything is an object. Matz talked quite a bit about the various other languages that influenced the design and Smalltalk and Lisp are part of that list (and Perl).
Erlang has symbols because its strings are ridiculously expensive (and kinda shit), so while it does have immutable strings identifying objects based on that would be ridiculously costly.
Interned strings are fine if you don't have mutable strings, but for one Ruby does have mutable strings and two it's nice having that syntactic sugar! Makes it clear that some value is something programmer-written, or at least programmer endorsed. I don't use python much but I do wish there was an alternative syntax for strings I only plan on using like symbols.
> When AST is built, it is validated to make sure it makes sense (that’s called lexing) and converted it to the bytecode.
I've never heard "lexing" used this way, and I believe it's simply incorrect. Lexing (tokenizing) precedes parsing (parse tree and then syntax tree construction). It isn't syntax tree validation.
Or so I thought. Are there other examples (besides this article) of "lexing" also being used to mean something else?
BTW, a good technique for catching those kinds of mistakes is to read the piece out loud. Engaging more of your nervous system makes the visual elision easier to detect.
To be honest, while I've dabbled in Ruby, I've never understood the difference on an intuitive gut level like other foreign constructs unique to individual languages I've gone deep with.
I don't know what it was about this article but I feel slightly more confused now. It jumps to bytecode before explaining how they're useful at the higher level of abstraction that is the developer.
How do symbols help you think about and find solutions to problems and then implement those solutions? I.e., what is the facility they provide that is not present in some more traditional OOP language (say PHP?)
The article is an extremely needlessly complicated explanation of a relatively simple principle: symbols are stored once and referenced by pointer, strings are stored multiple times and not so easily compared. Ergo, for many use-cases symbols are faster and use less memory.
Some of these use cases are little tokens like single words that are used as values in function arguments, or in a switch statement. On the other hand, storing the user's inputted name as a symbol whilst copying that data to your model object is probably not a good idea.
Those are more focused and simpler than strings in say PHP. And they are first class, unlike members in Java.
They are primarily first class names in your program. Think of them as distinct elements in a set, as opposed to arbitrary text to be transformed and parsed.
If I give you a symbol (or keyword) then you know it is a name. If I give you a string, it could be anything really.
In scheme you usually use symbols instead of strings because the whole equality story is simpler and much faster. I once changed a tight loop in some code from dispatching on strings to dispatching on symbols and got a 15% speedup. Why? String equality is expensive. A symbol is basically a readable fixnum, where two similar symbols are always the same objects (errr... Don't quote me on that because it isn't strictly true).
Most places where you can use symbols instead of strings you lose nothing and gain speed.
In Ruby `"Foo" == "Foo"` returns true. Whereas `"Foo".object_id == "Foo".object_id` returns false: They are not the same object, but report being equal.
OTOH, symbols return true for both: they are the exact same object.
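A minimal sketch of that comparison. `String.new` is used so each string is allocated separately whatever the `frozen_string_literal` setting is, and the helper name `identity_report` is made up for illustration:

```ruby
# Hypothetical helper: evaluates the block twice and compares the two results
# both by value (==) and by identity (object_id).
def identity_report
  first = yield
  second = yield
  { equal_value: first == second, same_object: first.object_id == second.object_id }
end

p identity_report { String.new("Foo") }  # equal by value, but two distinct objects
p identity_report { :Foo }               # equal by value and the very same object
```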
It's not just about speed though: symbols are somewhat limited in what they can be made of. They follow the same limitations as methods and variables. So often symbols are used when dynamically calling methods or assigning variables.
"Foo".public_send(:strip!) ¹. Which is slightly different from "Foo".public_send('strip!'). Not in outcome, but in calling. Because this is invalid syntax: "Foo".public_send(:one-two three) whereas this isn't: "Foo".public_send('one-two three'). Technically, I guess Ruby can have a method that is named "one-two three" but that would be really nasty to call. Symbols protect a lot against this.
And therefore are used in this context a lot.
¹ The exclamation mark can be a part of a method and symbol in ruby. As can the question-mark and some other sugar-ish stuff like [].
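To make the calling difference concrete, here is a rough sketch. `strip_with` and `oddly_named_object` are names invented for this example. The second helper defines a method whose name is not a valid identifier, which can then only be reached dynamically:

```ruby
# public_send accepts the method name as either a symbol or a string.
def strip_with(name)
  String.new("  Foo  ").public_send(name)
end

p strip_with(:strip!)    # symbol form
p strip_with("strip!")   # string form, same outcome

# Hypothetical helper: an object with a method named "one-two three".
# The name is legal at runtime, but there is no literal syntax to call it.
def oddly_named_object
  obj = Object.new
  obj.define_singleton_method("one-two three") { :called }
  obj
end

p oddly_named_object.public_send("one-two three")
```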
> symbols are somewhat limited in what they can be made of
Only in the unquoted literal syntax. The :symbol form follows Ruby's usual identifier rules but there's also a :"quoted symbol" syntax. You can also send :to_sym or :intern to any string and it will be converted to a symbol.
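A quick sketch of those three spellings (the helper name `symbol_forms` is made up). All of them should produce the same symbol object:

```ruby
# Three ways to get a symbol that the bare :identifier syntax cannot express.
def symbol_forms
  [
    :"one-two three",          # quoted symbol literal
    "one-two three".to_sym,    # String#to_sym
    "one-two three".intern     # String#intern
  ]
end

p symbol_forms
p symbol_forms.map(&:object_id).uniq.size  # counts distinct objects among the three
p :"foo".equal?(:foo)                      # quoting a plain identifier changes nothing
```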
Ruby inherited that ?! convention from Scheme. All mutating procedures end with ! and all predicates with ?.
Prefix notation has none of those pesky limitations if you can live with it :)
Edit: oh. Scheme is painfully monomorphic. Equality for strings is string=?. Equality for chars is char=?.
Then there is object equality (eq? ...), eqv? ("Normally eq?") and equal? which is a generic equality predicate that works for all objects (including circular data structures).
Eq? is the one you would use for symbols. Symbols are always (almost, at least) eq?. One string is only eq? to itself, but not to a string containing the same content.
They're largely a holdover from smalltalk and lisps, which use them as a sort of generalized token. I find them handy mostly because you can express things like slot names, enums, or keys in a way that's unambiguously not user provided. In Elixir for instance (which has Ruby's symbol syntax slapped on Erlang's atoms), you'll often report a failure with {:err, "error message"} so you can pattern match on it. In principle, with immutable strings, you could just have {"err", "error message"}, and it would work the same! But that's hard to distinguish from a list which happens to contain two strings.
Of course, the only thing prohibiting :err from being part of the data is convention. But if you're just looking over the code, the atom stands out almost like a syntactic feature, so it's easier to hold to. Plus, since they're interned strings underneath, you can use them in macros to make things like schemas unambiguously, and convert them into migrations with very little magic. So that's it, nothing you couldn't do with interned strings, variable names, and enums, but all in one handy little first class datatype.
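For a rough Ruby analogue of that tagged-tuple style, here is a sketch using `case`/`in` pattern matching (assumes Ruby 3.0 or later). `parse_port` and `describe` are hypothetical names, and the `:ok` / `:err` tags are just a convention borrowed from the comment above:

```ruby
# Hypothetical function: returns a tagged pair instead of raising.
def parse_port(text)
  [:ok, Integer(text)]
rescue ArgumentError
  [:err, "not a number: #{text}"]
end

# The symbol in the first slot is what the patterns dispatch on.
def describe(result)
  case result
  in [:ok, value] then "port #{value}"
  in [:err, message] then "failed: #{message}"
  end
end

puts describe(parse_port("80"))
puts describe(parse_port("http"))
```

A pair such as `["err", "error message"]` would match neither pattern, which is the "unambiguously not user provided" property in practice.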
String equality is linear time in the worst case. Symbol equality is constant time. Symbols are strings that act like numeric constants. They're extremely useful in APIs as arguments, return values and generalized tokens. They're also the obvious choice for hash table keys.
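As an illustration of symbols in those three roles (argument, return value, hash key), a small sketch. The method names and option names here are invented for the example:

```ruby
# Hypothetical API: `mode` is a symbol argument and the result is a symbol too.
def open_resource(mode: :read)
  case mode
  when :read, :write then :ok
  else :unsupported_mode
  end
end

# Symbols as hash keys: lookups compare identities rather than walking bytes.
def default_settings
  { retries: 3, verbose: false }
end

p open_resource(mode: :write)
p open_resource(mode: :append)
p default_settings[:retries]
```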
A symbol is your string applied as the argument to an algorithm that returns a deterministic integer. The integer is smaller and easier to sort and compare, making many common operations more efficient.
It seems like the difference (as far as I can tell without spending way more time with the article) is deduplication, which other languages already give you with strings.
I think the syntax is slightly nicer than quotes, but it's also more syntax, and there is a limit to how much you can have if you don't want code to look like Perl (which Ruby is approaching).
People are saying it can be used for some kind of better compile-time checking; it would be interesting to see that as the main focus?
> deduplication, which other languages already give you with strings.
Aren't Ruby's strings mutable? I'm not sure, but I seem to recall they were/are, in which case you can't really intern them. Python and Java have immutable strings, with the ability to optimize allocations being (probably?) one of the reasons.
On the other hand, Symbols in Ruby seem to be immutable, which allows for their interning.
It's optional. By default strings are mutable, but you can freeze them individually, and you can set a directive that makes all string literals immutable on a file-by-file basis.
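A small sketch of the per-string option, assuming Ruby 2.5 or later (for `FrozenError` and the de-duplicating `String#-@`). The file-wide directive is the `# frozen_string_literal: true` magic comment at the top of a file, which is not used here so the example behaves the same wherever it is pasted. `mutation_allowed?` is a made-up helper:

```ruby
# Hypothetical helper: tries to append to the string and reports whether it worked.
def mutation_allowed?(string)
  string << "!"
  true
rescue FrozenError
  false
end

draft = String.new("draft")
p mutation_allowed?(draft)                       # mutable by default
p mutation_allowed?(String.new("final").freeze)  # frozen individually

# String#-@ returns a frozen, de-duplicated copy, so two equal strings
# end up sharing one object, much like a symbol.
a = -String.new("shared")
b = -String.new("shared")
p a.equal?(b)
```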
Elixir (and Erlang) have atoms which are exactly the same thing. They're useful in any dynamic programming language - in a static one, the equivalent is different values of an enum.
Python notably doesn't, and as such you get functions that take arguments that are strings with special meaning, which I always found a bit clunky even before I discovered Ruby.
I think Erlang has them for a more specific reason: in general it eschews all forms of data definition (records are just a fancy syntax for tuples with an atom at the start), which makes hot code reloading and transparent network communication simpler.
Both cases would require some synchronization of data structures (across time or over the network), and with user-defined types this can get complicated, and atoms make the lack of user-defined types much more pleasant (and more performant than strings).
Yes, that's fair. I suppose I was describing my process of "discovery" and learning with Python which was significantly before 2014. Even now though, enums are not usually the normal way of doing things in public APIs, but that's presumably at least in part because of the history.
Also, unlike Ruby, string literals are usually interned in CPython (I think below a certain size), so they have at least some of the performance benefits of symbols in Ruby.
In the context of computer algebra systems, which are much about manipulating abstract syntax trees, mathematical variables are usually represented as "symbols".
Beyond that, this page gives databases as an example, which is in fact very nice. Beyond being fast and efficient, using symbols allows certain errors to be compile-time instead of runtime, where typos are only detected on an application level and not on a code level. This is where symbols can play out their advantage. Think a bit of ENUMs in other languages.
the "you name it" idiom in english is usually used to mean "anything you want", as in "you pick it"- so your first sentence reads like "native symbols are useful for anything, because (as everyone knows) symbolic programming is useful for anything"
i think the idiom you were going for was perhaps "you guessed it" or "you called it", as if poking fun at how, obviously, native symbols are helpful for symbolic programming, because it's the same word
My opinion is that Ruby has symbols for strings that are static - part of the program - and normal strings for dynamic runtime data.
Separating the two is useful semantically because it lets you differentiate between the two - and because these two kinds of string are better off being implemented and optimised in different ways.
So it's both UX and practical mechanical sympathy.
While other comments have discussed the technical utility of symbols, I believe symbols can also be seen as useful syntactic sugar that helps communicate intent. Strings used for indexes, named args, and other structural purposes can be represented in a way that is visually distinct from strings used as text.
The technical benefits are nice, but this type of ergonomic feature is why ruby has remained my favorite language for over a decade.
I was on the same page, but now moving away from that.
I more and more dislike how Ruby (arbitrarily) allows omitting brackets, but not always, often making the code harder to read. What is the call-chain in this rspec magic: `expect(something).to be >= 1` (quick: where and how do you add a custom failure message)?
And while `attr_accessor :time, :date, :state` is really neat, I more and more dislike constructs like `validates :name, :login, :email, presence: true`. I prefer to write them explicit and unambiguous: `validates_presence_of(:name)` etc. Which is only a very slight improvement over `validates_presence_of('name')`.
And don't get me started on "saving time" by typing fewer characters or shorter lines of code: if this is what makes you Go To Market faster, there's something very wrong with your IDE, editor or typing skills. If anything, those short things have cost me time in Rails codebases living years and years.
> `validates :name, :login, :email, presence: true`. And prefer to write them explicit and unambiguous: `validates_presence_of(:name) etc`. Which is only a very slightly improvement over `validates_presence_of('name')`.
That's Rails, not Ruby. Although Ruby allows it because of how flexible it is + metaprogramming.
It indeed is convention in Rails. A bad convention IMO.
But it is enabled by Ruby, as you state, by how flexible Ruby is. It may seem a nice touch that Ruby hands you the freedom to choose to e.g. omit brackets. But I think this is a bad freedom.
As Rails shows, it's a freedom that leads to, IMO, code that is harder to read and harder to reason about.
With any language design, the limitations, as well as the features, are what make the language. Limitations are an important feature of a language, IMO.
Intent sometimes becomes clearer with fewer characters. E.g. "attr_accessor" is, IMO, vastly superior to a large list of getter and setter method definitions. Easier to read, clearer in intent. Especially when there is that one hand-written getter or setter: you can be confident it's doing more than just setting/getting (which probably is a smell, but I digress).
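A sketch of that "one setter stands out" effect. `Appointment` and its attributes are invented for the example:

```ruby
# Hypothetical class: two plain attributes, one attribute with a custom writer.
class Appointment
  attr_accessor :time, :date   # generates time, time=, date, date=
  attr_reader :state           # generates only the reader

  # The single hand-written setter signals that it does more than assign:
  # here it normalizes whatever it is given into a symbol.
  def state=(value)
    @state = value.to_sym
  end
end

appointment = Appointment.new
appointment.time = "10:00"
appointment.state = "confirmed"
p appointment.time
p appointment.state
```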
Details, however, hardly ever become clearer. A single `has_many :tags, :through => :taggings, delete: :cascade` may seem easier to read than explicit method declarations and callback registrations, but it's a faux abstraction. It also rapidly falls apart when you continue developing on this for years and end with things like `has_many :posts, :through => :taggings, :source => :taggable, :source_type => 'Post'`
The abstraction remains intact with a `define_relation(:taggings, DatabaseJoinTable.new(:taggings))`, a `delegate :tags, to: :taggings` and a `register_callback(:delete, InlineTaggingsRemover.new(self.taggings))`. I just made this up. But I tried to design an interface that is explicit rather than implicit. One that uses dependency injection and common Ruby-isms over a framework DSL.
Point is: behind those seemingly "easy to read" lines, there's a large world of black magick, lurking. I've dug through these forbidden forests on numerous occasions when our Rails app started misbehaving, race-conditions popped up, performance degraded, or even random dataloss. It's only easy to read on the surface. And while that is where we spend a lot of time reading, readability of the underlying stuff is even more important, because that is where the details matter.
Rails sacrifices the readability and understandability of what happens below the hood for readability and understandability on the surface. This seems a deliberate choice. But I dislike it. Severely.
I'm in the same boat, FWIW. I really did not understand the point at first, but I love symbols now. I only vaguely understand the point, even after reading this thread, but I like using them.
In the early days, symbols weren't garbage collected, while strings were mutable, wasted memory and were slow. So there were tradeoffs.
Now you can use frozen string literals and there's no benefit to symbols. Throwing "# frozen_string_literal: true" in the top of the memory benchmark script I get:
At this point with no practical difference between them STRINGS AND SYMBOLS BEING DIFFERENT ARE A MISTAKE. When you serialize to something like JSON you lose the distinction (the operation is singular and does not have an inverse transform) and you have to pick either symbols or strings to get back. On a long enough timescale this causes enormous confusion, and leads to the creation of hashes with indifferent access (which helps with the problem, but doesn't fix it).
Ideally it would be good at this point to make symbols and frozen strings completely equivalent ("foo".freeze == :foo being true) but that would likely break too much existing code. The differentiation between strings and symbols though only causes code bugs (mostly biting the new and intermediate level programmers). It is just syntactical sugar with a footgun.
Designing a language from scratch these days, it should have immutable strings by default from the start and should not introduce symbols, unless they are purely syntactic sugar around creating an immutable string.
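The JSON round-trip problem described above can be sketched with Ruby's standard `json` library. `round_trip` is a made-up helper name:

```ruby
require "json"

# Hypothetical helper: serialize to JSON text and parse it straight back.
def round_trip(data)
  JSON.parse(JSON.generate(data))
end

original = { status: :active }
p round_trip(original)                    # keys and values both come back as strings
p round_trip(original) == original        # so the round trip is not an identity

# symbolize_names restores symbol keys, but values still come back as strings.
p JSON.parse(JSON.generate(original), symbolize_names: true)

# And a frozen string is still not equal to the symbol with the same text.
p "foo".freeze == :foo
```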
> At this point with no practical difference between them
Gah no! This isn't true! Even if you turn on frozen string literals comparing two strings is slower because they have to test for a non-frozen and non-interned string also happening to be the same.
There's a pointer comparison, but behind it, on the failure side, is a full byte-by-byte comparison. Atrocious for cache even if the strings are tiny. If they aren't, you're checking every byte!
I'm a big fan of explicit symbols, but some syntactic sugar around immutable strings and type inference should let you use "symbols" with little performance penalty. Of course by then you might as well use explicit symbols anyway, but I guess there's some additional flexibility. Of course that hurts macro-writing a bit, but not much.
I both like and hate how Rust has `String` and `&str`. Constant juggling between the two (which really is a sign I'm not doing it right). Yet knowing and using the difference is important and powerful.
I somewhat miss this when I go back to Ruby, but then realize that symbols often can be used for `&str`. Often. Not always.
They're like enums but without the risk of colliding across unrelated contexts. And because they're dynamically generated you can have runaway symbol generation that triggers a memory consumption problem.
Newbies try to use strings as enums, because they don't understand enums. Symbols provide tradeoffs compared with both.
This is what you get with a dynamic language where people try to overload functions with "I accept a scalar OR an array!!!"
From the caller's perspective there's little difference between a procedure with variadic arguments and multiple dispatch. There's a lot of difference from the implementor's perspective, though. In Perl 5 you'd see a lot of ref() and wantarray() while in Raku you'd see different signatures for procedures with the same name.
This seems to be a dynamic language thing. Could statically typed languages have some uses for it? In Java one might use a lot of constant strings which are keys for property files. You can of course do some dynamic stuff in order to construct them but they are more likely to be used as constant strings. Would using a separate construct for that make optimization easier? Or wouldn’t it make a difference in practice?
Java does use string interning for constants. It can help in any case where you have a lot of instances of the same string (and when you need to compare those strings).
No, Ruby has symbols because it was inspired by Smalltalk which has symbols. Of course, Lisp also has symbols, and it was one of the inspirations for Ruby (and Smalltalk), but the idea of representing message sends (ie. method calls) as symbol + arguments comes from Smalltalk.
It's not a historical article tracing the lineage of the feature.
Ruby took some things the designer liked from Lisp, from Smalltalk, from Perl, from other places. He liked symbols because they're good for performance (and compile-time correctness) at a low cognitive cost, and that's why Ruby has symbols.