Hacker News
by reitzensteinm 2164 days ago
It's true that Clojure is keyword-heavy, but it's important to note that objects are essentially never checked for reference equality in Clojure, even when, e.g., looking up keys in a hash map (see demo).

With this in mind, you could stop interning keywords and damn near every Clojure program would continue to work just fine, but with a noticeable slowdown.

Or, more sensibly and to bring it back to the theme of the thread, you could add a second, non-interning Keyword type that can safely be generated while deserializing user input in a long-running process. You could use it interchangeably with standard keywords, but it would be garbage collected along with the rest of the deserialized data when you're done.
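
To make the idea concrete outside Clojure, here is a minimal C sketch of equality by value with an identity fast path. It is an illustration only, not a description of Clojure's internals; `struct keyword` and `keyword_equal` are hypothetical names invented for this example.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical keyword object. Interned and non-interned keywords
   share this layout; they differ only in who owns them. */
struct keyword {
    const char *name;
};

/* Equality by value, with an identity check as a fast path. */
bool keyword_equal(const struct keyword *a, const struct keyword *b)
{
    if (a == b)
        return true;                      /* same object, no string work */
    return strcmp(a->name, b->name) == 0; /* slower path: compare names */
}
```

If every keyword is interned, equal keywords always hit the first branch. Without interning the result stays the same and only the cost changes, which is roughly the slowdown described above.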

You do pay a hefty penalty here because you're hiding everything behind interfaces and abstractions. It's totally fine to not like the system, or believe it's not worth the performance hit.

But it does mean that a potentially equal but not identical symbol isn't some off brand low quality replacement as GP suggests, it's just... a symbol.

Pastebin demo: https://pastebin.com/cbWiNyEL


I wouldn't write a config file parsing library for C programs without interning, so that == between two pointers can be used to test keywords for equality.
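
A minimal sketch of what such interning could look like in C, assuming a small fixed-size table that is never resized, never freed and not thread-safe. The names `intern`, `intern_table` and `INTERN_BUCKETS` are made up for this example and do not come from any particular library.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical fixed-size intern table, for illustration only. */
#define INTERN_BUCKETS 256

struct intern_entry {
    struct intern_entry *next;
    char *name;
};

static struct intern_entry *intern_table[INTERN_BUCKETS];

/* FNV-1a-style hash of a NUL-terminated string. */
static unsigned long hash_name(const char *s)
{
    unsigned long h = 2166136261UL;
    while (*s) {
        h ^= (unsigned char)*s++;
        h *= 16777619UL;
    }
    return h;
}

/* Return the canonical pointer for name, or NULL if out of memory.
   Equal strings always yield the same pointer. */
const char *intern(const char *name)
{
    struct intern_entry **bucket =
        &intern_table[hash_name(name) % INTERN_BUCKETS];
    struct intern_entry *e;

    for (e = *bucket; e != NULL; e = e->next)
        if (strcmp(e->name, name) == 0)
            return e->name;           /* already interned */

    e = malloc(sizeof *e);
    if (e == NULL)
        return NULL;
    e->name = malloc(strlen(name) + 1);
    if (e->name == NULL) {
        free(e);
        return NULL;
    }
    strcpy(e->name, name);            /* table keeps its own copy */
    e->next = *bucket;
    *bucket = e;
    return e->name;
}
```

A parser built on this would call intern() once per token and afterwards compare the returned pointers with == instead of calling strcmp. Note that entries are held strongly and never released, so every distinct name stays in memory for the life of the process.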

Interning is used outside of Lisp. See the XInternAtom function in the X Window System:

  Atom XInternAtom(
    Display *display,
    char *atom_name,
    Bool only_if_exists
  );
 
or RegisterClass in Win32:

  ATOM RegisterClassA(
    const WNDCLASSA *lpWndClass
  );
I wouldn't either :)

For Lisp the following is the usual behavior: symbols are interned by default, and EQ returns T for two symbols that are identical.

  > (eq (read) (read))
  a a 
  T
The default test function of FIND is EQL, which compares symbols the same way EQ does. In Common Lisp, #:a is an uninterned symbol with the name "A".

  > (find 'a '(#:a a))
  A

  > (find 'a '(#:a a) :test #'string-equal)
  #:A
Setting the value of a symbol works in a similar fashion in basically all Lisps that have symbols:

  > (dolist (item '(a b c a))
      (set item (if (and (boundp item)
                         (numberp (eval item)))
                    (1+ (eval item))
                    1)))
  NIL

  > (mapcar 'eval '(a b c a))
  (2 1 1 2)
This last example, for instance, runs unchanged in both Emacs Lisp and Common Lisp.

What is the purpose of creating uninterned symbols?

They could be used as symbols which can be GCed.

A typical use, though, is in macros: a macro introduces new symbols which should never clash with any existing symbol and which should not be accessible via their name.

Example: A macro which prints the form and its value, and then returns the value. GENSYM generates a named/counted uninterned symbol.

  > (defmacro debugit (form &aux (value-symbol (gensym "value")))
      `(let ((,value-symbol ,form))
         (format t "~%The value of ~a is ~a~%" ',form ,value-symbol)
         ,value-symbol))
  DEBUGIT
If we look at the expanded code of an example, we can see uninterned symbols:

  > (pprint (macroexpand-1 '(debugit (sin pi))))

  (LET ((#:|value1093| (SIN PI)))
    (FORMAT T "~%The value of ~a is ~a~%" '(SIN PI) #:|value1093|)
    #:|value1093|)
We can also let the printer show us the identities of these symbols, labelling objects which are used multiple times in an s-expression:

  > (setf *print-circle* t)
  T

  > (pprint (macroexpand-1 '(debugit (sin pi))))

  (LET ((#2=#:|value1095| #1=(SIN PI)))
    (FORMAT T "~%The value of ~a is ~a~%" '#1# #2#)
     #2#)
Thus we can see above that it's just one uninterned symbol used in three places.

Example run:

  > (debugit (sin pi))

  The value of (SIN PI) is 1.2246063538223773D-16
  1.2246063538223773D-16
That seems like a bad idea, since you've now got two symbols with the same name that will fail eq. Is this ever actually done?

Interesting that gensym returns uninterned symbols, thanks.

The uninterned symbols don't fail EQ if they are the very same symbol.

Great, thanks for filling me in. Any idea why the Google guide is against using them for this purpose?

Good points.

It's interesting that, despite this, keywords are serialized all the time in Clojure land (e.g. in the Transit format that is commonly used for frontend/backend communication).

I think Google's warning definitely applies to Clojure.

Most JSON libraries will convert string keys to keywords, and they're not weak references.

An attacker can probably just send a few dozen gigabytes of random JSON to the average Clojure app and it's going to go down.