by oytis 395 days ago
It's year 2025, why doesn't the author use actual fruit emoji for variable names?
3 comments

When trying to understand a complex C codebase I've often found it helpful to rename existing variables to emojis. This makes it much easier to track which variables are used where & to take in the pure structure of the code at a glance. An example I posted previously: https://imgur.com/F27ZNfk

Unfortunately most modern languages like Rust and JS follow the XID_Start/XID_Continue recommendation (not very well-motivated imo) which excludes all emoji characters from identifiers.
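For a quick feel of what that rule accepts and rejects, here is a small sketch in Python, whose identifier syntax is also built on XID_Start/XID_Continue (PEP 3131). Python is just my choice for illustration, it is not mentioned in the thread, and the sample names are made up.

```python
# str.isidentifier() applies Python's identifier rule, which is based on
# XID_Start / XID_Continue. The sample names are illustrative only.
samples = ["apple", "café", "变量", "🍎", "x🍎"]

# Map each candidate name to whether it is accepted as an identifier.
results = {name: name.isidentifier() for name in samples}

for name, ok in results.items():
    print(f"{name!r}: {'valid' if ok else 'rejected'}")
```

Accented letters and CJK characters pass, while anything containing an emoji is rejected, because emoji are symbols rather than letters in the Unicode character database.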

wouldn't writing a parser of sorts that would replace emojis with a valid alphabetical string identifier be trivial?
You're right that writing a preprocessor would be straightforward. But while you're actively editing the code, your dev experience will still be bad: the editor will flag emoji identifiers as syntax errors so mass renaming & autocompletion won't work properly. Last time I looked into this in VSCode I got TypeScript to stop complaining about syntax errors by patching the identifier validation with something like `if (code>127) return true` (if non-ascii, consider valid) in isUnicodeIdentifierStart/isUnicodeIdentifierPart [1]. But then you'd also need to patch the transpiler to JS, formatters like Prettier, and any other tool in your workflow that embeds its own version of TypeScript...

[1] https://github.com/microsoft/TypeScript/blob/81c951894e93bdc...
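To make the "preprocessor would be straightforward" part concrete, here is a deliberately naive sketch in Python. The `_uXXXX` naming scheme and the function name `mangle_non_ascii` are my own assumptions, not anything from an existing tool.

```python
import re

# Hypothetical scheme: every non-ASCII character becomes "_u" followed by
# its code point in hex. Not taken from any real preprocessor.
NON_ASCII = re.compile(r"[^\x00-\x7f]")


def mangle_non_ascii(source: str) -> str:
    """Replace each non-ASCII character with an ASCII-only escape."""
    return NON_ASCII.sub(lambda m: f"_u{ord(m.group()):04X}", source)


print(mangle_non_ascii("int 🍎 = 1; 🍎 += 2;"))
# prints: int _u1F34E = 1; _u1F34E += 2;
```

This rewrites string literals and comments too, and it could collide with an existing name that happens to look like `_u1F34E`, so a real version would need a tokenizer for the target language. It also does nothing for the editor experience described above.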

Here's Gemini's (brute force?) solution to the problem in C# using fruit emoji variables:

https://imgur.com/a/cC5QPH0

Nice! Not going to work, but appreciate the emoji.

UPD. This code is like that picture with Disney princesses. The closer you look, the scarier it gets

Yeah, don't deploy this to production :laughing_emoji:

I do use a lot of emojis in logs and status windows and in my comments. I'd not really thought about them in my code, but now I'm going to try to think of a way they could be used to improve code in a sensible way.

While the current year is 2025, the year the language was made in probably isn't.
Good or not, many languages choose not to support emoji, even newer ones.

The reasons are complicated, but many people prefer Unicode normalization so that different forms of what appears to be the same word are treated as the same word. People argue about whether this is important, but it would certainly be frustrating to get an error like

    let café = 1;
    café += 1;  // error, unknown identifier 'café'
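The error happens in non-normalizing languages because those two identifiers are not the same sequence of Unicode code points: one can use a precomposed "é" while the other uses "e" followed by a combining accent.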

But choosing a normalization affects emoji as well. Worse, when new ones are added the normalization rules can change.
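The café mismatch is easy to reproduce. A minimal sketch in Python using the standard `unicodedata` module; the variable names and the helper `same_after_nfc` are mine, and the escapes are spelled out so the two forms can't be silently merged by an editor.

```python
import unicodedata

composed = "caf\u00e9"     # "é" as one precomposed code point, U+00E9
decomposed = "cafe\u0301"  # "e" followed by U+0301 COMBINING ACUTE ACCENT


def same_after_nfc(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(composed == decomposed)                # False: different code points
print(len(composed), len(decomposed))        # 4 5
print(same_after_nfc(composed, decomposed))  # True
```

A language that normalizes identifiers does something like `same_after_nfc` before looking a name up; one that doesn't is comparing the raw code points, as in the first `print`.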

I had an idea the other day to use 64-bit character codes. Then each code can be interpreted as an 8x8 bitmap. Every character would be guaranteed to have a unique bitmap representation. The bitmaps wouldn't be used for rendering, of course -- but they could be used as a fallback if your font does not define a character. Anyway this would somewhat avoid the problem you describe because two characters that look the same visually would have the same value. Nothing I'll ever implement of course, just a thought experiment.
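For anyone who wants to play with the thought experiment, a rough sketch in Python of reading a 64-bit code as an 8x8 bitmap. The bit layout (one byte per row, top row in the most significant byte, leftmost pixel in the most significant bit) is purely my assumption, and so is the sample value, a typical 8x8 "A".

```python
def render_bitmap(code: int) -> str:
    """Render a 64-bit value as 8 rows of 8 pixels ('#' = set, '.' = clear).

    Assumed layout: row 0 is the most significant byte, and within a row
    the leftmost pixel is the most significant bit.
    """
    if not 0 <= code < 2**64:
        raise ValueError("code must fit in 64 bits")
    rows = []
    for row in range(8):
        byte = (code >> (8 * (7 - row))) & 0xFF
        rows.append("".join("#" if byte & (0x80 >> col) else "." for col in range(8)))
    return "\n".join(rows)


# Hypothetical code for a capital "A" in a classic 8x8 style.
print(render_bitmap(0x183C66667E666600))
```

Two codes are equal exactly when their bitmaps are equal, which is the whole point of the scheme.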
As someone who has worked with 8×8 fonts I can report that you'd have some surprising problems with that idea. Not only would you have problems with there being two distinct forms of letters like "a" and "g", making things over-unique; and not only is it tricky to differentiate forms that actually are not the same, because they are from two different alphabets (especially the 13 extra "mathematical" alphabets); but it's actually quite difficult to make pre-composed forms in that amount of space.

8×8 is a tight squeeze, and 16×16 works a lot better. But that would make your approach vastly more space hungry than a normalization approach using the actual Unicode code points.

* https://github.com/jdebp/unscii/tree/2.1.1f

The level of blind trust that English speakers put in non-ASCII character support always throws me off, knowing that a username on Windows 11 still has to be a short ASCII sequence. Sure, it's not 2010 anymore and you only have to recreate the user account rather than clean re-install Windows, but still.

Non-ASCII comments in source code can be scary enough sometimes, unless it's for an all-Unicode system like Android or something HTML based.

The early pages of the Swift docs show you can use emojis as variable names, and iirc that was the very first page in the Swift 1.0 handbook. I have a theory that the language design was originally motivated by emojis, because there were also interesting choices around strings, like originally having no ".length" method.
Here: https://docs.swift.org/swift-book/documentation/the-swift-pr...

Easter egg: The example is named dogcow, after a 90s Mac icon, designed by Susan Kare, which later became a small mascot: https://512pixels.net/dogcow/

Regarding .length: Effectively that is just the result of Unicode: there is no one-to-one correspondence between code points, the code units in an encoding, and the resulting grapheme clusters. That is in effect a result of the complexity of the world's alphabets, including emoji.
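To put numbers on that, a small sketch in Python (not Swift) that counts the same string three ways. The helper name `length_report` is made up; the sample is the family emoji, which is three person emoji joined by two zero-width joiners and is displayed as a single glyph on platforms that support it.

```python
def length_report(text: str) -> dict[str, int]:
    """Count one string in code points, UTF-16 code units and UTF-8 code units."""
    return {
        "code_points": len(text),
        "utf16_code_units": len(text.encode("utf-16-le")) // 2,
        "utf8_code_units": len(text.encode("utf-8")),
    }


# MAN + ZERO WIDTH JOINER + WOMAN + ZERO WIDTH JOINER + GIRL
family = "\U0001F468\u200D\U0001F469\u200D\U0001F467"

print(length_report(family))
# {'code_points': 5, 'utf16_code_units': 8, 'utf8_code_units': 18}
```

None of the three numbers is 1, which is what a user would call the length. Counting grapheme clusters needs a segmentation algorithm that Python's standard library does not ship, so it is left out here.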

Yeah I get the justification for the lack of .length, but they eventually added it for a good reason too, which is that anyone calling that doesn't really care and those who do care can use something more specific.

The other aspects of strings are also centered around things being of uncertain length, like how it's O(n) to take the nth character of a string, and how there are rather complicated objects involved in taking substrings. There's a lot more thought and resulting complexity than other languages' default strings. And yes a few languages use extended grapheme clusters, but I feel like emojis were the real motivation.
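The O(n) indexing falls out of any variable-width representation. A sketch in Python over raw UTF-8 bytes, simplified to code points rather than the grapheme clusters Swift actually indexes by; `utf8_width` and `nth_code_point` are hypothetical helpers and this is not how Swift implements it.

```python
def utf8_width(lead: int) -> int:
    """Number of bytes in the UTF-8 sequence that starts with this lead byte."""
    if lead < 0x80:
        return 1
    if lead >> 5 == 0b110:
        return 2
    if lead >> 4 == 0b1110:
        return 3
    if lead >> 3 == 0b11110:
        return 4
    raise ValueError("not a UTF-8 lead byte")


def nth_code_point(data: bytes, n: int) -> str:
    """Return the n-th code point (0-based) by walking from the start."""
    pos = 0
    for _ in range(n):  # every earlier character must be stepped over
        if pos >= len(data):
            raise IndexError("index out of range")
        pos += utf8_width(data[pos])
    if pos >= len(data):
        raise IndexError("index out of range")
    return data[pos:pos + utf8_width(data[pos])].decode("utf-8")


data = "a🍎b".encode("utf-8")
print(nth_code_point(data, 1))  # 🍎
print(nth_code_point(data, 2))  # b
```

There is no way to jump straight to byte offset n, because the offset of each character depends on the widths of all the ones before it. Grapheme clusters make this worse, since the width of a cluster is not even visible in its first byte.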

To clarify, this tradeoff makes sense when you care a lot about complex emojis, but not so much otherwise. Other programming languages' strings can store grapheme clusters too but don't optimize around them. The only other example I found back then was a non-modern alternate Korean script.