by formerly_proven 1647 days ago
I was interested in Nim until I learned that getBlochenAngle and get__B_L_O_C_H_E_N_ANGLE are the same identifier.

> https://nim-lang.org/docs/manual.html#lexical-analysis-ident...
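For anyone who wants the rule spelled out: as far as I understand the linked manual section, the first character must match exactly, and the rest is compared after dropping underscores and lowercasing ASCII letters. A minimal Python sketch of that rule follows. The name `same_identifier` is made up for illustration, and this is not the compiler's actual code.

```python
import string

# Lowercase only A-Z; non-ASCII letters pass through unchanged.
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def same_identifier(a: str, b: str) -> bool:
    """Approximate Nim's identifier equality (illustrative sketch only)."""

    def normalize(name: str) -> str:
        return name.replace("_", "").translate(_ASCII_LOWER)

    # The first character is compared as-is, so `foo` and `Foo` stay distinct.
    return a[0] == b[0] and normalize(a) == normalize(b)


print(same_identifier("getAttr", "get_attr"))  # True
print(same_identifier("getAttr", "GetAttr"))   # False
```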

5 comments

Agreed. First world problem, though. I've tried a couple of times to get into nimlang, but this feature is such an anti-pattern (anti-feature) that it drove me crazy. I could not easily and reliably grep/search for anything. Not to mention that reading code requires extra mental overhead (especially being new to the language), that `getAttr`, `get_attr`, etc., are actually the same thing.

Why this was implemented in the language itself, instead of being left as a suggestion or a standard, is beyond me.

quick edit: also, everything is imported globally (if that's the right term? like in python `from package import *`). So when you see a function call, you always have to look up where it comes from.

edit2: apparently I am not alone :)

This complaint comes from people who haven't used the language.

> Not to mention that reading code requires extra mental overhead (especially being new to the language), that `getAttr`, `get_attr`, etc., are actually the same thing.

On the contrary, the language prevents confusion due to mixing getAttr and get_attr in the same codebase and bugs from using the wrong one.

Unsurprisingly, many safety-critical environments have policies to enforce consistent naming styles.

The linter will convert both to "getAttr" and the compiler will complain if the user tries to define the same proc twice.

There are two kinds of case-insensitivity in identifiers. In either case, identifier case (and here, also underscores) is normalized, but case-agnostic languages allow any mix of spellings, while case-pedantic languages disallow any pair of identifiers that normalize to the same name (they may still give helpful errors based on that normalized name, though). I don't think case-agnostic languages provide any value that case-pedantic languages don't.
Nim provides the benefit of both: when interfacing with C libraries a case-agnostic language helps.

Yet the checks on function definitions and types give the same benefit as a case-pedantic language, without forcing a specific style on the developer.

It is okay to have a feature (NOT necessarily this feature) that translates other conventions to a single consistent convention; for example a language may convert a name "foo_bar" into "fooBar" when used in C bindings, preferably with an escape hatch. However case-agnostic languages do not try to do that, they give zero indication for what convention to use (cf. function names in PHP, which is a horrible mess). No single convention is better than others, but a single consistent convention does matter.
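To make the "case-pedantic" idea concrete, here is a rough Python sketch of a symbol table that accepts one spelling per normalized name and rejects any other spelling. `PedanticScope` and its normalization rule (drop underscores, lowercase) are assumptions for illustration, not any real language's implementation.

```python
class PedanticScope:
    """Hypothetical symbol table: one spelling per normalized name."""

    def __init__(self):
        self._seen = {}  # normalized name -> first spelling that was declared

    def declare(self, name):
        key = name.replace("_", "").lower()
        first = self._seen.setdefault(key, name)
        if first != name:
            # A case-agnostic language would silently treat these as the same.
            raise NameError(f"{name!r} clashes with earlier spelling {first!r}")


scope = PedanticScope()
scope.declare("getAttr")
scope.declare("getAttr")       # same spelling again: accepted
try:
    scope.declare("get_attr")  # different spelling of the same name: rejected
except NameError as err:
    print(err)
```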
Try nimgrep: https://nim-lang.org/docs/nimgrep.html

If you install Nim with choosenim[+], nimgrep will be available in ~/.nimble/bin along with the compiler, nimble, and some other helpful executables.

[+] https://github.com/dom96/choosenim#readme

> everything is imported globally `from package import *`

yes, that is the default, but it is also possible to

`from package import nil`

which gives you python's `import package` behaviour or

`from package import symbol`

which is just like python
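For comparison, the Python side of those two forms looks roughly like this. It is a small sketch using the standard `json` and `pickle` modules, which both happen to export a function called `dumps`.

```python
import json               # qualified, like Nim's `from package import nil`
from pickle import dumps  # selective, like Nim's `from package import symbol`

payload = [1, 2, 3]

text = json.dumps(payload)  # the module prefix shows where `dumps` comes from
blob = dumps(payload)       # unqualified: the reader has to check the imports

print(type(text).__name__)  # str
print(type(blob).__name__)  # bytes
```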

My personal blocker is that identifiers are all imported globally by convention, so when you see that there is a call to a method called "get", you have to get to the top of the file or mouse over the call to see what lib it is from. A "get" from the http lib is not the same as a "get" from the kv store lib.
There is some logic as to why that is. Here [1] is an explanation for why it makes sense but the tldr is that you don't want to be manually importing functions such as `$` and `+`. In languages like Python, those are defined as methods on the object being imported (e.g. `.__str__()`) so they come along for free. Not so in Nim. If there's a conflict (same name, same signature), the compiler will warn you but it's extremely rare.

[1] https://narimiran.github.io/2019/07/01/nim-import.html

Thank you for the link but it doesn't address the issue I have. It's not about types, or about the compiler being "unsure". It's about me, as a developer, reading code someone else wrote, not knowing directly what package a call is from. I need to leave my current context to have the answer.

I can do `mypackage.mymethod` but it will only be in my own code, because it's not the convention

There are plenty of cases in Python and similar languages where it's not clear where a method is defined, consider `myClassInstance.myMethod`, how do you find its definition? You do not immediately know which class it belongs to, nor where that class is defined. This is especially the case when you've got classes inheriting from multiple levels of other classes.
To put things in context, I don't come from a Python background but from a Go background, where methods are always called with their package (unless it's in the current package). I got used to it because it makes the context clear.
Ah that makes sense. I agree with you; I’m not a huge fan of trying to infer where the types came from myself either when reading code on GitHub since it doesn’t have the inference that my IDE does.
Argument-dependent lookup would solve that in a far less global way.
Don't Haskell and Go also do this?
I have no experience in Haskell, but in Go:

- if it's a builtin or in the current package, you use it directly

- if it's in another package the identifier is always prefixed with the package

This is quite bad. Relying on uppercase/lowercase equivalences in a Unicode world is by definition a code smell, no matter if they force the comparison to be based on ASCII. This whole ordeal causes all sorts of issues if Unicode letters are allowed, because they will pass through `toLowerAscii` untouched and it is bound to cause confusion or to force people to avoid using Unicode in identifiers altogether.

Case insensitivity is a Western-only concept that should die ASAP, it's immensely complicated to pull off right and it opens a massive can of worms that makes no sense (see the Turkey Test for more about this).

Keywords are in English, and 99% of identifiers in all the code I've seen are too, even when comments are in French or Russian.

Pascal and old BASIC (among other languages) have been case insensitive for decades, and that has not been a problem.

In UI, you are by all means correct. But code is a formal language that happens to be expressed in Latin letters. APL is the only real language that doesn’t impose a western character set.

> Pascal and old BASIC (among other languages) have been case insensitive for decades

... because back then Unicode was still a pipe dream in the mind of some visionary. Everything used 8 bit encoding, and everyone assumed ASCII or at least something compatible with the 7 bit subset of ASCII.

Nowadays files are formatted in UTF-8, and most modern languages actually fully support UTF-8 identifiers. Nim itself supports UTF-8 "letters" in identifiers, and what is "upper case" or "lower case" 100% depends on the current locale. Restricting your case normalization logic to ASCII is __really bad__, because it basically means that non-Latin letters in identifiers won't be normalized, with possibly unexpected consequences.

> APL is the only real language that doesn’t impose a western character set.

- Rust uses UTF-8: https://doc.rust-lang.org/reference/identifiers.html

- Go allows any Unicode letter in identifiers: https://go.dev/ref/spec#Identifiers

- Swift is also famous for allowing you to use emojis in identifiers.

- Python supports non-ASCII identifiers: https://www.python.org/dev/peps/pep-3131/

And the list goes on. Even C++ can optionally support Unicode in identifiers (for instance, Clang and GCC do indeed support things like `constexpr auto 黒 { "lol" };`).

> ... because back then Unicode was still a pipe dream in the mind of some visionary. Everything used 8 bit encoding, and everyone assumed ASCII or at least something compatible with the 7 bit subset of ASCII.

They could still have been case sensitive (C was), so I don't understand how that's relevant to the idea that "case insensitivity is a problem".

> Rust, Go, Swift, Python

All of these languages impose ASCII for their keywords and directives. They allow you to use other characters for identifiers, but impose ASCII in everything that has pre-defined semantics. Original APL is the only "real"/practical language that I'm aware of that gave up the "western centric view" of the world to the point that it doesn't have a single English keyword. (Brainfuck, etc. exist as well, but ....)

And they all impose a left-to-right reading order, which is just as western-centric. Arabic/Farsi/Hebrew go right-to-left, and there are languages that can also go top-to-bottom.

I think the outrage about "western centrism" is misguided. This is a formal system, and just like math, it reflects some history by using Latin letters and left-to-right for the predefined symbols, and even preferred use of Latin characters in identifiers.

> Nim itself supports UTF-8 "letters" in identifiers, and what is "upper case" or "lower case" 100% depends on the current locale.

If that's true, that may be a problem. I'll look into that, thanks for pointing it out - from memory, Nim only folds the lower 7-bit by a 32 difference in ascii code, so it is well defined regardless of locale, but I'll check.

The whole idea of utf-8 in identifiers is a minefield, whether you fold case or not; e.g:

"Εхаmрⅼе" and "Example" have no single letter in common (I chose them that way using[0]) and no language that allows utf-8 identifiers is going to warn you about that.

[0] https://www.irongeek.com/-attack-generator.php
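A quick way to see what is hiding in such a name is to ask for the Unicode name of every non-ASCII character. The sketch below uses Python's standard `unicodedata` module; the helper `describe_non_ascii` and the test string are made up for illustration (a single Cyrillic letter swapped into "Example").

```python
import unicodedata


def describe_non_ascii(identifier):
    """Return the Unicode names of every non-ASCII character in `identifier`."""
    return [unicodedata.name(ch) for ch in identifier if ord(ch) > 127]


spoofed = "Ex\u0430mple"  # third letter is Cyrillic, not Latin

print(spoofed == "Example")         # False
print(describe_non_ascii(spoofed))  # ['CYRILLIC SMALL LETTER A']
```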

> They could still have been case sensitive (C was), so I don't understand how that's relevant to the idea that "case insensitivity is a problem".

The point here is that case insensitivity is only a viable option if you severely limit the encoding allowed in whatever you are using - be it a programming language, filesystem, etc. If the encoding of your files is something basically akin to ASCII or ISO-whatever (which was what BASIC and Pascal used back in the day) then case insensitivity is trivial and safe.

This whole thing breaks apart as soon as you enter a Unicode world and start accepting identifiers containing more than ASCII, and then the whole concept of "case insensitive" becomes obsolete and outright wrong.

The Unicode equivalent of "case insensitive" is Normalization [0] and it's a big heck of a minefield because it is defined depending on the locale in use. For instance, "FILE.TXT" and "file.txt" are to be considered equivalent under en_US, but not under tr_TR, where the lower case version of "FILE.TXT" is "fıle.txt" and the upper case version of "file.txt" is "FİLE.TXT". This means that normalizing strings can lead to unexpected results depending on the locale, which is especially problematic with filesystems (where a path may exist or not depending on the locale).
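The Turkish case can be sketched in Python. Python's `str.lower()` applies the default Unicode mapping regardless of locale, so the Turkish rule is hand-rolled here; `turkish_lower` is a hypothetical helper for illustration, not a library function.

```python
def turkish_lower(text):
    """Lowercase with the Turkish dotted/dotless i rule (hand-rolled sketch)."""
    # Map the two Turkish-specific pairs first, then defer to the default rules.
    return text.replace("I", "\u0131").replace("\u0130", "i").lower()


print("FILE.TXT".lower())         # file.txt (default, locale-independent mapping)
print(turkish_lower("FILE.TXT"))  # fıle.txt (dotless ı, as expected under tr_TR)
```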

> Nim only folds the lower 7-bit by a 32 difference in ascii code, so it is well defined regardless of locale

yes, it is well defined but allowing the entirety of the Unicode letters also means that identifiers may contain glyphs from alphabets that have separate cases, chiefly Greek and Russian, or even accented letters such as `è` or `ö`. Case insensitivity instead of proper normalization makes them potentially confusing, and quite breaks the intent behind allowing Unicode identifiers by making non-US locales second class citizens.

IMHO it is arguably very confusing to non-English speakers that 'mela' is equivalent to 'MELA' but 'tè' isn't equivalent to 'TÈ' while 'Tè' is. It basically means you have to remember what letters are ASCII and what are not, which makes the whole "case insensitive" a potential source of confusion.
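The ASCII-versus-accented asymmetry is easy to reproduce. The sketch below folds only A-Z, which is assumed to mirror what `toLowerAscii` does, and ignores Nim's separate rule for the first character; `ascii_fold` is a made-up name.

```python
import string

# Fold only A-Z; every non-ASCII letter is left untouched.
ASCII_FOLD = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def ascii_fold(name):
    return name.translate(ASCII_FOLD)


print(ascii_fold("MELA") == ascii_fold("mela"))            # True
print(ascii_fold("T\u00c8") == ascii_fold("t\u00e8"))      # False: È is not folded
print("T\u00c8".lower() == "t\u00e8")                      # True with full Unicode lowercasing
```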

I think it is safe to say that in 2021 case insensitivity is an obsolete concept and an obstacle to proper internationalization. Case insensitivity only really works on legacy encodings and with the basic Latin alphabet, and you can rest assured it will be almost always improperly implemented anyway.

[0] https://en.wikipedia.org/wiki/Unicode_equivalence

I understand your point, but still disagree with it. As I see it, the real problem is Unicode identifiers, as I demonstrated with "Example" above, and as follows from your demonstrations as well. There are thousands of Unicode characters, which are unlikely to all be familiar to any single person, and whose meaning and variants (casing, conjugation, pre-joined pairs, precomposed versions, etc.) differ between cultures.

The ASCII case folding employed by Nim and Pascal, by contrast, refers to 26 specific, well-known characters. It's a non-issue.

Indeed I too find this off-putting. I understand the rationale behind it, but developers are used to extreme attention to detail, and this discards an important aspect of detail in the important area of naming.
>It allows programmers to mostly use their own preferred spelling style, be it humpStyle or snake_style

It's interesting that some languages, like Go, are specifically designed to avoid it (there's a built-in formatting tool for a single coding convention), while in other languages having a zoo of different styles is viewed as a great idea and the language is designed for it. Or is it about linking with existing C libraries? In that case, I'd introduce some sort of name mapping via attributes/annotations as a whitelist, instead of allowing this behavior by default.

I personally still use Python because I miss list and dict comprehensions.

I know there is a `collect` macro in `sugar` module but it is nowhere close to the python comprehensions. The code is too verbose and basically is just the same multiline for loop :-(

(on mobile) You could use Nim's sequence iterators (`filterIt`/`mapIt` from `sequtils`)

[x.name for x in someList if x.age > 5]

Is the same as:

someList.filterIt(it.age > 5).mapIt(it.name)

In a way it reads better.
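For what it's worth, the same two styles can be put side by side in Python itself. This is a small sketch with a made-up `Person` record and sample data, showing that the comprehension and a filter-then-map chain give the same result.

```python
from collections import namedtuple

# Hypothetical record type and sample data for the example.
Person = namedtuple("Person", "name age")
some_list = [Person("Ada", 7), Person("Bo", 3), Person("Cy", 9)]

via_comprehension = [x.name for x in some_list if x.age > 5]
via_chain = list(map(lambda p: p.name, filter(lambda p: p.age > 5, some_list)))

print(via_comprehension)  # ['Ada', 'Cy']
print(via_chain)          # ['Ada', 'Cy']
```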