| HN Mirror



	by formerly_proven 1647 days ago
	I was interested in Nim until I learned that getBlochenAngle and get__B_L_O_C_H_E_N_ANGLE are the same identifier. > https://nim-lang.org/docs/manual.html#lexical-analysis-ident...

5 comments

trymas 1647 days ago

Agreed. It's a first-world problem, but I've tried a couple of times to get into Nim, and this feature is such an anti-pattern (anti-feature) that it drove me crazy. I could not easily and reliably grep/search for anything. Not to mention that reading code requires extra mental overhead (especially being new to the language), that `getAttr`, `get_attr`, etc., are actually the same thing.
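
For reference, the rule in the Nim manual compares the first character as-is and the rest with underscores removed and ASCII letters lowercased. A minimal Python sketch of that comparison follows; `nim_ident_key` and `same_nim_ident` are made-up helper names, and this is not the compiler's implementation.

```python
def nim_ident_key(name: str) -> str:
    """Hypothetical helper: reduce an identifier to a comparison key.

    First character is kept as-is; in the rest, underscores are dropped
    and ASCII letters are lowercased.
    """
    if not name:
        return name
    rest = name[1:].replace("_", "")
    # Lowercase A-Z only; non-ASCII letters pass through unchanged.
    rest = "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in rest)
    return name[0] + rest


def same_nim_ident(a: str, b: str) -> bool:
    """True if the two spellings would compare equal under the rule above."""
    return nim_ident_key(a) == nim_ident_key(b)


print(same_nim_ident("getAttr", "get_attr"))  # True
print(same_nim_ident("getAttr", "GetAttr"))   # False: first character differs
```

Because the first character is compared exactly, `getAttr` and `GetAttr` stay distinct under this rule.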

Why this was built into the language itself, instead of being left as a suggestion or a standard, is beyond me.

quick edit: also, everything is imported globally (if that's the right term? like in python `from package import *`). So when you see a function call, you have to look up where it comes from every time.

edit2: apparently I am not alone :)

goodpoint 1647 days ago

This complaint comes from people who haven't used the language.

> Not to mention that reading code requires extra mental overhead (especially being new to the language), that `getAttr`, `get_attr`, etc., are actually the same thing.

On the contrary, the language prevents confusion due to mixing getAttr and get_attr in the same codebase and bugs from using the wrong one.

Unsurprisingly, many safety-critical environments have policies to enforce consistent naming styles.

The linter will convert both to "getAttr" and the compiler will complain if the user tries to define the same proc twice.

lifthrasiir 1647 days ago

There are two kinds of case-insensitivity in identifiers. In both, identifier case (and in this case, also underscores) is normalized, but case-agnostic languages allow any mix of spellings, while case-pedantic languages disallow any pair of identifiers that normalize to the same name (they may still give helpful errors based on that normalized name, though). I don't think case-agnostic languages provide any value that case-pedantic languages don't.
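
To make the distinction concrete, here is a rough Python sketch. The `normalize` function and both scope classes are hypothetical and not taken from any real compiler; the normalization uses Python's `str.lower()` purely for brevity.

```python
def normalize(name: str) -> str:
    # Assumed normalization: keep the first character, drop underscores
    # and lowercase the rest.
    return name[:1] + name[1:].replace("_", "").lower()


class AgnosticScope:
    """Case-agnostic: any spelling that normalizes to the same key is accepted."""

    def use(self, name: str) -> str:
        return normalize(name)


class PedanticScope:
    """Case-pedantic: the first spelling seen is the only one allowed."""

    def __init__(self) -> None:
        self._spelling = {}  # normalized key -> first spelling seen

    def use(self, name: str) -> str:
        key = normalize(name)
        first = self._spelling.setdefault(key, name)
        if first != name:
            raise NameError(f"{name!r} conflicts with earlier spelling {first!r}")
        return key
```

With `PedanticScope`, using `getAttr` and then `get_attr` raises an error that names the earlier spelling; with `AgnosticScope`, both silently resolve to the same key.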

goodpoint 1647 days ago

Nim provides the benefit of both: when interfacing with C libraries a case-agnostic language helps.

Yet, checks on function definitions and types have the same benefit as a case-pedantic language, without forcing a specific style on the developer.

lifthrasiir 1647 days ago

It is okay to have a feature (NOT necessarily this feature) that translates other conventions to a single consistent convention; for example a language may convert a name "foo_bar" into "fooBar" when used in C bindings, preferably with an escape hatch. However, case-agnostic languages do not try to do that; they give zero indication of what convention to use (cf. function names in PHP, which are a horrible mess). No single convention is better than others, but a single consistent convention does matter.

michaelsbradley 1647 days ago

Try nimgrep: https://nim-lang.org/docs/nimgrep.html

If you install Nim with choosenim[+], nimgrep will be available in ~/.nimble/bin along with the compiler, nimble, and some other helpful executables.

[+] https://github.com/dom96/choosenim#readme
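
The general idea behind a style-insensitive search can be sketched with a regular expression. This is only an illustration in Python, not how nimgrep works internally; `style_insensitive_pattern` is a hypothetical helper and assumes a non-empty ASCII identifier.

```python
import re


def style_insensitive_pattern(ident: str) -> re.Pattern:
    """Build a regex matching every spelling that differs from `ident`
    only by underscores or by letter case after the first character."""
    first, rest = ident[:1], ident[1:].replace("_", "")
    parts = [re.escape(first)]  # first character must match exactly
    for ch in rest:
        if ch.isascii() and ch.isalpha():
            # Optional underscores, then the letter in either case.
            parts.append(f"_*[{ch.lower()}{ch.upper()}]")
        else:
            parts.append("_*" + re.escape(ch))
    # Lookarounds keep the match from starting or ending inside a longer word.
    return re.compile(r"(?<![A-Za-z0-9_])" + "".join(parts) + r"(?![A-Za-z0-9_])")


pattern = style_insensitive_pattern("getAttr")
print(bool(pattern.search("let x = get_attr(node)")))  # True
print(bool(pattern.search("let x = GetAttr(node)")))   # False
```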

shirleyquirk 1647 days ago

> everything is imported globally `from package import *`

yes, that is the default, but it is also possible to

`from package import nil`

which gives you python's `import package` behaviour or

`from package import symbol`

which is just like python

rakoo 1647 days ago

My personal blocker is that identifiers are all imported globally by convention, so when you see that there is a call to a method called "get", you have to get to the top of the file or mouse over the call to see what lib it is from. A "get" from the http lib is not the same as a "get" from the kv store lib.

benjamin-lee 1647 days ago

There is some logic as to why that is. Here [1] is an explanation for why it makes sense but the tldr is that you don't want to be manually importing functions such as `$` and `+`. In languages like Python, those are defined as methods on the object being imported (e.g. `.__str__()`) so they come along for free. Not so in Nim. If there's a conflict (same name, same signature), the compiler will warn you but it's extremely rare.

[1] https://narimiran.github.io/2019/07/01/nim-import.html
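
The Python side of that comparison looks like this: the operators are looked up on the object's class, so having only the class in scope is enough. `Vec` is a made-up example type.

```python
class Vec:
    """Toy 2D vector; the operators live on the class itself."""

    def __init__(self, x: int, y: int) -> None:
        self.x = x
        self.y = y

    def __add__(self, other: "Vec") -> "Vec":
        # Called for `a + b`; no separate import of "+" is needed.
        return Vec(self.x + other.x, self.y + other.y)

    def __str__(self) -> str:
        # Called by str() and print().
        return f"Vec({self.x}, {self.y})"


total = Vec(1, 2) + Vec(3, 4)
print(total)  # Vec(4, 6)
```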

rakoo 1647 days ago

Thank you for the link but it doesn't address the issue I have. It's not about types, or about the compiler being "unsure". It's about me, as a developer, reading code someone else wrote, not knowing directly what package a call is from. I need to leave my current context to have the answer.

I can do `mypackage.mymethod` but it will only be in my own code, because it's not the convention

dom96 1647 days ago

There are plenty of cases in Python and similar languages where it's not clear where a method is defined, consider `myClassInstance.myMethod`, how do you find its definition? You do not immediately know which class it belongs to, nor where that class is defined. This is especially the case when you've got classes inheriting from multiple levels of other classes.
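
As a rough illustration in Python, finding the class that actually defines a method means walking the method resolution order, which is not visible at the call site. The class names and `defining_class` below are made up for this sketch.

```python
class Base:
    def my_method(self) -> str:
        return "from Base"


class Middle(Base):
    pass


class Leaf(Middle):
    pass


def defining_class(obj, method_name: str):
    """Return the first class in the MRO whose own namespace defines the name."""
    for cls in type(obj).__mro__:
        if method_name in vars(cls):
            return cls
    return None


my_class_instance = Leaf()
print(defining_class(my_class_instance, "my_method").__name__)  # Base
```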

rakoo 1647 days ago

To put things in context, I don't come from a Python background but from a Go background, where methods are always called with their package (unless it's in the current package). I got used to it because it makes the context clear.

benjamin-lee 1647 days ago

Ah that makes sense. I agree with you; I’m not a huge fan of trying to infer where the types came from myself either when reading code on GitHub since it doesn’t have the inference that my IDE does.

formerly_proven 1647 days ago

Argument-dependent lookup would solve that in a far less global way.

nerdponx 1647 days ago

Don't Haskell and Go also do this?

rakoo 1647 days ago

I have no experience in Haskell, but in Go:

- if it's a builtin or in the current package, you use it directly

- if it's in another package the identifier is always prefixed with the package

qalmakka 1647 days ago

This is quite bad. Relying on uppercase/lowercase equivalences in a Unicode world is by definition a code smell, no matter if they force the comparison to be based on ASCII. This whole ordeal causes all sorts of issues if Unicode letters are allowed, because they will pass through `toLowerAscii` untouched, which is bound to cause confusion or to force people to avoid using Unicode in identifiers altogether.

Case insensitivity is a Western-only concept that should die ASAP, it's immensely complicated to pull off right and it opens a massive can of worms that makes no sense (see the Turkey Test for more about this).

beagle3 1646 days ago

Keywords are in English, and 99% of identifiers in all the code I have seen are too, even when comments are in French or Russian.

Pascal and old basic (among other languages) have been case insensitive for decades, and that has not been a problem.

In UI, you are by all means correct. But code is a formal language that happens to be expressed in Latin letters. APL is the only real language that doesn’t impose a western character set.

qalmakka 1644 days ago

> Pascal and old basic (among other languages) have been case insensitive for decades

... because back then Unicode was still a pipe dream in the mind of some visionary. Everything used 8 bit encoding, and everyone assumed ASCII or at least something compatible with the 7 bit subset of ASCII.

Nowadays files are encoded in UTF-8, and most modern languages actually fully support UTF-8 identifiers. Nim itself supports UTF-8 "letters" in identifiers, and what is "upper case" or "lower case" 100% depends on the current locale. Restricting your case normalization logic to ASCII is __really bad__, because it basically means that non-Latin letters in identifiers won't be normalized, with possibly unexpected consequences.

> APL is the only real language that doesn’t impose a western character set.

- Rust uses UTF-8: https://doc.rust-lang.org/reference/identifiers.html

- Go allows any Unicode letter in identifiers: https://go.dev/ref/spec#Identifiers

- Swift is also famous for allowing you to use emojis in identifiers.

- Python supports non-ASCII identifiers: https://www.python.org/dev/peps/pep-3131/

And the list goes on. Even C++ can optionally support Unicode in identifiers (for instance, Clang and GCC do indeed support things like `constexpr auto 黒 { "lol" };`).

beagle3 1644 days ago

> ... because back then Unicode was still a pipe dream in the mind of some visionary. Everything used 8 bit encoding, and everyone assumed ASCII or at least something compatible with the 7 bit subset of ASCII.

They could still have been case sensitive (C was), so I don't understand how that's relevant to the idea that "case insensitivity is a problem".

> Rust, Go, Swift, Python

All of these languages impose ASCII for their keywords and directives. They allow you to use other characters for identifiers, but impose ASCII in everything that has pre-defined semantics. Original APL is the only "real"/practical language that I'm aware of that gave up the "western centric view" of the world to the point that it doesn't have a single English keyword. (Brainfuck, etc. exist as well, but ....)

And they all impose a left-to-right reading order, which is just as western-centric. Arabic/Farsi/Hebrew go right-to-left, and there are languages that can also go top-to-bottom.

I think the outrage about "western centrism" is misguided. This is a formal system, and just like math, it reflects some history by using latin letters and left-to-right for the predefined symbols, and even preferred use of latin characters in identifiers.

> Nim itself supports UTF-8 "letters" in identifiers, and what is "upper case" or "lower case" 100% depends on the current locale.

If that's true, that may be a problem. I'll look into that, thanks for pointing it out - from memory, Nim only folds the lower 7-bit by a 32 difference in ascii code, so it is well defined regardless of locale, but I'll check.

The whole idea of utf-8 in identifiers is a minefield, whether you fold case or not; e.g:

"Εхаｍрⅼе" and "Example" have no single letter in common (I chose them that way using[0]) and no language that allows utf-8 identifiers is going to warn you about that.

[0] https://www.irongeek.com/-attack-generator.php
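
A quick Python check of the look-alike problem. The code points in `SPOOF` are an assumption about what such a string is built from (Greek, Cyrillic, fullwidth and Roman-numeral characters) and may not match the exact string in the comment above.

```python
import unicodedata

TARGET = "Example"
# Assumed look-alike: Greek Ε, Cyrillic х а, fullwidth ｍ, Cyrillic р,
# small Roman numeral fifty ⅼ, Cyrillic е.
SPOOF = "\u0395\u0445\u0430\uff4d\u0440\u217c\u0435"

for ch in SPOOF:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# No character is shared between the two strings.
shares_no_chars = not (set(SPOOF) & set(TARGET))

# NFKC (which Python applies to identifiers per PEP 3131) folds the
# fullwidth letter to ASCII, but the Greek and Cyrillic letters remain.
still_differs = unicodedata.normalize("NFKC", SPOOF) != TARGET

print(shares_no_chars, still_differs)  # True True
```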

qalmakka 1643 days ago

> They could still have been case sensitive (C was), so I don't understand how that's relevant to the idea that "case insensitivity is a problem".

The point here is that case insensitivity is only a viable option if you severely limit the encoding allowed in whatever you are using - be it a programming language, filesystem, etc. If the encoding of your files is basically akin to ASCII or ISO-whatever (which is what BASIC and Pascal used back in the day), then case insensitivity is trivial and safe.

This whole thing breaks apart as soon as you enter a Unicode world and start accepting identifiers containing more than ASCII, and then the whole concept of "case insensitive" becomes obsolete and outright wrong.

The Unicode equivalent of "case insensitive" is Normalization [0] and it's a big heck of a minefield because it is defined depending on the locale in use. For instance, "FILE.TXT" and "file.txt" are to be considered equivalent under en_US, but not under tr_TR, where the lower case version of "FILE.TXT" is "fıle.txt" and the upper case version of "file.txt" is "FİLE.TXT". This means that normalizing strings can lead to unexpected results depending on the locale, which is especially problematic with filesystems (where a path may exist or not depending on the locale).

> Nim only folds the lower 7-bit by a 32 difference in ascii code, so it is well defined regardless of locale

yes, it is well defined but allowing the entirety of the Unicode letters also means that identifiers may contain glyphs from alphabets that have separate cases, chiefly Greek and Russian, or even accented letters such as `è` or `ö`. Case insensitivity instead of proper normalization makes them potentially confusing, and quite breaks the intent behind allowing Unicode identifiers by making non-US locales second class citizens.

IMHO it is arguably very confusing to non-English speakers that 'mela' is equivalent to 'MELA' but 'tè' isn't equivalent to 'TÈ' while 'Tè' is. It basically means you have to remember what letters are ASCII and what are not, which makes the whole "case insensitive" a potential source of confusion.
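
The asymmetry between ASCII and non-ASCII letters can be reproduced in Python. `ascii_fold` is a hypothetical stand-in for ASCII-only folding (it is not Nim's code), while `str.lower()` and `str.upper()` apply Python's Unicode-aware, locale-independent mappings.

```python
def ascii_fold(s: str) -> str:
    """Lowercase A-Z only; every other character is left untouched."""
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in s)


E_GRAVE_UPPER = "\u00c8"  # È
E_GRAVE_LOWER = "\u00e8"  # è

print(ascii_fold("MELA") == "mela")                            # True
print(ascii_fold("T" + E_GRAVE_UPPER) == "t" + E_GRAVE_LOWER)  # False: È is not folded
print(("T" + E_GRAVE_UPPER).lower() == "t" + E_GRAVE_LOWER)    # True with Unicode mapping

# Turkish dotted/dotless i: the default mappings do not round-trip.
print("\u0130".lower() == "i\u0307")  # True: İ becomes i + combining dot above
print("\u0131".upper() == "I")        # True: ı uppercases to plain I
```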

I think it is safe to say that in 2021 case insensitivity is an obsolete concept and an obstacle to proper internationalization. Case insensitivity only really works on legacy encodings and with the basic Latin alphabet, and you can rest assured it will be almost always improperly implemented anyway.

[0] https://en.wikipedia.org/wiki/Unicode_equivalence

beagle3 1641 days ago

I understand your point, but still disagree with it. As I see it, the real problem is unicode identifiers, as I demonstrated with "Example" above, and as follows from your demonstrations as well. Unlike the thousands of unicode characters, which are unlikely to be all familiar to any single person, and whose meaning and "conjugation" (casing, conjugation, pre-joined pairs, precomposed versions, etc) are different in different cultures -

The ASCII case folding, as employed by Nim and Pascal, refers to 26 specific well-known characters. It's a non-issue.

jb1991 1647 days ago

Indeed I too find this off-putting. I understand the rationale behind it, but developers are used to extreme attention to detail, and this discards an important aspect of detail in the important area of naming.

kgeist 1647 days ago

>It allows programmers to mostly use their own preferred spelling style, be it humpStyle or snake_style

It's interesting that some languages, like Go, are specifically designed to avoid it (there's a built-in formatting tool for a single coding convention), while in other languages having a zoo of different styles is viewed as a great idea and the language is designed for it. Or is it about linking with existing C libraries? In that case, I'd introduce some sort of name mapping via attributes/annotations as a whitelist, instead of allowing this behavior by default.

cobby 1647 days ago

I personally still use Python because I miss list and dict comprehensions.

I know there is a `collect` macro in the `sugar` module but it is nowhere close to the Python comprehensions. The code is too verbose and is basically just the same multiline for loop :-(

budafish 1647 days ago

(on mobile) You could use Nim's sequence iterators

[x.name for x in someList if x.age > 5]

Is the same as:

someList.filterIt(it.age > 5).mapIt(it.name)

In a way it reads better.
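
For comparison, the same two shapes written out in runnable Python. The `Person` type and the sample data are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Person:
    name: str
    age: int


some_list = [Person("Ada", 7), Person("Bo", 3), Person("Cy", 9)]

# Comprehension form, as in the comment above.
by_comprehension = [x.name for x in some_list if x.age > 5]

# Filter-then-map chain, closer in shape to the filterIt/mapIt version.
by_chain = list(map(lambda it: it.name, filter(lambda it: it.age > 5, some_list)))

print(by_comprehension)              # ['Ada', 'Cy']
print(by_chain == by_comprehension)  # True
```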