Hacker News new | ask | show | jobs
by wizzwizz4 1684 days ago
Those Unicode characters aren't just there for show. They're part of real scripts that real people use; it would be annoying for people using those scripts.
4 comments

I'm fairly sure this could be arranged for. As in, if there's too many of them belonging to the character set of a particular language, then it's very likely that it's simply a text in that language. But random characters in the middle of ASCII identifiers are probably not something that you want.
Yeah I'm not opposed to adding highlighting to them, and we are investigating how to do it, but it was less clear-cut than the bidi characters (which are totally invisible when rendered). I think we'll want to make it a bit more configurable and probably a separate option to the one which highlights the bidi characters.
Exactly. When we were adding support for non-ASCII identifiers to Rust, and thinking about homoglyphs and confusable characters, we needed to evaluate the tradeoffs between catching such characters and inconveniencing the speakers of various languages who want to write Rust in their language.
This type of attack isn't new. I can't recall the names but there are afair multiple C/C++ coding standards that limit everything to ASCII to avoid precisely this attack, but also others with visually similar but nonequivalent names.
Yes, and they should be in well annotated/marked string/data sections, not in logic code.