by rurban 558 days ago
Almost nobody provides string search and comparison API functions that handle Unicode. The Unicode security tables for Unicode identifiers are hopelessly broken.

Not even the simplest tools, like grep, support Unicode yet. This hasn't happened in the last 15 years, even though patches and libraries exist.
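To illustrate the kind of comparison problem meant here, a minimal sketch in Python, assuming canonical equivalence is the issue (the helper name `normalized_find` is made up for this example, and this is not how grep or any particular library does it): a plain substring search misses text that is canonically equivalent but encoded with different code points, unless both sides are normalized first.

```python
import unicodedata

def normalized_find(haystack, needle, form="NFC"):
    """Hypothetical helper: normalize both strings, then do a plain search.

    The returned index refers to the normalized haystack, not the original.
    """
    h = unicodedata.normalize(form, haystack)
    n = unicodedata.normalize(form, needle)
    return h.find(n)

composed = "caf\u00e9"      # 'é' as a single code point (U+00E9)
decomposed = "cafe\u0301"   # 'e' followed by combining acute accent (U+0301)

text = "un " + decomposed + " noir"

naive = text.find(composed)             # code-point search, no normalization
aware = normalized_find(text, composed)  # search after NFC normalization

print(naive, aware)
```

The naive search fails because the two spellings of "café" are different code point sequences, even though they render identically.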


Wasn't one way to make grep faster setting LANG=C to avoid using language-aware string comparison? If so, shouldn't Unicode be supported by default or what would, say, de_DE.UTF-8 actually compare to make it slower?
Yes, it should. But the libunistring variant was too slow. And since LANG is evaluated at run time, you cannot really provide pre-compiled, better search patterns.

Sometime I'll come up with pre-computed, optimized tables, but I have no time for it now.
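As a rough analogy for the LANG=C trade-off asked about above (a sketch using Python's `re` module, not grep's actual code path): matching on raw bytes is cheap because case folding only has to consider ASCII, while matching on decoded text has to consult Unicode case tables.

```python
import re

lower = "\u00e9"  # 'é'
upper = "\u00c9"  # 'É'

# Text mode: IGNORECASE uses Unicode case mappings.
unicode_match = re.search(lower, upper, re.IGNORECASE) is not None

# Bytes mode: IGNORECASE only folds ASCII letters, so the UTF-8
# encodings of 'é' (C3 A9) and 'É' (C3 89) are treated as different.
byte_match = re.search(
    lower.encode("utf-8"), upper.encode("utf-8"), re.IGNORECASE
) is not None

# Pure ASCII still folds fine in bytes mode.
ascii_match = re.search(b"grep", b"GREP", re.IGNORECASE) is not None

print(unicode_match, byte_match, ascii_match)
```

The byte-oriented search is the faster one, but it silently gives up case-insensitive matching for anything outside ASCII.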

It's just a grep bug; ripgrep is fast and supports proper regex.