| HN Mirror



	by rurban 558 days ago
	Almost nobody supports string search and comparison API functions for Unicode. The Unicode security tables for Unicode identifiers are hopelessly broken. Not even the simplest tools, like grep, support Unicode yet. This hasn't happened in the last 15 years, even though patches and libraries exist.
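To illustrate what "string search and comparison for Unicode" involves, here is a minimal Python sketch. It is not what grep or any of the patches mentioned do; the helper names `fold`, `unicode_equal`, and `unicode_contains` are hypothetical and exist only for this example. It assumes canonical caseless matching (NFD, case fold, NFD again) is the comparison wanted.

```python
import unicodedata


def fold(text: str) -> str:
    """Decompose, case-fold, then decompose again (canonical caseless form)."""
    decomposed = unicodedata.normalize("NFD", text)
    return unicodedata.normalize("NFD", decomposed.casefold())


def unicode_equal(a: str, b: str) -> bool:
    """Compare two strings ignoring case and composed/decomposed differences."""
    return fold(a) == fold(b)


def unicode_contains(haystack: str, needle: str) -> bool:
    """Naive substring search over the folded forms (hypothetical helper)."""
    return fold(needle) in fold(haystack)


composed = "caf\u00e9"      # 'e with acute' as one code point
decomposed = "cafe\u0301"   # 'e' followed by a combining acute accent

plain_equal = composed == decomposed                 # code-point comparison
folded_equal = unicode_equal(composed, decomposed)   # normalization-aware
found = unicode_contains("Un CAFE\u0301 noir", composed)
```

The two spellings differ as raw code-point sequences but compare equal after folding. One limitation of this naive search: because it works on decomposed text, a needle such as "cafe" also matches inside "café", so a real implementation would additionally need to respect grapheme boundaries.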

1 comments

ygra 557 days ago

Wasn't one way to make grep faster setting LANG=C to avoid using language-aware string comparison? If so, shouldn't Unicode be supported by default or what would, say, de_DE.UTF-8 actually compare to make it slower?
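As a rough illustration of the difference being asked about, the Python sketch below contrasts byte-wise matching (roughly what a C locale implies, where every byte is its own character) with character-wise matching (what a UTF-8 locale implies). It uses Python's `re` module as a stand-in and says nothing about how grep is implemented internally.

```python
import re

text = "\u00e9"              # 'é', a single code point
raw = text.encode("utf-8")   # the same character as two bytes: 0xC3 0xA9

# Byte-wise: '.' consumes exactly one byte, so it cannot cover 'é'.
byte_dot = re.fullmatch(rb".", raw)

# Character-wise: '.' consumes one decoded character.
char_dot = re.fullmatch(r".", text)

# Case-insensitive matching on bytes only folds ASCII letters.
upper = "\u00c9"             # 'É'
byte_icase = re.search(upper.encode("utf-8"), raw, re.IGNORECASE)
char_icase = re.search(upper, text, re.IGNORECASE)
```

Here `byte_dot` and `byte_icase` are `None`, while `char_dot` and `char_icase` are match objects. The character-wise variants have to decode multibyte sequences and consult case tables, which is the kind of extra work a UTF-8 locale adds over `LANG=C`.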


rurban 555 days ago

Yes, it should. But the libunistring variant was too slow. And since LANG is evaluated at run time, you cannot really provide precompiled, better search patterns.

Sometime I'll come up with precomputed, optimized tables, but I have no time right now.


JetSpiegel 552 days ago

It's just a grep bug; ripgrep is fast and supports proper regex.
