Hacker News new | ask | show | jobs
by fauigerzigerk 3232 days ago
But do these "already working" collations include accent-insensitive variants? I didn't think so, and I don't think they are coming in PostgreSQL 10 either. Do you have other information on this?
2 comments

They do. The problem is that equality is still strict binary equality for text, for obscure technical reasons. So, while you can get extremely flexible customizations to collation with icu [1], you can't have case insensitive equality, or equality that ignores accents. This is recognized as a problem, and is probably going to be fixed in v11.

[1] https://www.postgresql.org/message-id/CAH2-Wz=bcgmk97YaZ3rs4...

> The problem is that equality is still strict binary equality for text, for obscure technical reasons

AKA performance.

My misunderstanding seemed to be that "accent-insensitive" turns out to be a precise technical term from MySQL that says "just sort UTF-8 using the American locale and collation, but pretend accented characters are equal to their non-accented versions" which is not a standard collation order in any locale.
I think this is incorrect. In Mysql utf8_spanish_ci and utf8_swedish_ci have different accent rules, for example.
But then what would be the case for using accent-ignoring spanish collation rules, rather than the default spanish collation rules that sort accented chars according to normal spanish sorting? This sounds strictly inferior in any human facing application.