| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by berns 3232 days ago
	One thing that Postgres lacks is accent insensitive collations. Having clients with databases in Spanish, this is one of the reasons I wouldn't consider migrating to Postgres. I know I can use the unaccent extension, but I consider it a poor substitute to proper collations. I guess this isn't a problem for most people because it's never mentioned.

4 comments

claar 3232 days ago

It appears work is active (May 2017) in this area, but accent-insensitive collations are still a ways off: https://blog.2ndquadrant.com/icu-support-postgresql-10/

edit: fulafel corrects me below that you can get accent-insensitive collations working using PostgreSQL 10's new ICU collation support. Note that PostgreSQL 10 is currently pre-release.

link

fulafel 3232 days ago

That post says that they currently work, but there is a new os-independent collation method coming.

Edit re your edit: the already working collations are in PG 9.x - the OS independent collations are coming in 10.x. (Hence the post title, "More robust collations with ICU support in PostgreSQL 10")

link

fauigerzigerk 3232 days ago

But do these "already working" collations include accent-insensitive variants? I didn't think so, and I don't think they are coming in PostgreSQL 10 either. Do you have other information on this?

link

petergeoghegan 3232 days ago

They do. The problem is that equality is still strict binary equality for text, for obscure technical reasons. So, while you can get extremely flexible customizations to collation with icu [1], you can't have case insensitive equality, or equality that ignores accents. This is recognized as a problem, and is probably going to be fixed in v11.

[1] https://www.postgresql.org/message-id/CAH2-Wz=bcgmk97YaZ3rs4...

link

anarazel 3232 days ago

> The problem is that equality is still strict binary equality for text, for obscure technical reasons

AKA performance.

link

fulafel 3232 days ago

My misunderstanding seemed to be that "accent-insensitive" turns out to be a precise technical term from MySQL that says "just sort UTF-8 using the American locale and collation, but pretend accented characters are equal to their non-accented versions" which is not a standard collation order in any locale.

link

berns 3231 days ago

I think this is incorrect. In Mysql utf8_spanish_ci and utf8_swedish_ci have different accent rules, for example.

link

fulafel 3231 days ago

But then what would be the case for using accent-ignoring spanish collation rules, rather than the default spanish collation rules that sort accented chars according to normal spanish sorting? This sounds strictly inferior in any human facing application.

link

fulafel 3232 days ago

The docs say that locales and LC_COLLATE work, are you sure?

link

bmn__ 3230 days ago

> One thing that Postgres lacks is accent insensitive collations.

So what if the functionality is not in core, but in an extension? You make it sound as if it does not exist at all and all we can do is gnash our teeth in frustration.

http://www.public-software-group.org/pg_collkey

first release in 2006 for PostgreSQL 8

link

karmakaze 3229 days ago

Why not just use a functional index? I'd expect it to have much better performance.

link