	by keithnz 49 days ago
	I don't think it really matters in terms of their question though, given that MySQL on the same specs doesn't have the problem and Postgres does. Quite clearly it has something to do with indexes; the question is what wall Postgres is running into that causes the drop-off at quite low row counts. If the answer is just "get more RAM", it kind of implies Postgres is not really that scalable, especially if the drop-off is proportional to the number of rows.


petergeoghegan 49 days ago

Why are you using hash indexes? They're much less widely used than standard B-Tree indexes. The bucket split code likely isn't very scalable [1].

I suggest testing the same workload with your existing hash indexes replaced with equivalent B-Trees.

[1] https://github.com/postgres/postgres/blob/master/src/backend...
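
A minimal sketch of what that swap could look like, written as a small Python helper that only builds the SQL text. The table, column, and index names are hypothetical, identifiers are not quoted or validated, and nothing here comes from the original poster's schema:

```python
def btree_swap_statements(table: str, column: str, hash_index: str) -> list[str]:
    """Build SQL that replaces a hash index with an equivalent B-Tree index.

    All names passed in are illustrative assumptions, not taken from the thread.
    """
    btree_index = f"{table}_{column}_btree_idx"
    return [
        # Build the B-Tree first so lookups stay indexed during the swap.
        # CONCURRENTLY cannot run inside a transaction block.
        f"CREATE INDEX CONCURRENTLY {btree_index} ON {table} USING btree ({column});",
        # Drop the old hash index once the new one is valid.
        f"DROP INDEX CONCURRENTLY {hash_index};",
    ]


# Hypothetical names, for illustration only.
for stmt in btree_swap_statements("events", "device_id", "events_device_id_hash_idx"):
    print(stmt)
```

Running the same workload before and after the swap would show whether the drop-off follows the index type.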


zadikian 49 days ago

Last time I almost used a hash index in Postgres, I learned it was an incomplete feature and not crash-safe yet. This was v9.3? At that same time, MySQL had them and they were ok to use.

Later that got fixed, but I haven't tried again since, just been using btree because it seemed like Postgres favored that and it has theoretical advantages too.


daneel_w 48 days ago

They are fully stable and perform very well in Postgres today. There are some caveats, but they don't result in any sort of hiccups or unpredictable behavior.


zadikian 47 days ago

Yeah that's what I've read too. Just haven't gotten around to trying them.

You've probably already read the Postgres docs on hash indexes, but just in case, it says "hash indexes may not be suitable for tables with rapidly increasing number of rows." I agree with the other commenter that it's worth at least trying without them if you haven't already, even though you're already VACUUMing.


daneel_w 48 days ago

> Why are you using hash indexes?

On some data, for certain ways of using and querying that data, it makes a lot more sense than a b-tree. When we use hash indices for some of the columns in our test just to see how Postgres will perform, and run our test towards a quarter of a billion rows where Postgres' 32-bit hash indices have a relatively high chance of colliding, querying these hash indices and returning the single requested row is still instantaneous.
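
To put a rough number on that collision chance, here is a back-of-the-envelope sketch. It assumes every key is distinct and that the hash function spreads keys uniformly over all 2^32 values; both are simplifying assumptions, not measurements from our test:

```python
def expected_colliding_pairs(n_keys: int, hash_bits: int = 32) -> float:
    """Expected number of key pairs that share a hash value.

    Assumes distinct keys and a uniform hash (birthday-problem estimate):
    each of the n*(n-1)/2 pairs collides with probability 1 / 2**hash_bits.
    """
    hash_values = 2 ** hash_bits
    return n_keys * (n_keys - 1) / 2 / hash_values


rows = 250_000_000  # a quarter of a billion rows, as in the test described above
pairs = expected_colliding_pairs(rows)
print(f"{pairs:,.0f} expected colliding pairs among {rows:,} distinct keys")
```

Under those assumptions the estimate comes out at about 7 million colliding pairs, which is small next to 250 million rows, so a lookup rarely has more than a couple of candidate rows to recheck.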
