Hacker News new | ask | show | jobs
by keithnz 49 days ago
I don't think it really matters in terms of their question though, given MySql on the same specs doesn't have the problem and postgres does. Quite clearly it has something to do with indexes and what is the wall postgres is running into that causes the drop off on quite low amounts of rows. If the answer is just get more RAM, it kind of implies postgres is not really that scalable. Especially if the drop off is proportional to the number of rows.
1 comments

Why are you using hash indexes? They're much less widely used than standard B-Tree indexes. The bucket split code likely isn't very scalable [1].

I suggest testing the same workload with your existing hash indexes replaced with equivalent B-Trees.

[1] https://github.com/postgres/postgres/blob/master/src/backend...

Last time I almost used a hash index in Postgres, I learned it was an incomplete feature and not crash-safe yet. This was v9.3? At that same time, MySQL had them and they were ok to use.

Later that got fixed, but I haven't tried again since, just been using btree because it seemed like Postgres favored that and it has theoretical advantages too.

They are fully stable and perform very well in Postgres today. There are some caveats, but they don't result in any sort of hiccups or unpredictable behavior.
Yeah that's what I've read too. Just haven't gotten around to trying them.

You've probably already read the Postgres docs on hash indexes, but just in case, it says "hash indexes may not be suitable for tables with rapidly increasing number of rows." I agree with the other commenter that it's worth at least trying without them if you haven't already, even though you're already VACUUMing.

> Why are you using hash indexes?

On some data, for certain ways of using and querying that data, it makes a lot more sense than a b-tree. When we use hash indices for some of the columns in our test just to see how Postgres will perform, and run our test towards a quarter of a billion rows where Postgres' 32-bit hash indices have a relatively high chance of colliding, querying these hash indices and returning the single requested row is still instantaneous.