Hacker News new | ask | show | jobs
by giovannibonetti 55 days ago
With some extra admin work, you can greatly increase your insert throughput, as long as the table load is comprised mostly of inserts: 1. Partition your table by range of a monotonic ID or timestamp. Notice the primary key will have to contain this column. A BIGINT id column should work fine; 2. Remove all the other indexes from the partitioned table. Add them to all the partitions, except the latest one. This way, the latest one can endure a tough write load, while the other ones work fine for reads; 3. Create an admin routine (perhaps with pg_cron) to create a new partition whenever the newest one is getting close to the limit. When the load moves to the newer partition, add indexes concurrently to the old one; 4. You'll notice the newest partition will the optimized for writes but not reads. You can offset some of that by replacing BTREE secondary indexes with BRIN [1], particularly the one with bloom operator (not to be confused with Postgres Bloom regular indexes [2]). BRIN is a family of indexes more optimized for writes than reads. If the partition is not too large, it shouldn't be too bad to read from it. 5. Later you can merge partitions to avoid having too many of them. Postgres has commands for that, but I think they lock the whole table, so a safer bet is to copy small partitions into a new larger one and swap them manually.

[1] https://www.postgresql.org/docs/current/brin.html [2] https://www.postgresql.org/docs/current/bloom.html

3 comments

These are good suggestions but I'm apprehensive they might come back and say they have 64 GB (or less) of RAM or they are using PostgreSQL RDS on AWS or something.

I asked them for specifics.

I don't think it really matters in terms of their question though, given MySql on the same specs doesn't have the problem and postgres does. Quite clearly it has something to do with indexes and what is the wall postgres is running into that causes the drop off on quite low amounts of rows. If the answer is just get more RAM, it kind of implies postgres is not really that scalable. Especially if the drop off is proportional to the number of rows.
Why are you using hash indexes? They're much less widely used than standard B-Tree indexes. The bucket split code likely isn't very scalable [1].

I suggest testing the same workload with your existing hash indexes replaced with equivalent B-Trees.

[1] https://github.com/postgres/postgres/blob/master/src/backend...

Last time I almost used a hash index in Postgres, I learned it was an incomplete feature and not crash-safe yet. This was v9.3? At that same time, MySQL had them and they were ok to use.

Later that got fixed, but I haven't tried again since, just been using btree because it seemed like Postgres favored that and it has theoretical advantages too.

They are fully stable and perform very well in Postgres today. There are some caveats, but they don't result in any sort of hiccups or unpredictable behavior.
Yeah that's what I've read too. Just haven't gotten around to trying them.

You've probably already read the Postgres docs on hash indexes, but just in case, it says "hash indexes may not be suitable for tables with rapidly increasing number of rows." I agree with the other commenter that it's worth at least trying without them if you haven't already, even though you're already VACUUMing.

> Why are you using hash indexes?

On some data, for certain ways of using and querying that data, it makes a lot more sense than a b-tree. When we use hash indices for some of the columns in our test just to see how Postgres will perform, and run our test towards a quarter of a billion rows where Postgres' 32-bit hash indices have a relatively high chance of colliding, querying these hash indices and returning the single requested row is still instantaneous.

I like this strategy a lot, but the performance of read queries suffer if they span partitions, correct?

The issue I'm facing is a very large table, that is both write and read heavy, and the reads do not fall into a specific range of values for any particular column, so I don't think partitioning is an option.

Yes, partitioning will decrease a bit the read performance of queries not correlated with the partition key. That's why you need to periodically merge smaller partitions, so that you can keep the overall partition count bounded.

It is a lot of admin work, but if you really need to scale up Postgres write throughput, I don't see many other options without increasing hardware costs.

I assume you have already picked the low-hanging fruit discussed in the neighboring comments - batch writes, make sure you are using COPY instead of INSERT, tune Postgres parameters adequately and use the fastest disk you can grab for the WAL.

Partitioning is not all that expensive. It is definitely worth testing for your specific workload. We use TimescaleDB, which relies heavily on postgres partitions, have a bit under 100 million rows in our active set (last 90 days), across 120 partitions (device*time), and it works nicely. Over 100 partitions is probably a bit many for this workload, but since it works OK we have not changed it.
Thanks for sharing this "tactic". Will look into it.