|
|
|
|
|
by pgaddict
3511 days ago
|
|
I find the recommendation to leave the cost limit alone strange. The problem is this is a global limit, shared by all autovacuum workers. The default (200) means all workers combined should not do more than 8 MB/s reads or 4 MB/s writes, which on current hardware are rather low limits. Increasing the number of workers is good, but the total autovacuum throughput does not change - there will be more workers but they'll go slower. Also a note regarding the delay - the resolution really depends on hardware. Some timers have only 10ms resolution, for example. |
|
Are you sure that the limit is shared by all workers?
> When the accumulated cost reaches a limit (specified by vacuum_cost_limit), the process performing the operation will sleep for a short period of time, as specified by vacuum_cost_delay. Then it will reset the counter and continue execution.
-- https://www.postgresql.org/docs/9.6/static/runtime-config-re...
Since every worker is it's own process, I assume that the limit applies to each worker, and overall vacuum throughput increases when using more workers.
That being said, your observation about the autovacuum_vacuum_cost_limit and autovacuum_vacuum_cost_delay being potentially too low may still apply.
Your throughput estimate for 8 MB/s (I think per worker) only applies for the vacuum_cost_page_hit (default 1) case. For vacuum_cost_page_miss (default 10), the throughput would only be 0.8 MB/s.
If my understanding of the way a page miss is defined is correct, this will be the common case unless one has increased shared_buffers from it's default of 128 MB to something much larger. That's b/c pg will assume a "page miss" if the page is not in its shared buffer, even if it's in the host OS page cache.
I might be wrong about the caches, so I'd love for somebody with more insights to confirm/reject these assumptions.
Edit: Seems like your 8 MB/s estimate is based on the vacuum_cost_page_miss case. But still, that's pretty low :)