Hacker News new | ask | show | jobs
by Epskampie 2234 days ago
I just tried postgres yesterday for the first time after years of being on mysql only.

I immediately got some warnings in pgadmin that some tables need to be vacuumed. Is this really necessary? Can i just vacuum everything at night on a cronjob?

5 comments

In general you don't need to vacuum manually, this happens in the background automatically. If you have a lot of activity on the DB, or fit into certain corner cases you have to tune the autovacuum parameters. The default is generally getting better with each Postgres release.

And if you want to play with the Autovacuum settings, you really should read up on this in detail. There are some counter intuitive aspects to it, e.g. in many cases if you have problems with autovacuuming you should vacuum more, not less.

One exception is after you restore a backup, you might want to run "VACUUM ANALYZE" so that you get some good statistics immediately. That also will happen in the background later, but until then you can get weird query plans.

Normally you only need to adjust autovacuum settings per table if your tables have a very large number of rows:

https://lob.com/blog/supercharge-your-postgresql-performance...

Regarding the last point, I wonder if that will be necessary after the changes in the OP. Example 2 might cover this.
Is auto-vacuum enabled by default?
Just ANALYZE also works
I think that's a false warning from pgAdmin.

In general, if you don't have extremely high UPDATE rates (thousands per second) you don't really need to care about vacuum these days (assuming you correctly end all your transactions and don't have any sessions in "idle in transaction" for a long time)

Adding to that comment: You can tweak the times where automatic VACUUMing happens to a certain degree,

https://www.postgresql.org/docs/12/runtime-config-autovacuum...

> Can i just vacuum everything at night on a cronjob?

It might not be technically pure, but you absolutely can do this, and if your load is really asymmetrical and much quieter at night, then it may well make sense. It may also be unnecessary, as there is an auto-vacuum feature.

You don't generally ever need to manually trigger them, they happen automatically when needed
If you do stuff like drop columns, you totally need to do a vacuum full or you are never getting back disk space...
Actually, you should avoid using vacuum full at all. What that operation does is creating a new table with contents of an old table then deleting the old one. This operation is also requesting an exclusive lock, so any write query will cause lockup.

Another problem is that vacuum full compacts the database, it reduces database size on disk, but doesn't mean it makes database faster, often it might actually reduce performance.

Vacuum full might be useful in scenarios where you want to reclaim space, because autovacuum was not tuned correctly and tables get out of hand, if you need to run vacuum full frequently you should tune autovacuum to run more frequently.

Getting back space is real important. Otherwise you have runaway table growth that never ends.

Not saying do it all the time, but it is important to use surgically. You can easily calculate wasted table space

https://www.citusdata.com/blog/2017/10/20/monitoring-your-bl...

When it gets to be too big, it's a good time to do a vacuum full. I normally do it a few times a year on a few giant tables. Yes it locks up a table for a few minutes, but it is a big win and totally worth it.

I'm not denying that VACUUM FULL is useful when things are out of hand. What I'm trying to say is that you probably don't have autovacuum tuned to run often enough (this is also what your link suggests)
Does it affect performance? I might not want VACUUM to trigger during high workload.
Yes it does, it makes things faster ;)

A common misconception about (auto) vacuum is, that you need to turn it off, to avoid performance problems.

The truth is: you need to make it more aggressive (= run it more frequently) if you have problems with the (additional) I/O load it generates.

Obviously I meant to ask if performance is affected during VACUUM.

@pg-gadfly's link answered it for me:

> VACUUM creates a substantial amount of I/O traffic, which can cause poor performance for other active sessions.

Since a vacuum operation uses processor cycles, it's impossible to say no.

Their wiki is pretty amazing source for postgres's operation: https://www.postgresql.org/docs/12/routine-vacuuming.html

I think you can check n_dead_tup, n_mod_since_analyze from pg_stat_all_tables, and see if the numbers are high, then it would be a good idea to perform vacuum. I found some query on the net which might help you. See https://gist.github.com/lesovsky/b0e4033380b0992789118ab35c4... I think you might need to change this depending on the version of Postgres.