by ikawe 1404 days ago
I should say though, I think your proposal makes sense as a way to minimize the impact of any individual checkpoint!

Though you still occasionally have to support arbitrarily large WAL files, because there's no limit on how big any one write transaction can be.

1 comments

You correctly identified that with arbitrarily long transactions you can't GC the WAL they refer to, so WAL files can always become arbitrarily large. My suggestion was indeed not to prevent that (you can't) but to minimize the impact of each individual WAL GC, since otherwise you can't even GC data that is no longer relevant if it sits in a WAL file that is still being referenced.

I see it as a kind of reference counting for WAL data. A WAL file can only be deleted once the refcounts of all transactions in it sum to zero (each one either synced to the main DB or aborted). So GC impact would be minimal if every transaction had its own WAL file, but that brings overhead of its own, so a sensible tradeoff might be a middle ground where you split WAL files once they reach a certain (configurable) size.
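
To make the refcounting idea concrete, here is a rough Python sketch. It is only a toy in-memory model, not how any particular database does it: the names `Segment`, `SegmentedWal`, `max_segment_bytes`, `append`, `finish` and `gc` are all made up for illustration, and "finish" stands for either "synced to main DB" or "aborted".

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """One WAL file (hypothetical model): its size and who still needs it."""
    seg_id: int
    size: int = 0
    live_txns: set = field(default_factory=set)  # not yet synced or aborted


class SegmentedWal:
    def __init__(self, max_segment_bytes):
        # Assumed configurable split threshold from the comment above.
        self.max_segment_bytes = max_segment_bytes
        self.segments = [Segment(0)]
        self.txn_segments = {}  # txn_id -> ids of segments it wrote to

    def append(self, txn_id, nbytes):
        active = self.segments[-1]
        # Split: start a new segment once the active one has reached the limit.
        # A single huge write still lands in one segment, so sizes are unbounded.
        if active.size >= self.max_segment_bytes:
            active = Segment(active.seg_id + 1)
            self.segments.append(active)
        active.size += nbytes
        active.live_txns.add(txn_id)
        self.txn_segments.setdefault(txn_id, set()).add(active.seg_id)
        return active.seg_id

    def finish(self, txn_id):
        # Transaction was synced to the main DB or aborted: drop its references.
        seg_ids = self.txn_segments.pop(txn_id, set())
        for seg in self.segments:
            if seg.seg_id in seg_ids:
                seg.live_txns.discard(txn_id)

    def gc(self):
        # Delete every sealed (non-active) segment that nobody references.
        active = self.segments[-1]
        deleted = [s.seg_id for s in self.segments
                   if s is not active and not s.live_txns]
        self.segments = [s for s in self.segments
                         if s is active or s.live_txns]
        return deleted
```

With `max_segment_bytes=10`, a long-running transaction that wrote only to segment 0 pins just that segment: later segments filled by short transactions can be deleted as soon as those transactions finish, instead of everything waiting on the long one.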