Hacker News new | ask | show | jobs
by joana035 1895 days ago
My only complain about PostgreSQL is COUNT() being quite slow compared with MySQL.

Everything else is pretty good, MySQL has compressed tables, but in PostgreSQL the same amount of data already takes less space by default.

Pghero/pg_stat_statements are also very handy.

But "hate"? No, no hate here :)

2 comments

You may be interested in this technique, or some of the others in the article: https://www.citusdata.com/blog/2016/10/12/count-performance/...

EDIT: I'm also curious what version of Postgres you've experienced this on? Sounds like there may have been improvements to COUNT (DISTINCT in v11+

If you're not making a lot of writes, that may be good approach since AFAIK the counts will only be updated after the table is ANALYZE'd
just so you're aware, COUNT() on mysql can lie.

Basically it's fetching metadata on the table, which can in some cases not be updated (yet), where as in pg it actually counts entries in the index.

Does it really count entries in the index? For example, in Firebird, it has to fetch rows because of row versioning (which happens in data pages, not in indices), and since PostgreSQL does versioning, too, I would have assumed that it's subject to the same limitation if it wants to return a correct answer for the current transaction.
Index Only Scans are a thing in PostgreSQL, however, they may still need to visit the heap if the visibility map bit for the heap page indicates that the not all tuples on the heap page are visible to all transactions. When a high percentage of pages are marked as "allvisible" then Index Only Scans can give a good boost to performance.
So this "visibility map" is a little bit like Netfrastructure/Falcon in-memory versioning, then? I see.
The visibility map is just 1 bit per page. Vacuum sets these bits to "1" when it sees that all tuples on the page are visible to all transactions. i.e. all tuple xmins are <= the oldest running transaction and none of the tuples have not been marked as deleted by any transaction yet. The visibility map bit will be unset when a new tuple is added to the page or an existing one is "deleted" or more accurately, has the xmax set with the deleting transaction's ID.

The visibility map is stored on-disk as a different fork of the filenode for the table. Two bits are actually stored per page, 1 for visibility and another to mark if the page only contains only frozen tuples. The frozen bit helps reduce the cost of vacuuming the table for transaction wraparound, which is also mentioned in the blog post.

The query planner does not count these bits to determine if it should perform an Index Only Scan vs an Index Scan. An approximate value is stored in pg_class.relallvisible.

Ha, snap. Makes sense that it looks at the dirtiness of the visibility_map while planning.
I believe it can speed it up by using index only scans along with the visibility_map which effectively tells it which entries are “current” in more broad strokes.
Doesn't this happens only when using sql_calc_found_rows?
That's only for MyISAM which sees very little use today. The InnoDB engine on MySQL does a full row count and is also relatively slow.