|
|
|
|
|
by seedless-sensat
899 days ago
|
|
I think this con is very real: > Related tables and indexes are not necessarily stored together, meaning typical operations such as joins and evaluating foreign keys or even simple index lookups might incur an excessive number of internal network hops. The relatively strong transactional guarantees that involve additional locks and coordination can also become a drag on performance. You handwaved this away saying you can just store an entire table on a single node, but that defeats many of the benefits of these sharded SQL databases. Edit: Also, before attacking the author's biases, it seems fair to disclose you appear to work at Yugabyte |
|
- true Index Only Scan. PostgreSQL doesn't store the MVCC visibility in indexes and have to look at the table even in case of Index Only Scan. YugabyteDB has a different implementation of MVCC with no bloat, no vacuum and true Index Only Scan. Here is an example: https://dev.to/yugabyte/boosts-secondary-index-queries-with-... This is also used for reference table (duplicate covering indexes in each regions)
- Batching reads and writes. It is not a problem to add 10ms because you join two tables or check a foreign key. What would be problematic is doing that for each rows. YugabyteDB batches the read/write operations as much as possible. Here are two examples: https://dev.to/franckpachot/series/25365
- Pushdowns to avoid sending rows that are discarded later. Each node can apply PostgreSQL expressions to offload filtering to the storage nodes. Examples: https://dev.to/yugabyte/yugabytedb-predicate-push-down-pbb
- Loose index scan. With YugabyteDB LSM-Tree indexes, one index scan can read multiple ranges, which avoids multiple roundtrips. An example: https://dev.to/yugabyte/select-distinct-pushdown-to-do-a-loo...
- Locality of transaction table. If a transaction touches to only one node, or zone, or region, a local transaction table is used, and is promoted to the right level depending on what the transaction reads and writes.
Most of the times when I've seen people asking to store tables together, it was premature optimization, based on opinions rather than facts. When they try (with the right indexes of course) they appreciate that the distribution is an implementation detail that the application doesn't have to know. Of course, there are more and more optimizations in each release. If you have a PostgreSQL application and see low performance, please open a git issue.
I'm also working for Yugabyte as Developer Advocate. I don't always feel the need to precise it as I'm writing about facts, not marketing opinions, and who pays my salary has no influence on the response time I see in execution plans ;)