Hacker News new | ask | show | jobs
by ignoramous 3553 days ago
A tangential question:

Everyone speaks about InnoDB and how performant and reliable it is... and multiple firms even use it as a KV-store (Uber/Pinterest/AWS) bypassing MySQL entirely. I have never heard much about storage engines in Postgres, why could this be so?

Wikipedia has a (stub) article on InnoDB, but nothing on Postgres' storage engines... just wondering why that is.

9 comments

The PG storage engine is not particularly awesome. It is basically COW (with exceptions) and compaction (called vacuum) has been quite painful for a long time. Every release it is supposedly fixed, but people keep complaining. This not to say PG sucks, their optimizer knows far more about their data than InnoDB and PG can perform far more types of execution plans.

We (Pinterest, I wrote most of the MySQL automation) make heavy use of MySQL replication which is vastly simpler to manage than PG. All queries still flow through SQL and unlike PG, we can force whatever execution plan we need. We do lots of PK lookups, and InnoDB is really good at that. In InnoDB all the data is stored in the PK while in PG it is just a pointer.

>>In InnoDB all the data is stored in the PK while in PG it is just a pointer.

This is just a consequence of the PK being a clustered index in InnoDB which has both pros and cons. One of the big cons is that all of the columns of the PK are implicitly added to every secondary index as the row identifier. That isn't a big problem if your PK is a single column int, but if it's multiple columns, that often results in unnecessary bloat in your secondary indexes. Ideally (as in, dare I say, MS SQL Server), you'd have the option of a clustered or non-clustered PK for your table so you could choose the optimal index structure for your workload on a per-table basis.

If you don't want a clustered index in InnoDB you can define the primary key as an auto incrementing uint.
Yes, you can, but it doesn't change the fact that you still have a clustered index (an index organized table), which is great for PK lookups, but bad if you do a secondary indexes lookup (because you need to lookup through 2 B-trees instead of 1). There is real, and well-known, tradeoff here.
You missed the point of my post. You are going to have one of the two issues, either looking through two index or indexes including the a large PK. At least with InnoDB you can make the choice. The strategy I suggested gets you the desired outcome of not including a large PK in all secondary indexes.
> The strategy I suggested gets you the desired outcome of not including a large PK in all secondary indexes.

For an application in which most queries need a secondary index lookup, using heap organized tables is more efficient because the database needs to traverse only one B-tree (for the secondary index) that gives the physical position of the row in the heap. When using index organized tables, the database needs to traverse 2 B-trees (the secondary index first, then the primary index). Making the primary key short by using an auto incrementing integer helps, but doesn't remove this overhead.

The other part of the tradeoff is that inserts and many other write operations are less expensive in heap tables. A Big Table in InnoDB, measured in "when do I start having to spend a lot of time troubleshooting this table's performance" is about 1% the size of a Big Table in Postgres. TokuDB was introduced for MySQL for a reason.

Heap vs. Index organization is a classic tradeoff of database design.

Now, if you're saying "it would be really nice if Postgres had the option of index-organized tables" I'd agree with you. I'd love to have that, as an option.

> All queries still flow through SQL and unlike PG, we can force whatever execution plan we need. We do lots of PK lookups, and InnoDB is really good at that.

Being able to force the execution plan is more useful in MySQL than PostgreSQL because MySQL's optimizer is not very good at planning queries.

If you do a lot of PK lookups, then you don't need to force the execution plan.

Postgres doesn't provide a store engine API. You can achieve some of those goals by using the foreign data wrapper (fdw) api.

https://www.postgresql.org/docs/9.5/static/postgres-fdw.html

For example, Citus Data provides a column store for Postgres via the fdw api.

https://github.com/citusdata/cstore_fdw

PostgreSQL, for better or worse, doesn't have pluggable storage engines. There's some discussion on their dev mailing list about the possibility of adding that capability in PG10, though: http://postgresql.nabble.com/Pluggable-storage-td5916322.htm...

Some earlier (2013) discussion on the same topic: https://wiki.postgresql.org/wiki/2013UnconfPluggableStorage

tl;dr: Foreign Data Wrappers (FDW) provide 99% of the same functionality, but with even more flexibility incl smart query optimizer support.
That is far too optimistic, IMHO.

FDW are a great way to access external data sources, but it lacks proper support for visibility and transactions, and so on. Also, the FDW API follows the "tuple at a time" execution model, which prevents a lot of optimizations in the upper part of the stact (vectorized execution etc.).

There are several products using FDWs to change storage, but I'd call it a misuse of a feature designed for very different purpose.

IMHO it's hardly a way forward without significant changes/improvements (which may happen, I don't know).

The discussion is moving on lately with a refactoring to create an access method handler for tables: https://www.postgresql.org/message-id/d7e41e76-e565-8bc0-4e9...

Here is as well some documentation on the matter: https://wiki.postgresql.org/wiki/HeapamRefactoring

Having "CREATE ACCESS METHOD [...] ON STORAGE|TABLE" to create a custom access method, or storage engine, and extending CREATE TABLE to be able to pass a storage method with the table definition could become a quite powerful combination. The main challenge is to come up with an interface solid enough to be able to handle problems related to MVCC, like VACUUM cleanup.

> I have never heard much about storage engines in Postgres, why could this be so?

Because PG isn't designed around pluggable storage engines, so its not really as practical to take a storage engine out and use it separately, and doesn't make much sense to talk about the storage engine separately from the whole system.

Postgres has one storage engine: Postgres. It doesn't have a pluggable interface like MySQL does.
FWIW, these solutions rarely bypass MySQL entirely or at all. Although there are ways to access InnoDB without making SQL queries (Memcached API; Handler Socket), the MySQL server is still involved. It just skips the normal protocol, auth, SQL parsing, etc.

Even then, there aren't a lot of published cases of people using these alternative access methods at scale yet. AFAIK, all of the large kv use-cases you've mentioned still go through traditional SQL queries. Despite the overhead of SQL parsing, it provides more control and visibility. The ecosystem around alternative access methods isn't nearly as mature.

>Everyone speaks about InnoDB and how performant and reliable it is

What? Everyone speaks about how unreliable it is and how many major data corruption problems it has.

>I have never heard much about storage engines in Postgres, why could this be so?

Because they didn't take the approach of having multiple storage engines, they just made one that works and is not easily removed from the database.

You may be confusing InnoDB with MyISAM (which is prone to corruption, especially upon crashes) or with running MySQL without a strict SQL mode (which causes bad things like silent truncation of overflowing values).

InnoDB is, and always has been, a very reliable and durable storage engine with solid performance characteristics.

No, I am referring to innodb, which has a number of known reliability problems which are "wontfix".
Care to share examples?
Postgres does not have pluggable storage engines - there is essentially just one way to store things.