Hacker News
by sbellware 2376 days ago
Yes, we disagree :)

The response given by @ethangarofolo does a good job of addressing the main points.

I would say that the slide deck linked is a bit sneaky. Sooner or later, in anything built with a computer, there's going to be polling. Higher-level libraries abstract the polling so that it appears as push (rather than pull) to the application developer. But under the hood, there's polling.

It should also be pointed out that any message queue with persistent messaging is built on a database. Even RabbitMQ's persistent messaging is built on a database (Mnesia).

In the end, a message store or event store can be used to model message queues, but the opposite is rarely true. One of the creators of the Event Store database once said to me something like, "What's a queue but a degenerate form of a stream?"
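To make the "queue as a degenerate stream" idea concrete, here is a minimal in-memory sketch in Python. It is only an illustration, not how Event Store or Message DB are implemented; the names `Stream` and `QueueView` are hypothetical. The stream keeps every message at a stable position, and the queue is just a single cursor over it.

```python
from typing import Any, List, Optional


class Stream:
    """Append-only log: messages are never removed and each has a position."""

    def __init__(self) -> None:
        self._messages: List[Any] = []

    def append(self, message: Any) -> int:
        """Append a message and return its position in the stream."""
        self._messages.append(message)
        return len(self._messages) - 1

    def read(self, position: int, limit: int = 1) -> List[Any]:
        """Return up to `limit` messages starting at `position`."""
        return self._messages[position:position + limit]


class QueueView:
    """Queue semantics layered on a stream: one cursor, each message seen once."""

    def __init__(self, stream: Stream) -> None:
        self._stream = stream
        self._cursor = 0

    def dequeue(self) -> Optional[Any]:
        """Return the next unread message, or None if the cursor has caught up."""
        batch = self._stream.read(self._cursor)
        if not batch:
            return None
        self._cursor += 1
        return batch[0]
```

Going the other way is the hard part: once a plain queue hands a message off and forgets it, there is nothing left to re-read from an earlier position.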

A message store provides for patterns that are above and beyond message queues, like event sourcing, for example. Like Ethan mentioned, it's an application of "dumb pipes". It's in the same vein as Event Store or Kafka, rather than RabbitMQ, ActiveMQ, etc.

The critical difference is durability of messages. If you have an application that doesn't require durability of messages, then a plain old message queue or message broker technology may be a better choice for the situation.

In the end, polling is totally fine and totally manageable. What matters is that it's not done naively: batching should be intelligent, and a polling delay should kick in only in the right circumstances and be tuned based on batch processing cycle time and message arrival time.

Polling doesn't mean that the database will be "hammered" unless it's implemented that way.
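As a rough illustration of non-naive polling, here is a minimal Python sketch. It is an assumption-laden example, not Message DB's actual consumer: `poll_loop`, `fetch_batch`, `handle`, and the parameter names are all hypothetical. The idea is that a full batch means there is probably more work waiting, so the loop reads again immediately, and it only sleeps when a batch comes back short.

```python
import time
from typing import Callable, Sequence


def poll_loop(
    fetch_batch: Callable[[int, int], Sequence[dict]],
    handle: Callable[[dict], None],
    sleep: Callable[[float], None] = time.sleep,
    batch_size: int = 100,
    idle_delay: float = 0.1,
    max_cycles: int = 10,
) -> int:
    """Read batches starting at position 0; return the final position.

    fetch_batch(position, batch_size) is a hypothetical reader that returns
    up to batch_size messages starting at the given position.
    """
    position = 0
    for _ in range(max_cycles):
        batch = fetch_batch(position, batch_size)
        for message in batch:
            handle(message)
        position += len(batch)
        # A short (or empty) batch means the reader has caught up,
        # so wait before asking the store again.
        if len(batch) < batch_size:
            sleep(idle_delay)
    return position
```

With a busy stream the store is queried once per full batch rather than once per message, and with an idle stream it is queried once per `idle_delay`. Both knobs would be tuned against how long a batch takes to process and how often messages arrive.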

2 comments

> Sooner or later, in anything built with a computer, there's going to be polling. ... But under the hood, there's polling.

Is that true? I am reminded of this essay, which takes a deep dive into what happens under the hood with HTTP and the underlying network I/O:

https://blog.stephencleary.com/2013/11/there-is-no-thread.ht...

And at the lowest level, the network card interrupts the CPU because it has finished reading or writing data.

> Some time after the write request started, the device finishes writing. It notifies the CPU via an interrupt.

Is that polling? It seems more like a push.

Anything that processes a signal checks if a signal has been received. It's not so black-and-white at the level of electricity, but higher-level things at the level of durable message queues check for new signals, even if those signals arrive via "push".
It is still good design to do the polling only at the lowest level where you must. Higher layers should be reactive.
Does a network card "poll"? It's hardware activated by current flowing into it. Does the CPU poll the card? No, the CPU is interrupted by the network card, again by receiving an electrical signal.

If there's polling, it happens in a matter of a few CPU cycles.

At some point, this is splitting hairs. Single instructions are atomic wrt. interrupts, so surely there must be some sort of check every cycle whether an interrupt has arrived during that cycle.

The magnitude of the time slice or polling interval is immaterial as to whether it is to be considered "polling."

> persistent messaging is built on a database. Even RabbitMQ's persistent messaging is built on a database (Mnesia).

Good point that any system that is "persistent" has a data store by definition. The real question is whether a SQL relational database is a good fit for it.

The right store is the one that has the features needed to implement the targeted patterns, has client libraries for most programming languages, can be tuned and scaled, has a large ecosystem, has numerous managed hosting solutions offered by popular cloud providers, and is approachable by the largest segment of the potential audience.

Postgres checks those boxes, as do others.

Technically, though, Postgres has been an "object-relational" database for some time.

In the end, Message DB uses a table as an append-only log, and leverages Postgres indexes, advisory locks, and JSON documents (and indexes) to implement some of the critical messaging features and patterns. It's not using the "relational" aspects of Postgres. There are no relational tables.

Here's the table schema: https://github.com/message-db/message-db/blob/master/databas...

Given the paucity of Postgres features that the message store leverages, it could have been implemented against the raw Postgres storage engine. Had we done that, though, few people could have understood it and been able to specialize it to their purposes using plain old SQL. And any performance improvements induced by skipping Postgres's tabular data abstractions would have been so negligible for the schema in question that it wouldn't have offered much return on the effort.