HN Mirror


by sonthonax 759 days ago

Wait until you learn that they believe they need Kafka. Their engineers are probably bitter they work at a media company and not a FAANG.

https://www.confluent.io/en-gb/blog/publishing-apache-kafka-...

> The Monolog is our new source of truth for published content. Every system that creates content, when it’s ready to be published, will write it to the Monolog, where it is appended to the end.

> The Monolog contains every asset published since 1851. They are totally ordered according to publication time. This means that a consumer can pick the point in time when it wants to start consuming. Consumers that need all of the content can start at the beginning of time (i.e., in 1851), other consumers may want only future updates, or at some time in-between.

> As an example, we have a service that provides lists of content — all assets published by specific authors, everything that should go on the science section, etc. This service starts consuming the Monolog at the beginning of time, and builds up its internal representation of these lists, ready to serve on request. We have another service that just provides a list of the latest published assets. This service does not need its own permanent store: instead it just goes a few hours back in time on the log when it starts up, and begins consuming there, while maintaining a list in memory.
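For anyone who wants the quoted design in concrete terms, here is a rough sketch of what it describes: an append-only log ordered by publication time, with each consumer choosing where to start reading. This is a toy in-memory stand-in, not Kafka and not the NYT's code; the `Monolog` class, `consume_from`, the sample assets, and the use of bare years as timestamps are all made up for illustration.

```python
import bisect
from dataclasses import dataclass, field


@dataclass
class Monolog:
    """Toy in-memory append-only log (hypothetical, for illustration only)."""
    _times: list = field(default_factory=list)   # publication times, ascending
    _assets: list = field(default_factory=list)  # payloads, in the same order

    def append(self, published_at, asset):
        # Appends only: reject anything that would break the time ordering.
        if self._times and published_at < self._times[-1]:
            raise ValueError("log is append-only and time-ordered")
        self._times.append(published_at)
        self._assets.append(asset)

    def consume_from(self, start_time):
        # Find the first entry at or after start_time, then replay forward.
        start = bisect.bisect_left(self._times, start_time)
        yield from self._assets[start:]


log = Monolog()
log.append(1851, {"author": "raymond", "title": "first issue"})
log.append(2017, {"author": "alice", "title": "kafka at work"})
log.append(2023, {"author": "alice", "title": "latest news"})

# "Lists" service: replays from the beginning of time and builds its own index.
by_author = {}
for asset in log.consume_from(0):
    by_author.setdefault(asset["author"], []).append(asset["title"])

# "Latest" service: keeps no permanent store, just starts a little way back.
latest = [asset["title"] for asset in log.consume_from(2020)]
```

Note that the "lists" service has to read every entry ever published before it can answer a single query, while the "latest" service only touches the tail of the log.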

Absolutely insane. The only reason this works is that the NYT publishes fewer than 300 articles per day, so you can get away with doing un-indexed full table scans of your entire database. But the engineers can put "I created a log-based time-series architecture" on their resumes.