Hacker News
by altmind 2435 days ago
>> Each day, during peak charge, our Elasticsearch cluster writes more than 200 000 documents per second

What is this 50M in the title?

2 comments

They state each document has 250 events, so 200,000 documents/sec x 250 events/document = 50M events/sec.
"Each document stores 250 events in a separate field."

Curiouser and curiouser.

Good catch! We will add it to the article.
It's misleading for sure but they're writing 250 'events' per document.
200k documents per second is a lot less impressive, no?
I don't know. Have you tried it? I worked on a Kafka Streams service written in Java that processed "changelog" messages (it involved one query to CosmosDB per message, and logging the result to Kafka for downstream processing by other systems). Now, we had a rather limited number of workers (4 or 8? I don't remember), but getting to 100k messages per second was rather challenging.
200k documents per second really isn't that much for Elasticsearch. The single-instance setup we have in my small company (around 25 people total) has been sent in excess of 40k/s at times, and even then it doesn't slow down noticeably.
and less catchy.
"200k documents per second" would be less catchy you are right, but it also doesn't reflect the complexity to handle that throughput. Most of the benchmark we can see around like "1m document per second" are using small documents in POC environement. In our setup, each of the 250 fields are store and indexed in ES, making it CPU and I/O intensive, in a production environment.