by mistrial9
2437 days ago
It appears from this document:
* DataDome is a security company that receives web traffic in near real-time for its clients; a lot of traffic in some cases, with very specific numbers given, such as daily peak loads.
* DataDome retains records for only 30 days, and the most attention is given to the most recent traffic, to detect attacks.
* An Elasticsearch deployment stores all of the traffic records downstream from Apache Flink; a new feature added to ES this year improves the management of ES indexing, and that solved problems DataDome was having. Things are better! Write an engineering blog post!
* Re-indexing is done nightly, and is implemented in a cloud environment that can handle the (heavy) work of rebuilding the indexes.

These numbers are impressive. Earlier criticisms of ES are being addressed, and ES is stable and a cornerstone of the architecture. A company called DataDome is providing real services in near real-time. Congratulations to the team, and an interesting read.
|
> Storing 50 million of events per second
> A few numbers: our cluster stores [...], 15 trillion of events
> We provide up to 30 days of data retention to our customers. The first seven days of data were stored in the hot layer, and the rest in the warm layer.
15e12 / 50 MHz is 300,000 s, or about 3.5 days.
I guess 50 MHz is the peak ingest rate.
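
A quick sketch of that arithmetic, for anyone who wants to check it. The function names are made up for illustration, and the second figure assumes the 15 trillion events span the full 30-day retention window, which the quoted text does not actually say.

```python
SECONDS_PER_DAY = 86_400


def days_to_fill(total_events: float, events_per_second: float) -> float:
    """Days needed to accumulate total_events at a constant ingest rate."""
    return total_events / events_per_second / SECONDS_PER_DAY


def average_rate(total_events: float, retention_days: float) -> float:
    """Mean events per second if total_events covers retention_days."""
    return total_events / (retention_days * SECONDS_PER_DAY)


total_events = 15e12   # "15 trillion of events" from the quote
quoted_rate = 50e6     # "50 million of events per second" from the quote
retention_days = 30    # assumption: the 15 trillion covers all 30 days

# How long the quoted rate would take to produce the stored total.
print(f"{days_to_fill(total_events, quoted_rate):.2f} days at the quoted rate")

# Implied sustained rate under the 30-day assumption, in millions per second.
implied = average_rate(total_events, retention_days) / 1e6
print(f"{implied:.1f} million events/s averaged over retention")
```

Under that assumption the sustained rate comes out to roughly 5.8 million events per second, about a ninth of the quoted 50 million, which is consistent with the 50 million figure being a peak.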