Hacker News
by jrbancel 2154 days ago
Absolutely, the vast majority (95%+) of logs are never read by a human. Therefore, processing them up front is enormously wasteful. A good architecture will write logs once and not touch them until they are needed.

I spent years working on a system handling 50+ PB/day of logs. No database or ELK stack can handle that, and even if one could, it would be prohibitively expensive.

Where did you work? CERN?
It's adorable when people think scientific computing has the same scale as a Google or Microsoft.
Ignoring the fantasy b.s. in the second half of the article, the stuff at the top is exactly what I mean.

A mighty 400 GB/s: i.e. much less than the > 50 PB/day of logs the other person mentioned;

1600 hours of SD video per second: i.e. about 1-2 million concurrent HD streams, or much less than the amount actually served by YouTube.

IBM Summit "world's most powerful supercomputer": < 5000 nodes, i.e. much below the median cell size described in the 2015 Borg paper. Summit is a respectable computer but it would get lost in a corner of a FAANG datacenter.
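For anyone wanting to check the video figure above, here is a rough sketch of the arithmetic. It assumes each viewer watches in real time (one second of video per second) and that an HD stream costs 3x to 5x the bitrate of an SD stream; that ratio is my assumption, not something from the article or the comment, and the function names are just illustrative.

```python
SECONDS_PER_HOUR = 3600

def concurrent_streams(video_hours_per_second: float) -> float:
    """Real-time streams needed to consume the given hours of video each second."""
    # One real-time viewer consumes exactly one second of video per second.
    return video_hours_per_second * SECONDS_PER_HOUR

def hd_equivalent(sd_streams: float, hd_to_sd_bitrate_ratio: float) -> float:
    """Number of HD streams that fit in the bandwidth of `sd_streams` SD streams."""
    # The bitrate ratio is an assumed parameter, not a measured value.
    return sd_streams / hd_to_sd_bitrate_ratio

sd = concurrent_streams(1600)
print(f"SD streams: {sd:,.0f}")
for ratio in (3, 5):  # assumed HD/SD bitrate ratios
    print(f"HD streams at {ratio}x bitrate: {hd_equivalent(sd, ratio):,.0f}")
```

Under those assumptions, 1600 hours per second is 5.76 million SD streams, or roughly 1.2 to 1.9 million HD streams, which is where a "1-2 million" figure can come from.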

CERN is a correct example. The LHC reportedly generates 1PB per second: https://home.cern/news/news/computing/cern-data-centre-passe...
If you define “generates” to mean “discards” then yes.
The numbers are not fantasy at all - this will be a huge radio telescope - one square kilometer of pure collecting area and thousands of receiving antennas (for reference: Arecibo has around 0.073 km^2). We are talking about data input to the correlator on the terabit/s scale. Technology demonstrations with ASKAP are well under way, and ALMA is working quite well by now too (> 600 Gb/s with just a 50-antenna array).
it’s adorable how proud you are to have worked at FAANG and how angry you get at the idea some other organisation handles equivalent scale
touche
400GB/s is about 35 PB/day

not quite as big a difference
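The conversion is easy to check. A minimal sketch, assuming decimal units (1 PB = 1,000,000 GB) and a sustained rate over a full 24 hours; the function name is made up for illustration:

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400
GB_PER_PB = 1_000_000           # decimal (SI) units assumed

def gb_per_s_to_pb_per_day(gb_per_s: float) -> float:
    """Convert a sustained rate in GB/s to PB/day."""
    return gb_per_s * SECONDS_PER_DAY / GB_PER_PB

rate = gb_per_s_to_pb_per_day(400)
print(f"{rate:.2f} PB/day")             # 34.56 PB/day
print(f"{rate / 50:.0%} of 50 PB/day")  # 69% of 50 PB/day
```

So 400 GB/s sustained is about 70% of the 50 PB/day figure, not orders of magnitude below it.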

So, YouTube puts whole streams into their logs? Interesting.