Hacker News
by jiggawatts 1475 days ago
There was a post here recently where a Netflix employee was proudly showing off their log processing system. Which was collecting the equivalent of nearly 2 MB of logs per minute of user streaming time.

In my mind, that's just bonkers, and no amount of handwaving could justify it.

3 comments

> Which was collecting the equivalent of nearly 2 MB of logs per minute of user streaming time.

To clarify, could you say the same thing in a different way?

For every minute that someone streams Netflix, 2 MB of data is logged. So if 1,000 people are using Netflix simultaneously, they're generating 2 GB of logs per minute.
Warning: NPM packages are out of date x 1000000
> 2 MB of logs per minute of user streaming time.

2MB/minute is 33KB/second.

How is that impressive?

I think it's impressive that they somehow found 33KB/second worth of data to log for each stream. I can't even imagine the amount of useless shit that must be logged to get to that number.
This is where I'm at. Like, that's honestly not much log data. But what are they actually logging? I imagine there is a LOT of repetitive data.
Detailed logging can function as an on-demand APM. Not a bad idea if you have the bandwidth and storage for it.
I think the impressive thing is how much data that is for each user-minute. What could they possibly be storing in 33KB for each second of Netflix you stream?
That's per user. So a million (or ten, 50...) active users means a lot more per minute.
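For anyone who wants to check the arithmetic, here is a rough sketch of the per-stream and aggregate rates. It assumes decimal units (1 MB = 1,000,000 bytes) and a flat 2 MB per stream-minute; the function names and the stream counts are just illustrative, not anything Netflix has published.

```python
# Back-of-the-envelope log rates. Assumptions (not from Netflix):
# decimal units and a flat 2 MB of logs per minute of streaming.
MB = 10**6
TB = 10**12
LOG_BYTES_PER_STREAM_MINUTE = 2 * MB


def per_second_bytes(bytes_per_minute=LOG_BYTES_PER_STREAM_MINUTE):
    """Per-stream log rate in bytes per second."""
    return bytes_per_minute / 60


def aggregate_bytes_per_minute(concurrent_streams,
                               bytes_per_minute=LOG_BYTES_PER_STREAM_MINUTE):
    """Total bytes logged per minute across all concurrent streams."""
    return concurrent_streams * bytes_per_minute


if __name__ == "__main__":
    print(f"per stream: {per_second_bytes() / 1000:.1f} KB/s")
    # Hypothetical concurrency levels, for illustration only.
    for streams in (1_000, 1_000_000, 10_000_000, 50_000_000):
        total = aggregate_bytes_per_minute(streams)
        print(f"{streams:>11,} streams: {total / TB:g} TB/min")
```

So the per-stream rate is small, but it multiplies linearly with however many people are watching at once.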
i think you and the above poster are in vehement agreement. ingesting 2 MB of logs per minute is impressive in its pluperfect unimpressiveness.

maybe the presentation was called "Timmy's first named pipe" or "Sally explores /etc/logrotate.d"

That’s a LOT of text to describe me sitting on my couch. 2MB per minute is far more than the most detailed biography in existence.
220M users. Let’s imagine 50M are streaming concurrently. That’s 100 TB a minute in logs lol. At that rate they could be storing over a hundred petabytes of logs a day. My friend did some data center stuff for the Large Hadron Collider and wasn’t hitting these data ingestion rates, and these are just to record me binging The Office.
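Working the 50 million concurrent streams scenario through as a quick sketch, assuming decimal units, a flat 2 MB per stream-minute, and that concurrency held around the clock (it would not in practice, so treat the daily figure as an upper bound). The helper names are made up for illustration.

```python
# Rough daily log volume under stated assumptions: decimal units,
# a flat 2 MB per stream-minute, and constant concurrency all day.
PB = 10**15
BYTES_PER_STREAM_MINUTE = 2 * 10**6
MINUTES_PER_DAY = 24 * 60


def log_volume(concurrent_streams, minutes,
               bytes_per_stream_minute=BYTES_PER_STREAM_MINUTE):
    """Bytes logged by a constant number of streams over a time window."""
    return concurrent_streams * minutes * bytes_per_stream_minute


def streams_for_daily_volume(target_bytes,
                             bytes_per_stream_minute=BYTES_PER_STREAM_MINUTE):
    """Average concurrent streams needed to produce target_bytes in a day."""
    return target_bytes / (MINUTES_PER_DAY * bytes_per_stream_minute)


if __name__ == "__main__":
    streams = 50_000_000  # hypothetical concurrency from the comment
    print(f"per hour: {log_volume(streams, 60) / PB:g} PB")
    print(f"per day:  {log_volume(streams, MINUTES_PER_DAY) / PB:g} PB")
    print(f"1 PB/day needs ~{streams_for_daily_volume(PB):,.0f} streams")
```

Read the other way around, a petabyte a day only takes an average of about 350 thousand concurrent streams at this logging rate.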
The comment said "log processing system". Sounds more like it's a stream and not stored logs.
2 MB per minute per stream at Netflix scale is crazy.