by samplonius 3598 days ago
It is not that Netflix is being accessed, but what on Netflix is being accessed.

Netflix does not publish viewership information. But a large ISP could run DPI on Netflix traffic to determine what content is being viewed. And since many ISPs have their own TV product, they would be really interested in that information themselves.

But if Netflix streams over TLS, then good luck telling Archer from The Lust of the Dead.

> good luck figuring out Archer from The Lust of the Dead.

Not necessarily that hard. TLS won't hide the sizes of the files being downloaded. You may even be able to estimate the size of each segment as they're downloaded, which should give you a pretty accurate fingerprint.
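
A minimal sketch of what matching such a fingerprint might look like, assuming the observer has already recorded per-segment sizes for a catalogue of titles. The titles, sizes, and the 2% tolerance are made-up illustration values, not measurements of any real service.

```python
def match_score(observed, reference, tolerance=0.02):
    """Fraction of observed segments whose size is within `tolerance`
    (relative) of the reference segment at the same position."""
    n = min(len(observed), len(reference))
    if n == 0:
        return 0.0
    hits = sum(
        1 for o, r in zip(observed, reference)
        if abs(o - r) <= tolerance * r
    )
    return hits / n


def best_match(observed, catalogue, tolerance=0.02):
    """Return (title, score) for the catalogue entry that matches best."""
    return max(
        ((title, match_score(observed, ref, tolerance))
         for title, ref in catalogue.items()),
        key=lambda pair: pair[1],
    )


# Hypothetical per-segment sizes in bytes (invented for illustration).
catalogue = {
    "show_a": [510_000, 742_000, 388_000, 901_000],
    "show_b": [505_000, 512_000, 498_000, 507_000],
}
# Observed sizes carry some TLS/HTTP overhead, hence the tolerance.
observed = [511_200, 743_900, 389_100, 902_500]
title, score = best_match(observed, catalogue)
print(title, score)
```

Note that a near-constant-bitrate title like the hypothetical "show_b" gives almost nothing to match on, since every segment looks the same.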

You'd need to spend resources to play each movie with different devices and bandwidths and record the traffic pattern, which raises the bar a fair bit.

The traffic pattern would be more specific to the playback device than to the content. There might be more spread in duration, but even so, television networks like shows to run for specific lengths for ease of programming. As for encoding rates, it is quite possible they use CBR. Even if they use VBR, they may choose different playout sources depending on the consuming device and network conditions.

On the whole I doubt you would have a high probability of identifying any specific show. Even if you are able to cull 75% of the possibilities (i.e. 2 bits of entropy), that still leaves lots of shows (total ~1000 TV series and ~5000 movies by one source), plus all the noise of people pausing, switching, skipping credits, etc.
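
A quick sketch of the arithmetic, assuming every title is equally likely to be watched (real viewing certainly is not) and using the ~6000-title catalogue figure quoted above. Ruling out a fraction f of the candidates gains log2(1 / (1 - f)) bits.

```python
import math

def bits_gained(fraction_culled):
    """Information gained by ruling out a fraction of equally likely candidates."""
    return math.log2(1.0 / (1.0 - fraction_culled))

catalogue_size = 1000 + 5000               # ~1000 series + ~5000 movies
bits_needed = math.log2(catalogue_size)    # bits required to single out one title
remaining = catalogue_size * (1 - 0.75)    # candidates left after a 75% cull

print(f"75% cull gains {bits_gained(0.75):.1f} bits")
print(f"need {bits_needed:.1f} bits; {remaining:.0f} titles remain")
```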

How big is a show, sent compressed? A gigabyte an episode? I guess the range is more relevant, say plus or minus ten megs to be conservative. I don't know how big packets are, maybe 4k is too small? 10M/4k = 2.5k distinguishable sizes. Not bad, but not great if you want to avoid birthday collisions: you'd only get maybe 60 uniques even if they're uniformly distributed.
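
A rough sketch of that birthday estimate, under the same assumptions as the comment: about 10 MB of spread, 4 kB resolution, and sizes uniformly distributed. The function name is just for illustration.

```python
import math

def collision_probability(n_items, n_buckets):
    """Probability that at least two of n_items share a bucket,
    assuming each lands in one of n_buckets uniformly at random."""
    if n_items > n_buckets:
        return 1.0
    p_all_distinct = 1.0
    for i in range(n_items):
        p_all_distinct *= (n_buckets - i) / n_buckets
    return 1.0 - p_all_distinct

buckets = 10_000_000 // 4_000   # 10 MB of spread at 4 kB resolution

# Smallest number of shows at which a collision is more likely than not.
n = 1
while collision_probability(n, buckets) < 0.5:
    n += 1

# The usual approximation for the 50% point is about 1.18 * sqrt(buckets).
print(buckets, n, round(1.1774 * math.sqrt(buckets)))
```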

CBR does kill it, though, and "uniformly distributed" is too big an ask.

Off topic, but an interesting thought: you know the HBO intro? With the static? That static is the hardest thing in the world to compress, and also the thing that viewers care the least about having compressed accurately. That's weird, I wonder how true it is across the board -- certainly artifacts can be jarring in flat shaded cartoons...

900MB for 720p for an episode of TV drama seems typical, but I haven't run wireshark on my Netflix yet.

For DSL typical packet MTU is 1400-something. Why do you ask? It doesn't really matter because the upper level proto is oblivious. You can just use b/s instead of packets/s if you want to compare bitrates.

There are special encoding modes in H.264/H.265 for handling animation. I'm not aware of there being any modes that would help with the intro, except that it is monochromatic.

> For DSL typical packet MTU is 1400-something. Why do you ask?

I was wondering how finely you can get the total filesize if it's encrypted. Can you just count fixed-size packets, or can you figure the length down to the byte? Makes a big difference if you're trying to use that to fingerprint the shows.

You can reassemble the HTTPS stream and get the exact length. Unless they do something weird like multiplexing, or change the playout rate, or the user pauses, it should be exact.
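
To make that concrete, a minimal sketch assuming you already have one direction of the TCP stream reassembled into a byte string (the reassembly itself is not shown). Each TLS record starts with a 5-byte header: content type, two version bytes, and a 2-byte big-endian length; type 23 is application data. The sum is ciphertext, so it includes per-record overhead (MAC or AEAD tag, any padding) and the HTTP headers, which is an offset on top of the file size rather than the file size itself. In TLS 1.3, encrypted handshake records also show up as type 23.

```python
import struct

APPLICATION_DATA = 23  # TLS record content type for encrypted payload

def application_data_bytes(stream: bytes) -> int:
    """Sum the lengths of application-data records in a reassembled TLS stream."""
    total = 0
    offset = 0
    while offset + 5 <= len(stream):
        content_type, _version, length = struct.unpack_from("!BHH", stream, offset)
        if content_type == APPLICATION_DATA:
            total += length
        offset += 5 + length  # skip header plus record body
    return total

# Synthetic stream for illustration: one handshake record (type 22)
# followed by two application-data records.
def record(content_type, payload):
    return struct.pack("!BHH", content_type, 0x0303, len(payload)) + payload

stream = (record(22, b"\x00" * 40)
          + record(23, b"\x00" * 1000)
          + record(23, b"\x00" * 250))
print(application_data_bytes(stream))
```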