Hacker News
by smoldesu 1712 days ago
What's the bug here? It looks like you fooled the container codec with an incorrect timecode and then when it was uploaded to YouTube, the file was rasterized into a sane format. I don't really see an attack here, nor do I see a mitigation.
3 comments

It seems like it sort of counts as an amplification DoS. Enough people uploading smallish videos that unravel into terabytes could probably create an issue. It's bypassing the YouTube limits of 256 GB/12 hours.

I would guess YouTube will do some sort of fix or sanity check.

That makes sense, thank you. I'd assume a data engineer at Google somewhere has a small yellow light that goes off whenever someone exceeds those limits, but FAANG infrastructure never fails to disappoint me.
More like a graph that a single person generally can't hope to move unless they have a following to the level of xcow. If someone burns a tire in the middle of the rain forest... can anyone tell until it's 50,000 people doing it?
What is xcow?
I think they might've meant xqcow https://en.wikipedia.org/wiki/XQc
Strange reference. Is that just meant to be an arbitrary celebrity or does xqcow have some particular relevance here I'm missing?
The issue with "expensive to calculate" values like the duration of media (for example, variable encodings) is that the encoder tries to help others avoid rematerializing these values by saving its calculation in some metadata. The problem is consumers then have to "trust" the encoder; this post demonstrates a non-malicious case, but perhaps there are more malicious cases (like the vulnerability in Android's libstagefright years ago).

For example, I wrote an iTunes-in-the-browser web app; I needed to know durations of songs to display them. MP3 doesn't include these in metadata IIRC, so I needed to pre-process them with ffmpeg just to have duration data. I wasn't doing anything with that other than displaying it. But it would have been nice to just have that info in the metadata.
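
A minimal sketch of that kind of pre-processing step, assuming `ffprobe` (shipped with ffmpeg) is on the PATH. The function names are hypothetical. Note that for MP3s without a VBR header the reported duration may itself be an estimate derived from the bitrate, so it is not necessarily exact.

```python
import json
import subprocess


def ffprobe_duration_cmd(path):
    """Build an ffprobe command that prints the container duration as JSON."""
    return [
        "ffprobe", "-v", "error",
        "-show_entries", "format=duration",
        "-of", "json",
        path,
    ]


def parse_duration(ffprobe_json):
    """Return the duration in seconds, or None if ffprobe did not report one."""
    data = json.loads(ffprobe_json)
    value = data.get("format", {}).get("duration")
    try:
        return float(value)
    except (TypeError, ValueError):
        # Missing key, or a non-numeric value such as "N/A".
        return None


def probe_duration(path):
    """Run ffprobe on a file and return its duration in seconds (or None)."""
    result = subprocess.run(
        ffprobe_duration_cmd(path),
        capture_output=True, text=True, check=True,
    )
    return parse_duration(result.stdout)
```

Splitting the parsing from the subprocess call keeps the part that handles untrusted output easy to test on its own.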

> For example, I wrote an iTunes-in-the-browser web app; I needed to know durations of songs to display them. MP3 doesn't include these in metadata IIRC, so I needed to pre-process them with ffmpeg just to have duration data.

This jogged my memory from (part of) the first thing I ever built in a general purpose programming language, all of probably 20 years ago! I was doing exactly this: using ffmpeg to get duration metadata from MP3s.

My memory was fuzzy so I looked it up, which (surprisingly!) confirmed what I remembered. MP3s may include metadata (ID3) which may include duration (or start/end times).

I knew my input source (it was me, my music, my MP3 conversions), so I was able to rely on the metadata directly. IIRC I even processed it on demand in my first naive version, which was “slow” but not nearly as slow as stuff I’d complain about today.

I ran into a similar issue when I tried to generate a podcast RSS feed from a website whose built-in feed didn't go back far enough. I was trying to do HTTP range requests on the mp3 files to save bandwidth and just fetch their metadata. Sure enough, mostly no duration and if the encoder did put it in a custom field it was usually different than what VLC says.
You could do something like a Zip bomb I guess. YouTube would just have to do some validation of the file before adding it to the pipeline.
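
For the zip case, one possible shape for such a pre-check is to read the sizes declared in the archive's central directory and reject anything with an implausible expansion. This is only a sketch: the thresholds and function names are made up, and the declared sizes are themselves metadata written by whoever produced the file, so a check like this is a cheap first filter rather than a guarantee.

```python
import io
import zipfile


def declared_sizes(fileobj):
    """Return (compressed_bytes, declared_uncompressed_bytes) without extracting."""
    with zipfile.ZipFile(fileobj) as zf:
        infos = zf.infolist()
    compressed = sum(info.compress_size for info in infos)
    declared = sum(info.file_size for info in infos)
    return compressed, declared


def looks_like_bomb(fileobj, max_ratio=100, max_bytes=1 << 30):
    """Flag archives whose declared expansion exceeds hypothetical limits."""
    compressed, declared = declared_sizes(fileobj)
    if declared > max_bytes:
        return True
    if compressed == 0:
        # Nothing compressed: suspicious only if it still claims to hold data.
        return declared > 0
    return declared / compressed > max_ratio


# Example: 5 MiB of zeros deflates to a few KiB, so the ratio is far above 100.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("zeros.bin", b"\0" * (5 * 1024 * 1024))
demo.seek(0)
print(looks_like_bomb(demo))
```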
A zip bomb is a perfectly valid file.

I can set up a broken service that outputs a gajillion lines of the same error to syslog, creating terabytes of logs, zip all that into a few megabytes, and that'd be a valid zip that would fill up most modern laptops and servers.

A surveillance camera video, with a very high frame rate when motion is detected and a very low frame rate when not (high framerate -> timelapse), can be a perfectly valid video, taking a few gigabytes in this format, and a few terabytes when converted to fixed 60fps.
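
To put rough numbers on that, here is a back-of-the-envelope sketch. The footage profile (one frame per minute while idle, 60 fps during half an hour of motion) is invented for illustration, and it only counts frames; actual file sizes depend on the encoder, since duplicated frames usually compress well.

```python
from fractions import Fraction


def frame_counts(segments, target_fps=60):
    """segments: list of (duration_seconds, source_fps) pairs.

    Returns (frames in the variable-rate source,
             frames after conversion to a fixed target_fps).
    """
    source = sum(Fraction(seconds) * fps for seconds, fps in segments)
    output = sum(seconds * target_fps for seconds, _ in segments)
    return source, output


# Hypothetical 24 hours of footage.
SEGMENTS = [
    (84_600, Fraction(1, 60)),  # 23.5 hours idle at one frame per minute
    (1_800, Fraction(60)),      # 0.5 hours of motion at 60 fps
]

source_frames, output_frames = frame_counts(SEGMENTS)
print(source_frames, output_frames, float(output_frames / source_frames))
```

Under these assumptions the fixed-rate version has roughly 47 times as many frames as the source, almost all of them repeats of idle frames.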

Zip files that contain themselves are infinitely large when recursively decompressed, so that's much worse than a log file which is merely easy to compress.
Infinitely large doesn't mean anything when your disk space is limited.

If your drive is 500GB, there is no practical difference between a 10TB log file, a 10PB zip file, or an infinite zip bomb... once the disk is full, the unzipping stops.

Narrowly true, except it's trivial to scan a very large archive without actually storing the entire thing, whereas if you try to do the same thing with a zip quine you'll eventually run out of memory. Zip quines are strictly worse.
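
A rough sketch of what scanning without storing can look like with Python's `zipfile`: each member is decompressed in fixed-size chunks that are counted and thrown away, with a hard byte budget. The function name and limits are hypothetical. It deliberately does not recurse into nested archives; a scanner that does recurse would need its own depth limit to cope with a quine.

```python
import io
import zipfile


def scan_zip(fileobj, max_total_bytes, chunk_size=64 * 1024):
    """Stream-decompress every member, discarding the data.

    Returns the total uncompressed size, or raises ValueError as soon as
    the running total exceeds max_total_bytes.
    """
    total = 0
    with zipfile.ZipFile(fileobj) as zf:
        for info in zf.infolist():
            with zf.open(info) as member:
                while True:
                    chunk = member.read(chunk_size)
                    if not chunk:
                        break
                    total += len(chunk)
                    if total > max_total_bytes:
                        raise ValueError(
                            f"archive expands past {max_total_bytes} bytes"
                        )
    return total


def make_compressible_zip(size_bytes):
    """Build an in-memory zip holding size_bytes of zeros (for demonstration)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("zeros.bin", b"\0" * size_bytes)
    buf.seek(0)
    return buf
```

Memory use stays bounded by the chunk size regardless of how large the archive claims to be, and nothing is written to disk.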