Hacker News
by creamyhorror 4862 days ago
> Their software scans it and takes careful note of scenes with high levels of movement, scenes with human shapes moving, scenes with human faces, scenes that match algorithms for water, grass, natural environs, etc.

The funny thing is, this is the sort of thing that H.264 encoders intrinsically do. x264 has some fairly advanced algorithms in it to optimize for human visual perception (thanks DarkShikari, akupenguin, et al.).

I assume these guys are basically determining settings to feed to x264. (They can't be modifying x264 itself, since they'd then need to release their source code changes under x264's GPL license, if I've read the other replies right.)
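If the service really is just choosing settings and handing them to x264, "turning the knobs" boils down to assembling a command line. A minimal sketch, assuming a Python wrapper: --preset, --tune, --crf and -o are real x264 options, but the function name build_x264_args and the file names are hypothetical, and nothing here reflects how Beamr actually works.

```python
from typing import Optional


def build_x264_args(
    source: str,
    output: str,
    preset: str = "medium",
    crf: float = 23.0,
    tune: Optional[str] = None,
    extra: Optional[list[str]] = None,
) -> list[str]:
    """Assemble an x264 command line from a handful of top-level knobs.

    Hypothetical helper: the flags are real x264 CLI options, but the idea
    that a pre-analysis step picks their values is the assumed part.
    """
    args = ["x264", "--preset", preset, "--crf", f"{crf:g}"]
    if tune is not None:
        args += ["--tune", tune]
    # Any further per-option overrides are passed through verbatim.
    args += extra or []
    # Output file first, then the input, as x264's CLI expects.
    args += ["-o", output, source]
    return args
```

Running it would then be something like subprocess.run(build_x264_args("in.y4m", "out.264", preset="slow", tune="film"), check=True). Whatever analysis happens beforehand, its only output is a few values like these.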

If all they're doing is turning x264's settings knobs, they must have studied the effects of those knobs in depth. Even so, I can hardly see how any analysis they're doing can be usefully turned into settings for x264. I find it a bit difficult to phrase my reasoning, but I'll try:

-----

- Trying to outdo x264's analysis at optimizing for perceptual quality while still depending on it is like trying to optimize a car engine from the driver's seat. It's not likely to happen.

- x264's settings are, generally speaking, macroscopic - they apply to the whole video segment. (You can apply different settings to different segments, but in the end your control is still limited by whatever knobs x264 offers.)

- If there were a way to optimize things better than x264 has, it almost certainly requires working directly within the encoder's analysis code itself rather than carrying out a pre-encoding analysis process and then fiddling with rough-control knobs. I simply doubt the complex interplay of settings within x264 lends itself to mere knob-turning. Many of the settings are mainly for making tradeoffs among the impossible trinity of encoding speed vs output quality vs output bitrate, rather than to allow the encoder to improve the output perceptually (because that's x264's job).

- Even if they managed to build a pre-analysis model that figures out decent settings to feed into x264, it would break to some extent whenever x264's code/algos are changed. That doesn't seem like a stable base to build a business on.

- All the above reasoning is overkill, because the settings the Beamr encoder produced look outright silly to me: setting bframes to 0 is just shooting x264 in the foot (B-frames are central to bitrate savings, because bidirectional prediction lets them borrow detail from frames both before and after them instead of storing it again). Plus it turns off mb-tree and psy in 3 out of 4 samples, which basically discards two of x264's more powerful adaptive bitrate-vs-quality features. (mb-tree tracks how much each block is reused by later frames, spending more bits on static, heavily referenced areas and fewer on fast-changing ones such as moving objects; psy covers psychovisual optimizations such as psy-rd and psy-trellis, which favor perceived detail like grain and texture over raw PSNR.) It's just plain regressive.
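x264 embeds its settings in the output stream (MediaInfo shows them as "Encoding settings"), which is one way to check a sample for choices like these. A rough sketch, assuming the usual key=value format with either spaces or " / " between entries; the function names are hypothetical and the sample string in the usage note is made up for illustration, not taken from Beamr's samples.

```python
def parse_x264_settings(settings: str) -> dict[str, str]:
    """Parse an x264 settings string into {option: value}.

    Accepts both "a=1 b=2" and MediaInfo-style "a=1 / b=2"; the lone "/"
    separators contain no "=" and are skipped like any other stray token.
    """
    parsed: dict[str, str] = {}
    for token in settings.split():
        key, sep, value = token.partition("=")
        if sep:
            parsed[key] = value
    return parsed


def regressive_choices(opts: dict[str, str]) -> list[str]:
    """Flag the three settings criticized above when they are switched off."""
    checks = {
        "bframes": "bframes=0: no B-frames, so no bidirectional prediction",
        "mbtree": "mbtree=0: macroblock-tree ratecontrol disabled",
        "psy": "psy=0: psychovisual optimizations disabled",
    }
    return [msg for key, msg in checks.items() if opts.get(key) == "0"]
```

For a made-up string such as "cabac=1 / ref=3 / psy=0 / bframes=0 / mbtree=0 / crf=23.0", regressive_choices(parse_x264_settings(...)) returns all three warnings, while x264's defaults would return none.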

-----

More fundamentally, the claim that "a minimum bitrate for visually lossless encoding of a video can be found" is quite doubtful: "visually lossless" is fuzzy (it depends on the viewer, the display, and the source), and the claim rests on unstated assumptions. The trouble is, a pure marketing line like this is easy to sell to a non-technical crowd. And x264 is good enough that anyone re-encoding a crappy source with it will find good bitrate savings, even with dumb settings.
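The claim implicitly assumes two things: that some quality score rises steadily with bitrate, and that a single threshold on that score means "visually lossless". Under exactly those assumptions the minimum is easy to find by bisection, as the sketch below shows; quality_at is a hypothetical stand-in for "encode at this bitrate, then score the result with some metric", and the toy curve in the usage note is invented.

```python
from typing import Callable


def min_bitrate_for_threshold(
    quality_at: Callable[[int], float],
    threshold: float,
    lo_kbps: int,
    hi_kbps: int,
) -> int:
    """Binary-search the lowest integer bitrate whose score meets threshold.

    Only valid if quality_at never decreases as bitrate rises and
    quality_at(hi_kbps) >= threshold; real encoders and metrics guarantee
    neither, and "visually lossless" is not a single number.
    """
    if quality_at(hi_kbps) < threshold:
        raise ValueError("threshold not reached even at hi_kbps")
    while lo_kbps < hi_kbps:
        mid = (lo_kbps + hi_kbps) // 2
        if quality_at(mid) >= threshold:
            hi_kbps = mid  # mid is good enough; look lower
        else:
            lo_kbps = mid + 1  # mid falls short; look higher
    return lo_kbps
```

For the toy curve lambda k: min(100.0, k / 50) with a threshold of 95.0 between 100 and 10000 kbps, this returns 4750. The hard part is everything the sketch takes as given: real metric curves are noisy, and the threshold is exactly the fuzzy bit.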

Anyone looking to encode video in the cloud should instead use a well-priced, well-tuned service that doesn't overstate its case. Daiz suggested Zencoder, so it's probably a good bet. People who just want to shrink videos should grab HandBrake or any other x264-based encoder package and use one of the presets (or just stick to the defaults). As things stand, the result will probably be better than this service's.

(Thanks for patrolling the video frontier, Daiz.)


> If there were a way to optimize things better than x264 has, it almost certainly requires working directly within the encoder's analysis code itself rather than carrying out a pre-encoding analysis process and then fiddling with rough-control knobs. I simply doubt the complex interplay of settings within x264 lends itself to mere knob-turning.

You're treating x264 like some box of black magic.

As far as the perceptual tunings go, x264 already provides the three most important knobs (aq, psy-rd, psy-trellis), which by their nature can't be a "one size fits all" deal. x264 makes no attempt to guess which values of those settings would best fit your source; it only offers a conservative default and a series of tunings (--tune) for marginally more fine-grained control.
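For reference, the three perceptual knobs map onto x264's --aq-mode/--aq-strength and --psy-rd flags, with psy-rd and psy-trellis sharing one flag written as <rd>:<trellis>. A small sketch, assuming the usual x264 defaults (aq-mode 1, aq-strength 1.0, psy-rd 1.0:0.0); the helper name is hypothetical, so double-check the defaults with x264 --fullhelp for your build.

```python
def perceptual_flags(
    aq_mode: int = 1,
    aq_strength: float = 1.0,
    psy_rd: float = 1.0,
    psy_trellis: float = 0.0,
) -> list[str]:
    """Render x264's main perceptual knobs as CLI flags (hypothetical helper).

    Defaults mirror x264's usual defaults; confirm them for your build.
    """
    return [
        "--aq-mode", str(aq_mode),
        "--aq-strength", f"{aq_strength:.2f}",
        # psy-rd and psy-trellis share one flag, written as <rd>:<trellis>
        "--psy-rd", f"{psy_rd:.2f}:{psy_trellis:.2f}",
    ]
```

A per-source tuner would just call this with different values for different material; the values themselves are the part x264 deliberately leaves to the user.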

A hypothetical "better" approach would be to have an amazingly intelligent first pass split the movie into scenes and calculate the perceptual weights for each scene. That way, in a movie like Kill Bill, the animated, fast-action, and talking-head scenes could each be perceptually optimized, since they need vastly different settings. Splitting a movie into zones also opens up a whole plethora of options for fine-tuning quality. Again, you can always reach these options from the command line.
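x264's --zones option is the command-line hook for this kind of per-scene control: zones are separated by "/", and each is written as <start frame>,<end frame>,<option>, where q=<int> forces a quantizer and b=<float> scales the bitrate. A minimal sketch, assuming the scene boundaries and per-scene choices come from the hypothetical first pass described above (here they're hard-coded). It only emits q= and b=; some x264 builds accept further per-zone options, which is worth checking in x264 --fullhelp before relying on it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Scene:
    start: int  # first frame of the scene
    end: int  # last frame of the scene
    qp: Optional[int] = None  # force a quantizer (q=)
    bitrate_mult: Optional[float] = None  # or scale the bitrate (b=)


def zones_arg(scenes: list[Scene]) -> str:
    """Build the value for x264's --zones option from hypothetical scenes."""
    parts = []
    for s in scenes:
        if s.qp is not None:
            option = f"q={s.qp}"
        elif s.bitrate_mult is not None:
            option = f"b={s.bitrate_mult:g}"
        else:
            continue  # no override: the scene keeps the global settings
        parts.append(f"{s.start},{s.end},{option}")
    return "/".join(parts)
```

With scenes Scene(0, 1199, bitrate_mult=1.5), Scene(1200, 2999) and Scene(3000, 4199, qp=18), this yields "0,1199,b=1.5/3000,4199,q=18", to be passed as --zones. The frame numbers are made up.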

(Also, x264's psy-rd is hilariously unoptimized for very low, high-stress bitrates. I'm not sure whether the purported "service" being discussed in this thread handles those bitrates, but it's an area that needs massive tweaks. x264's main role seems to be high-quality archiving, however, so this is merely a tangent.)

> Even if they managed to build a pre-analysis model that figures out decent settings to feed into x264, it would break to some extent whenever x264's code/algos are changed. That doesn't seem like a stable base to build a business on.

One could just not update, or only update the required parts. I assume reading and modifying code is already a prerequisite here.
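Pinning is also easy to enforce mechanically: refuse to run if the installed x264 isn't the build the settings model was tuned against. A sketch, assuming the first line of x264 --version looks like "x264 0.<core>.<revision> <commit>" (for example "x264 0.164.3095 baee400"); the function name and the pinned numbers are hypothetical.

```python
import re

# Assumed shape of the first line of `x264 --version`.
VERSION_RE = re.compile(r"^x264 0\.(\d+)\.(\d+)")


def check_pinned(version_line: str, core: int, revision: int) -> bool:
    """Return True if the version line matches the pinned core and revision."""
    match = VERSION_RE.match(version_line.strip())
    if match is None:
        raise ValueError(f"unrecognised version line: {version_line!r}")
    return (int(match.group(1)), int(match.group(2))) == (core, revision)
```

The version line itself can be read with subprocess.run(["x264", "--version"], capture_output=True, text=True).stdout.splitlines()[0].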