by Hupo 4889 days ago
>not very representative clip

I chose the clip based on what Dark_Shikari (x264 developer) had to say about it[1]:

It shouldn't bias too heavily towards any one encoder like many of the other standard test clips will:

a. It's relatively high motion, so it won't bias heavily against encoders without B-frames or qpel (as, say, mobcal does).

b. It's not so high motion that it would cripple video formats that don't support motion vectors longer than 16 pixels (e.g. Theora).

c. It's not something that benefits an unreasonably large amount from some of x264's algorithms (which is why I picked this and not parkrun).

[1] http://forum.doom9.org/showthread.php?t=154430

I could have done multiple test encodes, sure, but the problem in this case was that downloading several gigabytes of raw source material isn't exactly instant. And even if I tested with multiple clips, I doubt the conclusion would be that much different.


Because of its texture, this clip benefits enormously from 8x8 transforms (and substantially from an encoder that is aware of activity masking). In prior testing on an intra frame, Theora did enormously better than VP8 on this clip for these reasons. If your test was to compare an intraframe between vp8 / baseline h264 / and Theora, you would have concluded Theora was the best by a wide margin. But this would be an erroneous conclusion.

And sure, perhaps you'd get the same result on other clips. Over high-profile H264, the only obvious format features that come to mind that could really let VP8 get ahead are the 'truemotion' intra-predictor and creative use of the synthetic reference frame (though I suppose the VP8 developers might have other suggestions). I'd expect those features to be big wins on only a small number of clips, so it wouldn't be hard to miss the cases where VP8 really shines over high-profile H264.

But you (or I) could have said that without doing the test at all, and there would be 100% fewer clueless people going around claiming that something was proven here that wasn't. Your opinion (or mine) is a fine thing, but it's not proper to launder an opinion as fact by dressing it up in an inadequate test.

>If your test was to compare an intraframe between vp8 / baseline h264 / and Theora, you would have concluded Theora was the best by a wide margin.

But it wasn't. I was comparing the visual quality of the whole video, and provided the full encoded clips for people to download and compare for that reason.

I am willing to do further test encodes, but have no interest in doing something like encoding all 28 HD test clips available on derf's test clip page[1], since as a purely visual comparison, especially with the actual encodes, it would be incredibly exhausting.

EDIT: I added a notice about the downsides of single clip comparison to the top of the post.

[1] http://media.xiph.org/video/derf/

>I have no interest in doing something like encoding all 28 HD test clips, since as a purely visual comparison, it would be incredibly exhausting.

Science is exhausting. If you're not working hard, then you're likely to miss the interesting (counter-intuitive) results. In fact, finding counter-intuitive results is the whole point of science. If the truth were intuitive, explanations wouldn't need testing.

One problem is that even if I found that VP8 performed very well on one or two particular clips (out of the 28 HD test clips available), I couldn't say for sure why that is the case. There seems to be no clear information on which clips benefit from which kinds of features, and as I'm not an expert on video encoding technology, it'd be hard for me to deduce these things by myself. General conclusions could still be reached, obviously, but if I were going to such lengths it'd suck if I couldn't get more detailed results overall.

Anyway, I brought up the subject with some Xiph folks over on IRC. Maybe in the future the test clips will come with more detailed information to help in testing. It'd also benefit smaller-scale tests, since it'd allow one to identify possible biases more easily.

That was in 2005!

Given that people do hundreds of test encodes when they actually use things like x264, I think that if you want to say anything general about these encoders you have to do more than one comparison.
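
To put a rough number on why one comparison isn't enough, here is a minimal sketch of a paired sign test over per-clip results. It assumes you have one quality score per clip for each encoder (a viewer rating or a metric value, higher meaning better); the `sign_test` name and the score inputs are made up for illustration and aren't taken from any test in this thread.

```python
from math import comb


def sign_test(scores_a, scores_b):
    """Two-sided sign test on paired per-clip scores (higher = better).

    scores_a[i] and scores_b[i] are hypothetical quality scores for
    encoder A and encoder B on clip i. Returns (wins_a, wins_b, p_value).
    Clips where the two scores are equal are dropped.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("need one score per clip for each encoder")

    wins_a = sum(a > b for a, b in zip(scores_a, scores_b))
    wins_b = sum(b > a for a, b in zip(scores_a, scores_b))
    n = wins_a + wins_b
    if n == 0:
        return wins_a, wins_b, 1.0

    # Chance of a split at least this lopsided if each encoder were
    # equally likely to win any given clip.
    k = max(wins_a, wins_b)
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return wins_a, wins_b, min(1.0, 2 * tail)
```

With a single clip the p-value is 1.0 no matter who wins, so one clip can't separate the encoders at all under this test. It only counts wins per clip and says nothing about why a given clip favours one format.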