It would be nice if they had an actual video comparison; I can make either of two codecs that are even close in performance look way better with single frame-grabs.
I am skeptical of the quality claims without evidence. Previous third-party studies on VP8 and VP9 have said that both H.264 and HEVC (formerly H.265) outperform VP9 on video quality and encoding speed[1].
Table VI. Encoding Run Times for Equal PSNR_{YUV}, HEVC vs. VP9 (in %): 735.2
I.e., HM is over 7x slower than VP9's slowest settings. VP9 speeds have improved dramatically since then, and it is now being used in real time (Google is working on adding it to WebRTC), while still demonstrating significant quality improvements over prior generations.
In my experience h265 and vp9 are fairly close in encoding speeds (less than 0.1x the length of the video).
I would strongly dispute that h264 can ever outperform vp9 in video quality (given the same number of bits); however, hardware/browser support will decide whether vp9 is adopted or not.
It's still H.265. Like past standards, HEVC/H.265 is a joint effort between ISO's MPEG and the ITU's VCEG. They're different identifiers for different organizations.
It's early days though, e.g. only PSNR for now, but the basic idea of repeatable, open source codec testing so that if someone complains about any aspect they can fork and show the difference, and hopefully get it rolled back in to future versions, is very cool.
I haven't been following codecs for about 5 years now, but objective metrics were mediocre at best for comparing codecs the last time I was into it. The issues with PSNR are myriad and well documented.
I've always thought PSNR gets a bad rap. It's not that PSNR isn't a very blunt tool, it's just that every other tool in its class is pretty blunt too.
I mean it compares a video codec by treating it as a series of independent still images. That right there is crazy, but the same basic methodology applies to most of the alternatives, even the crazy obscure ones that no-one really uses because they're too new or processor intensive.
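To make the "independent still images" point concrete, here is a minimal sketch of how a sequence-level PSNR is typically computed. It is not taken from any particular tool: the function names are made up for illustration, and frames are assumed to be flat lists of 8-bit samples from a single plane.

```python
import math

def frame_psnr(ref, test, peak=255):
    """PSNR in dB between two equal-sized frames (flat lists of samples)."""
    if len(ref) != len(test) or not ref:
        raise ValueError("frames must be non-empty and the same size")
    # Mean squared error over all samples in this one frame.
    mse = sum((r - t) ** 2 for r, t in zip(ref, test)) / len(ref)
    if mse == 0:
        return math.inf  # identical frames
    return 10 * math.log10(peak ** 2 / mse)

def sequence_psnr(ref_frames, test_frames):
    """Average of per-frame PSNR; every frame is scored in isolation."""
    scores = [frame_psnr(r, t) for r, t in zip(ref_frames, test_frames)]
    return sum(scores) / len(scores)

# Two decodes of the same 4-frame clip (hypothetical data).
ref = [[100, 100, 100, 100]] * 4
steady = [[101, 101, 101, 101]] * 4                      # constant +1 offset
flicker = [[101] * 4, [99] * 4, [101] * 4, [99] * 4]     # alternates +1 / -1

# Both get the same score, even though one of them flickers frame to frame.
print(sequence_psnr(ref, steady) == sequence_psnr(ref, flicker))
```

Nothing in the score looks at how the error changes between frames, so the flickering decode and the steady one come out identical.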
The Mozilla/Daala team have written a lot about these topics in regards to their work on a) Daala, b) netvc (new IETF codec project just getting started), c) evaluating improvements to JPEG for MozJPEG, d) evaluating WebP
In the end, like unit tests, performance benchmarks, static analysis or various other software development tools, they're useful if you use them wisely and know their limitations, and dangerous if you abuse them or treat them as if they are magical.
But crappy tools that can be easily automated fill an important part of the toolbox and I feel PSNR has its place. I'm deeply suspicious of anyone who looks down their nose at PSNR because they've just discovered SSIM for example, which seems to be a common sentiment. They're both just crappy tools that can be used for good or ill if you know what you're doing. Running them both (and others too) might help catch more bugs than either alone, and if you're automating then why the heck not?
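As a rough sketch of "run them both if you're automating", something like the following would do. The names are hypothetical, frames are assumed to be flat lists of 8-bit samples, and the SSIM here is a simplified single-window (global) variant using the usual constants, not the sliding-window version real tools implement.

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def psnr(ref, test, peak=255):
    """PSNR in dB for one frame."""
    mse = mean([(r - t) ** 2 for r, t in zip(ref, test)])
    return math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)

def global_ssim(ref, test, peak=255):
    """Simplified SSIM over the whole frame as one window (assumption:
    no sliding window, no Gaussian weighting)."""
    c1 = (0.01 * peak) ** 2
    c2 = (0.03 * peak) ** 2
    mu_r, mu_t = mean(ref), mean(test)
    var_r = mean([(r - mu_r) ** 2 for r in ref])
    var_t = mean([(t - mu_t) ** 2 for t in test])
    cov = mean([(r - mu_r) * (t - mu_t) for r, t in zip(ref, test)])
    num = (2 * mu_r * mu_t + c1) * (2 * cov + c2)
    den = (mu_r ** 2 + mu_t ** 2 + c1) * (var_r + var_t + c2)
    return num / den

# Registry of metrics; adding another one is a one-line change.
METRICS = {"psnr": psnr, "ssim_global": global_ssim}

def score_frame(ref, test):
    """Run every registered metric on one frame pair."""
    return {name: fn(ref, test) for name, fn in METRICS.items()}

print(score_frame([10, 50, 90, 130], [12, 48, 95, 128]))
```

A regression that moves one metric but not the other is exactly the kind of thing worth a human look.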
Okay, that sums up how things were about 5 years ago; PSNR and friends can catch some problems, but we still don't have great objective metrics for video quality.
[edit]
Some psychovisual effects are well known but hard to measure objectively: some kinds of noise are well tolerated in areas of high detail, and very large loss of detail can be tolerated in high-motion areas.
[1]: http://iphome.hhi.de/marpe/download/Performance_HEVC_VP9_X26...