by motbob 4095 days ago
Yeah. In general, this is a big problem, not only for VP8/9, but also for h265 (also known as HVEC). My computer struggles to render 1080p HVEC video lag-free. Early adopters should recognize that there are advantages to using h264 that have nothing to do with the quality of the codec itself.

Minor correction: H.265 is HEVC - High Efficiency Video Coding.

H.265 and VP9 will (most likely) differ a lot when we consider hardware encoders/decoders. I guess H.265 hardware decoders will be present on all platforms in the next couple of years. There are already mobile processors that have a hardware H.265 decoder. A hardware H.265 encoder seems to be a bit further off. I am not sure whether someone has a production ready hardware encoder. On the other hand, VP9 is not a priority for most companies.

YouTube is only going to offer UHD over VP9, meaning that 4K smart TVs and set-top-boxes will have to support VP9. Google can also use their leverage over Android to influence mobile device makers. At $0.40 a pop, it makes sense for most SoCs to support it.

Netflix has said they are going to use H.265, but they could adopt the same strategy as Google. They could even force their desktop customers to install the VP9 codec, just as they did for Silverlight.

The primary problem is Apple: they simply won't support it. Thankfully, AppleTV hasn't caught fire, relegating their control of the market to the iPhone.

Daala will be more amenable to acceleration via generic GPUs. It probably won't match dedicated hardware but if a mobile device can decode it and the bandwidth savings are significant, the lack of licensing fees will make it a very attractive option.

Hopefully Daala will be significantly better than H.265 and win over Apple and others based on the merit of their codec alone.

>YouTube is only going to offer UHD over VP9, meaning that 4K smart TVs and set-top-boxes will have to support VP9.

Google told us VP8 was the future, and that widespread hardware support was imminent. Then in less than a couple of years they abandoned VP8.

Next week millions of fairly new TVs are going to stop working with YouTube because Google decided to shut down the API: https://support.google.com/youtube/answer/6098135?hl=en

I'm not sure the TV industry is about to invest in supporting a technology that will probably be long deprecated by the time most of their customers will even be able to use it. Fool me once etc

That's baffling. Do they have any reason at all to force people onto the new API?
> Do they have any reason at all to force people onto the new API?

Youtube Ad Support.

The new API has the ability to enforce ads just by moving to it? TOS changes?
By "TV industry" I assume you mean the manufacturers of TVs. Since most of them support Android TV (which requires VP9), or want 4k Youtube as a ready source of content for their 4k TVs, and are often brands that already make Android phones, or re-use chips intended for those that do, I'd guess they'd have to do more work to avoid VP9 than to use it.
Daala is way more exciting technically as well. Of course, being more exciting is not equivalent to being better, but with such a novel approach (lapped transform), Daala may just shake up the state of the art, while the rest merely refine it.
Why would anyone want more than 1080p on a phone?
>Why would anyone want more than 1080p on a phone?

Because 1080p on a 5 - 6 inch device creates visible pixels to the naked eye, and can be improved upon with a higher quality screen.

I have a 1440p 5.5" smartphone and the difference next to 720p is staggering and the difference next to 1080p is still noticeable to the untrained eye. The tests I use to demonstrate to people include well formed text display, comic-book display, and Unreal4 demo. People pick out the 1440p screen as best without much issue in every test.
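For a rough sense of the numbers in this comment, pixel density can be estimated from resolution and diagonal size. This is only a sketch: the 5.5-inch diagonal comes from the comment above, the function name is made up for illustration, and whether a given density is visibly different also depends on viewing distance and eyesight, which this ignores.

```python
import math

def pixels_per_inch(width_px, height_px, diagonal_in):
    """Pixel density measured along the screen diagonal."""
    return math.hypot(width_px, height_px) / diagonal_in

resolutions = {"720p": (1280, 720), "1080p": (1920, 1080), "1440p": (2560, 1440)}
for name, (w, h) in resolutions.items():
    print(f"{name} at 5.5 in: {pixels_per_inch(w, h, 5.5):.0f} PPI")
```

On a 5.5-inch panel this works out to roughly 267, 401 and 534 PPI respectively.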

I get that > 1080p makes sense for text and vector graphics. But really, what are you realistically going to watch on your phone that's been filmed with a 4K camera and optics that match that resolution? The fact that phones are shipping with 4k video capability does not mean the quality is better than the same camera shooting 1080p, especially when you take into account the limit on bandwidth in the encoder chip, so 1080p can be recorded at a higher bitrate.
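The bitrate point can be put in numbers. The sketch below assumes a hypothetical encoder capped at 50 Mbit/s and recording at 30 fps (both figures are illustrative, not taken from any particular phone) and compares how many encoded bits each pixel gets at the two resolutions.

```python
def bits_per_pixel(bitrate_bps, width_px, height_px, fps):
    """Average encoded bits available per pixel per frame."""
    return bitrate_bps / (width_px * height_px * fps)

ENCODER_CAP_BPS = 50_000_000  # hypothetical hardware encoder limit
FPS = 30                      # assumed frame rate

bpp_1080p = bits_per_pixel(ENCODER_CAP_BPS, 1920, 1080, FPS)
bpp_2160p = bits_per_pixel(ENCODER_CAP_BPS, 3840, 2160, FPS)
print(f"1080p: {bpp_1080p:.3f} bits/pixel, 4K: {bpp_2160p:.3f} bits/pixel")
```

Under the same cap, each 4K pixel gets a quarter of the bits a 1080p pixel would, whatever the actual cap is.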
I remember with previous size jumps, it gets to a certain point when you want to be able to decode 4k video, even if your display (or eyeballs) can't handle it, just because that's easier than transcoding the original file.
I'm not entirely convinced that the minor benefits from increasing resolution so much offset the cost in terms of battery life, especially on devices where screens are already the most power-hungry parts.
>But really, what are you realistically going to watch on your phone that's been filmed with a 4K camera and optics that match that resolution?

You seem to be avoiding the fact that the primary use case of smartphones includes images and text, not video.

You're right that video of sufficiently high quality to notice isn't readily available -- but who cares?

1440p makes the text under an app icon easier to read.

It makes webpages easier to read.

It makes "online magazines" crisper. It takes better advantage of a plethora of high resolution iconography and imagery designed to take advantage of "retina" this and "4k" that.

Sure, it may be a decade before we're streaming >1440p video on our devices, but higher resolution screens making better text was a need ten years ago, not just today.

Did you miss the fact that we are discussing a video codec?
I've only held a galaxy note (2560x1440) once but it was pretty nice. Resolution is one specs race that I've always been fond of. When somebody finally finds a sasquatch you'll be glad for your 4k display
For the same reason that we want more than 640k RAM [1]? More seriously: Economics of cellphone screens are driving prices and specs of heads-up VR displays. Higher resolution and faster rendering help both -- plus benefits of mass-production.

[1] http://www.computerworld.com/article/2534312/operating-syste...

Screens are >1080p in resolution. Do you really want all your videos upsampled?
> Why would anyone want more than 1080p on a phone?

The same reason that flagship phones now tend to have screens with resolution greater than 1920x1080. 1920x1080 isn't the highest useful resolution at the size of many of today's phones.

Merits of their codec? Seems a bit naive.

This has nothing to do with merits and everything to do with big business politics. That said, in this case the best-quality codec, i.e. H.265, is likely to win out pretty comfortably.

I think many thought this a year ago, but it's not the case any more. All of the major SoC vendors are shipping VP9 in their newest system-on-chips. In addition, a new licensing pool for HEVC just formed, and has yet to announce their licensing model or cost [1]. This is all the more reason for companies to look for alternatives.

[1] http://hevcadvance.com/

> I am not sure whether someone has a production ready hardware encoder.

The first was announced a few days ago.

http://socionext.com/en/pr/sn_pr20150403_01e.pdf

Is HEVC really used anywhere/by anybody yet?
Anime groups are playing with it. Anime fansub groups, in general, are hungry for the absolute best in codec technology. Hardware and media player compatibility be damned. An example: their use of 10-bit color.

There are some groups in the anime scene whose sole purpose is to take releases from other groups and convert them into standard h.264 video that can play on basically any device.

I'm a colorist. I spend all day looking at color intensely with very expensive monitors. It makes me really excited to see people who actually care about color reproduction over things like resolution.

With that said, I have to ask why these groups are interested in 10-bit when I'm essentially certain they cannot view in 10-bit. Only workstation GPU's (Quadro and FirePro) output 10-bit (consumer GPU's are intentionally crippled to 8-bit) and I can't really think of any monitors that have 10-bit panels under about $1000 (though there are many with 10-bit processing which is nice but doesn't get you to 10-bit monitoring). There are some output boxes intended for video production that can get around the GPU problem, but by the time you've got a full 10-bit environment, you're at $1500 bare minimum which seems excessive for most consumer consumption.

So I guess what I'm asking, are these groups interested in having 10-bit because it's better and more desirable (and a placebo quality effect) or are they actively watching these in full 10-bit environments?

It's worse than that; the input itself is only 8-bit and they scale up, and then back down again to 8-bit on the output side.

But it works, for the same reason that audio engineers use 64-bit signal pipelines inside their DAW even though nearly all output equipment (and much input equipment) is 16-bit which is already at the limit of human perception.

If you have a 16-bit signal path, then every device on the signal path gets 16 bits of input and 16-bits of output. So every device in the path rounds to the nearest 16-bit value, which has an error of +/- 0.5 per device.

However if you do a lot of those back to back they accumulate. If you have 32 steps in your signal path and each is +/- 0.5 then the total is +/- 16. Ideally some of them will cancel out, but in the worst case it's actually off by 16. "off by 16" is equivalent to "off by log2(16)=4 bits". So now you don't have 16-bit audio, you have 12-bit audio, because 4 of the bits are junk. And 12-bit audio is no longer outside the limit of human perception.

Instead if you do all the math at 64-bit you still have 4 bits of error but they're way over in bits 60-64 where nobody can ever hear them. Then you chop down to 16-bit at the very end and the quality is better. You can have a suuuuuuuper long signal path that accumulates 16 or 32 or 48 bits of error and nobody notices because you still have 16 good bits.

tl;dr rounding errors accumulate inside the encoder
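A toy version of the argument above, as a sketch only. The signal path here is invented for illustration: 16 stages that each attenuate by a hypothetical factor of 0.9, followed by 16 stages that undo it, so the whole chain should return its input. One path rounds to an integer after every stage, the other keeps Python's 64-bit floats and rounds once at the end.

```python
STAGES = 16
GAIN_DOWN = 0.9  # hypothetical per-stage attenuation

def narrow_path(sample):
    """Round to an integer after every stage, like a chain of fixed-width devices."""
    x = sample
    for _ in range(STAGES):
        x = round(x * GAIN_DOWN)   # attenuate, then round
    for _ in range(STAGES):
        x = round(x / GAIN_DOWN)   # boost back, then round
    return x

def wide_path(sample):
    """Keep full float precision inside the chain and round once at the end."""
    x = float(sample)
    for _ in range(STAGES):
        x *= GAIN_DOWN
    for _ in range(STAGES):
        x /= GAIN_DOWN
    return round(x)

samples = range(1000, 2000)
narrow_err = max(abs(narrow_path(s) - s) for s in samples)
wide_err = max(abs(wide_path(s) - s) for s in samples)
print(f"max error, rounding at every stage: {narrow_err}")
print(f"max error, rounding once at the end: {wide_err}")
```

The wide path recovers every input exactly, while the narrow path cannot, because the rounded intermediate values no longer carry enough information to tell neighbouring inputs apart.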

> 16-bit which is already at the limit of human perception

Nitpick: 16-bit fixed point is not at the limit of human perception. It's close, but I think 18-bit is required for fixed point. Floating point is a different issue.

> If you have 32 steps in your signal path and each is +/- 0.5 then the total is +/- 16.

Uncorrelated error doesn't accumulate like that. It accumulates as RSS (root of sum of squares). So, sqrt(32 * (.5 * .5)) which is about 2.83 (about 1-2 bits).
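The two figures being compared can be checked directly. This sketch just reproduces the arithmetic from the thread, taking 0.5 LSB as the per-step error as both comments do; the variable names are illustrative.

```python
import math

steps = 32
per_step_error = 0.5  # in units of one least-significant bit

worst_case = steps * per_step_error            # every error has the same sign
rss = math.sqrt(steps * per_step_error ** 2)   # uncorrelated errors

print(f"worst case: {worst_case} LSB = {math.log2(worst_case):.1f} bits")
print(f"RSS:        {rss:.2f} LSB = {math.log2(rss):.1f} bits")
```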

> You can have a suuuuuuuper long signal path that accumulates 16 or 32 or 48 bits of error and nobody notices because you still have 16 good bits.

Generally the things which cause audible errors are effects like reverb, delay, phasers, compressors, etc. These are non-linear effects and consequently error can multiply and wind up in the audible range. Because error accumulates as RSS, it's really hard to get error to additively appear in the audible range.

tl;dr recording engineers like to play with non-linear effects which can eat up all your bits

> Uncorrelated error [...] accumulates as RSS

Uncorrelated average error, yes. Uncorrelated maximum error accumulates as a linear sum, as he calculated.

But yes, you're absolutely correct in that it tends to be nonlinear effects that cause issues.
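The difference between typical and maximum accumulated error can be seen in a small simulation. This is a sketch under a stated assumption: each of the 32 rounding errors is independent and uniformly distributed in [-0.5, 0.5]. Under that assumption the per-step RMS is about 0.29 rather than the 0.5 used in the thread, so the RMS total comes out lower than 2.83, while the hard bound stays at 16.

```python
import math
import random

STEPS = 32
TRIALS = 20_000
rng = random.Random(0)  # fixed seed so the run is repeatable

totals = [
    sum(rng.uniform(-0.5, 0.5) for _ in range(STEPS))
    for _ in range(TRIALS)
]

rms_total = math.sqrt(sum(t * t for t in totals) / TRIALS)
largest_seen = max(abs(t) for t in totals)
hard_bound = STEPS * 0.5  # linear sum of the per-step maximum

print(f"RMS of total error:     {rms_total:.2f} LSB")
print(f"largest total observed: {largest_seen:.2f} LSB")
print(f"guaranteed bound:       {hard_bound:.0f} LSB")
```

The bound of 16 is reachable in principle, but it requires all 32 errors to line up, which random rounding errors essentially never do.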

> > 16-bit which is already at the limit of human perception

> Nitpick: 16-bit fixed point is not at the limit of human perception. It's close, but I think 18-bit is required for fixed point. Floating point is a different issue.

16-bit with shaped dither should be good enough to cover human perception.

This is the same reason we try to keep post production workflows at 10bit or better (my programs are all 32bit floating point). A lot of cameras are capable of 16bit for internal processing but are limited to 8 or 10bit for encoding (outside some raw solutions). An ideal workflow is that raw codec (though it's often a 10bit file instead of raw) going straight to color (me working at 32bit), and then I deliver at 10bit, from which 8bit final delivery files are generated (outside of theatrical releases, which work off 16bit files and incidentally use 24bit for the audio). So all that makes sense to me.

I was mostly curious why people were converting what I assume are 8bit files into 10bit. The responses below about the bandwidth savings and/or quality increase on that final compressed version seem to be what I was missing!

where you put "off by sqrt(16)=4 bits" did you mean "off by log2(16)=4 bits"?
Yes, thx
> With that said, I have to ask why these groups are interested in 10-bit when I'm essentially certain they cannot view in 10-bit.

The H.264 prediction loop includes some inherent rounding biases and noise injection. Using 10-bit instead of 8-bit reduces the injected noise by 12 dB, which makes a noticeable difference in quality-per-(encoded)-bit, even when the source and display are both 8-bit. I spent some time talking with Google engineers early on in the VP9 design process, and they commented that they had done some experiments, but did not see similar gains by using higher bit depths with VP9. I don't know if that's still true with the final VP9 design, though.
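The 12 dB figure matches the usual rule of thumb that each extra bit halves the quantization step, which is worth about 6 dB. A minimal check of that arithmetic follows; it is not a model of the H.264 prediction loop itself, and the function name is made up for illustration.

```python
import math

def noise_reduction_db(extra_bits):
    """Drop in quantization noise when the step size shrinks by 2**extra_bits."""
    return 20 * math.log10(2 ** extra_bits)

print(f"8-bit -> 10-bit: {noise_reduction_db(2):.2f} dB")
```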

It is a matter of filesize. Testing revealed that using 10-bit allowed for a better picture at similar target bitrates. For an explanation of why this is so, see http://x264.nl/x264/10bit_02-ateme-why_does_10bit_save_bandw...
I am far from an expert on the matter, but I recall the case was made that acceptable color reproduction could be achieved with lower bitrates by using 10-bit encoding. It was about file size, I don't think anyone in the fansubbing community expected people to have professional monitors.
>With that said, I have to ask why these groups are interested in 10-bit

I can explain that - I wrote an extensive post on this matter a year back, so I'll just reuse that with a bit of tweaking. So with that clear, let's talk about the medium we're working with a bit first.

Banding is the most common issue with anime. Smooth color surfaces are aplenty, and consumer products (DVDs/BDs) made by "professionals" have a long history of terrible mastering (and then there's companies like QTEC that take terrible mastering to eleven with their ridiculous overfiltering). As such, the fansubbing scene has a long history with video processing in an effort to increase the perceived quality by fixing the various source issues.

This naturally includes debanding. However, due to the large smooth color surfaces, you pretty much always need to use dithering in order to have truly smooth-looking gradients in 8-bit. And since dithering is essentially noise to the encoder, preserving fine dither and not having the H.264 encoder introduce additional banding at the encoding stage meant that you'd have to throw a lot of extra bitrate at it. But we're talking about digital download end products here, with bitrates usually varying between 1-4 Mbps for TV 720p stuff and 2-12 Mbps for BD 720p/1080p stuff, not encodes for Blu-ray discs where the video bitrate is around 30-40 Mbps.

Because of the whole "digital download end products" thing, banding was still the most common issue with anime encodes back when everyone was doing 8-bit video, and people did a whole bunch of tricks to try to minimize it, like overlaying masked static grain on top of the video (a trick I used to use myself, and incidentally is something I've later seen used in professional BDs as well - though they seem to have forgotten to properly deband it first). These tricks worked to a degree, but usually came with a cost in picture quality (not everyone liked the look of the overlaid static grain, for example). Alternatively, the videos just had banding, and that was it.

Over the years, our video processing tools got increasingly sophisticated. Nowadays the most used debanding solutions all work in 16-bit, and you can do a whole bunch of other filtering in 16-bit too. Which is nice and all, but ultimately, you had to dither it down to 8-bit and encode it, at which point you ran into the issue of gradient preservation once again.

Enter 10-bit encoding: With the extra two bits per channel, encoding smooth gradients suddenly got a lot easier. You could pass the 16-bit debanded video to the encoder and get nice and smooth gradients at much lower bitrates than what you'd need to have smooth dithered gradients with 8-bit. With the increased precision, truncation errors are also reduced and compression efficiency is increased (despite the extra two bits), so ultimately, if you're encoding at the same bitrate and settings using 8-bit and 10-bit, the latter will give you smoother gradients and more detail, and you don't really need to do any kind of processing tricks to preserve gradients anymore. Which is pretty great!
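To see why two extra bits help with gradients, it is enough to count how many code values a shallow gradient can use. The gradient below is hypothetical (a 1920-pixel ramp from 20% to 25% of full scale, the kind of slow shading found in a flat sky or background), and the sketch ignores everything the encoder does afterwards.

```python
WIDTH = 1920
START, END = 0.20, 0.25  # hypothetical gradient, as fractions of full scale

def distinct_levels(bit_depth):
    """Count the code values a horizontal gradient uses at a given bit depth."""
    max_code = (1 << bit_depth) - 1
    codes = {
        round((START + (END - START) * x / (WIDTH - 1)) * max_code)
        for x in range(WIDTH)
    }
    return len(codes)

for depth in (8, 10):
    levels = distinct_levels(depth)
    print(f"{depth}-bit: {levels} levels, bands about {WIDTH / levels:.0f} px wide")
```

At 8 bits the ramp has only 14 values to work with, so each band is well over a hundred pixels wide. At 10 bits there are 52, and the steps are both narrower and a quarter of the height.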

Now, obviously most people don't have 10-bit screens or so, so dithering the video down to 8-bit is still required at some point. However, with 10-bit, this job is moved from the encoder to the end-user, which is a much nicer scenario, since you don't need to throw a ton of bitrate for preserving the dithering in the actual encode anymore. The end result is that the video looks like such an encode on an 8-bit (or lower) screen, but without the whole "ton of bitrate" actually being required.
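A sketch of what dithering at playback buys, under simplifying assumptions: the 10-bit to 8-bit conversion is treated as a plain divide by 4, and simple random dither stands in for whatever algorithm a real player uses. The function names are illustrative.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable

def truncate_to_8bit(code10):
    """Drop the two extra bits with no dithering."""
    return code10 >> 2

def dither_to_8bit(code10):
    """Add random noise below one 8-bit step before truncating."""
    return min(255, int(code10 / 4 + rng.random()))

code10 = 205     # 10-bit value that falls between two 8-bit steps (51.25)
pixels = 10_000  # a flat patch of identical pixels

truncated_mean = sum(truncate_to_8bit(code10) for _ in range(pixels)) / pixels
dithered_mean = sum(dither_to_8bit(code10) for _ in range(pixels)) / pixels
print(f"ideal: {code10 / 4}, truncated: {truncated_mean}, dithered: {dithered_mean:.2f}")
```

Truncation flattens the patch to 51, while the dithered patch mixes 51s and 52s so that its average stays close to the in-between value the 10-bit encode carried.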

So the bottom line is that even with 8-bit sources and 8-bit (or lower) consumer displays, 10-bit encoding provides notable benefits, especially for anime. And since anime encoders generally don't give a toss about hardware decoder compatibility (because hardware players are generally terrible with the advanced subtitles that fansubbers have used for a long time), there really was no reason not to switch.

Most consumer sources are also limited to 8-bit precision. 10-bit encodes from these are made using a fancy temporal debanding filter, which only works in certain cases.

Keep in mind that there is a range expansion when converting to monitor sRGB. Also several video players now support dithering. So the debanding effect can often be quite visible.
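One reading of the range expansion mentioned here, offered as an assumption rather than what the commenter necessarily meant, is the stretch from limited-range video luma (16-235) to full-range 0-255 for a monitor. The sketch below handles luma only and shows that stretching 220 codes over 256 leaves gaps.

```python
def limited_to_full(y):
    """Map 8-bit limited-range luma (16-235) to full range (0-255)."""
    return max(0, min(255, round((y - 16) * 255 / 219)))

outputs = [limited_to_full(y) for y in range(16, 236)]
unused = 256 - len(set(outputs))
print(f"{len(outputs)} input codes cover 0-255, leaving {unused} output codes unused")
```

Wherever an output code is skipped, neighbouring shades end up two steps apart on screen instead of one, which makes existing banding easier to see.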

Also, x264 benefits from less noise in its references, which means 10-bit gets a small compression performance bump.

Didn't know anime enthusiasts have an interest in 10-bit video, but I've always thought it's a great shame most consumer camera codecs are limited to 8 bits.

I have been waiting for h265 to surface in consumer cameras, and hopefully at 10 and 12 bit depths as options. Even if current monitors don't do 10 bits, there is so much information you can pull out of the extra data.

Many camera sensor chips can output 10 or 12 bits, it's a shame it doesn't get recorded on most cameras.

Hopefully Rec2020 on TVs and new Blu-ray formats will also push cameras.

Well, this was not the first time they adopted something early and left most of the world behind: widespread adoption of the ogm then the mkv containers, followed by the complete swap to h264 over XviD, and I believe the abandoning of "must fit either onto a CD or DVD exactly" ripping and encoding rule was also a first.
> I'm essentially certain they cannot view in 10-bit. Only workstation GPU's (Quadro and FirePro) output 10-bit (consumer GPU's are intentionally crippled to 8-bit)

That's not what NVidia says:

"NVIDIA Geforce graphics cards have offered 10-bit per color out to a full screen Direct X surface since the Geforce 200 series GPUs"

http://nvidia.custhelp.com/app/answers/detail/a_id/3011/~/10...

> and I can't really think of any monitors that have 10-bit panels under about $1000

$394

http://www.amazon.com/Crossover-27QW-DP-27-Inch-2560x1440/dp...

$450

http://www.amazon.com/Achieva-Shimian-QH2700-IPSMS-Backlight...

I can't find any details on the Crossover, but the Achieva is an 8bit panel with FRC to attain effective 10-bit. It's not true 10 and I wouldn't recommend using it in a professional setting (benchmark being a true 10bit panel with usually 12 bit processing such as Flanders Scientific http://www.flandersscientific.com/index/cm171.php).
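For readers unfamiliar with FRC (frame rate control), the basic idea is temporal dithering: the panel alternates between two adjacent 8-bit levels over a few frames so the average lands on the in-between 10-bit value. A minimal sketch of that idea follows; real controllers use more elaborate spatial and temporal patterns, and nothing here describes the specific monitors linked above.

```python
def frc_frames(code10, cycle=4):
    """8-bit values shown over one cycle of frames to approximate a 10-bit code."""
    base, remainder = divmod(code10, cycle)
    # Show the next level up on `remainder` of the frames, the base level otherwise.
    return [min(255, base + 1) if i < remainder else base for i in range(cycle)]

frames = frc_frames(514)
print(frames, "-> average", sum(frames) / len(frames))
```

For the 10-bit code 514 this shows 129 on two frames and 128 on the other two, averaging 128.5, which is why such panels are described as "effective" 10-bit rather than true 10-bit.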

And while NVIDIA does output 10bit on GeForce, it's limited to

> full screen Direct X surface

which means no OpenGL (a must with pro apps). I suppose it might be possible to use a DirectX decoder in your video player to output 10bit video but I haven't heard of anyone doing it or tried it myself.

All that said, I was originally talking about consumer facing 10bit, so the monitors are probably a valid point. As someone who cares about color reproduction, I hope to see more of that - especially as we move towards rec2020.

I believe most, if not all Radeon cards can do 10bit regardless of output type (OpenGL, DX, etc) with a registry hack.

https://www.techpowerup.com/forums/threads/do-10-bit-monitor...

iPhone 6 and 6 Plus use H.265 for FaceTime: https://www.apple.com/iphone-6/specs/ ("FaceTime over cellular uses H.264/H.265")
My Vizio p series handles it fine (4K netflix).
Came here to say "A little company called netflix uses it" but you beat me to it, good sir