by phoboslab 1226 days ago
Curiosity mostly. As stated in the article, Opus is excellent and better than QOA in every way except in complexity (and as a result, performance). A possible application for QOA is games, where you need to play dozens of audio files immediately.

I haven't done any formal benchmarks, but with a simple `time` on the command line QOA encodes 10x faster and decodes 7x faster than Opus.

QOA should be quite suitable for SIMD optimizations, which would improve performance even more. Still on my todo list.

3 comments

I'd be curious to know if SIMD cares about the big-endian design decision. You can always do SIMD shuffles near where you load and store the bytes, but it'd be simpler and probably faster if you don't.

Similarly, BSWAP / MOVBE might be cheap or free on x86_64 but IIUC RISC-V doesn't guarantee a dedicated instruction for that (and RISC-V is little-endian). "Does [endianness] really matter?" It might, for embedded devices a few years from now. I can't really say without real hardware to get real CPU profiles. But that question is entirely avoidable by just picking little-endian.
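
To make the byte-order point concrete, here is a minimal sketch in C of portable 64-bit loads in both byte orders. The function names `load_u64_be` and `load_u64_le` are hypothetical and are not taken from the QOA source. On a little-endian CPU a compiler may reduce the little-endian version to a plain load, while the big-endian version needs a byte swap on top, either as a dedicated instruction or as a sequence of shifts and ORs.

```c
#include <stdint.h>

/* Big-endian load: the first byte in memory is the most significant.
   Written byte by byte so the result does not depend on host endianness. */
static uint64_t load_u64_be(const unsigned char *p) {
    uint64_t v = 0;
    for (int i = 0; i < 8; i++) {
        v = (v << 8) | p[i];
    }
    return v;
}

/* Little-endian load: the first byte in memory is the least significant. */
static uint64_t load_u64_le(const unsigned char *p) {
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--) {
        v = (v << 8) | p[i];
    }
    return v;
}
```

Given the bytes `01 23 45 67 89 AB CD EF`, `load_u64_be` returns `0x0123456789ABCDEF` and `load_u64_le` returns `0xEFCDAB8967452301`. Whether the extra swap is measurable is exactly the open question above; it would need profiling on real hardware.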

Encoder speed in Opus is configurable. If you're comparing performance against something less space-efficient, you should probably set the encoder to its fastest setting. :)

Though of course, it's no shock that something simple could be fast. There are also lossless codecs which are much faster to decode than Opus.

> (and as a result, performance)

Citation, please.

Opus (at that time it was called CELT) does have more resource requirements in terms of memory than an MP3 decoder. However, I have run the decoder on things as small as a 33MHz ARM7 and still had lots of CPU left over. An MP3 decoder had no hope on that system.

> (and as a result, performance)

> Citation, please

The parent did provide some data. Admittedly very simple.

Could you try a different simple benchmark that shows the opposite?

That would be interesting.

Opus can encode in real time, at 10ms per packet and 48kHz, on a 66MHz MIPS32 chip. Decode is even faster and can be done on a 33MHz ARM7 with CPU left over. Decode of Opus (née CELT) is sufficiently fast that you can unpack it in real time in an audio thread callback on Android.

QOA is citing about 300 kilobits per second, which is roughly 37 kilobytes per second. That is too much data for a 33MHz ARM7 to process, let alone do anything with.
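
The unit conversions behind these figures are easy to check. A minimal sketch in C, where the function names are hypothetical and chosen only for illustration:

```c
#include <stdint.h>

/* Samples per channel in one packet of the given duration. */
static uint32_t samples_per_packet(uint32_t sample_rate_hz, uint32_t packet_ms) {
    return (uint32_t)(((uint64_t)sample_rate_hz * packet_ms) / 1000u);
}

/* Convert a bitrate in bits per second to bytes per second (8 bits per byte). */
static uint32_t bytes_per_second(uint32_t bits_per_second) {
    return bits_per_second / 8u;
}
```

With these, a 10ms packet at 48kHz is `samples_per_packet(48000, 10)` = 480 samples, and 300 kilobits per second is `bytes_per_second(300000)` = 37500 bytes per second.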

The reason for "The Triangle of Neglect" is that your chips are either under 100MHz (often significantly, as you are on bare metal), where this is too much data, or above 1000MHz (you are running Linux), where nobody cares.

ADPCM was more useful back when chips didn't have hardware multipliers.