by dylan604 989 days ago
It sounds like a misunderstanding of the MPEG concept. For an encode to be made efficiently, the encoder needs to see more than one frame of video at a time. Sure, I-frame-only encoding is possible, but it's not efficient and the result isn't really distributable. The encoder wants to see multiple frames at a time so that P and B frames can be used. Also, the way to get the best bang for the bandwidth buck is multipass encoding. You can't do that if all of the frames don't exist yet.
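To make the I-frame vs. P-frame point concrete, here is a toy sketch of my own, not how any real MPEG encoder works (real codecs use motion compensation, transforms and entropy coding). It only shows why seeing the previous frame lets you store far less data. The names `encode_delta`, `decode_delta` and `encode_sequence` are made up for illustration.

```python
from typing import List, Tuple

Frame = List[int]  # flat list of pixel values; a stand-in for a real picture
Delta = List[Tuple[int, int]]  # (pixel index, new value) pairs


def encode_delta(prev: Frame, cur: Frame) -> Delta:
    """Record only the pixels that changed since the previous frame."""
    return [(i, v) for i, (p, v) in enumerate(zip(prev, cur)) if p != v]


def decode_delta(prev: Frame, delta: Delta) -> Frame:
    """Rebuild a frame from the previous frame plus the recorded changes."""
    out = list(prev)
    for i, v in delta:
        out[i] = v
    return out


def encode_sequence(frames: List[Frame]) -> list:
    """Store the first frame whole ("I"), the rest as changes ("P")."""
    if not frames:
        return []
    encoded = [("I", list(frames[0]))]
    for prev, cur in zip(frames, frames[1:]):
        encoded.append(("P", encode_delta(prev, cur)))
    return encoded


frames = [[0] * 8, [0, 0, 5, 0, 0, 0, 0, 0], [0, 0, 5, 0, 0, 0, 0, 9]]
encoded = encode_sequence(frames)
```

In this toy, the second and third frames each shrink to a single changed pixel. A B frame would additionally look at a later frame, which is why the encoder wants frames that come after the one it is working on.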

You have to remember how old the technology you are trying to use is, and then consider the power of the computers available when it was made. MPEG-2 encoding used to require a dedicated expansion card because the CPUs didn't have decent instructions for the encoding. Now, that's all native to the CPU, which makes the code base archaic.

1 comments

No doubt that my limited understanding of these technologies came with some naive expectations of what's possible and how it should work.

Looking into it, and working through it, part of my experience was a lack of resources at the level of abstraction that I was trying to work in. It felt like I was missing something: on one end there are video editors that power billion-dollar industries, on the other end there is directly embedding ffmpeg libs into your project and doing things in a way that requires a full understanding of all the parts and how they fit together, and there is little to nothing in between.

Putting a glorified powerpoint in an mp4 to distribute doesn't feel to me like it is the kind of task where the prerequisite knowledge includes what the difference between yuv420 and yuv422 is or what Annex B or AVC are.

My initial expectation was that there has to be some in-between solution. Before I set out, what I had thought would happen is that I `npm install` some module and then just create frames with node-canvas, stream them into this lib and get an mp4 out the other end that I can send to disk or S3 as I please.* Worrying about the nitty-gritty details like how efficient it is, how many frames it buffers, or how optimized the output is, would come later.

Going through this whole thing, I now wonder how Instagram/TikTok/Telegram and co. handle the initial rendering of their video stories/reels, because I doubt it's anywhere close to the process I ended up with.

* That's roughly how my setup works now, just not in JS. I'm sure it could be another 10x faster at least, if done differently, but for now it works and lets me continue with what I was trying to do in the first place.

This sounds like "I don't know what a wheel is, but if I chisel this square to be more efficient it might work". Sometimes, it's better to not reinvent the wheel, but just use the wheel.

Pretty much everyone serving video uses DASH or HLS so that there are many versions of the encoding at different bit rates, frame sizes, and audio settings. The player determines if it can play the streams and keeps stepping down until it finds one it can use.
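As a rough illustration of the "keeps stepping down" behaviour, here is a minimal sketch assuming a made-up bitrate ladder. `Rendition` and `pick_rendition` are hypothetical names, and real DASH/HLS players read the ladder from a manifest and use more elaborate heuristics than this.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Rendition:
    name: str
    height: int        # frame height in pixels
    bitrate_kbps: int  # average bitrate of this encoding


def pick_rendition(
    ladder: List[Rendition], bandwidth_kbps: int, max_height: int
) -> Optional[Rendition]:
    """Pick the highest-bitrate rendition the player can actually use."""
    playable = [
        r for r in ladder
        if r.bitrate_kbps <= bandwidth_kbps and r.height <= max_height
    ]
    return max(playable, key=lambda r: r.bitrate_kbps, default=None)


# Hypothetical ladder; the numbers are illustrative only.
ladder = [
    Rendition("1080p", 1080, 5000),
    Rendition("720p", 720, 2800),
    Rendition("360p", 360, 800),
]
choice = pick_rendition(ladder, bandwidth_kbps=3000, max_height=1080)
```

With 3000 kbps available the 1080p stream is too heavy, so the sketch lands on the 720p one.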

Edit: >Putting a glorified powerpoint in an mp4 to distribute doesn't feel to me like it is the kind of task where the prerequisite knowledge includes what the difference between yuv420 and yuv422 is or what Annex B or AVC are.

This is the beauty of using mature software. You don't need to know this any more. Encoders can now set the profile/level and bit depth to what is appropriate. I don't have the charts memorized for when to use what profile at what level. In the early days, the decoders were so immature that you absolutely needed to know the decoder's abilities to ensure a compatible encode was made. Now, the decoder is so mature and is even native to the CPU, that the only limitation is bandwidth.

Of course, all of this is strictly talking about the video/audio. Most people are totally unaware that you can put programming inside of an MP4 container that allows for interaction similar to DVD menus to jump to different videos, select different audio tracks, etc.

> This sounds like "I don't know what a wheel is, but if I chisel this square to be more efficient it might work". Sometimes, it's better to not reinvent the wheel, but just use the wheel.

I'm not sure I can follow. This isn't specific to MP4 as far as I can tell. MP4 is what I cared about, because it's specific to my use case, but it wasn't the source of my woes. If my target had been a more adaptive or streaming friendly format, the problem would have still been to get there at all. Getting raw, code-generated bitmaps into the pipeline was the tricky part I did not find a straightforward solution for. As far as I am able to tell, settling on a different format would have left me in the exact same problem space in that regard.

The need to convert my raw bitmap from rgba to yuv420 among other things (and figuring that out first) was an implementation detail that came with the stack I chose. My surprise lies only in the fact that this was the best option I could come up with, and a simpler solution like I described (that isn't using ffmpeg-cli, manually or via spawning a process from code) wasn't readily available.
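For anyone hitting the same step, here is a minimal pure-Python sketch of what an rgba to yuv420 conversion involves. It is not the code from my setup. It assumes the common BT.601 limited-range integer approximation and a planar layout (all Y, then U, then V), and it averages each 2x2 block for chroma. `rgba_to_i420` is a name I made up, and a real pipeline would use a library such as libswscale for this.

```python
def rgba_to_i420(rgba: bytes, width: int, height: int) -> bytes:
    """Convert packed RGBA to planar YUV 4:2:0 (Y plane, then U, then V)."""
    if width % 2 or height % 2:
        raise ValueError("width and height must be even for 4:2:0")
    if len(rgba) != width * height * 4:
        raise ValueError("buffer size does not match width * height * 4")

    n = width * height
    y_plane = bytearray(n)
    u_full = [0] * n
    v_full = [0] * n
    for i in range(n):
        r, g, b = rgba[4 * i], rgba[4 * i + 1], rgba[4 * i + 2]  # alpha ignored
        # BT.601 limited-range integer approximation (assumed here).
        y_plane[i] = ((66 * r + 129 * g + 25 * b + 128) >> 8) + 16
        u_full[i] = ((-38 * r - 74 * g + 112 * b + 128) >> 8) + 128
        v_full[i] = ((112 * r - 94 * g - 18 * b + 128) >> 8) + 128

    # 4:2:0 keeps one U and one V sample per 2x2 block of pixels.
    u_plane = bytearray()
    v_plane = bytearray()
    for row in range(0, height, 2):
        for col in range(0, width, 2):
            block = (
                row * width + col,
                row * width + col + 1,
                (row + 1) * width + col,
                (row + 1) * width + col + 1,
            )
            u_plane.append((sum(u_full[k] for k in block) + 2) // 4)
            v_plane.append((sum(v_full[k] for k in block) + 2) // 4)

    return bytes(y_plane + u_plane + v_plane)


white_2x2 = bytes([255, 255, 255, 255]) * 4
i420 = rgba_to_i420(white_2x2, 2, 2)
```

The output is 1.5 bytes per pixel instead of 4. As I understand it, yuv422 differs in that it only halves the chroma horizontally, so it keeps twice as many U and V samples as yuv420.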

> You don't need to know this any more.

To get to the point where an encoder could take over, pick a profile, and take care of the rest was the tricky part that required me to learn what these terms meant in the first place. If you have any suggestions of how I could have gone about this in a simpler way, I would be more than happy to learn more.

Using ffmpeg as the example, you can put -f in front of -i to describe what the incoming format is, so that your homebrew exporter can write to stdout and pipe into ffmpeg, which reads from stdin with '-i -'. More specifically, something like '-f bmp_pipe -i -' would expect the incoming data stream to be a sequence of BMP images. You can select any format your build supports; 'ffmpeg -formats' lists the formats and 'ffmpeg -codecs' lists the codecs.
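A rough sketch of that piping approach from code, under some assumptions: it sends raw RGBA frames with '-f rawvideo' rather than BMP, so the size and pixel format have to be given explicitly; ffmpeg is on the PATH; and the build includes libx264. `build_ffmpeg_cmd`, `solid_frame` and `encode` are made-up helper names, not part of any library.

```python
import subprocess
from typing import Iterable, List, Tuple


def build_ffmpeg_cmd(width: int, height: int, fps: int, out_path: str) -> List[str]:
    """Command line for reading raw RGBA frames from stdin and writing H.264."""
    return [
        "ffmpeg", "-y",
        # Input options go before -i and describe the incoming stream.
        "-f", "rawvideo",
        "-pix_fmt", "rgba",
        "-s", f"{width}x{height}",
        "-r", str(fps),
        "-i", "-",  # read from stdin
        # Output options: assumes this ffmpeg build has libx264.
        "-c:v", "libx264",
        "-pix_fmt", "yuv420p",
        out_path,
    ]


def solid_frame(width: int, height: int, rgba: Tuple[int, int, int, int]) -> bytes:
    """One frame filled with a single colour, as packed RGBA bytes."""
    return bytes(rgba) * (width * height)


def encode(frames: Iterable[bytes], width: int, height: int, fps: int, out_path: str) -> int:
    """Stream frames into ffmpeg's stdin and return its exit code."""
    proc = subprocess.Popen(
        build_ffmpeg_cmd(width, height, fps, out_path), stdin=subprocess.PIPE
    )
    assert proc.stdin is not None
    for frame in frames:
        proc.stdin.write(frame)
    proc.stdin.close()  # signals end of input so ffmpeg can finish the file
    return proc.wait()


cmd = build_ffmpeg_cmd(640, 360, 30, "out.mp4")
```

Calling `encode((solid_frame(640, 360, (255, 0, 0, 255)) for _ in range(60)), 640, 360, 30, "out.mp4")` should then produce two seconds of solid red, with ffmpeg doing the rgba to yuv420p conversion itself. This still spawns the ffmpeg CLI, so it is the "use the wheel" route rather than the embedded library route.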