by shazmosushi 1931 days ago
I love FFMPEG, but it has truly awful handling of timestamps by default. You can't easily extract a clip using an exact timestamp because it rounds to the nearest keyframe, which may be many seconds earlier. It's such a powerful command-line tool, but I find the user interface far more difficult than it needs to be. In "git" terminology, there's so much plumbing but not enough porcelain.

I wish there was a scriptable command-line interface for HandBrake (which is already based on FFMPEG) [1], where the user just provides high-level commands. 99% of the time that is all I want: extract a clip from this timestamp, include SRT subtitles, crop to this geometry, and shift the audio by 3 seconds.

[1] https://en.wikipedia.org/wiki/HandBrake

4 comments

I use ffmpeg for encoding audio and extracting metadata. It's a very powerful and really sophisticated tool, but I agree with you about the timestamp handling. Some caveats that I came across along the way:

- Inaccurate time handling (https://github.com/yermak/AudioBookConverter/issues/21#issue...)

- Incorrect handling of mp3 chapters https://github.com/sandreas/m4b-tool/issues/71#issuecomment-...

If anyone is interested in using ffmpeg in a Docker container (without installing dependencies or compiling anything), this alias is pretty useful (it works with relative paths ;-):

  alias ffmpeg='docker run --rm -u $(id -u):$(id -g) -v "$PWD:$PWD" -w "$PWD" mwader/static-ffmpeg:4.3.2'

ffmpeg seeks accurately when transcoding. [1] Cutting on non-keyframes when stream copying results in broken video until the next keyframe.

HandBrake does have a CLI. [2] I haven't used it and I'm not sure what advantage it might have over ffmpeg. I personally use mkvmerge or ffmpeg for my muxing/cutting and VapourSynth for encoding.

[1] https://trac.ffmpeg.org/wiki/Seeking

[2] https://handbrake.fr/docs/en/latest/cli/cli-options.html

Yeah, it is literally not possible to avoid seeking to a keyframe if you are stream copying.

There is, however, plenty of great software that can cut at any frame and only re-encode the frames that fall outside a whole GOP. Most of it is commercial though; I haven't found one that is free and good.

----

Also, seeking in FFMPEG is, in practice, more complicated than the guide [1] you linked suggests. Below is a note I keep for my own reference on keyframe-copy. Hope someone will find it useful.

How to keyframe-cut video properly with FFMPEG

FFMPEG supports "input seeking" and "output seeking". Output seeking is very slow (it needs to decode the whole video up to the timestamp of your -ss), so you want to avoid it when it isn't necessary.

However, while -ss (seek start) works fine with input seeking, "-to/-t" (seek ending) is somehow vastly inaccurate with input seeking in FFMPEG. It can be off by a few seconds, or sometimes it straight up does not work (for some MPEG-TS files recorded from TV).

The best of the two worlds is to use input seeking for -ss and then output seeking for -to. However, this way, the timestamp will restart from 0 in output seeking. So instead of using -to, you should calculate -t (duration) yourself by subtracting -ss from -to, and use `-t duration` instead. Below is a quick Python script to do so.

https://gist.github.com/fireattack/9a100c5a200154937babd1823...

(You can also try using -copyts to keep the timestamps, but I don't recommend it because it doesn't work if the video file has a non-zero start time.)
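
A minimal sketch of the idea in this note, not the linked gist itself: put -ss before -i (input seeking), compute the duration from the two timestamps, and put -t after -i. The helper names `parse_timestamp` and `build_cut_command` and the file names are hypothetical, and the command is only built and printed, not run.

```python
import shlex


def parse_timestamp(ts: str) -> float:
    """Convert 'SS', 'MM:SS' or 'HH:MM:SS(.fraction)' to seconds."""
    seconds = 0.0
    for part in ts.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def build_cut_command(src: str, start: str, end: str, dst: str) -> list:
    """Build an ffmpeg stream-copy cut: -ss before -i, -t (duration) after -i."""
    duration = parse_timestamp(end) - parse_timestamp(start)
    if duration <= 0:
        raise ValueError("end must be later than start")
    return [
        "ffmpeg",
        "-ss", start,              # before -i: input seeking for the start point
        "-i", src,
        "-t", f"{duration:.3f}",   # after -i: duration applied on the output side
        "-c", "copy",              # stream copy, so cuts still land on keyframes
        dst,
    ]


# Hypothetical file names, for illustration only.
cmd = build_cut_command("t.ts", "1:00", "1:05", "out.ts")
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` would execute it; `shlex.join` needs Python 3.8 or later.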

>The best of the two worlds is to use input seeking for -ss and then output seeking for -to.

Could you clarify what you mean by "use output seeking for -to"?

From your Python script it seems that you're just using input seeking and then specifying the duration in seconds with `-t`, which is actually the same as using `-to` when doing input seeking.

Also, input seeking should be inaccurate when doing stream copy, so I'm not sure your script actually works as expected?

(And unless I'm missing something, it seems that all of this is well-explained in the ffmpeg guide linked above.)

Thanks!

I think I know where you're confused: from the guide it looks like there is only a difference where you put -ss; but in reality, where you put -t/-to matters too.

In my Python script, I did input seeking for the -ss (start point) part, and then output seeking for the -t (end point) part. As you can see, the -ss part is before -i {inputfile}, and -t is after.

-ss 1:00 -i file -t 5

is NOT the same as

-ss 1:00 -t 5 -i file.

The latter has a bug that happens frequently when I'm trimming MPEG-TS files recorded from HDTV. It literally doesn't stop at the -t/-to timestamp, for reasons I don't know. And it only happens when stream copying.

Below is a quick showcase: t.ts is the source, and the filenames show how I generate them with FFMPEG (for example, ss_t_i means input seeking -ss first, then -i t.ts, then -t 1:00).

https://i.imgur.com/lLUSzEM.png

As you can see, if I use -t/-to before -i, it doesn't cut the file properly.

>Also, input seeking should be inaccurate when doing stream copy

Yeah, it's not frame accurate, can only cut at keyframes, but enough for my application. By the way the same inaccuracy exists for output seeking if you're doing stream copy.

Edit: I just reported the bug to the ffmpeg tracker: https://trac.ffmpeg.org/ticket/9141

I battled with this a lot with https://github.com/umaar/video-everyday and still haven't found a better solution.

What I don't understand is, how can professional video editing tools trim accurately (and very quickly)? What are they doing differently to ffmpeg?

If I do things the "fast way" with ffmpeg, the exported video has random black frames, which I think is related to the keyframe issue you mention. If I do things the "slow way" (i.e. accurately) with ffmpeg, it takes a huge amount of time (at least with large 4k videos). But I don't understand how I can drop that same 4k video into Screenflow, trim 1 second out of it and export it in a matter of seconds.

All of the proprietary tools I know of for doing frame-perfect cuts (VideoRedo, TMPGEnc, SolveigMM) work by determining (guessing?) the original encoding parameters and then only reencoding the first and last GOP. The rest of the video is just remuxed.
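
A rough sketch of that "re-encode only the edges, remux the rest" idea, under simplifying assumptions: the keyframe timestamps are already known and sorted, and a stream copy can start and stop exactly on them (open GOPs and B-frame reordering are ignored). `plan_smart_cut` is a hypothetical helper, not how any of the tools named above are actually implemented.

```python
from bisect import bisect_left, bisect_right


def plan_smart_cut(keyframes, start, end):
    """Split [start, end] into ('reencode' | 'copy', seg_start, seg_end) parts.

    keyframes: sorted keyframe timestamps in seconds (assumed known).
    """
    i = bisect_left(keyframes, start)      # first keyframe at or after start
    j = bisect_right(keyframes, end) - 1   # last keyframe at or before end
    if i >= len(keyframes) or j < 0 or keyframes[i] >= keyframes[j]:
        # No complete GOP lies inside the range: everything must be re-encoded.
        return [("reencode", start, end)]
    first_kf, last_kf = keyframes[i], keyframes[j]
    plan = []
    if start < first_kf:
        plan.append(("reencode", start, first_kf))  # partial GOP at the head
    plan.append(("copy", first_kf, last_kf))        # whole GOPs: remux only
    if last_kf < end:
        plan.append(("reencode", last_kf, end))     # partial GOP at the tail
    return plan


# Made-up keyframes every 2 seconds, cutting from 3.0 s to 9.5 s.
for part in plan_smart_cut([0, 2, 4, 6, 8, 10], 3.0, 9.5):
    print(part)
```

For long clips almost all of the duration falls into the "copy" part, which would explain why such tools can export in seconds.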

x264-encoded streams have the original encoding parameters included by default.

The encoding settings metadata tag can be stripped.

Regardless, I don't think these programs are "matching" anything. TMPGEnc, for example, has settings to choose what quality you want for these re-encoded frames.

The parameters being matched would be those that maintain decoder config. Usually, bitrate/quantizer values don't come into that.

If you have sufficient scratch disk space, you can absolutely use ffmpeg to take an h264, h265, vp8 or vp9 file, or just about anything else, as input and write it out as an uncompressed y4m file or a raw yuv420p or yuv422 file. From there you can use just about any industry-standard commercial or free GUI-based video editing tool (kdenlive, etc.) to extract a clip, down to per-frame precision.
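
To get a feel for what "sufficient scratch disk space" means, here is a back-of-the-envelope estimate for 8-bit yuv420p, which stores 1.5 bytes per pixel (a full-resolution luma plane plus two quarter-resolution chroma planes). `raw_yuv420p_bytes` is a hypothetical helper and the numbers are an illustration; the small per-frame y4m headers are ignored.

```python
def raw_yuv420p_bytes(width: int, height: int, fps: float, seconds: float) -> int:
    """Approximate size in bytes of uncompressed 8-bit yuv420p video."""
    bytes_per_frame = width * height * 3 // 2   # 1.5 bytes per pixel
    frame_count = round(fps * seconds)
    return bytes_per_frame * frame_count


# One minute of 4k (3840x2160) at 30 fps.
size = raw_yuv420p_bytes(3840, 2160, 30, 60)
print(f"{size / 1e9:.1f} GB")  # 22.4 GB
```
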
But why is that multi-step process even necessary?

There should be a quick command-line utility to concatenate multiple video files according to exactly the timestamps the user has provided. It's such a common operation.

There's no reason that the tool can't simply do a streaming decode of multiple different file formats and concatenate the video with sub-second precision. If the input video resolutions are different, scaling the smaller video to the largest resolution is what the user almost always wants.

I get that FFMPEG is a "plumbing" CLI tool, but a "porcelain" wrapper would be amazing!

I understand that you want to do that, but any attempt to do so will be decidedly non-optimal due to how keyframes and lossy encoders work, even if your two files were encoded with x265 at exactly the same bitrate.

It's a much more complicated problem than it appears at first glance, once you really dig into the command line options and encoding parameters of codecs like x264, x265 and vp9.

It's not as simple as concatenating two files together. You can also select down to per-frame precision by loading different x264, x265, vp8 or vp9 files into kdenlive and cutting and editing them together. You will then need to re-encode the resulting output. kdenlive is ultimately a nice GUI front end on top of this:

https://www.mltframework.org/

When I ran into this, it came down to whether I wanted to splice two videos together or re-encode them.

Due to how keyframes work, cutting on keyframe boundaries is a lot faster and easier and doesn't require re-encoding in many cases. This is the default for the segment muxer.

Cutting between keyframes is a fair bit more effort, and requires re-encoding, which is why I guess it's not the default.
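
As a small illustration of the trade-off, this sketch shows how far a keyframe-boundary cut can drift from the time you asked for. It assumes you already have a sorted list of keyframe timestamps and that the cut lands on the last keyframe at or before the requested time; `snap_to_keyframe` is a hypothetical helper and the timestamps are made up.

```python
from bisect import bisect_right


def snap_to_keyframe(keyframes, requested):
    """Return (cut_time, drift) for the last keyframe at or before `requested`."""
    i = bisect_right(keyframes, requested) - 1
    if i < 0:
        raise ValueError("requested time is before the first keyframe")
    cut_time = keyframes[i]
    return cut_time, requested - cut_time


# Made-up stream with a keyframe every 10 seconds.
keyframes = [0.0, 10.0, 20.0, 30.0]
print(snap_to_keyframe(keyframes, 27.5))  # (20.0, 7.5)
```

With long keyframe intervals the drift can be many seconds, which is the price of skipping the re-encode.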

I'm not a video expert, but one thing I've noticed is that there seems to be a lot of discrepancies between video files and the way programs use them.

I think it might be differences between how the container describes the video and the video itself, and which one is chosen as the truth during operations.

The "video itself" doesn't really exist. If you're on Windows and use MPC, you can test this yourself: use WMV9 as a renderer, take a PNG screenshot, then switch to NVENC, and take a screenshot at the same timestamp. You'll notice that on most videos, the screenshots are not the same (with NVENC introducing macroblocking in dark areas, where WMV9 renders the gradients properly). Using the rendered video as a source of truth instead of the source material itself would be as big a mistake as using Photoshop on a JPEG.