I don’t understand the use case. You go through the trouble of generating checksums when copying videos, but don’t want to regenerate the checksums when modifying the metadata? If you are this concerned about data corruption why not check the metadata also?
Author here. As surprised as anyone that this is on the front page of HN.
> I don’t understand the use case. You go through the trouble of generating checksums when copying videos, but don’t want to regenerate the checksums when modifying the metadata?
Appreciate the Q, but I suppose I really don't understand it. Could be the hour?
I don't want to regenerate checksums once I know the underlying bitstream checksums are correct. I want to know the audio/video/whatever is the same as the day I received it, and I want to perform the exact same check to confirm. If changing the metadata forces me to regenerate a checksum, I lose that assurance.
> If you are this concerned about data corruption why not check the metadata also?
But now imagine rewriting a stream to a different container. For instance, MP4 to MKV, or ALAC to FLAC. Wouldn't it be nice to know the bitstreams are the same?
I hope it was a pleasant surprise. I found this from a data archivist's perspective. I can't believe that only FLAC had the foresight to checksum large binary data in the media codec space.
I notice that LLM releases will include md5/sha256 for the binary data, while excluding the json metadata. I really wanted MKV to have this functionality.
> Is the idea that there's some inherent mistrust of `-c copy` or that sometimes downstream options affect it basically invalidating it?
Yes, that's one reason.
I suppose the main mistrust along those lines is -- I have all these programs which manipulate my media metadata and sometimes change the names or locations of my media files. And I'm basically fine with lots of small automated changes to my metadata from programs like `beets`. I'd just like some assurance that whatever they spit out is what I started with.
With respect to metadata more specifically, if someone cleans up the metadata on an album or adds additional information, or album art, this shouldn't invalidate any checksum.
Network transfers of media could certainly benefit from this. If I send an ALAC album to someone, and they open it 3 months later, they should be able to know that what I sent is what they are listening to, even after they retagged it.
You want to be able to change the container while making sure you do not alter the contained stream.
I've always thought it would be simpler if we used different files for the stream and the metadata, but that's probably just because I never looked more closely into it.
From this perspective it may be, but now you have multiple files you need to keep track of, and it’s not clear if one is missing, depending on the underlying stream structure (e.g. multiple audio or video streams instead of just 1 of each).
This uses the hash muxer in ffmpeg, which consolidates all streams into one. Use the streamhash muxer to emit hashes per-stream, which can isolate any changes to specific streams.
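To make the difference between the two muxers concrete, here is a minimal Python sketch. It assumes `ffmpeg` is on the PATH and that the `streamhash` muxer prints one `index,type,ALGO=hex` line per stream. The function names and the exact flag set are my own illustration, not necessarily what dano runs.

```python
import subprocess

def hash_command(path, per_stream=False, algo="sha256"):
    """Build an ffmpeg command that hashes packets without re-encoding.

    per_stream=False uses the `hash` muxer (one digest for everything);
    per_stream=True uses the `streamhash` muxer (one digest per stream).
    """
    muxer = "streamhash" if per_stream else "hash"
    return [
        "ffmpeg", "-v", "error", "-i", path,
        "-map", "0:v?", "-map", "0:a?",  # video and audio streams, if present
        "-c", "copy",                    # hash the stored packets as-is
        "-f", muxer, "-hash", algo, "-", # write the digest(s) to stdout
    ]

def parse_streamhash(output):
    """Parse 'index,type,ALGO=hex' lines into {(index, type): hex}."""
    hashes = {}
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        index, kind, digest = line.split(",", 2)
        hashes[(int(index), kind)] = digest.split("=", 1)[1]
    return hashes

def changed_streams(before, after):
    """Return the stream keys whose digest differs or is missing on one side."""
    keys = sorted(set(before) | set(after))
    return [k for k in keys if before.get(k) != after.get(k)]

def stream_hashes(path):
    """Run ffmpeg and return per-stream digests for `path`."""
    result = subprocess.run(
        hash_command(path, per_stream=True),
        capture_output=True, text=True, check=True,
    )
    return parse_streamhash(result.stdout)
```

With per-stream digests recorded before and after a retag or remux, `changed_streams(old, new)` narrows any mismatch down to a specific stream, which the single consolidated hash cannot do.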
--only=<ONLY>
hash an input file container's first audio or video stream only, if available. dano will fall back to default behavior if no such stream is available. [possible values: audio, video]
No, 16-bit PCM is the default audio codec. If no `-c` is specified for a stream, ffmpeg will encode using the default codec. But if `-c X` is declared where X=`copy` or something else, then that is honored.
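As a rough illustration of that point, the sketch below builds the two variants of an ffmpeg `hash` invocation. The helper name `ffmpeg_hash_args` is hypothetical and the flags are my assumption about a typical invocation. With `-c copy` the digest covers the stored packets; without it, ffmpeg decodes and re-encodes to the muxer's default codecs (16-bit PCM for audio, per the comment above), so the digest covers decoded samples instead.

```python
def ffmpeg_hash_args(path, copy=True):
    """Build arguments for `ffmpeg -f hash`, with or without stream copy.

    copy=True  -> adds `-c copy`, so packets are hashed without decoding.
    copy=False -> no `-c`, so ffmpeg falls back to the muxer's default codecs.
    """
    args = ["ffmpeg", "-v", "error", "-i", path]
    if copy:
        args += ["-c", "copy"]
    args += ["-f", "hash", "-"]  # print the digest to stdout
    return args

if __name__ == "__main__":
    # Print both variants side by side; nothing is executed here.
    print(" ".join(ffmpeg_hash_args("album.flac", copy=True)))
    print(" ".join(ffmpeg_hash_args("album.flac", copy=False)))
```

The two digests answer different questions: the copy variant tells you the encoded bitstream is byte-identical, while the decoded variant tells you the audio decodes to the same samples.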