Hacker News
by mustache_kimono 800 days ago
Author here. As surprised as anyone that this is on the front page of HN.

> I don’t understand the use case. You go through the trouble of generating checksums when copying videos, but don’t want to regenerate the checksums when modifying the metadata?

Appreciate the Q, but I suppose I really don't understand it. Could be the hour?

I don't want to regenerate checksums once I know the underlying bitstream checksums are correct. I want to know the audio/video/whatever is the same as the day I received it, and I want to perform the exact same check to confirm. If I change the metadata and then have to regenerate the checksum, I no longer know that.
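
To make the distinction concrete, here is a toy sketch. It is not how any real container or tool lays out its data: `MediaFile`, its `serialize` layout, and both checksum helpers are hypothetical names made up for illustration. It only shows that a hash over the payload alone survives retagging, while a hash over the whole file does not.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class MediaFile:
    """Hypothetical stand-in for a media file: tags plus an opaque payload."""
    bitstream: bytes
    metadata: dict = field(default_factory=dict)

    def serialize(self) -> bytes:
        # Toy on-disk layout (an assumption): JSON tags, a NUL byte, then the payload.
        header = json.dumps(self.metadata, sort_keys=True).encode("utf-8")
        return header + b"\x00" + self.bitstream


def whole_file_checksum(f: MediaFile) -> str:
    # Covers tags and payload, so any metadata edit changes it.
    return hashlib.sha256(f.serialize()).hexdigest()


def bitstream_checksum(f: MediaFile) -> str:
    # Covers the payload only, so metadata edits leave it untouched.
    return hashlib.sha256(f.bitstream).hexdigest()


original = MediaFile(b"\x01\x02\x03 pretend audio frames", {"title": "Untitled"})
retagged = MediaFile(original.bitstream, {"title": "Track 1", "album": "Demo"})

file_changed = whole_file_checksum(original) != whole_file_checksum(retagged)
stream_same = bitstream_checksum(original) == bitstream_checksum(retagged)
print(file_changed, stream_same)
```

Retagging changes the whole-file hash but leaves the bitstream hash alone, so the same check can be repeated after every metadata edit.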

> If you are this concerned about data corruption why not check the metadata also?

One should of course. Please use ZFS, etc. There is perhaps no greater ZFS fan than me. See: https://github.com/kimono-koans/httm

But now imagine rewriting a stream to a different container. For instance, MP4 to MKV, or ALAC to FLAC. Wouldn't it be nice to know the bitstreams are the same?
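
One way to sketch that check, assuming `ffmpeg` is installed and on `PATH`: its `hash` muxer prints a digest of the packets it is handed, and with `-c copy` those are the stored packets, not re-encoded ones. The helper names below are hypothetical, and this is not necessarily how the tool under discussion works. It should cover a pure container change such as MP4 to MKV. For a codec change such as ALAC to FLAC the compressed packets differ, so you would have to compare the decoded audio instead.

```python
import subprocess


def stream_hash_command(path: str, stream_spec: str = "0:a",
                        algorithm: str = "sha256") -> list[str]:
    """Build an ffmpeg command that hashes the selected streams without re-encoding."""
    return [
        "ffmpeg", "-v", "error",            # log errors only, keep stdout clean
        "-i", path,
        "-map", stream_spec,                # default: audio streams of the first input
        "-c", "copy",                       # pass packets through untouched
        "-f", "hash", "-hash", algorithm,   # ffmpeg's hash muxer
        "-",                                # write the digest line to stdout
    ]


def stream_hash(path: str, stream_spec: str = "0:a") -> str:
    """Run ffmpeg and return its digest line, e.g. 'SHA256=<hex>'."""
    result = subprocess.run(
        stream_hash_command(path, stream_spec),
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def same_streams(a: str, b: str, stream_spec: str = "0:a") -> bool:
    """True when both files yield the same digest for the selected streams."""
    return stream_hash(a, stream_spec) == stream_hash(b, stream_spec)
```

Something like `same_streams("in.mp4", "out.mkv")` (placeholder file names) would then answer the question, and passing `"0:v"` as `stream_spec` would check video in place of audio.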

2 comments

I hope it was a pleasant surprise. I came to this from a data archivist's perspective. I can't believe that, in the media codec space, only FLAC had the foresight to checksum large binary data.

I notice that LLM releases will include md5/sha256 for the binary data, while excluding the json metadata. I really wanted MKV to have this functionality.

> I hope it was a pleasant surprise

Of course, very pleasant!

Is the idea that there's some inherent mistrust of `-c copy` or that sometimes downstream options affect it basically invalidating it?

Edit: I see the metadata benefit in the README, just curious if there's some additionally pessimistic perspective.

> Is the idea that there's some inherent mistrust of `-c copy` or that sometimes downstream options affect it basically invalidating it?

Yes, that's one reason.

I suppose the main mistrust along those lines is -- I have all these programs which manipulate my media metadata and sometimes change the names or locations of my media files. And I'm basically fine with lots of small automated changes to my metadata from programs like `beets`. I'd just like some assurance that whatever they spit out is what I started with.

With respect to metadata more specifically, if someone cleans up the metadata on an album or adds additional information, or album art, this shouldn't invalidate any checksum.

Network transfers of media could certainly benefit from this. If I send an ALAC album to someone, and they open it 3 months later, they should be able to know that what I sent is what they are listening to, even after they have retagged it.