by nonameiguess 1887 days ago
File types have specifications, and at least some of them can compose. One of the more bizarre things I ever had to do for work was make a general-purpose file comparison tool that could ignore certain classes of expected differences, in order to validate program outputs against regression test keys. In one case, xml files had base64 encoded pngs embedded in them, and the pngs had metadata chunks with xml embedded.
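For a rough idea of what that nesting looks like, here is a minimal Python sketch, not the tool I actually wrote. It builds a 1x1 png with a tEXt metadata chunk, wraps it in xml, then walks back down to the innermost xml. The element names (`report`, `image`, `note`), the `meta` keyword, and the choice of a tEXt chunk are all made up for illustration.

```python
import base64
import struct
import zlib
import xml.etree.ElementTree as ET

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_chunk(chunk_type: bytes, data: bytes) -> bytes:
    """Serialize one png chunk: length, type, data, CRC over type + data."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def build_png(inner_xml: str) -> bytes:
    """Build a 1x1 8-bit grayscale png carrying inner_xml in a tEXt chunk."""
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
    # tEXt layout: keyword, null separator, Latin-1 text ("meta" is hypothetical).
    text = b"meta\x00" + inner_xml.encode("latin-1")
    idat = zlib.compress(b"\x00\x00")  # one filter byte plus one pixel
    return (
        PNG_SIGNATURE
        + png_chunk(b"IHDR", ihdr)
        + png_chunk(b"tEXt", text)
        + png_chunk(b"IDAT", idat)
        + png_chunk(b"IEND", b"")
    )


def iter_chunks(png: bytes):
    """Yield (type, data) for each chunk; CRCs are not verified here."""
    if not png.startswith(PNG_SIGNATURE):
        raise ValueError("not a png")
    pos = len(PNG_SIGNATURE)
    while pos < len(png):
        (length,) = struct.unpack(">I", png[pos:pos + 4])
        yield png[pos + 4:pos + 8], png[pos + 8:pos + 8 + length]
        pos += 12 + length  # 4 length + 4 type + data + 4 CRC


def innermost_xml(outer_xml: str) -> ET.Element:
    """xml -> base64 -> png -> tEXt chunk -> xml."""
    root = ET.fromstring(outer_xml)
    png = base64.b64decode(root.find("image").text)
    for chunk_type, data in iter_chunks(png):
        if chunk_type == b"tEXt":
            _keyword, _, text = data.partition(b"\x00")
            return ET.fromstring(text.decode("latin-1"))
    raise ValueError("no tEXt chunk found")


inner = "<note author='test'>generated</note>"
encoded_png = base64.b64encode(build_png(inner)).decode("ascii")
outer = "<report><image>{}</image></report>".format(encoded_png)
element = innermost_xml(outer)
```

A comparison tool then has to decide at each layer which differences matter, which is where it gets tedious.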

Another interesting case: the NGA is required to keep an archive of every geointelligence product created in the US for X years, and the library only supports the nitf file format. This format specifies that images are stored as block segments that can compose either as tiles or as overlays, or in some cases can be non-viewable pixel data (i.e. something that can be used for downstream machine processing but isn't meant to be meaningful to humans; if you loaded it into a viewer it would look like nothing. Consider something like a map of dead pixels).

One consequence of this is that when we want to disseminate something more lightweight to customers ordering imagery products, like a simple jpeg so they don't need a nitf viewer to see it, we make the jpeg and embed it as a nitf image segment, with metadata indicating that an actual nitf viewing program should ignore it.

There are Python build tools that get around the fact that their config isn't compatible with toml out of the box by adding a key value pair to pyproject.toml, where the key is "legacy-ini" and the value is a string enclosing the tool's entire ini config. Or take secrets in Kubernetes, which specify key value pairs where the value is always a base64 encoded string, allowing you to store anything at all as a secret, including another yaml file, even a Kubernetes manifest. We presently use this where I'm at to get the Anchore Enterprise license into the cluster, and the license is in yaml format. Base64 encoded yaml embedded into another yaml file.
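A minimal Python sketch of the yaml-in-yaml case, using only the standard library. The license fields, the secret name `example-license`, and the key `license.yaml` are placeholders I made up, not the real Anchore layout, and the extraction is a naive line scan rather than a real yaml parser.

```python
import base64

# Hypothetical inner document; the real license's fields are not shown here.
inner_yaml = "license:\n  owner: example\n  seats: 5\n"

# Secret values under `data` are base64 encoded strings.
encoded = base64.b64encode(inner_yaml.encode("utf-8")).decode("ascii")

secret_manifest = (
    "apiVersion: v1\n"
    "kind: Secret\n"
    "metadata:\n"
    "  name: example-license\n"
    "type: Opaque\n"
    "data:\n"
    f"  license.yaml: {encoded}\n"
)


def extract_data_value(manifest: str, key: str) -> str:
    """Return the raw base64 value for `key` (naive scan, assumes 2-space indent)."""
    prefix = f"  {key}: "
    for line in manifest.splitlines():
        if line.startswith(prefix):
            return line[len(prefix):]
    raise KeyError(key)


# Outer yaml -> base64 string -> inner yaml text.
recovered = base64.b64decode(
    extract_data_value(secret_manifest, "license.yaml")
).decode("utf-8")
```

The outer file never has to know the payload is yaml. To the manifest it is just an opaque string.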

Of course, none of this is all that exotic. File types are just a subset of data types, and obviously types compose, both functionally and in terms of building compound types, including recursive types, from simple types. I think the real issue this guy is seeing is due to temporal logic introducing unique challenges, not that specifications in general don't compose. Specifications without temporality compose pretty easily in many cases. This is why regular expressions and context free grammars and compilers are possible.
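To make the "compose pretty easily" point concrete, here is a small Python sketch that builds a larger regular expression out of smaller ones using union, concatenation, and repetition. The toy language (sums of numbers and identifiers) and the names `DIGITS`, `IDENT`, `TERM`, and `SUM` are my own illustration.

```python
import re

# Two small specifications.
DIGITS = r"[0-9]+"
IDENT = r"[A-Za-z_][A-Za-z0-9_]*"

# Union: a term is either a number or an identifier.
TERM = rf"(?:{DIGITS}|{IDENT})"

# Concatenation and repetition: terms joined by '+', with optional whitespace.
SUM = rf"{TERM}(?:\s*\+\s*{TERM})*"


def is_sum(text: str) -> bool:
    """True if the whole string matches the composed specification."""
    return re.fullmatch(SUM, text) is not None
```

Here `is_sum("x + 42 + y_1")` is true and `is_sum("x + + 1")` is false, and nothing about `DIGITS` or `IDENT` had to change to get there.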

Nonetheless, the same issue would come up here. Given we can specify programming languages as context free grammars and regular expressions, why can't we compose them and have a compiler that compiles both Rust and Go at the same time and makes a single executable?

The answer is we can, but the syntax doesn't actually fully specify the languages. A full specification needs to include the runtime and the ABI, and the actual behavior of those may not be fully specified in any actual specification other than the code for the respective compilers. It's not that they can't be composed. SWIG is a thing. It's just much harder than composing CFGs. You probably can't do it in a blog post or by hand on a piece of paper.