Hacker News new | ask | show | jobs
by nicklecompte 858 days ago
This comment from kevincox[1] says the answer is a hard "no":

> Worse it seems that for unknown formats it confidently claims that it is one of the known formats. Rather than saying "unknown" or "binary data".

There are other comments in this thread that make me think Google contaminated their test data with training data and the 99% results should not be taken at face value. OTOH I am not particularly surprised that Magika would be better than the other tools at distinguishing semi-unstructured plain text e.g. Java source vs. C++ source or YAMLs versus INIs. But that's a very different use case than many security applications. The comments here suggest Magika is especially susceptible to binary obfuscation.

[1] https://news.ycombinator.com/item?id=39395677