Hacker News new | ask | show | jobs
by Andugal 853 days ago
I have a question: Is something like Magika enough to check if a file is malicious or not?

Example: users can upload PNG file (and only PNG is accepted). If Malika detects that the file is a PNG, does this mean the file is clean?

4 comments

This comment from kevincox[1] says the answer is a hard "no":

> Worse it seems that for unknown formats it confidently claims that it is one of the known formats. Rather than saying "unknown" or "binary data".

There are other comments in this thread that make me think Google contaminated their test data with training data and the 99% results should not be taken at face value. OTOH I am not particularly surprised that Magika would be better than the other tools at distinguishing semi-unstructured plain text e.g. Java source vs. C++ source or YAMLs versus INIs. But that's a very different use case than many security applications. The comments here suggest Magika is especially susceptible to binary obfuscation.

[1] https://news.ycombinator.com/item?id=39395677

If that PNG of yours is not just an example note that you can detect easily if the PNG as any extra data (which may or may not indicate an attempt as mischief) and reject the (rare) PNGs with extra data. I ran a script checking the thousands of PNGs on my system and found three with extra data, all three probably due to the "PNG acropalypse" bug (but mischief cannot be ruled out).

P.S: btw I'm not implying using extra data that shouldn't be there in a PNG is the only way to have a malicious PNG.

The only way to do this reliably is to render the PNG to pixels then render it back to an PNG with a trusted encoder. Of course now you are taking the risk of vulnerabilities in the "render to pixels" step. But the result will be clean.

AKA parse, don't validate.

> does this mean the file is clean?

No.