by michaelt
858 days ago
> The model appears to only detect 116 file types [...] Where libmagic detects... a lot. Over 1600 last time I checked

As I'm sure you know, in a lot of applications you're preparing things for a downstream process that supports far fewer than 1600 file types.

For example, a printer driver might call on `file` to check whether an input is PostScript or PDF, to choose the appropriate converter, and just reject any other format. Or someone training an ML model to generate Python code might have a load of files they've scraped from the web, but might want to discard anything that isn't Python.
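To make the printer-driver case concrete, here is a minimal sketch of a detector that only needs to know two formats. It checks the leading bytes directly rather than calling `file`; `%PDF-` and `%!` are the usual openers for PDF and PostScript, but the function names and the converter names are made up for illustration.

```python
from typing import Optional

# Leading-byte signatures for the only two formats this hypothetical
# pipeline accepts. Real detectors check far more than the first bytes.
ACCEPTED_SIGNATURES = {
    "pdf": b"%PDF-",
    "postscript": b"%!",
}


def sniff_accepted_type(head: bytes) -> Optional[str]:
    """Return "pdf" or "postscript" if the header matches, else None."""
    for name, signature in ACCEPTED_SIGNATURES.items():
        if head.startswith(signature):
            return name
    return None


def choose_converter(head: bytes) -> str:
    """Pick a (hypothetical) converter name, or reject the input."""
    kind = sniff_accepted_type(head)
    if kind is None:
        # Anything outside the short allow-list is rejected outright.
        raise ValueError("unsupported input: only PostScript and PDF are accepted")
    return f"{kind}-to-raster"
```

The point is that the allow-list is two entries long, so a detector covering 116 types is already far more than this kind of caller needs.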
For that matter, the file types I care about are unfortunately misdetected by Magika (which is also an important point - the `file` command at least gives up and says "data" when it doesn't know, whereas the Magika demo gives a confidently wrong answer).
I don't want to criticize the release because it's not meant to be a production-ready piece of software, and I'm sure the current 116 types isn't a hard limit, but I do understand the parent comment's contention.
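The "give up and say data" behaviour can be layered on top of any classifier that reports a score. A rough sketch, assuming the detector hands back a label plus a confidence between 0 and 1; this is not Magika's API, and both the function name and the 0.9 threshold are arbitrary placeholders:

```python
def label_or_data(label: str, score: float, threshold: float = 0.9) -> str:
    """Return the label only when the score clears the threshold, else "data"."""
    # Abstain with a generic answer instead of reporting a low-confidence guess.
    return label if score >= threshold else "data"
```

Whether that helps depends on the scores being well calibrated; a model that is confidently wrong will still sail past any threshold.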