|
|
|
|
|
by stevepike
857 days ago
|
|
Oh man, this brings me back! Almost 10 years ago I was working on a rails app trying to detect the file type of uploaded spreadsheets (xlsx files were being detected as application/zip, which is technically true but useless). I found "magic" that could detect these and submitted a patch at https://bugs.freedesktop.org/show_bug.cgi?id=78797. My patch got rejected for needing to look at the first 3KB bytes of the file to figure out the type. They had a hard limit that they wouldn't see past the first 256 bytes. Now in 2024 we're doing this with deep learning! It'd be cool if google released some speed performance benchmarks here against the old-fashioned implementations. Obviously it'd be slower, but is it 1000x or 10^6x? |
|
Overall file takes about 6ms (single file) 2.26ms per files when scanning multiples. Magika is at 65ms single file and 5.3ms when scanning multiples.
So Magika is for the worst case scenario about 10x slower due to the time it takes to load the model and 2x slower on repeated detection. This is why we said it is not that much slower.
We will have more performance measurements in the upcoming research paper. Hope that answer the question