by EdwardRaff
3119 days ago
At a very high level, yes. But the same could be said for anybody in the AI-AV space. At a more technical level, the approach we take in this paper (and most of my research) is fairly orthogonal to what most AV vendors are doing, even compared to the AI-based solutions. The idea here was to throw away everything we know about the file being a valid Windows PE binary, and try to let the network learn what it needs on its own. It's making the problem harder, but it allows us to re-purpose the same code for PDFs, Word docs, RTF: basically any file format we can get data for. This gives us a lot of potential flexibility that others don't have.
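To make the "throw away the format" idea concrete, here is a minimal sketch of a format-agnostic input step. It is an illustration only, not the paper's actual pipeline: the function name `bytes_to_ids`, the `PAD` index, and the `max_len` default are all assumptions made up for this example.

```python
from typing import List

PAD = 0  # hypothetical padding id; real byte values are shifted to 1..256


def bytes_to_ids(data: bytes, max_len: int = 16) -> List[int]:
    """Map raw bytes to integer ids without any format-specific parsing."""
    # Shift every byte by one so that id 0 stays free for padding.
    ids = [b + 1 for b in data[:max_len]]
    # Right-pad short inputs so every sample has the same length.
    ids.extend([PAD] * (max_len - len(ids)))
    return ids


# The same function handles a PE header and a PDF header alike.
pe_ids = bytes_to_ids(b"MZ\x90\x00", max_len=6)
pdf_ids = bytes_to_ids(b"%PDF", max_len=6)
```

Nothing in this step knows about PE sections or PDF objects; whatever structure matters would have to be learned downstream from the id sequence.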
|
It doesn't seem like you've looked into this. The interesting data in PDFs and Office docs is all encoded, often multiple times. E.g. OOXML docs are ZIP files and store macros in an OLE container, where they're further encoded in streams.
You can kind of get away with not parsing PE files, although you're missing out in that case. For PDFs, Office docs, and most other non-binary, non-script types, though, you have no choice but to parse.
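A rough sketch of the layering described above, using only Python's standard `zipfile` module. The helper names (`find_ole_parts`, `make_demo_docm`) and the toy archive are hypothetical; the archive is not a valid Word document, it only mimics the ZIP-wrapping-an-OLE-part layout.

```python
import io
import zipfile

# Signature at the start of an OLE (Compound File Binary) container.
OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def find_ole_parts(ooxml_bytes: bytes) -> list:
    """Return names of ZIP members whose content starts with the OLE signature."""
    hits = []
    with zipfile.ZipFile(io.BytesIO(ooxml_bytes)) as zf:
        for name in zf.namelist():
            with zf.open(name) as member:
                # zf.open() decompresses, so we see the member's real bytes.
                if member.read(len(OLE_MAGIC)) == OLE_MAGIC:
                    hits.append(name)
    return hits


def make_demo_docm() -> bytes:
    """Build a toy archive (NOT a valid document) purely for illustration."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("word/document.xml", "<w:document/>")
        zf.writestr("word/vbaProject.bin", OLE_MAGIC + b"\x00" * 64)
    return buf.getvalue()


demo = make_demo_docm()
ole_parts = find_ole_parts(demo)
```

This only peels the first layer. The macro source inside the OLE part sits in further encoded streams, so a model reading the outer file's raw bytes generally never sees it without this kind of unpacking.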