by cleverfoo
3688 days ago
Let me see if I can simplify the underlying problem here (I dabble in this space).
A little bit of background: writing pattern-matching signatures is hard; adding a bunch of "known malicious" hashes to your malware database is easy. So company A, with a staff of folks writing pattern-matching signatures, has its engine added to VirusTotal, and VirusTotal shares/sells the hashes found by that engine to folks who pay for its API. Company B, without a staff of engineers writing pattern-matching signatures, signs up for the VirusTotal API and builds its malware database purely from the hashes that other, actual engines produce.
The important thing to keep in mind: when this happens at the scale of VirusTotal (basically all real engines are participating), the resulting "hash database" is, essentially, bulletproof, since it's likely that any sample used to test its effectiveness will have been run through VirusTotal first. We (I run scanii.com, a malware/content detection API service) run into this all the time with folks either abusing VT or just not understanding the reason it exists.
|
Nope. There are lots of situations where exploit kits automatically re-compile and re-pack malware on demand, in ways complex enough to defeat existing signatures and evade AV detection.
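To make the point concrete, here is a minimal sketch of why a hash-only database misses re-packed samples. Everything in it is an assumption for illustration: `HashBlocklist` is a hypothetical stand-in for a hash-based malware database, and appending a byte is only a toy stand-in for what a real packer does.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of a sample as a hex string."""
    return hashlib.sha256(data).hexdigest()


class HashBlocklist:
    """Hypothetical hash-only malware database (illustration only)."""

    def __init__(self, known_bad_samples):
        # Store only the digests, as a hash feed would.
        self._hashes = {sha256_hex(sample) for sample in known_bad_samples}

    def is_known_bad(self, data: bytes) -> bool:
        # Exact-match lookup: any change to the bytes changes the digest.
        return sha256_hex(data) in self._hashes


# Placeholder bytes, not real malware.
original = b"MZ\x90\x00 placeholder payload bytes"
# Toy stand-in for re-packing: a single changed byte is enough.
repacked = original + b"\x00"

blocklist = HashBlocklist([original])
original_flagged = blocklist.is_known_bad(original)
repacked_flagged = blocklist.is_known_bad(repacked)
```

The original sample is flagged and the modified one is not, even though a real re-packed sample would behave identically at runtime. That gap is what behavioral detection is meant to cover.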
A lot of companies are using VT as a filter for known bad to prevent even having to deal with such samples, but many unknown bad samples still exist and make it past the VT engine, only to be picked up by behavioral detection.
Conversely, a small number of known bad samples that are caught by VT can slip by behavioral detection engines that are gated by VT, causing infection (when VT is removed) where it would otherwise be prevented. Of course, in these cases it is the fault of the behavioral vendor for not having sufficient behavioral detection, but having access to VT does make that gap easier to close. For instance, many companies have a loop where they take samples detected by VT, run them constantly through an automated analysis lab, and see whether or not their behavioral analysis detects each sample. When it fails, that sample has a direct line to analysts who can reverse engineer it, come up with new behavioral patterns, and add it to training sets for any machine-learning-based detection. In this sense, not having VT support makes everything less safe.
The next issue is that engines from companies like this simply can't be run on VT's platform because they're too heavy, as the article mentions. I think a good middle ground here would be to turn this analysis loop into a feedback loop by adding one more step: in cases where behavioral detection fires and VT does not, submit the report to VT in a standardized format so it can be added to their corpus.
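A rough sketch of that feedback loop, under stated assumptions: `vt_detects` and `behavioral_detects` are hypothetical callables standing in for a VT lookup and a behavioral engine, and no real VirusTotal API is called. Samples VT catches but behavioral misses go to analysts; samples behavioral catches but VT misses are queued for reporting back.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List


@dataclass
class TriageResult:
    # VT detected, behavioral missed: send to analysts / training sets.
    needs_analyst: List[bytes] = field(default_factory=list)
    # Behavioral detected, VT missed: candidates to report back to VT.
    report_to_vt: List[bytes] = field(default_factory=list)


def triage(
    samples: Iterable[bytes],
    vt_detects: Callable[[bytes], bool],          # hypothetical VT verdict
    behavioral_detects: Callable[[bytes], bool],  # hypothetical engine verdict
) -> TriageResult:
    """Sort samples by where the two detectors disagree."""
    result = TriageResult()
    for sample in samples:
        vt_hit = vt_detects(sample)
        behavioral_hit = behavioral_detects(sample)
        if vt_hit and not behavioral_hit:
            result.needs_analyst.append(sample)
        elif behavioral_hit and not vt_hit:
            result.report_to_vt.append(sample)
        # Agreement (both hit or both miss) needs no follow-up here.
    return result


# Stub detectors with made-up sample labels, for illustration only.
vt_known = {b"sample-a", b"sample-b"}
behavioral_known = {b"sample-b", b"sample-c"}

outcome = triage(
    [b"sample-a", b"sample-b", b"sample-c", b"sample-d"],
    vt_detects=lambda s: s in vt_known,
    behavioral_detects=lambda s: s in behavioral_known,
)
```

The standardized report format itself is left out, since the comment does not define one. The sketch only shows which samples would flow in each direction.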