by EdwardRaff
2512 days ago
That's true in many cases! For our malware work and research, we've gotten a lot of consistent push-back that n-grams needed to be larger. A lot of older work (with much smaller datasets) said 15-20 grams were best, but that couldn't be tested again with a modern large corpus. A common source of that intuition was that x86 instructions are variable length, and a single instruction can be up to 15 bytes long. The primary point of our paper is that now we can circle back and really test all these hypotheses about larger n-grams for malware! And even if you only want smaller 6-grams, we get a nice big speedup too :)