Hacker News new | ask | show | jobs
by burntsushi 2691 days ago
While we're wishing for things, I'd love to see you write up how some of the algorithms (like Teddy) work. Hyperscan is clearly a treasure trove of them, but the effort required to extract its secrets is quite large! For example, I've spent quite a bit of time perusing its code (and your excellent mailing list posts), and I still don't think I could describe its execution flow at a high level.

As far as literals go, I was doing that well before I heard about Hyperscan. I got the idea from hints in the literature. (On mobile or else I'd provide a link.)

1 comments

We have a paper accepted to NSDI, so this will hopefully shed some light on things. A conference paper is, sadly, too small to fit more than a few subsystems so a writeup of Teddy isn't included.

As I don't work for Intel any more, it's unlikely that I will put in the effort for anything significantly more comprehensive. I am considering another building another regex matcher but it wouldn't be the encyclopedic list of optimizations approach. Paul Terra refers to Hyperscan as the "Dwarf Fortress of regular expression matchers". I have some new ideas I'd like to try out, in any case.

I wasn't suggesting that your use of literal factoring comes from Hyperscan, only the Teddy matcher. Literal factoring seems an old technique and I don't know the genuinely first cite of that.

Gotya. Thanks for the clarification! I look forward to reading your paper! I'll hold out hope for the folks working on Hyperscan at Intel to do some more in depth write ups.