Hacker News
by squarecog 5222 days ago
You are doing regex matching in the Cascading code, but splitting on a character in the pangool code. The latter is obviously much faster. I don't know that that's the reason for the difference you observe, but it certainly can't hurt to fix that and make the user-supplied code more comparable.
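A minimal sketch of the difference described above, assuming tab-delimited input. The delimiter, class name and method names are hypothetical; the benchmark's actual Cascading and Pangool code is not shown in this thread. One method tokenizes through the regex engine, the other scans for a single character with `indexOf`:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class SplitComparison {
    // Hypothetical tab delimiter, compiled once; every split still runs the regex matcher.
    private static final Pattern TAB = Pattern.compile("\t");

    // Regex-based tokenizing (limit 0: trailing empty fields are dropped).
    static String[] splitWithRegex(String line) {
        return TAB.split(line);
    }

    // Character-based splitting: a plain indexOf scan, no regex engine involved.
    static List<String> splitOnChar(String line, char delim) {
        List<String> fields = new ArrayList<>();
        int start = 0;
        int idx;
        while ((idx = line.indexOf(delim, start)) >= 0) {
            fields.add(line.substring(start, idx));
            start = idx + 1;
        }
        fields.add(line.substring(start)); // last field, kept even if empty
        return fields;
    }

    public static void main(String[] args) {
        String line = "a\tb\t\tc";
        System.out.println(Arrays.toString(splitWithRegex(line)));
        System.out.println(splitOnChar(line, '\t'));
    }
}
```

The two are not exactly equivalent: `Pattern.split` drops trailing empty fields, while the `indexOf` loop keeps them, so a fair comparison should also agree on that behavior.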

Indeed, that regex was problematic because it had a bug itself. We replaced that line with RegexSplitter and updated the benchmark page. Please shout if you notice anything else wrong. Thanks.
Just to clarify, Java's split() function also uses a regex to do the split. The code of String.split() is:

return Pattern.compile(regex).split(this, limit);
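The line above compiles a fresh Pattern on every call. A minimal sketch contrasting that with compiling the Pattern once and reusing it, assuming comma-delimited records (the class and method names are hypothetical). Note that since JDK 7, String.split() takes a fast path that bypasses the regex engine for a single-character delimiter that is not a regex metacharacter, so how much this matters depends on the JDK in use:

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class PatternReuse {
    // Compiled once and reused for every record.
    private static final Pattern COMMA = Pattern.compile(",");

    // Same result as line.split(","), without recompiling the pattern per call.
    static String[] splitReusingPattern(String line) {
        return COMMA.split(line);
    }

    // Mirrors the quoted String.split() body: a new Pattern for every call.
    static String[] splitCompilingEachTime(String line) {
        return Pattern.compile(",").split(line, 0);
    }

    public static void main(String[] args) {
        String[] records = {"x,y,z", "1,2,,", "solo"};
        for (String r : records) {
            String[] a = splitReusingPattern(r);
            String[] b = splitCompilingEachTime(r);
            // Both variants produce identical fields; only the compile cost differs.
            System.out.println(Arrays.equals(a, b) + " " + Arrays.toString(a));
        }
    }
}
```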

The benchmark seems fair to me.