Hacker News new | ask | show | jobs
by dvt 2248 days ago
What I find most interesting about this bugfix is simply how terrible of a source Wikipedia is and, at the same time, how valuable professional insight can be (e.g. someone that knows what they're talking about). And to add insult to injury, someone quickly edited Wikipedia (for the "street cred" I'm sure):

> The original paper contained errors for computing the table of pattern shifts which was corrected by Wojciech Rytter in 1980 [1]

But that's still actually wrong. Computing the table was not even discussed in the "original paper" -- see the papers linked by @oefrha in this thread. They're two different 1977 papers!

[1] https://en.wikipedia.org/wiki/Boyer%E2%80%93Moore_string-sea...

2 comments

Yeah I wrote that Wikipedia article, as much as a single person can. Used the textbook Strings, Trees, and Sequences by Gusfield; it never made mention of Rytter. I wouldn't say Wikipedia is a "terrible" source, though; compared to what? Papers? Textbooks? During the course of my career I've found major technical errors in about half of CS papers. Now the error is being documented & fixed in the Wikipedia article.

My main takeaway was that people should avoid BM in favor of KMP as much as possible if you're writing a string search function, because you're guaranteed to have some bugs when implementing BM. Writing your own string search function seems nearly on the same level as implementing your own cryptographic functions, though.

Interestingly someone reported a bug in my BM sample code on the wikipedia article talk page some years ago (still not yet fixed) which might be related to this Rytter issue. And someone forked my wikipedia article code repo yesterday, which I thought was very weird; must be related to this.

I guess coming to a further defence of wikipedia: as you read technical books & papers (or philosophy!) you see knowledge is extremely unevenly-distributed in society. Much of it is locked up inside these dense, expensive books (if you're lucky!) and even more of it is sitting undigested in papers written for people with five years of postgrad education, and even more is in expert's brains and never put to paper. These barriers stay up for a long time, decades even. We just witnessed one such barrier falling. Wikipedia is an optimal resting place for such liberated knowledge, and I would go so far as to say knowledge that is not on Wikipedia is not yet liberated.

> you're guaranteed to have some bugs when implementing BM. Writing your own string search function seems nearly on the same level as implementing your own cryptographic functions, though.

And that’s the Curse Of The Standard Library Implementers: we’re the ones that have to implement Boyer-Moore, floating-point string conversions, etc. so everyone else can use them :-)

Why aren't there projects amongst the standard library maintainers to make some standard tests and maybe even APIs? That way you'd know that every library implements each algorithm correctly.
We actually do use libc++'s test suite, which has found several bugs, but not this one (apparently because they don't yet have std::boyer_moore_searcher).
He meant the original paper for calculating delta2 which was merely referenced by the original BM paper.

Sadly this seems to be the pattern for most everything coming from 'academia' ever.