Hacker News new | ask | show | jobs
by matt4711 5605 days ago
An illustration from the book I cited above showing the importance of the alphabet size (y-axis) and the pattern length (x-axis):

http://i.imgur.com/KGOZW.jpg

In the experiment he used patterns of different length on the same text collection. As you can see in the graph, different algorithms perform best for a certain alphabet size.

He describes the text collection as "text corpus taken from wikipedia text dump" so I'm guessing the alphabet size is around 90?

It's also probably not a good thing that all the strings he is searching for are prefixes of the same pattern.

References:

Shift-Or: http://www-igm.univ-mlv.fr/~lecroq/string/node6.html#SECTION...

BNDM: http://www-igm.univ-mlv.fr/~lecroq/string/bndm.html#SECTION0...

BOM: http://www-igm.univ-mlv.fr/~lecroq/string/bom.html#SECTION00...