by taeric
1888 days ago
My gut is that a "long string" just isn't that long nowadays. Even searching for a full UUID in a log doesn't do much to change the size of the needle relative to cache sizes. Though maybe it is enough to help with some of the wide-instruction choices?
I mean, yes, of course in specific scenarios certain assumptions can be made. E.g. for UUIDs you are probably better off exploiting the fixed-length pattern too.
But for the general/theoretical case these intuitions usually do not pan out. I mean, even "memory hacks" make assumptions about the data indirectly, via common hardware architecture.
That is, most data isn't random strings, of course XD
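A minimal Python sketch of what exploiting the fixed-length pattern could look like for UUIDs, assuming the canonical 36-character 8-4-4-4-12 hex text form. The helper names `looks_like_uuid` and `find_uuids` are hypothetical, not from any library. Instead of a general substring search, it tests each offset against the known layout, checking the four fixed hyphen positions first:

```python
import string

# Assumed canonical textual UUID layout: 8-4-4-4-12 hex digits,
# with hyphens at fixed offsets 8, 13, 18 and 23.
UUID_LEN = 36
HYPHEN_POS = frozenset({8, 13, 18, 23})
HEX = frozenset(string.hexdigits)  # 0-9, a-f, A-F


def looks_like_uuid(s: str, start: int) -> bool:
    """Hypothetical helper: does s[start:start+36] match the 8-4-4-4-12 layout?"""
    if start < 0 or start + UUID_LEN > len(s):
        return False
    # Cheap rejection first: the hyphens must sit at their fixed offsets.
    if any(s[start + p] != "-" for p in HYPHEN_POS):
        return False
    # Then every remaining position must be a hex digit.
    return all(s[start + i] in HEX for i in range(UUID_LEN) if i not in HYPHEN_POS)


def find_uuids(line: str) -> list[str]:
    """Hypothetical helper: return every UUID-shaped substring in a log line."""
    found = []
    i = 0
    while i + UUID_LEN <= len(line):
        if looks_like_uuid(line, i):
            found.append(line[i:i + UUID_LEN])
            i += UUID_LEN  # jump past the match; assume UUIDs do not overlap
        else:
            i += 1
    return found
```

Usage: `find_uuids("req=123e4567-e89b-12d3-a456-426614174000 ok")` returns the one embedded UUID. Most offsets in ordinary log text fail the hyphen test immediately. The sketch does not check version or variant bits, so it finds UUID-shaped text rather than validating UUIDs.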
However, when BM is used in e.g. bioinformatics on DNA, where you are first stuck with finding the relevant data amid evolutionary noise and your algorithms must adapt to your hypotheses, I assume theory becomes more relevant.
I think DNA-focused "string search problems" are really inspiring and have a lot of potential for engaging with the philosophical fundamentals of informatics. There is something about the evolutionary emergence of "data" that, AFAIK, no other field offers: e.g. the overlapping, extending, or contextualizing information meta-layers upon meta-layers in DNA translation and structure, "specified" for nothing more than mere existence. Throwing Boyer-Moore at ASCII-encoded sequences in FASTA files almost feels blasphemous, like the arrogantly fallacious human essence at work.
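On the technical side of the DNA point, one well-known wrinkle is the tiny alphabet. The sketch below uses Boyer-Moore-Horspool (Horspool's simplification: bad-character rule only, no good-suffix rule), so it is a stand-in rather than full Boyer-Moore. It also assumes the sequence is already a single string with FASTA headers and line breaks stripped. The hypothetical `mean_shift` helper estimates the average skip under uniformly random text. With only A, C, G and T, nearly every text character also occurs in the pattern, so skips stay short:

```python
def horspool_shift_table(pattern: str) -> dict[str, int]:
    """Bad-character shifts: distance from each char's last occurrence
    (excluding the final position) to the end of the pattern."""
    m = len(pattern)
    table = {}
    for i, c in enumerate(pattern[:-1]):
        table[c] = m - 1 - i  # later occurrences overwrite earlier ones
    return table


def horspool_search(text: str, pattern: str) -> list[int]:
    """Return the start index of every occurrence of pattern in text."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    shifts = horspool_shift_table(pattern)
    hits = []
    pos = 0
    while pos <= n - m:
        # Compare right-to-left, as Boyer-Moore does.
        j = m - 1
        while j >= 0 and text[pos + j] == pattern[j]:
            j -= 1
        if j < 0:
            hits.append(pos)
        # Shift by the text char aligned with the pattern's last slot;
        # chars absent from pattern[:-1] allow a full-length jump.
        pos += shifts.get(text[pos + m - 1], m)
    return hits


def mean_shift(pattern: str, alphabet: str) -> float:
    """Hypothetical helper: average shift if text chars were uniform over alphabet."""
    shifts = horspool_shift_table(pattern)
    m = len(pattern)
    return sum(shifts.get(c, m) for c in alphabet) / len(alphabet)
```

For example, `horspool_search("GATTACAGATTACA", "TACA")` returns `[3, 10]`, and `mean_shift("GATTACA", "ACGT")` is 3.0, while over the 26 uppercase letters it is about 6.4. That gap is the reason small-alphabet searches tend to favor other techniques over plain bad-character skipping.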