by marginalia_nu
1416 days ago
Might just be to reduce the size of the index. Not sure if they still do, but they used to map keywords to integer identifiers instead of storing the actual string value (string indices get very big). Page and Brin themselves explain it here[1]. I do the same in my search engine. The problem is that there are a lot of junk identifiers, so there's a point to reducing the scope by eliminating probable noise keywords that are unlikely to ever be relevant to any search. UUIDs and hashes would probably fall into that category, since they have a very large namespace that can very easily gunk up the lexicon with words that are never going to be relevant. You'd probably want to keep the word identifier at 32 bits if you can get away with it, but maybe 64 bits for a global search engine like Google.

[1] http://infolab.stanford.edu/~backrub/google.html (section 4.2.4)
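To make the idea concrete, here is a rough Python sketch of a lexicon that hands out integer word IDs and refuses probable noise tokens. It is not how Google or my own engine actually does it; the `Lexicon` class, `is_probable_noise`, and the two regexes are made up for illustration, and the noise heuristics are assumptions.

```python
import re

# Hypothetical heuristics for "probable noise" tokens (assumptions, not a real rule set).
UUID_RE = re.compile(r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$")
HEX_RE = re.compile(r"^[0-9a-f]{32,}$")  # long hex runs, e.g. MD5/SHA digests

MAX_WORD_ID = 2**32 - 1  # keep word IDs within 32 bits


def is_probable_noise(token):
    """Return True for tokens that look like UUIDs or hex digests."""
    t = token.lower()
    return bool(UUID_RE.match(t) or HEX_RE.match(t))


class Lexicon:
    """Maps each distinct keyword to a small integer ID."""

    def __init__(self):
        self._ids = {}

    def word_id(self, token):
        """Return the ID for token, or None if the token is filtered as noise."""
        t = token.lower()
        if is_probable_noise(t):
            return None  # never enters the lexicon, so it never uses up an ID
        if t not in self._ids:
            next_id = len(self._ids)
            if next_id > MAX_WORD_ID:
                raise OverflowError("lexicon no longer fits in 32 bits")
            self._ids[t] = next_id
        return self._ids[t]

    def __len__(self):
        return len(self._ids)
```

The index then stores the integer instead of the string, and anything that returns `None` is simply not indexed as a keyword.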
You have a great ability to break things down in a way that makes sense.