by logicallee
3325 days ago
There's nothing to index. How could it have found my Shakespeare quote via an index? The query consisted entirely of the words 'from what it is to a', yet it returned only the Shakespeare quote. I don't see how it could have indexed anything; it must have done a join. (Which makes sense given the 30+ seconds I had to sit and wait before it returned its answer, while also reporting the time it took to produce it. What else could it have been doing?) By the way, I believe I wanted to know whether it would return the Shakespeare quote at all. If you mean that it might have cached the results of the query, I doubt anyone other than me has queried that exact phrase.
|
I'm guessing you're most familiar with B-tree indexes, which are the default in many SQL databases and are good for quickly answering exact-match and greater-than/less-than queries. There are dozens of data structures useful for indexing, some of which are built specifically to index full-text documents. For an example, check out the GIN and GiST indexes in Postgres [1].
It's my understanding that database indexing and index compression were primary differentiators that Google excelled at from the beginning. They could beat others at a fraction of the typical cost because they didn't need data centers to store and query huge quantities of documents.
Seriously, there's no way even Google could intersect the sets of all crawled web documents containing those individual words in 30 seconds, much less two seconds.
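To make the idea concrete, here is a toy positional inverted index in Python. It is only a sketch of the general technique, not how Google or Postgres actually implements it. The documents, the names `DOCS`, `build_index`, and `phrase_search`, and the whitespace tokenizer are all made up for illustration. The point is that a phrase made entirely of common words can still be answered from an index: each word maps to the documents and positions where it occurs, so you intersect the document sets and then check that the positions are consecutive, without ever rescanning the text.

```python
from collections import defaultdict

# Hypothetical toy corpus; these texts are invented for illustration.
DOCS = {
    "d1": "we moved from what it is to a new design",
    "d2": "it is a long way from here to what a user wants",
    "d3": "nothing relevant here",
}

def build_index(docs):
    """Build a positional inverted index: word -> {doc_id: [positions]}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, word in enumerate(text.lower().split()):
            index[word].setdefault(doc_id, []).append(pos)
    return index

def phrase_search(index, phrase):
    """Return sorted doc ids containing the words of `phrase` consecutively."""
    words = phrase.lower().split()
    if not words or any(w not in index for w in words):
        return []
    postings = [index[w] for w in words]
    # Start from the rarest word so the candidate set stays small.
    candidates = set(min(postings, key=len))
    for posting in postings:
        candidates.intersection_update(posting)
    hits = []
    for doc_id in sorted(candidates):
        later = [set(index[w][doc_id]) for w in words[1:]]
        # A match needs word i+1 exactly i+1 positions after the first word.
        for start in index[words[0]][doc_id]:
            if all(start + i + 1 in s for i, s in enumerate(later)):
                hits.append(doc_id)
                break
    return hits

index = build_index(DOCS)
matches = phrase_search(index, "from what it is to a")
```

Here `d2` contains every word of the query but not in sequence, so only the position check rules it out. Real engines add compression, ranking, and sharding on top, but the lookup has the same basic shape.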
[1] https://www.postgresql.org/docs/current/static/textsearch-in...