|
|
|
|
|
by zasdffaa
1545 days ago
|
|
> For example, SQL Server will create a temporary index on the fly if one is missing err, I may be misunderstanding but can you explain why you feel this? I have never (IME) seen MSSQL do this and it wouldn't make sense because constructing an index needs a table scan plus a lot of work on top. Just doing a hash join is simply the better option. I mean it would be nice at times but there are traps to this which is why (again AFAICS) it's not done and would be unsafe to do without a lot of info about resources and future expected queries which the query planner just doesn't have. Happy to be set right on this. |
|
It's a temporary object, much like a temporary table that exists only in the scope of the query.
As another comment mentioned, this is what a hash-join does internally: it builds a temporary "hash index" of one input, and then uses it to look up rows while scanning through the other input.
If you looks at the query plans in SSMS, you'll occasionally see bitmap indexes as well.
The equivalent of a standard B-Tree index that you would create permanently is the "Index Spool" operator. You'll also see "Table Spool", which is basically a temporary heap.
The example in the original article was the equivalent of this loop:
That's hideously inefficient. Most databases will automatically do something like: The clever part in all of this is that you can do this two ways: build a hashtable of "a" and lookup "b" rows in it, OR build a table of "b" and lookup "a" rows in it. They're equivalent, but the performance can be wildly different.RDMBS query planners have the job of figuring out which to pick. Even if you think you can outperform the database by writing code like the above in Java or C# or whatever, you won't write out every combination and have the statistics available to choose. The database engine can and does.
SQL Server can do both steps in parallel across all CPU cores which is a topic of several PHD-level research papers. For example, hash tables can have performance issues if the same key occurs too often (e.g. many NULL columns). Balancing this across multiple cores is... complicated.