| > not true for row stores because data is stored in such a way that the whole row gets accessed no matter how many columns get specified in the query. This is not true, at least not with the likes of MS SQL Server / postgres / Oracle / etc. If the query can be satisfied with just what is in an index the clustered index or heap (where all the row data is together) does not need to be referenced. Worse: if you have off-page data (NVARCHAR(MAX) columns with values that aren't very short, to give one common example in the case of SQL Server) then not only is the engine reading the CI/heap unnecessarily to serve your query but it is potentially making extra unnecessary per-value page reads on top of that. Also for more complex queries that cause the building of intermediate results in memory or worse on disk, anything where you see a sort step in the query plan for instance, you are increasing the amount of data that needs to be spooled in this way. In some circumstances this might happen to the date more than once during a single run of your query. Or if columns are actually the result of UDFs - they will be getting run per value per row even though the result is not needed, and that can also affect choices the query planner makes wrt parallelism options. You might not know you query is "more complex" either: do you know for sure that you are referencing a simple table or a view that is doing some jiggery-pokery to derive values? It might be a simple table now but that may change in later versions of the model/application, and now you have that extra computation happening for values you might not actually care about. It doesn't have to be something as complex as a table being replaced by a view that is doing extra work: the entity might later have a free-form note record attached to it in the form of a VARHCAR(MAX) column and those extra off-page reads are hitting your query's performance server-side (not just in terms of what travels over the network) when you don't actually need the results of those reads. > There are many other^H^H^H^H^H good reasons why SELECT is acceptable only in development queries* I'd say "to be avoided where possible outside of dev/test queries" as there are always exceptions where it isn't possible due to [bad] design elsewhere, but yes. > but performance is not one of them. This I disagree with. It can definitely affect performance. And anyway, it might not affect performance on this particular query, at least not by a measurable amount, being selective about what you select is a good habit to train into yourself. |
My issue with this advice is that it enforces the idea that "less columns in select list" = "less data accessed" _in all cases_, which as we all agree is not true. Even more so if you have a relatively well designed database, with no crazy amount of views on views (or any for that matter), UDFs, huge columns with binary data, etc, etc.
PS: "...result of UDFs - they will be getting run per value per row..." -> not always.. Inline tvfs get expanded, so you should probably be quite careful with all other user defined functions in SQL Server anyway.