Hacker News new | ask | show | jobs
by rawgabbit 637 days ago
My tips for working with complex Stored Procedures.

1. At the beginning of the proc, immediately copy any permanent tables into temporary tables and specify/limit/filter only for the rows you need.

2. In the middle of the proc, manipulate the temporary tables as needed.

3. At the end of the proc, update the permanent tables enclosed within a transaction. Immediately rollback transaction/exit the proc, if an error is detected. (By following all three steps, this will improve concurrency and lets you restart the proc without manually cleaning up any data messes).

4. Use extreme caution when working with remote tables. Remote tables do not reside in your RDBMS and most likely will not utilize any statistics/indexes your RDBMS has. In many cases, it is more performant to dump/copy the entire remote table into a temporary table and then work with that. The most you can expect from a remote table is to execute a Where clause. If you attempt Joins or something complicated, it will likely timeout.

5. The Query Plan is easily confused. In some cases, the Query Plan will resort to perform row by row processing which will bring performance to a halt. In many cases, it is better to break up a complex stored procedure into smaller steps using temporary tables.

6. Always check the Query Plan to see what the RDBMS is actually doing.

4 comments

I've significantly improved the performance of queries by undoing someone who did #5 when it wasn't strictly needed. Sometimes breaking a query into many smaller queries is significantly less efficient than giving the query optimizer the entire query and letting it find the best route to the data.

If you've done #5 without doing #6 then you'll likely not see that you're doing something not optimal. My advice is avoid premature optimization and do things the most straight forward way first and then only optimize if needed. Most importantly, don't code in SQL procedurally -- you're describing the data you want not giving the engine instructions on how to get it.

I hate having to use a bunch of temp tables, but I regularly run into queries that would never finish if you let the query planner do its thing. Like compilers their ability is highly overrated. Meanwhile, microsoft places constant warnings against trying to even tune their query planner because it supposedly knows best.
And remember that these rules may be completely valid in one vendor's database, but another's may have very different priorities/characteristics/trade offs.

Also, the version of the database can matter, too.

1-3 are nice if you can guarantee your data is reasonably sized, but if it gets too big for your hardware taking copies of large datasets and then doing updates on large datasets can add a lot of overhead.