| There are quite a lot of inaccuracies in this post, as well as hints that don't really mean much. First up, the non-SQL-injection code doesn't specify the parameter types. As such, each length of monster_class will result in a different execution plan being compiled and cached by SQL Server. Granted, there's only 50 different kinds - but just specify the parameter types to avoid it. Sure, you could setup a VLAN, but we're lightyears from having that affect our performance at this point. Parser overhead is not an issue as long as parameter types are specified. The plan will be compiled and cached. The only advantage to stored procedures at this point is reduced network data as well as management. Page faults - Sure, but for inserts the data will most likely be in memory already as we're writing new data. And if there's enough data, the relevant b-tree pages will most certainly be hot. Varchars - Granted, avoid them unless you're storing string data. I have no idea where the "hundreds of bytes of of memory" comes from. SQL Server caches all data as pages, exact copies of what's stored on disk. Variable-length varchars have an overhead of exactly two bytes + 2/#varlengthCOlumns in total. Disk IO - The transaction log is never merged with the main data file. SQL Server uses write ahead logging, writing to the log first while modifying the data pages in memory. Now and then a checkpoint is run, writing all the dirty pages to the main data file. A log is only "merged" with the data file if the server has crashed and recovery is being run automatically during startup. "This increases the number of rows you can insert before SQL Server succumbs to memory pressure" - Since the article only deals with inserts, why are we talking about memory pressure? You'll get disk contention for sure, but memory won't realistically be an issue unless you're reading as well. There's an alternative to keeping the potentially long running transaction open - avoiding both SqlBulkCopy and transferring datatables. The BCL has a CommandSet notion that utilitizes a TDS feature that allows batches to be streamed to SQL Server. Unfortunately you'll have to use reflection to get at it (https://github.com/ayende/rhino-etl/blob/master/Rhino.Etl.Co...), but the code has been stable for several releases and is very simple to interface. It basically allows you to create normal SqlCommands and just send a large batch of them at once, all being executed atomically with minimal locking. Finally, with the micro optimizations being mentioned, I'm surprised he doesn't recommend to "SET NOCOUNT OFF". Overall, some good points with some inaccuracies. I have a hard time with the closing statement "As you can see, you don’t need to abandon SQL Server to get massive performance improvements.". These are decent improvements, but this is such a simple scenario that I find it pretty optimistic to let this define whether to abandon SQL Server or not. The RDBMS problem is rarely with inserting enough data, but with reading data at the same time while avoiding locking. |
That is not the case if you have indexes. Most secondary indexes are high-entropy (if they weren't, you probably wouldn't be storing them, is an emotional proof), so inserts on larger-than-memory data sets almost always incur a disk I/O, in a b-tree, at least.
Shameless plug: I have a talk about how to deal with this: http://www.youtube.com/watch?v=q6BnG74FZMQ