Sunday, July 02, 2006

SQL Server Index Fragmentation

SQL Server, but like most nice tools, it doesn't come without it's short comings. Among the list of short comings is Index Fragmentation. You add an index to a table to make a certain column a good candidate to look up data with. Behind the scenes, this index is really basically just a sorted table with a lot less data than the big table so that you can lookup the record in the sorted table, and then use the key to get record from the big table. This makes things nice and fast... But over time the indexes slowly get fragmented, and selecting data gets slower and slower. Most people think that it gets slow because there is just too much data, but computers are FAST! And they are good at handling lots of data... That's why we made them :) the problem is that these indexes become so fragmented that they are basically useless.

How do you fix it?
Well before you fix it you need to know what's broken. You can use the "DBCC SHOWCONTIG" command (find details on how to use this and all commands at http://msdn.microsoft.com/library/default.asp?url=/library/en-us/tsqlref/ts_tsqlcon_6lyk.asp) this command will tell you all you need to know (and a lot that you don't need to know) about the indexes you ran it on. You can run it across the whole database, a single table, or even a single index. If you are only interested in the fragmentation, you probably want to look at the "Extent Scan Fragmentation" field that it outputs... The closer to 0, the better.

Next, if you determine you have a lot of fragmentation, you have a number of options to resolve the issue. The Naive approach is to drop the index and recreate it, and actually this will give you the best final results, but in the mean time anyone using your DB will be running without an index, and that will cause horrible performance for them. For this reason, SQL Server provides a number of other options.

My favorite is the "DBCC DBREINDEX" you just give this command an index, and it churn for a while and end up giving you a new fast index. There is also a "DBCC INDEXDEFRAG" which keeps the index, but shifts around all the info in it to try to defrag it. I don't use it, but I've heard that it has decent results, and doesn't cause as much data locking as all other solutions. The last option is recreating the index with the DROP_EXISTING clause. I'm no expert but I believe this syntax is supported by most databases, and the DBCC commands are SQL Server specific. But recreating the index, even if you use the DROP_EXISTING clause has most of the same disadvantages of dropping and recreating.

Moral of the story? You have to change the oil in your car from time to time... We don't like to do it, but we have to do it. Indexes are not much different. They require maintenance. :-p