Introduction to Indexes - Part IV

on August 01, 2010

    

Considerations for selecting the clustering key


There are two main schools of thought in what makes a good clustering. One school says to put the clustered index on the column or set of columns that would be most useful for queries, either frequently run queries or ones doing queries of large ranges of data. The other school says to use the clustered index primarily to organise the table and leave data access to the nonclustered indexes.
I hold to the second school, so the rest of this article will be written from that perspective.
There are four main attributes that are desirable for a clustering key. The clustering key should be:
  • Narrow
  • Unique
  • Unchanging
  • Ever increasing
  •  

Narrow

The width of the clustering key affects the depth of the clustered index and hence it’s efficiency. A clustered index with a very deep b-tree requires queries to read more intermediate pages to locate rows. This increased page reads makes deeper clustered indexes less efficient for data access, especially for lookups. Additionally, more pages means more space used on disk and more space used in memory when the index is in the data cache
The width of the clustering key does not, however, only affect the clustered index. The clustering key, being the rows’ address, is located in every single nonclustered index. Hence a wide clustering key increases the size of all nonclustered indexes, reducing their efficiency as well.

Unique

The clustered index has to be unique. It’s used as the address for a row. If the clustered index is not defined as unique, SQL makes it unique by adding a hidden 4 byte integer column. This makes the index wider than it needs to be and makes the nonclustered indexes wider than they should be.
This attribute needs to be considered in relation to the others. If making the cluster unique requires adding several wide columns then it may be better to keep the index narrower and leave it as not unique.

Unchanging

The clustering key defines where a row is found, which page it is in. If the value of the clustering key is changed, the row must be moved from the page where it currently is to a new location. This means that the update is essentially a delete-insert combination.

Ever-increasing

An ever-increasing column is one where every value inserted is higher than all values currently in the table. This is important for a clustered index as, if inserts occurred all over the table, there would be a high number of page splits and the index would become rapidly fragmented.
Since the clustered index is the largest index on a table (as it contains the entire data row) and because it tends to be used for scans more than other indexes, the clustered index is the one that is affected the most when fragmented and the one that, generally, is the most expensive to rebuild.
For this reason, an ever-increasing clustering key is usually a good choice, especially for tables that are subject to frequent inserts.

 In the next part, we will see about Non-clustered indexes.

Related Posts:

Introduction to Indexes - Part I
Introduction to Indexes - Part II
Introduction to Indexes - Part III
Introduction to Indexes - Part IV
Introduction to Indexes - Part V
Introduction to Indexes - Part VI
   

0 comments:

Post a Comment