Introduction to Indexes - Part III

on August 01, 2010
   
In the previous article I looked at the very basics of what indexes are, what types exist in SQL and how they’re used. In this article I’m going to take a closer look at clustered indexes, what makes them different from nonclustered indexes, how SQL uses clustered indexes and some of the recommendations for selecting the clustered index key.

What is a clustered index?

A clustered index is an index where the leaf level of the index contains the actual data rows of the table. Like any other index, a clustered index is defined on one or more columns – the index key. The key columns for a clustered index are often referred to as the clustering key.
The clustered index can be looked at as a balanced tree (b-tree) structure built on top of a table, with the rows in the table stored logically in the order of the clustering key (see figure 1). The clustered index essentially is the table and hence there cannot be more than one on a table.


The clustering key is, in some ways, the ‘address’ of a data row. It can be used, through the tree structure of the clustered index, to locate a row of data. Because it is used to locate a data row, the clustering key must be unique. If, when the clustered index is created it is defined as unique (either by specifying the UNIQUE keyword as part of the create index statement or by defining the clustered index as part of a primary key or unique constraint), then everything is fine. What if the clustered index is not defined as unique? In that case, SQL automatically adds another column to the index, a 4-byte integer column that’s called the Uniquifier. This column is hidden, it is not displayed in the table design and it cannot be queried directly.


Difference between a clustered index and a heap

A table that has no clustered index is referred to as a heap. Whereas a clustered index has the data stored logically in the order of the index key, a heap has no ordering of rows or pages.
When a row is added to a table with a clustered index, that row has a defined page that it must go on to, according to the value of the clustered index key. Using the index depicted in Figure 1 as an example, if a new row is added with a value of 4 for the clustered index key, that row must go onto the second leaf page. If that page is full, a new page gets allocated, joined into the index and some of the rows from the full page are moved to the new one. This is called a page split, it can be an expensive operation and it leads to fragmentation.
When a row is added to a table without a clustered index (a heap) there’s no specific place that the row has to go. A heap has no defined order. The row will be added wherever there’s a space large enough to put the row.

How a clustered index is used

There are three ways that a clustered index can be used by a query to locate – seek, scan and lookup.

Seek

For the clustered index to be used in a seek operation, the query must contain a SARGable1 predicate based on the clustering key or part of the clustering key.
A simple example of a clustered index seek is as follows (based on the AdventureWorks database)
SELECT ProductID, ProductNumber, Name
 FROM Production.Product
 WHERE ProductID = 380
ProductID is the clustering key for the table Production.Product and the filter on ProductID is SARGable.
 
(1)   SARGable is a made-up word, constructed from the phrase Search  ARGument. It refers to a predicate that is of a form that SQL can use  for an index seek.

Scan

A scan of the clustered index is equivalent to a table scan. It’s a read of all of the data pages in the table
Because the clustered index is the table it can be seen as the default access path for queries to get data to retrieve rows. If a query does not have a SARGable predicate1 based on the clustering key  and there are no non-clustered indexes which are appropriate for a query (and what makes a non-clustered index appropriate will be discussed in detail in part 3), then a scan of the clustered index will be done to retrieve the data for the query. This is a full scan of all the data pages, i.e. a table scan.
In this example a clustered index scan is done because there are no indexes on either of the columns in the where clause.
SELECT ProductID, ProductNumber, Name
FROM Production.Product   
WHEREColor = 'Blue' AND DaysToManufacture 
BETWEEN 2 AND 4 
 SARGable is a made-up word, constructed from the phrase Search ARGument. It refers to a predicate that is of a form that SQL can use for an index seek.

Lookups

A lookup occurs when a nonclustered index was used to locate rows for a query, but the nonclustered index did not contain all of the columns required for the query. To fetch the remaining columns, a lookup is done to the clustered index.
A lookup is equivalent to a single row clustered index seek. Lookups are always done one row at a time. For this reason, they are very expensive operations, especially when lots of rows are involved.
SELECT ProductID, ProductNumber, Name
FROM production.Product
WHERE ProductNumber = 'HN-1224'


In the next part, we will see the considerations for selecting the clustering key.

Related Posts:

Introduction to Indexes - Part I
Introduction to Indexes - Part II
Introduction to Indexes - Part III
Introduction to Indexes - Part IV
Introduction to Indexes - Part V
Introduction to Indexes - Part VI
    

   

0 comments:

Post a Comment