Introduction to Indexes - Part I

on August 01, 2010
     
Good indexes are the key to good performance in SQL Server and the key to creating good indexes is to understand what indexes are and how SQL Server uses them to evaluate queries.
In this first part of a three part series, I’m going to look at the very basics of what indexes are, what types exist in SQL and how they’re used.

What is an index?

An index is a structure within SQL that is used to quickly locate specific rows within a table. It can be useful to imaging an index at the back of a textbook when thinking about SQL indexes. They both serve the same purpose – to find specific information quickly.

General Structure

An index is defined on one or more columns, called key columns. The key columns (also referred to as the index key) can be likened to the terms listed in a book index. They are the values that the index will be used to search for. As with the index found at the back of a text book (see figure 1), the index is sorted by the key columns.




If an index is created with more than one key column, it is known as a composite index.
The general structure of an index is that of a balanced tree (b-tree). The index will have a single root page, zero or more intermediate levels and then a leaf level. A page is an 8 kilobyte chunk of the data file, with a header and footer and is identified by a combination of File ID and Page number.




Note: Commonly the root page is shown at the top of the tree diagram and the leaf pages at the bottom. Think of it as an inverted tree.

In the leaf level, there’s one entry for each row in the index1. The entries in the index are ordered logically2 in the order of the index key.
The non-leaf levels of the index contain one row per page of the level below, referencing the lowest index key value on each page.  If all of those rows fit onto a single page, then that page is considered the root and the index is only two levels deep. If all of those rows will not fit on a single page, then one (or more) intermediate levels are added to the index.

The number of levels in an index is referred to as the depth of the index. This is an important consideration for evaluating the efficiency of the index. The index illustrated in figure 2 has a depth of 3.

(1)   With the exception of SQL 2008’s filtered indexes, an index will have the same number of rows at the leaf level as the table.
(2)   I’m using the phrase ‘logically ordered’ because the index does not necessarily define the physical storage of the rows. The rows are stored in a way that SQL can retrieve them ordered.

Clustered and Non-clustered

There are two main types of indexes in SQL Server, the clustered index and the non-clustered index
Clustered indexes define the logical order of the table. The leaf level of the clustered index has the actual data pages of the table. Because of this there can only be one clustered index per table. A table that does not have a clustered index is referred to as a heap.

Non-clustered indexes are separate from the table. The leaf level of a non-clustered index has a pointer as part of each index row. That pointer is either the clustered index key in the cases where the base table has a clustered index or the RID (Row Identifier) in the cases where the table is a heap. The RID is an 8-byte structure comprised of File ID, Page Number and Slot Index and will uniquely identify a row in the underlying heap. Either way, the each row of a non-clustered index has a reference to the complete data row.

Index Limits

There are a number of built-in limitations on indexes,

Key size
The size of an index key is limited to a maximum of 900 bytes and a maximum of 16 columns. This is definitely a limit, not a goal, as the larger the index key gets, the more pages in the index and the deeper the index tree. As the number of pages and the depth of the tree increases so the index becomes less efficient to use. Larger indexes also use more storage space and result in less efficient use of SQL’s data cache.

Number of indexes
In SQL 2005 and earlier there was a limitation of 250 indexes per table, one clustered and 249 non-clustered. In SQL 2008, with the addition of filtered indexes, that limitation was increased to 1000, one clustered and 999 non-clustered indexes.
Both of these limits are very high and there are few circumstances where a well-designed system should approach that limit.

The reason for this is twofold.

·        As the number of indexes increases so the total size occupied by the table (with all of its indexes) increases. Sure, hard drives are cheap and storage is abundant but increasing the size of a database has other effects, Maintenance operations (backups, restores, consistency checks and index rebuilds) all take longer as the size of a database increases.
·        Indexes have to be kept up to date as data changes and the more indexes there are on a table, the more places the data has to be changed. If there are 10 non-clustered indexes on a table, an insert must be done in 11 places (the table and each of those non-clustered indexes). On databases that are mostly read-only (decision support, data warehouses) that may be acceptable. On databases that have frequent inserts, updates and deletes (OLTP systems), the overhead impose by multiple indexes may not be acceptable



 In the next part, we will see how indexes are used by SQL SERVER.

Related Posts:

Introduction to Indexes - Part I
Introduction to Indexes - Part II
Introduction to Indexes - Part III
Introduction to Indexes - Part IV
Introduction to Indexes - Part V
Introduction to Indexes - Part VI
   

0 comments:

Post a Comment