How a nonclustered index is used?
Seek
For SQL to do a seek on a nonclustered index, the query must have a SARGable1 predicate referencing the index key or a left-based subset of the index key. In addition to this, the index must be either covering or return sufficiently small number of rows that the required lookups to the clustered index/heap (to retrieve the remainder of the columns) is not considered too expensive by the optimiser. If the required lookups are considered too expensive then the index will not be used.It usually works out that the number of rows where the optimiser decides that key/RID lookups are too expensive is somewhere around 0.5%-1% of the total rows in the table.
(1) - SARGable is a made-up word, constructed from the phrase Search ARGument. It refers to a predicate that is of a form that SQL can use for an index seek.
Scan
An index scan is a read of all of the leaf pages in the nonclustered index. A scan of a nonclustered index is generally more efficient than a scan of the clustered index, because nonclustered indexes are generally smaller.A nonclustered index scan usually indicates that the index contains all the columns required by the query but they are in the wrong order, the query predicates are not SARGable or there are no query predicates.
Update
When a change is made to a column that is either a key column or an include column of a nonclustered index, that index will be modified as part of the insert statement. Indexes are never left out of sync with the underlying table data.Inserts and deletes will affect all nonclustered indexes on a table.
Considerations for selecting nonclustered indexes
The decision as to what columns should be indexed should be based on the queries that are run against the table. There’s no point in indexing a column that is never used in a query.Selectivity
In general, a nonclustered index should be selective. That is, the values in the column should be fairly unique and queries that filter on it should return small portions of the table.The reason for this is that key/RID lookups are expensive operations and if a nonclustered index is to be used to evaluate a query it needs to be covering or sufficiently selective that the costs of the lookups aren’t deemed to be too high.
If SQL considers the index (or the subset of the index keys that the query would be seeking on) insufficiently selective then it is very likely that the index will be ignored and the query executed as a clustered index (table) scan.
It is important to note that this does not just apply to the leading column. There are scenarios where a very unselective column can be used as the leading column, with the other columns in the index making it selective enough to be used.
Single column vs. Multi-column indexes
In general, wider nonclustered indexes are more useful than single column nonclustered indexes. This is because it is very unusual for SQL to use multiple nonclustered indexes on the same table to evaluate a query. An index defined over more than one column is referred to as a composite index.Consider a hypothetical table (Table1) that has three indexes on it, one on Col1, one on Col2 and one on Col3. If a query with predicates on all three of those columns is executed against that table, it is exceedingly unlikely for SQL to seek on all three indexes and intersect the results. It is far more likely that one of the indexes will be used; the predicate against the column within the selected index evaluated then the rest of the columns retrieved (from the clustered index) and then the remaining two predicates evaluated.
Were as, if there was a single index defined with all three columns, the matching rows could be located as a single seek operation.
Three separate indexes:
One composite index:
Column order
The order of the columns in an index key should be chosen based on several factors: the selectivity of the columns, what % of queries will filter on that column and whether the queries will filter with equality or inequality matches.If the query predicate is based on multiple columns and there is an index where the combination of columns forms a left-based subset of the index key, then retrieving the rows is a single seek operation. If the query predicate is based on multiple columns and there is an index that contains those columns as part of the index key, but they do not all form a left-based subset of the index key, then retrieving those rows is a 2-step process; seek on the columns that do form a left-based subset and then filter out rows that don’t match the other conditions.
Let’s look at a quick example.
Imagine we have a hypothetical table named Table1 and that there is an index on that table with the index key consisting of three columns – Col1, Col2 and Col3.
CREATE INDEX idx_Table1_Demo
ON Table1 (Col1, Col2, Col3)
SELECT Col1 FROM Table1 WHERE Col1 = @Var1 AND Col2 = @Var2
SELECT Col1 FROM Table1 WHERE Col1 = @Var1 AND Col3 = @Var3This query cannot be evaluated by a single seek operation against the two columns. Rather it has to be done as a seek just on one column, Col1, and the rows that are returned from that seek get checked to see if they match the second predicate or not.
The other thing that has to be considered is the type of predicates that the queries will be using.
It is often said that the most selective column should be made the leading column of an index. This is true, but must be considered in light of the other factors. There is little point in making a column the leading column of an index if only 5% of the queries that use that index filter on that column. It would mean that only those queries could seek on the index and the other 95% would either scan or use a different index (if one existed)
Related Posts:
Introduction to Indexes - Part I
Introduction to Indexes - Part II
Introduction to Indexes - Part III
Introduction to Indexes - Part IV
Introduction to Indexes - Part V
Introduction to Indexes - Part VI
0 comments:
Post a Comment