Wyyme Software: SQL Server Performance: Indexing Basics

Introduction

Fundamentals of Indexing

If you search record 3, below search method will be done internally

Demo – Clustered Indexing

Note: Non Clustered Index stores only the key not Data, it has key pointer to that data pages. Eventually Non Clustered Index use Clustered index to identify the records in side of data pages of Tables.

Practical Indexing Implementation Techniques

Demo of Primary Key

Demo of Over Indexing

Demo of Duplicate Indexes

Demo of Clustered Index

Demo of Unique Indexes

Demo of Included Columns

Demo of Filtered Index

Filtered Index creation on table

Demo of Disable Index

Introduction to Columnstore Index

Demo of Column Store Index

Indexing Practical Performance Tips and Tricks

Index and Page Types

Index and Non Deterministic Columns

Index and SET Values

Importance of Clustered Index

Four Indexing Myths

When I examine a SQL Server environment, I very often find that the indexing seems to have been designed under false assumptions about how indexes actually work.

There is a lot of misinformation floating around the web, so I thought that I would address some of the most common myths I encounter.

Myth 1: The Primary Key index must be clustered and unique.

Truth: The index supporting the Primary Key must be unique, but not necessarily clustered.

Explanation: There are often much better places for the clustered indexes than on the PK. A unique integer column is not always the best place for the clustered index from a performance point of view.

Clustered indexes are very good at returning a large range of records efficiently. You don’t often (if ever) query for a range of arbitrary ID numbers. A non clustered index can return an identity value as efficiently as can a clustered index. Why waste your only clustered index on a column that can be served by a non clustered index?

While the optimal place for the clustered index is determined by how the table is accessed, we often find that a sequentially increasing date field is a good place for it. Range searches on dates are common (…where invoicedate between ‘November 1, 2010’ and ‘November 30, 2010’). And sequentially increasing date fields do not incur page splitting or rapid fragmentation.

Myth 2: Stay away from clustered indexes because they cause page splitting and fragmentation.

Truth: There is no way you can get proper performance from most database applications without using clustered indexes.

Explanation: It is true that in some cases clustered indexes will cause page splitting and rapid fragmentation but that is not a reason to avoid them altogether. Rather it is a reason to learn how to use them effectively without causing page-splitting and fragmentation. Here are a couple of tips:

Never place a clustered index on a column with a GUID data type. A GUID is a 16 byte randomly generated value. Its randomness will promote a high level of page splitting and fragmentation of the table. Since the clustered index value of a row becomes part of every nonclustered index created on the table, this 16 byte value added to every index row will bloat the size of your table and slow down performance.

Try to find a naturally sequentially-increasing column ( a date, for example) that is used to filter ranges of data. This can make an important performance difference in many queries.

If you cannot find a column or set of columns where the clustered index would provide a performance boost, place the Primary Key on an IDENTITY column using a clustered index. This will physically organize your table on disk without incurring page splits or rapid fragmentation.

Myth 3: Multiple Column indexes can be used to search on any of the fields contained in the index.

Truth: In a multi-column index, only the first column is truly indexed.

Explanation: If you take a minute and think about how a multi-column index must work, you will realize that the values in the additional columns are grouped by the order of the first column and are not sequential, To find values in the second column without having a filter on the first column you would have to scan the entire table.

For example, let’s assume this index exists on the ORDERDETAIL table:

CREATE INDEX IX_TEST ON ORDERDETAIL(CUSTOMERID, PRODUCTID)

The following query can use this index to perform a very efficient seek:

SELECT * FROM ORDERDETAIL
WHERE CUSTOMERID = 1234
AND PRODUCTID = 34

This query might use a few more page reads but it will also be executed efficiently with an index seek:

SELECT * FROM ORDERDETAIL
WHERE CUSTOMERID = 1234

However, the following query will cause a table scan.

SELECT * FROM ORDERDETAIL
WHERE PRODUCTID = 34

Even though PRODUCTID is included in the index, the index will not be used for searches on PRODUCTID alone.

This does not mean that there is no benefit in multi-column indexes. They are useful in many ways, particularly in multi-column joins and in queries that filter on multiple columns from the same table (as in our first example).

Note:

Covering Indexes

Covering indexes are a special type of multi-column index and they can sometimes work magic on a poorly performing query. Covering indexes contain every column referenced from a specific table by a specific query in any part of the query.
Since all necessary data is contained in the index, the query never has to access the table itself. This is a bit like having another clustered index on the table and the performance improvement can be amazing.
It is important to use restraint in creating covering indexes or you will over-index your table. Here are some criteria to consider:

The query or queries to be optimized should be a serious enough performance problem to justify the existence of what might be a big index.
The number and size of the columns required should be relatively small.
The table should be relatively large.

The query we’ve chosen to optimize gives us the opportunity to explore a special kind of index, the covering index. A covering index is one that contains all the columns referenced in the query for a single table. We will demonstrate just how useful this can be.
Here is our example query:

SET STATISTICS IO ON;
GO
USE AdventureWorks;
SELECT ss.Name, COUNT(*) as OrderCount, SUM(sd.LineTotal) as TotalPurchases
FROM Sales.SalesOrderDetail sd
JOIN Sales.SalesOrderHeader sh on sd.SalesOrderID = sh.SalesOrderID
JOIN Sales.Customer sc on sh.CustomerID = sc.CustomerID
JOIN Production.Product p on sd.ProductID = p.ProductID
LEFT JOIN Sales.Store ss on sc.CustomerID = ss.CustomerID
WHERE ss.Name is not null
-- and sd.SpecialOfferID = 13
GROUP BY ss.Name
ORDER BY SUM(sd.LineTotal) desc

Don’t worry about the commented line in this query. Leave it commented out for now. We will uncomment it in a later example.
This query joins on several tables but almost all of the cost of the query is in the full scan on the largest table, SalesOrderDetail, at a cost of 1175 page reads. the query returns 633 rows.
(We find this information in the Messages tab of the query window in Management Studio after setting STATISTICS IO ON. If you are unfamiliar with this setting, you might want to look at last month's article before proceeding further.)
When we examine the query, there seems to be no way to reduce the cost of the table scan through indexing. In this query, SalesOrderDetail is joined on the SalesHeaderId and ProductID columns, but those columns are already indexed. The only other place that an index might improve things is on a column in the WHERE clause, but there are no search arguments in the where clause that apply to this table. The only search argument is on the Sales.Store table. There is no obvious column that we can index to improve the performance on SalesOrderDetail.
But notice that there are only three columns from SalesOrderDetail referenced in this query. They are SalesOrderID, ProductID and LineTotal. If we create an index that includes all those columns it is said to “cover the query”. That means that the query engine can get all the data it needs from the index without touching the table at all. This can lead to performance improvements that range from moderate to absolutely amazing.

CREATE NONCLUSTERED INDEX nc_cover on Sales.SalesOrderDetail(productid) INCLUDE (linetotal)

Notice that we use the INCLUDE clause to add the LineTotal column to this index. INCLUDE was a new feature of the CREATE INDEX statement in SQL 2005. It allows us to include columns that are not keys without incurring the substantial cost of adding more key values to the index. Columns that have no value for filtering, joining or ordering results can be added to the index at lower cost than simply naming them in the traditional CREATE INDEX syntax.
You might also notice that we have not mentioned SalesOrderID in this CREATE INDEX statement. How can this be a covering index if we leave out a column that is used in the query? The answer is that we have not left it out.
In a table having a clustered index, all nonclustered indexes include the clustering key. SalesOrderID is in the clustering key, therefore it is automatically added to any nonclustered index we create. There is no need to explicitly name it.
When we create this index, the cost of accessing SalesOrderDetail drops from 1175 data pages to 486 pages. This is not an earthshaking improvement, but it does illustrate the principle we are discussing.
In this case there was no filter available to limit the amount of data read from SalesOrderDetail so the optimizer was forced to scan the entire index. But this index contains far fewer pages than the entire table so we got a moderate performance benefit from our new index. In other cases we might see much more impressive results.
Let’s look at another aspect of this problem. Uncomment the line that filters on SpecialOfferID and execute the query again. This time it returns only 101 rows because of the filter. We might expect that the query would be less expensive. However, we would be wrong. It reads 1240 pages from SalesOrderDetail.
The reason is that the covering index does not include SpecialOfferID and no longer covers the query. Our index is not used. The additional logical reads come from the way the new execution plan has to aggregate the count of orders and the sum of the LineTotal.
There are two practical solutions to this problem. We could create an index on SpecialOfferID or we could add SpecialOfferID to the covering index.
In the real world we would probably create the new index on SpecialOfferID and drop the covering index. However, for purposes of this article we are going to drop our covering index and then re-create it, adding SpecialOfferID to it.

DROP INDEX Sales.SalesOrderDetail.nc_cover

GO

CREATE NONCLUSTERED INDEX nc_cover on Sales.SalesOrderDetail(SpecialOfferID, productid) INCLUDE (linetotal)

GO

When we add this column, we make it the first column in the index because SpecialOfferID is a very selective column in this query. If we put it in the second position, the database engine would search first on ProductID, which is not very selective and is not a filter in this query.
As a general rule, you want to make the most selective search argument the first column in a multi-column index.

Myth 4: Create Indexes in descending order if that is how they will be accessed.

Truth: The query engine can read an index backwards as easily as it can read it forward. There is seldom any reason to specify the order for a single column index.

Explanation: In multi-column indexes there are times when it might make sense to sort columns in different orders. For example if the results of queries are ordered or grouped on a set of ascending and descending columns. A clustered index on those columns using the same sort order might improve performance. However, that applies only to clustered indexes.

In my experience, it is extremely rare that you would use a clustered index to support the ORDER BY clause of a query which normally has a minimal cost. There are usually much better places for clustered indexes.

Table Scan, Index Scan and Index Seek

An index scan or table scan is when SQL Server has to scan the data or index pages to find the appropriate records. A scan is the opposite of a seek, where a seek uses the index to pinpoint the records that are needed to satisfy the query. The reason you would want to find and fix your scans is because they generally require more I/O and also take longer to process. This is something you will notice with an application that grows over time. When it is first released performance is great, but over time as more data is added the index scans take longer and longer to complete.

To find these issues you can start by running Profiler or setting up a server side trace and look for statements that have high read values. Once you have identified the statements then you can look at the query plan to see if there are scans occurring.

Here is a simple query that we can run. First use Ctrl+M to turn on the actual execution plan and then execute the query.

SELECT * FROM Person.Contact

Here we can see that this query is doing a Clustered Index Scan. Since this table has a clustered index and there is not a WHERE clause SQL Server scans the entire clustered index to return all rows. So in this example there is nothing that can be done to improve this query.

Logical and Physical read(s)

Normally, When we request a data for a query(i.e: SELECT), The Database engine fetches the data page(s) from the physical disk and loads into the Buffer Cache called "Physical Read".
"Physical Reads" occurrs only when data pages are not there in "Buffer cache".
Then , The data page(s) are retrived from the Buffer cache called "Logical Read"
If the data pages already there in Buffer Cache then, "Logical Reads" will be performed directly, no Physical read will be there.
So, Disk IO reduced by using the "Logical Read", Because, The data pages retrived from Buffer Cache, not from Physical Disk .

Partitioning a SQL Server Database Table

Physical Partition

Looking to optimize the performance of your SQL Server database? If your database contains very large tables, you may benefit from partitioning those tables onto separate filegroups. This technology, introduced in SQL Server 2005, allows you to spread data onto different physical disks, leveraging the concurrent performance of those disks to optimize query performance.

Partitioning a SQL Server database table is a three-step process:

Create the partition function
Create the partition scheme
Partition the table

Logical Partition

We can do logical partition on table by using RANK, DENSE RANK, ROW NUMBER and NTILE functions.

Checklist: Index and Performance

Blogger Labels: Server,Performance,Basics,Introduction,Fundamentals,method,Demo,Note,Index,Data,pointer,Tables,Practical,Implementation,Techniques,Primary,Over,Duplicate,Indexes,Unique,Columns,creation,Disable,Maintenance,Query,Defragmentation,Script,Column,Tips,Tricks,Page,Types,Deterministic,Values,Importance,Four,Myths,environment,assumptions,misinformation,Myth,Truth,Explanation,integer,Range,November,fragmentation,Stay,database,Rather,Here,GUID,size,example,difference,disk,Multiple,ORDERDETAIL,CREATE,IX_TEST,CUSTOMERID,PRODUCTID,SELECT,FROM,WHERE,improvement,criteria,existence,STATISTICS,AdventureWorks,Name,COUNT,OrderCount,LineTotal,TotalPurchases,Sales,SalesOrderDetail,JOIN,SalesOrderHeader,SalesOrderID,Customer,Production,Product,LEFT,SpecialOfferID,GROUP,ORDER,Leave,cost,information,Messages,Management,Studio,article,SalesHeaderId,clause,arguments,argument,engine,improvements,INCLUDE,Notice,statement,syntax,principle,aspect,Uncomment,execution,solutions,purposes,DROP,Table,Scan,Seek,Profiler,statements,Once,Ctrl,Person,Contact,Checklist,byte,multi,nc_cover

Saturday, 15 March 2014

SQL Server Performance: Indexing Basics

Introduction

Fundamentals of Indexing

If you search record 3, below search method will be done internally

Demo – Clustered Indexing

Demo – Clustered Indexing

Practical Indexing Implementation Techniques

Demo of Primary Key

Demo of Over Indexing

Demo of Duplicate Indexes

Demo of Clustered Index

Demo of Unique Indexes

Demo of Included Columns

Demo of Filtered Index

Filtered Index creation on table

Demo of Disable Index

Index Maintenance

Demo of Index Maintenance

Query to find duplicate indexes

Demo of Defragmentation

Index Maintenance Script from OLA

Introduction to Columnstore Index

Demo of Column Store Index

Indexing Practical Performance Tips and Tricks

Index and Page Types

Index and Non Deterministic Columns

Index and SET Values

Importance of Clustered Index

Four Indexing Myths

Myth 1: The Primary Key index must be clustered and unique.

Myth 2: Stay away from clustered indexes because they cause page splitting and fragmentation.

Myth 3: Multiple Column indexes can be used to search on any of the fields contained in the index.

Note:

Covering Indexes

Myth 4: Create Indexes in descending order if that is how they will be accessed.

Table Scan, Index Scan and Index Seek

Logical and Physical read(s)

Partitioning a SQL Server Database Table

Physical Partition

Logical Partition

Checklist: Index and Performance

No comments:

Post a Comment