Monday 23 May 2016

Types of Physical Joins in SQL Server?

1) Nested Join (small or small and large tables)

2)Merge Join (large should sorted before and proper indexes)

3)Hash Join (large without index)

Nested Loops Join

Before digging into the details, let me tell you first what a Nested Loops join is if you are new to the programming world. A Nested Loops join is a logical structure in which one loop (iteration) resides inside another one, that is to say for each iteration of the outer loop all the iterations of the inner loop are executed/processed.

A Nested Loops join works in the same way. One of the joining tables is designated as the outer table and another one as the inner table. For each row of the outer table, all the rows from the inner table are matched one by one if the row matches it is included in the result-set otherwise it is ignored. Then the next row from the outer table is picked up and the same process is repeated and so on.

The SQL Server optimizer might choose a Nested Loops join when one of the joining tables is small (considered as the outer table) and another one is large (considered as the inner table which is indexed on the column that is in the join) and hence it requires minimal I/O and the fewest comparisons.

The optimizer considers three variants for a Nested Loops join:

naive nested loops join in which case the search scans the whole table or index
index nested loops join when the search can utilize an existing index to perform lookups
temporary index nested loops join if the optimizer creates a temporary index as part of the query plan and destroys it after query execution completes.

An index nested loops join performs better than a merge join or hash join if a small set of rows are involved. Whereas, if a large set of rows are involved the Nested Loops join might not be an optimal choice. Nested Loops support almost all join types except right and full outer joins, right semi-join and right anti-semi join.

In Script #1, I am joining the SalesOrderHeader table with SalesOrderDetail table and specifying the criteria to filter the result of the customer with CustomerID = 670. This filtered criteria returns 12 records from the SalesOrderHeader table and hence being the smaller one, this table has been considered as the outer table (top one in the graphical query execution plan) by the optimizer. For each row of these 12 rows of the outer table, rows from the inner table are matched (or the inner table is scanned 12 times each time for each row using the index seek or correlated parameter from the outer table) and 312 matching rows are returned as you can see in the second image. In the second query below, I am using SET STATISTICS PROFILE ON to display profile information of the query execution along with the query result-set.

Script #1 - Nested Loops Join

Script #1 - Nested Loops Join
`SELECT H.CustomerID, H.SalesOrderID, D.ProductID, D.LineTotal FROM Sales.SalesOrderHeader H INNER JOIN Sales.SalesOrderDetail D ON H.SalesOrderID = D.SalesOrderID WHERE H.CustomerID = 670` `SET STATISTICS PROFILE ON SELECT H.CustomerID, H.SalesOrderID, D.ProductID, D.LineTotal FROM Sales.SalesOrderHeader HINNER JOIN Sales.SalesOrderDetail D ON H.SalesOrderID = D.SalesOrderIDWHERE H.CustomerID = 670SET STATISTICS PROFILE OFF`

SELECT H.CustomerID, H.SalesOrderID, D.ProductID, D.LineTotal FROM Sales.SalesOrderHeader H INNER JOIN Sales.SalesOrderDetail D ON H.SalesOrderID = D.SalesOrderID WHERE H.CustomerID = 670

SET STATISTICS PROFILE ON
SELECT H.CustomerID, H.SalesOrderID, D.ProductID, D.LineTotal FROM Sales.SalesOrderHeader HINNER JOIN Sales.SalesOrderDetail D ON H.SalesOrderID = D.SalesOrderIDWHERE H.CustomerID = 670SET STATISTICS PROFILE OFF

If the number of records involved is large, SQL Server might choose to parallelize a nested loop by distributing the outer table rows randomly among the available Nested Loops threads dynamically. It does not apply the same for the inner table rows though. To learn more about parallel scans click here.

Merge Join

The first thing that you need to know about a Merge join is that it requires both inputs to be sorted on join keys/merge columns (or both input tables have clustered indexes on the column that joins the tables) and it also requires at least one equijoin (equals to) expression/predicate.

Because the rows are pre-sorted, a Merge join immediately begins the matching process. It reads a row from one input and compares it with the row of another input. If the rows match, that matched row is considered in the result-set (then it reads the next row from the input table, does the same comparison/match and so on) or else the lesser of the two rows is ignored and the process continues this way until all rows have been processed..

A Merge join performs better when joining large input tables (pre-indexed / sorted) as the cost is the summation of rows in both input tables as opposed to the Nested Loops where it is a product of rows of both input tables. Sometimes the optimizer decides to use a Merge join when the input tables are not sorted and therefore it uses an explicit sort physical operator, but it might be slower than using an index (pre-sorted input table).

In Script #2, I am using a similar query as above, but this time I have added a WHERE clause to get all customer greater than 100. In this case, the optimizer decides to use a Merge join as both inputs are large in terms of rows and they are also pre-indexed/sorted. You can also notice that both inputs are scanned only once as opposed to the 12 scans we saw in the Nested Loops join above.

Script #2 - Merge Join

Script #2 - Merge Join
`SELECT H.CustomerID, H.SalesOrderID, D.ProductID, D.LineTotal FROM Sales.SalesOrderHeader H INNER JOIN Sales.SalesOrderDetail D ON H.SalesOrderID = D.SalesOrderID WHERE H.CustomerID > 100` `SET STATISTICS PROFILE ON SELECT H.CustomerID, H.SalesOrderID, D.ProductID, D.LineTotal FROM Sales.SalesOrderHeader HINNER JOIN Sales.SalesOrderDetail D ON H.SalesOrderID = D.SalesOrderIDWHERE H.CustomerID > 100SET STATISTICS PROFILE OFF`

SELECT H.CustomerID, H.SalesOrderID, D.ProductID, D.LineTotal FROM Sales.SalesOrderHeader H INNER JOIN Sales.SalesOrderDetail D ON H.SalesOrderID = D.SalesOrderID WHERE H.CustomerID > 100

SET STATISTICS PROFILE ON
SELECT H.CustomerID, H.SalesOrderID, D.ProductID, D.LineTotal FROM Sales.SalesOrderHeader HINNER JOIN Sales.SalesOrderDetail D ON H.SalesOrderID = D.SalesOrderIDWHERE H.CustomerID > 100SET STATISTICS PROFILE OFF

A Merge join is often a more efficient and faster join operator if the sorted data can be obtained from an existing B-tree index and it performs almost all join operations as long as there is at least one equality join predicate involved. It also supports multiple equality join predicates as long as the input tables are sorted on all joining keys involved and are in the same order.

The presence of a Compute Scalar operator indicates the evaluation of an expression to produce a computed scalar value. In the above query I am selecting LineTotal which is a derived column, hence it has been used in the execution plan.

Hash Join

A Hash join is normally used when input tables are quite large and no adequate indexes exist on them. A Hash join is performed in two phases; the Build phase and the Probe phase and hence the hash join has two inputs i.e. build input and probe input. The smaller of the inputs is considered as the build input (to minimize the memory requirement to store hash table discussed later) and obviously the other one is the probe input.

During the build phase, joining keys of all the rows of the build table are scanned. Hashes are generated and placed in an in-memory hash table. Unlike the Merge join, it's blocking (no rows are returned) until this point.

During the probe phase, joining keys of each row of the probe table are scanned. Again hashes are generated (using the same hash function as above) and compared against the corresponding hash table for a match.

A Hash function requires significant amount of CPU cycles to generate hashes and memory resources to store the hash table. If there is memory pressure, some of the partitions of the hash table are swapped to tempdb and whenever there is a need (either to probe or to update the contents) it is brought back into the cache. To achieve high performance, the query optimizer may parallelize a Hash join to scale better than any other join, for more details click here.

There are basically three different types of hash joins,

In-memory Hash Join in which case enough memory is available to store the hash table
Grace Hash Join in which case the hash table can not fit in memory and some partitions are spilled to tempdb
Recursive Hash Join in which case a hash table is so large the optimizer has to use many levels of merge joins. For more details about these different types click here.

In Script #3, I am creating two new large tables (from the existing AdventureWorks tables) without indexes. You can see the optimizer chose to use a Hash join in this case. Again unlike a Nested Loops join, it does not scan the inner table multiple times.

Script #3 - Hash Join

Script #3 - Hash Join
`--Create tables without indexes from existing tables --of AdventureWorks database SELECT * INTO Sales.SalesOrderHeader1 FROM Sales.SalesOrderHeader SELECT * INTO Sales.SalesOrderDetail1 FROM Sales.SalesOrderDetail GO SELECT H.CustomerID, H.SalesOrderID, D.ProductID, D.LineTotal FROM Sales.SalesOrderHeader1 H INNER JOIN Sales.SalesOrderDetail1 D ON H.SalesOrderID = D.SalesOrderID WHERE H.CustomerID = 670 GO` `SET STATISTICS PROFILE ON SELECT H.CustomerID, H.SalesOrderID, D.ProductID, D.LineTotal FROM Sales.SalesOrderHeader1 H INNER JOIN Sales.SalesOrderDetail1 D ON H.SalesOrderID = D.SalesOrderID WHERE H.CustomerID = 670 SET STATISTICS PROFILE OFF` `--Drop the tables created for demonstration DROP TABLE Sales.SalesOrderHeader1 DROP TABLE Sales.SalesOrderDetail1`

--Create tables without indexes from existing tables
--of AdventureWorks database SELECT * INTO Sales.SalesOrderHeader1 FROM Sales.SalesOrderHeader SELECT * INTO Sales.SalesOrderDetail1 FROM Sales.SalesOrderDetail 
GO SELECT H.CustomerID, H.SalesOrderID, D.ProductID, D.LineTotal FROM Sales.SalesOrderHeader1 H INNER JOIN Sales.SalesOrderDetail1 D ON H.SalesOrderID = D.SalesOrderID WHERE H.CustomerID = 670 
GO

SET STATISTICS PROFILE ON 
SELECT H.CustomerID, H.SalesOrderID, D.ProductID, D.LineTotal FROM Sales.SalesOrderHeader1 H INNER JOIN Sales.SalesOrderDetail1 D ON H.SalesOrderID = D.SalesOrderID WHERE H.CustomerID = 670 SET STATISTICS PROFILE OFF

--Drop the tables created for demonstration DROP TABLE Sales.SalesOrderHeader1 DROP TABLE Sales.SalesOrderDetail1

Note: SQL Server does a pretty good job in deciding which join operator to use in each condition. Understanding these condition helps you to understand what can be done in performance tuning. It's not recommended to use join hints (using OPTION clause) to force SQL Server to use a specific join operator (unless you have no other way out), but rather you can use other means like updating statistics, creating indexes or re-writing your query.

Wednesday 11 May 2016

What is Index Scans/Table Scans and What is Bookmark/Key/RID lookup and How is Discovering Unused Indexes?

O and also take longer to process. This is something you will notice with an application that grows over time. When it is first released performance is great, but over time as more data is added the index scans take longer and longer to complete.

To find these issues you can start by running Profiler or setting up a server side trace and look for statements that have high read values. Once you have identified the statements then you can look at the query plan to see if there are scans occurring.

Here is a simple query that we can run. First use Ctrl+M to turn on the actual execution plan and then execute the query.

SELECT * FROM Person.Contact

Here we can see that this query is doing a Clustered Index Scan. Since this table has a clustered index and there is not a WHERE clause SQL Server scans the entire clustered index to return all rows. So in this example there is nothing that can be done to improve this query.

In this next example I created a new copy of the Person.Contact table without a clustered index and then ran the query.

SELECT * FROM Person.Contact2

Here we can see that this query is doing a Table Scan, so when a table has a Clustered Index it will do a Clustered Index Scan and when the table does not have a clustered index it will do a Table Scan. Since this table does not have a clustered index and there is not a WHERE clause SQL Server scans the entire table to return all rows. So again in this example there is nothing that can be done to improve this query.

In this next example we include a WHERE clause for the query.

SELECT * FROM Person.Contact WHERE LastName = 'Russell'

Here we can see that we still get the Clustered Index Scan, but this time SQL Server is letting us know there is a missing index. If you right click on the query plan and select Missing Index Details you will get a new window with a script to create the missing index.

Let's do the same thing for our Person.Contact2 table.

SELECT * FROM Person.Contact2 WHERE LastName = 'Russell'

We can see that we still have the Table Scan, but SQL Server doesn't offer any suggestions on how to fix this.

Another thing you could do is use the Database Engine Tuning Advisor to see if it gives you any suggestions. If I select the query in SSMS, right click and select Analyze Query in Database Engine Tuning Advisor the tools starts up and I can select the options and start the analysis.

Below is the suggestion this tool provides and we can see that recommends creating a new index, so you can see that using both tools can be beneficial.

Create New Index

So let's create the recommended index on Person.Contact and run the query again.

USE [AdventureWorks]
GO
CREATE NONCLUSTERED INDEX [IX_LastName]
ON [Person].[Contact] ([LastName])
GO
SELECT * FROM Person.Contact WHERE LastName = 'Russell'

Here we can see the query plan has changed and instead of a Clustered Index Scan we now have an Index Seek which is much better. We can also see that there is now a Key Lookup operation which we will talk about in the next section.

Summary

By finding and fixing your Index Scans and Table Scans you can drastically improve performance especially for larger tables. So take the time to identify where your scans may be occurring and create the necessary indexes to solve the problem. One thing that you should be aware of is thattoo many indexes also causes issues, so make sure you keep a balance on how many indexes you create for a particular table.

Eliminating bookmark /Key & Rid lookups

In the above scenario, if table has clustered index, it is called bookmark lookup (or key lookup); if the table does not have clustered index, but a non-clustered index, it is called RID lookup. This operation is very expensive. To optimize any query containing bookmark lookup or RID lookup, it should be removed from the execution plan to improve performance. There are two different ways to remove bookmark/RID lookup.

The reason you would want to eliminate Key/RID Lookups is because they require an additional operation to find the data and may also require additional I/O. I/O is one of the biggest performance hits on a server and any way you can eliminate or reduce I/O is a performance gain.

So let's take a look at an example query and the query plan. Before we do this we want to first add the nonclustered index on LastName.

USE [AdventureWorks]
GO
CREATE NONCLUSTERED INDEX [IX_LastName]
ON [Person].[Contact] ([LastName])
GO

Now we can use Ctrl+M to turn on the actual execution plan and run the select.

SELECT * FROM Person.Contact WHERE LastName = 'Russell'

If we look at the execution plan we can see that we have an Index Seek using the new index, but we also have a Key Lookup on the clustered index. The reason for this is that the nonclustered index only contains the LastName column, but since we are doing a SELECT * the query has to get the other columns from the clustered index and therefore we have a Key Lookup. The other operator we have is the Nested Loops this joins the results from the Index Seek and the Key Lookup.

So if we change the query as follows and run this again you can see that the Key Lookup disappears, because the index includes all of the columns.

SELECT LastName FROM Person.Contact WHERE LastName = 'Russell'

Here we can see that we no longer have a Key Lookup and we also no longer have the Nested Loops operator.

If we run both of these queries at the same time in one batch we can see the improvement by removing these two operators.

SELECT * FROM Person.Contact WHERE LastName = 'Russell'
SELECT LastName FROM Person.Contact WHERE LastName = 'Russell'

Below we can see that the first statement takes 99% of the batch and the second statement takes 1%, so this is a big improvement.

query plan with index seek and key lookup

This should make sense that since the index includes LastName and that is the only column that is being used for both the SELECTed columns and the WHERE clause the index can handle the entire query. Another thing to be aware of is that if the table has a clustered index we can include the clustered index column or columns as well without doing a Key Lookup.

The Person.Contact table has a clustered index on ContactID, so if we include this column in the query we can still do just an Index Seek.

SELECT ContactID, LastName FROM Person.Contact WHERE LastName = 'Russell'

Here we can see that we only need to do an Index Seek to include both of these columns.

So that's great if that is all you need, but what if you need to include other columns such as FirstName. If we change the query as follows then the Key Lookup comes back again.

SELECT FirstName, LastName FROM Person.Contact WHERE LastName = 'Russell'

Luckily there are a few options to handle this.

Creating a Covering Index

A covering index basically does what it implies, it covers the query by including all of the columns that are needed. So if our need is to always include FirstName and LastName we can modify our index as follows to include both LastName and FirstName.

DROP INDEX [IX_LastName] ON [Person].[Contact]
GO
CREATE NONCLUSTERED INDEX [IX_LastName]
ON [Person].[Contact] ([LastName], [FirstName])
GO

And if we look at the execution plan we can see that we eliminated the Key Lookup once again.

Creating an Index with Included Columns

Another option is to use the included columns feature for an index. This allows you to include additional columns so they are stored with the index, but are not part of the index tree. So this allows you to take advantage of the features of a covering index and reduces storage needs within the index tree. Another benefit is that you can include additional data types that can not be part of a covering index.

The syntax for the the index with included columns is as follows:

DROP INDEX [IX_LastName] ON [Person].[Contact]
GO
CREATE NONCLUSTERED INDEX [IX_LastName]
ON [Person].[Contact] ([LastName]) 
INCLUDE ([FirstName])
GO

Here we can see the exuection plan is the same for both options.

query plan for index with included columns

Another Examples to

Remove Bookmark Lookup (or) Key Lookup(Table have Clustered index):

When a small number of rows are requested by a query, the SQL Server optimizer will try to use a non-clustered index on the column or columns contained in the WHERE clause to retrieve the data requested by the query. If the query requests data from columns not present in the non-clustered index, SQL Server must go back to the data pages to get the data in those columns. Even if the table has a clustered index or not, the query will still have to return to the table or clustered index to retrieve the data.

In the above scenario, if table has clustered index, it is called bookmark lookup (or key lookup); if the table does not have clustered index, but a non-clustered index, it is called RID lookup. This operation is very expensive. To optimize any query containing bookmark lookup or RID lookup, it should be removed from the execution plan to improve performance. There are two different ways to remove bookmark/RID lookup.

let us understand the difference between Seek Predicate and Predicate. Seek Predicate is the operation that describes the b-tree portion of the Seek. Predicate is the operation that describes the additional filter using non-key columns. Based on the description, it is very clear that Seek Predicate is better than Predicate as it searches indexes whereas in Predicate, the search is on non-key columns – which implies that the search is on the data in page files itself.

Let us see an example where we will remove the bookmark lookup first using the covering index. On removing the bookmark lookup, it resulted in Index Scan, which is not good for performance. When Index Scan is converted to Index Seek, it provides a significant improvement in performance.

Run following SELECT, which is based on the database AdventureWorks.

USE AdventureWorks
GO-- CTRL + M Enforces Key Lookup
-- Try 1SELECT NationalIDNumber, HireDate, MaritalStatusFROM HumanResources.EmployeeWHERE NationalIDNumber = 14417807
GO

Let us check the execution plan for the same. The execution plan consists of the key lookup because there are columns that we are trying to retrieve in SELECT as well in WHERE clause.

SQL SERVER - Removing Key Lookup - Seek Predicate - Predicate - An Interesting Observation Related to Datatypes klseek1

Let us create a covering index on this table HumanResources.Employee.

-- Create Non clustered IndexCREATE NONCLUSTERED INDEX [IX_HumanResources_Employee_Example] ON HumanResources.Employee(NationalIDNumber ASC, HireDate, MaritalStatus) ON [PRIMARY] GO

After creating above index, let us run the same SELECT statement again.

-- CTRL + M Removes Key Lookup, but it still enforces Index Scan
-- Try 2SELECT NationalIDNumber, HireDate, MaritalStatusFROM HumanResources.EmployeeWHERE NationalIDNumber = 14417807
GO

Due to non-clustered index, Key Lookup is removed along with Nested Loops but Index Scan still remains, and this is not good for performance.

SQL SERVER - Removing Key Lookup - Seek Predicate - Predicate - An Interesting Observation Related to Datatypes klseek2

I tried to remove the scan, but my attempts were unsuccessful. I finally looked at the datatype of the NationalIDNumber. All this time, I was assuming that this datatype is INT, but on a careful check, I found that the datatype of NationalIDNumber is nvarchar(15).

SQL SERVER - Removing Key Lookup - Seek Predicate - Predicate - An Interesting Observation Related to Datatypes klseekdt

In the SELECT statement, we were comparing the datatype of NVARCHAR to INT, and this was forcing thepredicate operation while executing the query.

SQL SERVER - Removing Key Lookup - Seek Predicate - Predicate - An Interesting Observation Related to Datatypes klseekpre

As discussed earlier, due to predicate operation, there has to be explicit conversion on the side of NationalIDNumber, which forces the query optimizer to not use the index and instead it has to scan the complete data in table to get the necessary data. This is not the desired solution. Index Scan reduces performance. The reason for this conversion is because I am using INT value in WHERE clause instead of NVARCHAR.

I changed my WHERE clause and passed STRING as the parameter instead of INT.

-- CTRL + M Removes Key Lookup and it enforces Index Seek
-- Try 3SELECT NationalIDNumber, HireDate, MaritalStatusFROM HumanResources.EmployeeWHERE NationalIDNumber = '14417807'GO

After running the query with the changed WHERE clause, the Index Scan is now converted into Index Seek.

SQL SERVER - Removing Key Lookup - Seek Predicate - Predicate - An Interesting Observation Related to Datatypes klseek3

Index Seek is definitely the most optimal solution in this particular scenario.

SQL SERVER - Removing Key Lookup - Seek Predicate - Predicate - An Interesting Observation Related to Datatypes klseekpre1

When the detail execution plan was checked, I found the following two notable points. First, the predicate is converted to seek predicate, which is the reason for performance improvement, as described earlier. Instead of scanning data in the table, Index Seek is performed. Second, as the datatype of the NationalIDNumber is NVARCHAR and the parameters are passed as VARCHAR, the conversion happens on the parameters instead of NationalIDNumber column, and this forces Index Scan to Index Seek.

If we pass the parameter as NVARCHAR instead of VARCHAR, the execution plan will remain the same, but CONVERT_IMPLICIT will not be required any more. Let us run the following query, which has NVARCHAR as the parameter.

-- CTRL + M Removes Key Lookup and it enforces Index Seek and no CONVERT_IMPLICIT
-- Try 4SELECT NationalIDNumber, HireDate, MaritalStatusFROM HumanResources.EmployeeWHERE NationalIDNumber = N'14417807'GO

The execution plan of the above query is very similar to that in which we had passed the parameter as VARCHAR.

Now let us check the execution plan for the same.

SQL SERVER - Removing Key Lookup - Seek Predicate - Predicate - An Interesting Observation Related to Datatypes klseekpre2

In the WHERE condition, the operators we have on both the sides of “=” are of NVARCHAR. NationalIDNumber and parameter passed – both are NVARCHAR, which has removed CONVERT_IMPLICIT. However, there are no changes in the execution plan.

In summary, when Key Lookup is removed and index seek replaces index scan, the performance is tuned up. Let us quickly compare the execution plan of the above four options. I have included the complete code here for easy reference.

USE AdventureWorks
GO-- CTRL + M Enforces Key Lookup
-- Try 1SELECT NationalIDNumber, HireDate, MaritalStatusFROM HumanResources.EmployeeWHERE NationalIDNumber = 14417807
GO-- Create Non clustered IndexCREATE NONCLUSTERED INDEX [IX_HumanResources_Employee_Example] ON HumanResources.Employee(NationalIDNumber ASC, HireDate, MaritalStatus) ON [PRIMARY] GO--WAITFOR DELAY '00:00:30'
-- CTRL + M Removes Key Lookup, but it still enforces Index Scan
-- Try 2SELECT NationalIDNumber, HireDate, MaritalStatusFROM HumanResources.EmployeeWHERE NationalIDNumber = 14417807
GO-- CTRL + M Removes Key Lookup and it enforces Index Seek
-- Try 3SELECT NationalIDNumber, HireDate, MaritalStatusFROM HumanResources.EmployeeWHERE NationalIDNumber = '14417807'GO-- CTRL + M Removes Key Lookup and it enforces Index Seek and no CONVERT_IMPLICIT
-- Try 4SELECT NationalIDNumber, HireDate, MaritalStatusFROM HumanResources.EmployeeWHERE NationalIDNumber = N'14417807'GO/* What is the reason for difference between Try 2 and Try 3?
Check the exeuction plan
*/
-- Drop IndexDROP INDEX [IX_HumanResources_Employee_Example] ON HumanResources.Employee WITH ( ONLINE = OFF )GO

Let us look at the execution plan.

SQL SERVER - Removing Key Lookup - Seek Predicate - Predicate - An Interesting Observation Related to Datatypes klseek

Summary points:

In general, Index Seek is better than Index Scan (I am not talking about it depends conditions)
Understand Predicates and Seek Predicates and see if you have only Seek Predicates.
In case of key lookup or bookmark lookup, see if you can create a covering index or included column index.
Use the datatype wisely even though there is no change in the resultset.

When a small number of rows are requested by a query, the SQL Server optimizer will try to use a non-clustered index on the column or columns contained in the WHERE clause to retrieve the data requested by the query. If the query requests data from columns not present in the non-clustered index, SQL Server must go back to the data pages to get the data in those columns. Even if the table has a clustered index or not, the query will still have to return to the table or clustered index to retrieve the data.

In the above scenario, if table has clustered index, it is called bookmark lookup (or key lookup); if the table does not have clustered index, but a non-clustered index, it is called RID lookup. This operation is very expensive. To optimize any query containing bookmark lookup or RID lookup, it should be removed from the execution plan to improve performance. There are two different ways to remove bookmark/RID lookup.

Before we understand these two methods, we will create sample table without clustered index and simulate RID lookup. RID Lookup is a bookmark lookup on a heap that uses a supplied row identifier (RID).

USE tempdb
GO-- Create Table OneIndex with few columnsCREATE TABLE OneIndex (ID INT,FirstName VARCHAR(100),LastName VARCHAR(100),City VARCHAR(100))GO-- Insert One Hundred Thousand RecordsINSERT INTO OneIndex (ID,FirstName,LastName,City)SELECT TOP 100000 ROW_NUMBER() OVER (ORDER BY a.name) RowID,'Bob',CASE WHEN ROW_NUMBER() OVER (ORDER BY a.name)%2 = 1 THEN 'Smith'ELSE 'Brown' END,CASEWHEN ROW_NUMBER() OVER (ORDER BY a.name)%1000 = 1 THEN 'Las Vegas'WHEN ROW_NUMBER() OVER (ORDER BY a.name)%10 = 1 THEN 'New York'WHEN ROW_NUMBER() OVER (ORDER BY a.name)%10 = 5 THEN 'San Marino'WHEN ROW_NUMBER() OVER (ORDER BY a.name)%10 = 3 THEN 'Los Angeles'ELSE 'Houston' END
FROM sys.all_objects aCROSS JOIN sys.all_objects b
GO

Now let us run following select statement and check the execution plan.

SELECT ID, FirstNameFROM OneIndexWHERE City = 'Las Vegas'GO

SQL SERVER - Query Optimization - Remove Bookmark Lookup - Remove RID Lookup - Remove Key Lookup keylook1

As there is no index on table, scan is performed over the table. We will create a clustered index on the table and check the execution plan once again.

-- Create Clustered IndexCREATE CLUSTERED INDEX [IX_OneIndex_ID] ON [dbo].[OneIndex] ([ID] ASC) ON [PRIMARY] GO

Now, run following select on the table once again.

SELECT ID, FirstNameFROM OneIndexWHERE City = 'Las Vegas'GO

SQL SERVER - Query Optimization - Remove Bookmark Lookup - Remove RID Lookup - Remove Key Lookup keylook2

It is clear from execution plan that as a clustered index is created on the table, table scan is now converted to clustered index scan. In either case, base table is completely scanned and there is no seek on the table.

Now, let us see the WHERE clause of our table. From our basic observation, if we create an index on the column that contains the clause, a performance improvement may be obtained. Let us create non-clustered index on the table and then check the execution plan.

-- Create Index on Column City As that is used in where conditionCREATE NONCLUSTERED INDEX [IX_OneIndex_City] ON [dbo].[OneIndex] ([City] ASC) ON [PRIMARY] GO

After creating the non-clustered index, let us run our select statement again and check the execution plan.

SELECT ID, FirstNameFROM OneIndexWHERE City = 'Las Vegas'GO

SQL SERVER - Query Optimization - Remove Bookmark Lookup - Remove RID Lookup - Remove Key Lookup keylook3

As we have an index on the WHERE clause, the SQL Server query execution engine uses the non-clustered index to retrieve data from the table. However, the columns used in the SELECT clause are still not part of the index, and to display those columns, the engine will have to go to the base table again and retrieve those columns. This particular behavior is known as bookmark lookup or key lookup.

There are two different methods to resolve this issue. I have demonstrated both the methods together; however, it is recommended that you use any one of these methods for removing key lookup. I prefer Method 2.

Method 1: Creating non-clustered cover index.

In this method, we will create non-clustered index containing the columns, which are used in the SELECT statement, along with the column which is used in the WHERE clause.

CREATE NONCLUSTERED INDEX [IX_OneIndex_Cover] ON [dbo].[OneIndex] (City, FirstName, ID) ON [PRIMARY] GO

Once the above non-clustered index, which covers all the columns in query, is created, let us run the following SELECT statement and check our execution plan.

SELECT ID, FirstNameFROM OneIndexWHERE City = 'Las Vegas'GO

From the execution plan, we can confirm that key lookup is removed the only index seek is happening. As there is no key lookup, the SQL Server query engine does not have to go to retrieve the data from data pages and it obtains all the necessary data from index itself.

SQL SERVER - Query Optimization - Remove Bookmark Lookup - Remove RID Lookup - Remove Key Lookup keylook4

Method 2: Creating an included column non-clustered index.

Here, we will create non-clustered index that also includes the columns, which are used in the SELECT statement, along with the column used in the WHERE clause. In this method, we will use new syntax introduced in SQL Server 2005. An index with included nonkey columns can significantly improve query performance when all columns in the query are included in the index.

CREATE NONCLUSTERED INDEX [IX_OneIndex_Include] ON [dbo].[OneIndex] (City) INCLUDE (FirstName,ID) ON [PRIMARY] GO

From the execution plan, it is very clear that this method also removes the key lookup as well.

SQL SERVER - Query Optimization - Remove Bookmark Lookup - Remove RID Lookup - Remove Key Lookup keylook5

In summary, Key lookup, Bookmark lookup or RID lookup reduces the performance of query, and we can improve the performance of query by using included column index or cover index.

Discovering Unused Indexes

When SQL Server 2005 was introduced it added Dynamic Management Views (DMVs) that allow you to get additional insight as to what is going on within SQL Server. One of these areas is the ability to see how indexes are being used. There are two DMVs that we will discuss. Note that these views store cumulative data, so when SQL Server is restated the counters go back to zero, so be aware of this when monitoring your index usage.

DMV - sys.dm_db_index_operational_stats

This DMV allows you to see insert, update and delete information for various aspects for an index. Basically this shows how much effort was used in maintaining the index based on data changes.

If you query the table and return all columns, the output may be confusing. So the query below focuses on a few key columns. To learn more about the output for all columns you can check out Books Online.

SELECT OBJECT_NAME(A.[OBJECT_ID]) AS [OBJECT NAME], 
       I.[NAME] AS [INDEX NAME], 
       A.LEAF_INSERT_COUNT, 
       A.LEAF_UPDATE_COUNT, 
       A.LEAF_DELETE_COUNT 
FROM   SYS.DM_DB_INDEX_OPERATIONAL_STATS (db_id(),NULL,NULL,NULL ) A 
       INNER JOIN SYS.INDEXES AS I 
         ON I.[OBJECT_ID] = A.[OBJECT_ID] 
            AND I.INDEX_ID = A.INDEX_ID 
WHERE  OBJECTPROPERTY(A.[OBJECT_ID],'IsUserTable') = 1

Below we can see the number of Inserts, Updates and Deletes that occurred for each index, so this shows how much work SQL Server had to do to maintain the index.

SYS.DM_DB_INDEX_OPERATIONAL_STATS output

DMV - sys.dm_db_index_usage_stats

This DMV shows you how many times the index was used for user queries. Again there are several other columns that are returned if you query all columns and you can refer to Books Online for more information.

SELECT OBJECT_NAME(S.[OBJECT_ID]) AS [OBJECT NAME], 
       I.[NAME] AS [INDEX NAME], 
       USER_SEEKS, 
       USER_SCANS, 
       USER_LOOKUPS, 
       USER_UPDATES 
FROM   SYS.DM_DB_INDEX_USAGE_STATS AS S 
       INNER JOIN SYS.INDEXES AS I ON I.[OBJECT_ID] = S.[OBJECT_ID] AND I.INDEX_ID = S.INDEX_ID 
WHERE  OBJECTPROPERTY(S.[OBJECT_ID],'IsUserTable') = 1
       AND S.database_id = DB_ID()

Here we can see seeks, scans, lookups and updates.

The seeks refer to how many times an index seek occurred for that index. A seek is the fastest way to access the data, so this is good.
The scans refers to how many times an index scan occurred for that index. A scan is when multiple rows of data had to be searched to find the data. Scans are something you want to try to avoid.
The lookups refer to how many times the query required data to be pulled from the clustered index or the heap (does not have a clustered index). Lookups are also something you want to try to avoid.
The updates refers to how many times the index was updated due to data changes which should correspond to the first query above.

Identifying Unused Indexes

So based on the output above you should focus on the output from the second query. If you see indexes where there are no seeks, scans or lookups, but there are updates this means that SQL Server has not used the index to satisfy a query but still needs to maintain the index. Remember that the data from these DMVs is reset when SQL Server is restarted, so make sure you have collected data for a long enough period of time to determine which indexes may be good candidates to be dropped.

Counter of Festivals

Ashok Blog for SQL Learners and Beginners and Experts

Monday 23 May 2016

Types of Physical Joins in SQL Server?

Types of Physical Joins in SQL Server?

1) Nested Join (small or small and large tables)

2)Merge Join (large should sorted before and proper indexes)

3)Hash Join (large without index)

Nested Loops Join

Merge Join

Hash Join

Wednesday 11 May 2016

What is Index Scans/Table Scans and What is Bookmark/Key/RID lookup and How is Discovering Unused Indexes?

Create New Index

Summary

Eliminating bookmark /Key & Rid lookups

Creating a Covering Index

Creating an Index with Included Columns

Discovering Unused Indexes

DMV - sys.dm_db_index_operational_stats

DMV - sys.dm_db_index_usage_stats

Identifying Unused Indexes

Recent Comments on this Blog

SQL Question

Visitors Location @ Globe

Featured post

How to Verify Database Backup file is not corrupted one in SQL Server? How to Validate Database Backup in SQL Serer?