
Wednesday 2 December 2015

Improvements in Table Variables and Temporary Tables in SQL Server 2014

SQL Server 2014 introduces a number of gems that can make your solutions faster: support for inline index definitions, memory optimized table types and table valued parameters (TVPs), parallel SELECT INTO, relaxed eager writes, and improved cardinality estimates for table variables. The last two improvements were also "backported" to SQL Server 2012. Some of the new features target temporary objects specifically, whereas others are more general and just happen to affect temporary objects as well.
The examples in this article that demonstrate the new features use a sample database called PerformanceV3. You can find the source code to create this database here: http://tsql.solidq.com/books/source_code/PerformanceV3.zip.
After running this script, make sure you set your database context to PerformanceV3:
                              
---------------------------------------------------------------------
-- T-SQL Black-Belt 2015-04, Itzik Ben-Gan
-- Improvements in Table Variables and Temporary Tables in SQL Server 2014
---------------------------------------------------------------------

-- Make sure you have database PerformanceV3 installed
-- Installation script: http://tsql.solidq.com/books/source_code/PerformanceV3.zip

USE PerformanceV3;
GO

In some of the examples I use a helper table function called GetNums, which accepts integer inputs called @low and @high, and returns a sequence of integers in the requested range. Use the code in Listing 1 to create the GetNums function in the sample database.
                              
IF OBJECT_ID(N'dbo.GetNums', N'IF') IS NOT NULL DROP FUNCTION dbo.GetNums;
GO
CREATE FUNCTION dbo.GetNums(@low AS BIGINT, @high AS BIGINT) RETURNS TABLE
AS
RETURN
  WITH
    L0   AS (SELECT c FROM (SELECT 1 UNION ALL SELECT 1) AS D(c)),
    L1   AS (SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B),
    L2   AS (SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B),
    L3   AS (SELECT 1 AS c FROM L2 AS A CROSS JOIN L2 AS B),
    L4   AS (SELECT 1 AS c FROM L3 AS A CROSS JOIN L3 AS B),
    L5   AS (SELECT 1 AS c FROM L4 AS A CROSS JOIN L4 AS B),
    Nums AS (SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS rownum
             FROM L5)
  SELECT TOP(@high - @low + 1) @low + rownum - 1 AS n
  FROM Nums
  ORDER BY rownum;
GO
[Listing 1: GetNums helper function]
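For example, the following call returns the integers 1 through 5:

SELECT n FROM dbo.GetNums(1, 5);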

 

Inline Index Definition

Prior to SQL Server 2014, when it came to indexing, table variables were shortchanged compared to both regular and temporary tables. Once you declare a table variable, you cannot alter its definition. This meant that if you wanted indexes in your table variables, your only option was to define those indirectly via inline PRIMARY KEY and UNIQUE constraint definitions. You couldn’t, for example, define non-unique indexes.
As you probably know, one of the bigger features added in SQL Server 2014 is the In-Memory OLTP engine, with its support for memory optimized tables, hash and Bw-tree indexes, and natively compiled stored procedures. The initial implementation precludes altering the definition of a memory optimized table once you have created it. This restriction required Microsoft to introduce support for inline index definitions as part of the table creation syntax. Since the work was already done in the parser, Microsoft decided to extend the support for this syntax to disk-based tables as well, including table variables. Consequently, in SQL Server 2014, as part of a table variable’s definition, you can define inline non-unique indexes, like so:
                              
DECLARE @T1 AS TABLE
(
  keycol INT NOT NULL
    PRIMARY KEY NONCLUSTERED,
  col1 INT NOT NULL
    INDEX idx_cl_col1 CLUSTERED, -- column index
  col2 INT NOT NULL,
  col3 INT NOT NULL,
  INDEX idx_nc_col2_col3 NONCLUSTERED (col2, col3) -- table index
);
Just like with column level and table level constraints, you can define column level and table level indexes. You will typically use the former syntax when the index has a single key column, as is the case in the above example with the index on col1. You have to use the latter syntax when the index is composite, as is the case with the index on col2 and col3. As you can see, you can indicate whether the index is clustered or nonclustered. Currently, inline indexes do not support the options UNIQUE, INCLUDE and WHERE. The lack of support for the UNIQUE option is no big deal, since you can always define a PRIMARY KEY or UNIQUE constraint, which creates a unique index under the covers. Hopefully, we will see support for the INCLUDE and WHERE options in the future.
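For example, if you need both unique and non-unique indexes in the same table variable, you can combine constraints with inline index definitions. Here's a minimal sketch (the names @T2 and idx_col2 are illustrative):

DECLARE @T2 AS TABLE
(
  keycol INT NOT NULL PRIMARY KEY NONCLUSTERED,  -- unique index via a constraint
  col1 INT NOT NULL UNIQUE,                      -- another unique index via a constraint
  col2 INT NOT NULL INDEX idx_col2 NONCLUSTERED  -- non-unique inline index
);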

Memory optimized table types and TVPs

SQL Server 2008 introduced support for table types and table valued parameters (TVPs). Prior to SQL Server 2014, a table variable based on a table type was always represented as a set of pages in tempdb. With SQL Server 2014’s support for memory optimized tables and natively compiled stored procedures as part of the In-Memory OLTP feature, Microsoft also added support for memory optimized table types and TVPs. The original thinking was to allow you to declare a table variable of a memory optimized table type, fill it with rows, and pass it as a TVP to a natively compiled procedure. However, nothing prevents you from creating table variables based on memory optimized table types and using them for other purposes, including passing them as TVPs to regular procedures. This way you can leverage the performance benefits of the memory optimized structures and avoid the disk-based representation in tempdb. Just bear in mind that in the initial implementation of the In-Memory OLTP feature in SQL Server 2014, using a memory optimized table in a query inhibits parallelism. So make sure you do some testing to compare disk-based table types and TVPs with memory optimized ones, to decide which works better for you.
To use this feature, you need to add to your database a filegroup for memory optimized data (marked as CONTAINS MEMORY_OPTIMIZED_DATA) and a container pointing to a folder in the file system, similar to what you use with the FILESTREAM feature. The parent folder must already exist and the child folder must not exist when you add the container. So before you execute the following code, make sure that the folder C:\IMOLTP\ exists and that the folder C:\IMOLTP\PerformanceV3_dir\ doesn't exist (or replace these with folder names of your preference):
                              
ALTER DATABASE PerformanceV3
  ADD FILEGROUP PerformanceV3_MO CONTAINS MEMORY_OPTIMIZED_DATA;
GO
ALTER DATABASE PerformanceV3
  ADD FILE ( NAME = PerformanceV3_dir,
             FILENAME = 'C:\IMOLTP\PerformanceV3_dir' )
    TO FILEGROUP PerformanceV3_MO;
GO
To create a memory optimized table type, add the option MEMORY_OPTIMIZED = ON to the type definition. As an example, the following code creates a table type called OrderIDs that represents a set of order IDs:
                              
IF TYPE_ID(N'dbo.OrderIDs') IS NOT NULL DROP TYPE dbo.OrderIDs;

CREATE TYPE dbo.OrderIDs AS TABLE
(
  orderid INT NOT NULL PRIMARY KEY NONCLUSTERED
)
WITH (MEMORY_OPTIMIZED = ON);

A memory optimized table must have at least one index to enable access to the rows in memory. It can be either a Bw-tree index (a lock-free, latch-free variation of a B-tree index) like the one in our example, or a hash index. The former is efficient for range and order-based activities; the latter is efficient for point queries. To make the index in our example a hash index, add HASH WITH (BUCKET_COUNT = <count>) right after the keyword NONCLUSTERED. Microsoft recommends specifying a bucket count that is one to two times the number of distinct values that you expect in the column.
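To illustrate, here is a hash-based variant of the OrderIDs type from above. This is just a sketch; the type name OrderIDsHash and the bucket count of 2048 are illustrative (pick a count based on the number of distinct orderid values you expect):

IF TYPE_ID(N'dbo.OrderIDsHash') IS NOT NULL DROP TYPE dbo.OrderIDsHash;

CREATE TYPE dbo.OrderIDsHash AS TABLE
(
  -- hash index instead of the Bw-tree index used above
  orderid INT NOT NULL
    PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 2048)
)
WITH (MEMORY_OPTIMIZED = ON);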
With the table type defined, you can now use it as a type for table variables, like so:

                              
DECLARE @MyOrderIDs AS dbo.OrderIDs;

INSERT INTO @MyOrderIDs(orderid)
  VALUES(1759), (1760), (1761);

SELECT * FROM @MyOrderIDs;

As mentioned, you can also use a memory optimized table type as a type for TVPs. As an example, the following code creates a procedure that accepts a parameter called @OrderIDs of the OrderIDs type with an input set of order IDs, and returns information about the requested orders:
                              
IF OBJECT_ID(N'dbo.GetOrders', N'P') IS NOT NULL DROP PROC dbo.GetOrders;
GO
CREATE PROC dbo.GetOrders(@OrderIDs AS dbo.OrderIDs READONLY)
AS

SELECT O.orderid, O.orderdate, O.custid, O.empid, O.filler
FROM dbo.Orders AS O
  INNER JOIN @OrderIDs AS I
    ON O.orderid = I.orderid
ORDER BY I.orderid;
GO
Use the following code to execute the procedure with the set of order IDs {1759, 1760, 1761} as input:
                              
DECLARE @MyOrderIDs AS dbo.OrderIDs;

INSERT INTO @MyOrderIDs(orderid)
  VALUES(1759), (1760), (1761);

EXEC dbo.GetOrders @OrderIDs = @MyOrderIDs;
Table 1 shows the output of this execution.
Table 1: Output of GetOrders procedure
orderid     orderdate  custid      empid       filler
----------- ---------- ----------- ----------- -------
1759        2011-01-03 C0000006044 117         a
1761        2011-01-03 C0000014937 111         a
1760        2010-12-24 C0000000402 355         a

When you’re done, run the following code for cleanup:
                              
IF OBJECT_ID(N'dbo.GetOrders', N'P') IS NOT NULL DROP PROC dbo.GetOrders;
IF TYPE_ID(N'dbo.OrderIDs') IS NOT NULL DROP TYPE dbo.OrderIDs;
SQL Server 2014 doesn’t support removing a memory optimized filegroup. To remove it, you will need to drop and recreate the sample database.

Parallel SELECT INTO

Prior to SQL Server 2014, a SELECT INTO statement could not be processed with parallelism. More specifically, the actual insertion part (the Table Insert operator) was always handled in a serial zone. SQL Server 2014 introduces support for parallel processing of SELECT INTO. The beauty of this feature is that it doesn’t require any changes on your part, other than using SQL Server 2014. With parallel processing you can see significant performance improvements. This feature is not limited to temporary tables as the target; it’s just very common to use SELECT INTO to create and populate a temporary table.
As an example, I ran the following code in both SQL Server 2012 and 2014:
                              
IF OBJECT_ID(N'tempdb..#MyOrders', N'U') IS NOT NULL DROP TABLE #MyOrders;
GO
SELECT * INTO #MyOrders FROM PerformanceV3.dbo.Orders;
GO
Figure 1 shows the plan I got in SQL Server 2012 (with the serial Table Insert operator), and Figure 2 shows the plan that I got in SQL Server 2014 (with the parallel Table Insert operator).
Figure 1 (Plan Using SQL Server 2012)

Figure 2 (Plan Using SQL Server 2014)

 

Eager Writes

Thanks to Bob Ward and Paul White for sharing information about this feature.
SQL Server employs a feature called eager writes to ensure that bulk operations don’t flood the memory with new pages at the expense of existing cached pages. The idea is to keep track of a circular list of 32 dirty pages. When the list is full, eager writes flush those pages to disk to free the list for a new set of pages. The problem with this feature is that with short-lived bulk operations in tempdb (e.g., SELECT INTO #TempTable, CREATE INDEX WITH SORT_IN_TEMPDB, and others), you end up with lots of physical disk activity that hurts the performance of the operation. SQL Server 2014 introduces a new behavior that relaxes the eager writes for any page that is written to by a bulk operation and associated with tempdb. For example, if you have a stored procedure that creates and populates a temporary table with a SELECT INTO statement, and finishes quickly, chances are that you won’t have any physical disk activity.
To get information about eager writes behavior, you can enable trace flag 3917 in your session. You will also need to enable trace flag 3604 to direct the output to the client, or 3605 to direct it to the error log.
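Here is a minimal sketch of such a session using those trace flags (the temporary table name #EagerDemo is illustrative):

DBCC TRACEON(3917); -- report eager writes information
DBCC TRACEON(3604); -- route trace output to the client
GO
-- a short-lived bulk operation in tempdb
SELECT * INTO #EagerDemo FROM dbo.Orders;
GO
DROP TABLE #EagerDemo;
DBCC TRACEOFF(3917);
DBCC TRACEOFF(3604);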
The nice thing about this feature is that, like the parallel SELECT INTO improvement, no code changes are required on your part. Things just run faster as long as you use a version and build of the product that supports this improvement. This feature was initially introduced in SQL Server 2014, but was later backported to SQL Server 2012 SP1 CU10 and SQL Server 2012 SP2 CU1.
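If you're not sure which build you're running, you can check with the standard SERVERPROPERTY function:

SELECT SERVERPROPERTY('ProductVersion') AS product_version,
       SERVERPROPERTY('ProductLevel') AS product_level;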
For more information about the eager writes improvement, see the blog entry on this topic from the CSS SQL Server Engineers.

Cardinality Estimates with TF 2453

The last improvement that I’ll discuss in this article concerns table variables and is available in SQL Server 2014 RTM CU3 and in SQL Server 2012 SP2. You can find the support entry describing it here.
It’s a well-known fact that SQL Server doesn’t maintain statistics like histograms and density vectors for table variables. However, SQL Server does maintain a count of rows in the table, which in some cases can go a long way in helping the optimizer make optimal choices. A good example is when you need to store a set of keys like order IDs in a table variable, and then join the table variable with a user table to get data from the related rows. With a small count of values in the table variable, the optimal strategy is to use a serial plan with a Nested Loops join algorithm. With a large count, the optimal strategy is to use a parallel plan with a Hash join algorithm. The thing is, even though SQL Server maintains the row count for table variables, this information is usually not available to the optimizer. That’s because the initial optimization unit is the entire batch and not the single statement (unlike the recompile unit). The optimizer isn’t supposed to run the code that populates the table variable before it optimizes the query; therefore, it has to optimize the query without knowledge of the row count. By default, it will just assume that the table is very small (usually one row).
As an example, consider the following code:
                              
DECLARE @OrderIDs AS TABLE
(
  orderid INT NOT NULL PRIMARY KEY
);

INSERT INTO @OrderIDs(orderid)
  SELECT N.n AS orderid
  FROM dbo.GetNums(900001, 1000000) AS N; -- http://tsql.solidq.com/GetNums.txt

SELECT O.orderid, O.orderdate, O.custid, O.empid, O.filler
FROM @OrderIDs AS K
  INNER JOIN dbo.Orders AS O
    ON O.orderid = K.orderid
--OPTION(RECOMPILE);
The plan for the query is shown in Figure 3. Observe that the estimated number of rows is 1 even though the actual is 100,000. As a result, the optimizer chose a serial plan with a Nested Loops join algorithm.

Figure 3 - Plan with Nested Loops Join

One common solution that people use for this problem is to force SQL Server to recompile the query in every execution of the code by specifying the RECOMPILE query option; in our case, simply uncomment the option in the code. What’s different with this option is that the optimizer gets to optimize the code at the statement level after the table variable has been populated, hence the row count is visible to it. Run the code after uncommenting this option. The plan for the query is shown in Figure 4.
Figure 4 - Plan with Hash Join

Observe that this time the cardinality estimate is accurate and therefore the optimizer chose a parallel plan with a Hash join algorithm.
So with the RECOMPILE query option you do get an efficient plan based on known row count information, but this costs you a recompile in every execution.
Another way to enable the optimizer to know the count of rows in the table variable is to pass it to a stored procedure as an input TVP. The optimizer can tell what the count of rows is since the table variable is populated before it is passed to the stored procedure as a TVP; namely, before optimization starts. To demonstrate this solution first create a table type called OrderIDs by running the following code:
                              
IF TYPE_ID(N'dbo.OrderIDs') IS NOT NULL DROP TYPE dbo.OrderIDs;

CREATE TYPE dbo.OrderIDs AS TABLE
(
  orderid INT NOT NULL PRIMARY KEY NONCLUSTERED
);
Next, create a stored procedure called GetOrders that accepts a TVP of the OrderIDs table type as input, and joins the Orders table with the input TVP to return information about the requested orders:
                              
IF OBJECT_ID(N'dbo.GetOrders', N'P') IS NOT NULL DROP PROC dbo.GetOrders;
GO
CREATE PROC dbo.GetOrders(@OrderIDs AS dbo.OrderIDs READONLY)
AS

SELECT O.orderid, O.orderdate, O.custid, O.empid, O.filler
FROM dbo.Orders AS O
  INNER JOIN @OrderIDs AS I
    ON O.orderid = I.orderid;
GO
Use the following code to declare a table variable of the OrderIDs table type, fill it with 100,000 order IDs, and then call the GetOrders procedure with the table variable as the input TVP:
                              
DECLARE @MyOrderIDs AS dbo.OrderIDs;

INSERT INTO @MyOrderIDs(orderid)
  SELECT N.n AS orderid
  FROM dbo.GetNums(900001, 1000000) AS N;

EXEC dbo.GetOrders @OrderIDs = @MyOrderIDs;
You get the same plan as the one shown earlier in Figure 4. The optimizer makes an accurate cardinality estimate and consequently chooses an efficient plan for the input table size.
But what if you do not want to use a stored procedure with a TVP (e.g., you need to be able to modify the table variable), and you don’t want to force a recompile in every execution of the code? Microsoft introduces a solution in the form of trace flag 2453, which, as mentioned, is available in SQL Server 2014 RTM CU3 and in SQL Server 2012 SP2. When this trace flag is enabled, changes in table variables trigger recompiles for nontrivial plans based on the same thresholds as for other tables. Naturally, this results in fewer recompiles compared to forcing one in every execution of the code. And when a recompile does happen, the row count is visible to the optimizer. Curiously, as the support entry for this trace flag explains, unlike with OPTION (RECOMPILE), this trace flag does not cause a recompile to perform parameter embedding (what the entry refers to as parameter peeking). For details about what parameter embedding is, see the following article by Paul White.
To demonstrate this solution, assuming you’re running on a version and build number that support this feature, run the following code:
                              
DBCC TRACEON(2453); -- use -1 as second parameter for global effect

DECLARE @OrderIDs AS TABLE
(
  orderid INT NOT NULL PRIMARY KEY
);

INSERT INTO @OrderIDs(orderid)
  SELECT N.n AS orderid
  FROM dbo.GetNums(900001, 1000000) AS N;

SELECT O.orderid, O.orderdate, O.custid, O.empid, O.filler
FROM @OrderIDs AS K
  INNER JOIN dbo.Orders AS O
    ON O.orderid = K.orderid;
The code enables trace flag 2453 at the session level (specify -1 as the second parameter to enable it globally), declares a table variable, fills it with 100,000 rows, and then joins the Orders table with the table variable to return the requested orders. A sufficient number of rows is added to the table variable to trigger a statement-level recompile, which in turn allows the optimizer to get the correct table row count. As a result, you get the same efficient plan for this case as the one shown earlier in Figure 4.
To disable the trace flag use the following code:
                              
DBCC TRACEOFF(2453);  -- use -1 as second parameter for global effect

 

Conclusion

This article covered five improvements in SQL Server 2014, two of which were backported to SQL Server 2012. Some of the improvements, like inline index definitions and memory optimized table types and TVPs, are new tools that you will use in new code. Other improvements, like parallel SELECT INTO, relaxed eager writes, and improved cardinality estimates for table variables are internal and don’t require any code changes. You just need to be running on the right version and build, and your solutions automatically start performing better. Both kinds are welcome improvements!
