SQL offers many different ways to produce the same result set, and there is a never-ending debate among SQL developers over the “best way” or the “most efficient way” to write a query. Sometimes these disputes even come to blows… well, I am a lover, not a fighter, so I decided to collect some data to settle which way is best and most efficient.
The following queries produce identical result sets: aggregate payment information from the dbo.payment table for each member who has made more than one payment, along with the member’s first and last name from the dbo.member table.
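For context, here is a minimal sketch of the two tables these queries assume. The column names come straight from the queries themselves; the data types and keys are my guesses, so adjust them to your actual schema:

```sql
-- Hypothetical schema inferred from the queries below; actual types may differ.
CREATE TABLE dbo.member
    (
      member_no INT PRIMARY KEY ,
      lastname VARCHAR(50) ,
      firstname VARCHAR(50)
    ) ;

CREATE TABLE dbo.payment
    (
      payment_no INT IDENTITY PRIMARY KEY ,
      member_no INT REFERENCES dbo.member ( member_no ) ,
      payment_amt MONEY
    ) ;
```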
/*************/
/* Sub Query */
/*************/
SELECT  a.[Member Number] ,
        m.lastname ,
        m.firstname ,
        a.[Number Of Payments] ,
        a.[Average Payment] ,
        a.[Total Paid]
FROM    ( SELECT    member_no 'Member Number' ,
                    AVG(payment_amt) 'Average Payment' ,
                    SUM(payment_amt) 'Total Paid' ,
                    COUNT(Payment_No) 'Number Of Payments'
          FROM      dbo.payment
          GROUP BY  member_no
          HAVING    COUNT(Payment_No) > 1
        ) a
        JOIN dbo.member m ON a.[Member Number] = m.member_no ;
/***************/
/* Cross Apply */
/***************/
SELECT  ca.[Member Number] ,
        m.lastname ,
        m.firstname ,
        ca.[Number Of Payments] ,
        ca.[Average Payment] ,
        ca.[Total Paid]
FROM    dbo.member m
        CROSS APPLY ( SELECT    member_no 'Member Number' ,
                                AVG(payment_amt) 'Average Payment' ,
                                SUM(payment_amt) 'Total Paid' ,
                                COUNT(Payment_No) 'Number Of Payments'
                      FROM      dbo.payment
                      WHERE     member_no = m.member_no
                      GROUP BY  member_no
                      HAVING    COUNT(Payment_No) > 1
                    ) ca ;
/********/
/* CTEs */
/********/
;WITH Payments
    AS ( SELECT   member_no 'Member Number' ,
                  AVG(payment_amt) 'Average Payment' ,
                  SUM(payment_amt) 'Total Paid' ,
                  COUNT(Payment_No) 'Number Of Payments'
         FROM     dbo.payment
         GROUP BY member_no
         HAVING   COUNT(Payment_No) > 1
       ),
    MemberInfo
    AS ( SELECT p.[Member Number] ,
                m.lastname ,
                m.firstname ,
                p.[Number Of Payments] ,
                p.[Average Payment] ,
                p.[Total Paid]
         FROM   dbo.member m
                JOIN Payments p ON m.member_no = p.[Member Number]
       )
SELECT  *
FROM    MemberInfo ;
/************************/
/* SELECT with Grouping */
/************************/
SELECT  p.member_no 'Member Number' ,
        m.lastname ,
        m.firstname ,
        COUNT(Payment_No) 'Number Of Payments' ,
        AVG(payment_amt) 'Average Payment' ,
        SUM(payment_amt) 'Total Paid'
FROM    dbo.payment p
        JOIN dbo.member m ON m.member_no = p.member_no
GROUP BY p.member_no ,
        m.lastname ,
        m.firstname
HAVING  COUNT(Payment_No) > 1 ;
We can see what is going on in SQL Server’s brain by looking at the execution plan. The execution plan shows which steps SQL Server performs, in what order it performs them, and what percentage of the batch’s estimated cost each query accounts for. So… if I execute all 4 of these queries in a single batch, I will get an idea of the relative cost SQL assigns to each of them, and how it renders the execution plan. We can settle this once and for all. Here is what SQL did with these queries:
Not only did the queries carry the same relative cost, SQL generated the same execution plan for each of them. Everybody is right… I guess we can all finally go to lunch together! But wait a second: I may not be a fighter, but I AM an instigator. Let’s see how a table variable stacks up. Here is the code I executed:
/******************/
/* Table Variable */
/******************/
DECLARE @AggregateTable TABLE
    (
      member_no INT ,
      AveragePayment MONEY ,
      TotalPaid MONEY ,
      NumberOfPayments INT   -- COUNT() returns an integer, so INT rather than MONEY
    ) ;

INSERT  @AggregateTable
        SELECT  member_no ,
                AVG(payment_amt) ,
                SUM(payment_amt) ,
                COUNT(Payment_No)
        FROM    dbo.payment
        GROUP BY member_no
        HAVING  COUNT(Payment_No) > 1 ;

SELECT  at.member_no 'Member Number' ,
        m.lastname ,
        m.firstname ,
        at.NumberOfPayments 'Number Of Payments' ,
        at.AveragePayment 'Average Payment' ,
        at.TotalPaid 'Total Paid'
FROM    @AggregateTable at
        JOIN dbo.member m ON m.member_no = at.member_no ;
In the interest of keeping things in groups of 4, I removed the last query from the previous batch and added the table variable query. Here’s what I got:
Since we first insert into the table variable and then read from it, the execution plan renders 2 steps. BUT the combination of those 2 steps is only 22% of the batch: it actually beat the other methods even though it is treated as 2 separate queries in the plan. The argument I often hear against table variables is that SQL estimates only 1 row for the table in the execution plan. While this is true, the estimate does not come into play until you read from the table variable. In this case the table variable held 6,706 rows, and it still outperformed the other queries. People argue that table variables should only be used for small hash or lookup tables. The fact is, you control what you put IN to the variable, so as long as you keep it within reason, these results suggest that a table variable is a viable alternative to a sub-query.
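One aside on that 1-row estimate: if it ever does start to hurt, SQL Server’s `OPTION (RECOMPILE)` query hint forces the statement to be compiled at execution time, when the table variable’s actual row count is known. A sketch, reusing the @AggregateTable from the batch above:

```sql
-- Assumes @AggregateTable has already been declared and populated as above.
-- OPTION (RECOMPILE) recompiles this statement at run time, so the optimizer
-- sees the table variable's actual cardinality instead of the default 1-row guess.
SELECT  at.member_no 'Member Number' ,
        m.lastname ,
        m.firstname ,
        at.NumberOfPayments 'Number Of Payments'
FROM    @AggregateTable at
        JOIN dbo.member m ON m.member_no = at.member_no
OPTION  ( RECOMPILE ) ;
```

The trade-off is a compile on every execution, which is why it is a per-statement hint and not a default.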
If anyone does volume testing on this theory, I would be interested in the results. My suspicion is that there is a breaking point beyond which efficiency goes down the tubes, and it would be interesting to see where that threshold lies.
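For anyone who wants to try, here is a rough harness for that volume test. It is only a sketch under the assumed schema above: raise the row cap between runs and compare the elapsed times that `SET STATISTICS TIME` reports against the equivalent sub-query version.

```sql
-- Rough volume-test harness (assumes the dbo.payment / dbo.member schema above).
-- Raise the TOP cap between runs and watch where elapsed time starts to climb.
SET STATISTICS TIME ON ;

DECLARE @AggregateTable TABLE
    ( member_no INT , TotalPaid MONEY ) ;

INSERT  @AggregateTable
        SELECT TOP ( 100000 )       -- increase this cap on each run
                member_no ,
                SUM(payment_amt)
        FROM    dbo.payment
        GROUP BY member_no ;

SELECT  m.lastname ,
        at.TotalPaid
FROM    @AggregateTable at
        JOIN dbo.member m ON m.member_no = at.member_no ;

SET STATISTICS TIME OFF ;
```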