Ashok Blog for SQL Learners and Beginners and Experts

Wednesday, 6 May 2015

Columnstore indexes in SQL 2012 and Sparse Columns in SQL 2008

Columnstore indexes in SQL 2012:

Columnstore indexes in the SQL Server Database Engine can significantly speed up the processing time of common data warehousing queries.

Ref:

https://msdn.microsoft.com/en-us/library/gg492153(v=sql.110).aspx

Columnstore indexes are available starting with SQL Server 2012; they do not exist in earlier versions.

Sample:

CREATE NONCLUSTERED COLUMNSTORE INDEX csindx_simple
ON SimpleTable
(OrderDateKey, DueDateKey, ShipDateKey);

GO
The following data types can be included in a columnstore index.
  • char and varchar
  • nchar and nvarchar (except varchar(max) and nvarchar(max))
  • decimal (and numeric) (Except with precision greater than 18 digits.)
  • int, bigint, smallint, and tinyint
  • float (and real)
  • bit
  • money and smallmoney
  • All date and time data types (except datetimeoffset with scale greater than 2)
The following data types cannot be included in a columnstore index.
  • binary and varbinary
  • ntext, text, and image
  • varchar(max) and nvarchar(max)
  • uniqueidentifier
  • rowversion (and timestamp)
  • sql_variant
  • decimal (and numeric) with precision greater than 18 digits
  • datetimeoffset with scale greater than 2
  • CLR types (hierarchyid and spatial types)
  • xml
Basic Restrictions
A columnstore index:
  • Cannot have more than 1024 columns.
  • Cannot be clustered. Only nonclustered columnstore indexes are available.
  • Cannot be a unique index.
  • Cannot be created on a view or indexed view.
  • Cannot include a sparse column. (Sparse columns are ordinary columns with optimized storage for null values; for details, see the section "Sparse Columns in SQL 2008" later in this post.)
  • Cannot act as a primary key or a foreign key.
  • Cannot be changed using the ALTER INDEX statement. Drop and re-create the columnstore index instead. (You can use ALTER INDEX to disable and rebuild a columnstore index.)
  • Cannot be created by using the INCLUDE keyword.
  • Cannot include the ASC or DESC keywords for sorting the index. Columnstore indexes are ordered according to the compression algorithms. Sorting would eliminate many of the performance benefits.
Columnstore indexes cannot be combined with the following features:
  • Page and row compression, and vardecimal storage format (A columnstore index is already compressed in a different format.)
  • Replication
  • Change tracking
  • Change data capture
  • Filestream
Table cannot be updated – In SQL Server 2012, a table with a columnstore index cannot be updated. For workarounds, see Best Practices: Updating Data in a Columnstore Index.

To update a table with a columnstore index, drop the columnstore index, perform any required INSERT, DELETE, UPDATE, or MERGE operations, and then re-create the columnstore index.
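As noted in the restrictions above, ALTER INDEX can also disable and rebuild a columnstore index, which avoids repeating the full index definition. The following is a minimal sketch of that workflow, reusing the csindx_simple index and SimpleTable from the earlier sample; the UPDATE statement itself is only an illustrative assumption and is not taken from any real workload.

```sql
-- Disable the columnstore index so the table accepts data changes again.
ALTER INDEX csindx_simple ON SimpleTable DISABLE;
GO

-- Hypothetical data change, shown only for illustration.
UPDATE SimpleTable
SET ShipDateKey = DueDateKey
WHERE ShipDateKey IS NULL;
GO

-- Rebuild the index; the table is read-only again once this completes.
ALTER INDEX csindx_simple ON SimpleTable REBUILD;
GO
```

On large tables the rebuild can take a while, so this pattern fits batch loads better than frequent small updates.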

Choosing Columns for a Columnstore Index

Some of the performance benefit of a columnstore index is derived from the compression techniques that reduce the number of data pages that must be read and manipulated to process the query. Compression works best on character or numeric columns that have large amounts of duplicated values. For example, dimension tables might have columns for postal codes, cities, and sales regions. If many postal codes are located in each city, and if many cities are located in each sales region, then the sales region column would be the most compressed, the city column would have somewhat less compression, and the postal code would have the least compression. Although all columns are good candidates for a columnstore index, adding the sales region code column to the columnstore index will achieve the greatest benefit from columnstore compression, and the postal code will achieve the least.
Dimension table: PostalCodes, Cities, SalesRegions
City table: CityName, PostalCodes
SalesRegions table: SalesRegions, CityName
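One rough way to judge how well each column is likely to compress is to compare its number of distinct values with the total row count: the fewer distinct values, the more duplication there is to compress. The query below is a sketch only. The table name DimGeography and the column names SalesRegion, City, and PostalCode are hypothetical and stand in for whatever dimension table is being examined.

```sql
-- Hypothetical table and column names, used for illustration only.
-- Columns with few distinct values relative to TotalRows hold many
-- duplicated values and are likely to compress best.
SELECT COUNT(*)                    AS TotalRows,
       COUNT(DISTINCT SalesRegion) AS DistinctSalesRegions,
       COUNT(DISTINCT City)        AS DistinctCities,
       COUNT(DISTINCT PostalCode)  AS DistinctPostalCodes
FROM DimGeography;
```

In the example described above, DistinctSalesRegions would be expected to be the smallest number and DistinctPostalCodes the largest.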

Sparse Columns in SQL 2008:


Sparse columns are ordinary columns that have an optimized storage for null values. Sparse columns reduce the space requirements for null values at the cost of more overhead to retrieve non-null values. Consider using sparse columns when the space saved is at least 20 percent to 40 percent.

Sparse columns were introduced in SQL Server 2008.
Storing a null value in a sparse column takes no space, but storing a non-null value in a sparse column takes 4 bytes more than a non-sparse column of the same data type.

SPARSE columns are better at storing NULL values: a NULL in a sparse column takes no space in the database at all. The optimization applies only to NULL, not to zero; a zero is a non-null value and carries the extra overhead described above. A column created with the SPARSE clause that contains mostly NULLs therefore takes less space than a regular column (without the SPARSE clause).
In SQL Server 2008 a regular table can have at most 1,024 columns. A table that defines sparse columns together with a column set (a wide table) can have up to 30,000 columns.
In summary, sparse columns raise the column limit from 1,024 to 30,000 per table.

GO
CREATE TABLE SPARSECOLUMNTABLE
(
 col1 int identity(1,1),
 col2 datetime sparse,
 col3 int sparse
)
CREATE TABLE NONSPARSECOLUMNTABLE
(
 col1 int identity(1,1),
 col2 datetime,
 col3 int
)
GO
INSERT INTO SPARSECOLUMNTABLE VALUES(NULL,NULL)
INSERT INTO NONSPARSECOLUMNTABLE VALUES(NULL,NULL)
GO 25000

Now check the space used by these tables by executing the statements below:
EXEC SP_Spaceused SPARSECOLUMNTABLE 
EXEC SP_Spaceused NONSPARSECOLUMNTABLE
Result:
name                  rows   reserved  data    index_size  unused
SPARSECOLUMNTABLE     25000  392 KB    344 KB  8 KB        40 KB
NONSPARSECOLUMNTABLE  25000  712 KB    656 KB  8 KB        48 KB
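When comparing results like these, it can help to confirm which columns are actually declared as sparse. A small sketch using the sys.columns catalog view, which exposes an is_sparse flag, against the two test tables created above:

```sql
-- List each column of the two test tables and whether it is sparse
-- (is_sparse = 1 for columns declared with the SPARSE clause).
SELECT OBJECT_NAME(object_id) AS table_name,
       name                   AS column_name,
       is_sparse
FROM sys.columns
WHERE object_id IN (OBJECT_ID('SPARSECOLUMNTABLE'),
                    OBJECT_ID('NONSPARSECOLUMNTABLE'))
ORDER BY table_name, column_id;
```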


Script:
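-- Table with regular (non-sparse) columns
CREATE TABLE UnSparsed(ID INT IDENTITY(1,1),
FirstCol INT,
SecondCol VARCHAR(100),
ThirdCol SmallDateTime)
GO
-- Same table, but with the nullable columns declared SPARSE
CREATE TABLE Sparsed(ID INT IDENTITY(1,1),
FirstCol INT SPARSE,
SecondCol VARCHAR(100) SPARSE,
ThirdCol SmallDateTime SPARSE)
GO
-- Insert 50,000 all-NULL rows into each table
DECLARE @idx INT = 0
WHILE @idx < 50000
BEGIN
INSERT INTO UnSparsed VALUES (NULL, NULL, NULL)
INSERT INTO Sparsed VALUES (NULL, NULL, NULL)
SET @idx += 1
END
GO
-- Compare the space used by the two tables
sp_spaceused 'UnSparsed'
GO
sp_spaceused 'Sparsed'
GO
DROP TABLE UnSparsed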
GO
DROP TABLE Sparsed
GO