Statistical Semantic Search uses the indexes that are created by Full-Text Search, and creates additional indexes. As a result of this dependency on full-text search, you create a new semantic index when you define a new full-text index, or when you alter an existing full-text index. You can create a new semantic index by using Transact-SQL statements, or by using the Full-Text Indexing Wizard and other dialog boxes in SQL Server Management Studio
Semantic Search is a powerful new addition to SQL Server 2012. Semantic Search builds upon the full-text index feature and adds the ability for SQL Server to determine key words in documents. Uses can include analysis of text to determine similarity between two documents, extract key words from documents, or find documents that contain key words.
Step-by-step, enabling Semantic Search on a column in SQL Server 2012 involves:
- Install the Full-Text and Semantic Extractions for Search feature
- Install the Microsoft Office 2010 Filter Packs and Service Pack 1
- Install, attach and register the semantic language database
- Create a full-text catalog
- Create a full-text index with the Statistical_Semantics option enabled
Install the Full-Text and Semantic Extractions for Search Feature
If you already enabled the Full-Text and Semantic Extractions for Search feature of the Database Engine during SQL Server 2012 setup, no additional actions are necessary.
If you did not choose to include that feature, you will need to use the SQL Server Installation Center or the installation media to launch SQL Server setup again and select that feature:
but what do you do if you forget to install/enable this option (forget to click tick full text and semantic extractions for search)
scenario:
above pictures SQL 2012 RTM setup file itself semantic full text should be available here if you forget
to install/enable then you just apply patches SP 2(Service Pack) and SP (Service Pack) 3 CU 5
those service pack/hot fix/cumulative updates setup file should not contain any features we should go
to initial setup file to run again to enable feature but it would not affect any db/ version backward
but before we should take backup of all db's and system configuration file and logins
After installation we can check using
SELECT SERVER PROPERTY('Is Full Text Installed')
it shows 1 as installed and 0 as not installed
2. Install the Microsoft Office 2010 Filter Packs and Service Pack 1
The Microsoft Office 2010 Filter Packs contain IFilters (the DLLs used by SQL Server’s full-text indexing to extract text from various file formats, including those from Office 2010, but also other formats) that are not included by default. Without these additional IFilters, SQL Server Full-Text (and by extension, Semantic Search) only understands 50 file formats. After adding the additional filter packs, the number goes up to 157 formats.
The Microsoft Office 2010 Filter Packs can be downloaded from http://www.microsoft.com/download/en/details.aspx?id=17062 and their Service Pack 1 from http://www.microsoft.com/download/en/details.aspx?id=26606 (for 32-bit operating systems) or http://www.microsoft.com/download/en/details.aspx?id=26604 (for 64-bit operating systems).
If you want to index and search PDF files, you will need the Adobe IFilter from http://www.adobe.com/support/downloads/detail.jsp?ftpID=4025.
After installing the filter packs, you will need to execute a few T-SQL commands to actually let SQL Server know that additional IFilters are available:
1
2
3
| EXEC sp_fulltext_service @ action = 'load_os_resources' , @value=1; EXEC sp_fulltext_service 'update_languages' ; EXEC sp_fulltext_service 'restart_all_fdhosts' ; |
After running these commands, you can obtain a list of supported formats using this SQL command:
1
| EXEC sp_help_fulltext_system_components 'filter' |
3. Install, Attach and Register the Semantic Language Database
The semantic analysis performed by SQL Server is essentially statistical analysis of the words in the column contents that have already been indexed by Full-Text Indexing. This statistical analysis relies on base data which is provided as a SQL Server database. This database however is not installed on your system by default; it requires executing a separate installation as well as manually attaching the database files to your SQL Server instance.
The installation is found on the SQL Server installation media in a folder for your processor type:
- For x86, the setup is \x86\Setup\SemanticLanguageDatabase.msi
- For x64, the setup is \x64\Setup\SemanticLanguageDatabase.msi
The setup file will simply extract the MDF and LDF database files to a location of your choice. After the installation, you will attach the database to your SQL Server instance. You would do this like you would any other database file. You can use SSMS or T-SQL script.
Finally, you must register that database as the language statistics database using the following T-SQL command:
1
| EXEC sp_fulltext_semantic_register_language_statistics_db @db name = '_SemanticsDB' |
4. Create a Full-Text Catalog
For each database where you want to perform Full-Text Indexing and semantic search, you must create at least one full-text catalog. This T-SQL command creates a Full-Text Catalog and sets it as the default catalog for new Full-Text Indices:
1
| CREATE FULL TEXT CATALOG Catalog_Name AS DEFAULT |
5. Create a Full-Text Index with the Statistical_Semantics Option Enabled
The last step is to create a Full-Text Index and specify that you want to use statistical semantics also:
1
2
3
4
5
6
7
| CREATE FULL TEXT INDEX ON Table_Name ( Column Name LANGUAGE 1033 Statistical_Semantics ) KEY INDEX PK__PrimaryKey |
This code creates a full-text index in the default full-text catalog for the table with name “Table Name” and includes only one column: Column Name. This code assumes that the column selected for the index contains the actual text data. If the column contains binary data (for example, a PDF file), you must also specify the TYPE COLUMN clause with the name of the column that stores the file extension. This will allow proper selection of the I Filter for that row.
The LANGUAGE clause uses a standard language identifier to indicate the language of the text. In SQL Server 2012, Full-Text Indexing supports 58 languages, but semantic search only supports 15 languages.
The Statistical_Semantics clause enables the statistical semantics analysis on that column. You can enable semantic search on a column-by-column basis, even within a single table.
You can also use the S S M S GUI to create the Full-Text Catalog and Index.
Requirements and Restrictions for Creating a Semantic Index
- You can create an index on any of the database objects that are supported for full-text indexing, including tables and indexed views.
- Before you can enable semantic indexing for specific columns, the following prerequisites must exist:
- A full-text catalog must exist for the database.
- The table must have a full-text index.
- The selected columns must participate in the full-text index.
You can create and enable all these requirements at the same time.
- You can create a semantic index on columns that have any of the data types that are supported for full-text indexing. For more information, see Create and Manage Full-Text Indexes.
- You can specify any document type that is supported for full-text indexing for var binary(max) columns. For more information, see How To: Determine Which Document Types Can Be Indexed in this topic.
- Semantic indexing creates two types of indexes for the columns that you select – an index of key phrases, and an index of document similarity. You cannot select only one type of index or the other when you enable semantic indexing. However you can query these two indexes independently. For more information, see Find Key Phrases in Documents with Semantic Search and Find Similar and Related Documents with Semantic Search.
- If you do not explicitly specify an L C I D for a semantic index, then only the primary language and its associated language statistics are used for semantic indexing.
- If you specify a language for a column for which the language model is not available, the creation of the index fails and returns an error message.
How To: Create a Semantic Index When There Is No Full-Text Index
When you create a new full-text index with the CREATE FULLTEXT INDEX statement, you can enable semantic indexing at the column level by specifying the keyword STATISTICAL_SEMANTICS as part of the column definition. You can also enable semantic indexing when you use the Full-Text Indexing Wizard to create a new full-text index.
Create a new semantic index by using Transact-SQL
Call the CREATE FULLTEXT INDEX statement and specify STATISTICAL_SEMANTICS for each column on which you want to create a semantic index. For more information about all the options for this statement, see CREATE FULLTEXT INDEX (Transact-SQL).
Call the CREATE FULLTEXT INDEX statement and specify STATISTICAL_SEMANTICS for each column on which you want to create a semantic index. For more information about all the options for this statement, see CREATE FULLTEXT INDEX (Transact-SQL).
Example 1: Create a unique index, full-text index, and semantic index
The following example creates a default full-text catalog, ft. The example then creates a unique index on the JobCandidateID column of the HumanResources.JobCandidate table of the AdventureWorks2012 sample database. This unique index is required as the key column for a full-text index. The example then creates a full-text index and a semantic index on the Resume column.
CREATE FULLTEXT CATALOG ft AS DEFAULT GO CREATE UNIQUE INDEX ui_ukJobCand ON HumanResources.JobCandidate(JobCandidateID) GO CREATE FULLTEXT INDEX ON HumanResources.JobCandidate (Resume Language 1033 Statistical_Semantics ) KEY INDEX JobCandidateID WITH STOPLIST = SYSTEM GO
Example 2: Create a full-text and semantic index on several columns with delayed index population
The following example creates a full-text catalog, documents_catalog, in the AdventureWorks2012 sample database. The example then creates a full-text index that uses this new catalog. The full-text index is created on the Title, DocumentSummary, and Document columns of the Production.Document table, while the semantic index is only created on the Document column. This full-text index uses the newly-created full-text catalog and an existing unique key index, PK_Document_DocumentID. As recommended, this index key is created on an integer column, DocumentID. The example specifies the LCID for English, 1033, which is the language of the data in the columns.
This example also specifies that change tracking is off with no population. Later, during off-peak hours, the example uses an ALTER FULLTEXT INDEXstatement to start a full population on the new index and enable automatic change tracking.
CREATE FULLTEXT CATALOG documents_catalog GO CREATE FULLTEXT INDEX ON Production.Document ( Title Language 1033, DocumentSummary Language 1033, Document TYPE COLUMN FileExtension Language 1033 Statistical_Semantics ) KEY INDEX PK_Document_DocumentID ON documents_catalog WITH CHANGE_TRACKING OFF, NO POPULATION GO
Later, at an off-peak time, the index is populated:
ALTER FULLTEXT INDEX ON Production.Document SET CHANGE_TRACKING AUTO GO
Create a new semantic index by using SQL Server Management Studio
Run the Full-Text Indexing Wizard and enable Statistical Semantics on the Select Table Columns page for each column on which you want to create a semantic index. For more information, including information about how to start the Full-Text Indexing Wizard, see Use the Full-Text Indexing Wizard.
Run the Full-Text Indexing Wizard and enable Statistical Semantics on the Select Table Columns page for each column on which you want to create a semantic index. For more information, including information about how to start the Full-Text Indexing Wizard, see Use the Full-Text Indexing Wizard.
How To: Create a Semantic Index When There Is an Existing Full-Text Index
You can add semantic indexing when you alter an existing full-text index with the ALTER FULLTEXT INDEX statement. You can also add semantic indexing by using various dialog boxes in SQL Server Management Studio.
Add a semantic index by using Transact-SQL
Call the ALTER FULLTEXT INDEX statement with the options described below for each column on which you want to add a semantic index. For more information about all the options for this statement, see ALTER FULLTEXT INDEX (Transact-SQL).
Call the ALTER FULLTEXT INDEX statement with the options described below for each column on which you want to add a semantic index. For more information about all the options for this statement, see ALTER FULLTEXT INDEX (Transact-SQL).
Both full-text and semantic indexes are repopulated after a call to ALTER, unless you specify otherwise.
- To add full-text indexing only to a column, use the ADD syntax.
- To add both full-text and semantic indexing to a column, use the ADD syntax with the STATISTICAL_SEMANTICS option.
- To add semantic indexing to a column that is already enabled for full-text indexing, use the ADD STATISTICAL_SEMANTICS option. You can only add semantic indexing to one column in a single ALTER statement.
Example: Add semantic indexing to a column that already has full-text indexing
The following example alters an existing full-text index on Production.Document table in AdventureWorks2012 sample database. The example adds a semantic index on the Document column of the Production.Document table, which already has a full-text index. The example specifies that the index will not be repopulated automatically.
ALTER FULLTEXT INDEX ON Production.Document ALTER COLUMN Document ADD Statistical_Semantics WITH NO POPULATION GO
Add a semantic index by using SQL Server Management Studio
You can change the columns that are enabled for semantic and full-text indexing on the Full-Text Index Columns page of the Full-Text Index Properties dialog box. For more information, see Manage Full-Text Indexes.
You can change the columns that are enabled for semantic and full-text indexing on the Full-Text Index Columns page of the Full-Text Index Properties dialog box. For more information, see Manage Full-Text Indexes.
Requirements and Restrictions for Altering an Existing Index
- You cannot alter an existing index while population of the index is in progress. For more information on monitoring the progress of index population, see Manage and Monitor Semantic Search.
- You cannot add indexing to a column, and alter or drop indexing for the same column, in a single call to the ALTER FULLTEXT INDEXstatement.
How to: Drop a Semantic Index
You can drop semantic indexing when you alter an existing full-text index with the ALTER FULLTEXT INDEX statement. You can also drop semantic indexing by using various dialog boxes in SQL Server Management Studio.
Drop a semantic index by using Transact-SQL
- To drop semantic indexing only from a column or columns, call the ALTER FULLTEXT INDEX statement with the ALTER COLUMNcolumn_nameDROP STATISTICAL_SEMANTICS option. You can drop the indexing from multiple columns in a single ALTERstatement.
```tsql USE database_name GO ALTER FULLTEXT INDEX ALTER COLUMN column_name DROP STATISTICAL_SEMANTICS GO ```
- To drop both full-text and semantic indexing from a column, call the ALTER FULLTEXT INDEX statement with the ALTER COLUMNcolumn_nameDROP option.
USE database_name GO ALTER FULLTEXT INDEX ALTER COLUMN column_name DROP GO
Drop a semantic index by using SQL Server Management Studio
You can change the columns that are enabled for semantic and full-text indexing on the Full-Text Index Columns page of the Full-Text Index Properties dialog box. For more information, see Manage Full-Text Indexes.
You can change the columns that are enabled for semantic and full-text indexing on the Full-Text Index Columns page of the Full-Text Index Properties dialog box. For more information, see Manage Full-Text Indexes.
Requirements and Restrictions for Dropping a Semantic Index
- You cannot drop full-text indexing from a column while retaining semantic indexing. Semantic indexing depends on full-text indexing for document similarity results.
- You cannot specify the NO POPULATION option when you drop semantic indexing from the last column in a table for which semantic indexing was enabled. A population cycle is required to remove the results that were indexed previously.
How To: Check Whether Semantic Search Is Enabled on Database Objects
Is semantic search enabled for a database?
Query the IsFullTextEnabled property of the DATABASEPROPERTYEX (Transact-SQL) metadata function.
Query the IsFullTextEnabled property of the DATABASEPROPERTYEX (Transact-SQL) metadata function.
A return value of 1 indicates that full-text search and semantic search are enabled for the database; a return value of 0 indicates that they are not enabled.
SELECT DATABASEPROPERTYEX('database_name', 'IsFullTextEnabled') GO
Is semantic search enabled for a table?
Query the TableFullTextSemanticExtraction property of the OBJECTPROPERTYEX (Transact-SQL) metadata function.
Query the TableFullTextSemanticExtraction property of the OBJECTPROPERTYEX (Transact-SQL) metadata function.
A return value of 1 indicates that semantic search is enabled for the table; a return value of 0 indicates that it is not enabled.
SELECT OBJECTPROPERTYEX(OBJECT_ID('table_name'), 'TableFullTextSemanticExtraction') GO
Is semantic search enabled for a column?
To determine whether semantic search is enabled for a specific column:
To determine whether semantic search is enabled for a specific column:
- Query the StatisticalSemantics property of the COLUMNPROPERTY (Transact-SQL) metadata function.A return value of 1 indicates that semantic search is enabled for the column; a return value of 0 indicates that it is not enabled.
SELECT COLUMNPROPERTY(OBJECT_ID('table_name'), 'column_name', 'StatisticalSemantics') GO
- Query the catalog view sys.fulltext_index_columns (Transact-SQL) for the full-text index.A value of 1 in the statistical_semantics column indicates that the specified column is enabled for semantic indexing in addition to full-text indexing.
SELECT * FROM sys.fulltext_index_columns WHERE object_id = OBJECT_ID('table_name') GO
- In Object Explorer in Management Studio, right-click on a column and select Properties. On the General page of the Column Propertiesdialog box, check the value of the Statistical Semantics property.A value of True indicates that the specified column is enabled for semantic indexing in addition to full-text indexing.
How To: Check Which Languages Are Supported for Semantic Search
|
---|
Fewer languages are supported for semantic indexing than for full-text indexing. As a result, there may be columns that you can index for full-text search, but not for semantic search. |
Query the catalog view sys.fulltext_semantic_languages (Transact-SQL).
SELECT * FROM sys.fulltext_semantic_languages GO
The following languages are supported for semantic indexing. This list represents the output of the catalog view sys.fulltext_semantic_languages (Transact-SQL), ordered by LCID.
Language | LCID |
---|---|
German | 1031 |
English (US) | 1033 |
French | 1036 |
Italian | 1040 |
Portuguese (Brazil) | 1046 |
Russian | 1049 |
Swedish | 1053 |
English (UK) | 2057 |
Portuguese (Portugal) | 2070 |
Spanish | 3082 |
How To: Determine Which Document Types Can Be Indexed
Query the catalog view sys.fulltext_document_types (Transact-SQL).
If the document type that you want to index is not in the list of supported types, then you may have to locate, download, and install additional filters. For more information, see View or Change Registered Filters and Word Breakers.
Consider creating a separate filegroup for the full-text and semantic indexes if disk space allocation is a concern. The semantic indexes are created in the same filegroup as the full-text index. A fully populated semantic index may contain large amount of data.
How To: Check Whether Semantic Search Is Installed
Query the IsFullTextInstalled property of the SERVERPROPERTY (Transact-SQL) metadata function.
A return value of 1 indicates that Full-Text Search and Semantic Search are installed; a return value of 0 indicates that they are not installed.
SELECT SERVERPROPERTY('IsFullTextInstalled'); GO
How To: Install Semantic Search
To install Semantic Search, select Full-Text and Semantic Extractions for Search on the Features to Install page during setup.
Statistical Semantic Search depends on Full-Text Search. These two optional features of SQL Server are installed together.
Semantic Search has an additional external dependency that is called the semantic language statistics database. This database contains the statistical language models required by semantic search. A single semantic language statistics database contains the language models for all the languages that are supported for semantic indexing.
How To: Check Whether the Semantic Language Statistics Database Is Installed
Query the catalog view sys.fulltext_semantic_language_statistics_database (Transact-SQL).
If the semantic language statistics database is installed and registered for the instance, then the query results contain a single row of information about the database.
SELECT * FROM sys.fulltext_semantic_language_statistics_database; GO
How To: Install, Attach, and Register the Semantic Language Statistics Database
The semantic language statistics database is not installed by the SQL Server setup program. To set up the Semantic Language Statistics database as a prerequisite for semantic indexing, do the following tasks:
1. Install the semantic language statistics database.
- Locate the semantic language statistics database on the SQL Server installation media or download it from the Web.
- Locate the Windows installer package named **SemanticLanguageDatabase.msi** on the [!INCLUDE[ssNoVersion](../Token/ssNoVersion_md.md)] installation media. - Download the installer package from the [Microsoft® SQL Server® 2014 Semantic Language Statistics](http://go.microsoft.com/fwlink/?LinkID=296743) page on the [!INCLUDE[msCoName](../Token/msCoName_md.md)] Download Center.
- Run the SemanticLanguageDatabase.msi Windows installer package to extract the database and log file.You can optionally change the destination directory. By default, the installer extracts the files to a folder named Microsoft Semantic Language Database in the Program Files folder. The MSI file contains a compressed database file and log file.
- Move the extracted database file and log file to a suitable location in the file system.If you leave the files in their default location, it will not be possible to extract another copy of the database for another instance of SQL Server.
|
---|
When the semantic language statistics database is extracted, restricted permissions are assigned to the database file and log file in the default location in the file system. As a result, you may not have permission to attach the database if you leave it in the default location. If an error is raised when you try to attach the database, move the files, or check and fix file system permissions as appropriate. |
2. Attach the semantic language statistics database.
Attach the database to the instance of SQL Server by using Management Studio or by calling CREATE DATABASE (SQL Server Transact-SQL) with the FOR ATTACH syntax. For more information, see Database Detach and Attach (SQL Server).
Attach the database to the instance of SQL Server by using Management Studio or by calling CREATE DATABASE (SQL Server Transact-SQL) with the FOR ATTACH syntax. For more information, see Database Detach and Attach (SQL Server).
By default, the name of the database is semanticsdb. You can optionally give the database a different name when you attach it. You have to provide this name when you register the database in the subsequent step.
CREATE DATABASE semanticsdb ON ( FILENAME = 'C:\Microsoft Semantic Language Database\semanticsdb.mdf' ) LOG ON ( FILENAME = 'C:\Microsoft Semantic Language Database\semanticsdb_log.ldf' ) FOR ATTACH; GO
This code sample assumes that you have moved the database from its default location to a new location.
3. Register the semantic language statistics database.
Call the stored procedure sp_fulltext_semantic_register_language_statistics_db (Transact-SQL) and provide the name that you gave to the database when you attached it.
Call the stored procedure sp_fulltext_semantic_register_language_statistics_db (Transact-SQL) and provide the name that you gave to the database when you attached it.
EXEC sp_fulltext_semantic_register_language_statistics_db @dbname = N'semanticsdb'; GO
How To: Unregister, Detach, and Remove the Semantic Language Statistics Database
Unregister the semantic language statistics database.
Call the stored procedure sp_fulltext_semantic_unregister_language_statistics_db (Transact-SQL). You do not have to provide the name of the database since an instance can have only one semantic language statistics database.
Call the stored procedure sp_fulltext_semantic_unregister_language_statistics_db (Transact-SQL). You do not have to provide the name of the database since an instance can have only one semantic language statistics database.
EXEC sp_fulltext_semantic_unregister_language_statistics_db; GO
Detach the semantic language statistics database.
Call the stored procedure sp_detach_db (Transact-SQL) and provide the name of the database.
Call the stored procedure sp_detach_db (Transact-SQL) and provide the name of the database.
USE master; GO EXEC sp_detach_db @dbname = N'semanticsdb'; GO
Remove the semantic language statistics database.
After unregistering and detaching the database, you can simply delete the database file. There is no uninstall program and there is no entry in Programs and Features in the Control Panel.
After unregistering and detaching the database, you can simply delete the database file. There is no uninstall program and there is no entry in Programs and Features in the Control Panel.
Requirements and Restrictions for Installing and Removing the Semantic Language Statistics Database
- You can only attach and register one semantic language statistics database on an instance of SQL Server.Each instance of SQL Server on a single computer requires a separate physical copy of the semantic language statistics database. Attach one copy to each instance.
- You cannot detach a valid and registered semantic language statistics database and replace it with an arbitrary database that has the same name. Doing so will cause active or future index populations to fail.
- The semantic language statistics database is read-only. You cannot customize this database. If you alter the content of the database in any way, the results for future semantic indexing are indeterministic. To restore the original state of this data, you can drop the altered database, and download and attach a new and unaltered copy of the database.
- It is possible to detach or drop the semantic language statistics database. If there are any active indexing operations that have read locks on the database, then the detach or drop operation will fail or time out. This is consistent with existing behavior. After the database is removed, semantic indexing operations will fail.
How to: Install the Latest Filters for Microsoft Office and other Microsoft Document Types
This release of SQL Server installs the latest Microsoft word breakers and stemmers, but does not install the latest filters for Microsoft Office documents and other Microsoft document types. These filters are required for indexing documents created with recent versions of Microsoft Office and other Microsoft applications. To download the latest filters, see Microsoft Office 2010 Filter Packs.
ref:
https://msdn.microsoft.com/en-us/library/gg509116.aspx
https://msdn.microsoft.com/en-us/library/gg509085.aspx
https://svenaelterman.wordpress.com/2012/04/14/step-by-step-enabling-semantic-search-on-sql-server-2012/