Modifying the Thesaurus in Full Text Search in SQL Server 2008
While I was prepping my full-text search session for TechEd Australia today, I decided to modify the thesaurus. I found the discussion in books online a bit confusing regarding the locations of the files involved.
What threw me was it said the default location of the thesaurus files was:
SQL_Server_install_path\Microsoft SQL Server\MSSQL10.MSSQLSERVER\MSSQL\FTDATA\
I went there and couldn’t find anything. I worked out that for ENU language, my file should be called tsENU.xml. I searched for that and found it in:
C:\Program Files\Microsoft SQL Server\MSSQL10.MSSQLSERVER\MSSQL\Binn\Templates
I modified that and got nowhere. Turns out that as the name implies, this is where the templates for the files live. I then tried creating the missing folder under the stated default location and also got nowhere. Turns out that it’s actually related to your data directory location. Where I needed to put it was:
C:\SQLData\MSSQL10.MSSQLSERVER\MSSQL\FTData
because I’d used C:\SQLData as my data folder. So it’s not the SQL Server install path, it’s the data file folder. Anyway, that file contains the following XML:
<XML ID="Microsoft Search Thesaurus"> <!-- Commented out <thesaurus xmlns="x-schema:tsSchema.xml"> <diacritics\_sensitive>0</diacritics\_sensitive> <expansion> <sub>Internet Explorer</sub> <sub>IE</sub> <sub>IE5</sub> </expansion> <replacement> <pat>NT5</pat> <pat>W2K</pat> <sub>Windows 2000</sub> </replacement> <expansion> <sub>run</sub> <sub>jog</sub> </expansion> </thesaurus> --> </XML>
The template file has three examples commented out. The diacritics entry relates to whether or not accents matter ie: in words like cafe. Expansion entries are used to provide synonyms. In this case, if you search for any one of “Internet Explorer”, “IE” and “IE5”, it will search for all of them. Replacement is used to force an alternate lookup. In the example above, if you search for either “NT5” or “W2K”, it will search for “Windows 2000” instead.
After modifying the correct file, I then caused it to be loaded by specifying:
EXEC sys.sp_fulltext_load_thesaurus_file 1033
and then I was in working fine with queries like:
SELECT MessageID,Description
FROM dbo.Messages
WHERE CONTAINS(Description,‘FORMSOF(THESAURUS,punter)’)
ORDER BY MessageID;
Hope this helps someone get started with it.
2008-08-13