SDU Tools: Token Set Similarity in SQL Server T-SQL
Our free SDU Tools for developers and DBAs, now includes a very large number of tools, with procedures, functions, and views. The TokenSetSimilarity function calculates token set similarity for two strings.
It answers the question: Do these two strings contain mostly the same words, even if the order, spacing, or repetition differs?
It is useful where word order varies, or extra or missing words are common. It can also help where character-level typos are less important than the presence of words.
The function is order-insensitive (which is good for names and titles) and is intentionally designed to be set-based (so duplicates don’t increase similarity). For token-weighting (downplaying common words like pty, ltd, inc), you might consider adding a stop-word table and filtering tokens before insert. It returns NULL on NULL or empty string input.
This function is computationally intensive. If you have large volumes of data to compare, consider modifying it to a table-valued function.
Find out more
You can see it in action in the main image above, and in the video here:
You can use our tools as a set or as a great example of how to write functions like these.
Access to SDU Tools is one of the benefits of being an SDU Insider, along with access to our other free tools and eBooks. Please just visit here for more info:
http://sdutools.sqldownunder.com
2026-04-04