Text Querying with EF Core and SQL Server
Introduction
SQL Server provides a number of functions that can be used for non-exact text queries over text columns (CHAR, NCHAR, VARCHAR, NVARCHAR, TEXT, NTEXT). These include searching a piece of text contained inside another, using full text search, or using patterns, and computing a code from a text. Some of them are:
- CONTAINS: Searches for precise or fuzzy (less precise) matches to single words and phrases, words within a certain distance of one another, or weighted matches in SQL Server
- DIFFERENCE: This function returns an integer value measuring the difference between the SOUNDEX values of two different character expressions
- FREETEXT: Is a predicate used in the Transact-SQL WHERE clause of a Transact-SQL SELECT statement to perform a SQL Server full-text search on full-text indexed columns containing character-based data types
- LIKE: Determines whether a specific character string matches a specified pattern
- PATINDEX: Returns the starting position of the first occurrence of a pattern in a specified expression, or zero if the pattern is not found, on all valid text and character data types
- SOUNDEX: Returns a four-character (SOUNDEX) code to evaluate the similarity of two strings.
Some of them can be used right now in EF Core, as there are some extensions for them, while others need some extra work. And that's what we will be covering here!
Included Extensions
Some LINQ string expressions can be translated automatically to SQL: StartsWith, EndsWith, Contains all translate to LIKE, IsNullOrEmpty is slightly more complex.
Other extensions need to be included and called explicitly. Most included extensions live in the EF static class and are actually static methods in the DbFunctions static class, but there is one that operates directly on the String class. They are:
Contains
CONTAINS searches a piece of text inside another piece of text.
EF.Contains is the extension for CONTAINS, and a typical usage might be:
var namesContainingNet = ctx.People
.Where(x => EF.Contains(x.Name, "net"))
.ToList();
FreeText
The FREETEXT function makes use of the SQL Server full text search, which must be enabled. It is different to CONTAINS because it it also evaluate the meaning, using inflectional forms as well as thesaurus.
EF.FreeText is the extension for doing FREETEXT searches, and we use it like this:
var namesContainingNet = ctx.People
.Where(x => EF.FreeText(x.Name, "ard"))
.ToList();
PatIndex
PATINDEX returns the starting position of the first occurrence of a pattern in a text, or zero if the pattern is not found.
EF.PatIndex is the EF Core mapping of PATINDEX, and an example is:
var positionOfFirstNumber = ctx.People
.Select(x => EF.PatIndex(x.Name, "%[0-9]%"))
.ToList();
Like
LIKE's purpose is to search based on patterns, where special characters % and _ can be used instead of "anything" or "any letter", respectively, and [] and [^] can be used to include/exclude sets of characters.
The Like extension is different in that it is an extension over the String class. An example:
var namesStartingWithRicardo = ctx.People
.Where(x => x.Name.Like("ricardo%"))
.ToList();
Custom Extensions
Not long ago, I wrote a post about calling database functions with EF Core. It is very straightforward, and we'll leverage that to add two new extensions, Difference and Soundex, that will map to likewise-named functions in SQL Server.
Soundex
The SOUNDEX function returns the results of the processing of the SOUNDEX algorithm to a text column. This algorithm is used to calculate a phonetical hash of a text, so that similar sounding words have the same hash. For example, words:
- "Ricardo"
- "Rikardo"
- "Richard"
- "Ricard"
- "Rykardo"
All have the same SOUNDEX hash, "R263".
An extension could be written as:
public static class SqlExtensions
{
public static string Soundex(this string property) => throw new NotImplementedException();
}
And its registration, in OnModelCreating:
modelBuilder.HasDbFunction(typeof(SqlExtensions)
.GetMethod(nameof(SqlExtensions.Soundex), [typeof(string)]))
.IsBuiltIn(true)
.IsNullable()
.HasName("SOUNDEX");
A simple usage:
var soundexHashes = ctx.People
.Select(x => x.Name.Soundex())
.ToList();
Difference
The SQL Server DIFFERENCE function returns the "distance" between the SOUNDEX hashes of some text and another text. Picking up the previous example, the difference between:
- "Ricardo" and "Richard" is 4, the least possible difference as per the algorithm
- "Ricardo" and "Ricar" is 3
- "Ricardo" and "Ric" is 2
- "Ricardo" and "FooBar" is 1
- "Ricardo" and "" is 0, the highest possible difference
The extension code:
public static class SqlExtensions
{
public static string Difference(this string property, string anotherProperty) => throw new NotImplementedException();
}
Registration:
modelBuilder.HasDbFunction(typeof(SqlExtensions)
.GetMethod(nameof(SqlExtensions.Difference), [typeof(string), typeof(string)]))
.IsBuiltIn(true)
.IsNullable()
.HasName("DIFFERENCE");
And usage:
var nameDistance = ctx.People
.Select(x => x.Name.Difference("Ric"))
.ToList();
Note: of course, these extensions only work on SQL Server!
Conclusion
As you can see, there are many ways by which we can query text using EF Core, it's not just the standard equality, starts/ends with, or contains operators. I hope you find this useful!
Comments
Post a Comment