Working with Strings in .NET

Introduction

So, by now you should know all there is to know about string concatenation in .NET, the StringBuilder class and the string.Concat method. I'm not going to cover that here; instead, I'll look at some lesser-known functionality around strings that can (also) bring some performance advantages.

Searching

When it comes to searching for values ("needles") inside a string ("haystack"), we have a couple of built-in options, such as the IndexOf, IndexOfAny and Contains methods.

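Since .NET 8, we have another one: the SearchValues<T> class (in the System.Buffers namespace). It is optimised for searching for any of a fixed set of values inside a span. .NET 8 supports sets of chars and bytes, and .NET 9 added SearchValues<string> for searching for one or multiple exact strings inside a string. When it is created, it analyses the needles and builds an internal representation suited to searching for them. Because that setup cost is paid only once, reusing the same SearchValues<T> instance over many searches can be much faster than the built-ins (IndexOf, IndexOfAny, Contains). Here is how to use it: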

using System;
using System.Buffers;

var names = SearchValues.Create([ "fox", "dog" ], StringComparison.OrdinalIgnoreCase); //SearchValues<string> (.NET 9+)
var text = "The quick brown fox jumps over the lazy dog.";
var containsAny = text.AsSpan().ContainsAny(names); //true
var containsDog = names.Contains("dog"); //true

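Notice the comparisonType parameter on the SearchValues.Create method. It takes a StringComparison value, but only Ordinal and OrdinalIgnoreCase are supported. The latter ignores the casing of the strings. Culture-sensitive values such as InvariantCultureIgnoreCase are not accepted.

Also note the difference between the two methods:

  • ContainsAny, an extension method on ReadOnlySpan<char>, checks whether any of the needles occurs in the haystack.
  • Contains checks whether a single value is itself one of the needles.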
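The following sketch shows that difference. It assumes .NET 9 or later, which is needed for the string-based SearchValues.Create overload and the matching ContainsAny extension. The variable names are only illustrative:

```csharp
using System;
using System.Buffers;

// Requires .NET 9+: SearchValues<string> with OrdinalIgnoreCase semantics
SearchValues<string> names = SearchValues.Create(["fox", "dog"], StringComparison.OrdinalIgnoreCase);
string text = "The quick brown fox jumps over the lazy dog.";

// Searches the haystack for any of the needles
bool textMentionsAnimal = text.AsSpan().ContainsAny(names);

// Set membership: is the value itself one of the needles?
bool foxIsNeedle = names.Contains("fox");
bool sentenceIsNeedle = names.Contains(text); // the whole sentence is not one of the needles

Console.WriteLine($"{textMentionsAnimal} {foxIsNeedle} {sentenceIsNeedle}"); // True True False
```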

Another usage is finding the index of the first needle found in the haystack:

var indexOfFirstName = text
    .AsSpan()
    .IndexOfAny(names);

The IndexOfAny method returns the index of the first occurrence of any of the needles in the haystack, or -1 if none was found.
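IndexOfAny only reports the first match. One way to collect every match is to keep slicing the remaining span and calling IndexOfAny again. The sketch below does this under the same .NET 9+ assumption. FindAll is a hypothetical helper, not a .NET API:

```csharp
using System;
using System.Buffers;
using System.Collections.Generic;

SearchValues<string> names = SearchValues.Create(["fox", "dog"], StringComparison.OrdinalIgnoreCase);
List<int> positions = FindAll("The quick brown fox jumps over the lazy dog.", names);
Console.WriteLine(string.Join(", ", positions)); // 16, 40

// Hypothetical helper: returns the start index of every needle occurrence in the haystack
static List<int> FindAll(string haystack, SearchValues<string> needles)
{
    var result = new List<int>();
    ReadOnlySpan<char> remaining = haystack.AsSpan();
    int offset = 0; // where 'remaining' starts within the original haystack

    while (true)
    {
        int index = remaining.IndexOfAny(needles);
        if (index < 0)
        {
            break; // no more needles in the rest of the haystack
        }

        result.Add(offset + index);

        // Step one character past the start of the match, so overlapping matches are still found
        offset += index + 1;
        remaining = remaining.Slice(index + 1);
    }

    return result;
}
```

The helper advances by one character rather than by the needle's length, because IndexOfAny does not report which needle matched.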

Of course, for a single search you likely won't see any benefit. The gains show up when the same SearchValues instance is reused for many searches, possibly with many needles.
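The char-based flavour, available since .NET 8, follows the same pattern. The sketch below is a hypothetical slug validator, and the allowed character set is just an example. The SearchValues<char> is built once and then reused for every check through ContainsAnyExcept. In a real application it would more likely live in a static readonly field:

```csharp
using System;
using System.Buffers;

// Built once and reused for every check (typically a static readonly field in real code)
SearchValues<char> slugChars = SearchValues.Create("abcdefghijklmnopqrstuvwxyz0123456789-");

// Hypothetical validator: a slug is non-empty and contains only the allowed characters
bool IsValidSlug(string candidate) =>
    candidate.Length > 0 && !candidate.AsSpan().ContainsAnyExcept(slugChars);

string[] candidates = ["working-with-strings", "Working With Strings", "net-8"];
foreach (string candidate in candidates)
{
    Console.WriteLine($"{candidate}: {IsValidSlug(candidate)}");
}
```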

Formatting

We've all used string formatting with string.Format. It's a very useful method: it takes a format string containing index-based placeholders (with optional formatting information), plus the arguments that replace them. This is called composite formatting:

var formattedString = string.Format("Today is {0}", DateTime.Today);
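The optional formatting information follows the pattern {index[,alignment][:formatString]}. Here is a small illustration with arbitrary values. It uses the invariant culture, so the output does not depend on the machine's regional settings:

```csharp
using System;
using System.Globalization;

// {index,alignment:formatString}: a negative alignment left-aligns, a positive one right-aligns
string row = string.Format(CultureInfo.InvariantCulture, "{0,-10}|{1,10:N2}|", "book", 1234.5);
Console.WriteLine(row); // "book      |  1,234.50|"
```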

String interpolation was introduced in C# 6. It lets us build strings from a format string with the arguments embedded in it. The advantage is that we cannot mix up the indexes, because you see exactly what will be inserted where. When an interpolated string is converted to FormattableString, the compiler produces a FormattableString instance, which keeps the composite format and its arguments separate. When the target is a plain string, the compiler formats it directly. Here is a simple example:

var formattable = (FormattableString) $"Today is {DateTime.Today}";
var formattedString = $"Today is {DateTime.Today}"; //string
var format = formattable.Format; //"Today is {0}"
var argumentCount = formattable.ArgumentCount; //1

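If we don't cast to FormattableString, the result is a plain string, so we lose the Format and ArgumentCount properties (which we may not need anyway). A FormattableString also gives access to the actual arguments, through the GetArgument and GetArguments methods. Its ToString method returns the formatted string, using the current culture unless a format provider is passed.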
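A short sketch of those members follows; the item and price values are made up. It renders the same FormattableString with the invariant culture:

```csharp
using System;
using System.Globalization;

string item = "book";
double price = 1234.5;
FormattableString formattable = $"The {item} costs {price:N2}";

string format = formattable.Format;                 // "The {0} costs {1:N2}"
int count = formattable.ArgumentCount;              // 2
object? firstArgument = formattable.GetArgument(0); // "book"
object?[] arguments = formattable.GetArguments();   // the item string and the boxed double

// Render with an explicit culture instead of the current one
string invariant = formattable.ToString(CultureInfo.InvariantCulture); // "The book costs 1,234.50"
string sameThing = FormattableString.Invariant(formattable);           // same result

Console.WriteLine(invariant);
```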

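A newer way to do formatting was introduced in .NET 8: the CompositeFormat class, in the System.Text namespace. It parses a composite format string once and builds an internal representation that is ready for formatting actual arguments. It does not replace string.Format. Instead, it is passed to the string.Format (and StringBuilder.AppendFormat) overloads that accept a CompositeFormat. We can use it like this: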

using System.Globalization;
using System.Text;
var format = CompositeFormat.Parse("Today is {0}");
var text = string.Format(CultureInfo.InvariantCulture, format, DateTime.Today.DayOfWeek);

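Again, this is about performance. The format string is parsed only once, so if we keep the CompositeFormat instance around (for example, in a static readonly field) and reuse it many times, the gain should be noticeable.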
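Here is a minimal sketch of that reuse, with a hypothetical Greet helper and format string. The parsed CompositeFormat is created once and then passed to string.Format on every call:

```csharp
using System;
using System.Globalization;
using System.Text;

// Parsed once; in a real application this would typically be a static readonly field
CompositeFormat greetingFormat = CompositeFormat.Parse("Hello, {0}! You have {1} new messages.");

// Hypothetical helper that reuses the pre-parsed format on every call
// (this resolves to the generic string.Format<TArg0, TArg1> overload that takes a CompositeFormat)
string Greet(string name, int count) =>
    string.Format(CultureInfo.InvariantCulture, greetingFormat, name, count);

Console.WriteLine(greetingFormat.MinimumArgumentCount); // 2
Console.WriteLine(Greet("Alice", 3)); // Hello, Alice! You have 3 new messages.
```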

Substrings

One change of significant importance in recent .NET versions was the introduction of spans, which help avoid allocations. A string can be turned into a read-only span of chars (ReadOnlySpan<char>). We can then slice that span to get substrings with zero memory allocation.

We turn a string into a span of characters using AsSpan, and we get a substring of it using Slice:

var text = "Hello, World!";
var textSpan = text.AsSpan();       //ReadOnlySpan<char> over the string's memory
var hello = textSpan.Slice(0, 5);   //"Hello"
var world = textSpan.Slice(7, 5);   //"World"

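The behaviour is the same as Substring, but with zero allocations. Slice returns a view over the original string's memory rather than a new string, although calling ToString on a slice does allocate a new string. With many calls, the savings in memory (at least) should be evident.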
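As an illustration of working with slices instead of substrings, here is a sketch that parses a hypothetical key=value line. It assumes the line contains an '='. Otherwise IndexOf returns -1 and Slice throws:

```csharp
using System;
using System.Globalization;

string line = "timeout=30";
ReadOnlySpan<char> span = line.AsSpan();

int separator = span.IndexOf('='); // assumes the separator is present
ReadOnlySpan<char> key = span.Slice(0, separator);    // view over "timeout", no new string
ReadOnlySpan<char> value = span.Slice(separator + 1); // view over "30", no new string

// Compare and parse directly on the slices
bool isTimeout = key.Equals("timeout", StringComparison.Ordinal);
int seconds = int.Parse(value, NumberStyles.Integer, CultureInfo.InvariantCulture);

// Only materialising a slice as a string allocates
string keyAsString = key.ToString();

Console.WriteLine($"{keyAsString} -> {seconds} (isTimeout: {isTimeout})");
```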

Interning Strings

String interning is a process by which .NET stores identical strings in an internal pool, the intern pool. If a string with the same contents as the one passed to the string.Intern method is already in the pool, the pooled instance is returned; otherwise, the string is added and a reference to it is returned. The advantage is that only one instance of each distinct interned value is stored. There are two disadvantages: interned strings are generally not released until the runtime terminates, and a string must first be created before it can be interned. String literals are automatically interned, and we can intern any string that we build at runtime.

Have a look at these examples:

var text1 = string.Intern(string.Concat("Hello", ", ", "World!"));
var text2 = "Hello, World!";
var text3 = string.Concat("Hello", ", ", "World!");

var areSameReference1and3 = object.ReferenceEquals(text1, text3); //false
var areEqual1and3 = text1 == text3; //true
var areSameReference1and2 = object.ReferenceEquals(text1, text2); //true
var areEqual1and2 = text1 == text2; //true

So, as we can see:

  • Strings, either interned or otherwise, are always compared by their contents when using the == operator or the Equals or Compare methods
  • Strings that are created at runtime (Concat, ToString, the +/+= operators, etc.) are different in-memory objects, even when their contents are identical
  • Two identical interned strings are always the same object

So, does this mean that we should always intern strings? Not quite: there is a tradeoff between memory usage and speed. In programs that create a large number of strings, you should benchmark and test to see what the gains are.
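As a rough illustration (the values are made up), the sketch below builds two distinct string objects with the same contents at runtime. This might happen when parsing repeated values from a file. The sketch then shows string.Intern collapsing them into one instance. It also uses string.IsInterned, which looks a value up in the pool without adding it:

```csharp
using System;

// Simulate values built at runtime: same contents, different objects
string first = new string(new[] { 'b', 'o', 'o', 'k', 's' });
string second = new string(new[] { 'b', 'o', 'o', 'k', 's' });
bool sameBefore = ReferenceEquals(first, second); // false: two separate allocations

// Interning maps both to the single pooled instance
string internedFirst = string.Intern(first);
string internedSecond = string.Intern(second);
bool sameAfter = ReferenceEquals(internedFirst, internedSecond); // true

// IsInterned only looks the value up; it returns null when the value is not in the pool
string unique = Guid.NewGuid().ToString();
bool uniqueIsPooled = string.IsInterned(unique) is not null; // false for a fresh GUID string

Console.WriteLine($"{sameBefore} {sameAfter} {uniqueIsPooled}");
```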

Conclusion

All of these are performance- or memory-related, so they may or may not be useful, depending on our use case. Microsoft is making a huge push for performance, and this includes all of these features and more. We should all be aware of these improvements and new APIs and start using them, even if performance or memory consumption is not (immediately) an issue for us. In my opinion, these two should always be on our minds when implementing applications and services.

As usual, I hope you find these useful, and let me know your thoughts!
