Using GUIDs with EF Core
Introduction
GUIDs, also known as UUIDs, are a very useful data type that is present in all major databases. It allows having values that are magically generated and never repeat, making them ideal for usage across data sources, where a single source of data cannot be reused. in .NET, they are represented by the Guid type; in SQL Server, it is UNIQUEIDENTIFIER, in PostgreSQL it is UUID, in Oracle it is RAW(16), and in MySQL it is BINARY(16). EF Core supports all of them.
There are pros and cons regarding using GUIDs as database primary keys:
- They are obviously great for uniqueness
- They can be generated on the client
- They take a lot of space (16 bytes), compared to 4 bytes for integers or 8 for longs
- They can lead to very fragmented indexes
Let's see what we can do about that.
Primary Key Generation in EF Core
There are 3 ways to have EF Core generate GUIDs for the primary keys:
- Have the primary key property marked as database generated of type identity
- Specify a default SQL function for it
- Add a value generator for that key
Technically, you can also use an event or an interceptor to set the value prior to insertion, but that is a bad idea, as you'd have to traverse all the entities to insert that could possibly be related and set the values yourself.
Let me go through all these options one by one. First, let's assume we have this (very simple) entity class:
public class MyEntity
{
public Guid Id { get; set; }
public required string Name { get; set; }
}
Option #1 - Using Database Generated Identity
We need to mark property Id to be database generated and identity. For this, we can either use attributes:
public class MyEntity
{
[DatabaseGenerated(DatabaseGenerationOption.Identity)]
public Guid Id { get; set; }
Or the code mapping:
protected override void OnModelCreating(ModelBuilder modelBuilder)
{
modelBuilder.Entity<MyEntity>().Property(x => x.Id).ValueGeneratedOnAdd();
}
ValueGeneratedOnAdd specifies that the value for this property should be generated at insertion time, but, again, does not say how. This has the advantage that it delegates to the underlying database provider the actual strategy for setting the primary key GUID value. Alas, we then have no control about what exactly that function is.
The [DatabaseGenerated] attribute takes a value of DatabaseGeneratedOption, in this case, Identity. It does the same as ValueGeneratedOnAdd.
Option #2 - Specifying a SQL Function
The second option is tied to an actual database. We specify a function as the default value for the key by calling HasDefaultValueSql. In the case of SQL Server, it could be this:
protected override void OnModelCreating(ModelBuilder modelBuilder)
{
modelBuilder.Entity<MyEntity>().Property(x => x.Id).HasDefaultValueSql("NEWID()");
}
NEWID() is the original SQL Server function that generates a GUID/UNIQUEIDENTIFIER. In a moment I'll talk about an alternative to NEWID(), please bear with me.
Option #3 - Using a Value Generator
The final option is related to using a custom value generator for our key. A value generator is a concrete implementation of ValueGenerator, or, more commonly, ValueGenerator<T>. It can be set using one of the HasValueGenerator extension methods:
protected override void OnModelCreating(ModelBuilder modelBuilder)
{
modelBuilder.Entity<MyEntity>().Property(x => x.Id).HasValueGenerator<GuidValueGenerator>();
}
The GuidValueGenerator just generates a Guid using NewGuid(). There is another class, StringValueGenerator, that also generates a Guid in the same way, but returns it as a string, for the cases when we want to have the primary key as a string-type column.
Dealing with Index Fragmentation
One big issue with using GUIDs as primary keys is that they ar not sequential: you may not have realised, but a GUID is just a big integer (16 bytes, 128 bits), which means, it can be compared. Database indexes depend on that. So, because they are generated "randomly" (not true, but for the sake of this discussion, good enough), in two consecutive generated GUID values, the latter can actually come before or after the previous one. Realising this, the good folks have come up with a solution: V7 GUIDs.
Note: the .NET V7 GUID generation does not solve the index fragmentation issue. The problem still occurs with .NET-generated GUIDs and will probably require database-specific solutions. See also this.
SQL Server implements sequential GUIDs with the NEWSEQUENTIALID() function. .NET, in version 9, added method Guid.CreateVersion7 method for the same purpose. And, EF Core also has a SequentialGuidValueGenerator, which uses it's own strategy, which does not depend on CreateVersion7 (it supports previous versions of .NET).
So, realising that it is a good thing to use sequential GUIDs for primary key generation, we can change our examples to be, for option #2:
protected override void OnModelCreating(ModelBuilder modelBuilder)
{
modelBuilder.Entity<MyEntity>().Property(x => x.Id).HasDefaultValueSql("NEWSEQUENTIALID()");
}
And for option #3:
protected override void OnModelCreating(ModelBuilder modelBuilder)
{
modelBuilder.Entity<MyEntity>().Property(x => x.Id).HasValueGenerator<SequentialGuidValueGenerator>();
}
Sadly, there is no way yet to get a sequential GUID as a string, meaning, there is no counterpart of StringValueGenerator for sequential GUIDs. There is an open ticket here, for this purpose, which is scheduled for .NET 10, but, for now, we can implement it as this:
public class SequentialGuidAsStringValueGenerator : ValueGenerator<string>
{
private static readonly SequentialGuidValueGenerator _guidValueGenerator = new();
public override bool GeneratesTemporaryValues => false;
public override string Next(EntityEntry entry) => _guidValueGenerator.Next(entry).ToString();
}
Essentially, it delegates the actual generation of the Guid to SequentialGuidValueGenerator, and then turns it into a string.
Conclusion
GUIDs are an essential part of data storage and it's hard to imagine how we could live without them. Most databases and frameworks know how to deal with them, but, as you can see, there are some caveats. Make an informed choice, if possible, either avoid GUIDs altogether or use sequential ones. I hope I was able to add some useful information here!
Comments
Post a Comment