How to Seed Data to EF Core
Introduction
We all know EF Core Migrations. They have been around for quite some time, some people like them, others prefer alternatives (hello, Flyway, DbUp, FluentMigrator!). I'm not going to discuss that, but, instead, how to seed data to a database, either using EF Core migrations or not. This is, of course, for inserting initial/reference data that must always be present, for Microsoft's opinion on this, please see this page.
This post assumes that you know about migrations - how to create and apply them, at least, and that you have all it takes for it, including EF Core Tools and the Microsoft.EntityFrameworkCore.Design NuGet package.
For actually adding data, we have essentially four options:
- Explicit insertions
- Data-only migrations
- Entity configuration
- Context configuration (explicit seeding)
Let's see them all one by one.
Using Explicit Insertions
The first case is pretty obvious: we add data explicitly after we forced pending migrations to occur, that is, after calling Migrate/MigrateAsync on DbContext.Database. For example:
using var scope = app.Services.CreateScope();
using var ctx = scope.ServiceProvider.GetRequiredService<BlogContext>();
await ctx.Database.MigrateAsync();
if (!ctx.Blogs.Any())
{
ctx.Blog.Add(new Blog { Title = "Some BLog", Url = "https://some.blog" });
ctx.Blog.Add(new Blog { Title = "Another BLog", Url = "https://another.blog" });
await ctx.SaveChangesAsync();
}Here, of course, we can apply custom logic - in this case, I'm merely checking that the Blogs table has any data prior to inserting into it, but you can do things differently.
Another example, when we want to load data explicitly, maybe loading it from some external resource. Here is an example endpoint with Minimal API:
app.MapPost("/seed", async (BlogContext ctx, CancellationToken cancellationToken) => { //two options: //load data from some external resource, create Blog instances and add to ctx.Blogs //or just add a small, unchanging, set of reference data: if (!await ctx.Blogs.AnyAsync(cancellationToken)) { ctx.Blogs.Add(new Blog { Title = "Some BLog", Url = "https://some.blog" }); ctx.Blogs.Add(new Blog { Title = "Another BLog", Url = "https://another.blog" }); } await ctx.SaveChangesAsync(); return Results.Ok(); });
Using a Data-only Migration
For this option, we create a migration in any of the usual ways:
1 - From inside the Package Manager Console:
Add-Migration ReferenceData
2 - From the command line, using the ef global tool:
dotnet ef migrations add ReferenceData
A new migration is then created inside the Migrations folder of the target project, inheriting from Migration. If there are no schema changes, this migration will be empty, meaning, without any changes to apply:
public partial class ReferenceDataMigration : Migration
{
//
}Now, this is where we can chip in: all we have to do is override the Up and Down methods to add our custom data, in the form of SQL:
public partial class ReferenceDataMigration : Migration
{
protected override void Up(MigrationBuilder migrationBuilder)
{
if (migrationBuilder.IsSqlServer())
{
migrationBuilder.Sql("SET IDENTITY_INSERT dbo.Blogs ON");
}
migrationBuilder.Sql("INSERT INTO dbo.Blogs (Id, Title, Url) VALUES (1, 'Some Blog', 'https://some.blog')");
migrationBuilder.Sql("INSERT INTO dbo.Blogs (Id, Title, Url) VALUES (2, 'Another Blog', 'https://another.blog')");
if (migrationBuilder.IsSqlServer())
{
migrationBuilder.Sql("SET IDENTITY_INSERT dbo.Blogs OFF");
}
}
protected override void Down(MigrationBuilder migrationBuilder)
{
migrationBuilder.Sql("DELETE FROM dbo.Blogs WHERE Id IN (1, 2)");
}
}As you can see, I'm relying on the MigrationBuilder.Sql method for executing raw SQL, so we must know what we're doing: this is because inside a migration we do not have a DbContext; I'm also using IsSqlServer extension method to ensure the SQL I execute is running for SQL Server, this is because I'm calling SET IDENTITY_INSERT, which is a SQL Server-only thing. Up and Down methods should both be specified, Down should revert whatever Up does.
This approach has a couple advantages:
- The migration is only applied once, so we don't need to care about errors from duplicate records
- We can make all sorts of insertions
- We have to work with plain SQL, and be conscious of the target databases, primary keys, etc
Using Entity Configuration
Another option is to declare the data as part of the entity's configuration. We can achieve that with the HasData method:
protected override void OnModelCreating(DbModelBuilder modelBuilder)
{
modelBuilder
.Entity<Blog>()
.HasData(
new Blog { Title = "Some BLog", Url = "https://some.blog" },
new Blog { Title = "Another Blog", Url = "https://another.blog" }
);
}Of course, this can also be done inside an IEntityTypeConfiguration<Blog>'s Configure method, if we're using external configuration, and you can have data for all registered entities. The data provided through the HasData method is considered part of the schema, and goes into the snapshot, meaning, if we change it, EF Core migrations will detect that the schema has changed.
The advantage of this approach are:
- We operate at context level, not SQL
- The code is only executed at the first migration, when the database is created
As for possible disadvantages:
- If we ever want to change the initial data, we have to create a new migration
Using Context Configuration
Yet another option is to use UseSeeding/UseAsyncSeeding methods to create seeding callbacks when configuring the DbContext. This approach differs from the previous in that the data is not part of the schema, and we have full control over what queries we issue, and, most important, this method runs not just after all migrations have been applied, but also after the schema is created using EnsureCreated/EnsureCreatedAsync. Here is an example:
builder.Services.AddDbContext<BlogContext>(options =>
{
options
.UseSqlServer("...")
.UseAsyncSeeding(async (ctx, schemaUpdated, cancellationToken) =>
{
if (!await ctx.Set<Blog>().AnyAsync(cancellationToken))
{
ctx.Set<Blog>().Add(new Blog { Title = "Some BLog", Url = "https://some.blog" });
ctx.Set<Blog>().Add(new Blog { Title = "Another BLog", Url = "https://another.blog" });
await ctx.SaveChangesAsync(cancellationToken);
}
})
.UseSeeding((ctx, schemaUpdated) =>
{
if (!ctx.Set<Blog>().Any())
{
ctx.Set<Blog>().Add(new Blog { Title = "Some BLog", Url = "https://some.blog" });
ctx.Set<Blog>().Add(new Blog { Title = "Another BLog", Url = "https://another.blog" });
ctx.SaveChanges();
}
});
});The second parameter to the callback (schemaUpdated, in my example) passed to UseSeeding/UseAsyncSeeding is true if changes have been made to the schema, such as creating it or modifying it, or false otherwise.
An important note from the Microsoft documentation: UseSeeding is called from the EnsureCreated or Migrate, and UseAsyncSeeding is called from the EnsureCreatedAsync or MigrateAsync methods; when using this feature, it is recommended to implement both UseSeeding and UseAsyncSeeding methods using similar logic, even if the code using EF Core is asynchronous, because we never know which one will be called.
Conclusion
So, in a nutshell, this are some of the rules when choosing a seeding strategy:
- Use HasData for small, invariant lookup data you want tracked with schema changes and migrations
- Data-only migrations are good for when you want to be sure that your data is only ever applied once, but you need to write SQL, which can make it database-dependant
- Explicit insertions sometimes have their place, for example, when data is loaded from some external resource, or you are not using migrations, but, in general, other methods should be preferred
- Use UseSeeding/UseAsyncSeeding for everything else: conditional seed logic, big pre-defined datasets, dev/test demo data, generated values, or any seeding that must run independently of migrations
Comments
Post a Comment