The fastest way to reset the database with EF Core

Published on 3 January 2022

In my opinion there are no better or more important tests than full database tests at the controller level or similar. This kind of test has great stability, is far more correct than tests that rely on mocking, and is also more loosely coupled than mocked tests. The problem is that they can be slow: unless you carefully design your tests so they don't require an empty database to begin with, you need to clear the database between each test, and that is not easy to do quickly.

Here I'll show a way to do this with EF Core interceptors that looks very promising.

Other solutions

Before we look at this solution, let's just go through some other approaches and why they aren't good enough.

A) Delete / rebuild DB between tests

This approach has a place in this strategy too, but only at the start of each test run. This is simply too slow to do between each test.

B) Using transactions between tests

Fairly quick and scales OK, but it introduces test stability issues, and the code runs differently in tests than it does in reality, which we want to avoid as much as possible. We use this at Bokio, but we noticed our tests leak transactions, so we have to force a full GC once in a while; I'm fairly sure the transactions are the cause. We also had a bunch of other issues with this and can't even run all tests in this mode, in particular when the code under test uses transactions itself. It is still a pretty good approach.
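To make the idea concrete, here is a minimal sketch of this option, assuming xUnit and that the test only uses a single context; the class and member names are illustrative, not from our code base. In reality every DbContext a test creates has to share the same connection and transaction, which is where most of the complications come from.

public class TransactionalTestBase : IDisposable
{
    protected readonly Context Db;
    private readonly IDbContextTransaction transaction;

    public TransactionalTestBase()
    {
        Db = Context.New();
        // Everything the test does happens inside this transaction
        transaction = Db.Database.BeginTransaction();
    }

    public void Dispose()
    {
        // Roll back instead of committing so the database is untouched for the next test
        transaction.Rollback();
        transaction.Dispose();
        Db.Dispose();
    }
}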

C) Using a script to clear all tables between tests

A fairly simple approach, and it starts off being really fast: you just call a script between tests. The problem is that the time this takes is proportional to the number of tables in the database, so it will get worse and worse. At Bokio we started out with this but switched to transactions when we had too many tables for it to be efficient.

You would run code like this between each test:

var sql = File.ReadAllText("cleardb.sql");
using var db = Context.New();
db.Database.ExecuteSqlRaw(sql);
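The exact contents of cleardb.sql will vary with your schema, but as a sketch, one common SQL Server pattern is to lean on the undocumented sp_MSforeachtable procedure so that foreign key order doesn't matter (this is just one way to write such a script, not necessarily what yours should look like):

using var db = Context.New();
// Disable all constraints, delete everything, then re-enable and re-validate the constraints
db.Database.ExecuteSqlRaw(@"
    EXEC sp_MSforeachtable 'ALTER TABLE ? NOCHECK CONSTRAINT ALL';
    EXEC sp_MSforeachtable 'DELETE FROM ?';
    EXEC sp_MSforeachtable 'ALTER TABLE ? WITH CHECK CHECK CONSTRAINT ALL';");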

D) Using restore points on the DB

I just did research into this. From my small tests the restore is too slow to make this feasible.

A quick and easy solution using EF Core Interceptors

You can read more about Interceptors here, but I'll show the important parts below. First of all we need to look at the overall process. These are the parts we will need:

  1. Recreate the DB once before each test run.
  2. Before each test we need to clear the database if there are any changes.

Both of these require us to run some code. I almost always use a test helper like this that I call in each database test:

public static class TC
{
    static TC()
    {
        // We always use a factory for our context. We set it up here
        Context.New = () => new Context("Data Source=.\\;Initial Catalog=myproject-test;Integrated Security=True");

        //This code handles 1) above
        using var db = Context.New();
        db.Database.EnsureDeleted();
        db.Database.EnsureCreated();
    }
    public static void New()
    {
        //Clear DB here. This should handle 2) above.
    }
}

And in tests that touch the database, I will start the test with:

[Fact]
public void DummyTest()
{
    TC.New();
    //Do database work here
}

Note: if you have pure unit tests for logic, you can and should skip calling TC.New() to get faster execution of those tests.

This part is just the skeleton and would be used for all the approaches above. Only the code in New() changes depending on the approach.

Using an interceptor to keep track of which tables changed and only clear them

This new approach is actually C), but optimized so that the slowdown is proportional to the number of tables touched in the test rather than the number of tables in the whole DB. If most of your tests touch most of the tables in your DB, this approach will not help you.

Let's start by implementing the interceptor we need. This is using EF Core 6.

using Microsoft.EntityFrameworkCore;
using Microsoft.EntityFrameworkCore.Diagnostics;

namespace EFCoreTesting
{
    /// <summary>
    /// Keeps track of which tables need to be cleared before the next test run.
    /// We only care about additions, and it's fine to clear a bit extra, which is why we ignore failed saves and data that was later deleted.
    /// </summary>
    public class TrackChangesInterceptor : ISaveChangesInterceptor
    {
        private HashSet<string> affectedTables = new HashSet<string>();

        public IReadOnlyCollection<string> GetAffectedTables() { return affectedTables.ToList().AsReadOnly(); }

        public void SaveChangesFailed(DbContextErrorEventData eventData) { }
        public Task SaveChangesFailedAsync(DbContextErrorEventData eventData, CancellationToken cancellationToken = default) { return Task.CompletedTask; }
        public int SavedChanges(SaveChangesCompletedEventData eventData, int result) { return result; }
        public ValueTask<int> SavedChangesAsync(SaveChangesCompletedEventData eventData, int result, CancellationToken cancellationToken = default) { return ValueTask.FromResult(result); }

        public InterceptionResult<int> SavingChanges(DbContextEventData eventData, InterceptionResult<int> result)
        {
            LogAdditions(eventData.Context);
            return result;
        }

        public ValueTask<InterceptionResult<int>> SavingChangesAsync(DbContextEventData eventData, InterceptionResult<int> result, CancellationToken cancellationToken = default)
        {
            LogAdditions(eventData.Context);
            return ValueTask.FromResult(result);
        }

        private void LogAdditions(DbContext context)
        {
            context.ChangeTracker.DetectChanges(); // Not sure we need to call this, but should be cheap enough in test and will be more reliable.

            foreach (var entry in context.ChangeTracker.Entries().Where(e => e.State == EntityState.Added))
            {
                affectedTables.Add(entry.Metadata.GetTableName());
            }
        }
    }
}

As you can see we only do work in SavingChanges and SavingChangesAsync. I would prefer to do it in SavedChanges, but the ChangeTracker has already had its state cleared by then, so we need to inspect it before the changes are saved. Besides that there is nothing special about this class; we just collect the table names affected by our changes.

To use this we will have to change TC a bit. Note that we also added a second constructor to Context that takes the interceptor as input. This is because we only want this interceptor when we run tests, so we can't just register it in OnConfiguring like we normally would (I'll sketch what that constructor could look like after the updated TC below).

public static class TC
{
    private static TrackChangesInterceptor trackChangesInterceptor;
    static TC()
    {
        Context.New = () => new Context("Data Source=.\\;Initial Catalog=myproject-test;Integrated Security=True", trackChangesInterceptor);
        using var db = Context.New();
        db.Database.EnsureDeleted();
        db.Database.EnsureCreated();
    }
    public static void New()
    {
        if (trackChangesInterceptor != null && trackChangesInterceptor.GetAffectedTables().Any())
        {
            using var db = Context.New();
            var sql = File.ReadAllText("cleardbpartial.sql");
            var tableNames = trackChangesInterceptor.GetAffectedTables();
            sql = sql.Replace("###TABLES###", string.Join(",", tableNames.Select(n => $"'{n}'")));
            db.Database.ExecuteSqlRaw(sql);
        }
        trackChangesInterceptor = new TrackChangesInterceptor();
    }
}
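The updated Context itself isn't shown above, so here is a minimal sketch of what that second constructor could look like. The exact shape of the class is an assumption; the key part is that the interceptor is passed to optionsBuilder.AddInterceptors only when one was provided, so production code is unaffected.

public class Context : DbContext
{
    // The factory the rest of the code uses; the test helper points it at the test database
    public static Func<Context> New { get; set; }

    private readonly string connectionString;
    private readonly IInterceptor interceptor;

    public Context(string connectionString) : this(connectionString, null) { }

    // The extra constructor used only by the tests
    public Context(string connectionString, IInterceptor interceptor)
    {
        this.connectionString = connectionString;
        this.interceptor = interceptor;
    }

    protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
    {
        optionsBuilder.UseSqlServer(connectionString);

        // Only register the interceptor when one was passed in, i.e. in tests
        if (interceptor != null)
        {
            optionsBuilder.AddInterceptors(interceptor);
        }
    }
}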

Also notice that we changed the script to cleardbpartial.sql. This script is slightly updated to allow us to filter the tables we want to clear.
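I won't include the full cleardbpartial.sql here, but the idea is that it only deletes from the tables injected into the ###TABLES### placeholder. If you'd rather skip the file completely, a sketch like this, which builds the same kind of SQL in code, should also work (assuming SQL Server, the default schema, and that it's acceptable to disable constraints around the deletes):

private static string BuildPartialClearSql(IReadOnlyCollection<string> tableNames)
{
    var sql = new StringBuilder();

    // Disable constraint checking on the affected tables so the delete order doesn't matter
    foreach (var table in tableNames)
        sql.AppendLine($"ALTER TABLE [{table}] NOCHECK CONSTRAINT ALL;");

    foreach (var table in tableNames)
        sql.AppendLine($"DELETE FROM [{table}];");

    // Re-enable and re-validate the constraints again
    foreach (var table in tableNames)
        sql.AppendLine($"ALTER TABLE [{table}] WITH CHECK CHECK CONSTRAINT ALL;");

    return sql.ToString();
}

New() would then call db.Database.ExecuteSqlRaw(BuildPartialClearSql(tableNames)) instead of reading and patching the file.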

This is really all there is to this approach.

The performance

Profiling tests is always a bit wonky IMO, but this seems to run in a few ms per test. Another benefit is that if the previous test didn't call SaveChanges, there is no overhead at all.

[Screenshot: multiple tests running in 1-40 ms]

Note: We have yet to implement this on our full test suite as we need to upgrade EF first. There is always a risk that it won't scale as well as I think.

When this approach fails

This will mostly work for us, but I can see a few cases where it will not work well:

  • If you insert using raw sql
  • If you insert using any of the bulk copy solutions
  • If you insert using stored procedures
  • If you insert using views

It's possible that you can adapt this to handle stored procedures and views but the others will be harder. In our case we only have a few tests that use a bulk copy solution. Our approach would be to combine this optimisation with the full clearing of the DB in option C) above.

Combined solution to work with tests that insert in other ways

public static class TC
{
    private static TrackChangesInterceptor trackChangesInterceptor;
    private static bool previousTestRequireFullReset = false;
    static TC()
    {
        Context.New = () => new Context("Data Source=.\\;Initial Catalog=myproject-test;Integrated Security=True", trackChangesInterceptor);
        using var db = Context.New();
        db.Database.EnsureDeleted();
        db.Database.EnsureCreated();
    }

    public static void New(bool requireFullReset = false)
    {
        if (previousTestRequireFullReset)
        {
            using var db = Context.New();
            var sql = File.ReadAllText("cleardb.sql");
            db.Database.ExecuteSqlRaw(sql);
        }
        else if (trackChangesInterceptor != null && trackChangesInterceptor.GetAffectedTables().Any())
        {
            using var db = Context.New();
            var sql = File.ReadAllText("cleardbpartial.sql");
            var tableNames = trackChangesInterceptor.GetAffectedTables();
            sql = sql.Replace("###TABLES###", string.Join(",", tableNames.Select(n => $"'{n}'")));
            db.Database.ExecuteSqlRaw(sql);
        }

        previousTestRequireFullReset = requireFullReset;
        trackChangesInterceptor = new TrackChangesInterceptor();
    }
}

In tests that insert data in a way the interceptor can't track, we would just call TC.New(requireFullReset: true);.
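For example, a hypothetical test that inserts via bulk copy would look like this:

[Fact]
public void ImportsCustomersWithBulkCopy()
{
    // These rows never go through SaveChanges, so the interceptor won't see them
    // and we ask for a full reset before the next test instead
    TC.New(requireFullReset: true);
    //Do the bulk insert and assertions here
}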

Summary

While this approach won't work for everyone as it is right now, it can be a valuable tool when your test suite starts to become slow. In particular, those that use views and stored procedures to a large extent will have problems with this implementation. I do think it's possible to support these cases, but I never use any of them so I didn't look into it.

In the future I will likely use this as my default method, and if I have too much code that inserts data outside SaveChanges, I'll choose between the script that clears all tables and transaction rollback.

If you have any comments or questions, mention @MikaelEliasson on Twitter.
