Efutilities, bulk insert and update

Published on den 15 May 2015

With EFUtilities 1.0.0 released it's time to tell the world about the features. This time we will look at getting data into SQL Server fast. Everything I'm about to show you is documented at https://github.com/MikaelEliasson/EntityFramework.Utilities. The install instructions can also be found there.

Basically this is what you need to know to install it:

*PM> Install-Package EFUtilities*

Keeping track of the weather in the world

Wouldn't it be nice to have a list of the temperature of a lot of cities in the world in your own little project? OK, maybe not. But it will serve as an example of a domain where you might need to update plenty of entities often and where it can't be done by a query.

There is an example file at http://openweathermap.org/current#bulk that contains 12176 cities. More exactly this file http://78.46.48.103/sample/hourly_14.json.gz

A row looks something like this (+ plenty of more information):

{"city":{"id":1283240,"name":"Kathmandu","country":"NP"},"data":[{"main":{"temp":291.63,"pressure":890.16}}]}

Reading the data from file

Note: You can download the full demo at https://github.com/MikaelEliasson/EFUtilitiesDemos

First, let's create our model.

public class Location
{
    public int Id { get; set; }
    public string Name { get; set; }
    public string Country { get; set; }
    public double Temp { get; set; }
    public double Pressure { get; set; }
}

And our DbContext

public class Context : DbContext
{
    public DbSet<Location> Locations { get; set; }

    protected override void OnModelCreating(DbModelBuilder modelBuilder)
    {
        base.OnModelCreating(modelBuilder);

        modelBuilder.Entity<Location>()
            .HasKey(x => x.Id)
            .Property(x => x.Id)
            .HasDatabaseGeneratedOption(DatabaseGeneratedOption.None);
    }   
}

Then we need to read all that data, That can de done easily with Json.Net and the dynamic support it has.

NOTE: The reason I set the temp to 0 is for demo purposes to see that the demo works.

var filePath = "D:/Downloads/hourly_14.json/hourly_14.json";
var sw = new Stopwatch();
var lines = File.ReadLines(filePath);
var locations = lines.Select(l =>
{
    dynamic x = JsonConvert.DeserializeObject(l);

    return new Location
    {
        Id = x.city.id,
        Name = x.city.name,
        Country = x.city.country,
        Temp = 0.0,
        Pressure = x.data[0].main.pressure,
    };
}).ToList();

Bulk inserting the data with Entity framework

using (var ctx = new Context())
{
    EFBatchOperation.For(ctx, ctx.Locations).InsertAll(locations);
}

That's that. EFUtilities will read the metadata and handle the rest. It will use SqlBulkCopy under the covers to stream the data into Sql Server as fast as possible. 

This takes about 160ms on my computer.

Inserted 12176 locations in 164 ms

Hottest: Protaras 0K

Bulk updating the data with Entity framework

Now, the thing with the weather is that it's changing. We need to update this often. EFUtilities can from the 1.0.0 version help you do this efficiently (more than 20x faster than normal EF would be in this case). 

Here is the code for the update part. First we read the data again, then update it and finally just read some information to make sure it worked.

var locations2 = lines.Select(l =>
{
    dynamic x = JsonConvert.DeserializeObject(l);

    return new Location
    {
        Id = x.city.id,
        Temp = x.data[0].main.temp,
        Pressure = x.data[0].main.pressure,
    };
}).ToList();

sw.Restart();
using (var ctx = new Context())
{
    EFBatchOperation.For(ctx, ctx.Locations).UpdateAll(locations2, x => x.ColumnsToUpdate(l => l.Pressure, l => l.Temp));
}
sw.Stop();
using (var ctx = new Context())
{
    var count = ctx.Locations.Count();
    var top = ctx.Locations.OrderByDescending(l => l.Temp).First();
    Console.WriteLine("Updated " + count + " locations in " + sw.ElapsedMilliseconds + " ms");
    Console.WriteLine(string.Format("Hottest: {0} {1}K", top.Name, top.Temp));
}

The output for that part:

Updated 12176 locations in 104 ms

Hottest: Zhaoqing 310,93K

So it updated all 12k rows in 104ms. Pretty nice.

The update works by inserting all the data to update into a temp table. The it runs a command to update the data in the location table with the data from the temp table before it finally deletes the temp table.

x => x.ColumnsToUpdate(l => l.Pressure, l => l.Temp)

Tell it that it should only update Pressure and Temp. All other columns are ignored. That's why we don't overwrite the Name the city even though we didn't include that.

Deep dive: Under the covers

If we open upp SQL Profiler we will see the following commands

InsertAll

SQL:BatchCompleted    insert bulk [dbo].[Locations] ([Id] Int, [Name] NVarChar(max), [Country] NVarChar(max), [Temp] Float, [Pressure] Float)

In case we had a larger sample there would have been batching. EFUtilities default to 15k entities / batch. 

UpdateAll

SQL:BatchCompleted    CREATE TABLE dbo.[temp_Locations_635673635823747650](
                        [Id] int, 
                        [Temp] float, 
                        [Pressure] float, 
                        PRIMARY KEY ([Id]))
insert bulk             [dbo].[temp_Locations_635673635823747650] ([Id] Int, [Temp] Float, [Pressure] Float)
SQL:BatchCompleted    UPDATE [Locations]
                        SET
                            [Temp] = TEMP.[Temp],[Pressure] = TEMP.[Pressure]
                        FROM
                            [Locations] ORIG
                        INNER JOIN
                             [temp_Locations_635673635823747650] TEMP
                        ON 
                            ORIG.[Id] = TEMP.[Id]
SQL:BatchCompleted    DROP table dbo.[temp_Locations_635673635823747650]

As you might guess InsertAll is doing the heavy lifting here too. Notice how the number of columns are kept to a minimum.  

Summary

With the help of EFUtilities we can do bulk inserts and bulk updates into SQL Server. The speed depends on a lot of factores like the size of the entity and the computer but reaching speeds of around 100k entities / second is fairly reasonable and it takes very little extra work. 

The full demo can be found, forked and downloaded at: https://github.com/MikaelEliasson/EFUtilitiesDemos

Then feel free to it or if you have any comments or questions mention @MikaelEliasson on Twitter.

CTO and co-founder at Bokio with a background as an elite athlete. Still doing a lot of sports but more for fun.

#development, #web, #orienteering, #running, #cycling, #boardgames, #personaldevelopment