With EFUtilities 1.0.0 released, it's time to tell the world about its features. This time we will look at getting data into SQL Server fast. Everything I'm about to show you is documented at https://github.com/MikaelEliasson/EntityFramework.Utilities. The install instructions can also be found there.
Basically this is what you need to know to install it:
*PM> Install-Package EFUtilities*
Keeping track of the weather in the world
Wouldn't it be nice to have a list of the temperatures of a lot of cities around the world in your own little project? OK, maybe not. But it will serve as an example of a domain where you might need to update plenty of entities often and where the update can't be expressed as a single query.
There is an example file described at http://openweathermap.org/current#bulk that contains 12176 cities; more exactly, this file: http://78.46.48.103/sample/hourly_14.json.gz
A row looks something like this (plus plenty more information):
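The file is gzip-compressed. If you don't want to extract it by hand, .NET's built-in GZipStream can decompress it to the plain JSON file the rest of the demo reads. A minimal sketch (the paths match the ones used later in the demo; adjust them to wherever you saved the download):

```csharp
using System.IO;
using System.IO.Compression;

// Decompress the downloaded hourly_14.json.gz to a plain JSON file.
using (var input = File.OpenRead("D:/Downloads/hourly_14.json.gz"))
using (var gzip = new GZipStream(input, CompressionMode.Decompress))
using (var output = File.Create("D:/Downloads/hourly_14.json/hourly_14.json"))
{
    gzip.CopyTo(output);
}
```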
{"city":{"id":1283240,"name":"Kathmandu","country":"NP"},"data":[{"main":{"temp":291.63,"pressure":890.16}}]}
Reading the data from file
Note: You can download the full demo at https://github.com/MikaelEliasson/EFUtilitiesDemos
First, let's create our model.
public class Location
{
    public int Id { get; set; }
    public string Name { get; set; }
    public string Country { get; set; }
    public double Temp { get; set; }
    public double Pressure { get; set; }
}
And our DbContext
public class Context : DbContext
{
    public DbSet<Location> Locations { get; set; }

    protected override void OnModelCreating(DbModelBuilder modelBuilder)
    {
        base.OnModelCreating(modelBuilder);
        modelBuilder.Entity<Location>()
            .HasKey(x => x.Id)
            .Property(x => x.Id)
            .HasDatabaseGeneratedOption(DatabaseGeneratedOption.None);
    }
}
Then we need to read all that data. That can be done easily with Json.NET and its dynamic support.
NOTE: I set the temp to 0 here for demo purposes, so that we can see later that the update actually changed the data.
var filePath = "D:/Downloads/hourly_14.json/hourly_14.json";
var sw = new Stopwatch();
var lines = File.ReadLines(filePath);
var locations = lines.Select(l =>
{
    dynamic x = JsonConvert.DeserializeObject(l);
    return new Location
    {
        Id = x.city.id,
        Name = x.city.name,
        Country = x.city.country,
        Temp = 0.0,
        Pressure = x.data[0].main.pressure,
    };
}).ToList();
Bulk inserting the data with Entity Framework
using (var ctx = new Context())
{
    EFBatchOperation.For(ctx, ctx.Locations).InsertAll(locations);
}
That's that. EFUtilities will read the metadata and handle the rest. It uses SqlBulkCopy under the covers to stream the data into SQL Server as fast as possible.
This takes about 160ms on my computer.
Inserted 12176 locations in 164 ms
Hottest: Protaras 0K
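For comparison, doing the same insert by hand with SqlBulkCopy would look roughly like the sketch below. The table name and column mapping are what EFUtilities derives from the EF metadata for you; the connection string is a placeholder:

```csharp
using System.Data;
using System.Data.SqlClient;

// Rough sketch of what EFBatchOperation.InsertAll saves you from writing.
var table = new DataTable();
table.Columns.Add("Id", typeof(int));
table.Columns.Add("Name", typeof(string));
table.Columns.Add("Country", typeof(string));
table.Columns.Add("Temp", typeof(double));
table.Columns.Add("Pressure", typeof(double));

foreach (var l in locations)
{
    table.Rows.Add(l.Id, l.Name, l.Country, l.Temp, l.Pressure);
}

using (var connection = new SqlConnection("<your connection string>"))
{
    connection.Open();
    using (var bulkCopy = new SqlBulkCopy(connection) { DestinationTableName = "dbo.Locations" })
    {
        bulkCopy.WriteToServer(table);
    }
}
```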
Bulk updating the data with Entity Framework
Now, the thing with the weather is that it's changing. We need to update this often. As of version 1.0.0, EFUtilities can help you do this efficiently (in this case more than 20x faster than normal EF would be).
Here is the code for the update part. First we read the data again, then update it, and finally read some information back to make sure it worked.
var locations2 = lines.Select(l =>
{
    dynamic x = JsonConvert.DeserializeObject(l);
    return new Location
    {
        Id = x.city.id,
        Temp = x.data[0].main.temp,
        Pressure = x.data[0].main.pressure,
    };
}).ToList();
sw.Restart();
using (var ctx = new Context())
{
    EFBatchOperation.For(ctx, ctx.Locations).UpdateAll(locations2, x => x.ColumnsToUpdate(l => l.Pressure, l => l.Temp));
}
sw.Stop();
using (var ctx = new Context())
{
    var count = ctx.Locations.Count();
    var top = ctx.Locations.OrderByDescending(l => l.Temp).First();
    Console.WriteLine("Updated " + count + " locations in " + sw.ElapsedMilliseconds + " ms");
    Console.WriteLine(string.Format("Hottest: {0} {1}K", top.Name, top.Temp));
}
The output for that part:
Updated 12176 locations in 104 ms
Hottest: Zhaoqing 310,93K
So it updated all 12k rows in 104ms. Pretty nice.
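As a side note, the OpenWeatherMap data reports temperatures in Kelvin, which is why the output above reads 310,93K. If you'd rather print Celsius, the conversion is a one-liner (KelvinToCelsius is just a helper name made up for this demo):

```csharp
// OpenWeatherMap reports temperatures in Kelvin; subtract 273.15 for Celsius.
static double KelvinToCelsius(double kelvin)
{
    return kelvin - 273.15;
}

// The hottest city above: 310.93 K is about 37.78 °C.
Console.WriteLine("{0:F2}", KelvinToCelsius(310.93));
```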
The update works by bulk inserting all the data to update into a temp table. Then it runs a command to update the data in the Locations table with the data from the temp table, before finally dropping the temp table.
x => x.ColumnsToUpdate(l => l.Pressure, l => l.Temp)
tells it to update only Pressure and Temp; all other columns are ignored. That's why the Name of each city isn't overwritten even though we didn't include it.
Deep dive: Under the covers
If we open up SQL Server Profiler we will see the following commands:
InsertAll
SQL:BatchCompleted insert bulk [dbo].[Locations] ([Id] Int, [Name] NVarChar(max), [Country] NVarChar(max), [Temp] Float, [Pressure] Float)
Had we used a larger sample, the insert would have been batched; EFUtilities defaults to 15,000 entities per batch.
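If you want a different batch size, InsertAll takes an optional batch size argument. A sketch, assuming the overload shape in the 1.0.0 release (check the EFUtilities README for the exact signature in your version):

```csharp
using (var ctx = new Context())
{
    // batchSize is optional; leaving it null uses the 15k default.
    // The exact overload may differ between EFUtilities versions.
    EFBatchOperation.For(ctx, ctx.Locations).InsertAll(locations, batchSize: 15000);
}
```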
UpdateAll
SQL:BatchCompleted CREATE TABLE dbo.[temp_Locations_635673635823747650](
    [Id] int,
    [Temp] float,
    [Pressure] float,
    PRIMARY KEY ([Id]))
insert bulk [dbo].[temp_Locations_635673635823747650] ([Id] Int, [Temp] Float, [Pressure] Float)
SQL:BatchCompleted UPDATE [Locations]
SET
[Temp] = TEMP.[Temp],[Pressure] = TEMP.[Pressure]
FROM
[Locations] ORIG
INNER JOIN
[temp_Locations_635673635823747650] TEMP
ON
ORIG.[Id] = TEMP.[Id]
SQL:BatchCompleted DROP table dbo.[temp_Locations_635673635823747650]
As you might guess, InsertAll is doing the heavy lifting here too. Notice how the number of columns is kept to a minimum.
Summary
With the help of EFUtilities we can do bulk inserts and bulk updates into SQL Server. The speed depends on many factors, such as the size of the entity and the computer, but reaching speeds of around 100k entities per second is fairly reasonable, and it takes very little extra work.
The full demo can be found, forked and downloaded at: https://github.com/MikaelEliasson/EFUtilitiesDemos