Sometimes you need to insert large amounts of data into your SQL Server. There is a hidden gem in the .NET Framework that is really good at this, the SqlBulkCopy class.
So in my case I had a situation where I needed to insert thousands of TrackPoints into the database. I was using EF, which can't do bulk inserts. Pretty much because EF will try to read back the identity of each inserted entity and therefore has to run one INSERT statement per row.
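Just to make the contrast concrete, the plain EF approach looks roughly like this (a sketch; TracksContext is a hypothetical DbContext with a TrackPoints set), and it ends up issuing one INSERT round trip per entity:

// Sketch of the naive EF approach: every TrackPoint becomes its own INSERT,
// since EF reads back the generated identity value for each row.
using (var context = new TracksContext()) // hypothetical DbContext
{
    foreach (var trackPoint in trackPoints)
    {
        context.TrackPoints.Add(trackPoint);
    }

    context.SaveChanges(); // still one INSERT statement per TrackPoint
}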
Here is how my repository looks:
public void InsertTrackPoints(IEnumerable<TrackPoint> trackPoints)
{
    using (var reader = new TrackPointsDataReader(trackPoints))
    {
        using (SqlConnection cn = new SqlConnection(this.connectionStrings.Data))
        {
            cn.Open();
            using (SqlBulkCopy copy = new SqlBulkCopy(cn))
            {
                copy.BatchSize = reader.RecordsAffected;
                copy.DestinationTableName = "TrackPoints";
                copy.NotifyAfter = 0;

                foreach (var i in Enumerable.Range(0, reader.FieldCount))
                {
                    copy.ColumnMappings.Add(i, i);
                }

                copy.WriteToServer(reader);
                copy.Close();
            }
        }
    }
}
The most interesting thing here is the TrackPointsDataReader, which inherits from DbDataReader. This is a pretty easy way to go from an in-memory collection to something SqlBulkCopy can consume. I should warn you that this might not be the best implementation; I think it could be made more generic and robust. I usually refactor something the second time I need almost the same code, and so far that has only happened once.
public class TrackPointsDataReader : DbDataReader
{
    public IEnumerable<TrackPoint> Items { get; set; }
    public IEnumerator<TrackPoint> Enumerator { get; set; }

    // Maps a column ordinal to the TrackPoint property that provides its value.
    private IList<Func<TrackPoint, object>> ordinalToValueMapping = new List<Func<TrackPoint, object>>
    {
        tp => tp.Time,
        tp => tp.TrackID,
        tp => tp.Long,
        tp => tp.Lat,
        tp => tp.Height,
        tp => tp.BPM,
        tp => tp.X,
        tp => tp.Y,
        tp => tp.IsSplit
    };

    public TrackPointsDataReader(IEnumerable<TrackPoint> items)
    {
        Items = items;
        Enumerator = items.GetEnumerator();
    }

    public override void Close()
    {
        this.Enumerator = null;
        this.Items = null;
    }

    public override int FieldCount
    {
        get { return ordinalToValueMapping.Count; }
    }

    public override bool HasRows
    {
        get { return this.Items != null && this.Items.Any(); }
    }

    public override bool IsClosed
    {
        get { return Enumerator == null; }
    }

    public override bool Read()
    {
        return this.Enumerator.MoveNext();
    }

    public override int RecordsAffected
    {
        get { return this.Items.Count(); }
    }

    public override object GetValue(int ordinal)
    {
        return this.ordinalToValueMapping[ordinal](this.Enumerator.Current);
    }

    // The remaining abstract members of DbDataReader are not shown here; when the
    // columns are mapped by ordinal, SqlBulkCopy only appears to need FieldCount,
    // Read and GetValue, so the rest can simply throw NotSupportedException.
}
Another, partially different implementation by Daniel Wertheim (@danielwertheim) can be found here: http://daniel.wertheim.se/2010/11/10/c-custom-datareader-for-sqlbulkcopy/
So what is the result of all this? In my test case I insert around 3600 TrackPoints, and MiniProfiler says 49 ms (on localhost, though):
| | duration (ms) | with children (ms) | from start (ms) |
|---|---|---|---|
| http://localhost:31084/UploadRoute/SaveTrack/71 | 28.9 | 439.1 | +0.0 |
| Build Connection | 0.0 | 0.0 | +2.4 |
| Controller: UploadRouteController.SaveTrack | 68.9 | 410.2 | +28.4 |
| Save | 292.4 | 341.3 | +97.4 |
| Save TrackPoints | 48.9 | 48.9 | +389.8 |
The performance is really good. The MiniProfiler clearly shows that there are other parts I need to focus on more.
Note that the implementation can probably be made much more tolerant of column reorderings and that it is a bit rough at the moment; one possible starting point is sketched below. Any improvements are very welcome!
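For example, instead of mapping ordinal to ordinal, each source ordinal could be mapped to a destination column name, so the bulk copy no longer depends on the column order of the TrackPoints table. The names below are assumptions derived from the lambdas in the reader and would need to match the actual table columns:

// Assumed destination column names, in the same order as ordinalToValueMapping.
var destinationColumns = new[] { "Time", "TrackID", "Long", "Lat", "Height", "BPM", "X", "Y", "IsSplit" };

// Inside InsertTrackPoints: map each source ordinal to a destination column name.
foreach (var i in Enumerable.Range(0, reader.FieldCount))
{
    copy.ColumnMappings.Add(i, destinationColumns[i]);
}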