When using Linq you need to be aware of some performance pitfalls. Here we look at Skip() and it's hidden cost.
I was chunking up some work using this algorithm:
for (int i = 0; i < ordered.Length; i += chunkSize)
{
//Create file
foreach (var item in ordered.Skip(i).Take(chunkSize))
{
//Write line to file
}
//Run sql to load file into mysql
}
It got me thinking if that Skip really was a good idea. In this case my collection was a List
Now it could be that Linq had someway optimized Skip for certain collections like List and Array but the only way to find out is to measure so that's what we do.
class Program
{
private const int chunkSize = 10000;
static void Main(string[] args)
{
var watch = new Stopwatch();
var collection = Enumerable.Range(0, 9999999).ToList();
var collectionAsArray = collection.ToArray();
var collectionAsHashSet = new HashSet<int>(collection);
watch.Start();
long items = 0;
for (int i = 0; i < collection.Count;)
{
var limit = Math.Min(i + chunkSize, collection.Count);
for (; i < limit; i++)
{
items += collection[i];
}
}
watch.Stop();
Console.WriteLine("Loops {0} | {1}", watch.ElapsedMilliseconds, items);
watch.Reset();
Console.WriteLine("Test List");
RunSkipTest(collection, watch);
Console.WriteLine("Test Array");
RunSkipTest(collectionAsArray, watch);
Console.WriteLine("Test HashSet");
RunSkipTest(collectionAsHashSet, watch);
}
private static void RunSkipTest(ICollection<int> collection, Stopwatch watch)
{
watch.Start();
long items = 0;
for (int i = 0; i < collection.Count; i += chunkSize)
{
foreach (var item in collection.Skip(i).Take(chunkSize))
{
items += item;
}
}
watch.Stop();
Console.WriteLine("Skips {0} | {1}", watch.ElapsedMilliseconds, items);
watch.Reset();
}
}
Some explanations. The for loops in Main is what I would use before Linq came around. *items *is really only a way to confirm that the algorithms does the same thing and to ensure the compiler doesn't remove the loop.
The result:
**(The format is: xxx ms | verfication value)**
Test with 99999
Loops 1 | 4999850001
Test List
Skips 18 | 4999850001
Test Array
Skips 10 | 4999850001
Test HashSet
Skips 13 | 4999850001
Test with 300000 items
Loops 3 | 44999850000
Test List
Skips 63 | 44999850000
Test Array
Skips 56 | 44999850000
Test HashSet
Skips 89 | 44999850000
Test with 9999999 items
Loops 122 | 49999985000001
Test List
Skips 51884 | 49999985000001
Test Array
Skips 50442 | 49999985000001
Test HashSet
Skips 85992 | 49999985000001
As you can see there is a noticable difference once the collection grows. The conclusion is that even though it would be possible Linq is not optimizing Skip for any collection. If you need performance you need to do the indexing yourself (unless the collection is small where it doesn't matter).