Skip(), Take() and the poor performance they bring

LINQ's Skip().Take() is great when you want to extract a subset of a data source. A typical use case is adding paging to data tables, where users can choose how many elements they want displayed at a time.
```csharp
var collection = Enumerable.Range(0, 5); // 0, 1, 2, 3, 4
var chunk = collection.Skip(2).Take(2).ToList(); // 2, 3
```
Skip() and Take() are implemented as extension methods on IEnumerable&lt;T&gt;. Most data structures that implement IEnumerable&lt;T&gt; do not offer indexed access; that is limited to structures such as List and Array. The implementation of Skip() therefore cannot jump to position N directly. Instead, it repeatedly calls IEnumerable.GetEnumerator().MoveNext() until it reaches the first element that should be included in the result set: to reach element N, it must step past all N preceding elements. The table below shows run times for splitting a dataset of N integers on my local computer. The runtime increases drastically with the number of elements.
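To make the cost model concrete, here is a simplified sketch of how Skip() behaves on a plain IEnumerable&lt;T&gt;. This is not the actual BCL implementation, just an illustration of the enumerator-stepping behavior described above:

```csharp
using System.Collections.Generic;

public static class SkipSketchExtensions
{
    // Simplified sketch: discard the first `count` elements one
    // MoveNext() call at a time, then yield whatever remains.
    public static IEnumerable<T> SkipSketch<T>(this IEnumerable<T> source, int count)
    {
        using var e = source.GetEnumerator();
        while (count > 0 && e.MoveNext())
        {
            count--;
        }
        while (e.MoveNext())
        {
            yield return e.Current;
        }
    }
}
```

Every skipped element still costs one MoveNext() call, which is why the run times below grow with the size of the input rather than the size of the requested chunk.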
| Number of elements | Skip().Take() |
|---|---|
| 25 | 3ms |
| 250 | 5ms |
| 2500 | 12ms |
| 25000 | 324ms |
| 250000 | 18284ms |
How can we improve? With indexed access, of course.
The following code snippet shows an extension method on IEnumerable&lt;T&gt; that uses the source directly if it is already an IList&lt;T&gt;, and otherwise converts it to an array, then reads elements by index. There is no longer any need to walk through all preceding elements with GetEnumerator().MoveNext().
```csharp
public static IEnumerable<T> ChunkOptimizedForLargeDatasets<T>(
    this IEnumerable<T> source, int skip, int take)
{
    var list = source as IList<T> ?? source.ToArray();
    var upperLimit = Math.Min(list.Count, skip + take);
    for (var i = skip; i < upperLimit; i++)
    {
        yield return list[i];
    }
}
```
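As a usage sketch, paging with this method could look like the following (the variable names and page sizes are illustrative, not part of the original benchmark):

```csharp
// One million elements: 0 .. 999999. Enumerable.Range is not an
// IList<int>, so the extension method materializes it into an array once.
var source = Enumerable.Range(0, 1_000_000);

// Fetch elements 300..399, i.e. page 3 with a page size of 100.
var page = source.ChunkOptimizedForLargeDatasets(skip: 300, take: 100).ToList();

Console.WriteLine(page.First()); // 300
Console.WriteLine(page.Count);   // 100
```

Note that because the method is lazy (it uses `yield return`), the ToArray() conversion only happens once enumeration starts.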
Let's compare the run times with our custom implementation.
| Number of elements | Skip().Take() | Custom |
|---|---|---|
| 25 | 3ms | 2ms |
| 250 | 5ms | 1ms |
| 2500 | 12ms | 1ms |
| 25000 | 324ms | 2ms |
| 250000 | 18284ms | 16ms |
When to use?
Given the modest performance gain for smaller datasets, it only makes sense to introduce a custom implementation in projects operating on very large in-memory datasets. For those projects, this implementation can be used wherever the one-time cost of converting an IEnumerable to an indexable data structure is small compared to the gains from indexed access. Also note that newer .NET runtimes special-case indexable sources in Skip() and Take(), so measure on your own runtime before adopting a custom implementation.