People tend to feel they know everything until they stumble upon borders of their knowledge. Today I had a chance to test my understanding of late evaluation of IEnumerable in LINQ expressions.
Scott Allen recently asked what is wrong with following code, and if it is an obvious bug:
1: public static IEnumerable<int> Values(
2: this Random random,
3: int minValue, int maxValue)
4: {
5: while (true)
6: yield return random.Next(minValue, maxValue);
7: }
1: var values =
2: new Random().Values(1, 100)
3: .Distinct()
4: .OrderBy(value => value)
5: .Take(10)
6: .ToArray();
This functional piece of code is supposed to create a sorted array of 10 distinct random numbers between 1 and 100. And I’d say it does it in rather beautiful and self-explanatory way, except that it blocks forever trying to sort infinite sequence.
Let’s look closer into what happens here. Values() call on line 2 creates an iterator of infinite sequence of random numbers, but does not start to enumerate it. Distinct() call creates a new iterator that filters results of Values(). Same story with OrderBy() and Take().
But then ToArray() call starts to actually enumerate the sequence, and now we have a problem. When Take() asks underlying sequence for first element, OrderBy() cannot produce it without enumerating all elements. And Values() sequence is infinite, so it blocks forever.
As people pointed in comments, initial fix is simple (but there is more to it): we just change order of Take and OrderBy, so Take produces exactly 10 elements and OrderBy can enumerate them all.
Is this problem obvious? It is, as long as you know what to look for. You need to look for cases where you are trying to enumerate infinite sequence, i.e. call the method that fully enumerates the sequence without limiting it first. But it takes experience to spot it the same way it takes experience to spot potentially infinite loops in procedural code or potential NullReferenceExceptions in OO code.
How can we help people in spotting it? With better tools. Resharper already warns you about problems with null arguments sometimes using the following logic: if result of some function is marked as potential null with special attribute, you should check it for null before sending into argument that is marked as not accepting null.
I think it is possible to extend Resharper or another static code analysis tool with the notion of infinite sequences. We mark IEnumerables that result from function calls as potentially infinite, definitely finite, or neutral (same as the source enumeration). Then we mark functions that will enumerate whole sequence (such as ToArray, or OrderBy. Tool warns us if we do stupid things. Profit.
It still leaves another problem with initial snippet unaddressed: side effects in Values call. I do not want to go into it here, you can read detailed explanation about this with workarounds in Bart De Smet's blog.
I do not think that functional style is ‘too clever’ or anything, you just need to get used to it.
2 comments: