SharpDevelop Community

Get your problems solved!
Welcome to SharpDevelop Community Sign in | Join | Help
in Search

Daniel Grunwald

ILSpy - yield return

This weekend, I worked on decompiling 'yield return' statements. The C# compiler is performing quite a bit magic to make 'yield return' work, and the decompiler must be aware of all this magic and be able to revert it.

After two days of hard work, I'm happy to announce that ILSpy (starting with 1.0.0.528) can now decompile enumerators.

Grab the new ILSpy build while it's hot, or just look at the obligatory screenshot:

If you want to understand the code generated by the compiler, you can disable this new feature in the new 'View > Options' dialog. Or you could read Jon Skeet's great article on this topic: Iterator block implementation details: auto-generated state machines.

Here's the generated MoveNext() code for the SelectMany implementation:

    private bool MoveNext()
    {
        bool flag;
        try {
            int i = this.$1__state;
            if (i == 0) {
                this.$1__state = -1;
                this.$7__wrap17 = this.source.GetEnumerator();
                this.$1__state = 1;
                goto IL_B0;
            }
            if (!3) {
                goto IL_C6;
            }
            this.$1__state = 2;
            IL_9D:
            if (this.$7__wrap19.MoveNext()) {
                this.<subElement>5__16 = this.$7__wrap19.Current;
                this.$2__current = this.<subElement>5__16;
                this.$1__state = 3;
                flag = true;
                return flag;
            }
            this.$m__Finally1a();
            IL_B0:
            if (this.$7__wrap17.MoveNext()) {
                this.<element>5__15 = this.$7__wrap17.Current;
                this.$7__wrap19 = this.selector.Invoke(this.<element>5__15).GetEnumerator();
                this.$1__state = 2;
                goto IL_9D;
            }
            this.$m__Finally18();
            IL_C6:
            flag = false;
        } catch { // in IL, this is a try-fault block, but C# doesn't have those...
            this.Dispose();
            throw;
        }
        return flag;
    }

Now how can one map the generated code back to the original C#? The general idea is simple (the devil is in the details...):

Every time the code assigns to this.current, this.state and then returns, we transform that into a "yield return" instruction and a "goto" instruction to the label belonging to the new state. Because we run this transformation very early in the decompiler's pipeline (prior to any control flow analysis), the following steps will pick up on the "goto"s and be able to detect loops and simplify the "goto"s away.

However, how do we determine the label that is responsible for (to give an example) state 3? The answer is 'IL_9D', but figuring this out is non-trivial: the C# compiler makes use of if-statements (to be exact: beq and bne.un), switch statements, and mixtures of both. Moreover, switch statements are usually preceded by subtractions, as the IL switch only deals with cases 0 to n-1. The ILAst for the beginning of the above MoveNext() method looks like this:

    stloc(var_1_06, ldfld(Enumerable/<SelectManyIterator>d__14`2<TSource, TResult>::<>1__state, ldarg(0)))
    brtrue(IL_17, ceq(ldloc(var_1_06), ldc.i4(0)))
    brtrue(IL_96, ceq(ldloc(var_1_06), ldc.i4(3)))
    br(IL_C6)
    IL_17:
    stfld(Enumerable/<SelectManyIterator>d__14`2<TSource, TResult>::<>1__state, ldarg(0), ldc.i4(-1))
    ...

If you haven't been following the previous posts: the ILAst is an intermediate data structure used in the decompiler. It represents an IL program using nested expressions, thus eliminating the IL evaluation stack. At the point where the "yield return" transformation runs, opcodes have already been simplified, so "beq" now is "brtrue(ceq)".

To determine where MoveNext() will branch to in a given state, ILSpy will simulate the execution of the beginning of the MoveNext() method. It does this symbolically: "this.$1__state" evaluates to (state+0). In general, "values" in this symbolic execution are (x), (state+x), (state==x) and (this), where x is an int32. The execution will go linearly through the ILAst; it works on the assumption that there are no backward branches. Execution stops once it encounters a statement it doesn't understand - usually, this is the assignment "this.$1__state = -1;", which indicates that the enumerator started executing. For each statement in the ILAst, the range of states that can lead to that value is stored.

So the result of the analysis is the following table:
    IL_17: state 0 to 0
    IL_96: state 3 to 3
    IL_C6: state int.MinValue to -1; 1 to 2, and 4 to int.MaxValue

This allows us to reconstruct the control flow in the MoveNext() method. However, one piece of the puzzle is still missing: the try-finally blocks. The C# compiler doesn't compile any of those into the MoveNext() method. Instead, it puts each finally block into its own method, and calls them in the MoveNext() method only on the regular exit of the try blocks. In case of an exception, the try-fault handler simply calls Dispose(), which takes care of calling the finally blocks depending on the current state:

    void System.IDisposable.Dispose()
    {
        switch (this.$1__state) {
            case 1:
            case 2:
            case 3{
                try {
                    switch (this.$1__state)
                    {
                        case 2:
                        case 3:
                        {
                            try {
                            } finally {
                                this.$m__Finally1a();
                            }
                            break;
                        }
                    }
                } finally {
                    this.$m__Finally18();
                }
                return;
            }
        }
    }
    private void $m__Finally18()
    {
        this.$1__state = -1;
        if (this.$7__wrap17 !null) {
            this.$7__wrap17.Dispose();
        }
    }
    private void $m__Finally1a()
    {
        this.$1__state = 1;
        if (this.$7__wrap19 !null) {
            this.$7__wrap19.Dispose();
        }
    }

We analyze the Dispose() method using the same symbolic execution that we used for the jump code at the beginning of MoveNext(). This tells us that $m__Finally1a is called in states 2 and 3; and that $m__Finally18 is called in states 1 to 3. Using this information, we can reconstruct the try-finally blocks within MoveNext(). The remaining parts of the ILAst pipeline then take care to replace the "goto"s with loop and if structures. Finally, the C# pattern transformations take care of translating the code back to the foreach pattern, resulting in the highly readable code in the screenshot at the beginning of this post.

 

Published Mar 06 2011, 08:22 PM by DanielGrunwald
Filed under:

Comments

 

NoMoDo said:

Hi Daniel,

Really interesting stuff! Thanks for another captivating blog post!

Just out of curiousity - is there any meaningful reason why you don't do change "selector.Invoke(current)" to "selector(current") in the code in the screenshot, or have you simply not gotten around to it yet?

March 7, 2011 9:42 AM
 

DanielGrunwald said:

We just didn't get to that yet.

And we also don't remove redundant "yield break;" statement yets - we'll have to change the transformations for "return;" to also work with "yield break;"

March 8, 2011 4:19 AM
Powered by Community Server (Commercial Edition), by Telligent Systems
Don't contact us via this (fleischfalle@alphasierrapapa.com) email address.