Daniel Grunwald

ILSpy - decompiler architecture

In my introductory post two weeks ago, I said we had not decided on a decompiler engine yet. The decision was made soon after: we found some issues in the Cecil.Decompiler design, and so decided to move forward with our own decompiler engine (based on David's dissertation). David will post more about those issues and how we can avoid them in the ILSpy design.

Here, I will write about the ILSpy architecture. But actually, the best way to learn about it is to download the source code, compile a debug build, and run it:

As you can see in this screenshot, the debug builds offer several more "languages" than the release builds (which show C# and IL only). These additional languages represent the intermediate steps between the IL code and the decompiled C# code.

The decompiler works on two different representations. First, it transforms the IL code into an intermediate language we call "ILAst". In this language, every IL instruction is represented by exactly one function call. Such function calls can be nested when one IL instruction directly consumes the output of another:

callvirt(UIElement::set_IsEnabled, ldfld(AboutPage/<>c__DisplayClass7::button, ldarg(0)), ldc.i4(0))
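To illustrate the nesting idea, here is a minimal Python sketch (not the actual ILSpy code) that folds a flat, stack-based instruction list into nested expressions. The `Expr` class, the `fold` function, and the tuple layout (opcode, operand, values popped, whether a value is pushed) are assumptions made for this example. A real decompiler derives the stack effects from the opcode and the method signature, and only nests instructions when that is safe.

```python
from dataclasses import dataclass, field


@dataclass
class Expr:
    """One IL instruction, with the instructions that produced its inputs."""
    opcode: str
    operand: object = None
    args: list = field(default_factory=list)

    def __str__(self):
        parts = [str(self.operand)] + [str(a) for a in self.args]
        return f"{self.opcode}({', '.join(parts)})"


def fold(instructions):
    """Simulate the evaluation stack and nest producers inside consumers.

    Each instruction is a hypothetical tuple:
    (opcode, operand, number of stack values popped, pushes a result?).
    """
    stack = []
    statements = []
    for opcode, operand, pops, pushes in instructions:
        split = len(stack) - pops
        args = stack[split:]      # values consumed by this instruction
        del stack[split:]
        expr = Expr(opcode, operand, args)
        if pushes:
            stack.append(expr)    # result is consumed by a later instruction
        else:
            statements.append(expr)
    return statements


# The flat IL sequence behind the example above (stack effects written by hand).
il = [
    ("ldarg", 0, 0, True),
    ("ldfld", "AboutPage/<>c__DisplayClass7::button", 1, True),
    ("ldc.i4", 0, 0, True),
    ("callvirt", "UIElement::set_IsEnabled", 2, False),
]

print(fold(il)[0])
```

Running the sketch prints the same nested expression as the ILAst line above.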

In essence, the ILAst is a structured representation of the IL code. Further transformation steps introduce even more structure into this ILAst, such as loops and if/else constructs.

The last step done on the ILAst representation is type inference. If you look at the example above, you'll see that the UIElement.set_IsEnabled method is called with the integer 0 as argument. However, the setter expects a boolean in C#. The issue here is that IL does not contain the full type information of the original program. For methods and fields, and even for local variables, full type information is available in the IL code. But for temporary results (stored on the IL evaluation stack), the CLR uses a less strict type system. In that type system, the type "I4" can stand for any of the following types: int, bool, uint, short, ushort, byte, sbyte, and also for any enum based on one of those integer types. The type analysis step uses inference to determine which of the C# types should be used by the decompiler. This results in the following ILAst:

callvirt:void(UIElement::set_IsEnabled, ldfld:Button(AboutPage/<>c__DisplayClass7::button, ldarg:AboutPage/<>c__DisplayClass7(0)), ldc.i4:bool(0))
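The inference can be pictured as pushing the expected type from a consumer down into its arguments. The following Python sketch is a simplified illustration under stated assumptions, not the ILSpy algorithm: the `METHODS`, `FIELDS` and `ARGUMENTS` tables are hypothetical stand-ins for metadata that the assembly provides, and enums are left out of the "I4" candidate set.

```python
from dataclasses import dataclass, field
from typing import Optional

# C# types that the stack type "I4" may stand for (enums omitted in this sketch).
I4_CANDIDATES = {"int", "bool", "uint", "short", "ushort", "byte", "sbyte"}

# Hypothetical metadata tables; a real decompiler reads these from the assembly.
METHODS = {"UIElement::set_IsEnabled": ("void", ["UIElement", "bool"])}
FIELDS = {
    "AboutPage/<>c__DisplayClass7::button": ("Button", "AboutPage/<>c__DisplayClass7")
}
ARGUMENTS = {0: "AboutPage/<>c__DisplayClass7"}


@dataclass
class Expr:
    opcode: str
    operand: object = None
    args: list = field(default_factory=list)
    inferred: Optional[str] = None

    def __str__(self):
        parts = [str(self.operand)] + [str(a) for a in self.args]
        return f"{self.opcode}:{self.inferred}({', '.join(parts)})"


def infer(expr, expected=None):
    """Annotate expr and its arguments with a type, top-down."""
    if expr.opcode == "ldc.i4":
        # Only the consumer can tell which I4-compatible type was meant.
        expr.inferred = expected if expected in I4_CANDIDATES else "int"
    elif expr.opcode == "ldarg":
        expr.inferred = ARGUMENTS[expr.operand]
    elif expr.opcode == "ldfld":
        field_type, declaring_type = FIELDS[expr.operand]
        expr.inferred = field_type
        infer(expr.args[0], declaring_type)
    elif expr.opcode == "callvirt":
        return_type, parameter_types = METHODS[expr.operand]
        expr.inferred = return_type
        for arg, parameter_type in zip(expr.args, parameter_types):
            infer(arg, parameter_type)
    else:
        raise ValueError(f"opcode not covered by this sketch: {expr.opcode}")
    return expr


tree = Expr("callvirt", "UIElement::set_IsEnabled", [
    Expr("ldfld", "AboutPage/<>c__DisplayClass7::button", [Expr("ldarg", 0)]),
    Expr("ldc.i4", 0),
])

print(infer(tree))
```

The output matches the annotated ILAst line above: the constant is typed as bool because the setter's parameter is a bool, while the field load keeps its declared type Button regardless of what the caller expects.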

This information allows the decompiler in the following step to know that the literal 0 was actually the literal false. So in the C# AST (which is built on top of the NRefactory library) we get the following statement:

this.button.IsEnabled = false;
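As a small illustration of why the inferred type matters when the C# AST is built, this Python sketch renders an ldc.i4 constant as C# source text. The function name `csharp_literal` is made up for this example, and only the bool case is handled specially.

```python
def csharp_literal(value: int, inferred_type: str) -> str:
    """Render an ldc.i4 constant as C# text for the given inferred type."""
    if inferred_type == "bool":
        return "false" if value == 0 else "true"
    # Every other type is printed as a plain number in this sketch.
    return str(value)


print(f"this.button.IsEnabled = {csharp_literal(0, 'bool')};")
```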

Technically, we already have valid C# at this point. But we still proceed to transform this C# code, both to simplify away artifacts introduced by the compiler (or decompiler) and to introduce support for some of the higher-level constructs in the C# language. In this example, the statement that was disabling the button was actually part of an anonymous method. The "DelegateConstruction" transformation step will inline anonymous methods into the method where they are instantiated. It also removes all traces of the "DisplayClass" (the class used by the C# compiler to represent the closure), leading to the following code:

    button.Click += delegate {
        button.IsEnabled = false;
    };

Note that inside the anonymous method, "this" referred to the closure instance; inlining replaced that reference, and the closure instance was subsequently removed. The final code works directly on the local button variable, and none of the display class/closure implementation remains. The decompiled code is now almost identical to the original source code.
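The core of that rewrite can be sketched as a small tree transformation. This is a toy model in Python, not the NRefactory-based transform itself. The node classes and the `replace_closure_fields` function are hypothetical names for this example, and the sketch only covers turning `this.<field>` into a local variable for captured fields.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class This:
    pass


@dataclass(frozen=True)
class Local:
    name: str


@dataclass(frozen=True)
class Literal:
    text: str


@dataclass(frozen=True)
class Member:
    target: object
    name: str


@dataclass(frozen=True)
class Assign:
    left: object
    right: object


def replace_closure_fields(node, captured):
    """Rewrite this.<field> to a local when <field> is a captured variable."""
    if isinstance(node, Member):
        if isinstance(node.target, This) and node.name in captured:
            return Local(node.name)
        return Member(replace_closure_fields(node.target, captured), node.name)
    if isinstance(node, Assign):
        return Assign(replace_closure_fields(node.left, captured),
                      replace_closure_fields(node.right, captured))
    return node


def render(node):
    """Print the toy AST as C#-like text."""
    if isinstance(node, This):
        return "this"
    if isinstance(node, (Local, Literal)):
        return node.name if isinstance(node, Local) else node.text
    if isinstance(node, Member):
        return f"{render(node.target)}.{node.name}"
    if isinstance(node, Assign):
        return f"{render(node.left)} = {render(node.right)};"
    raise TypeError(f"unknown node: {node!r}")


# Body of the anonymous method as it appears inside the display class.
body = Assign(Member(Member(This(), "button"), "IsEnabled"), Literal("false"))
inlined = replace_closure_fields(body, captured={"button"})

print(render(body))
print(render(inlined))
```

The first line printed is the statement as seen inside the display class, the second is the statement after the captured field has been turned back into a local.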

Published Feb 19 2011, 01:49 AM by DanielGrunwald