SharpDevelop Community


Daniel Grunwald

February 2011 - Posts

  • ILSpy - decompiler architecture

    In my introductory post two weeks ago, I said we had not decided on a decompiler engine yet. The decision was made soon after: we found some issues in the Cecil.Decompiler design, and so decided to move forward with our own decompiler engine (based on David's dissertation). David will post more about those issues and how we can avoid them in the ILSpy design.

    Here, I will write about the ILSpy architecture. But actually, the best way to learn about it is to download the source code, compile a debug build, and run it:

    As you can see in this screenshot, the debug builds offer several more "languages" than the release builds (which show C# and IL only). These additional languages represent the intermediate steps between the IL code and the decompiled C# code.

    The decompiler works on two different representations. First, it transforms the IL code into an intermediate language we call "ILAst". In this language, every IL instruction is represented by exactly one function call. Such function calls can be nested when one IL instruction directly consumes the output of another:

    callvirt(UIElement::set_IsEnabled, ldfld(AboutPage/<>c__DisplayClass7::button, ldarg(0)), ldc.i4(0))

    In essence, the ILAst is a structured representation of the IL code. Further transformation steps introduce even more structure into this ILAst, such as loops and if/else constructs.
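    To make the nesting idea concrete, here is a minimal sketch (in Python, not the actual ILSpy code) that folds a linear instruction list into nested expressions by simulating the evaluation stack. The `Expr` class, the `build_expressions` function, and the explicit pop/push counts per instruction are assumptions made for this illustration; a real decompiler also has to deal with branches and may only nest instructions when that is safe.

```python
from dataclasses import dataclass, field


@dataclass
class Expr:
    """One ILAst-style node: an opcode, an optional operand, nested arguments."""
    opcode: str
    operand: object = None
    args: list = field(default_factory=list)

    def __str__(self):
        parts = [] if self.operand is None else [str(self.operand)]
        parts += [str(a) for a in self.args]
        return f"{self.opcode}({', '.join(parts)})"


def build_expressions(instructions):
    """Fold a linear instruction list into nested expressions.

    Each instruction is a hypothetical tuple (opcode, operand, pops, pushes).
    An instruction that pushes a value stays on a simulated stack until a
    later instruction consumes it; one that pushes nothing becomes a statement.
    """
    stack = []
    statements = []
    for opcode, operand, pops, pushes in instructions:
        if pops > len(stack):
            raise ValueError(f"stack underflow at {opcode}")
        split = len(stack) - pops
        args = stack[split:]      # the values this instruction consumes
        del stack[split:]
        expr = Expr(opcode, operand, args)
        if pushes:
            stack.append(expr)
        else:
            statements.append(expr)
    return statements


# The instruction sequence behind the example above (assumed ordering).
il = [
    ("ldarg", 0, 0, 1),
    ("ldfld", "AboutPage/<>c__DisplayClass7::button", 1, 1),
    ("ldc.i4", 0, 0, 1),
    ("callvirt", "UIElement::set_IsEnabled", 2, 0),
]
statements = build_expressions(il)
print(statements[0])
```

    For this input, the sketch prints the same nested callvirt expression as shown above.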

    The last step done on the ILAst representation is type inference. If you look at the example above, you'll see that the UIElement.set_IsEnabled method is called with the integer 0 as argument. However, the setter expects a boolean in C#. The issue here is that IL does not contain the full type information of the original program. For methods and fields, and even for local variables, full type information is available in the IL code. But for temporary results (stored on the IL evaluation stack), the CLR uses a less strict type system. In that type system, the type "I4" can stand for any of the following types: int, bool, uint, short, ushort, byte, sbyte, and also for any enum based on one of those integer types. The type analysis step uses inference to determine which of the C# types should be used by the decompiler. This results in the following ILAst:

    callvirt:void(UIElement::set_IsEnabled, ldfld:Button(AboutPage/<>c__DisplayClass7::button, ldarg:AboutPage/<>c__DisplayClass7(0)), ldc.i4:bool(0))
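    The idea of pushing the expected type down from a method signature to its arguments can be sketched as follows. This is a simplified illustration in Python, not ILSpy's type analysis: the `Node` class, the `infer` function, and the metadata tables in `ctx` are hypothetical, and only the four opcodes from the example are handled.

```python
from dataclasses import dataclass, field

# Types that share the "I4" stack representation (the list given above, enums omitted).
I4_COMPATIBLE = {"int", "bool", "uint", "short", "ushort", "byte", "sbyte"}


@dataclass
class Node:
    opcode: str
    operand: object = None
    args: list = field(default_factory=list)
    inferred: str = None

    def __str__(self):
        parts = [] if self.operand is None else [str(self.operand)]
        parts += [str(a) for a in self.args]
        return f"{self.opcode}:{self.inferred}({', '.join(parts)})"


def infer(node, expected, ctx):
    """Annotate node (and its arguments) with a type; 'expected' comes from the consumer."""
    if node.opcode == "callvirt":
        param_types, return_type = ctx["methods"][node.operand]
        for arg, param_type in zip(node.args, param_types):
            infer(arg, param_type, ctx)   # the signature tells us what each argument is
        node.inferred = return_type
    elif node.opcode == "ldfld":
        declaring_type, field_type = ctx["fields"][node.operand]
        infer(node.args[0], declaring_type, ctx)
        node.inferred = field_type
    elif node.opcode == "ldarg":
        node.inferred = ctx["arg_types"][node.operand]
    elif node.opcode == "ldc.i4":
        # An I4 constant takes whatever I4-compatible type its consumer expects.
        node.inferred = expected if expected in I4_COMPATIBLE else "int"
    else:
        raise NotImplementedError(node.opcode)
    return node.inferred


# Hypothetical metadata for the example: parameter types include 'this'.
ctx = {
    "methods": {"UIElement::set_IsEnabled": (["UIElement", "bool"], "void")},
    "fields": {"AboutPage/<>c__DisplayClass7::button":
               ("AboutPage/<>c__DisplayClass7", "Button")},
    "arg_types": {0: "AboutPage/<>c__DisplayClass7"},
}

expr = Node("callvirt", "UIElement::set_IsEnabled", [
    Node("ldfld", "AboutPage/<>c__DisplayClass7::button", [Node("ldarg", 0)]),
    Node("ldc.i4", 0),
])
infer(expr, None, ctx)
print(expr)
```

    With these assumed tables, the printed expression carries the same annotations as the ILAst above; the constant 0 ends up typed as bool because the setter's parameter is a bool.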

    This information allows the decompiler in the following step to know that the literal 0 was actually the literal false. So in the C# AST (which is built on top of the NRefactory library) we get the following statement:

    this.button.IsEnabled = false;

    Technically, we already have valid C# at this point. But we still proceed to transform this C# code, both to simplify away artifacts introduced by the compiler (or decompiler) and to introduce support for some of the higher-level constructs in the C# language. In this example, the statement that was disabling the button was actually part of an anonymous method. The "DelegateConstruction" transformation step will inline anonymous methods into the method where they are instantiated. It also removes all traces of the "DisplayClass" (the class used by the C# compiler to represent the closure), leading to the following code:

        button.Click += delegate {
            button.IsEnabled = false;
            ...
        };

    Note that by inlining the anonymous method, the "this" was changed to refer to the closure instance, which was then subsequently removed. The final code works directly on the local button variable, and none of the display class/closure implementation remains. The decompiled code is now almost identical to the original source code.

  • ILSpy - Disassembler

    Even though the killer feature that everyone is waiting for is the decompiler, it is often still interesting to look directly at the IL code. Of course, this feature can't be missing in a tool called ILSpy.
    Implementing it also provided me with a way to test the GUI logic while the decompiler is still under construction.

    We use AvalonEdit for the code view, so you can expect some advanced features:
    The most obvious one is syntax highlighting, which uses AvalonEdit's built-in highlighting engine. As usual for AvalonEdit, copying the text into any HTML-capable application (such as this blog's edit box in Firefox) will preserve the highlighting.
     .method public static hidebysig string CreateHtmlFragment

    Code Folding allows you to collapse and expand methods - useful if you're viewing a complete class.
    Finally, there's an important invisible feature: every reference is a hyperlink.
    Click on a branch target (e.g. IL_0063), and the code view jumps to the target IL instruction. Click on a type/member reference, and it will be selected in the tree view and will be decompiled.

    The disassembler itself also can perform a nice trick: it will display the code in an indented form.
    For this purpose, I wrote a very simple detector for try-catch-finally and loop structures.
    Basically, the exception handling table in the method's footer is converted into a tree, and then displayed inline (ILDasm has this feature as well).
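    As a rough illustration of that conversion, the following Python sketch turns a flat exception handling table into a tree of nested blocks. The tuple layout of the table, the `Block` class, and the function names are assumptions for this example, and it presumes that all ranges nest properly; it is not the disassembler's actual code.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    kind: str
    start: int   # offset of the first instruction (inclusive)
    end: int     # offset just past the last instruction (exclusive)
    children: list = field(default_factory=list)


def ranges_from_table(table):
    """Flatten table entries (kind, try_start, try_end, handler_start, handler_end)."""
    ranges, seen_try = [], set()
    for kind, ts, te, hs, he in table:
        if (ts, te) not in seen_try:       # several handlers may share one try range
            seen_try.add((ts, te))
            ranges.append(("try", ts, te))
        ranges.append((kind, hs, he))
    return ranges


def build_tree(method_end, ranges):
    """Nest ranges into a tree, assuming they never partially overlap."""
    root = Block("method", 0, method_end)
    stack = [root]
    # Visit outer ranges first: earlier start, and for equal starts the longer range.
    for kind, start, end in sorted(ranges, key=lambda r: (r[1], -r[2])):
        while len(stack) > 1 and not (stack[-1].start <= start and end <= stack[-1].end):
            stack.pop()                    # leave blocks that do not contain this range
        block = Block(kind, start, end)
        stack[-1].children.append(block)
        stack.append(block)
    return root


def render(block, depth=0):
    lines = [f"{'  ' * depth}{block.kind} [{block.start}, {block.end})"]
    for child in block.children:
        lines += render(child, depth + 1)
    return lines


# try { try { ... } catch { ... } } finally { ... }, with made-up offsets
table = [("catch", 0, 10, 10, 20), ("finally", 0, 20, 20, 30)]
tree = build_tree(30, ranges_from_table(table))
print("\n".join(render(tree)))
```

    Sorting the ranges so that enclosing blocks come first lets a simple stack decide where each block belongs, and the indentation depth for display falls out of the tree depth.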

    For loops, the detection logic is dead simple: if there's a backwards branch in the code, then that's probably a loop with the body from start to end. To verify whether it's indeed a well-formed loop, I find all branches from outside the loop body jumping into the loop body, and test whether all of those jump to the same instruction. If there are multiple entry points, then the structure is not considered a loop.
    As a last test, any loops that cannot be inserted into the tree structure (because they don't nest properly and overlap with loops detected earlier) are ignored as well.

    So if you have IL code like this: "ABAB" where A is a loop and B is another (with jump instructions from the end of one A to the start of the other), then this simple algorithm will be wrong and detect only one loop "(ABA)B" and incorrectly consider B to be part of the loop body.
    I experimented with some other loop detection algorithms and actually had one which worked pretty well and could handle even the above case. However, the loops detected by that algorithm were not necessarily consecutive code blocks (as with the example above), so they couldn't be displayed in the disassembler view without reordering the IL code.
    I don't know of any compilers that generate loops in non-consecutive blocks of code, but some obfuscators employ such unusual code patterns. The decompiler will likely use a more advanced form of loop detection in order to deal with those cases.
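    The simple detection logic described above (backwards branch, single entry point, proper nesting) can be sketched in a few lines of Python. This is a reconstruction from the description, not the disassembler's code: representing the method as a list of (source offset, target offset) branch pairs and the function names are assumptions made for illustration.

```python
def nests_properly(a, b):
    """True if the inclusive ranges a and b are disjoint or one contains the other."""
    disjoint = a[1] < b[0] or b[1] < a[0]
    a_in_b = b[0] <= a[0] and a[1] <= b[1]
    b_in_a = a[0] <= b[0] and b[1] <= a[1]
    return disjoint or a_in_b or b_in_a


def detect_loops(branches):
    """Return loop bodies as (start, end) offsets, both inclusive.

    branches is a list of (source_offset, target_offset) pairs.
    """
    loops = []
    for source, target in branches:
        if target > source:
            continue                       # forward branch: not a loop candidate
        start, end = target, source        # body runs from the target to the branch
        # Targets inside the body that are reached by branches from outside it.
        entry_points = {t for s, t in branches
                        if not (start <= s <= end) and start <= t <= end}
        if len(entry_points) > 1:
            continue                       # multiple entry points: not a well-formed loop
        if all(nests_properly((start, end), other) for other in loops):
            loops.append((start, end))     # overlapping candidates are ignored
    return loops


# One backwards branch, entered from outside at a single instruction.
print(detect_loops([(0, 10), (12, 4)]))
# A second entry point into the body disqualifies the candidate.
print(detect_loops([(0, 10), (1, 6), (12, 4)]))
# The second candidate overlaps the first without nesting, so it is dropped.
print(detect_loops([(10, 0), (15, 5)]))
```

    Because every candidate body is by construction one consecutive block from the branch target to the branch, this approach shares the limitation discussed above: it cannot represent loops whose bodies are spread over non-consecutive code.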

  • ILSpy - a new .NET assembly inspector

    First, I'll let the screenshot speak for itself:

    Now that RedGate has announced that Reflector will no longer be available for free, the SharpDevelop team has started to create an open-source replacement: ILSpy.

    As with any of the assembly browsers inspired by ILDasm, the UI is simple: a tree view on the left allows you to view the contents of the assembly; the text view on the right shows the contents of the selected method.

    As for the decompiler engine itself, we still haven't decided. The screenshot above was created with Cecil.Decompiler, and you can clearly see that there are quite a few issues even in such a trivial method. We do have an alternative solution: David Srbecky (who wrote SharpDevelop's debugger) wrote his own decompiler in early 2008 as part of his dissertation. However, tests show that whichever library we pick, it will take a lot of work until the results come anywhere close to what people are used to.

    The new GUI is written in WPF and reuses a few SharpDevelop components - the tree view is SharpTreeView, written by Ivan Shumilin for the SharpDevelop WPF Designer outline view. This tree view is not used in any other place so far, but that will likely change in the future, as it has additional features over the normal treeview: multiselection, support for columns (GridView mode), and a built-in framework for copy/paste and drag'n'drop. The drag'n'drop support is already used in ILSpy to allow the user to reorder the assembly list.

    The text view on the right is, of course, using AvalonEdit, SharpDevelop's text editor (new as of version 4.0).

    If you are interested in contributing, send me an email, or just join us in #sharpdevelop (on freenode).
