SharpDevelop Community


Daniel Grunwald

  • Decompiling Async/Await

    Just after the ILSpy 2.0 release, I started adding support for decompiling C# 5 async/await to ILSpy.

    You can get the async-enabled ILSpy build from our build server.

    The async support is not yet complete; for example, decompilation fails if the IL evaluation stack is not empty at the point of the await expression.

    The decompilation logic depends heavily on the patterns produced by the C# 5 compiler - it only works with code compiled with the C# compiler in the .NET 4.5 beta release, not with any of the previous CTPs. ILSpy will also likely need adjustments for the final C# 5 compiler.

    While testing, I found that the .NET 4.5 beta BCL was not compiled with the beta compiler - where the beta compiler uses multiple awaiter fields, the BCL code uses a single field of type object and uses arrays of length 1. This is similar to the code generated by the .NET 4.5 developer preview, so my guess is that Microsoft used some internal version in between the developer preview and the beta for compiling the .NET 4.5 beta BCL. For more information, take a look at Jon Skeet's description of the async codegen changes.
    This means that ILSpy cannot decompile async methods in the .NET 4.5 beta BCL. This problem should disappear with the next .NET 4.5 release (.NET 4.5 RC?).

    So how does ILSpy decompile async methods, then? Consider the compiler-generated code of the MoveNext method:

    // Async.$AwaitInLoopCondition$d__17
    void IAsyncStateMachine.MoveNext()
    {
        try
        {
            int num = this.$1__state;
            TaskAwaiter<bool> taskAwaiter;
            if (num == 0)
            {
                taskAwaiter = this.$u__$awaiter18;
                this.$u__$awaiter18 = default(TaskAwaiter<bool>);
                this.$1__state = -1;
                goto IL_7C;
            }
            IL_23:
            taskAwaiter = this.$4__this.SimpleBoolTaskMethod().GetAwaiter();
            if (!taskAwaiter.IsCompleted)
            {
                this.$1__state = 0;
                this.$u__$awaiter18 = taskAwaiter;
                this.$t__builder.AwaitUnsafeOnCompleted<TaskAwaiter<bool>, Async.$AwaitInLoopCondition$d__17>(ref taskAwaiter, ref this);
                return;
            }
            IL_7C:
            bool arg_8B_0 = taskAwaiter.GetResult();
            taskAwaiter = default(TaskAwaiter<bool>);
            if (arg_8B_0)
            {
                Console.WriteLine("Body");
                goto IL_23;
            }
        }
        catch (Exception exception)
        {
            this.$1__state = -2;
            this.$t__builder.SetException(exception);
            return;
        }
        this.$1__state = -2;
        this.$t__builder.SetResult();
    }

    The state machine works similarly to the one used by yield return, so we could reuse a lot of the code from the yield return decompiler.
    Each try block begins with a state dispatcher: depending on the value of this.$1__state, the code jumps to the appropriate location. If the async method involves exception handling, there will be a separate state dispatcher at the beginning of each try block.
    In this case, there are only two states: the initial state (state = -1) and the state at the await expression (state = 0). The state dispatcher consists only of the two statements "int num = this.$1__state; if (num == 0)". We rely on the fact that in the actual IL code, the state dispatcher is a contiguous sequence of IL instructions, in front of any of the method's actual code.

    Note that the async/await decompiler step runs on the ILAst very early in the decompiler pipeline, immediately after the yield return transform, which is prior to any control flow analysis. We're basically still dealing with IL instructions here; but I'm explaining it in terms of C# as that is easier to read (and makes the code much shorter).

    The analysis of the state dispatcher works using symbolic execution; it is described in more detail in the yield return decompiler explanation. In our example, the result of the analysis is that the beginning of the first if statement is reached for state==0, and label IL_23 is reached for all other states.

    With this information, we start cleaning up the control flow of the method. We look for any 'return;' statements and analyze the instructions directly in front:

                this.$1__state = 0;
                this.$u__$awaiter18 = taskAwaiter;
                this.$t__builder.AwaitUnsafeOnCompleted<TaskAwaiter<bool>, Async.$AwaitInLoopCondition$d__17>(ref taskAwaiter, ref this);
                return;

    We then replace this piece of code with an instruction that represents the AwaitUnsafeOnCompleted call (represented as "await ref taskAwaiter;" in the following code), followed by a goto to the label for the target state (using the information gained from the symbolic execution). We also remove the boilerplate associated with the $t__builder and the state dispatcher. For demonstration purposes, I'll skip the remaining steps of the async/await decompiler and resume the pipeline to decompile the ILAst to C#, producing the following code:

    public async void AwaitInLoopCondition()
    {
        while (true)
        {
            TaskAwaiter<bool> taskAwaiter = this.$4__this.SimpleBoolTaskMethod().GetAwaiter();
            if (!taskAwaiter.IsCompleted)
            {
                await ref taskAwaiter;
                taskAwaiter = this.$u__$awaiter18;
                this.$u__$awaiter18 = default(TaskAwaiter<bool>);
                this.$1__state = -1;
            }
            bool arg_8B_0 = taskAwaiter.GetResult();
            taskAwaiter = default(TaskAwaiter<bool>);
            if (!arg_8B_0)
            {
                break;
            }
            Console.WriteLine("Body");
        }
    }

    As you can see, this transformation has simplified the control flow of the method dramatically.

    We now just perform some finishing touches on the method:

    • Access to the state machine fields is replaced with local variable access, e.g. "this.$4__this" becomes "this".
    • We detect the "GetAwaiter() / if (!taskAwaiter.IsCompleted) / GetResult() / clear awaiter" pattern and replace it with a simple await expression.
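
    As an illustration of the second bullet (not ILSpy code), the following sketch puts the awaiter pattern next to the await expression it collapses into. The class and method names are hypothetical, and the manual version simply blocks in GetResult() at the point where a real state machine would suspend and return.

    ```csharp
    using System.Runtime.CompilerServices;
    using System.Threading.Tasks;

    // Hypothetical illustration; the names are not taken from ILSpy.
    static class AwaiterSketch
    {
        // The expanded pattern: GetAwaiter(), (IsCompleted check), GetResult().
        public static bool ManualGetResult(Task<bool> task)
        {
            TaskAwaiter<bool> awaiter = task.GetAwaiter();
            // A compiler-generated state machine would test awaiter.IsCompleted here
            // and suspend if the task is still running. This sketch just blocks.
            return awaiter.GetResult();
        }

        // The same logic written with the await expression the decompiler produces.
        public static async Task<bool> WithAwait(Task<bool> task)
        {
            return await task;
        }
    }
    ```

    For an already completed task, such as Task.FromResult(true), both methods yield the same value.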

    Mind that all of this isn't done on the C# representation, but in an early stage of the ILAst pipeline. After some simplifications (variable inlining, copy propagation), the resulting ILAst looks like this:

    br(IL_23)
    IL_16:
    call(Console::WriteLine, ldstr("Body"))
    IL_23:
    brtrue(IL_16, await(callvirt(Async::SimpleBoolTaskMethod, ldloc(this))))
    ret()

    Apart from the 'await' opcode, this is exactly the same as the while-loop would look in a non-async method. The remainder of the decompiler pipeline will detect the loop and translate it to the C# code you've seen in the introductory screenshot.

  • MSBuild Multi-Targeting in SharpDevelop

    SharpDevelop has had multi-targeting support for a long time - for example, SharpDevelop 2.0 supported targeting .NET 1.0, 1.1 and 2.0. Our original multi-targeting implementation would not only change the target framework, but also use the matching C# compiler version*.

    When Visual Studio 2008 and MSBuild 3.5 came along and introduced official multi-targeting support, we separated the 'target framework' and 'compiler version' settings. The 'target framework' setting uses the <TargetFrameworkVersion> MSBuild property, which is the official multi-targeting support as in Visual Studio 2008. The 'compiler version' setting determines the MSBuild ToolsVersion, which controls the version of the C# compiler to use - Visual Studio does not have this feature.

    I'll call the latter feature MSBuild Multi-Targeting, as this allows us to pick the MSBuild version to use, and thus enables SharpDevelop to open and edit VS 2005 or 2008 projects without having to upgrade them to the VS 2010 project format.

    Unfortunately, life isn't as simple as that. It turns out that MSBuild 4.0 is unable to compile projects with a ToolsVersion lower than 4.0 if the Windows SDK 7.1 is not installed. To allow users to use SharpDevelop without downloading the Windows SDK, we implemented a simple fix: we use MSBuild 3.5 to compile projects with a ToolsVersion of 2.0 or 3.5. This is why SharpDevelop ships with both "ICSharpCode.SharpDevelop.BuildWorker40.exe" and "ICSharpCode.SharpDevelop.BuildWorker35.exe".

    Now what happens if SharpDevelop is run on a machine without .NET 3.5? If the framework specified by the 'ToolsVersion' is missing, SharpDevelop crashed with an MSBuild error when opening the project. There were also crashes when creating/upgrading projects to missing ToolsVersions. Moreover, in the rare scenario where .NET 2.0 and .NET 4.0 are installed, but .NET 3.5 is missing, SharpDevelop was able to open the project but the build worker would crash when trying to compile.

    For this reason, the SharpDevelop 4.0 and 4.1 setups require both .NET 3.5 and .NET 4.0 to be installed. This wasn't an issue when we made that decision - .NET 3.5 is likely to be already installed on most machines. However, Windows 8 will change that - .NET 4.5 is installed by default, but .NET 3.5 is missing. So we added the necessary error handling to SharpDevelop 4.2. The SharpDevelop 4.2 setup no longer requires .NET 3.5 - you'll need it only when targeting .NET 2.0/3.0 or 3.5.

    Another issue is that .NET 4.0 does not ship with the Reference Assemblies - you need to install the Windows SDK to get those. This causes MSBuild to reference the assemblies in the GAC instead, which might be a later version (due to installed service packs or in-place upgrades like .NET 4.5), and also emit massive amounts of warnings (one warning per reference). Moreover, it caused the 'Copy Local' flag to default to true for references to .NET assemblies, causing System.dll etc. to be copied into the output directory.

    At the time, the reference assemblies were only available as part of Visual Studio 2010 - the free Windows SDK 7.1 was released later. So it was a high priority for us to work around this problem. For this reason, SharpDevelop injects a custom MSBuild .targets file into the project being built: SharpDevelop.TargetingPack.targets. This file runs a simple custom MSBuild task that detects references to default .NET assemblies and sets the 'Copy Local' flag to false. (We also inject several other custom .targets files, for example for running FxCop or StyleCop as part of a build.)

    We used Microsoft.Build.Utilities.dll when implementing this custom task. However, that library ships only with .NET 2.0, not with .NET 4.0, so we had to switch to Microsoft.Build.Utilities.v4.0.dll to get the C# 4.0 build working without .NET 2.0. This should not be a problem as the copy local workaround is only included when targeting .NET 4.0 or higher, so we won't try to load it in the 3.5 build worker process.

     

    To summarize, the SharpDevelop 4.2 setup requires:

    • Windows XP SP2 or higher
    • .NET 4.0 Full (.NET 4.5 Full will also work)
    • VC++ 2008 runtime (part of .NET 3.5 so most people have it already)
    • In the minimal configuration, you can only compile for .NET 4.0 using MSBuild 4.0.

    Additionally:

    • If .NET 4.5 is installed, the C# 5 compiler will replace the C# 4 compiler; and .NET 4.5 will appear as an additional target framework.
    • If .NET 3.5 SP1 is installed, you will be able to use .NET 2.0/3.0/3.5 as target framework, and C# 2 and C# 3 as compiler versions.
    • Installing the Windows SDK 7.1 is highly recommended (provides reference assemblies and documentation for code completion).
    • Some SharpDevelop features might require installation of additional tools such as FxCop, StyleCop, F#, TortoiseSVN, SHFB.

    * Everything said about the C# compiler in this post also applies to the VB compiler.

  • ILSpy 2.0 Beta 1

    After a long pause, we have finally released the first Beta of ILSpy 2.0.

    Download:

    New features compared with version 1.0:

    • Assembly Lists
    • Support for decompiling Expression trees
    • Support for lifted operators on nullables
    • Integrated Debugger
    • Decompile to Visual Basic
    • Search for multiple strings separated by space (searching for "Assembly manager" in ILSpy.exe would find AssemblyListManager)
    • Clicking on a local variable will highlight all other occurrences of that variable
    • Ctrl+F can be used to search within the decompiled code view

  • SharpDevelop 5 - NRefactory 5 + semantic highlighting

    While we have several more releases of SharpDevelop 4.x planned, we are also working in parallel on a big new release: SharpDevelop 5.0.

    The major change in SharpDevelop 5 will be the complete rewrite of the NRefactory and SharpDevelop.Dom libraries, which together implement code-completion and many refactorings. Work on NRefactory 5 originally started in August 2010, as a new library written from scratch.

    The NRefactory C# parser and abstract syntax tree were rewritten by Mike Krüger. The new version of the AST contains all individual tokens, and has positions on every node - this makes the implementation of refactorings a lot easier.

    The new AST is already in use in ILSpy - when we started working on ILSpy in February, we immediately ported the old decompiler (from David's dissertation) to NRefactory 5. As part of the ILSpy development, I added several new features to NRefactory - for example the pattern matching support for the AST, and the visitor that converts the AST back to C# code.

    I also wrote a new resolver for NRefactory (the resolver previously was part of SD.Dom). The main goals here were to:

    • Improve performance by making the step of resolving type references an explicit method call
    • Follow the C# specification more closely
    • Improve performance in general
    • Add support for projects that target framework X but get used in framework Y projects (ITypeResolveContext).

    I started working on the new resolver in October 2010 and completed it in August 2011 (feature-complete for C# 4.0). All this work was in addition to the work on SD 4.x and ILSpy, not to mention my regular "job" (I'm a student and work on SharpDevelop in my free time).

    Since then I've started integrating NRefactory 5 into SharpDevelop. This is still a huge task, as basically anything that deals with C# code - i.e. class browser, code completion, go to definition, find references, all refactorings, the forms designer loader, etc. - needs to be rewritten or at least adjusted.

    But today I tried something a little different: a new feature - semantic highlighting.

        struct Color {
            public static void StaticMethod() { }
            public void InstanceMethod() { }
        }
        
        class Program
        {
            Color color;
            public Color Color {
                get { return color; }
                set { this.color = value; }
            }
            
            public void X(int value)
            {
                Color.StaticMethod();    // "Color" refers to the type
                Color.InstanceMethod();  // "Color" refers to the property
                Color.MissingMethod();   // intentionally missing, to show the error case
            }
        }

    This code (with syntax highlighting) was copied out of an early SharpDevelop 5 alpha version. As you can see, we now highlight references to types in a blue-greenish color - but only when semantics of the code actually refer to the type, as the difference between the static and instance method calls shows.

    Fields are highlighted in italics, which allows telling fields and local variables apart in a large method. Additionally, the dark blue/bold method highlighting is now only applied when the method exists - not to delegate-typed fields, and not to missing methods. And context-dependent keywords like value are highlighted only when used in the appropriate context.

    The highlighting engine has the full power of the NRefactory resolver available, which allows us to easily experiment with new types of highlightings - for example, it would take only three lines of code to, say, highlight extension methods in a different color than normal methods. Or value types differently than reference types.

    We also plan to detect syntax errors and the most common semantic errors (method not found, missing parameter, cannot convert from type X to Y) and highlight those errors while typing, so that you don't have to recompile as often.

    There are tons of potential highlightings that give useful information - but if we use all of them, the text editor will look way too colorful and noisy. What do you think is most important?

     

  • ILSpy - Decompiler Architecture Overview - Part 2

    The decompiler pipeline can be separated into two phases: the first phase works on a tree representation of the IL code - I described the steps of that phase in the previous post.

    The second phase works on C# code, and is the topic of this blog post.

    To give you a reminder: the ILAst is a tree of IL instructions, with pseudo-opcodes inserted for the higher-level language constructs that were detected. Let's take a look at how the example C# code from the previous blog post looks in the final ILAst (after type inference).

    Original C#:

    static IEnumerable<IEnumerable<char>> Test(List<string> list)
    {
        foreach (string current in list) {
            yield return (from c in current where char.IsUpper(c) || char.IsDigit(c) select char.ToLower(c));
        }
        yield return new List<char> { 'E', 'N', 'D' };
    }

    Final ILAst:

    As you can see, ILAst already detected the following language constructs:

    • yield return
    • "while (enumerator.MoveNext())" loop
    • collection initializer

    Still missing are the "foreach" construct, and the contents of the lambdas that were created by the query expression.

    As with the previous blog post, you might want to run a debug build of ILSpy and load the above example into it, so that you can see the full output of the intermediate steps. Only debug builds make the intermediate steps of the decompiler pipeline available in the language dropdown.

    C# Code Generation

    We generate C# code from the ILAst. The C# code is represented using the C# Abstract Syntax Tree from the NRefactory library. This step uses the type information to insert casts where required, so that the semantics of the generated C# code match the semantics of the original IL code.

    However, some semantics, like the overflow checking context (checked or unchecked) or exact type referenced by a short type name, are not yet translated into C#, but stored as annotations on the C# AST. But apart from those details, the resulting code is valid C#.

    This step is probably the biggest source of bugs, as matching the IL and C# semantics to each other isn't easy. For example, IL explicitly specifies which method is being called; but C# uses overload resolution. Inserting casts so that the resulting C# will call the correct method is not simple - we don't want to insert casts everywhere, as that would make the code hard to read.

    Implementation: AstMethodBodyBuilder.cs

    C# Code Transforms

    All the remaining steps are transformations on the C# AST. NRefactory provides an API similar to System.Xml.Linq for working with the C# source tree, which makes modifications relatively easy. Additionally, the visitor pattern can be used to traverse the AST; and NRefactory implements a powerful pattern-matching construct.

    The decompiler transformations are implemented as classes in the ICSharpCode.Decompiler.Ast.Transforms namespace.

    PushNegation

    This transformation eliminates negations where possible. Some ILAst operations introduce additional negations - for example "brtrue(lbl, condition); ...; lbl:" becomes "if (!condition) { ... }". The transform will remove double negations, and will move remaining negations into other operators. "!(a == b)" becomes "a != b".
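
    The idea can be sketched on a toy expression tree. This is a simplified illustration under stated assumptions, not ILSpy's implementation: the Expr, Var, Not and Binary classes are hypothetical stand-ins for the NRefactory AST, and only double negation and ==/!= are handled.

    ```csharp
    // Hypothetical toy expression tree; ILSpy itself works on the NRefactory C# AST.
    abstract class Expr { }

    sealed class Var : Expr
    {
        public readonly string Name;
        public Var(string name) { Name = name; }
        public override string ToString() { return Name; }
    }

    sealed class Not : Expr
    {
        public readonly Expr Operand;
        public Not(Expr operand) { Operand = operand; }
        public override string ToString() { return "!" + Operand; }
    }

    sealed class Binary : Expr
    {
        public readonly string Op;
        public readonly Expr Left, Right;
        public Binary(string op, Expr left, Expr right) { Op = op; Left = left; Right = right; }
        public override string ToString() { return "(" + Left + " " + Op + " " + Right + ")"; }
    }

    static class PushNegationSketch
    {
        // Rebuilds the tree bottom-up, removing or pushing down negations where it can.
        public static Expr Simplify(Expr e)
        {
            Binary bin = e as Binary;
            if (bin != null)
                return new Binary(bin.Op, Simplify(bin.Left), Simplify(bin.Right));
            Not not = e as Not;
            if (not == null)
                return e; // variables are left alone
            Expr inner = Simplify(not.Operand);
            Not innerNot = inner as Not;
            if (innerNot != null)
                return innerNot.Operand; // !!x => x
            Binary cmp = inner as Binary;
            if (cmp != null && cmp.Op == "==")
                return new Binary("!=", cmp.Left, cmp.Right); // !(a == b) => a != b
            if (cmp != null && cmp.Op == "!=")
                return new Binary("==", cmp.Left, cmp.Right); // !(a != b) => a == b
            return new Not(inner); // nothing to push the negation into
        }
    }
    ```

    With this sketch, simplifying new Not(new Binary("==", a, b)) gives a tree that prints as "(a != b)", and a double negation of a variable gives back the variable itself.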

    DelegateConstruction

    So far, delegates were compiled into code such as "new D(obj, ldftn(M))". That is, a delegate takes two arguments: the target object, and the method being called. The target object is null if the method is static. This isn't valid C#, so we transform it into "new D(obj.M)". However, if the method is anonymous, then we decompile the target method (up to the DelegateConstruction step) and put the decompiled method body into an anonymous method. Or, if the method body consists of only a single return statement, we use expression lambda syntax.

    Applied to our example code, we get "(char c) => char.IsUpper(c) || char.IsDigit(c)" and "(char c) => char.ToLower(c)". Now, in this example, the lambdas did not capture any variables, and thus got compiled to static methods. The transform gets more complicated if the lambda does capture variables: In this case, the C# compiler would have created an inner class (called "DisplayClass") to represent the closure. The compiler puts all captured variables as fields into that closure.
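
    To illustrate what such a closure looks like, here is a hand-written approximation of the lowering. It is only a sketch: the names DisplayClass, Lambda and MakeAdderLowered are hypothetical, and the real compiler-generated names and layout differ.

    ```csharp
    using System;

    static class ClosureSketch
    {
        // What the programmer writes: the lambda captures the parameter "a".
        public static Func<int, int> MakeAdder(int a)
        {
            return b => a + b;
        }

        // Roughly what the compiler generates (hypothetical names):
        // the captured variable becomes a field of a closure class,
        // and the lambda body becomes an instance method on that class.
        sealed class DisplayClass
        {
            public int a;
            public int Lambda(int b) { return a + b; }
        }

        public static Func<int, int> MakeAdderLowered(int a)
        {
            DisplayClass closure = new DisplayClass();
            closure.a = a;
            // The delegate target is the closure object, the method is the lambda body.
            return new Func<int, int>(closure.Lambda);
        }
    }
    ```

    Both MakeAdder(3)(4) and MakeAdderLowered(3)(4) evaluate to 7. The decompiler has to turn code of the second shape back into the first.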

    To decompile that correctly, the first part of the DelegateConstruction transform will replace any occurrences of "this" within the lambda body with the target object that was passed to the delegate. This makes the resulting code somewhat correct - but the closure is still visible in the decompiled code. For example, a method implementing curried addition of 3 integers would look like this:

    public static Func<int, Func<int, int>> CurriedAddition(int a)
    {
        DelegateConstruction.c__DisplayClass13 c__DisplayClass;
        c__DisplayClass = new DelegateConstruction.c__DisplayClass13();
        c__DisplayClass.a = a;
        return delegate(int b)
        {
            DelegateConstruction.c__DisplayClass13.c__DisplayClass15 c__DisplayClass2;
            c__DisplayClass2 = new DelegateConstruction.c__DisplayClass13.c__DisplayClass15();
            c__DisplayClass2.CS$8__locals14 = c__DisplayClass;
            c__DisplayClass2.b = b;
            return (int c) => c__DisplayClass2.CS$8__locals14.a + c__DisplayClass2.b + c;
        };
    }

    In a second step, the DelegateConstruction transformation looks for such display classes, and removes them by promoting all their fields to local variables. If one of the fields is simply a copy of the function's parameter, no new local variable is introduced, but the parameter is used instead.

    So after this cleanup step is complete, the curried addition example will look exactly like the original C# code:

    public static Func<int, Func<int, int>> CurriedAddition(int a)
    {
        return (int b) => (int c) => a + b + c;
    }

    PatternStatementTransform

    This step does pattern-matching. It defines code patterns for the following language constructs:

    • using
    • foreach (both on generic and on non-generic collections)
    • for
    • do..while
    • lock
    • switch with string
    • Automatic Properties
    • Automatic Events (normal events without explicit add/remove accessor)
    • Destructors
    • try {} catch {} finally

    The expanded code pattern is searched for using NRefactory's pattern matching. When it is found, some additional checks are performed to see whether the transformation is valid (e.g. a 'foreach' loop variable must not be assigned to). If it is valid, the matched code pattern is replaced with the detected construct.

    Since the ILAst has only one loop construct, all generated C# code initially uses only while-loops. But if a loop looks like this: "while (true) { ... if (condition) break; }", then we can change it into "do { } while (!condition);". Using NRefactory's pattern matching, the pattern we look for can be defined easily:

    static readonly WhileStatement doWhilePattern = new WhileStatement {
        Condition = new PrimitiveExpression(true),
        EmbeddedStatement = new BlockStatement {
            Statements = {
                new Repeat(new AnyNode("statement")),
                new IfElseStatement {
                    Condition = new AnyNode("condition"),
                    TrueStatement = new BlockStatement { new BreakStatement() }
                }
            }}};

    Pattern matching reuses some ideas from regular expressions: the "Repeat" node will match any number of nodes (like the star operator in regular expressions), and the strings passed to the "AnyNode" constructor create capture groups with those names. For a successful match, the "statement" group will contain all statements except for the final "if (condition) break;", and the "condition" group will contain the loop condition.
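
    The regex-like behaviour of "Repeat" and named capture groups can be sketched with a toy matcher over a flat list of statement strings. This is not NRefactory's API: the Pat, Literal, Any, Repeat and ToyMatcher types are hypothetical, statements are plain strings, and the trailing if statement is matched literally instead of capturing its condition.

    ```csharp
    using System.Collections.Generic;

    // Hypothetical toy pattern nodes; NRefactory's real patterns are AST nodes.
    abstract class Pat { }
    sealed class Literal : Pat { public readonly string Text; public Literal(string text) { Text = text; } }
    sealed class Any : Pat { public readonly string Group; public Any(string group) { Group = group; } }
    sealed class Repeat : Pat { public readonly Pat Inner; public Repeat(Pat inner) { Inner = inner; } }

    static class ToyMatcher
    {
        // Returns the capture groups on success, or null if the pattern does not match.
        public static Dictionary<string, List<string>> Match(IList<Pat> pattern, IList<string> input)
        {
            return MatchFrom(pattern, 0, input, 0, new Dictionary<string, List<string>>());
        }

        static Dictionary<string, List<string>> Clone(Dictionary<string, List<string>> groups)
        {
            var copy = new Dictionary<string, List<string>>();
            foreach (var pair in groups)
                copy[pair.Key] = new List<string>(pair.Value);
            return copy;
        }

        // Matches a single node and records it in its capture group, if any.
        static bool MatchOne(Pat p, string node, Dictionary<string, List<string>> groups)
        {
            Literal lit = p as Literal;
            if (lit != null)
                return lit.Text == node;
            Any any = p as Any;
            if (any == null)
                return false; // nested Repeat is not supported in this sketch
            List<string> list;
            if (!groups.TryGetValue(any.Group, out list))
                groups[any.Group] = list = new List<string>();
            list.Add(node);
            return true;
        }

        static Dictionary<string, List<string>> MatchFrom(IList<Pat> pattern, int pi, IList<string> input, int ii,
                                                          Dictionary<string, List<string>> groups)
        {
            if (pi == pattern.Count)
                return ii == input.Count ? groups : null;
            Repeat rep = pattern[pi] as Repeat;
            if (rep != null) {
                // Try 0, 1, 2, ... repetitions; copies of the groups make backtracking safe.
                var current = Clone(groups);
                for (int k = ii; ; k++) {
                    var result = MatchFrom(pattern, pi + 1, input, k, Clone(current));
                    if (result != null)
                        return result;
                    if (k == input.Count || !MatchOne(rep.Inner, input[k], current))
                        return null;
                }
            }
            if (ii == input.Count)
                return null;
            var next = Clone(groups);
            return MatchOne(pattern[pi], input[ii], next) ? MatchFrom(pattern, pi + 1, input, ii + 1, next) : null;
        }
    }
    ```

    Matching the pattern { new Repeat(new Any("statement")), new Literal("if (condition) break;") } against the statements "a();", "b();", "if (condition) break;" succeeds, and the "statement" group then holds the first two statements.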

    ReplaceMethodCallsWithOperators

    This step eliminates invocations of user-defined operators ("string.op_Equality(a,b)") and replaces them with the operator itself ("a == b").

    It also simplifies statements of the form "localVar = localVar + 1;" to use the post-increment operator. Note that this transformation is only valid for statements -- within expressions, we would be required to use pre-increment.
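
    C# does not allow calling an operator method by name, which is why the decompiler has to rewrite such calls. The following sketch shows that the method and the operator are the same thing by invoking op_Equality through reflection. The class and method names are hypothetical.

    ```csharp
    using System;
    using System.Reflection;

    static class OperatorCallSketch
    {
        // What the IL contains: a call to the static method string.op_Equality.
        // It cannot be written directly in C#, so this sketch uses reflection.
        public static bool ViaMethod(string a, string b)
        {
            MethodInfo method = typeof(string).GetMethod("op_Equality", new[] { typeof(string), typeof(string) });
            return (bool)method.Invoke(null, new object[] { a, b });
        }

        // What the decompiler should print instead.
        public static bool ViaOperator(string a, string b)
        {
            return a == b;
        }
    }
    ```

    Both methods return the same result for any pair of strings.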

    IntroduceUnsafeModifier

    This transformation searches the method body for any operations that are valid only in an unsafe context. If any are found, the method is marked with the unsafe modifier.

    The step also contains some code readability improvements for unsafe code: "*(ptr + num)" gets transformed to "ptr[num]", and "(*ptr).Member" gets transformed to "ptr->Member".

    AddCheckedBlocks

    In IL, there are different opcodes for instructions with and without overflow checking (e.g. "add" vs. "add.ovf"). However, in C# the overflow checking cannot be specified on single operators, but only on whole expressions. For example, "a = checked(b + c)" will also evaluate the sub-expressions b and c with overflow checking. If those contain any IL opcodes that didn't use overflow checking, then the C# code must use unchecked expressions within the checked expression.
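
    A small example of such nesting, assuming an outer addition that was compiled as "add.ovf" and an inner one compiled as plain "add". The class and method names are hypothetical.

    ```csharp
    using System;

    static class CheckedSketch
    {
        // Outer addition with overflow checking, inner addition without.
        public static int Mixed(int b, int c, int d)
        {
            return checked(b + unchecked(c + d));
        }

        // Helper that reports whether evaluating f overflows.
        public static bool Overflows(Func<int> f)
        {
            try {
                f();
                return false;
            } catch (OverflowException) {
                return true;
            }
        }
    }
    ```

    Here Mixed(1, int.MaxValue, 1) silently wraps around in the inner addition and returns int.MinValue + 1, while Mixed(int.MaxValue, 1, 0) throws an OverflowException in the outer addition.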

    Code can quickly get unreadable if you do this around every instruction, so we looked for a way to place the blocks intelligently. We formulated the problem as an optimization problem, with the following goals:

    1. Use the minimum number of checked blocks and expressions
    2. Prefer checked expressions over checked blocks
    3. Make the scope of checked expressions as small as possible
    4. Make the scope of checked blocks as large as possible

    We use dynamic programming to calculate the optimal solution in linear time. Essentially, the algorithm calculates two solutions for each node: both have optimal cost, but one expects the parent context to be checked, the other expects it to be unchecked. This allows composing the whole solution (global optimum) from the partial solutions (optimal solutions for each node in the two contexts).
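
    The two-solutions-per-node idea can be sketched as follows. This is a heavily simplified illustration, not the ILSpy algorithm: it only minimizes the number of checked/unchecked wrappers (the first goal), does not distinguish blocks from expressions, and uses hypothetical Mode and Node types.

    ```csharp
    using System;
    using System.Collections.Generic;

    // Hypothetical model: each node either requires a context or does not care.
    enum Mode { Neutral, Checked, Unchecked }

    sealed class Node
    {
        public readonly Mode Requires;
        public readonly List<Node> Children = new List<Node>();
        public Node(Mode requires, params Node[] children)
        {
            Requires = requires;
            Children.AddRange(children);
        }
    }

    static class CheckedCostSketch
    {
        // Returns two costs for the subtree: index 0 assumes the parent context is checked,
        // index 1 assumes it is unchecked. Each node is visited once, so this is linear time.
        public static int[] Cost(Node n)
        {
            int sumChecked = 0, sumUnchecked = 0;
            foreach (Node child in n.Children) {
                int[] c = Cost(child);
                sumChecked += c[0];
                sumUnchecked += c[1];
            }
            switch (n.Requires) {
                case Mode.Checked:
                    // Free in a checked context; one wrapper needed in an unchecked one.
                    return new[] { sumChecked, 1 + sumChecked };
                case Mode.Unchecked:
                    return new[] { 1 + sumUnchecked, sumUnchecked };
                default:
                    // A neutral node may keep the parent context or pay for one wrapper to flip it.
                    return new[] { Math.Min(sumChecked, 1 + sumUnchecked), Math.Min(sumUnchecked, 1 + sumChecked) };
            }
        }
    }
    ```

    For a neutral call with two checked additions as arguments, new Node(Mode.Neutral, new Node(Mode.Checked), new Node(Mode.Checked)), the cost in an unchecked parent context is 1: a single wrapper around the whole call is cheaper than wrapping each argument.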

    DeclareVariables

    So far, all variables were declared at the start of the method. This step aims to make the code more readable by moving the variable declarations so that they have the smallest possible scope.

    This step will introduce multiple declarations for the same variable whenever this is allowable. This might happen if two loops use the same variable, but the value assigned to the variable by the first loop will never be read by the second loop.

    Basically, we split up a variable whenever this is possible without triggering the C# compiler error "Use of unassigned local variable" - if the second code block always initializes the variable before reading it, it cannot possibly read the value assigned by the first code block. For this purpose, I implemented C# definite assignment analysis, which is surprisingly complex - the specification is 10 pages long, and makes heavy use of the reachability rules, which take another 10 pages in the C# specification.
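
    A before/after example of such a split, written by hand for illustration (the class and method names are hypothetical). In the first method, a single variable declared at the start is shared by two loops; in the second, it has been split into two loop-local declarations.

    ```csharp
    static class DeclareVariablesSketch
    {
        // Before: "i" is declared once at the start of the method and reused.
        public static int Before(int[] xs, int[] ys)
        {
            int i;
            int sum = 0;
            for (i = 0; i < xs.Length; i++)
                sum += xs[i];
            // The second loop assigns "i" before reading it,
            // so it can never observe the value left behind by the first loop.
            for (i = 0; i < ys.Length; i++)
                sum += ys[i];
            return sum;
        }

        // After: the variable is split, and each declaration gets the smallest possible scope.
        public static int After(int[] xs, int[] ys)
        {
            int sum = 0;
            for (int i = 0; i < xs.Length; i++)
                sum += xs[i];
            for (int i = 0; i < ys.Length; i++)
                sum += ys[i];
            return sum;
        }
    }
    ```

    Both methods compute the same sum. The split would not be allowed if the second loop started with a read of "i", because the compiler would then report the variable as unassigned.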

    ConvertConstructorCallIntoInitializer

    This step is all about constructors. First, we look at all constructors in the current class. If they all start with the same instruction, and that instruction is assigning to an instance field in the current class, then we convert that statement into a field initializer.

    After that, all constructors should start with a call to the constructor of the base class. We take that call, and change it into an initializer (" : base()" syntax).
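
    The IL ordering that this step relies on (field assignments first, then the base constructor call) can be observed from C#. The following sketch uses hypothetical Base and Derived classes; the base constructor calls a virtual method that reads a field of the derived class.

    ```csharp
    class Base
    {
        public readonly int SeenByBaseCtor;

        protected Base()
        {
            // Virtual call from a constructor: only done here to observe the ordering.
            SeenByBaseCtor = Read();
        }

        protected virtual int Read() { return 0; }
    }

    class Derived : Base
    {
        // In IL, this assignment is repeated at the start of every constructor below,
        // in front of the call to the base constructor.
        int field = 42;

        public Derived() : base() { }
        public Derived(int unused) : base() { }

        protected override int Read() { return field; }
    }
    ```

    new Derived().SeenByBaseCtor is 42, which shows that the field was already initialized when the base constructor ran. This is the shape the decompiler reconstructs: one field initializer and a ": base()" initializer per constructor.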

    IntroduceUsingDeclarations

    When initially creating C# code from the ILAst, ILSpy always uses short type names (without namespace name). However, it annotates the type references, so that the referenced type is still known.

    This step looks at the annotations and introduces the appropriate using declarations. Then, the step looks at all referenced assemblies, and checks which types were imported by the using declarations. If several types with the same name were imported, that name is marked as ambiguous.

    Now, the transformation again looks at all type references, and fully qualifies those that are ambiguous.

    Note that this transformation step is disabled when you use ILSpy to look at a single method. It is used only when decompiling a whole class, or when decompiling a whole assembly.

    IntroduceExtensionMethods

    This step will replace calls to extension methods with the infix syntax. "Enumerable.Select(a, b)" becomes "a.Select(b)".
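
    The two call forms are equivalent, as this small sketch shows (the class and method names are hypothetical):

    ```csharp
    using System.Collections.Generic;
    using System.Linq;

    static class ExtensionCallSketch
    {
        // The form that appears in IL: ordinary static method calls.
        public static List<int> StaticForm(IEnumerable<int> a)
        {
            return Enumerable.ToList(Enumerable.Select(a, x => x * 2));
        }

        // The form the decompiler emits after this step.
        public static List<int> ExtensionForm(IEnumerable<int> a)
        {
            return a.Select(x => x * 2).ToList();
        }
    }
    ```

    Both methods compile to the same calls and return the same list.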

    Now, let me show you the decompiled running example after this step:

    private static IEnumerable<IEnumerable<char>> Test(List<string> list)
    {
        foreach (string current in list)
        {
            yield return current.Where((char c) => char.IsUpper(c) || char.IsDigit(c)).Select((char c) => char.ToLower(c));
        }
        yield return new List<char> { 'E', 'N', 'D' };
    }

    IntroduceQueryExpressions

    This step takes a look at method calls, and tries to find patterns that look like the output of C# query expressions. Basically, we apply the same steps as the C# compiler when it translates query expressions into method calls, but in reverse.

    This results in the following decompiled code:

            yield return 
                from c in current
                where char.IsUpper(c) || char.IsDigit(c)
                select char.ToLower(c);

    The IntroduceQueryExpressions step does a mostly literal translation of method calls to query clauses. However, the C# language defines some query expressions in terms of other query expressions. Examples are "let" clauses and query continuations with "into". Let clauses are especially tricky, since they cause the C# compiler to generate so-called transparent identifiers (see the C# specification for details). Such a query might look like this:

    from <>h__TransparentIdentifier2b in
        from o in orders
        select new
        {
            o = o, 
            t = o.Details.Sum((QueryExpressions.OrderDetail d) => d.UnitPrice * d.Quantity)
        }
    where <>h__TransparentIdentifier2b.t > 1000m
    select new
    {
        OrderID = <>h__TransparentIdentifier2b.o.OrderID, 
        Total = <>h__TransparentIdentifier2b.t
    };

     

    CombineQueryExpressions

    This step combines LINQ queries to simplify them (e.g. introduces query continuations); and gets rid of transparent identifiers by re-introducing the original 'let' clause. The above query combined results in the easy-to-understand query:

    from o in orders
    let t = o.Details.Sum((QueryExpressions.OrderDetail d) => d.UnitPrice * d.Quantity)
    where t > 1000m
    select new
    {
        OrderID = o.OrderID, 
        Total = t
    };
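
    The relationship between a 'let' clause and the anonymous type behind a transparent identifier can be reproduced by hand. In this sketch (hypothetical names, with int arrays standing in for orders), the second method spells out roughly what the compiler generates for the first.

    ```csharp
    using System.Collections.Generic;
    using System.Linq;

    static class LetClauseSketch
    {
        // Query with a 'let' clause, as the programmer writes it.
        public static List<int> WithLet(IEnumerable<int[]> orders)
        {
            return (from o in orders
                    let t = o.Sum()
                    where t > 1000
                    select t).ToList();
        }

        // Approximate manual lowering: the range variables "o" and "t" travel together
        // in an anonymous type, which plays the role of the transparent identifier.
        public static List<int> Lowered(IEnumerable<int[]> orders)
        {
            return orders
                .Select(o => new { o, t = o.Sum() })
                .Where(x => x.t > 1000)
                .Select(x => x.t)
                .ToList();
        }
    }
    ```

    Both methods return the same totals. CombineQueryExpressions goes in the opposite direction: it recognizes the second shape and restores the 'let' clause.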

    This concludes the transformations done by the decompiler.

    There's only one tiny detail left: we run NRefactory's InsertParenthesesVisitor, which introduces both required parentheses, and some additional parentheses to make the code more readable. The parenthesis-inserting step will run even if you use the language drop down to stop the decompilation at a previous step.

    The very last step, of course, is the OutputVisitor, which generates text from the C# AST.

  • ILSpy - Decompiler Architecture Overview

    When ILSpy was only two weeks old, I blogged about the decompiler architecture. The basic idea of the decompiler pipeline (IL -> ILAst -> C#) is still valid, but there were several changes in the details, and tons of additions as ILSpy learned about more features in the C# language.

    The pipeline has grown a lot - there are now 47 separate steps, while in the middle of February (when the previous architecture post was written), there were only 14.

    If you want to follow this post, grab the source code of ILSpy and create a debug build, so that you can take a look at the intermediate steps while I am discussing them. Only debug builds will show all the intermediate steps in the language dropdown.

    It's impossible to give a short sample where every intermediate step does something (the sample would have to use every possible C# feature), but the following sample should show what is going on in the most important steps:

    static IEnumerable<IEnumerable<char>> Test(List<string> list)
    {
        foreach (string current in list) {
            yield return (from c in current where char.IsUpper(c) || char.IsDigit(c) select char.ToLower(c));
        }
        yield return new List<char> { 'E', 'N', 'D' };
    }

    Take this code, compile it, and then decompile it with a debug build of ILSpy, so that you can take a look at the results of the intermediate steps.

    Essentially, the decompiler pipeline can be separated into two phases: the first phase works on a tree representation of the IL code - we call this representation the ILAst. The second phase works on C# code, stored in the C# Abstract Syntax Tree provided by the NRefactory library.

    ILSpy uses the Mono.Cecil library for reading assembly files. Cecil parses the IL code into a flat list of IL instructions, and also takes care of reading all the metadata. Thus, the decompiler's input is Cecil's object model, giving it approximately the same information as you see when you select 'IL' language in the dropdown.

    ILAst

    We construct the intermediate representation ILAst. Basically, every IL instruction becomes one ILAst instruction. The main difference is that ILAst does not use an implicit evaluation stack, but creates temporary variables for every write to a stack location. However, the ILAst also supports additional opcodes (called pseudo-opcodes) which are used by various decompiler steps to represent higher-level constructs.

    Another difference is that we create a tree structure for try-finally blocks - Cecil just provides us with the exception handler table from the metadata.

    Implementation: ILAstBuilder.cs

    Variable Splitting

    Using data flow analysis, we split up variables where possible. So if you had "x = 1; x = add(x, 1);", that will become "x_1 = 1; x_2 = add(x_1, 1)". We do not use SSA form for this (although there's an unused SSA implementation left over in the codebase), we only split variables up when this is possible without having to introduce phi-functions. The goal of this operation is to make compiler-generated variables eligible for inlining.

    Implementation: ILAstBuilder.cs
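
    As a rough illustration of the renaming, here is a minimal sketch that handles only straight-line code, where every write can simply get a fresh name. The real step uses data flow analysis over the whole method and refuses to split where a phi-function would be needed; none of that is modelled here. The Assignment class and the Split method are invented for this sketch and are not taken from ILAstBuilder.cs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class VariableSplittingSketch
{
    // One assignment "Target = Op(Args...)". Arguments that were never
    // assigned (such as the literal "1") are left alone.
    public sealed class Assignment
    {
        public string Target;
        public string Op;
        public string[] Args;

        public override string ToString()
        {
            return Args.Length == 0
                ? Target + " = " + Op
                : Target + " = " + Op + "(" + string.Join(", ", Args) + ")";
        }
    }

    public static List<Assignment> Split(IEnumerable<Assignment> code)
    {
        var version = new Dictionary<string, int>(); // variable -> number of writes seen so far
        var result = new List<Assignment>();
        foreach (var a in code)
        {
            // Reads refer to the most recent write of the variable.
            var args = a.Args
                .Select(arg => version.ContainsKey(arg) ? arg + "_" + version[arg] : arg)
                .ToArray();
            int n;
            version.TryGetValue(a.Target, out n);
            version[a.Target] = n + 1;
            result.Add(new Assignment { Target = a.Target + "_" + (n + 1), Op = a.Op, Args = args });
        }
        return result;
    }
}
```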

    ILAst Optimizations

    • Dead code removal. We remove unreachable code, because it's impossible to infer any information about the stack usage of unreachable code. Also, obfuscators tend to put invalid IL into unreachable code sections. This actually already happens as part of the ILAst construction, before variable splitting.
    • Remove redundant code
      • Delete 'nop' instructions
      • Delete 'br' instructions that jump directly to the next instruction
      • Delete 'dup' instructions - since ILAst works with variables for stack locations, we can just read a variable twice, eliminating the 'dup'.
    • Simplify instruction set for branch instructions
      • Replaces all conditional branches with 'brtrue'. This works by replacing the 'b*' instructions (branch instructions) with 'brtrue(c*)' (branch if compare instruction returns true). This step makes use of the 'LogicNot' pseudo-opcode.
        The goal simply is to reduce the number of different cases that the following steps have to handle.
    • Copy propagation. This is a classical compiler optimization; however, ILSpy uses it only for two specific cases:
      • Any address-loading instruction is copied to its point of use. This ensures that no decompiler-generated variable has a managed reference as type - "ref int v = someVariable;" wouldn't be valid C# code, so we have to instead use "ref someVariable" in the place where "v" is used.
      • Copies of parameters of the current function are propagated, as long as the parameter is never written to. This mainly exists in order to propagate the "this" parameter, so that the following patterns can detect it more easily.
    • Dead store removal. If a variable is stored and nobody is there to read it, then was it really written?
      Originally we removed all such dead stores; but after some users complained about 'missing code', we restricted this optimization to apply only to stack locations. Dead stores to stack locations occur mainly after the removal of 'pop' instructions.

    The optimizations are primarily meant to even out the differences between debug and release builds, by optimizing away the stuff that the C# compiler adds to debug builds.

    Implementation: ILAstOptimizer.cs

    Inlining

    We perform 'inlining' on the ILAst. That is, if instruction N stores a variable, and instruction N+1 reads it, and there's no other place using that variable, then we move the definition of the variable into the next expression.

    So "stack0 = local1; stack1 = ldc.i4(1); stack2 = add(stack0, stack1); local1 = stack2" will become "local1 = add(local1, ldc.i4(1))". Inlining is the main operation that produces trees from the flat IL.

    Implementation: ILInlining.cs
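
    The rule above can be sketched on a toy expression tree. This is only an approximation under a few assumptions: variables whose names start with "stack" are taken to be the single-assignment temporaries, and the sketch does not check whether moving an expression changes the evaluation order of side effects, which a real inliner has to do. Node, Store and Inline are names made up for this sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class InliningSketch
{
    // Expression tree node; a node without arguments is a variable read or a literal.
    public sealed class Node
    {
        public string Name;
        public List<Node> Args = new List<Node>();
        public Node(string name, params Node[] args) { Name = name; Args.AddRange(args); }
        public override string ToString()
        {
            return Args.Count == 0 ? Name : Name + "(" + string.Join(", ", Args) + ")";
        }
    }

    public sealed class Store
    {
        public string Target;
        public Node Value;
        public override string ToString() { return Target + " = " + Value; }
    }

    static int CountReads(Node n, string v)
    {
        return (n.Args.Count == 0 && n.Name == v ? 1 : 0) + n.Args.Sum(a => CountReads(a, v));
    }

    static Node Substitute(Node n, string v, Node replacement)
    {
        if (n.Args.Count == 0) return n.Name == v ? replacement : n;
        return new Node(n.Name, n.Args.Select(a => Substitute(a, v, replacement)).ToArray());
    }

    public static List<Store> Inline(List<Store> code)
    {
        var result = new List<Store>(code);
        bool modified;
        do {
            modified = false;
            for (int i = 0; i + 1 < result.Count; i++) {
                string v = result[i].Target;
                bool isTemp = v.StartsWith("stack", StringComparison.Ordinal);
                int totalReads = result.Sum(s => CountReads(s.Value, v));
                // Statement i stores v, statement i+1 holds the only read of v.
                if (isTemp && totalReads == 1 && CountReads(result[i + 1].Value, v) == 1) {
                    result[i + 1] = new Store {
                        Target = result[i + 1].Target,
                        Value = Substitute(result[i + 1].Value, v, result[i].Value)
                    };
                    result.RemoveAt(i);
                    modified = true;
                    break; // start over, the merge may have enabled another one
                }
            }
        } while (modified);
        return result;
    }
}
```

    Fed with the four statements from the example, the sketch first merges stack1 into the 'add', then stack0, then the whole expression into the store to local1.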

    Yield Return

    If the method is an iterator (constructs a [CompilerGenerated] type that implements IEnumerator), then we perform the yield-return-transformation.

    Implementation: YieldReturnDecompiler.cs

    Analysis of higher-level constructs

    After inlining, we tend to have a single C# statement in a single ILAst statement. However, some C# expressions compile to a sequence of statements. We now try to detect those constructs, and replace the statement sequence with a single statement using a pseudo-opcode.

    We can detect and replace a construct only if it's represented by consecutive statements, so when one construct is nested in another, we first have to process the nested construct before processing the outer construct. Because constructs can be nested arbitrarily, we run all the analyses in a "do { ... } while(modified);" loop. If you select "ILAst (after step X)" in the language dropdown, decompilation will stop after that step in the first loop iteration.

    • SimplifyShortCircuit: introduces && and || operators.
    • SimplifyTernaryOperator: introduces ?: operator
    • SimplifyNullCoalescing: introduces ?? operator
    • JoinBasicBlocks: The decompiler tries to use the minimal possible number of basic blocks. Some optimizations might remove branches and therefore it is necessary to check whether two consecutive basic blocks can be joined into one after such optimizations. It is important to do this because other optimizations like inlining might not work if the code is split into two basic blocks.
    • TransformDecimalCtorToConstant: changes invocations of the "new decimal(int lo, int mid, int hi, bool isNegative, byte scale)" constructor into literals.
    • SimplifyLdObjAndStObj: replaces "ldobj(ldloca(X))" with "ldloc(X)", and similar for other kinds of address-loading instructions.
    • TransformArrayInitializers: introduces array initializers
    • TransformObjectInitializers: introduces object and collection initializers
    • MakeAssignmentExpression: detects when the result of an assignment is used in another expression, and inlines the stloc-instruction accordingly. This is essential for decompiling loops like "while ((line = r.ReadLine()) != null)", as otherwise the loop condition couldn't be represented as a single expression.
      This step also introduces the 'CompoundAssignment' opcode for C# code like "this.M().Property *= 10;". Only because this step de-duplicates the expression on the left-hand side of the assignment can the "this.M()" method call be inlined into it.
    • IntroducePostIncrement: While pre-increments are handled as a special case of compound assignments, post-increment expressions need to be handled separately.
    • InlineVariables2: this performs inlining again, since the steps in the loop might have opened up additional inlining possibilities. The next loop iteration depends on the fact that variables are inlined where possible.

    Implementation: ILAstOptimizer.cs, PeepholeTransform.cs, InitializerPeepholeTransform.cs
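
    As a small aside on TransformDecimalCtorToConstant from the list above: the five-argument constructor and a decimal literal can denote the same value, which is what makes the replacement possible. The sketch below only demonstrates that equivalence for two hand-picked values; it makes no claim about which literals the compiler actually encodes this way.

```csharp
using System;

public static class DecimalCtorSketch
{
    // lo = 15, mid = 0, hi = 0, positive, one digit after the decimal point: 1.5
    public static decimal OnePointFive() { return new decimal(15, 0, 0, false, 1); }

    // lo = 1, negative, two digits after the decimal point: -0.01
    public static decimal MinusOneCent() { return new decimal(1, 0, 0, true, 2); }
}
```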

    To get more of an idea of what is going on, consider the collection initializer "new List<char> { 'E', 'N', 'D' }". In the ILAst, this is represented as 5 separate instructions:

    stloc(g__initLocal0, newobj(List`1<char>::.ctor))
    callvirt(List`1<char>::Add, ldloc(g__initLocal0), ldc.i4(69))
    callvirt(List`1<char>::Add, ldloc(g__initLocal0), ldc.i4(78))
    callvirt(List`1<char>::Add, ldloc(g__initLocal0), ldc.i4(68))
    yieldreturn(ldloc(g__initLocal0))

    The collection initializer transformation will change this into:

    stloc(g__initLocal0, initcollection(newobj(List`1<char>::.ctor), callvirt(List`1<char>::Add, initializedobject(), ldc.i4(69)), callvirt(List`1<char>::Add, initializedobject(), ldc.i4(78)), callvirt(List`1<char>::Add, initializedobject(), ldc.i4(68))))
    yieldreturn(ldloc(g__initLocal0))

    Now after this transformation, the variable g__initLocal0 is written to exactly once, and read from exactly once. This allows us to inline the 'initcollection' expression into the 'yieldreturn' statement, thus combining all of the 5 original statements into a single one.

    Loop Detection and Condition Detection

    Using control flow analysis (finding dominators and dominance frontiers), we detect loops in the control flow graph. A heuristic on a control flow graph is used to find the most likely loop body.

    We also build 'if' statements from the remaining conditional branch instructions.

    Implementation: LoopsAndConditions.cs

    Goto Removal

    Goto statements are removed when they are made redundant by the control flow structures built up in the previous step. Remaining goto statements are converted into 'break;' or 'continue;' statements where possible.

    Implementation: GotoRemoval.cs

    Reduce If Nesting

    We try to re-arrange the if statements to reduce the nesting level. For example, if the end of the then-block is unreachable (e.g. because the then-block ends with 'return;'), we can move the else block below the if statement.
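
    In C# terms, the effect looks roughly like this. The two methods are a hand-written before/after pair (not actual decompiler output) and behave identically:

```csharp
public static class IfNestingSketch
{
    // Shape of the code as it comes out of condition detection.
    public static string Nested(string s)
    {
        if (s == null) {
            return "null";
        } else {
            if (s.Length == 0) {
                return "empty";
            } else {
                return s;
            }
        }
    }

    // Every then-block ends with 'return', so each else block can move below its 'if'.
    public static string Flat(string s)
    {
        if (s == null) {
            return "null";
        }
        if (s.Length == 0) {
            return "empty";
        }
        return s;
    }
}
```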

    Remove Delegate Initialization

    The C# compiler will use static fields (and in some cases also local variables) to cache the delegate instances associated with lambda expressions. This step will remove such caching, which opens up additional inlining opportunities. In fact, we will have to move this step into the big 'while(modified)' loop so that we can correctly handle lambda expressions within object/collection initializers.

    Introduce Fixed Statements

    .NET implements fixed statements as special 'pinned' local variables. As there isn't any representation for those in C#, we translate them into 'fixed' statements.

    Variable Recombination

    Split up variables were useful for inlining and some other analyses; but now we don't need them any more. This step simply recombines the variables that we split up earlier.

    Type Analysis

    Here, finally, comes the semantic analysis. All previous steps just transformed the IL code. Some were introducing some higher-level constructs, but those were defined as pseudo-IL-opcodes, which pretty much just are shorthands for certain IL sequences. Semantic analysis now figures out whether "ldc.i4(1)" means "1" or "true" or "StringComparison.CurrentCultureIgnoreCase".

    This is formulated as a type inference problem: we determine the expected type and the actual type for each expression in the ILAst. In case some decompiler-generated variables (for the stack locations) weren't removed by the ILAst transformations, we also need to infer types for those.

    Implementation: TypeAnalysis.cs
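
    The hard part of this step is inferring the types; once a type is known, interpreting the constant is easy. The following sketch shows only that last bit: how the raw int32 from an "ldc.i4" can be turned into a value of the inferred type. ConvertConstant is a helper invented for this illustration, not a method from TypeAnalysis.cs.

```csharp
using System;

public static class ConstantTypingSketch
{
    // Interprets the int32 operand of an ldc.i4 instruction as a value of the inferred type.
    public static object ConvertConstant(int value, Type inferredType)
    {
        if (inferredType == typeof(bool)) return value != 0;
        if (inferredType == typeof(char)) return (char)value;
        if (inferredType.IsEnum) return Enum.ToObject(inferredType, value);
        return Convert.ChangeType(value, inferredType); // int, short, byte, ...
    }
}
```

    With this helper, the constant 1 comes out as 1, true or StringComparison.CurrentCultureIgnoreCase, depending on whether the inferred type is int, bool or StringComparison.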

    This concludes our discussion of the first phase of the decompiler pipeline. In the next post, I will describe the translation to C# and the remaining transformations.

  • ILSpy - Query Expressions

    ILSpy supports LINQ query expressions - we added that feature shortly before the M2 release.

    Today, I implemented support for decompiling object initializers and fixed some bugs related to deeply nested lambdas. With these two improvements, query expression translation becomes possible in several more cases.

    This screenshot shows Luke Hoban's famous LINQ ray-tracer.

    Why are queries related to object initializers? Simple: LINQ queries allow only the use of expressions. When an object initializer is decompiled into multiple statements, there's no way to fit those into a "let" or "select" clause, so query expression translation has to abort.

    Another issue with this sample was the deep nesting of the compiler-generated lambdas. Once closures are nested more than two levels deep, the C# compiler starts copying the parent-pointer from one closure into its subclosure ("localsZ.localsY = localsX.localsY;"). This case was missing from the lambda decompilation, so some references to the closure classes were left in the decompiled code. This bug has now been fixed, so nested lambdas should decompile correctly.

    We're now close to supporting all features in C# 3.0; the only major missing item is expression tree support. So LINQ queries currently decompile into query syntax only if they're compiled into delegates (LINQ-to-Objects, Parallel LINQ), not if they're compiled into expression trees (LINQ-to-SQL etc.).

  • ILSpy - yield return

    This weekend, I worked on decompiling 'yield return' statements. The C# compiler performs quite a bit of magic to make 'yield return' work, and the decompiler must be aware of all this magic and be able to revert it.

    After two days of hard work, I'm happy to announce that ILSpy (starting with 1.0.0.528) can now decompile enumerators.

    Grab the new ILSpy build while it's hot, or just look at the obligatory screenshot:

    If you want to understand the code generated by the compiler, you can disable this new feature in the new 'View > Options' dialog. Or you could read Jon Skeet's great article on this topic: Iterator block implementation details: auto-generated state machines.

    Here's the generated MoveNext() code for the SelectMany implementation:

        private bool MoveNext()
        {
            bool flag;
            try {
                int i = this.$1__state;
                if (i == 0) {
                    this.$1__state = -1;
                    this.$7__wrap17 = this.source.GetEnumerator();
                    this.$1__state = 1;
                    goto IL_B0;
                }
                if (i != 3) {
                    goto IL_C6;
                }
                this.$1__state = 2;
                IL_9D:
                if (this.$7__wrap19.MoveNext()) {
                    this.<subElement>5__16 = this.$7__wrap19.Current;
                    this.$2__current = this.<subElement>5__16;
                    this.$1__state = 3;
                    flag = true;
                    return flag;
                }
                this.$m__Finally1a();
                IL_B0:
                if (this.$7__wrap17.MoveNext()) {
                    this.<element>5__15 = this.$7__wrap17.Current;
                    this.$7__wrap19 = this.selector.Invoke(this.<element>5__15).GetEnumerator();
                    this.$1__state = 2;
                    goto IL_9D;
                }
                this.$m__Finally18();
                IL_C6:
                flag = false;
            } catch { // in IL, this is a try-fault block, but C# doesn't have those...
                this.Dispose();
                throw;
            }
            return flag;
        }

    Now how can one map the generated code back to the original C#? The general idea is simple (the devil is in the details...):

    Every time the code assigns to this.current, this.state and then returns, we transform that into a "yield return" instruction and a "goto" instruction to the label belonging to the new state. Because we run this transformation very early in the decompiler's pipeline (prior to any control flow analysis), the following steps will pick up on the "goto"s and be able to detect loops and simplify the "goto"s away.

    However, how do we determine the label that is responsible for (to give an example) state 3? The answer is 'IL_9D', but figuring this out is non-trivial: the C# compiler makes use of if-statements (to be exact: beq and bne.un), switch statements, and mixtures of both. Moreover, switch statements are usually preceded by subtractions, as the IL switch only deals with cases 0 to n-1. The ILAst for the beginning of the above MoveNext() method looks like this:

        stloc(var_1_06, ldfld(Enumerable/<SelectManyIterator>d__14`2<TSource, TResult>::<>1__state, ldarg(0)))
        brtrue(IL_17, ceq(ldloc(var_1_06), ldc.i4(0)))
        brtrue(IL_96, ceq(ldloc(var_1_06), ldc.i4(3)))
        br(IL_C6)
        IL_17:
        stfld(Enumerable/<SelectManyIterator>d__14`2<TSource, TResult>::<>1__state, ldarg(0), ldc.i4(-1))
        ...

    If you haven't been following the previous posts: the ILAst is an intermediate data structure used in the decompiler. It represents an IL program using nested expressions, thus eliminating the IL evaluation stack. At the point where the "yield return" transformation runs, opcodes have already been simplified, so "beq" now is "brtrue(ceq)".

    To determine where MoveNext() will branch to in a given state, ILSpy will simulate the execution of the beginning of the MoveNext() method. It does this symbolically: "this.$1__state" evaluates to (state+0). In general, "values" in this symbolic execution are (x), (state+x), (state==x) and (this), where x is an int32. The execution will go linearly through the ILAst; it works on the assumption that there are no backward branches. Execution stops once it encounters a statement it doesn't understand - usually, this is the assignment "this.$1__state = -1;", which indicates that the enumerator started executing. For each statement in the ILAst, the range of states that can lead to that statement is stored.

    So the result of the analysis is the following table:
        IL_17: state 0 to 0
        IL_96: state 3 to 3
        IL_C6: state int.MinValue to -1; 1 to 2, and 4 to int.MaxValue
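
    To make the table less mysterious, here is a heavily simplified sketch of such an analysis. It understands only two kinds of instructions, "brtrue(label, ceq(state, constant))" and "br(label)", and tracks which states can still reach the current instruction. The real analysis also has to deal with subtractions and switch instructions, as described above; the Jump and Range types are invented for this sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class StateRangeSketch
{
    // Inclusive range of state values.
    public struct Range
    {
        public readonly int From, To;
        public Range(int from, int to) { From = from; To = to; }
        public override string ToString() { return From + " to " + To; }
    }

    // "brtrue(Label, ceq(state, Value))", or "br(Label)" if Value is null.
    public sealed class Jump
    {
        public string Label;
        public int? Value;
    }

    public static Dictionary<string, List<Range>> Analyze(IEnumerable<Jump> dispatch)
    {
        var result = new Dictionary<string, List<Range>>();
        // States that can still reach the current instruction; initially every int32.
        var current = new List<Range> { new Range(int.MinValue, int.MaxValue) };
        foreach (var jump in dispatch)
        {
            if (!result.ContainsKey(jump.Label)) result[jump.Label] = new List<Range>();
            if (jump.Value == null) {
                result[jump.Label].AddRange(current); // unconditional: all remaining states go here
                break;
            }
            int v = jump.Value.Value;
            var remaining = new List<Range>();
            foreach (var r in current) {
                if (v < r.From || v > r.To) { remaining.Add(r); continue; }
                result[jump.Label].Add(new Range(v, v));                 // state v takes the branch
                if (v > r.From) remaining.Add(new Range(r.From, v - 1)); // the rest falls through
                if (v < r.To) remaining.Add(new Range(v + 1, r.To));
            }
            current = remaining;
        }
        return result;
    }
}
```

    Running it on the three branch instructions from the ILAst above (IL_17 for state 0, IL_96 for state 3, then the unconditional branch to IL_C6) reproduces the three rows of the table.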

    This allows us to reconstruct the control flow in the MoveNext() method. However, one piece of the puzzle is still missing: the try-finally blocks. The C# compiler doesn't compile any of those into the MoveNext() method. Instead, it puts each finally block into its own method, and calls them in the MoveNext() method only on the regular exit of the try blocks. In case of an exception, the try-fault handler simply calls Dispose(), which takes care of calling the finally blocks depending on the current state:

        void System.IDisposable.Dispose()
        {
            switch (this.$1__state) {
                case 1:
                case 2:
                case 3: {
                    try {
                        switch (this.$1__state)
                        {
                            case 2:
                            case 3:
                            {
                                try {
                                } finally {
                                    this.$m__Finally1a();
                                }
                                break;
                            }
                        }
                    } finally {
                        this.$m__Finally18();
                    }
                    return;
                }
            }
        }
        private void $m__Finally18()
        {
            this.$1__state = -1;
            if (this.$7__wrap17 != null) {
                this.$7__wrap17.Dispose();
            }
        }
        private void $m__Finally1a()
        {
            this.$1__state = 1;
            if (this.$7__wrap19 != null) {
                this.$7__wrap19.Dispose();
            }
        }

    We analyze the Dispose() method using the same symbolic execution that we used for the jump code at the beginning of MoveNext(). This tells us that $m__Finally1a is called in states 2 and 3; and that $m__Finally18 is called in states 1 to 3. Using this information, we can reconstruct the try-finally blocks within MoveNext(). The remaining parts of the ILAst pipeline then take care to replace the "goto"s with loop and if structures. Finally, the C# pattern transformations take care of translating the code back to the foreach pattern, resulting in the highly readable code in the screenshot at the beginning of this post.

     

  • ILSpy - decompiler architecture

    In my introductory post two weeks ago, I said we had not decided on a decompiler engine yet. The decision was made soon after: we found some issues in the Cecil.Decompiler design, and so decided to move forward with our own decompiler engine (based on David's dissertation). David will post more about those issues and how we can avoid them in the ILSpy design.

    Here, I will write about the ILSpy architecture. But actually, the best way to learn about it is to download the source code, compile a debug build, and run it:

    As you can see in this screenshot, the debug builds offer several more "languages" than the release builds (which show C# and IL only). These additional languages represent the intermediate steps between the IL code and the decompiled C# code.

    The decompiler works on two different representations: first, it transforms the IL code into an intermediate language we call "ILAst". In this language, every IL instruction is represented by exactly one function. Such function calls can be nested when one IL instruction directly consumes the output of another:

    callvirt(UIElement::set_IsEnabled, ldfld(AboutPage/<>c__DisplayClass7::button, ldarg(0)), ldc.i4(0))

    In essence, the ILAst is a structured representation of the IL code. Further transformation steps introduce even more structure into this ILAst, such as loops and if/else constructs.

    The last step done on the ILAst representation is type inference. If you look at the example above, you'll see that the UIElement.set_IsEnabled method is called with the integer 0 as argument. However, the setter expects a boolean in C#. The issue here is that IL does not contain the full type information of the original program. For methods and fields, and even for local variables, full type information is available in the IL code. But for temporary results (stored on the IL evaluation stack), the CLR uses a less strict type system. In that type system, the type "I4" can stand for any of the following types: int, bool, uint, short, ushort, byte, sbyte, and also for any enum based on one of those integer types. The type analysis step uses inference to determine which of the C# types should be used by the decompiler. This results in the following ILAst:

    callvirt:void(UIElement::set_IsEnabled, ldfld:Button(AboutPage/<>c__DisplayClass7::button, ldarg:AboutPage/<>c__DisplayClass7(0)), ldc.i4:bool(0))

    This information allows the decompiler in the following step to know that the literal 0 was actually the literal false. So in the C# AST (which is built on top of the NRefactory library) we get the following statement:

    this.button.IsEnabled = false;

    Technically, we already have valid C# at this point. But we still proceed to transform this C# code, both to simplify away artifacts introduced by the compiler (or decompiler), and to introduce support for some of the higher-level constructs in the C# language. In this example, the statement that was disabling the button was actually part of an anonymous method. The "DelegateConstruction" transformation step will inline anonymous methods into the method where they are instantiated. It also removes all traces of the "DisplayClass" (the class used by the C# compiler to represent the closure), leading to the following code:

        button.Click += delegate {
            button.IsEnabled = false;
            ...
        };

    Note that by inlining the anonymous method, the "this" was changed to refer to the closure instance, which was then subsequently removed. The final code works directly on the local button variable, and none of the display class/closure implementation remains. The decompiled code is now almost identical to the original source code.

  • ILSpy - Disassembler

    Even though the killer feature that everyone is waiting for is the decompiler, it's often still interesting to look directly at the IL code. Of course, this feature can't be missing in a tool called ILSpy.
    Implementing it also provided me with a way to test the GUI logic while the decompiler is still under construction.

    We use AvalonEdit for the code view, so you can expect some advanced features:
    The most obvious one is syntax highlighting, which is using AvalonEdit's built-in highlighting engine. As usual for AvalonEdit, copying the text into any HTML-capable application (such as this blog's edit box in Firefox) will preserve the highlighting.
     .method public static hidebysig string CreateHtmlFragment

    Code Folding allows you to collapse and expand methods - useful if you're viewing a complete class.
    Finally, there's an important invisible feature: every reference is a hyperlink.
    Click on a branch target (e.g. IL_0063), and the code view jumps to the target IL instruction. Click on a type/member reference, and it will be selected in the tree view and will be decompiled.

    The disassembler itself also can perform a nice trick: it will display the code in an indented form.
    For this purpose, I wrote a very simple detector for try-catch-finally and loop structures.
    Basically, the exception handling table in the method's footer is converted into a tree, and then displayed inline (ILDasm has this feature as well).

    For loops, the detection logic is dead simple: if there's a backwards branch in the code, then that's probably a loop with the body from start to end. To verify whether it's indeed a well-formed loop, I find all branches from outside the loop body jumping into the loop body, and test whether all of those jump to the same instruction. If there are multiple entry points, then the structure is not considered a loop.
    As a last test, any loops that cannot be inserted into the tree structure (because they don't nest properly and overlap with loops detected earlier) are ignored as well.
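
    Put together, the detection can be sketched in a few lines. This is a reconstruction from the description above, not the actual disassembler code: instructions are identified by their index, and only the branches are passed in. The Branch and Loop types are made up for the sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LoopDetectionSketch
{
    public struct Branch
    {
        public readonly int From, To; // instruction indices
        public Branch(int from, int to) { From = from; To = to; }
    }

    public struct Loop
    {
        public readonly int Start, End; // inclusive range of instruction indices
        public Loop(int start, int end) { Start = start; End = end; }
        public bool Contains(int index) { return Start <= index && index <= End; }
    }

    public static List<Loop> Detect(IList<Branch> branches)
    {
        var loops = new List<Loop>();
        // Every backward branch is a loop candidate: body from branch target to branch.
        foreach (var back in branches.Where(b => b.To <= b.From))
        {
            var loop = new Loop(back.To, back.From);
            // All branches from outside into the body must agree on a single entry point.
            int entryPoints = branches
                .Where(b => !loop.Contains(b.From) && loop.Contains(b.To))
                .Select(b => b.To).Distinct().Count();
            if (entryPoints > 1) continue;
            // The loop must nest properly with the loops accepted so far.
            bool overlaps = loops.Any(l =>
                !(loop.End < l.Start || l.End < loop.Start)            // not disjoint
                && !(l.Contains(loop.Start) && l.Contains(loop.End))   // not nested inside l
                && !(loop.Contains(l.Start) && loop.Contains(l.End))); // not surrounding l
            if (!overlaps) loops.Add(loop);
        }
        return loops;
    }
}
```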

    So if you have IL code like this: "ABAB" where A is a loop and B is another (with jump instructions from the end of one A to the start of the other), then this simple algorithm will be wrong and detect only one loop "(ABA)B" and incorrectly consider B to be part of the loop body.
    I experimented with some other loop detection algorithms and actually had one which worked pretty well and could handle even the above case. However, the loops detected by that algorithm were not necessarily consecutive code blocks (as with the example above), so they couldn't be displayed in the disassembler view without reordering the IL code.
    I don't know of any compilers that generate loops in non-consecutive blocks of code, but some obfuscators employ such unusual code patterns. The decompiler will likely use a more advanced form of loop detection in order to deal with those cases.

  • ILSpy - a new .NET assembly inspector

    First, I'll let the screenshot speak for itself:

    Now that RedGate has announced that Reflector will no longer be available for free, the SharpDevelop team has started to create an open-source replacement: ILSpy

    Just as any of the assembly browsers inspired by ILDasm, the UI is simple: a tree view on the left allows you to view the contents of the assembly; the text view on the right shows the contents of the selected method.

    As for the decompiler engine itself, we still haven't decided - the screenshot above was created with Cecil.Decompiler, and you can clearly see that there are quite a few issues even in such a trivial method. We do have an alternative solution - David Srbecky (who wrote SharpDevelop's debugger) has written his own decompiler in early 2008 as part of his dissertation. However, tests show that whichever library we pick up, it will take a lot of work until the results come anywhere close to what people are used to.

    The new GUI is written in WPF and reuses a few SharpDevelop components - the tree view is SharpTreeView, written by Ivan Shumilin for the SharpDevelop WPF Designer outline view. This tree view is not used in any other place so far, but that will likely change in the future, as it has additional features over the normal treeview: multiselection, support for columns (GridView mode), and a built-in framework for copy/paste and drag'n'drop. The drag'n'drop support is already used in ILSpy to allow the user to reorder the assembly list.

    The text view on the right is, of course, using AvalonEdit, SharpDevelop's text editor (new as of version 4.0).

    If you are interested in contributing, write me a mail; or just join us in #sharpdevelop (on freenode).

  • Creating a localizable WPF dialog

    This post explains how to create a localizable dialog with WPF using the SharpDevelop infrastructure, i.e. when writing a SharpDevelop AddIn. If you are writing a standalone application, you can still get ideas from the post, but you will have to define the referenced styles and markup extensions yourself.

    If we create a simple dialog using the designer, the XAML code might look like this:

    <?xml version="1.0" encoding="utf-8"?>
    <Window
        x:Class="SomeNamespace.MyDialog" xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation" xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
        Height="125" Width="300"
        Title="This is my dialog">
        <Grid>
            <Grid.RowDefinitions>
                <RowDefinition Height="1*" />
                <RowDefinition Height="Auto" />
            </Grid.RowDefinitions>
            <TextBox
                Grid.Column="0" Grid.Row="0"
                Margin="8,8,8,8">
            This big text box is representing the main content.
            </TextBox>
            <StackPanel
                Orientation="Horizontal"
                Grid.Column="0" Grid.Row="1"
                HorizontalAlignment="Right" VerticalAlignment="Bottom"
                Margin="0,0,8,4.25"
                Width="160" Height="24.5">
                <Button
                    Content="OK"
                    IsDefault="True"
                    Width="75" Height="23" />
                <Button
                    Content="Cancel"
                    IsCancel="True"
                    Margin="4,0"
                    Width="75" Height="23" />
            </StackPanel>
        </Grid>
    </Window>

    This results in the following window:

    Now, there is a whole bunch of issues with this dialog. The most obvious are that the text on the buttons is blurry, and that the background color is wrong (dialogs are expected to use the Control color, a light grey). Also, this window is shown in the task bar - dialogs shouldn't do that. We can fix all of these issues by applying a style to the window:

    <Window ... xmlns:core="http://icsharpcode.net/sharpdevelop/core" Style="{x:Static core:GlobalStyles.DialogWindowStyle}">

    This style will do the following things:

    • Use 'Control' as background color
    • Hide the window from the task bar
    • Enable WPF 4 layout and text rendering (makes everything less blurry)
    • Enable right-to-left layout if the current language is a right-to-left language

    If you have a normal window (not dialog), then you can also use {x:Static core:GlobalStyles.WindowStyle}, which does not set the background color and does not hide the window in the task bar.

    Now let's get to the localization part: we need to load the translated string resources. The {core:Localize} markup extension can be used to do this (assuming the string resources are registered with the SharpDevelop ResourceService):

    <Window ... Title="{core:Localize SomeNamespace.MyDialog.Title}">
    ...
    <Button
    Content="{core:Localize Global.OKButtonText}" .../>

    There are a few global resources for commonly used elements, but all other strings should be added using keys that indicate where the resource is being used.

    But now we're running into more potential issues: what if the translated string doesn't fit on our button? All explicitly specified widths are potential trouble-makers, so let's get rid of them. Let's trust the automatic layout being done by WPF.

    The title is missing because we forgot to define that string in the resource file. But let's ignore that issue and concentrate on the buttons instead.

    'OK' is way too small, and even 'Abbrechen' (German for Cancel) could use some more spacing between the button border and the text. Fortunately, SharpDevelop again provides a style for this common issue. We could apply that style to each button individually, but let's register it in our window so that it automatically gets picked up by all buttons:

        <Window.Resources>
            <Style TargetType="Button" BasedOn="{x:Static core:GlobalStyles.ButtonStyle}"/>
        </Window.Resources>

    This style will give the button the correct padding (9,1) and minimum width (73).

    These buttons look pretty reasonable - but if you look closely, you'll notice that the "Abbrechen" button is a bit wider than the "OK" button (80 pixels vs. 73 pixels). In this case, it's not a big problem; but the effect might be more pronounced in other languages, or with other text on the buttons; so let's take a look at how to fix this. WPF has the container "UniformGrid" to assign the same size to a set of controls. However, if you try to apply that in this case, you'll notice that the UniformGrid will include the margin in the size calculation, so whichever button has the margin set will appear to be a bit smaller.

    There are two solutions to this problem: either evenly distribute the margin over both buttons (give OK a right-margin of 2; and Cancel a left-margin of 2), or use the UniformGridWithSpacing container. Here, we use the latter approach, which has the advantage that it can be extended to more than 2 buttons without having to think about the distribution of margins.

    UniformGridWithSpacing is defined in ICSharpCode.SharpDevelop.Widgets, so we'll need to import that namespace: xmlns:widgets="http://icsharpcode.net/sharpdevelop/widgets"

    Here's how you use the grid:

            <widgets:UniformGridWithSpacing Columns="2"
                Grid.Column="0" Grid.Row="1"
                HorizontalAlignment="Right"
                Margin="0,0,12,12">
                <Button Content="{core:Localize Global.OKButtonText}" IsDefault="True" />
                <Button Content="{core:Localize Global.CancelButtonText}" IsCancel="True" />
            </widgets:UniformGridWithSpacing>

    The spacing can be defined using the SpaceBetweenColumns property, but that's not necessary in this case as the default value (7) is correct for this purpose. And yes, the Windows User Experience Interaction Guidelines really suggest 7 pixels here; not 4 or 8 as is often mistakenly assumed (I made the same mistake in the Window we started with).

    Finally, you should ensure that your dialog doesn't show a common bug: open your dialog, then switch to another application, then switch back to SharpDevelop. What should happen is that your dialog appears, forcing the user to finish whatever they were doing with your dialog. If the dialog does not appear and the SharpDevelop main window is unresponsive, you forgot to give your dialog an owner. In the code creating the window (ideally just before calling w.ShowDialog()), add:

    w.Owner = WorkbenchSingleton.MainWindow;

    If your dialog is triggered by another dialog, then use your immediate parent window as owner instead.

    Posted Nov 05 2010, 04:06 PM by DanielGrunwald with no comments
  • Translations of SharpDevelop

    Unlike SharpDevelop 3.2, which was available in 18 languages, SharpDevelop 4.0 (in its default configuration) will only support 13 languages:

    We had to drop the following languages because they were unmaintained and were lacking translations for too many strings:

    • Chinese (Simplified)
    • Hungarian
    • Italian
    • Romanian
    • Swedish

    For this purpose, we defined the cut-off to be 25% of the total number of string resources for SharpDevelop (currently about 2600): the above languages were excluded because each of them has more than 650 missing strings.

    However, we decided to include some of the incomplete translations with SharpDevelop; they are just disabled by default. Qualified for this inclusion are languages with less than 50% of translations missing: the five languages mentioned above, and Russian.
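
    The two thresholds can be sketched as a small classification. This is only an illustration in Python, not code from SharpDevelop: the function name and the tier labels are made up for this example, the total of 2600 is the approximate figure quoted above, and the behaviour exactly at a threshold is an assumption.

```python
# Illustrative sketch only; names and tier labels are hypothetical.
TOTAL_STRINGS = 2600      # approximate figure quoted in the post
ENABLED_CUTOFF = 0.25     # more than 25% missing: not enabled by default
INCLUDED_CUTOFF = 0.50    # 50% or more missing: not shipped at all


def classify_translation(missing, total=TOTAL_STRINGS):
    """Return the tier for a translation with `missing` untranslated strings."""
    if missing <= total * ENABLED_CUTOFF:
        return "enabled"
    if missing < total * INCLUDED_CUTOFF:
        return "shipped but disabled"
    return "not shipped"


# 25% of 2600 strings gives the 650-string cut-off mentioned above.
print(int(TOTAL_STRINGS * ENABLED_CUTOFF))
for missing in (400, 900, 1500):
    print(missing, classify_translation(missing))
```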

    By editing C:\Program Files (x86)\SharpDevelop\4.0\data\Resources\Languages\LanguageDefinition.xml (assuming the default installation location), you can enable these partial translations. The file also contains commented-out entries for other languages that were dropped long ago; enabling those won't have any effect, as these languages do not ship with SharpDevelop.

    If you want to revive one of these old translations, start a new translation, or simply join one of the existing translation teams, please contact Christoph Wille (christophw <at> icsharpcode.net). You will get access to a Web application that can be used to enter translations online. The database behind this web application is then used to generate the resource files for SharpDevelop (please do NOT translate the .resx file!).

    Looking at the currently shipping translations, Dutch, French, German, and Spanish (both versions) are doing fine [thanks to the respective translators!], but it seems that Czech, Korean, Norwegian, Polish, Portuguese, and Turkish could use some help.

  • Source Code Repository Migrated to GitHub

    Last Saturday, we migrated our self-hosted Subversion repository to git and started using GitHub for hosting.

    As part of this change, the SharpDevelop repository was split up into three parts:

    The SharpZipLib repository was migrated as well.

    You only need the main repository to compile and extend the SharpDevelop code.

    As part of this change, I also added a simple Git AddIn to SharpDevelop 4.0. It allows invoking the TortoiseGit commit dialog from the SharpDevelop project browser.

    With the move from Subversion to git, we also had to change the way we handle version numbers. Git allows distributed development and encourages branching, so it is impossible to assign a simple increasing number to commits.

    So instead, we now calculate revision numbers based on the history: we count the number of commits between a known starting point and the current version. At least for the builds produced by our build server, this gives the illusion of an increasing revision number. Moreover, anyone checking out the same commit from git will calculate the same revision number. However, the numbers are not unique for builds created on different branches.

    The counting is done using: git rev-list 6eceaaafce5ed9b45d19a1645b1b012675aac996..HEAD | wc -l (the hash is the known starting commit)
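
    To make the meaning of this count concrete, here is a small Python model of what "commits reachable from HEAD but not from the starting commit" amounts to. It is a sketch under stated assumptions, not what the build server runs: the commit names and the tiny history are invented for the example, and the history is represented as a plain dictionary mapping each commit to its parents.

```python
# Illustrative model of `git rev-list START..HEAD | wc -l`.
# The commit names and the history below are made up for this example.

def ancestors(history, tip):
    """Return `tip` and every commit reachable from it via parent links."""
    seen = set()
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(history[commit])
    return seen


def revision_number(history, start, head):
    """Count commits reachable from `head` but not from `start`."""
    return len(ancestors(history, head) - ancestors(history, start))


# start <- a <- b  <------- m   (m merges b and f2)
#           \<- f1 <- f2 <-/
history = {
    "start": [],
    "a": ["start"],
    "b": ["a"],
    "f1": ["a"],
    "f2": ["f1"],
    "m": ["b", "f2"],
}

for head in ("b", "f1", "m"):
    print(head, revision_number(history, "start", head))
```

    In this toy history, b and f1 lie on different branches yet both get the number 2, while the merge commit m gets 5 because the commits of both branches are counted.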

    In fact, the numbers are now counting independently on each branch, synchronizing only when branches are merged into each other. For this reason, SharpDevelop now stores additional information about the commit it was built from:

    • Version number
    • Branch name (for feature branches)
    • Short commit hash

    The same information is shown in the version info inside crash reports. Note that the branch name is included only for feature branches - the "master" branch and branches starting with a digit are considered version branches and will not be shown. The feature branch name will also be shown on the SharpDevelop splash screen.
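
    The naming rule for version branches is simple enough to write down in a few lines. The following Python helper is a hypothetical illustration of that rule only; its name and the sample branch names are assumptions, and it is not the code SharpDevelop uses.

```python
# Hypothetical helper; the rule comes from the text, the name does not.
def is_feature_branch(branch):
    """Return True if the branch name would be shown in the version info."""
    if branch == "master":
        return False
    # Branches starting with a digit (for example "4.0") are version branches.
    return not branch[:1].isdigit()


# The sample branch names are invented for this example.
for name in ("master", "4.0", "3.x", "some-feature"):
    print(name, is_feature_branch(name))
```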

    For the purpose of looking up "which build corresponds to this commit" (or vice versa), we made our build server push this information into the git repository. For example, if you take a look at the commit b6f4ade7, you will see a "git note" at the end of the GitHub page which says "Build 4.0.0.6500 on master successful".

  • Compiling for .NET 4.0 without installing the Windows SDK

    If you've tried to compile for .NET 4.0 using MSBuild on the command line without having the Windows SDK or Visual Studio 2010 installed, you'll probably have noticed this warning message:

    warning MSB3644: The reference assemblies for framework ".NETFramework,Version=v4.0" were not found. To resolve this, install the SDK or Targeting Pack for this framework version or retarget your application to a version of the framework for which you have the SDK or Targeting Pack installed. Note that assemblies will be resolved from the Global Assembly Cache (GAC) and will be used in place of reference assemblies. Therefore your assembly may not be correctly targeted for the framework you intend.

    Additionally, if the target platform of your project is set to AnyCPU, MSBuild will copy several files from the .NET Framework into your output directory:

    • mscorlib.dll
    • norm*.nlp
    • System.Data.dll
    • System.Data.OracleClient.dll
    • System.EnterpriseServices.dll
    • System.EnterpriseServices.Wrapper.dll
    • System.Transactions.dll
    • System.Web.dll

    This happens even if your project doesn't reference any of the named .dlls.

    So what are these "reference assemblies", and why are they needed for compilation? First, let's look back at .NET 2.0 (which didn't have reference assemblies) so that we can understand the problem that Microsoft wanted to solve.

    In .NET 2.0, MSBuild tells the compilers to directly reference the assemblies from the Framework's directory. The problem was: when .NET 2.0 SP1 added new classes and members, developers could accidentally use the new classes and make their program incompatible with the unpatched .NET 2.0.

    In .NET 4.0, Microsoft solves this problem by separating the assemblies used for compilation from the assemblies used at runtime. When you install the .NET Framework, you only get the runtime assemblies (installed in the GAC). These may be patched by hotfixes or service packs. However, the patches will not touch the separately installed reference assemblies: the compiler will continue to check your code against the original .NET 4.0 API. In fact, if you try to open the reference assemblies in Reflector, you won't see any code: these assemblies have only the metadata and do not contain any IL instructions. They are intended as an API reference only.

    But, while reference assemblies certainly are a good idea, not everybody has them installed. The ".NET 4.0 Targeting Pack" is not available as a standalone download; the only way to get it is to install the Windows SDK 7.1 (a 585 MB download) or to install Visual Studio 2010. So we decided that SharpDevelop 4.0 should be usable without having the reference assemblies installed. You won't get their benefits (when the .NET runtime gets patched), but you should be able to work as you could with .NET 2.0.

    So what did we do? First, I implemented a workaround for the "copy local" bug. I don't know why that bug occurs only when targeting AnyCPU, but I understand why this depends on the reference assemblies: MSBuild uses a file called "RedistList\FrameworkList.xml" to decide which references should, by default, be copied into the application directory, and which are part of the .NET framework and thus always present in the GAC. This file is part of the reference assemblies, so MSBuild must be using something else when those aren't installed. The workaround in SharpDevelop adds a custom MSBuild task to the assembly resolution target, which sets CopyLocal=false for all .NET assemblies (for this purpose, SharpDevelop contains a hard-coded list of .NET assemblies).
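
    The idea behind the workaround can be shown without any MSBuild machinery. The Python sketch below is purely conceptual and is not the actual SharpDevelop task: the function name is hypothetical, the set of assembly names is a tiny illustrative sample rather than the real hard-coded list, and "MyCompany.Utils" is an invented reference.

```python
# Conceptual sketch of the "copy local" workaround, not SharpDevelop's task.
# The set below is a small illustrative sample, not the real hard-coded list.
FRAMEWORK_ASSEMBLIES = {
    "mscorlib",
    "System",
    "System.Data",
    "System.Transactions",
    "System.Web",
}


def apply_copy_local_workaround(references):
    """Force copy_local to False for every known framework assembly.

    `references` is an iterable of (assembly_name, copy_local) pairs, as they
    might come out of assembly resolution. Other references are left alone.
    """
    return [
        (name, False if name in FRAMEWORK_ASSEMBLIES else copy_local)
        for name, copy_local in references
    ]


resolved = [("System.Data", True), ("MyCompany.Utils", True), ("System", False)]
print(apply_copy_local_workaround(resolved))
```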

    Second: I found the MSBuild warning really annoying - it appears once per reference, so that's a few hundred warnings in a large solution. SharpDevelop 4.0 now simply ignores MSB3644.

    Together, these two small changes make builds of .NET 4.0 projects work as expected, even if no Windows SDK is installed.

    By the way: you can also use our workaround for the "copy local" bug in command line builds: call MSBuild with the parameter /p:CustomAfterMicrosoftCommonTargets=path-to-sharpdevelop\bin\SharpDevelop.TargetingPack.targets
