SharpDevelop Community

Daniel Grunwald

  • ILSpy 2.2 release

    ILSpy 2.2 is available!

    ILSpy_2.2.0.1706_Binaries.zip
    ILSpy_2.2.0.1706_Source.zip

    It's been more than 2 years since the previous release. The ILSpy core team has been busy with SharpDevelop 5, so we haven't done much on ILSpy -- just some bugfixes.

    However, we've had several external contributors who have supplied us with a few new features (and lots of bugfixes):

    • #345: Added option to allow folding on all braces
    • #345: Added context menu to code view with folding commands
    • #384: Show all images contained in .ico resource
    • #423: Decompiling as a Visual Studio project now creates AssemblyInfo file
    • #467: Added option to display metadata tokens in tree
    • Fixed lots of decompilation bugs

    In the future, our team would like to get back to work on ILSpy. We have some ideas for a new decompiler engine that could fix many of the remaining decompilation issues, and hopefully simplify our code at the same time.

  • Architecture changes in SharpDevelop 5.0 Beta 1

    Last Sunday, we released SharpDevelop 5.0 Beta 1. This is a major milestone for us: SharpDevelop 5.0 Beta 1 ships architecture changes that we have been working on since 2010. These changes enable us to provide a better experience when editing C# code: semantic highlighting, background detection of syntax errors and other issues, and tons of refactorings.

    However, there is also a downside to these major architecture changes: they required a rewrite of most of our language bindings, and languages other than C# have lost features in the transition. Most dramatically, even essential features such as code completion are no longer available for Visual Basic. And the Boo language is no longer supported at all.

    In this post, I will try to give an overview of the most important API changes in SharpDevelop 5.0. I hope this information will be useful to developers of third-party AddIns, and other users of the SharpDevelop codebase.
    Most of the difficulty in porting an AddIn from SD 4.x to 5.0 lies in the redesigned ICSharpCode.NRefactory and ICSharpCode.SharpDevelop.Dom libraries. These were rewritten from scratch, and there have been some major conceptual changes (e.g. much more pervasive immutability, and the split of the type system into resolved and unresolved variants), which might require significant changes to your AddIns.

    If your AddIn is not a language binding and does not provide refactorings, porting it will not be as difficult; although there are still plenty of other API changes in SharpDevelop 5.0 Beta 1.
    Many of these other changes are trivial, like types being moved to different namespaces/libraries, or methods being moved to different classes.

    An incomplete list of classes that were renamed/moved can be found in the SharpDevelop wiki. Note that several important types were moved to the ICSharpCode.NRefactory library. You might have to add a reference to ICSharpCode.NRefactory if your AddIn doesn't already have that reference.

    NRefactory was rewritten from scratch, and SharpDevelop.Dom was replaced by NRefactory 5

    In SharpDevelop 4.x, 'NRefactory' was the C# (and VB) parser library. Semantic analysis and code completion belonged to the 'SharpDevelop.Dom' library. This changes with SharpDevelop 5.0: NRefactory now takes over the responsibilities of the old SD.Dom, and thus has a much more central position in the SharpDevelop infrastructure.

    While SharpDevelop 4.x had a single type system (in the old SD.Dom), SharpDevelop 5.0 has no fewer than three different type system APIs:

    • Unresolved type system (NR.TypeSystem.IUnresolved*)
    • Resolved type system (NR.TypeSystem)
    • Mutable, observable type system model (new SD.Dom)

    In comparison, the old type system in SD 4.x was part immutable, part mutable, but not observable. When porting code from SD 4.x, be careful which of the new type systems you use.

    The unresolved type system should be used if you are providing a type system to the SD infrastructure (IParser.Parse implementation in your language binding). Try to avoid using the unresolved type system for anything else.

    The resolved type system is the most powerful of the three options, and should be used in most cases. However, you should not hold onto references to the resolved type system for an extended amount of time: doing so will hold a snapshot of the whole solution in memory!
    Instead, if you need to hold on to a reference to the type system, use the new SD.Dom. This mutable model tracks changes to the type system, and thus doesn't keep old versions of the project in memory.
    You can use the .GetModel() extension method to convert from resolved type system to SD.Dom, and the .Resolve() method to convert back.

    For an introduction to NRefactory, take a look at the article "Using NRefactory for analyzing C# code".

    Static services were replaced with interfaces

    To make SharpDevelop AddIn code easier to test, we have replaced many of the static service classes in SharpDevelop with interfaces. The new static class "SD" has static properties for retrieving references to the services, so the call "ResourceService.GetString()" becomes "SD.ResourceService.GetString()". To make porting code easier, some commonly used static service classes still exist (e.g. MessageService and LoggingService), but they just forward calls to the interface (so are still testable). However, not all services have been converted to interfaces yet.

    In unit tests, Rhino.Mocks (or another mocking framework of your choice) can be used to easily create mocks/stubs of the services:

    SD.InitializeForUnitTests(); // initialize container and replace services from previous test cases
    SD.Services.AddService(typeof(IParserService), MockRepository.GenerateStrictMock<IParserService>());
    SD.ParserService.Stub(p => p.GetCachedParseInformation(textEditor.FileName)).Return(parseInfo);
    SD.ParserService.Stub(p => p.GetCompilationForFile(textEditor.FileName)).Return(compilation);
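
    The general pattern behind this (a static facade whose properties look up interface-typed services in a replaceable container) can be sketched in a self-contained way. This is only an illustration under assumptions: IGreetingService, Services and StubGreetingService are hypothetical names, not SharpDevelop types, and the sketch uses the framework's System.ComponentModel.Design.ServiceContainer rather than SharpDevelop's own container.

    ```csharp
    using System;
    using System.ComponentModel.Design;

    // Hypothetical service interface; the real SharpDevelop interfaces differ.
    public interface IGreetingService
    {
        string GetGreeting(string name);
    }

    public sealed class DefaultGreetingService : IGreetingService
    {
        public string GetGreeting(string name) { return "Hello, " + name; }
    }

    // Hypothetical stand-in for a static facade like "SD": static properties
    // that retrieve services from a container that tests can replace.
    public static class Services
    {
        static ServiceContainer container = CreateDefaultContainer();

        static ServiceContainer CreateDefaultContainer()
        {
            var c = new ServiceContainer();
            c.AddService(typeof(IGreetingService), new DefaultGreetingService());
            return c;
        }

        // Starts with an empty container so that no service from an earlier test leaks in.
        public static void InitializeForUnitTests()
        {
            container = new ServiceContainer();
        }

        public static void AddService(Type serviceType, object instance)
        {
            container.AddService(serviceType, instance);
        }

        public static IGreetingService GreetingService
        {
            get { return (IGreetingService)container.GetService(typeof(IGreetingService)); }
        }
    }

    // A hand-written stub, used here instead of a mocking framework.
    public sealed class StubGreetingService : IGreetingService
    {
        public string GetGreeting(string name) { return "stub"; }
    }
    ```

    A test would call Services.InitializeForUnitTests(), register a StubGreetingService with AddService(), and then exercise code that reads Services.GreetingService without knowing which implementation it receives.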

    It is possible to define a service interface in ICSharpCode.SharpDevelop.dll and have the implementation somewhere else (SharpDevelop will find the implementation through the addin tree).
    This allows for AddIns to consume each other's functionality (e.g. debugger accessing the decompiler service) without having to define a custom extension point in each case.
    We are also trying to separate the public API from the implementation details. For example, instead of a public class NewFileDialog, we simply provide a 'ShowFileDialog()' API. This reduction of the API surface will allow us to make major changes (for example, reimplement the dialog using WPF instead of WinForms) without breaking consumers of the API.

    Namespaces in ICSharpCode.SharpDevelop reorganized

    Along with the API surface reduction, we are moving around types in ICSharpCode.SharpDevelop.dll, reorganizing the namespace structure.
    AddIn code will need to update plenty of using directives to follow along.

    As for the reasoning behind this change: the old namespace structure was mostly based on a separation of 'Gui' and 'Services'.
    However, if UI code is properly separated from the underlying logic, this means that essentially every feature ends up having both GUI and service components. With the new namespace structure, we re-group the types into feature areas instead.

    AddInTree paths reorganized

    Plenty of AddIn tree paths have been changed to better match the new namespace structure.
           
    I used a global replace operation for renaming paths; so AddIns that are in the SharpDevelop repository but not in the SharpDevelop solution (e.g. the sample AddIns) should have been adjusted as well.

    However, 3rd party AddIns will have to be manually adjusted. Our wiki has a list with some of the changes (second table, at the end of the page).

    SD.MainThread

    The new best way to invoke a call on the main thread is:
      SD.MainThread.InvokeAsyncAndForget(delegate { ... });

    Note that InvokeAsync returns a Task (like all .NET 4.5 *Async APIs). If any exceptions occur while executing the delegate, they will get stored in the task object. This can cause the exception to get silently ignored if the task object isn't used later. The "InvokeAsyncAndForget()" method works like the old "BeginInvoke()" method and does not try to catch any exceptions.
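
    The point about exceptions being stored in the task is general Task behavior and can be shown without SharpDevelop. The following is a minimal sketch; TaskExceptionSketch and its members are hypothetical names, and it uses a plain async method instead of SD.MainThread.

    ```csharp
    using System;
    using System.Threading.Tasks;

    public static class TaskExceptionSketch
    {
        // Fails asynchronously; the exception is captured in the returned Task
        // instead of being thrown at the caller.
        public static async Task FailAsync()
        {
            await Task.Yield();
            throw new InvalidOperationException("failure inside the delegate");
        }

        // Only observing the task (here: Wait) surfaces the stored exception.
        public static string DescribeOutcome(Task task)
        {
            try {
                task.Wait();
                return "completed";
            } catch (AggregateException ex) {
                return "faulted: " + ex.InnerException.Message;
            }
        }
    }
    ```

    Calling FailAsync() and discarding the result throws nothing at the call site; passing the same task to DescribeOutcome() returns "faulted: failure inside the delegate".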

    It is also often possible to avoid explicit thread switching altogether by using the C# 5 async/await feature.

    ICSharpCode.Core.ICommand was replaced with WPF ICommand

    SharpDevelop 4.x was still using our own ICommand interface, which we introduced back in the .NET 1.0 days. Now that System.Windows.Input.ICommand is independent from WPF (it was moved to System.dll so that it can be used by Win8 apps), we decided to remove our own ICommand interface and just use the equivalent interface from the .NET framework.

    There are some minor API differences, so we kept the old base class "AbstractMenuCommand" around, so that existing commands do not require any modifications. For new commands, you should derive from "SimpleCommand" instead of "AbstractMenuCommand".
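
    For readers unfamiliar with the framework interface, here is a minimal sketch of a command built directly on System.Windows.Input.ICommand. SketchCommand and CountingCommand are hypothetical names for illustration; SharpDevelop's own "SimpleCommand" may be implemented differently.

    ```csharp
    using System;
    using System.Windows.Input;

    // Hypothetical base class; not SharpDevelop's SimpleCommand.
    public abstract class SketchCommand : ICommand
    {
        // This sketch never changes its CanExecute result, so subscriptions are ignored.
        public event EventHandler CanExecuteChanged { add { } remove { } }

        public virtual bool CanExecute(object parameter) { return true; }

        public abstract void Execute(object parameter);
    }

    // Example command that only counts how often it was executed.
    public sealed class CountingCommand : SketchCommand
    {
        public int ExecutionCount { get; private set; }

        public override void Execute(object parameter) { ExecutionCount++; }
    }
    ```

    A concrete command then only has to override Execute(), and can override CanExecute() when it should be disabled in some situations.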

    SD.PropertyService

    The Get()/Set() methods no longer support nested Properties objects or lists of elements -- you will need to use the new dedicated GetList()/SetList()/NestedProperties() methods for that.

    The Get() method no longer causes the default value to be stored in the container; and GetList() results in a read-only list - an explicit SetList() call is required to store any modifications. However, a nested properties container still is connected with its parent, and any changes done to the nested container will get saved without having to call the SetNestedProperties() method.

    The property service now uses XAML serialization instead of XML serialization. This might require some changes to your classes to ensure they get serialized correctly:

    • Make sure the class being serialized is public so that it can be constructed from XAML code
    • Do not expose public fields; use properties instead.

    We decided on XAML serialization because it works OK to preserve settings when upgrading to another SharpDevelop version that adds additional properties to the deserialized class (and has a different full assembly name due to the version number). It also has a much faster startup time than the XmlSerializer.

    SD.ParserService

    The result of a parser run (ParseInformation) now may contain a fully parsed AST.
    The ParserService may cache such full ASTs, but may also drop them from memory at any time. Currently, this is implemented by keeping the last 5 accessed files in the cache.
    Every parse information also contains an IUnresolvedFile instance with the type system information. This IUnresolvedFile is stored permanently (both in the ParserService and in the IProjectContents).
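
    A cache that keeps only the most recently accessed entries can be sketched as follows. This is an assumption-laden illustration, not the ParserService implementation: RecentlyUsedCache is a hypothetical class, and the real service may track file access differently.

    ```csharp
    using System;
    using System.Collections.Generic;

    // Hypothetical cache that drops the least recently accessed entry
    // once more than 'capacity' entries are stored.
    public sealed class RecentlyUsedCache<TKey, TValue>
    {
        readonly int capacity;
        // Most recently accessed entry is at the front of the list.
        readonly LinkedList<KeyValuePair<TKey, TValue>> order =
            new LinkedList<KeyValuePair<TKey, TValue>>();
        readonly Dictionary<TKey, LinkedListNode<KeyValuePair<TKey, TValue>>> nodes =
            new Dictionary<TKey, LinkedListNode<KeyValuePair<TKey, TValue>>>();

        public RecentlyUsedCache(int capacity)
        {
            if (capacity < 1)
                throw new ArgumentOutOfRangeException("capacity");
            this.capacity = capacity;
        }

        public void Put(TKey key, TValue value)
        {
            LinkedListNode<KeyValuePair<TKey, TValue>> existing;
            if (nodes.TryGetValue(key, out existing))
                order.Remove(existing);
            nodes[key] = order.AddFirst(new KeyValuePair<TKey, TValue>(key, value));
            if (order.Count > capacity) {
                // Evict the entry that was accessed longest ago.
                var oldest = order.Last;
                order.RemoveLast();
                nodes.Remove(oldest.Value.Key);
            }
        }

        public bool TryGet(TKey key, out TValue value)
        {
            LinkedListNode<KeyValuePair<TKey, TValue>> node;
            if (nodes.TryGetValue(key, out node)) {
                // Reading counts as an access: move the entry to the front.
                order.Remove(node);
                order.AddFirst(node);
                value = node.Value.Value;
                return true;
            }
            value = default(TValue);
            return false;
        }
    }
    ```

    With a capacity of 5, such a cache would match the behavior described above: a full AST stays available while its file is among the five most recently accessed ones, and callers must be prepared for TryGet() to fail.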

    Solution model

    The class 'Solution' has been replaced with the interface 'ISolution'.
    The static events that report changes to the solution (e.g. project added) no longer exist on IProjectService; instead the ISolution.Projects collection itself has a changed event.

    Text editor and document services

    In SharpDevelop 4.x, it was possible to use IDocument.GetService(typeof(ITextEditor)) to find the editor that presents the document.
    This is no longer possible in SharpDevelop 5, as the same IDocument may be used by multiple editors (once issue #303 is implemented).

    ITextEditor and IDocument now use separate service containers.
    ITextEditor.GetService() will also return document services, but not the other way around.

    The attributes [DocumentService] and [TextEditorService] are used to mark the service interfaces that are available in the document and in the editor respectively. The attributes exist purely for documentation purposes.

    View content services

    Instead of casting a view content to an interface ("var x = viewContent as IEditable;"),
    code written for SD5 must use "var x = viewContent.GetService<IEditable>();".

    This allows the view content implementation to be flexible about where the interface is implemented (it is no longer necessary to implement everything in the same class).
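
    The difference between casting and asking for a service can be illustrated with a small self-contained sketch. All types here (the simplified IEditable, TextBufferService, SketchViewContent) and the GetService<T>() extension method are hypothetical stand-ins built on System.IServiceProvider; they are not the SharpDevelop types.

    ```csharp
    using System;
    using System.Collections.Generic;

    // Hypothetical, simplified service interface.
    public interface IEditable
    {
        string Text { get; set; }
    }

    public static class ServiceProviderExtensions
    {
        // Generic convenience wrapper around IServiceProvider.GetService(Type).
        public static T GetService<T>(this IServiceProvider provider) where T : class
        {
            return provider.GetService(typeof(T)) as T;
        }
    }

    // The service lives in its own class, not in the view content.
    public sealed class TextBufferService : IEditable
    {
        public string Text { get; set; }
    }

    // The view content does not implement IEditable itself;
    // it only hands out an object that does.
    public sealed class SketchViewContent : IServiceProvider
    {
        readonly Dictionary<Type, object> services = new Dictionary<Type, object>();

        public SketchViewContent()
        {
            services[typeof(IEditable)] = new TextBufferService { Text = "" };
        }

        public object GetService(Type serviceType)
        {
            object service;
            return services.TryGetValue(serviceType, out service) ? service : null;
        }
    }
    ```

    In this sketch a cast of the view content to IEditable cannot succeed, while GetService<IEditable>() returns the separate TextBufferService instance.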

    Interfaces that are supposed to be used as view content services are documented using the [ViewContentService] attribute. In the case of the AvalonEditViewContent, all text editor and document services are also available via IViewContent.GetService().

    OpenedFile models

    The SD-1234 refactoring still makes life difficult for IViewContent implementations. We have concrete plans for fixing this in the 5.0 release, but this didn't make it into Beta 1. All view contents will likely need to be adjusted once this change lands.

     

  • NRefactory 5 article

    I have published a CodeProject article documenting NRefactory 5: Using NRefactory for analyzing C# code

    While NRefactory 5 itself is pretty much stable, there's still a lot of work to do for SharpDevelop 5 to use it -- pretty much everything in SharpDevelop dealing with source code needs to be ported to the new NRefactory 5 API.

    But I think this article shows that NRefactory 5 will open up a lot of new feature possibilities. For example, we could detect problems like the missing StringComparison described in the article directly in the SharpDevelop editor while you're typing, no need to recompile. In fact, the MonoDevelop project has two GSoC students working on such live issue detection and other code actions (small refactorings), and most of their code will be usable in both IDEs.

    So you can look forward to a much improved C# editing experience in SharpDevelop 5 :)

  • Decompiling Async/Await

    Just after the ILSpy 2.0 release, I started adding support for decompiling C# 5 async/await to ILSpy.

    You can get the async-enabled ILSpy build from our build server.

    The async support is not yet complete; for example, decompilation fails if the IL evaluation stack is not empty at the point of the await expression.

    The decompilation logic highly depends on the patterns produced by the C# 5 compiler - it only works with code compiled with the C# compiler in the .NET 4.5 beta release, not with any previous CTPs. Also, it is likely that ILSpy will need adjustments for the final C# 5 compiler.

    While testing, I found that the .NET 4.5 beta BCL was not compiled with the beta compiler - where the beta compiler uses multiple awaiter fields, the BCL code uses a single field of type object and uses arrays of length 1. This is similar to the code generated by the .NET 4.5 developer preview, so my guess is that Microsoft used some internal version in between the developer preview and the beta for compiling the .NET 4.5 beta BCL. For more information, take a look at Jon Skeet's description of the async codegen changes.
    This means that ILSpy cannot decompile async methods in the .NET 4.5 beta BCL. This problem should disappear with the next .NET 4.5 release (.NET 4.5 RC?).

    So how does ILSpy decompile async methods, then? Consider the compiler-generated code of the MoveNext method:

    // Async.$AwaitInLoopCondition$d__17
    void IAsyncStateMachine.MoveNext()
    {
        try
        {
            int num = this.$1__state;
            TaskAwaiter<bool> taskAwaiter;
            if (num == 0)
            {
                taskAwaiter = this.$u__$awaiter18;
                this.$u__$awaiter18 = default(TaskAwaiter<bool>);
                this.$1__state = -1;
                goto IL_7C;
            }
            IL_23:
            taskAwaiter = this.$4__this.SimpleBoolTaskMethod().GetAwaiter();
            if (!taskAwaiter.IsCompleted)
            {
                this.$1__state = 0;
                this.$u__$awaiter18 = taskAwaiter;
                this.$t__builder.AwaitUnsafeOnCompleted<TaskAwaiter<bool>, Async.$AwaitInLoopCondition$d__17>(ref taskAwaiter, ref this);
                return;
            }
            IL_7C:
            bool arg_8B_0 = taskAwaiter.GetResult();
            taskAwaiter = default(TaskAwaiter<bool>);
            if (arg_8B_0)
            {
                Console.WriteLine("Body");
                goto IL_23;
            }
        }
        catch (Exception exception)
        {
            this.$1__state = -2;
            this.$t__builder.SetException(exception);
            return;
        }
        this.$1__state = -2;
        this.$t__builder.SetResult();
    }

    The state machine works similarly to the one used by yield return, so we could reuse a lot of the code from the yield return decompiler.
    Each try block begins with a state dispatcher: depending on the value of this.$1__state, the code jumps to the appropriate location. If the async method involves exception handling, there will be a separate state dispatcher at the beginning of each try block.
    In this case, there are only two states: the initial state (state = -1) and the state at the await expression (state = 0). The state dispatcher consists only of the two statements "int num = this.$1__state; if (num == 0)". We rely on the fact that in the actual IL code, the state dispatcher is a contiguous sequence of IL instructions, in front of any of the method's actual code.

    Note that the async/await decompiler step runs on the ILAst very early in the decompiler pipeline, immediately after the yield return transform, which is prior to any control flow analysis. We're basically still dealing with IL instructions here; but I'm explaining it in terms of C# as that is easier to read (and makes the code much shorter).

    The analysis of the state dispatcher works using symbolic execution; it is described in more detail in the yield return decompiler explanation. In our example, the result of the analysis is that the beginning of the first if statement is reached for state==0, and label IL_23 is reached for all other states.

    With this information, we start cleaning up the control flow of the method. We look for any 'return;' statements and analyze the instructions directly in front:

                this.$1__state = 0;
                this.$u__$awaiter18 = taskAwaiter;
                this.$t__builder.AwaitUnsafeOnCompleted<TaskAwaiter<bool>, Async.$AwaitInLoopCondition$d__17>(ref taskAwaiter, ref this);
                return;

    We then replace this piece of code with an instruction that represents the AwaitUnsafeOnCompleted call (represented as "await ref taskAwaiter;" in the following code), followed by a goto to the label for the target state (using the information gained from the symbolic execution). We also remove the boilerplate associated with the $t__builder and the state dispatcher. For demonstration purposes, I'll skip the remaining steps of the async/await decompiler and resume the pipeline to decompile the ILAst to C#, producing the following code:

    public async void AwaitInLoopCondition()
    {
        while (true)
        {
            TaskAwaiter<bool> taskAwaiter = this.$4__this.SimpleBoolTaskMethod().GetAwaiter();
            if (!taskAwaiter.IsCompleted)
            {
                await ref taskAwaiter;
                taskAwaiter = this.$u__$awaiter18;
                this.$u__$awaiter18 = default(TaskAwaiter<bool>);
                this.$1__state = -1;
            }
            bool arg_8B_0 = taskAwaiter.GetResult();
            taskAwaiter = default(TaskAwaiter<bool>);
            if (!arg_8B_0)
            {
                break;
            }
            Console.WriteLine("Body");
        }
    }

    As you can see, this transformation has simplified the control flow of the method dramatically.

    We now just perform some finishing touches on the method:

    • Access to the state machine fields is replaced with local variable access, e.g. "this.$4__this" becomes "this".
    • We detect the "GetAwaiter() / if (!taskAwaiter.IsCompleted) / GetResult() / clear awaiter" pattern and replace it with a simple await expression

    Mind that all of this isn't done on the C# representation, but in an early stage of the ILAst pipeline. After some simplifications (variable inlining, copy propagation), the resulting ILAst looks like this:

    br(IL_23)
    IL_16:
    call(Console::WriteLine, ldstr("Body"))
    IL_23:
    brtrue(IL_16, await(callvirt(Async::SimpleBoolTaskMethod, ldloc(this))))
    ret()

    Apart from the 'await' opcode, this is exactly the same as the while-loop would look in a non-async method. The remainder of the decompiler pipeline will detect the loop and translate it to the C# code you've seen in the introductory screenshot.

  • MSBuild Multi-Targeting in SharpDevelop

    SharpDevelop has had multi-targeting support for a long time - for example, SharpDevelop 2.0 supported targeting .NET 1.0, 1.1 and 2.0. Our original multi-targeting implementation would not only change the target framework, but also use the matching C# compiler version*.

    When Visual Studio 2008 and MSBuild 3.5 came along and introduced official multi-targeting support, we separated the 'target framework' and 'compiler version' settings. The 'target framework' setting uses the <TargetFrameworkVersion> MSBuild property, which is the official multi-targeting support as in Visual Studio 2008. The 'compiler version' setting determines the MSBuild ToolsVersion, which controls the version of the C# compiler to use - Visual Studio does not have this feature.

    I'll call the latter feature MSBuild Multi-Targeting, as this allows us to pick the MSBuild version to use, and thus enables SharpDevelop to open and edit VS 2005 or 2008 projects without having to upgrade them to the VS 2010 project format.

    Unfortunately, life isn't as simple as that. It turns out that MSBuild 4.0 is unable to compile projects with a ToolsVersion lower than 4.0 if the Windows SDK 7.1 is not installed. To allow users to use SharpDevelop without downloading the Windows SDK, we implemented a simple fix: we use MSBuild 3.5 to compile projects with a ToolsVersion of 2.0 or 3.5. This is why SharpDevelop ships with both "ICSharpCode.SharpDevelop.BuildWorker40.exe" and "ICSharpCode.SharpDevelop.BuildWorker35.exe".
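
    The selection rule described here is simple enough to write down. The following is only a sketch under assumptions: BuildWorkerSelector and SelectWorker are hypothetical names, and sending every ToolsVersion other than 2.0 and 3.5 to the 4.0 worker is an assumption of this sketch rather than something stated above.

    ```csharp
    public static class BuildWorkerSelector
    {
        // Returns the build worker executable for a project's ToolsVersion.
        public static string SelectWorker(string toolsVersion)
        {
            switch (toolsVersion) {
                case "2.0":
                case "3.5":
                    // Older ToolsVersions are compiled with MSBuild 3.5.
                    return "ICSharpCode.SharpDevelop.BuildWorker35.exe";
                default:
                    // Assumption: everything else uses the MSBuild 4.0 worker.
                    return "ICSharpCode.SharpDevelop.BuildWorker40.exe";
            }
        }
    }
    ```

    The two file names are the ones mentioned in the text; how SharpDevelop actually launches and talks to the worker processes is not covered by this sketch.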

    Now what happens if SharpDevelop is run on a machine without .NET 3.5? If the framework specified by the 'ToolsVersion' is missing, SharpDevelop crashed with an MSBuild error when opening the project. There were also crashes when creating/upgrading projects to missing ToolsVersions. Moreover, in the rare scenario where .NET 2.0 and .NET 4.0 are installed, but .NET 3.5 is missing, SharpDevelop was able to open the project but the build worker would crash when trying to compile.

    For this reason, the SharpDevelop 4.0 and 4.1 setups require both .NET 3.5 and .NET 4.0 to be installed. This wasn't an issue when we made that decision - .NET 3.5 is likely to be already installed on most machines. However, Windows 8 will change that - .NET 4.5 is installed by default, but .NET 3.5 is missing. So we added the necessary error handling to SharpDevelop 4.2. The SharpDevelop 4.2 setup no longer requires .NET 3.5 - you'll need it only when targeting .NET 2.0/3.0 or 3.5.

    Another issue is that .NET 4.0 does not ship with the Reference Assemblies - you need to install the Windows SDK to get those. This causes MSBuild to reference the assemblies in the GAC instead, which might be a later version (due to installed service packs or in-place upgrades like .NET 4.5), and also emit massive amounts of warnings (one warning per reference). Moreover, it caused the 'Copy Local' flag to default to true for references to .NET assemblies, causing System.dll etc. to be copied into the output directory.

    At the time, the reference assemblies were only available as part of Visual Studio 2010 - the free Windows SDK 7.1 was released later. So it was a high priority for us to work around this problem. For this reason, SharpDevelop injects a custom MSBuild .targets file into the project being built: SharpDevelop.TargetingPack.targets. This file runs a simple custom MSBuild task that detects references to default .NET assemblies and sets the 'Copy Local' flag to false. (We also inject several other custom .targets files, for example for running FxCop or StyleCop as part of a build.)

    We used Microsoft.Build.Utilities.dll when implementing this custom task. However, that library ships only with .NET 2.0, not with .NET 4.0, so we had to switch to Microsoft.Build.Utilities.v4.0.dll to get the C# 4.0 build working without .NET 2.0. This should not be a problem as the copy local workaround is only included when targeting .NET 4.0 or higher, so we won't try to load it in the 3.5 build worker process.

     

    To summarize, the SharpDevelop 4.2 setup requires:

    • Windows XP SP2 or higher
    • .NET 4.0 Full (.NET 4.5 Full will also work)
    • VC++ 2008 runtime (part of .NET 3.5 so most people have it already)

    In this minimal configuration, you can only compile for .NET 4.0 using MSBuild 4.0.

    Additionally:

    • If .NET 4.5 is installed, the C# 5 compiler will replace the C# 4 compiler; and .NET 4.5 will appear as an additional target framework.
    • If .NET 3.5 SP1 is installed, you will be able to use .NET 2.0/3.0/3.5 as target framework, and C# 2 and C# 3 as compiler versions.
    • Installing the Windows SDK 7.1 is highly recommended (provides reference assemblies and documentation for code completion).
    • Some SharpDevelop features might require installation of additional tools such as FxCop, StyleCop, F#, TortoiseSVN, SHFB.

    * Everything said about the C# compiler in this post also applies to the VB compiler.

  • ILSpy 2.0 Beta 1

    After a long pause, we have finally released the first Beta of ILSpy 2.0.

    Download:

    New features compared with version 1.0:

    • Assembly Lists
    • Support for decompiling Expression trees
    • Support for lifted operators on nullables
    • Integrated Debugger
    • Decompile to Visual Basic
    • Search for multiple strings separated by space (searching for "Assembly manager" in ILSpy.exe would find AssemblyListManager)
    • Clicking on a local variable will highlight all other occurrences of that variable
    • Ctrl+F can be used to search within the decompiled code view

  • SharpDevelop 5 - NRefactory 5 + semantic highlighting

    While we have several more releases of SharpDevelop 4.x planned, we are also working in parallel on a big new release: SharpDevelop 5.0.

    The major change in SharpDevelop 5 will be the complete rewrite of the NRefactory and SharpDevelop.Dom libraries, which together implement code-completion and many refactorings. Work on NRefactory 5 originally started in August 2010, as a new library written from scratch.

    The NRefactory C# parser and abstract syntax tree were rewritten by Mike Krüger. The new version of the AST contains all individual tokens, and has positions on every node - this makes the implementation of refactorings a lot easier.

    The new AST is already in use in ILSpy - when we started working on ILSpy in February, we immediately ported the old decompiler (from David's dissertation) to NRefactory 5. As part of the ILSpy development, I added several new features to NRefactory - for example the pattern matching support for the AST, and the visitor that converts the AST back to C# code.

    I also wrote a new resolver for NRefactory (the resolver previously was part of SD.Dom). The main goals here were to:

    • Improve performance by making the step of resolving type references an explicit method call
    • Follow the C# specification more closely
    • Improve performance in general
    • Add support for projects that target framework X but get used in framework Y projects (ITypeResolveContext).

    I started working on the new resolver in October 2010 and completed it in August 2011 (feature-complete for C# 4.0). All this work was in addition to the work on SD 4.x and ILSpy, not to mention my regular "job" (I'm a student and work on SharpDevelop in my free time).

    Since then I've started integrating NRefactory 5 into SharpDevelop. This still is a huge task, as basically anything that deals with C# code (i.e. class browser, code completion, go to definition, find references, all refactorings, the forms designer loader, etc.) needs to be rewritten or at least adjusted.

    But today I tried something a little different: a new feature - semantic highlighting.

        struct Color {
            public static void StaticMethod();
            public void InstanceMethod();
        }
        
        class Program
        {
            Color color;
            public Color Color {
                get { return color; }
                set { this.color = value; }
            }
            
            public void X(int value)
            {
                Color.StaticMethod();
                Color.InstanceMethod();
                Color.MissingMethod();
            }
        }

    This code (with syntax highlighting) was copied out of an early SharpDevelop 5 alpha version. As you can see, we now highlight references to types in a blue-greenish color - but only when semantics of the code actually refer to the type, as the difference between the static and instance method calls shows.

    Fields are highlighted in italics, which allows telling fields and local variables apart in a large method. Additionally, the dark blue/bold method highlighting is now only applied when the method exists - not to delegate-typed fields, and not to missing methods. And context-dependent keywords like value are highlighted only when used in the appropriate context.

    The highlighting engine has the full power of the NRefactory resolver available, which allows us to easily experiment with new types of highlightings - for example, it would take only three lines of code to, say, highlight extension methods in a different color than normal methods. Or value types differently than reference types.

    We also plan to detect syntax errors and the most common semantic errors (method not found, missing parameter, cannot convert from type X to Y) and highlight those errors while typing, so that you don't have to recompile as often.

    There are tons of potential highlightings that give useful information - but if we use all of them, the text editor will look way too colorful and noisy. What do you think is most important?

     

  • ILSpy - Decompiler Architecture Overview - Part 2

    The decompiler pipeline can be separated into two phases: the first phase works on a tree representation of the IL code - I described the steps of that phase in the previous post.

    The second phase works on C# code, and is the topic of this blog post.

    To give you a reminder: the ILAst is a tree of IL instructions, with pseudo-opcodes inserted for the higher-level language constructs that were detected. Let's take a look at how the example C# code from the previous blog post looks in the final ILAst (after type inference).

    Original C#:

    static IEnumerable<IEnumerable<char>> Test(List<string> list)
    {
        foreach (string current in list) {
            yield return (from c in current where char.IsUpper(c) || char.IsDigit(c) select char.ToLower(c));
        }
        yield return new List<char> { 'E', 'N', 'D' };
    }

    Final ILAst:

    As you can see, ILAst already detected the following language constructs:

    • yield return
    • "while (enumerator.MoveNext())" loop
    • collection initializer

    Still missing are the "foreach" construct, and the contents of the lambdas that were created by the query expression.

    As with the previous blog post, you might want to run a debug build of ILSpy and load the above example into it, so that you can see the full output of the intermediate steps. Only debug builds make the intermediate steps of the decompiler pipeline available in the language dropdown.

    C# Code Generation

    We generate C# code from the ILAst. The C# code is represented using the C# Abstract Syntax Tree from the NRefactory library. This step uses the type information to insert casts where required, so that the semantics of the generated C# code match the semantics of the original IL code.

    However, some semantics, like the overflow checking context (checked or unchecked) or the exact type referenced by a short type name, are not yet translated into C#, but stored as annotations on the C# AST. Apart from those details, the resulting code is valid C#.

    This step is probably the biggest source of bugs, as matching the IL and C# semantics to each other isn't easy. For example, IL explicitly specifies which method is being called; but C# uses overload resolution. Inserting casts so that the resulting C# will call the correct method is not simple - we don't want to insert casts everywhere, as that would make the code hard to read.

    Implementation: AstMethodBodyBuilder.cs

    C# Code Transforms

    All the remaining steps are transformations on the C# AST. NRefactory provides an API similar to System.Xml.Linq for working with the C# source tree, which makes modifications relatively easy. Additionally, the visitor pattern can be used to traverse the AST; and NRefactory implements a powerful pattern-matching construct.

    The decompiler transformations are implemented as classes in the ICSharpCode.Decompiler.Ast.Transforms namespace.

    PushNegation

    This transformation eliminates negations where possible. Some ILAst operations introduce additional negations - for example "brtrue(lbl, condition); ...; lbl:" becomes "if (!condition) { ... }". The transform will remove double negations, and will move remaining negations into other operators. "!(a == b)" becomes "a != b".
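
    To make the rewrite rules concrete, here is a minimal sketch on a toy expression tree. The Expr, Var, Not and Binary classes and the PushNegationSketch helper are hypothetical stand-ins invented for this illustration; the real transform works on NRefactory's C# AST and handles more operators than the two shown here.

```csharp
using System;

// Hypothetical mini expression tree (not NRefactory's AST classes).
abstract class Expr { }
sealed class Var : Expr { public string Name; public Var(string n) { Name = n; } }
sealed class Not : Expr { public Expr Operand; public Not(Expr e) { Operand = e; } }
sealed class Binary : Expr
{
    public string Op; public Expr Left, Right;
    public Binary(string op, Expr l, Expr r) { Op = op; Left = l; Right = r; }
}

static class PushNegationSketch
{
    // Returns an equivalent expression with double negations removed and
    // negated equality tests replaced by the opposite operator.
    public static Expr Simplify(Expr e)
    {
        if (e is Not n) {
            Expr inner = n.Operand;
            if (inner is Not nn) return Simplify(nn.Operand);   // !!x -> x
            if (inner is Binary b) {
                switch (b.Op) {
                    case "==": return new Binary("!=", Simplify(b.Left), Simplify(b.Right));
                    case "!=": return new Binary("==", Simplify(b.Left), Simplify(b.Right));
                }
            }
            return new Not(Simplify(inner));                    // nothing to push into
        }
        if (e is Binary bin) return new Binary(bin.Op, Simplify(bin.Left), Simplify(bin.Right));
        return e;
    }

    public static string Print(Expr e)
    {
        switch (e) {
            case Var v: return v.Name;
            case Not n: return "!" + Print(n.Operand);
            case Binary b: return "(" + Print(b.Left) + " " + b.Op + " " + Print(b.Right) + ")";
            default: throw new ArgumentException("unknown node");
        }
    }
}
```

    With this sketch, simplifying "!(a == b)" prints as "(a != b)", and "!!x" collapses to "x".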

    DelegateConstruction

    So far, delegates were compiled into code such as "new D(obj, ldftn(M))". That is, a delegate takes two arguments: the target object, and the method being called. The target object is null if the method is static. This isn't valid C#, so we transform it into "new D(obj.M)". However, if the method is anonymous, then we decompile the target method (up to the DelegateConstruction step) and put the decompiled method body into an anonymous method. Or, if the method body consists of only a single return statement, we use expression lambda syntax.

    Applied to our example code, we get "(char c) => char.IsUpper(c) || char.IsDigit(c)" and "(char c) => char.ToLower(c)". Now, in this example, the lambdas did not capture any variables, and thus got compiled to static methods. The transform gets more complicated if the lambda does capture variables: In this case, the C# compiler would have created an inner class (called "DisplayClass") to represent the closure. The compiler puts all captured variables as fields into that closure.

    To decompile that correctly, the first part of the DelegateConstruction transform will replace any occurrences of "this" within the lambda body with the target object that was passed to the delegate. This makes the resulting code somewhat correct - but the closure is still visible in the decompiled code. For example, a method implementing curried addition of 3 integers would look like this:

    public static Func<int, Func<int, int>> CurriedAddition(int a)
    {
        DelegateConstruction.c__DisplayClass13 c__DisplayClass;
        c__DisplayClass = new DelegateConstruction.c__DisplayClass13();
        c__DisplayClass.a = a;
        return delegate(int b)
        {
            DelegateConstruction.c__DisplayClass13.c__DisplayClass15 c__DisplayClass2;
            c__DisplayClass2 = new DelegateConstruction.c__DisplayClass13.c__DisplayClass15();
            c__DisplayClass2.CS$8__locals14 = c__DisplayClass;
            c__DisplayClass2.b = b;
            return (int c) => c__DisplayClass2.CS$8__locals14.a + c__DisplayClass2.b + c;
        };
    }

    In a second step, the DelegateConstruction transformation looks for such display classes, and removes them by promoting all their fields to local variables. If one of the fields is simply a copy of the function's parameter, no new local variable is introduced, but the parameter is used instead.

    So after this cleanup step is complete, the curried addition example will look exactly like the original C# code:

    public static Func<int, Func<int, int>> CurriedAddition(int a)
    {
        return (int b) => (int c) => a + b + c;
    }
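
    For readers who have not looked at compiler-generated closures before, the following sketch writes the closure classes by hand and puts them next to the lambda version. The names ClosureSketch, DisplayClassA and DisplayClassB are made up for this illustration, and the layout is simplified; the actual compiler output uses generated names and nests the classes as shown in the decompiled code above.

```csharp
using System;

static class ClosureSketch
{
    // Hand-written stand-in for the outer compiler-generated closure class.
    sealed class DisplayClassA
    {
        public int a;                       // captured parameter 'a'
        public Func<int, int> Outer(int b)
        {
            var inner = new DisplayClassB { parent = this, b = b };
            return inner.Inner;
        }
    }

    // Stand-in for the nested closure; 'parent' plays the role of the CS$8__locals field.
    sealed class DisplayClassB
    {
        public DisplayClassA parent;
        public int b;                       // captured parameter 'b'
        public int Inner(int c) { return parent.a + b + c; }
    }

    // Roughly the shape the decompiler sees before display classes are removed.
    public static Func<int, Func<int, int>> CurriedAdditionLowered(int a)
    {
        var closure = new DisplayClassA { a = a };
        return closure.Outer;
    }

    // The shape it should produce after promoting the fields back to variables.
    public static Func<int, Func<int, int>> CurriedAddition(int a)
    {
        return b => c => a + b + c;
    }
}
```

    Both methods return delegates that behave identically, which is what makes removing the display classes a safe cleanup.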

    PatternStatementTransform

    This step does pattern-matching. It defines code patterns for the following language constructs:

    • using
    • foreach (both on generic and on non-generic collections)
    • for
    • do..while
    • lock
    • switch with string
    • Automatic Properties
    • Automatic Events (normal events without explicit add/remove accessor)
    • Destructors
    • try {} catch {} finally

    The expanded code pattern is searched for using NRefactory's pattern matching. When it is found, some additional checks are performed to see whether the transformation is valid (e.g. a 'foreach' loop variable must not be assigned to). If it is valid, the matched code pattern is replaced with the detected construct.

    Since the ILAst has only one loop construct, all generated C# code initially uses only while-loops. But if a loop looks like this: "while (true) { ... if (condition) break; }", then we can change it into "do { } while (!condition);". Using NRefactory's pattern matching, the pattern we look for can be defined easily:

    static readonly WhileStatement doWhilePattern = new WhileStatement {
        Condition = new PrimitiveExpression(true),
        EmbeddedStatement = new BlockStatement {
            Statements = {
                new Repeat(new AnyNode("statement")),
                new IfElseStatement {
                    Condition = new AnyNode("condition"),
                    TrueStatement = new BlockStatement { new BreakStatement() }
                }
            }}};

    Pattern matching reuses some ideas from regular expressions: the "Repeat" node will match any number of nodes (like the star operator in regular expressions), and the strings passed to the "AnyNode" constructor create capture groups with those names. For a successful match, the "statement" group will contain all statements except for the final "if (condition) break;", and the "condition" group will contain the loop condition.
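
    The regular-expression analogy can be sketched with a toy matcher over a flat list of statement strings. Everything here (Pat, Exact, Any, RepeatAny, MatchSketch) is hypothetical and much simpler than NRefactory's pattern classes, which match whole syntax trees; the sketch only shows how a greedy repeat with backtracking fills named capture groups.

```csharp
using System;
using System.Collections.Generic;

// Toy pattern elements (hypothetical, not NRefactory types).
abstract class Pat { }
sealed class Exact : Pat { public string Text; public Exact(string t) { Text = t; } }
sealed class Any : Pat { public string Group; public Any(string g) { Group = g; } }
sealed class RepeatAny : Pat { public string Group; public RepeatAny(string g) { Group = g; } }

static class MatchSketch
{
    // Matches all of 'input' against 'pattern'. Returns the capture groups,
    // or null when the pattern does not match.
    public static Dictionary<string, List<string>> Match(IList<Pat> pattern, IList<string> input)
    {
        var groups = new Dictionary<string, List<string>>();
        return MatchFrom(pattern, 0, input, 0, groups) ? groups : null;
    }

    static bool MatchFrom(IList<Pat> pattern, int p, IList<string> input, int i,
                          Dictionary<string, List<string>> groups)
    {
        if (p == pattern.Count) return i == input.Count;
        switch (pattern[p]) {
            case Exact ex:
                return i < input.Count && input[i] == ex.Text
                    && MatchFrom(pattern, p + 1, input, i + 1, groups);
            case Any any:
                if (i >= input.Count) return false;
                Capture(groups, any.Group, input[i]);
                if (MatchFrom(pattern, p + 1, input, i + 1, groups)) return true;
                Uncapture(groups, any.Group);
                return false;
            case RepeatAny rep:
                // Greedy: capture as many statements as possible, then give them
                // back one at a time until the rest of the pattern matches.
                int taken = 0;
                while (i + taken < input.Count) {
                    Capture(groups, rep.Group, input[i + taken]);
                    taken++;
                }
                while (true) {
                    if (MatchFrom(pattern, p + 1, input, i + taken, groups)) return true;
                    if (taken == 0) return false;
                    Uncapture(groups, rep.Group);
                    taken--;
                }
            default:
                throw new ArgumentException("unknown pattern element");
        }
    }

    static void Capture(Dictionary<string, List<string>> groups, string name, string value)
    {
        if (!groups.TryGetValue(name, out var list)) groups[name] = list = new List<string>();
        list.Add(value);
    }

    static void Uncapture(Dictionary<string, List<string>> groups, string name)
    {
        var list = groups[name];
        list.RemoveAt(list.Count - 1);
    }
}
```

    Matching the pattern "RepeatAny("statement"), Any("exit")" against three statements leaves the first two in the "statement" group and the last one in "exit", mirroring how the do-while pattern separates the loop body from the trailing exit test.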

    ReplaceMethodCallsWithOperators

    This step eliminates invocations of user-defined operators ("string.op_Equality(a,b)") and replaces them with the operator itself ("a == b").

    It also simplifies statements of the form "localVar = localVar + 1;" to use the post-increment operator. Note that this transformation is only valid for statements -- within expressions, we would be required to use pre-increment.

    IntroduceUnsafeModifier

    This transformation scans the method body for operations that are valid only in an unsafe context. If any are found, the method is marked with the unsafe modifier.

    The step also contains some code readability improvements for unsafe code: "*(ptr + num)" gets transformed to "ptr[num]", and "(*ptr).Member" gets transformed to "ptr->Member".

    AddCheckedBlocks

    In IL, there are different opcodes for instructions with and without overflow checking (e.g. "add" vs. "add.ovf"). However, in C# the overflow checking cannot be specified on single operators, but only on whole expressions. For example, "a = checked(b + c)" will also evaluate the sub-expressions b and c with overflow checking. If those contain any IL opcodes that didn't use overflow checking, then the C# code must use unchecked expressions within the checked expression.

    Code can quickly get unreadable if you do this around every instruction, so we looked for a way to place the blocks intelligently. We formulated the problem as an optimization problem, with the following goals:

    1. Use the minimum number of checked blocks and expressions
    2. Prefer checked expressions over checked blocks
    3. Make the scope of checked expressions as small as possible
    4. Make the scope of checked blocks as large as possible

    We use dynamic programming to calculate the optimal solution in linear time. Essentially, the algorithm calculates two solutions for each node: both have optimal cost, but one expects the parent context to be checked, the other expects it to be unchecked. This allows composing the whole solution (global optimum) from the partial solutions (optimal solutions for each node in the two contexts).
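
    The two-solutions-per-node idea can be sketched as follows. This is a deliberately simplified model, not ILSpy's implementation: the Node class and CheckedSketch are hypothetical, only expression wrappers are considered (no blocks), and the cost is just the number of checked(...)/unchecked(...) wrappers rather than the four-part goal listed above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical expression node: NeedsChecked is true for an overflow-checking
// operator, false for a non-checking operator, null if the node does not care.
sealed class Node
{
    public bool? NeedsChecked;
    public List<Node> Children = new List<Node>();
    public Node(bool? needsChecked, params Node[] children)
    {
        NeedsChecked = needsChecked;
        Children.AddRange(children);
    }
}

static class CheckedSketch
{
    const int Infinity = 1000000;   // marks a context this node cannot be evaluated in

    // Returns the minimal wrapper count for the subtree, once assuming the
    // parent context is unchecked and once assuming it is checked.
    public static (int inUnchecked, int inChecked) Solve(Node node)
    {
        int sumUnchecked = 0, sumChecked = 0;
        foreach (Node child in node.Children) {
            var c = Solve(child);
            sumUnchecked += c.inUnchecked;
            sumChecked += c.inChecked;
        }
        // Cost when this node is evaluated directly in the given context.
        int plainUnchecked = node.NeedsChecked == true ? Infinity : sumUnchecked;
        int plainChecked = node.NeedsChecked == false ? Infinity : sumChecked;
        // Alternatively, wrap this node to switch the context (one extra wrapper).
        return (Math.Min(plainUnchecked, 1 + plainChecked),
                Math.Min(plainChecked, 1 + plainUnchecked));
    }
}
```

    Each node is visited once and combines its children's two results in constant time per child, which is where the linear running time comes from. For a checked addition whose left operand is an unchecked addition, the sketch reports 2 wrappers in an unchecked parent context and 1 in a checked one.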

    DeclareVariables

    So far, all variables were declared at the start of the method. This step aims to make the code more readable by moving the variable declarations so that they have the smallest possible scope.

    This step will introduce multiple declarations for the same variable whenever this is allowable. This might happen if two loops use the same variable, but the value assigned to the variable by the first loop will never be read by the second loop.

    Basically, we split up a variable whenever this is possible without triggering the C# compiler error "Use of unassigned local variable" - if the second code block ensures it always initializes the variable before reading it, it cannot possibly read the value assigned by the first code block. For this purpose, I implemented C# definite assignment analysis, which is surprisingly complex - the specification is 10 pages long, and makes heavy use of the reachability rules, which take another 10 pages in the C# specification.

    ConvertConstructorCallIntoInitializer

    This step is all about constructors. First, we look at all constructors in the current class. If they all start with the same instruction, and that instruction is assigning to an instance field in the current class, then we convert that statement into a field initializer.

    After that, all constructors should start with a call to the constructor of the base class. We take that call, and change it into an initializer (" : base()" syntax).

    IntroduceUsingDeclarations

    When initially creating C# code from the ILAst, ILSpy always uses short type names (without namespace name). However, it annotates the type references, so that the referenced type is still known.

    This step looks at the annotations and introduces the appropriate using declarations. Then, the step looks at all referenced assemblies and checks which types were imported by the using declarations. If several types with the same name were imported, that name is marked as ambiguous.

    Now, the transformation again looks at all type references, and fully qualifies those that are ambiguous.

    Note that this transformation step is disabled when you use ILSpy to look at a single method. It is used only when decompiling a whole class, or when decompiling a whole assembly.

    IntroduceExtensionMethods

    This step will replace calls to extension methods with the infix syntax. "Enumerable.Select(a, b)" becomes "a.Select(b)".
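
    As a quick illustration of why this rewrite is purely cosmetic, the two methods below are the same calls written in both forms. ExtensionCallSketch and its method names are invented for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class ExtensionCallSketch
{
    // Static-call form, as the calls appear in IL.
    public static List<int> StaticForm(IEnumerable<int> a)
    {
        return Enumerable.ToList(Enumerable.Select(a, x => x * 2));
    }

    // Extension-method form, as the decompiler prints them after this step.
    public static List<int> ExtensionForm(IEnumerable<int> a)
    {
        return a.Select(x => x * 2).ToList();
    }
}
```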

    Now, let me show you the decompiled running example after this step:

    private static IEnumerable<IEnumerable<char>> Test(List<string> list)
    {
        foreach (string current in list)
        {
            yield return current.Where((char c) => char.IsUpper(c) || char.IsDigit(c)).Select((char c) => char.ToLower(c));
        }
        yield return new List<char> { 'E',  'N',  'D' };
    }

    IntroduceQueryExpressions

    This step takes a look at method calls, and tries to find patterns that look like the output of C# query expressions. Basically, we apply the same steps as the C# compiler when it translates query expressions into method calls, but in reverse.

    This results in the following decompiled code:

            yield return 
                from c in current
                where char.IsUpper(c) || char.IsDigit(c)
                select char.ToLower(c);
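
    The reverse translation is safe because the two notations are defined to mean the same thing. A small sketch, with QuerySketch and its method names being assumptions made for this example, shows the running example's query in both forms:

```csharp
using System;
using System.Linq;

static class QuerySketch
{
    // Method-call form, which is what the compiler emits.
    public static string MethodForm(string current)
    {
        return new string(current
            .Where(c => char.IsUpper(c) || char.IsDigit(c))
            .Select(c => char.ToLower(c))
            .ToArray());
    }

    // Query-expression form, which is what the decompiler reconstructs.
    public static string QueryForm(string current)
    {
        return new string((from c in current
                           where char.IsUpper(c) || char.IsDigit(c)
                           select char.ToLower(c)).ToArray());
    }
}
```

    For the input "Hello W0rld", both methods return "hw0".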

    The IntroduceQueryExpressions step does a mostly literal translation of method calls to query clauses. However, the C# language defines some query expressions in terms of other query expressions. Examples are "let" clauses and query continuations with "into". Let clauses are especially tricky, since they cause the C# compiler to generate so-called transparent identifiers (see the C# specification for details). Such a query might look like this:

    from <>h__TransparentIdentifier2b in
        from o in orders
        select new
        {
            o = o, 
            t = o.Details.Sum((QueryExpressions.OrderDetail d) => d.UnitPrice * d.Quantity)
        }
    where <>h__TransparentIdentifier2b.t > 1000m
    select new
    {
        OrderID = <>h__TransparentIdentifier2b.o.OrderID, 
        Total = <>h__TransparentIdentifier2b.t
    };

     

    CombineQueryExpressions

    This step combines LINQ queries to simplify them (e.g. introduces query continuations), and gets rid of transparent identifiers by re-introducing the original 'let' clause. After combining, the above query becomes the following easy-to-understand query:

    from o in orders
    let t = o.Details.Sum((QueryExpressions.OrderDetail d) => d.UnitPrice * d.Quantity)
    where t > 1000m
    select new
    {
        OrderID = o.OrderID, 
        Total = t
    };
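
    To see where the transparent identifier comes from, here is a reduced version of the query written both with 'let' and in the lowered form the compiler generates for it. The Order class, its LineTotals field and the LetSketch helper are hypothetical; they only stand in for the order data used in the example above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical data class for this illustration.
sealed class Order
{
    public int OrderID;
    public List<decimal> LineTotals = new List<decimal>();
}

static class LetSketch
{
    // Query with a 'let' clause, as a programmer would write it.
    public static List<int> WithLet(IEnumerable<Order> orders)
    {
        return (from o in orders
                let t = o.LineTotals.Sum()
                where t > 1000m
                select o.OrderID).ToList();
    }

    // Roughly the lowered form: the pair (o, t) travels through the query in an
    // anonymous object, which is the role the transparent identifier plays.
    public static List<int> Lowered(IEnumerable<Order> orders)
    {
        return orders
            .Select(o => new { o, t = o.LineTotals.Sum() })
            .Where(x => x.t > 1000m)
            .Select(x => x.o.OrderID)
            .ToList();
    }
}
```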

    This concludes the transformations done by the decompiler.

    There's only one tiny detail left: we run NRefactory's InsertParenthesesVisitor, which introduces both required parentheses, and some additional parentheses to make the code more readable. The parenthesis-inserting step will run even if you use the language drop down to stop the decompilation at a previous step.

    The very last step, of course, is the OutputVisitor, which generates text from the C# AST.

  • ILSpy - Decompiler Architecture Overview

    When ILSpy was only two weeks old, I blogged about the decompiler architecture. The basic idea of the decompiler pipeline (IL -> ILAst -> C#) is still valid, but there were several changes in the details, and tons of additions as ILSpy learned about more features in the C# language.

    The pipeline has grown a lot - there are now 47 separate steps, while in the middle of February (when the previous architecture post was written), there were only 14.

    If you want to follow this post, grab the source code of ILSpy and create a debug build, so that you can take a look at the intermediate steps while I am discussing them. Only debug builds will show all the intermediate steps in the language dropdown.

    It's impossible to give a short sample where every intermediate step does something (the sample would have to use every possible C# feature), but the following sample should show what is going on in the most important steps:

    static IEnumerable<IEnumerable<char>> Test(List<string> list)
    {
        foreach (string current in list) {
            yield return (from c in current where char.IsUpper(c) || char.IsDigit(c) select char.ToLower(c));
        }
        yield return new List<char> { 'E', 'N', 'D' };
    }

    Take this code, compile it, and then decompile it with a debug build of ILSpy, so that you can take a look at the results of the intermediate steps.

    Essentially, the decompiler pipeline can be separated into two phases: the first phase works on a tree representation of the IL code - we call this representation the ILAst. The second phase works on C# code, stored in the C# Abstract Syntax Tree provided by the NRefactory library.

    ILSpy uses the Mono.Cecil library for reading assembly files. Cecil parses the IL code into a flat list of IL instructions, and also takes care of reading all the metadata. Thus, the decompiler's input is Cecil's object model, giving it approximately the same information as you see when you select 'IL' language in the dropdown.

    ILAst

    We construct the intermediate representation ILAst. Basically, every IL instruction becomes one ILAst instruction. The main difference is that ILAst does not use an implicit evaluation stack, but creates temporary variables for every write to a stack location. However, the ILAst also supports additional opcodes (called pseudo-opcodes) which are used by various decompiler steps to represent higher-level constructs.

    Another difference is that we create a tree structure for try-finally blocks - Cecil just provides us with the exception handler table from the metadata.

    Implementation: ILAstBuilder.cs

    Variable Splitting

    Using data flow analysis, we split up variables where possible. So if you had "x = 1; x = add(x, 1);", that will become "x_1 = 1; x_2 = add(x_1, 1)". We do not use SSA form for this (although there's an unused SSA implementation left over in the codebase), we only split variables up when this is possible without having to introduce phi-functions. The goal of this operation is to make compiler-generated variables eligible for inlining.

    Implementation: ILAstBuilder.cs
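
    For straight-line code, the effect of variable splitting can be sketched in a few lines. This is only a toy under the assumption that there are no branches at all, so every write may start a new variable; SplitSketch and its tuple-based statement format are invented for this example, and the real step uses data flow analysis to decide where splitting is possible.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class SplitSketch
{
    // Each statement is (target, op, args) and means "target = op(args)";
    // with no args it means "target = op". The k-th write to "x" becomes "x_k",
    // and each read refers to the most recent write. Arguments that were never
    // written (constants such as "1") are left untouched.
    public static List<string> Split(IEnumerable<(string target, string op, string[] args)> code)
    {
        var version = new Dictionary<string, int>();
        var output = new List<string>();
        foreach (var (target, op, args) in code) {
            var renamed = args
                .Select(a => version.TryGetValue(a, out int v) ? a + "_" + v : a)
                .ToArray();
            version.TryGetValue(target, out int previous);   // 0 if never written
            version[target] = previous + 1;
            string rhs = renamed.Length == 0 ? op : op + "(" + string.Join(", ", renamed) + ")";
            output.Add(target + "_" + version[target] + " = " + rhs);
        }
        return output;
    }
}
```

    Feeding it "x = 1; x = add(x, 1)" yields "x_1 = 1" and "x_2 = add(x_1, 1)", matching the example above.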

    ILAst Optimizations

    • Dead code removal. We remove unreachable code, because it's impossible to infer any information about the stack usage of unreachable code. Also, obfuscators tend to put invalid IL into unreachable code sections. This actually already happens as part of the ILAst construction, before variable splitting.
    • Remove redundant code
      • Delete 'nop' instructions
      • Delete 'br' instructions that jump directly to the next instruction
      • Delete 'dup' instructions - since ILAst works with variables for stack locations, we can just read a variable twice, eliminating the 'dup'.
    • Simplify instruction set for branch instructions
      • Replaces all conditional branches with 'brtrue'. This works by replacing the 'b*' instructions (branch instructions) with 'brtrue(c*)' (branch if compare instruction returns true). This step makes use of the 'LogicNot' pseudo-opcode.
        The goal simply is to reduce the number of different cases that the following steps have to handle.
    • Copy propagation. This is a classical compiler optimization; however, ILSpy uses it only for two specific cases:
      • Any address-loading instruction is copied to its point of use. This ensures that no decompiler-generated variable has a managed reference as type - "ref int v = someVariable;" wouldn't be valid C# code, so we have to instead use "ref someVariable" in the place where "v" is used.
      • Copies of parameters of the current function are propagated, as long as the parameter is never written to. This mainly exists in order to propagate the "this" parameter, so that the following patterns can detect it more easily.
    • Dead store removal. If a variable is stored and nobody is there to read it, then was it really written?
      Originally we removed all such dead stores; but after some users complained about 'missing code', we restricted this optimization to apply only to stack locations. Dead stores to stack locations occur mainly after the removal of 'pop' instructions.
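
    The branch simplification from the list above can be sketched as a simple lookup. The mapping table is an illustration for integer operands only and is my assumption about a reasonable encoding, not a copy of ILSpy's table; BranchSketch and the lower-case "logicnot" spelling are made up for this example.

```csharp
using System;
using System.Collections.Generic;

static class BranchSketch
{
    // opcode -> (compare instruction, whether the result must be negated).
    // Illustrative mapping for integer operands.
    static readonly Dictionary<string, (string compare, bool negate)> Map =
        new Dictionary<string, (string compare, bool negate)> {
            { "beq",    ("ceq", false) },
            { "bne.un", ("ceq", true)  },
            { "blt",    ("clt", false) },
            { "bgt",    ("cgt", false) },
            { "bge",    ("clt", true)  },
            { "ble",    ("cgt", true)  },
        };

    // Rewrites "opcode(target, left, right)" into the single 'brtrue' form.
    public static string Simplify(string opcode, string target, string left, string right)
    {
        var (compare, negate) = Map[opcode];
        string condition = compare + "(" + left + ", " + right + ")";
        if (negate) condition = "logicnot(" + condition + ")";
        return "brtrue(" + target + ", " + condition + ")";
    }
}
```

    Later steps then only need to understand 'brtrue' plus a handful of compare instructions instead of every conditional branch opcode.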

    The optimizations are primarily meant to even out the differences between debug and release builds, by optimizing away the stuff that the C# compiler adds to debug builds.

    Implementation: ILAstOptimizer.cs

    Inlining

    We perform 'inlining' on the ILAst. That is, if instruction N stores a variable, and instruction N+1 reads it, and there's no other place using that variable, then we move the definition of the variable into the next expression.

    So "stack0 = local1; stack1 = ldc.i4(1); stack2 = add(stack0, stack1); local1 = stack2" will become "local1 = add(local1, ldc.i4(1))". Inlining is the main operation that produces trees from the flat IL.
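
    A minimal sketch of this rule, assuming a toy statement list rather than the real ILAst: Expr, Stmt and InliningSketch are hypothetical, temporaries are recognized simply by the "stack" name prefix, and the sketch ignores the evaluation-order checks a real implementation needs before moving an expression.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical expression node: a leaf (no Args) is a variable read or a constant.
sealed class Expr
{
    public string Name;
    public List<Expr> Args;
    public Expr(string name, params Expr[] args) { Name = name; Args = args.ToList(); }
    public override string ToString()
    {
        return Args.Count == 0 ? Name : Name + "(" + string.Join(", ", Args) + ")";
    }
}

sealed class Stmt
{
    public string Target; public Expr Value;
    public Stmt(string target, Expr value) { Target = target; Value = value; }
    public override string ToString() { return Target + " = " + Value; }
}

static class InliningSketch
{
    static int CountReads(Expr e, string v)
    {
        return (e.Args.Count == 0 && e.Name == v ? 1 : 0) + e.Args.Sum(a => CountReads(a, v));
    }

    static Expr Substitute(Expr e, string v, Expr replacement)
    {
        if (e.Args.Count == 0) return e.Name == v ? replacement : e;
        return new Expr(e.Name, e.Args.Select(a => Substitute(a, v, replacement)).ToArray());
    }

    // Moves the definition of a temporary into the next statement when that
    // statement is the only reader and the temporary is written exactly once.
    public static List<Stmt> Inline(IEnumerable<Stmt> input)
    {
        var stmts = input.ToList();
        int i = 0;
        while (i + 1 < stmts.Count) {
            string v = stmts[i].Target;
            bool isTemp = v.StartsWith("stack", StringComparison.Ordinal);
            int totalReads = stmts.Sum(s => CountReads(s.Value, v));
            int writes = stmts.Count(s => s.Target == v);
            if (isTemp && writes == 1 && totalReads == 1 && CountReads(stmts[i + 1].Value, v) == 1) {
                stmts[i + 1] = new Stmt(stmts[i + 1].Target,
                                        Substitute(stmts[i + 1].Value, v, stmts[i].Value));
                stmts.RemoveAt(i);
                i = Math.Max(i - 1, 0);   // the previous statement may now be inlinable
            } else {
                i++;
            }
        }
        return stmts;
    }
}
```

    Applied to the four statements from the example, the sketch leaves the single statement "local1 = add(local1, ldc.i4(1))".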

    Implementation: ILInlining.cs

    Yield Return

    If the method is an iterator (constructs a [CompilerGenerated] type that implements IEnumerator), then we perform the yield-return-transformation.

    Implementation: YieldReturnDecompiler.cs

    Analysis of higher-level constructs

    After inlining, we tend to have a single C# statement in a single ILAst statement. However, some C# expressions compile to a sequence of statements. We now try to detect those constructs, and replace the statement sequence with a single statement using a pseudo-opcode.

    We can detect and replace a construct only if it's represented by consecutive statements, so when one construct is nested in another, we first have to process the nested construct before processing the outer construct. Because constructs can be nested arbitrarily, we run all the analyses in a "do { ... } while(modified);" loop. If you select "ILAst (after step X)" in the language dropdown, decompilation will stop after that step in the first loop iteration.

    • SimplifyShortCircuit: introduces && and || operators.
    • SimplifyTernaryOperator: introduces ?: operator
    • SimplifyNullCoalescing: introduces ?? operator
    • JoinBasicBlocks: The decompiler tries to use the minimal possible number of basic blocks. Some optimizations might remove branches and therefore it is necessary to check whether two consecutive basic blocks can be joined into one after such optimizations. It is important to do this because other optimizations like inlining might not work if the code is split into two basic blocks.
    • TransformDecimalCtorToConstant: changes invocations of the "new decimal(int lo, int mid, int hi, bool isNegative, byte scale)" constructor into literals.
    • SimplifyLdObjAndStObj: replaces "ldobj(ldloca(X))" with "ldloc(X)", and similar for other kinds of address-loading instructions.
    • TransformArrayInitializers: introduces array initializers
    • TransformObjectInitializers: introduces object and collection initializers
    • MakeAssignmentExpression: detects when the result of an assignment is used in another expression, and inlines the stloc-instruction accordingly. This is essential for decompiling loops like "while ((line = r.ReadLine()) != null)", as otherwise the loop condition couldn't be represented as a single expression.
      This step also introduces the 'CompoundAssignment' opcode for C# code like "this.M().Property *= 10;". Only because this step de-duplicates the expression on the left-hand side of the assignment, the "this.M()" method call can be inlined into it.
    • IntroducePostIncrement: While pre-increments are handled as a special case of compound assignments, post-increment expressions need to be handled separately.
    • InlineVariables2: this performs inlining again, since the steps in the loop might have opened up additional inlining possibilities. The next loop iteration depends on the fact that variables are inlined where possible.

    Implementation: ILAstOptimizer.cs, PeepholeTransform.cs, InitializerPeepholeTransform.cs

    To get more of an idea of what is going on, consider the collection initializer "new List<char> { 'E', 'N', 'D' }". In the ILAst, this is represented as 5 separate instructions:

    stloc(g__initLocal0, newobj(List`1<char>::.ctor))
    callvirt(List`1<char>::Add, ldloc(g__initLocal0), ldc.i4(69))
    callvirt(List`1<char>::Add, ldloc(g__initLocal0), ldc.i4(78))
    callvirt(List`1<char>::Add, ldloc(g__initLocal0), ldc.i4(68))
    yieldreturn(ldloc(g__initLocal0))

    The collection initializer transformation will change this into:

    stloc(g__initLocal0, initcollection(newobj(List`1<char>::.ctor), callvirt(List`1<char>::Add, initializedobject(), ldc.i4(69)), callvirt(List`1<char>::Add, initializedobject(), ldc.i4(78)), callvirt(List`1<char>::Add, initializedobject(), ldc.i4(68))))
    yieldreturn(ldloc(g__initLocal0))

    Now after this transformation, the value g__initLocal0 is written to exactly once, and read from exactly once. This allows us to inline the 'initcollection' expression into the 'yieldreturn' statement, thus combining all of the 5 original statements into a single one.
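
    In C# terms, the five ILAst instructions and the collection initializer correspond to the two methods below. InitializerSketch and the local name initLocal are made up for this illustration; the integer constants are the character codes from the ILAst listing.

```csharp
using System;
using System.Collections.Generic;

static class InitializerSketch
{
    // The form the programmer wrote and the decompiler should recover.
    public static List<char> WithInitializer()
    {
        return new List<char> { 'E', 'N', 'D' };
    }

    // The expanded form: one constructor call followed by three Add calls
    // on a compiler-introduced temporary.
    public static List<char> Expanded()
    {
        List<char> initLocal = new List<char>();
        initLocal.Add((char)69);   // 'E'
        initLocal.Add((char)78);   // 'N'
        initLocal.Add((char)68);   // 'D'
        return initLocal;
    }
}
```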

    Loop Detection and Condition Detection

    Using control flow analysis (finding dominators and dominance frontiers), we detect loops in the control flow graph. A heuristic on a control flow graph is used to find the most likely loop body.

    We also build 'if' statements from the remaining conditional branch instructions.

    Implementation: LoopsAndConditions.cs

    Goto Removal

    Goto statements are removed when they are made redundant by the control flow structures built up in the previous step. Remaining goto statements are converted into 'break;' or 'continue;' statements where possible.

    Implementation: GotoRemoval.cs

    Reduce If Nesting

    We try to re-arrange the if statements to reduce the nesting level. For example, if the end of the then-block is unreachable (e.g. because the then-block ends with 'return;'), we can move the else block below the if statement.
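
    A small before/after pair shows the intended effect. IfNestingSketch and its method names are assumptions made for this example; both methods behave identically.

```csharp
using System;

static class IfNestingSketch
{
    // Shape straight out of condition detection: every branch is nested.
    public static string Nested(int x)
    {
        if (x < 0) {
            return "negative";
        } else {
            if (x == 0) {
                return "zero";
            } else {
                return "positive";
            }
        }
    }

    // After the step: each then-block ends in 'return', so its else block
    // can be moved below the if statement.
    public static string Flat(int x)
    {
        if (x < 0) return "negative";
        if (x == 0) return "zero";
        return "positive";
    }
}
```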

    Remove Delegate Initialization

    The C# compiler will use static fields (and in some cases also local variables) to cache the delegate instances associated with lambda expressions. This step will remove such caching, which opens up additional inlining opportunities. In fact, we will have to move this step into the big 'while(modified)' loop so that we can correctly handle lambda expressions within object/collection initializers.

    Introduce Fixed Statements

    .NET implements fixed statements as special 'pinned' local variables. As there isn't any representation for those in C#, we translate them into 'fixed' statements.

    Variable Recombination

    Split-up variables were useful for inlining and some other analyses, but now we don't need them any more. This step simply recombines the variables that we split up earlier.

    Type Analysis

    Here, finally, comes the semantic analysis. All previous steps just transformed the IL code. Some were introducing some higher-level constructs, but those were defined as pseudo-IL-opcodes, which pretty much just are shorthands for certain IL sequences. Semantic analysis now figures out whether "ldc.i4(1)" means "1" or "true" or "StringComparison.CurrentCultureIgnoreCase".

    This is formulated as a type inference problem: we determine the expected type and the actual type for each expression in the ILAst. In case some decompiler-generated variables (for the stack locations) weren't removed by the ILAst transformations, we also need to infer types for those.
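
    The "ldc.i4(1)" ambiguity is easy to demonstrate: the same int32 constant has to be printed differently depending on the type the surrounding expression expects. The TypeAnalysisSketch helper below is hypothetical and only covers the three cases mentioned above; the actual type inference is far more involved.

```csharp
using System;

static class TypeAnalysisSketch
{
    // Renders a raw int32 constant the way it should appear in C#
    // once the expected type of the expression is known.
    public static string Render(int constant, Type expectedType)
    {
        if (expectedType == typeof(bool))
            return constant != 0 ? "true" : "false";
        if (expectedType.IsEnum)
            return expectedType.Name + "." + Enum.GetName(expectedType, constant);
        return constant.ToString();
    }
}
```

    Render(1, typeof(int)) gives "1", Render(1, typeof(bool)) gives "true", and Render(1, typeof(StringComparison)) gives "StringComparison.CurrentCultureIgnoreCase".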

    Implementation: TypeAnalysis.cs

    This concludes our discussion of the first phase of the decompiler pipeline. In the next post, I will describe the translation to C# and the remaining transformations.

  • ILSpy - Query Expressions

    ILSpy supports LINQ query expressions - we added that feature shortly before the M2 release.

    Today, I implemented support for decompiling object initializers and fixed some bugs related to deeply nested lambdas. With these two improvements, query expression translation becomes possible in several more cases.

    This screenshot shows Luke Hoban's famous LINQ ray-tracer.

    Why are queries related to object initializers? Simple: LINQ queries allow only the use of expressions. When an object initializer is decompiled into multiple statements, there's no way to fit those into a "let" or "select" clause, so query expression translation has to abort.

    Another issue with this sample was the deep nesting of the compiler-generated lambdas. Once closures are nested more than two levels deep, the C# compiler starts copying the parent-pointer from one closure into its subclosure ("localsZ.localsY = localsX.localsY;"). This case was missing from the lambda decompilation, so some references to the closure classes were left in the decompiled code. This bug has now been fixed, so nested lambdas should decompile correctly.

    We're now close to supporting all features in C# 3.0, the only major missing item is expression tree support. So LINQ queries currently decompile into query syntax only if they're compiled into delegates (LINQ-to-Objects, Parallel LINQ), not if they're compiled into expression trees (LINQ-to-SQL etc.).

  • ILSpy - yield return

    This weekend, I worked on decompiling 'yield return' statements. The C# compiler performs quite a bit of magic to make 'yield return' work, and the decompiler must be aware of all this magic and be able to revert it.

    After two days of hard work, I'm happy to announce that ILSpy (starting with 1.0.0.528) can now decompile enumerators.

    Grab the new ILSpy build while it's hot, or just look at the obligatory screenshot:

    If you want to understand the code generated by the compiler, you can disable this new feature in the new 'View > Options' dialog. Or you could read Jon Skeet's great article on this topic: Iterator block implementation details: auto-generated state machines.

    Here's the generated MoveNext() code for the SelectMany implementation:

        private bool MoveNext()
        {
            bool flag;
            try {
                int i = this.$1__state;
                if (i == 0) {
                    this.$1__state = -1;
                    this.$7__wrap17 = this.source.GetEnumerator();
                    this.$1__state = 1;
                    goto IL_B0;
                }
                if (i != 3) {
                    goto IL_C6;
                }
                this.$1__state = 2;
                IL_9D:
                if (this.$7__wrap19.MoveNext()) {
                    this.<subElement>5__16 = this.$7__wrap19.Current;
                    this.$2__current = this.<subElement>5__16;
                    this.$1__state = 3;
                    flag = true;
                    return flag;
                }
                this.$m__Finally1a();
                IL_B0:
                if (this.$7__wrap17.MoveNext()) {
                    this.<element>5__15 = this.$7__wrap17.Current;
                    this.$7__wrap19 = this.selector.Invoke(this.<element>5__15).GetEnumerator();
                    this.$1__state = 2;
                    goto IL_9D;
                }
                this.$m__Finally18();
                IL_C6:
                flag = false;
            } catch { // in IL, this is a try-fault block, but C# doesn't have those...
                this.Dispose();
                throw;
            }
            return flag;
        }

    Now how can one map the generated code back to the original C#? The general idea is simple (the devil is in the details...):

    Every time the code assigns to this.current, this.state and then returns, we transform that into a "yield return" instruction and a "goto" instruction to the label belonging to the new state. Because we run this transformation very early in the decompiler's pipeline (prior to any control flow analysis), the following steps will pick up on the "goto"s and be able to detect loops and simplify the "goto"s away.

    However, how do we determine the label that is responsible for (to give an example) state 3? The answer is 'IL_9D', but figuring this out is non-trivial: the C# compiler makes use of if-statements (to be exact: beq and bne.un), switch statements, and mixtures of both. Moreover, switch statements are usually preceded by subtractions, as the IL switch only deals with cases 0 to n-1. The ILAst for the beginning of the above MoveNext() method looks like this:

        stloc(var_1_06, ldfld(Enumerable/<SelectManyIterator>d__14`2<TSource, TResult>::<>1__state, ldarg(0)))
        brtrue(IL_17, ceq(ldloc(var_1_06), ldc.i4(0)))
        brtrue(IL_96, ceq(ldloc(var_1_06), ldc.i4(3)))
        br(IL_C6)
        IL_17:
        stfld(Enumerable/<SelectManyIterator>d__14`2<TSource, TResult>::<>1__state, ldarg(0), ldc.i4(-1))
        ...

    If you haven't been following the previous posts: the ILAst is an intermediate data structure used in the decompiler. It represents an IL program using nested expressions, thus eliminating the IL evaluation stack. At the point where the "yield return" transformation runs, opcodes have already been simplified, so "beq" now is "brtrue(ceq)".

    To determine where MoveNext() will branch to in a given state, ILSpy will simulate the execution of the beginning of the MoveNext() method. It does this symbolically: "this.$1__state" evaluates to (state+0). In general, "values" in this symbolic execution are (x), (state+x), (state==x) and (this), where x is an int32. The execution will go linearly through the ILAst; it works on the assumption that there are no backward branches. Execution stops once it encounters a statement it doesn't understand - usually, this is the assignment "this.$1__state = -1;", which indicates that the enumerator started executing. For each statement in the ILAst, the range of states that can lead to that statement is stored.

    So the result of the analysis is the following table:
        IL_17: state 0 to 0
        IL_96: state 3 to 3
        IL_C6: state int.MinValue to -1, 1 to 2, and 4 to int.MaxValue
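
    To make the idea concrete, here is a minimal sketch of such a range analysis. It is not ILSpy's actual code: the node classes (BranchIfStateEquals, Branch, Other) and the StateRangeAnalysis class are hypothetical stand-ins for ILAst nodes, and the sketch only understands "brtrue(label, ceq(state, constant))" and "br(label)". It ignores the (state+x) offsets that the real analysis needs for switch statements.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;

    // Hypothetical, simplified stand-ins for ILAst nodes (not ILSpy's real classes).
    abstract class Node { }

    // Models brtrue(Target, ceq(state, Value)).
    sealed class BranchIfStateEquals : Node
    {
        public readonly string Target;
        public readonly int Value;
        public BranchIfStateEquals(string target, int value) { Target = target; Value = value; }
    }

    // Models br(Target).
    sealed class Branch : Node
    {
        public readonly string Target;
        public Branch(string target) { Target = target; }
    }

    // Any statement the analysis does not understand, e.g. "this.state = -1".
    sealed class Other : Node { }

    static class StateRangeAnalysis
    {
        // Walks the jump code linearly (assuming no backward branches) and records,
        // for every branch target, the inclusive state ranges that reach it.
        public static Dictionary<string, List<(int From, int To)>> Analyze(IEnumerable<Node> code)
        {
            var table = new Dictionary<string, List<(int From, int To)>>();
            // States that can still reach the current statement.
            var live = new List<(int From, int To)> { (int.MinValue, int.MaxValue) };

            foreach (Node node in code)
            {
                switch (node)
                {
                    case BranchIfStateEquals b:
                        if (live.Any(r => r.From <= b.Value && b.Value <= r.To))
                            Add(table, b.Target, (b.Value, b.Value));
                        live = Remove(live, b.Value);
                        break;
                    case Branch br:
                        foreach (var r in live)
                            Add(table, br.Target, r);
                        live = new List<(int From, int To)>(); // nothing falls through
                        break;
                    default:
                        return table; // stop at the first statement we don't understand
                }
            }
            return table;
        }

        static void Add(Dictionary<string, List<(int From, int To)>> table, string label, (int From, int To) range)
        {
            if (!table.TryGetValue(label, out var list))
                table[label] = list = new List<(int From, int To)>();
            list.Add(range);
        }

        // Removes a single value from a list of inclusive ranges, splitting a range if needed.
        static List<(int From, int To)> Remove(List<(int From, int To)> ranges, int value)
        {
            var result = new List<(int From, int To)>();
            foreach (var r in ranges)
            {
                if (value < r.From || value > r.To) { result.Add(r); continue; }
                if (r.From < value) result.Add((r.From, value - 1));
                if (value < r.To) result.Add((value + 1, r.To));
            }
            return result;
        }
    }
    ```

    Feeding it the three branches from the ILAst above (state == 0 to IL_17, state == 3 to IL_96, then an unconditional branch to IL_C6) reproduces the table: IL_17 gets 0 to 0, IL_96 gets 3 to 3, and IL_C6 gets the three remaining ranges.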

    This allows us to reconstruct the control flow in the MoveNext() method. However, one piece of the puzzle is still missing: the try-finally blocks. The C# compiler doesn't compile any of those into the MoveNext() method. Instead, it puts each finally block into its own method, and calls them in the MoveNext() method only on the regular exit of the try blocks. In case of an exception, the try-fault handler simply calls Dispose(), which takes care of calling the finally blocks depending on the current state:

        void System.IDisposable.Dispose()
        {
            switch (this.$1__state) {
                case 1:
                case 2:
                case 3: {
                    try {
                        switch (this.$1__state)
                        {
                            case 2:
                            case 3:
                            {
                                try {
                                } finally {
                                    this.$m__Finally1a();
                                }
                                break;
                            }
                        }
                    } finally {
                        this.$m__Finally18();
                    }
                    return;
                }
            }
        }
        private void $m__Finally18()
        {
            this.$1__state = -1;
            if (this.$7__wrap17 != null) {
                this.$7__wrap17.Dispose();
            }
        }
        private void $m__Finally1a()
        {
            this.$1__state = 1;
            if (this.$7__wrap19 != null) {
                this.$7__wrap19.Dispose();
            }
        }

    We analyze the Dispose() method using the same symbolic execution that we used for the jump code at the beginning of MoveNext(). This tells us that $m__Finally1a is called in states 2 and 3; and that $m__Finally18 is called in states 1 to 3. Using this information, we can reconstruct the try-finally blocks within MoveNext(). The remaining parts of the ILAst pipeline then take care to replace the "goto"s with loop and if structures. Finally, the C# pattern transformations take care of translating the code back to the foreach pattern, resulting in the highly readable code in the screenshot at the beginning of this post.

     

  • ILSpy - decompiler architecture

    In my introductory post two weeks ago, I said we had not decided on a decompiler engine yet. The decision was made soon after: we found some issues in the Cecil.Decompiler design, and so decided to move forward with our own decompiler engine (based on David's dissertation). David will post more about those issues and how we can avoid them in the ILSpy design.

    Here, I will write about the ILSpy architecture. But actually, the best way to learn about it is to download the source code, compile a debug build, and run it:

    As you can see in this screenshot, the debug builds offer several more "languages" than the release builds (which show C# and IL only). These additional languages represent the intermediate steps between the IL code and the decompiled C# code.

    The decompiler works on two different representations: first, it transforms the IL code into an intermediate language we call "ILAst". In this language, every IL instruction is represented by exactly one function. Such function calls can be nested when one IL instruction directly consumes the output of another:

    callvirt(UIElement::set_IsEnabled, ldfld(AboutPage/<>c__DisplayClass7::button, ldarg(0)), ldc.i4(0))

    In essence, the ILAst is a structured representation of the IL code. Further transformation steps introduce even more structure into this ILAst, such as loops and if/else constructs.

    The last step done on the ILAst representation is type inference. If you look at the example above, you'll see that the UIElement.set_IsEnabled method is called with the integer 0 as argument. However, the setter expects a boolean in C#. The issue here is that IL does not contain the full type information of the original program. For methods and fields, and even for local variables, full type information is available in the IL code. But for temporary results (stored on the IL evaluation stack), the CLR uses a less strict type system. In that type system, the type "I4" can stand for any of the following types: int, bool, uint, short, ushort, byte, sbyte, and also for any enum based on one of those integer types. The type analysis step uses inference to determine which of the C# types should be used by the decompiler. This results in the following ILAst:

    callvirt:void(UIElement::set_IsEnabled, ldfld:Button(AboutPage/<>c__DisplayClass7::button, ldarg:AboutPage/<>c__DisplayClass7(0)), ldc.i4:bool(0))

    This information allows the decompiler in the following step to know that the literal 0 was actually the literal false. So in the C# AST (which is built on top of the NRefactory library) we get the following statement:

    this.button.IsEnabled = false;
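
    As a rough illustration of why the inferred type matters, the following sketch turns an I4 constant into C# source text depending on the type that inference picked. The I4Literals class and its ToCSharpLiteral method are hypothetical names invented for this example; ILSpy itself builds NRefactory AST nodes rather than strings.

    ```csharp
    using System;
    using System.Globalization;

    // Hypothetical helper, not part of ILSpy: formats an I4 stack value
    // according to the C# type chosen by type inference.
    static class I4Literals
    {
        public static string ToCSharpLiteral(int value, Type inferredType)
        {
            // On the evaluation stack, bool is just an I4: 0 is false, anything else is true.
            if (inferredType == typeof(bool))
                return value != 0 ? "true" : "false";

            if (inferredType.IsEnum)
            {
                // Prefer the named enum member; fall back to a cast for unnamed values.
                var name = Enum.GetName(inferredType, Enum.ToObject(inferredType, value));
                if (name != null)
                    return inferredType.Name + "." + name;
                return "(" + inferredType.Name + ")" + value.ToString(CultureInfo.InvariantCulture);
            }

            // Plain integer types keep their numeric form.
            return value.ToString(CultureInfo.InvariantCulture);
        }
    }
    ```

    With this helper, ToCSharpLiteral(0, typeof(bool)) yields "false", while the same constant with typeof(int) stays "0".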

    Technically, we already have valid C# at this point. But we still proceed to transform this C# code, both to simplify away artifacts introduced by the compiler (or decompiler) and to introduce support for some of the higher-level constructs in the C# language. In this example, the statement that was disabling the button was actually part of an anonymous method. The "DelegateConstruction" transformation step will inline anonymous methods into the method where they are instantiated. It also removes all traces of the "DisplayClass" (the class used by the C# compiler to represent the closure), leading to the following code:

        button.Click += delegate {
            button.IsEnabled = false;
            ...
        };

    Note that by inlining the anonymous method, the "this" was changed to refer to the closure instance, which was subsequently removed. The final code directly works on the local button variable, and none of the display class/closure implementation remains. The decompiled code now is almost identical to the original source code.

  • ILSpy - Disassembler

    Even though the killer feature that everyone is waiting for is the decompiler, it is often still interesting to look directly at the IL code. Of course, this feature can't be missing in a tool called ILSpy.
    Implementing it also provided me with a way to test the GUI logic while the decompiler is still under construction.

    We use AvalonEdit for the code view, so you can expect some advanced features:
    The most obvious one is syntax highlighting, which is using AvalonEdit's built-in highlighting engine. As usual for AvalonEdit, copying the text into any HTML-capable application (such as this blog's edit box in Firefox) will preserve the highlighting.
     .method public static hidebysig string CreateHtmlFragment

    Code Folding allows you to collapse and expand methods - useful if you're viewing a complete class.
    Finally, there's an important invisible feature: every reference is a hyperlink.
    Click on a branch target (e.g. IL_0063), and the code view jumps to the target IL instruction. Click on a type/member reference, and it will be selected in the tree view and will be decompiled.

    The disassembler itself also can perform a nice trick: it will display the code in an indented form.
    For this purpose, I wrote a very simple detector for try-catch-finally and loop structures.
    Basically, the exception handling table in the method's footer is converted into a tree, and then displayed inline (ILDasm has this feature as well).

    For loops, the detection logic is dead simple: if there's a backwards branch in the code, then that's probably a loop with the body from start to end. To verify whether it's indeed a well-formed loop, I find all branches from outside the loop body jumping into the loop body, and test whether all of those jump to the same instruction. If there are multiple entry points, then the structure is not considered a loop.
    As a last test, any loops that cannot be inserted into the tree structure (because they don't nest properly and overlap with loops detected earlier) are ignored as well.
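
    The following is a minimal sketch of the loop detection described above, not the actual ILSpy source. It assumes the branches of a method are given as (From, To) pairs of instruction offsets; the SimpleLoopDetector class and this input format are made up for the example.

    ```csharp
    using System.Collections.Generic;
    using System.Linq;

    // Hypothetical sketch of the "dead simple" loop detector.
    static class SimpleLoopDetector
    {
        // Returns the accepted loop bodies as inclusive (Start, End) offset ranges.
        public static List<(int Start, int End)> Detect(IReadOnlyList<(int From, int To)> branches)
        {
            var loops = new List<(int Start, int End)>();

            // Every backward branch is a loop candidate: body = [target, branch].
            foreach (var back in branches.Where(b => b.To <= b.From))
            {
                var body = (Start: back.To, End: back.From);

                // Distinct targets of branches that jump into the body from outside.
                var entryPoints = branches
                    .Where(b => (b.From < body.Start || b.From > body.End)
                             && b.To >= body.Start && b.To <= body.End)
                    .Select(b => b.To)
                    .Distinct()
                    .ToList();
                if (entryPoints.Count > 1)
                    continue; // multiple entry points: not considered a loop

                // The candidate must be disjoint from, inside of, or around every
                // loop accepted so far; partially overlapping candidates are ignored.
                bool nestsProperly = loops.All(l =>
                    body.End < l.Start || body.Start > l.End
                    || (body.Start >= l.Start && body.End <= l.End)
                    || (l.Start >= body.Start && l.End <= body.End));

                if (nestsProperly && !loops.Contains(body))
                    loops.Add(body);
            }
            return loops;
        }
    }
    ```

    Processing candidates in the order the backward branches appear means that, of two overlapping candidates, the one found first wins, which matches the "overlap with loops detected earlier" rule.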

    So if you have IL code like this: "ABAB" where A is a loop and B is another (with jump instructions from the end of one A to the start of the other), then this simple algorithm will be wrong and detect only one loop "(ABA)B" and incorrectly consider B to be part of the loop body.
    I experimented with some other loop detection algorithms and actually had one which worked pretty well and could handle even the above case. However, the loops detected by that algorithm were not necessarily consecutive code blocks (as with the example above), so they couldn't be displayed in the disassembler view without reordering the IL code.
    I don't know of any compilers that generate loops in non-consecutive blocks of code; but some obfuscators employ such unusual code patterns. The decompiler will likely use a more advanced form of loop detection in order to deal with those cases.

  • ILSpy - a new .NET assembly inspector

    First, I'll let the screenshot speak for itself:

    Now that RedGate has announced that Reflector will no longer be available for free, the SharpDevelop team has started to create an open-source replacement: ILSpy.

    Just as any of the assembly browsers inspired by ILDasm, the UI is simple: a tree view on the left allows you to view the contents of the assembly; the text view on the right shows the contents of the selected method.

    As for the decompiler engine itself, we still haven't decided - the screenshot above was created with Cecil.Decompiler, and you can clearly see that there are quite a few issues even in such a trivial method. We do have an alternative solution - David Srbecky (who wrote SharpDevelop's debugger) wrote his own decompiler in early 2008 as part of his dissertation. However, tests show that whichever library we pick, it will take a lot of work until the results come anywhere close to what people are used to.

    The new GUI is written in WPF and reuses a few SharpDevelop components - the tree view is SharpTreeView, written by Ivan Shumilin for the SharpDevelop WPF Designer outline view. This tree view is not used in any other place so far, but that will likely change in the future, as it has additional features over the normal treeview: multiselection, support for columns (GridView mode), and a built-in framework for copy/paste and drag'n'drop. The drag'n'drop support is already used in ILSpy to allow the user to reorder the assembly list.

    The text view on the right is, of course, using AvalonEdit, SharpDevelop's text editor (new as of version 4.0).

    If you are interested in contributing, send me an email, or just join us in #sharpdevelop (on freenode).

  • Creating a localizable WPF dialog

    This post explains how to create a localizable dialog with WPF using the SharpDevelop infrastructure, i.e. when writing a SharpDevelop AddIn. If you are writing a standalone application, you can still get ideas from the post, but you will have to define the referenced styles and markup extensions yourself.

    If we create a simple dialog using the designer, the XAML code might look like this:

    <?xml version="1.0" encoding="utf-8"?>
    <Window
        x:Class="SomeNamespace.MyDialog" xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation" xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
        Height="125" Width="300"
        Title="This is my dialog">
        <Grid>
            <Grid.RowDefinitions>
                <RowDefinition Height="1*" />
                <RowDefinition Height="Auto" />
            </Grid.RowDefinitions>
            <TextBox
                Grid.Column="0" Grid.Row="0"
                Margin="8,8,8,8">
            This big text box is representing the main content.
            </TextBox>
            <StackPanel
                Orientation="Horizontal"
                Grid.Column="0" Grid.Row="1"
                HorizontalAlignment="Right" VerticalAlignment="Bottom"
                Margin="0,0,8,4.25"
                Width="160" Height="24.5">
                <Button
                    Content="OK"
                    IsDefault="True"
                    Width="75" Height="23" />
                <Button
                    Content="Cancel"
                    IsCancel="True"
                    Margin="4,0"
                    Width="75" Height="23" />
            </StackPanel>
        </Grid>
    </Window>

    This results in the following window:

    Now, there is a whole bunch of issues with this dialog. The most obvious are that the text on the buttons is blurry, and that the background color is wrong (dialogs are expected to use the Control color, a light grey). Also, this window is shown in the task bar - dialogs shouldn't do that. We can fix all of these issues by applying a style to the window:

    <Window ... xmlns:core="http://icsharpcode.net/sharpdevelop/core" Style="{x:Static core:GlobalStyles.DialogWindowStyle}">

    This style will do the following things:

    • Uses 'Control' as background color
    • Hides the window from the task bar
    • Enables WPF 4 layout and text rendering (makes everything less blurry)
    • Enables right-to-left layout if the current language is a right-to-left language

    If you have a normal window (not dialog), then you can also use {x:Static core:GlobalStyles.WindowStyle}, which does not set the background color and does not hide the window in the task bar.

    Now let's get to the localization part: we need to load the translated string resources. The {core:Localize} markup extension can be used to do this (assuming the string resources are registered with the SharpDevelop ResourceService):

    <Window ... Title="{core:Localize SomeNamespace.MyDialog.Title}">
    ...
    <Button
    Content="{core:Localize Global.OKButtonText}" .../>

    There are a few global resources for commonly used elements, but all other strings should be added using keys that indicate where the resource is being used.

    But now we're running into more potential issues: what if the translated string doesn't fit on our button? All explicitly specified widths are potential trouble-makers, so let's get rid of them. Let's trust the automatic layout being done by WPF.

    The title is missing because we forgot to define that string in the resource file. But let's ignore that issue and concentrate on the buttons instead.

    'OK' is way too small, and even 'Abbrechen' (German for Cancel) could use some more spacing between the button border and the text. Fortunately, SharpDevelop again provides a style for this common issue. We could apply that style to each button individually, but let's register it in our window so that it automatically gets picked up by all buttons:

        <Window.Resources>
            <Style TargetType="Button" BasedOn="{x:Static core:GlobalStyles.ButtonStyle}"/>
        </Window.Resources>

    This style will give the button the correct padding (9,1) and minimum width (73).

    These buttons look pretty reasonable - but if you look closely, you'll notice that the "Abbrechen" button is a bit wider than the "OK" button (80 pixels vs. 73 pixels). In this case, it's not a big problem; but the effect might be more pronounced in other languages, or with other text on the buttons; so let's take a look at how to fix this. WPF has the container "UniformGrid" to assign the same size to a set of controls. However, if you try to apply that in this case, you'll notice that the UniformGrid will include the margin in the size calculation, so whichever button has the margin set will appear to be a bit smaller.

    There are two solutions to this problem: either evenly distribute the margin over both buttons (give OK a right-margin of 2; and Cancel a left-margin of 2), or use the UniformGridWithSpacing container. Here, we use the latter approach, which has the advantage that it can be extended to more than 2 buttons without having to think about the distribution of margins.

    UniformGridWithSpacing is defined in ICSharpCode.SharpDevelop.Widgets, so we'll need to import that namespace: xmlns:widgets="http://icsharpcode.net/sharpdevelop/widgets"

    Here's how you use the grid:

            <widgets:UniformGridWithSpacing Columns="2"
                Grid.Column="0" Grid.Row="1"
                HorizontalAlignment="Right"
                Margin="0,0,12,12">
                <Button Content="{core:Localize Global.OKButtonText}" IsDefault="True" />
                <Button Content="{core:Localize Global.CancelButtonText}" IsCancel="True" />
            </widgets:UniformGridWithSpacing>

    The spacing can be defined using the SpaceBetweenColumns property, but that's not necessary in this case as the default value (7) is correct for this purpose. And yes, the Windows User Experience Interaction Guidelines really suggest 7 pixels here; not 4 or 8 as is often mistakenly assumed (I made the same mistake in the window we started with).

    Finally, you should ensure that your dialog doesn't show a common bug: open your dialog, then switch to another application, then switch back to SharpDevelop. What should happen is that your dialog appears, forcing the user to finish whatever they were doing with your dialog. If the dialog does not appear and the SharpDevelop main window is unresponsive, you forgot to give your dialog an owner. In the code creating the window (ideally just before calling w.ShowDialog()), add:

    w.Owner = WorkbenchSingleton.MainWindow;

    If your dialog is triggered by another dialog, then use your immediate parent window as owner instead.

    Posted Nov 05 2010, 04:06 PM by DanielGrunwald with no comments