SharpDevelop Community

Get your problems solved!
Welcome to SharpDevelop Community Sign in | Join | Help
in Search

David Srbecký's blog

  • WPF text rendering performance

    We have spent one day during #d^3 trying to make the SharpDevelop text editor (AvalonEdit) faster.  Give the latest version a try - it should be much faster in most of the cases.

    As you might know, WPF is being rendered using DirectX.  The managed part of WPF prepares a retained visual tree which is then rendered using native engine with DirectX.  We, of course, do not have access to the source code and the code is native so ILSpy can not help us either.  However, we can use PIX to find a little bit about what is going on under the covers.  PIX captures all DirectX calls made by the application and allows the user to analyse them.  Is it pretty much like a debugger for the graphics card.

    Here you can see the capture right in the middle of rendering.

    First of all, note that the rendering is not complete yet.  PIX shows us the state of the frame buffer at the given time.  The next thing to be rendered (using the DrawPrimitive command) will be "00".  You can see the black and white "00" stored in the temporary surface (texture).  You can see the quad which will be used to render it.  Let's go through the commands that follow.  Once the "00" is rendered, the colour for the following text (",") will be set (SetRenderState, SetPixelShaderConstantF).  Then the surface (which lives in system memory) will be locked, filled with the bitmap for "," and unlocked.  Once this is done, the surface can be copied to GPU (UpdateSurface) and finally rendered (DrawPrimitive).  We move on to the next segment and so on.

    The rendering is nice and simple, but it definitely is not what I would have expected.  It just seems too slow.

    • First of all, WPF does not really seem to be rendering the text in hardware.  The black and white bitmap is created in system memory for each text segment and then copied to GPU where it is copied again to the appropriate position in buffer.  The GPU is using pixel shader so it does a little bit of additional processing but the main bulk of work seems to be done on CPU.
    • DirectX calls are expensive because they need to be passed to the driver in kernel.  It is therefore important to have as few calls as possible.  Game engine programmers (my day job) go to great lengths to minimize the number of such calls.  And yet WPF uses 8 calls just to render single text run.  The total number of calls for this particular page is 22493.  It should be possible in just couple of dozen by batching draw calls together.  State changes are equally evil (SetRenderState, SetPixelShaderConstantF).  (Direct2D&DirectWrite does some batching)
    • I am really worried about the surface Lock/Unlock.  GPU is usually lagging behind the CPU - sometimes even by several frames.  The code modifies the first surface which is in system memory, issues a copy command from CPU to GPU, issues draw command and then tries to lock it so that it can put the next text in it.  However, it can not overwrite it until the previous commands are finished, so the CPU has to wait - potentially long time.  I do not know whether DirectX does some tricks to avoid this.
    • There will be exactly one memory copy from CPU to GPU for each text segment.  It might make more sense to just copy several at a time into one big surface.  It would also make sense to keep some texts around as cache so that they do not have to be copied next time again.  The natural approach would be to just copy the whole alphabet into a texture and use that.

    So how does this help us to make AvalonEdit faster?  The most important observation is that WPF issues a lot of calls even though we have prepared every line into WPF TextLine and invalidate it only when the line is changed.  We knew that creation of the line was expensive, but it turns out that just rendering it is quite expensive as well.  There does not seem to be any caching.

    WPF does not repaint the whole window.  It only repaints the visuals that have been modified.  In our case the visual is the AvalonEdit text layer so the whole text is repainted.  To fix this, we have used one DrawingVisual for each line and invalidate it only when the line changes.  This separation means that lines which were not changed will not be repainted.  Obvious, but not so simple to achieve in WPF.  The performance improvement depends on the actual text in the editor.  In our (very demanding) test case we saw an improvement of factor of 20 (on the average case it will probably not be that impressive).

    Here is the example of just a single line being rendered (just after the initial Clear method).

  • ILSpy - Ensuring correctness of ILAst

    In this blog post, I will discuss the data model that ILSpy uses to store the program and how we are ensuring that the model accurately represents the original program.  One of the main goals of the decompiler is to ensure that the decompiled program accurately represents the original assembly.  You should be able to trust that the produced C# does indeed represent the program.  In the last post, Daniel has briefly described how the type inference works.  It is a lot of work to turn 0 to "false", but it is necessary to ensure that the output is correct rather than merely looking nice.  In this post, I will describe how the stack-based assembly is translated into variable-based AST form.  That is, the stack-based data flow is completely replaced by nested expressions and temporary variables.  This is the very first step of the decompilation so it is important to get it right.

    For a comparison, I have tried to decompile a program with non-trivial control flow in the .NET Reflector to see what code it produces.  It created a reasonably nice output which was however completely wrong (see the end of this post).  None of the difficult problems are addressed - it just takes a best guess and hopes for the best.  This is not the approach that we would want to take.

    Back to the ILSpy - we call the intermediate data model "ILAst" which stands for Intermediate Language Abstract Syntax Tree.  Abstract Syntax Tree (AST) is the standard name given to the data structure used by virtually all compilers to represent high-level form of the source code.  The compilers start with the AST and eventually convert it to some low-level assembly.  The assembly code is register-based when compiled directly for the processor and it is usually stack based when it is destined for some virtual machine.

    For example, let's consider the expression 'x = a * b + c'.

    The compiler will parse the expression and represent it using an AST similar to this:

    This is one valid representation of the data.  If you are comping to a native assembly, the code will be eventually translated to register-based assembly like this:

    mov r1, &a
    mov r2, &b
    mul r1, r1, r2
    mov r2, &c
    add r1, r1, r2
    store r1, &x

    If you are compiling for .NET you will get a stack-based assembly like this:

    ldloc a
    ldloc b
    mul
    ldloc c
    add
    stloc x

    The job of the decompiler is to convert the assembly to readable C# code.  To make this easier, the very first thing we do is to convert the assembly to AST.  We call it ILAst, because it is not a rich AST like one a compiler would have - it does not have a node type for every language construct - it stores everything as "ILExpression" instead.  The semantics of the ILExpression is defined by storing the IL opcode in it (for example, ldloc).  This data model is simple to work with and maps very well to C# code.

    So how can we convert from the stack-based assembly to ILAst?  Fortunately, it is much simpler than if the assembly was register based.  If I were to ask you to store the AST tree of 'x = a * b + c' (see diagram above) to disk in an compact way, how would you do it?  We know how many arguments each node has (for example, "add" has two, "stloc" has one).  My first approach would be to store the identifier of the node and then store its arguments after it.  So I would store it as the following sequence (the brackets are there only for clarification):

    "st X" ("add" ("mul" "ld a" "ld b") "ld c")

    This is almost exactly what the stack-machine does!  The only difference is the arguments are stored first and then followed by the identified.  So the sequence is:

    (("ld a" "ld b" "mul") "ld c" "add") "st X"

    This is a great way to store the program because it very close to the original AST, the order of instructions is in the order in which it will be executed and it can be easily described as a stack-machine.

    That was easy so far, so here is the catch - .NET allows the program to branch when the stack is not empty.  This is generally useful, but makes the job of decompiler more difficult because it is no longer trivial to convert the program to ILAst (or any other high level form).

    So here is an example of what a control flow of a program might look like:

     

    Translating this to AST is not as easy as in the previous case.  It can not be represented as a single expression and it can not be trivially separated into separate expressions because there is data dependency between the nodes.  However, in this case the solution is easy. We store the result of the "ld a" into a temporary variable and then use the temporary variable for the stores.  In fact, this approach can be easily generalized - whenever something is pushed on the stack, store it in temporary variable, and whenever something is loaded from the stack, load the appropriate temporary variable (it is not that difficult to find which one it is).  So the 'x = a * b + c' would look like this:

    tmp1 = ldloc a
    tmp2 = ldloc b
    tmp3 = mul (tmp1, tmp2)
    tmp4 = ldloc c
    tmp5 = add (tmp3, tmp4)
    stloc x (tmp5)

    A few inlineing optimizations and we have the original AST.

    Unfortunately, there is one more problem - the control flow can also converge:

    We have to select a temporary local variable based on which branch was taken.  This problem is well known in the Single Static Assignment (SSA) form that compilers often use (See http://en.wikipedia.org/wiki/Static_single_assignment_form).  The solution is to define a single temporary variable and assign to it at the end of each branch.  The store will then use this temporary variable.

    In general, the problem is quite complicated because the result of the branch might not be immediately consumed - the branch might be nested in another branch.  Also, the converging and diverging control flows can occur at the same time making all the interactions quite complex.

    So without further ado, let's discuss to the approach that ILSpy uses.  It is not necessarily the most optimal solution, but it is a correct one.  The reasoning is that these cases should happen extremely rarely and the decompiler should be able to completely optimize them away.  So the goal is to just make sure the decompiler produces the correct output if some exotic compiler or obfuscator is used.

    Here is an example of a quite a complicated control flow.  The "ld a" and "ld b" converge and then the result is either used or converges again with "ld c".  In the later case, the control flow diverges before the value is used.

    ILSpy assigns a temporary variable for each argument (input) of an instruction.  So in this case, there will be three temporary variables - one for each of the store instructions.  When a load instruction is executed, its result will be stored to all the necessary temporary variables.  For example, the value of "ld a" may end up in any of the stores so its result will be assigned to all three temporary variables right after it is executed.  "ld c" can be assigned to only two of the temporary variables.  Of course, this is the worst case scenario - in most programs, the result will be written to only a single temporary variable and so the variable will be most likely be trivially optimized away by inlineing.

    So how do we find what "appropriate" variables to assign to?  We find the stack state before and after each instruction.  Let's say that the stack is empty just before the "ld a" instruction.  Right after the instruction is executed, the stack will contain one value.  We do not know what value, but we know that there will be one and that it had to be pushed on the stack by the "ld a" instruction.  We note that.  In a similar fashion, we analized the stack after "ld b".  When we get to the branch which immediately follows these two instructions, we know that the stack might be in two different states depending on where the control flow came from.  The stack state before the "br" contains one value and the value was pushed on the stack by either "ld a" or "ld b".  We do not know which one, so we just note both of them.  We do this kind of analysis for the whole program and we will find that the stack before the "st z" will contain one value which could have been pushed on the stack by any of the three loads.  So at this point we create a temporary variable for the argument of "st z" and tell all three of the load instructions that they have to store their result into this temporary variable.  The other store instructions will do the same so at the end of the day, each load instruction will know which variables it has to store its result in.  That is all.  A few inlining optimizations and we have nice AST tree which is guaranteed to be correct.

    Just for the record, here is how ILSpy decompiles program with the control flow we have just discussed (see the diagram above, the only difference is that the program uses return instead of store):

    public static int Test(int switch1, int br1, int br2)
    {
        switch (switch1)
        {
            case 0:
            {
                var expr_14 = 1;
                var arg_31_0 = expr_14;
                var arg_3B_0 = expr_14;
                var arg_42_0 = expr_14;
                goto IL_26;
            }
            case 1:
            {
                var expr_1A = 2;
                arg_31_0 = expr_1A;
                arg_3B_0 = expr_1A;
                arg_42_0 = expr_1A;
                goto IL_26;
            }
            case 2:
            {
                var expr_20 = 3;
                arg_3B_0 = expr_20;
                arg_42_0 = expr_20;
                goto IL_33;
            }
        }
        return 0;
        IL_26:
        if (br1 == 0)
        {
            return arg_31_0 + 10;
        }
        IL_33:
        if (br2 == 0)
        {
            return arg_3B_0 + 20;
        }
        else
        {
            return arg_42_0 + 30;
        }
    }

    It is not nice but it can not be - the control flow is complicated.  It does take some temporary variables and gotos to represent it correctly.

    For comparison, here is the output from .NET Reflector:

    public static int Test(int switch1, int br1, int br2)
    {
        switch (switch1)
        {
            case 0:
            case 1:
                if (br1 == 0)
                {
                    return (3 + 10);
                }
                break;

            case 2:
                break;

            default:
                return 0;
        }
        if (br2 == 0)
        {
            return (2 + 20);
        }
        return (1 + 30);
    }

    The code looks much nicer, but it is completely wrong.  The non-trivial control flow made the Reflector quite confused.  It is easy to see that the code is wrong - the program has three possible inputs and three possible outputs with branches in between.  So there should be 8 possible results (3 * 3 - 1 because there is no path from "ld c" to "st x" and all other paths are possible).  However, the Reflector's code can return only 3 values: 13, 22, 31 (ignoring the default switch case).  Furthermore, the value 13 is the one for the path from "ld c" to "st x" - the one that is not supposed to happen! :-)

    Well done for reading the whole post.  I hope this gives you some insight into how we are ensuring the correctness and that we are taking it very seriously.  Next time I will discuss how we are ensuring correctness during code movement and during restoring of high level constructs like loops and conditions.

  • ILSpy - Decompiler

    If you haven't read Daniel's post already - ILSpy is open-source alternative to the .NET Reflector licensed under MIT/X11.

    I have uploaded my decompiler engine to the github repository and Daniel has already integrated it with the ILSpy GUI.  The decompiler is based on my university dissertation which you can read on github if you are interested.  This was actually a piece of code that was just lying on my harddrive for two years.  Since .NET Reflector will no longer be free, I finally have the proper motivation to finish it.  See below for a screenshot of the first version.

    I am currently in the process of cleaning up the code to make sure that it will be maintainable in the long term and that other people can more easily contribute.  One of the main goals of the decompiler will be to ensure correctness of the decompiled program.  The decompiled code should be accurate representation of IL bytecode.  I will write several more posts about the decompiler in the following days.

    Decompiled Quicksort

     

  • Internals of SharpDevelop's Debugger

    In this blog post, I will explain the internal workings of the SharpDevelop's managed debugger.  It might be useful for anyone who wants to contribute to SharpDevelop or to use the SharpDevelop's debugger for any other purpose (the debugger is LGPL library independent of SharpDevelop so you can easily reuse it).

    Fundamentals

    Some terminology first.  A program which debugs an another program is called the debugger.  The program which is being debugged is called the debuggee.  In this case SharpDevelop is the debugger and your HelloWorld program is the debuggee.

    The debugger can start a new debuggee process or it can attach to an existing one.  While the debuggee is running, there is not much that the debugger can do.  Almost all operations are forbidden.  The debugger has to wait until the debuggee pauses - usually because user's breakpoint is hit.  Once the debuggee is paused, the debugger can investigate its state - it can look at the callstack, read local variables and so on.  Stepping or pressing "Continue" will put the debuggee into running state again.

    Investigating variables

    The first most important thing to grasp is that the debugger and the debuggee are completely separate processes.  Memory spaces of different processes are strictly separated by the operating system and therefore the debugger cannot obtain a reference (pointer) to any object in the debuggee.  You might as well imagine the processes being on two different computers.  If the debugger wants to investigate the debuggee it needs to use some form of interprocess communication.  The low level COM API takes care of this and debugging library provides the functionality in the Value class.  You can obtain an instance of the Value class by, for example, calling StackFrame.GetLocalVariableValue(string name).  The Value does not hold the actual value of the local variable; it instead acts as a reference to the value in the remote process.  If the value is a primitive type like string or integer, you can simply request the actual content.  However if the value is a class, you will have to enumerate its fields and properties and get the values for the ones that you are interested in.  You are of course free to get fields of the new values as well and drill down as much as you want to.

    There is one more good reason why this model is appropriate.  When the debugger's code was compiled, it did not know that the user will create field "myHelloWorldMessage" and therefore it could not reference it.  Even if direct reference to the object in the other process was somehow available, the debugger would still have to use reflection to figure out what fields the object contains and then get their values one by one.  In fact, most of the debugger's API inherits from the abstract reflection classes so if you are familiar with reflection, you should not have any problems with the debugging API.

    Lifespan of values

    The .NET garbage collector (GC) presents a significant complication to the debugger.  When the debuggee is paused no code can be executed including the garbage collector so it is safe to investigate it as much and as long as we want.  However, if the debugger is resumed even for just a few instructions, the GC might have been run and it might have moved all variables around in memory.  The GC takes care to update all references within the debuggee so that it does not even notice.  However, it unfortunately does not tell the debugger.  This means that whenever debuggee is resumed, all debugger's Values become invalid because they might be pointing to the wrong memory.  The next time the debuggee is paused, it has to obtain all values again.  This problem is more problematic than it might initially seem - getting value of a property or calling Object.ToString() both require that the debuggee is resumed for a while so that the methods can be injected into the debuggee and executed.  Imagine that you have used the tooltips to drill down to object "foo.bar.Person" which contains two properties - FirstName and Surname.  After you evaluate the "FirstName" property, all values will become invalid and you will have to obtain "foo.bar.Person" again just so that you can evaluate "Surname".

    To get around this problem SharpDevelop is using expressions to obtain the values.  That is, whenever it might need to use value later it stores the string expression rather than the value.  So when user has "foo.bar" open and expands "Person", SharpDevelop will first generate the expression "foo.bar.Person" and then it will evaluate it.  The expression evaluator has a cache which ensures that any recently evaluated values will be reused rather than obtained again (if they are still valid).

    At one point in the past, the Value class was designed so that it would remember how it was obtained and automatically reevaluate itself if needed.  However, this approach turned out to be quite difficult to debug since a relatively simple call could cause complicated chain of events.  The expression based approach is more explicit and thus allows better reasoning about the program - both in terms of behaviour and performance.

    Debugger's components

     - COM API:  The low-level unmanaged debugging API of the .NET framework.  It contains interfaces such as ICorDebug or ICorDebugManagedCallback.

     - COM wrappers:  Auto-generated thin layer over the COM one which makes it a bit easier to use.  It converts 'out' parameters to return values and tracks returned COM objects so that they can be explicitly released (this is necessary so that the debugger does not lock assemblies).  The layer also contains several hand-written methods that handle marshaling of strings and other objects.

     - NDebugger:  The debugging library itself.  It provides access to variables and types via reflection-like interface.  It provides commands for setting breakpoints, stepping and pretty much everything you would expect from debugger.

     - ExpressionEvaluator:  Extension on top of NDebugger which can evaluate C# expressions.  It depends on SharpDevelop's NRefactory.

     - AbstractTree:  This provides GUI-independent model for the tree that you can see in Local Variables pad or in the Tooltips.

     - GUI: The actual GUI in SharpDevelop.  This level connects SharpDevelop with the debugging library.

     - Visualizers: Extensions in the GUI.

     

     

  • SharpDevelop and the Google Summer of Code 2010

    Google Summer of Code 2010 has been officially announced and SharpDevelop intends to participate again.

    UPDATE: SharpDevelop has been offically accepted as a mentoring organization in Google Summer of Code 2010

    Google Summer of Code is an opportunity for students to earn 5000 USD over the summer by working on an open-source project.

    We have created a list of ideas, but you can work on anything you want as long as it is related to SharpDevelop.  You can find the list of ideas and any further information on the SharpDevelop wiki page for the Google Summer of Code.  The application deadline is 19:00 UTC on April 9th.  In the meantime, you can start coding and talking to us on the forums, via email or on the IRC (#SharpDevelop at irc.freenode.net).

    Please spread the word and tell your friends about the Summer of Code as well.

    If you have any project ideas for the students, feel free to mention them in the comments below the post.

    Posted Mar 20 2010, 02:00 PM by DavidSrbecky with no comments
    Filed under: ,
  • Accepted GSoC students

    These projects have been accepted into SharpDevelop. You can learn more about each project by visiting the links below.

    Student Title Mentor
    Siegfried Pammer XAML code completion for SharpDevelop Bernhard Spuida
    Philipp Maihart Application "Database Tools Add-In" Peter Forstmeier
    Martin Konicek Debugger visualizer for SharpDevelop David Srbecky
    Sergej Andrejev Common keyboard shortcuts handling and management in SharpDevelop Matt Ward
    Tomasz Tretkowski Integration of C++/CLI Daniel Grunwald


    Our goals for GSoC 2009 are simple:

    • provide students with a learning experience in advanced .net application programming
    • motivate students to become long term contributors
    • improve SharpDevelop in code areas the core team does not have time to work on currently

    We look forward to having a great experience with all involved.

    And don't worry, even if you were not selected for one of the slots available, we value your efforts and thoughts. If you want to contribute to SharpDevelop nevertheless, you are welcome and also will receive our support. Of course, there won't be money involved, but if you enjoy coding as much as we do, we will be glad to help and mentor you - just drop us a line to let us know and we will be there for you. If you don't, we look forward to seeing you around again in the next GSoC.

    Thanks for all your applications and great project proposals. It is wonderful for us to see what you are thinking and planning.

    Posted Apr 24 2009, 06:11 PM by DavidSrbecky with no comments
    Filed under:
  • SharpDevelop and the Google Summer of Code 2009

    SharpDevelop is participating in the Google Summer of Code this year.

    It is an opportunity for students to earn 4500 USD over the summer by working on an open-source project.

    We have created a list of ideas, but you can work basically on anything you want as long as it is relevant to SharpDevelop.  You can find the list of ideas and any further information on the SharpDevelop wiki page for the Google Summer of Code.  You can submit your application here.

    If you are not a student or if you do not have enough spare time, you can benefit as well.... share your ideas with us and maybe some student will pick them up and implement them.

    Feel free to contact us on the forums or via email.

    Please spread the word and tell your friends about the Summer of Code...

  • Part of debugger BSD licensed

    The SharpDevelop debugger consists of two parts: the Debugger.Core library and the Debugger.AddIn.

    The Debugger.Core library is high level library for debugging of .NET applications.  It is released under the LGPL license and it is intended to be used in other applications (including closed-source ones).

    Debugger.AddIn is the glue that connects the Debugger.Core library and SharpDevelop.  It implements the debugging GUI including the main menu commands.  As such it is a great example on how to use the debugger library and therefore we have licensed it under the BSD license so that it can be reused by anyone intending to use the debugger in closed-source application.

    Note that this applies only to the Montferrer 3.0 branch (ie the trunk).  The 2.x version of the debugger is intended to be deprecated as soon as the 3.0 version is stable.

Powered by Community Server (Commercial Edition), by Telligent Systems
Don't contact us via this (fleischfalle@alphasierrapapa.com) email address.