We spent a day during #d^3 trying to make the SharpDevelop text editor (AvalonEdit) faster. Give the latest version a try; it should be much faster in most cases.
As you might know, WPF renders using DirectX. The managed part of WPF prepares a retained visual tree, which is then rendered by a native engine using DirectX. We, of course, do not have access to the source code, and since that code is native, ILSpy cannot help us either. However, we can use PIX to find out a little about what is going on under the covers. PIX captures all DirectX calls made by an application and lets you analyse them. It is pretty much a debugger for the graphics card.
Here you can see the capture right in the middle of rendering.
First of all, note that the rendering is not complete yet: PIX shows us the state of the frame buffer at the selected point in time. The next thing to be rendered (using the DrawPrimitive command) will be "00". You can see the black and white "00" stored in the temporary surface (texture), and the quad which will be used to render it. Let's go through the commands that follow. Once the "00" is rendered, the colour for the following text (",") is set (SetRenderState, SetPixelShaderConstantF). Then the surface (which lives in system memory) is locked, filled with the bitmap for "," and unlocked. Once this is done, the surface can be copied to the GPU (UpdateSurface) and finally rendered (DrawPrimitive). Then WPF moves on to the next segment, and so on.
This rendering is nice and simple, but it is definitely not what I would have expected. It just seems too slow.
- First of all, WPF does not really seem to be rendering the text in hardware. The black and white bitmap is created in system memory for each text segment and then copied to the GPU, where it is copied again to the appropriate position in the frame buffer. The GPU uses a pixel shader, so it does a little additional processing, but the bulk of the work seems to be done on the CPU.
- DirectX calls are expensive because they need to be passed down to the driver in the kernel. It is therefore important to make as few calls as possible. Game engine programmers (that is my day job) go to great lengths to minimize the number of such calls. And yet WPF uses 8 calls just to render a single text run. The total number of calls for this particular page is 22493. It should be possible to do it in just a couple of dozen by batching draw calls together. State changes (SetRenderState, SetPixelShaderConstantF) are equally evil. (Direct2D and DirectWrite do some batching.)
- I am really worried about the surface Lock/Unlock. The GPU usually lags behind the CPU, sometimes by several frames. The code modifies the surface in system memory, issues a copy command from the CPU to the GPU, issues a draw command, and then tries to lock the same surface again so that it can put the next text in it. However, it cannot overwrite the surface until the previous commands have finished, so the CPU has to wait, potentially for a long time. I do not know whether DirectX does some tricks to avoid this.
- There is exactly one memory copy from the CPU to the GPU for each text segment. It might make more sense to copy several segments at a time into one big surface. It would also make sense to keep some text around in a cache so that it does not have to be copied again next time. The natural approach would be to copy the whole alphabet into a texture once and use that.
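The batching and "whole alphabet in a texture" ideas from the list above can be modelled in a few lines. This is a toy sketch in Python, not Direct3D code and not what WPF does: `GlyphAtlas` and `build_batches` are hypothetical names, it assumes a fixed-width font (8×16 glyphs) and text runs that never overlap, and it only counts uploads and batches instead of issuing real calls. Each distinct glyph is copied into one big atlas texture once, and all quads of the same colour are grouped, so one state change plus one draw call per colour would suffice.

```python
from dataclasses import dataclass, field


@dataclass
class GlyphAtlas:
    """Hypothetical glyph atlas: one big texture into which each distinct
    glyph is copied once, using simple shelf (row-by-row) packing."""
    width: int
    height: int
    cursor_x: int = 0
    cursor_y: int = 0
    shelf_height: int = 0
    rects: dict = field(default_factory=dict)  # glyph -> (x, y, w, h) in the atlas
    uploads: int = 0  # CPU -> GPU copies performed so far

    def lookup(self, glyph, w, h):
        """Return the atlas rectangle of a glyph, copying it in on first use."""
        if glyph in self.rects:
            return self.rects[glyph]  # cached: no copy needed
        if w > self.width or h > self.height:
            raise ValueError("glyph larger than the atlas")
        if self.cursor_x + w > self.width:  # row full: start a new shelf
            self.cursor_x = 0
            self.cursor_y += self.shelf_height
            self.shelf_height = 0
        if self.cursor_y + h > self.height:
            raise MemoryError("atlas full")
        rect = (self.cursor_x, self.cursor_y, w, h)
        self.rects[glyph] = rect
        self.cursor_x += w
        self.shelf_height = max(self.shelf_height, h)
        self.uploads += 1  # a real renderer would copy the glyph bitmap here
        return rect


def build_batches(runs, atlas, glyph_w=8, glyph_h=16):
    """Group the glyph quads of all text runs by colour.

    runs: iterable of (colour, text, x, y). Assumes a fixed-width font and
    non-overlapping runs, so regrouping by colour does not change the image.
    Returns {colour: [(dest_x, dest_y, atlas_rect), ...]}; each entry would
    become one state change plus one draw call."""
    batches = {}
    for colour, text, x, y in runs:
        quads = batches.setdefault(colour, [])
        for i, ch in enumerate(text):
            rect = atlas.lookup(ch, glyph_w, glyph_h)
            quads.append((x + i * glyph_w, y, rect))
    return batches


runs = [("black", "00", 0, 0), ("gray", ",", 16, 0),
        ("black", "01", 24, 0), ("gray", ",", 40, 0)]
atlas = GlyphAtlas(256, 256)
batches = build_batches(runs, atlas)
print(len(runs), "runs ->", len(batches), "draw calls,", atlas.uploads, "glyph uploads")
```

Done one run at a time, these four runs would need four surface copies and four draw calls. In the batched model the draw count grows with the number of distinct colours, and uploads happen only for glyphs that have not been seen before.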
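The Lock/Unlock stall described in the list above is a familiar problem in engine code. A common fix there is to cycle through several staging buffers, so the CPU fills one while the GPU is still reading the others. We do not know whether WPF or Direct3D does anything like this internally. The sketch below is a toy model of that swapped-in technique, not of WPF. It assumes the GPU always trails the CPU by a fixed number of copies, and `StagingRing` and `simulate` are hypothetical names.

```python
class StagingRing:
    """Toy model of a ring of system-memory staging surfaces.

    Each slot remembers the id of the last GPU copy that reads from it.
    The CPU may refill a slot only after the GPU has finished that copy;
    otherwise the Lock would have to block."""

    def __init__(self, count):
        self.pending = [None] * count  # copy id still reading each slot
        self.next = 0
        self.stalls = 0

    def acquire(self, gpu_completed):
        """Pick the next slot to fill; gpu_completed is the highest copy id
        the GPU has already finished."""
        idx = self.next
        busy = self.pending[idx]
        if busy is not None and busy > gpu_completed:
            self.stalls += 1  # a real CPU would wait here
        self.next = (idx + 1) % len(self.pending)
        return idx

    def submit(self, idx, copy_id):
        """Record that copy `copy_id` (CPU -> GPU) reads from slot `idx`."""
        self.pending[idx] = copy_id


def simulate(surfaces, segments, lag):
    """Count stalls when the GPU trails the CPU by a fixed `lag` copies."""
    ring = StagingRing(surfaces)
    for copy_id in range(segments):
        gpu_completed = copy_id - 1 - lag  # copies up to this id are finished
        idx = ring.acquire(gpu_completed)
        # (fill the slot with the text bitmap, unlock, issue the copy)
        ring.submit(idx, copy_id)
    return ring.stalls


for n in (1, 2, 3):
    print(n, "surface(s):", simulate(n, segments=100, lag=2), "stalls")
```

The model never lets the GPU catch up after a stall, so it counts how often a lock would block rather than for how long. Its point is that a single reused surface blocks as soon as the GPU lags by even one copy, while a ring with more slots than the lag never does.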
So how does this help us make AvalonEdit faster? The most important observation is that WPF issues a lot of calls even though we already prepare each line as a WPF TextLine and invalidate it only when the line changes. We knew that creating a line was expensive, but it turns out that just rendering it is quite expensive as well. There does not seem to be any caching.
WPF does not repaint the whole window; it only repaints the visuals that have been modified. In our case the visual was the whole AvalonEdit text layer, so the entire text was repainted. To fix this, we now use one DrawingVisual per line and invalidate it only when that line changes. With this separation, lines that have not changed are not repainted. Obvious, but not so simple to achieve in WPF. The performance improvement depends on the actual text in the editor. In our (very demanding) test case we saw a 20-fold improvement; in the average case it will probably not be that impressive.
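In WPF, the per-line fix means hosting one DrawingVisual per line as a child of the text layer, so WPF repaints only the children that changed. The Python below is not WPF code. It is a toy model of the bookkeeping that scheme implies, using hypothetical names (`LineVisualCache`, `set_line`, `render`), and it counts how many lines get re-rendered. It keys visuals by line index and ignores inserted or deleted lines, which a real editor would have to handle.

```python
class LineVisualCache:
    """Toy model of 'one visual per line': every line caches its own
    rendering and is re-rendered only after its text has changed."""

    def __init__(self, lines):
        self.lines = list(lines)
        self.visuals = [None] * len(self.lines)  # None = needs repaint
        self.renders = 0  # how many lines have been (re)rendered in total

    def set_line(self, index, text):
        """Change one line; only that line's visual is invalidated."""
        if self.lines[index] != text:
            self.lines[index] = text
            self.visuals[index] = None

    def render(self):
        """Repaint invalid lines only and return all line visuals."""
        for i, text in enumerate(self.lines):
            if self.visuals[i] is None:
                self.visuals[i] = self._render_line(text)
                self.renders += 1
        return self.visuals

    def _render_line(self, text):
        # Stand-in for building a DrawingVisual from a TextLine.
        return f"<visual {text!r}>"


editor = LineVisualCache(["int a;", "int b;", "int c;", "int d;"])
editor.render()                  # first paint: all 4 lines
editor.set_line(2, "int c = 1;")
editor.render()                  # only line 2 is repainted
editor.set_line(0, "int a;")     # same text: nothing is invalidated
editor.render()
print("lines rendered:", editor.renders)  # 5, versus 12 when the whole layer is repainted each time
```

Repainting the single text layer would render all four lines on each of the three paints (12 line renders). The per-line model renders 5, and the gap widens with the number of unchanged lines on screen.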
Here is an example of just a single line being rendered (right after the initial Clear call).