SharpDevelop Community


Daniel Grunwald

March 2010 - Posts

  • File Encoding in SharpDevelop 4.0

    Today I have implemented support for choosing the file encoding when loading and saving files inside SharpDevelop.

    The option to specify an encoding while opening a file is placed in the "Open With" dialog. In earlier SharpDevelop versions, this dialog was available only in the project browser's context menu; in SharpDevelop 4 you can now "open with" any file using the main menu.

    Inside the "Open With" dialog, pick the entry "Text editor (choose encoding)".

    After that, SharpDevelop will prompt you to specify the file encoding to use for loading the file. The initial selection of the combo box is the automatically detected encoding that SharpDevelop would normally use.

    How does encoding auto-detection work? First, SharpDevelop checks whether the file has a byte order mark for UTF-8, UTF-16 or UTF-32. If it does, the encoding is trivially detected. Otherwise, SharpDevelop scans the file and checks whether it is valid UTF-8. If it is and it contains some bytes >= 128, the file is auto-detected to be UTF-8. The UTF-8 encoding is quite restrictive, so false positives are rare.
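
    The two detection steps above can be sketched roughly as follows. This is a minimal Python illustration, not SharpDevelop's actual code: the function names (detect_bom, looks_like_utf8, detect_unicode) are hypothetical, and the returned strings are Python codec names chosen for this example.

```python
import codecs

# UTF-32 is checked before UTF-16 because the UTF-32 LE byte order mark
# (FF FE 00 00) begins with the UTF-16 LE byte order mark (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def detect_bom(data):
    """Return a codec name if data starts with a known byte order mark."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None


def looks_like_utf8(data):
    """True only if data is valid UTF-8 and has at least one byte >= 128."""
    if all(b < 128 for b in data):
        return False  # plain ASCII: deliberately not reported as UTF-8
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def detect_unicode(data):
    """Return a Unicode codec name, or None if the file is not detected as Unicode."""
    bom_encoding = detect_bom(data)
    if bom_encoding is not None:
        return bom_encoding
    return "utf-8" if looks_like_utf8(data) else None
```

    For example, detect_unicode("č".encode("utf-8")) yields "utf-8", while both a plain ASCII buffer and the single byte 0xE8 (an incomplete UTF-8 sequence) yield None.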

    If the file is invalid UTF-8, or if it is a plain ASCII file (no bytes >= 128; such a file is always valid UTF-8 as well), then we will avoid reading the file as Unicode. Instead, we pick the encoding specified as default in the SharpDevelop Load/Save options. However, if that encoding happens to be a Unicode encoding (it'll be UTF-8 by default), then we pick the current Windows ANSI codepage instead. This is done because we have already decided not to treat the file as UTF-8.

    This means that when loading a plain ASCII file with the SharpDevelop default encoding set to UTF-8, we will still fall back to the ANSI codepage. The reason for this is simple: SharpDevelop always includes the byte order mark when saving files, and we want to avoid adding one to plain ASCII files.
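
    The fallback rule from the last two paragraphs can be written as a small decision function. This is a hedged Python sketch: choose_load_encoding and is_unicode_encoding are hypothetical names, the detected_unicode argument stands for the result of the auto-detection step (None when the file was not detected as Unicode), and "cp1252" is only a stand-in for whatever the current Windows ANSI codepage happens to be.

```python
import codecs


def is_unicode_encoding(name):
    """Treat any codec whose canonical Python name starts with 'utf' as Unicode."""
    return codecs.lookup(name).name.startswith("utf")


def choose_load_encoding(detected_unicode, default_encoding="utf-8",
                         ansi_codepage="cp1252"):
    """Pick the encoding used to load a file.

    detected_unicode: codec name from auto-detection, or None if the file
    is invalid UTF-8 or plain ASCII.
    default_encoding: the "default file encoding" option.
    ansi_codepage: assumed placeholder for the current Windows ANSI codepage.
    """
    if detected_unicode is not None:
        return detected_unicode
    if is_unicode_encoding(default_encoding):
        # The file was not detected as Unicode, so a Unicode default is not used.
        return ansi_codepage
    return default_encoding
```

    With the default settings, choose_load_encoding(None) returns "cp1252", whereas choose_load_encoding(None, default_encoding="iso-8859-2") returns "iso-8859-2".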

    When saving a file, SharpDevelop will always use the encoding that was detected (or specified) when loading the file. Starting with SharpDevelop 4, you can use "File > Save with encoding" to save the file using a different encoding instead.
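
    To see why the encoding chosen at load time matters when saving, here is a small Python sketch of a save step that always writes a byte order mark for UTF-8. The helper save_bytes is hypothetical; switching to Python's "utf-8-sig" codec is just this example's way of imitating the "always include the byte order mark" behaviour described above.

```python
import codecs


def save_bytes(text, encoding):
    """Encode text for saving, always adding a BOM when the encoding is UTF-8."""
    if codecs.lookup(encoding).name == "utf-8":
        encoding = "utf-8-sig"  # Python codec that prepends the UTF-8 BOM
    return text.encode(encoding)


ascii_text = "plain ASCII"
as_utf8 = save_bytes(ascii_text, "utf-8")
as_ansi = save_bytes(ascii_text, "cp1252")
```

    Here as_utf8 starts with the three BOM bytes EF BB BF, while as_ansi is byte-for-byte identical to the ASCII input, which is the outcome the ANSI fallback is meant to preserve.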

    A related feature that already existed in SharpDevelop 3.x is that SharpDevelop will warn you if a text file cannot be saved using the current encoding.

    In this screenshot, Martin's last name cannot be represented in the file's current encoding. Clicking on "Continue" would replace the 'č' with a 'c'.
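
    A check of this kind can be approximated in a few lines of Python. This is only a sketch under assumptions: unencodable_chars is a hypothetical helper, the sample word "kočka" is arbitrary, and the code only reports the problem characters. It does not reproduce the replacement of 'č' with 'c' that happens after clicking "Continue".

```python
def unencodable_chars(text, encoding):
    """Return the distinct characters of text that encoding cannot represent."""
    bad = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            if ch not in bad:
                bad.append(ch)
    return bad


def can_save_losslessly(text, encoding):
    """True if every character of text survives encoding unchanged."""
    return not unencodable_chars(text, encoding)
```

    For instance, unencodable_chars("kočka", "cp1252") reports ['č'], because the Western European codepage has no 'č', while the same call with "cp1250" or "utf-8" reports nothing.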

    So let's summarize what the "default file encoding" option in the Load/Save options panel does. Its main job, of course, is to specify the encoding used for new files created with SharpDevelop. As a secondary role, if you use a non-Unicode encoding for this option, then files that were auto-detected as non-Unicode will be opened using that encoding.
