SharpDevelop Community

Question about MSZIP

Last post 09-08-2016 3:57 PM by Rmanson. 9 replies.
  • 06-04-2015 7:13 PM

    • penev
    • Not Ranked
    • Joined on 06-04-2015
    • Posts 6

    Question about MSZIP

    Hello.

    I am trying to extract files from a Microsoft .CAB file. I read the file format specifications (https://msdn.microsoft.com/en-us/library/bb417343.aspx#cabinet_format) and now I have a byte array for each file in the CAB.

    The data, however, is compressed with MSZIP (and indeed adheres to the specifications at http://download.microsoft.com/download/5/D/D/5DD33FDF-91F5-496D-9884-0A0B0EE698BB/%5BMS-MCI%5D.pdf).

    I already use SharpZipLib in my project for dealing with ZIP files. My question is: can I use it to decompress the data I got from the CAB file?

    Thanks!

  • 06-08-2015 4:03 AM In reply to

    Re: Question about MSZIP

    Hi,

    I haven't come across this question before. From the PDF, it looks to me as if the 0x43 0x4B at the start of each block is unique to MSZIP, and that this format is not the same as PKZip/WinZip.

    Taking the PDF and the RFC 1951 together, it looks to me that Microsoft have stuck the 0x43 0x4B in front of a standard DEFLATE block. (Well, possibly more than one. At first glance it wasn't obvious how multiple blocks are separated).

    We have used #ziplib before to unpack pure DEFLATE blocks, so if you can get hold of a stream of the right length, it should be straightforward.

    Have a look at the "PDF flatedecode decompression with #ziplib?" topic at http://community.sharpdevelop.net/forums/t/1009.aspx. The post with the "FlateDecode" method should be perfect for use here.

    As I say, the only trick would be getting hold of the right block. I would go for the bytes immediately following the 0x43 0x4B.
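
    A minimal sketch of that step, assuming the caller already holds the data bytes of one block as a byte array. The class and method names are hypothetical and not part of #ziplib; the code only checks for the 0x43 0x4B signature and returns the bytes that follow it.

```csharp
using System;
using System.IO;

public static class MszipSignatureSketch
{
    // Returns the bytes that follow the two-byte 0x43 0x4B ("CK") signature.
    // Throws if the block is too short or does not start with that signature.
    public static byte[] GetDeflatePayload(byte[] block)
    {
        if (block == null || block.Length < 2 || block[0] != 0x43 || block[1] != 0x4B)
            throw new InvalidDataException("Block does not start with 0x43 0x4B.");

        var payload = new byte[block.Length - 2];
        Buffer.BlockCopy(block, 2, payload, 0, payload.Length);
        return payload;
    }
}
```

    The returned array is what would then be handed to a raw DEFLATE inflater (one that expects no zlib or gzip header).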

    Let me know how it goes; I'll be keen to include this in the wiki if it works.

    Regards,
    David

     

     

  • 06-09-2015 2:45 PM In reply to

    • penev

    Re: Question about MSZIP

    Thanks!

    Using that method, with some alterations, I managed to inflate the files to the expected size. (I remove the CFData 0x43 0x4B signature before inflating, of course).

    But the MD5 of the extracted file turned out to be different from the MD5 of the same file when I extract it manually. I did some investigation, and found that the inflater is giving me wrong bytes.

    In short, what I do is:

     - Take all CFData entries for a given CFFolder.

     - Loop over them, calling `FlateDecode()` on each entry's Data (as specified in the PDF).

     - Once I have the inflated data from the CFData entry (say a byte[N]), I read N bytes from the same file, extracted by hand, and compare both arrays byte by byte for mismatches.
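
    The comparison in the last step could look roughly like the sketch below. The names are hypothetical; `expectedOffset` is assumed to be the position in the hand-extracted file where the current block's output should begin.

```csharp
using System;

public static class CompareSketch
{
    // Returns the index (within 'actual') of the first byte that differs from
    // 'expected' starting at 'expectedOffset', or -1 if every byte matches.
    // Running past the end of 'expected' also counts as a mismatch.
    public static int FirstMismatch(byte[] actual, byte[] expected, int expectedOffset)
    {
        for (int i = 0; i < actual.Length; i++)
        {
            int j = expectedOffset + i;
            if (j >= expected.Length || actual[i] != expected[j])
                return i;
        }
        return -1;
    }
}
```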

     

    The first CFData entry inflated and compared with no issues. The second, however, produced mismatches (the screenshot originally posted here is not preserved).

    After some debugging it turned out that all the mismatching bytes (at least as far as I could test) were 0s when they were supposed to be something else.

    Here is my slightly modified inflation method:

                public static byte[] FlateDecode(byte[] inp, bool strict)
                {
                    byte[] inflatedData;

                    using (var stream = new MemoryStream(inp))
                    using (var inflater = new InflaterInputStream(stream, new Inflater(true)))
                    using (var outputStream = new MemoryStream())
                    {
                        var b = new byte[strict ? 4092 : 1];
                        try
                        {
                            int n;

                            while ((n = inflater.Read(b, 0, b.Length)) > 0)
                                outputStream.Write(b, 0, n);

                            inflatedData = outputStream.ToArray();

                            inflater.Close();
                            outputStream.Close();
                        }
                        catch
                        {
                            inflatedData = new byte[0];
                            if (!strict)
                                inflatedData = outputStream.ToArray();

                            inflater.Close();
                            outputStream.Close();
                        }
                    }

                    return inflatedData;
                }

     

    I can confirm that `inflater.Read(b, 0, b.Length)` reads 0 for the three bytes in the screenshot above, instead of the expected 66, 49, 35.

    Any suggestions would be much appreciated.

  • 08-02-2015 1:32 PM In reply to

    • penev

    Re: Question about MSZIP

    Hello again.

    I finally got around to picking this up again yesterday, and after some debugging I found the source of (but not the reason for) the error.

    Looking at ICSharpCode.SharpZipLib.Zip.Compression.Inflater.DecodeHuffman(), line 316, the call to litlenTree.GetSymbol(input) gives me symbol = 257. This breaks the loop and execution goes all the way to the bottom of the method.

    In the end it gets to line 399, `outputWindow.Repeat(repLength, repDist);`, and then tries to copy 3 symbols from nearly the end of window[].

    But since this is the fourth symbol in this series, all but the first 3 symbols are still zeros (still empty), so it copies 3 zeros, which is wrong.

    Now, this is what the code does. I looked around, and from what I could gather from https://msdn.microsoft.com/en-us/library/bb417343.asp , 257 is a "special" symbol for LZX compression, but the part about MSZIP doesn't mention it.

     

    At this point I am stuck because I know nothing about the decoding itself, and I can't say whether this is an issue with the code not respecting some custom MSZIP rules or something else.

  • 08-03-2015 2:46 AM In reply to

    Re: Question about MSZIP

    Hi,

    Interesting about the possible LZW in there. That link to https://msdn.microsoft.com/en-us/library/bb417343.asp isn't working for me; it comes up with a "no such page" error. Can you check that link?

    I had to look up LZW because I really wasn't sure if we support it in the standard Zip code.
    Wiki says "LZW was used in the public-domain program compress, which became a more or less standard utility in Unix systems circa 1986. It has since disappeared from many distributions, both because it infringed the LZW patent and because gzip produced better compression ratios using the LZ77-based DEFLATE algorithm"

    It also says it's out of patent now, but if it really is LZW, I'm not sure why MS would use it over LZ77.

  • 08-03-2015 3:09 AM In reply to

    • penev

    Re: Question about MSZIP

    Well, the link is https://msdn.microsoft.com/en-us/library/bb417343.aspx#cabinet_format

    It talks about LZX ( https://en.wikipedia.org/wiki/LZX_%28algorithm%29 ), not LZW ( https://en.wikipedia.org/wiki/Lempel–Ziv–Welch ).

    Shall I take it you support LZX (LZ77)? And is it possible the errors I'm getting are because the inflater is unaware of something MSZIP-specific and handles that `257` incorrectly?

  • 08-03-2015 3:39 AM In reply to

    Re: Question about MSZIP

    Thanks for the heads-up. Yes, I was misreading it as LZW; it is LZX, which is new to me. Will check it.

  • 08-18-2015 2:36 AM In reply to

    • penev

    Re: Question about MSZIP

    Hey, have you had a chance to look into this?

     

  • 11-19-2015 12:38 PM In reply to

    • penev

    Re: Question about MSZIP

    Hello. It has been a while and I decided to give this another try. Unfortunately I'm still stuck at the same place as last time. Would you be able to find the time to help me out, please?

  • 09-08-2016 3:57 PM In reply to

    Re: Question about MSZIP

    I know this is kind of old, but in case it's worth anything to you or anyone else, here's what I know.

    Each block in an MSZIP file is the result of a separate deflate operation. The decoding trees are discarded and the end of the block is marked as the end of the stream, BUT the history buffer is maintained between operations, and so it must also be maintained when decompressing the blocks.
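
    A rough sketch of what maintaining the history could look like. This is not #ziplib code: it uses .NET's System.IO.Compression.DeflateStream as a stand-in inflater, and the class and method names are hypothetical. As far as I know that API has no call for preloading the history, so the sketch replays the previous output as a stored (uncompressed) DEFLATE block in front of each MSZIP block and then throws the replayed bytes away. It assumes every input array is the data of one block, still starting with 0x43 0x4B, and that the blocks are passed in order and share one history (for example, the blocks of a single folder). A missing history would be consistent with the zero bytes reported earlier in this thread, though that has not been verified here.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

public static class MszipHistorySketch
{
    private const int WindowSize = 32768; // DEFLATE history window (RFC 1951)

    public static byte[] InflateBlocks(IEnumerable<byte[]> blocks)
    {
        using (var output = new MemoryStream())
        {
            byte[] history = new byte[0]; // last (up to) 32 KB of output so far

            foreach (byte[] block in blocks)
            {
                if (block.Length < 2 || block[0] != 0x43 || block[1] != 0x4B)
                    throw new InvalidDataException("Block does not start with 0x43 0x4B.");

                byte[] inflated;
                using (var input = new MemoryStream())
                {
                    if (history.Length > 0)
                    {
                        // Stored, non-final DEFLATE block: header byte 0x00,
                        // then LEN and its complement NLEN (little-endian),
                        // then the raw history bytes.
                        int len = history.Length;
                        input.WriteByte(0x00);
                        input.WriteByte((byte)(len & 0xFF));
                        input.WriteByte((byte)((len >> 8) & 0xFF));
                        input.WriteByte((byte)(~len & 0xFF));
                        input.WriteByte((byte)((~len >> 8) & 0xFF));
                        input.Write(history, 0, len);
                    }

                    // The actual compressed data, without the signature.
                    input.Write(block, 2, block.Length - 2);
                    input.Position = 0;

                    using (var deflate = new DeflateStream(input, CompressionMode.Decompress))
                    using (var buffer = new MemoryStream())
                    {
                        deflate.CopyTo(buffer);
                        inflated = buffer.ToArray();
                    }
                }

                // Skip the replayed history; keep only this block's new output.
                output.Write(inflated, history.Length, inflated.Length - history.Length);

                // The tail of (old history + new output) becomes the next history.
                int keep = Math.Min(WindowSize, inflated.Length);
                history = new byte[keep];
                Buffer.BlockCopy(inflated, inflated.Length - keep, history, 0, keep);
            }

            return output.ToArray();
        }
    }
}
```

    Replaying the history costs up to 32 KB of extra copying per block. An inflater that exposes its sliding window directly would avoid that, but the sketch keeps to calls that are known to exist.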
