SharpDevelop Community


Unseekable output stream to zip to can not unzip due to Zip64

Last post 10-26-2008 12:59 PM by Stas. 5 replies.
  • 06-02-2008 8:36 PM

    Unseekable output stream to zip to can not unzip due to Zip64

    Hi,

    I am using SharpZipLib to zip to an output stream that happens to be unseekable, which forces Zip64 on me. The problem is that when the resulting file is read with the ZipInputStream class (not the ZipFile class), the GetNextEntry() method has no way of determining that Zip64 was used. The compressed size and size in the local header are set to 0, and when PutNextEntry in ZipOutputStream wrote the header, the extra data size was zero too. So there was no extra data to tell the reader that the entry contains Zip64 information. As a result, ZipInputStream read the two 8-byte (long) size values in the data descriptor as two 4-byte (int) values, ended up 8 bytes off, and threw the "Wrong Local header signature" exception.

    Below is sample code that reproduces the problem when writing the file. IMPORTANT: the underlying stream MUST be non-seekable. To simulate that, I created a tiny class (Class1) that extends FileStream and overrides the CanSeek property to return false.
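    Class1 itself isn't shown in the post. A minimal sketch of what it could look like, assuming it does nothing more than override CanSeek (the constructor signature is an assumption chosen to match the call in the sample below):

```csharp
using System.IO;

// Hypothetical reconstruction of the poster's Class1: a FileStream that reports
// itself as unseekable, so ZipOutputStream behaves as it would on a network stream.
public class Class1 : FileStream
{
    public Class1(string path, FileMode mode, FileAccess access)
        : base(path, mode, access)
    {
    }

    // Only the CanSeek property is overridden. Seek/Position still work on the
    // underlying file, but any caller that checks CanSeek treats it as unseekable.
    public override bool CanSeek
    {
        get { return false; }
    }
}
```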

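    // Needs: using System; using System.Collections.Generic; using System.IO;
    //        using ICSharpCode.SharpZipLib.Zip;
    public void zip(){ 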

                String sourceDir = "C:\\Adrian Programming\\Test Data\\directory with sub directories\\subdirectory 2";

                Class1 fsOut = new Class1("C:\\Adrian Programming\\Test Data Output\\test.zip", FileMode.Create, FileAccess.Write);
                ZipOutputStream zipper = new ZipOutputStream(fsOut);

                int i = 0;
                List<String> files = new List<String>();
                foreach (String dir in Directory.GetDirectories(sourceDir)) {
                    files.Add(Path.GetFileName(dir));
                }

                foreach (String file in Directory.GetFiles(sourceDir)) {
                    files.Add(Path.GetFileName(file));
                }


                while (i < files.Count) {
                    FileInfo file = new FileInfo(Path.Combine(sourceDir, files[i]));

                    if ((file.Attributes & FileAttributes.Directory) == FileAttributes.Directory) {
                        DirectoryInfo di = new DirectoryInfo(file.FullName);

                        DirectoryInfo[] subDirs = di.GetDirectories();
                        FileInfo[] subFiles = di.GetFiles();

                        if (subDirs.Length == 0 && subFiles.Length == 0) {
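                            // Empty directory: ZIP entry names use '/' separators, and a trailing '/' marks a directory
                            ZipEntry ze = new ZipEntry(ZipEntry.CleanName(files[i]) + "/");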
                            zipper.PutNextEntry(ze);
                            zipper.CloseEntry();
                        } else {
                            //list all files in this dir in the list
                            foreach (DirectoryInfo subdir in subDirs) {
                                files.Add(Path.Combine(files[i], subdir.Name));
                            }

                            foreach (FileInfo subfile in subFiles) {
                                files.Add(Path.Combine(files[i], subfile.Name));
                            }
                        }
                    } else {
                        FileStream fsIn = file.OpenRead();
                        ZipEntry ze = new ZipEntry(ZipEntry.CleanName(files[i]));

                        if (!TimeZone.CurrentTimeZone.IsDaylightSavingTime(file.LastWriteTime)) {
                            ze.DateTime = file.LastWriteTime.AddHours(1.0);
                        } else {
                            ze.DateTime = file.LastWriteTime;
                        }

                        zipper.PutNextEntry(ze);

                        int bufferInBytes = (int)file.Length;
                        byte[] buffer = new byte[bufferInBytes];
                        int len = 0;
                        while ((len = fsIn.Read(buffer, 0, bufferInBytes)) > 0) {
                            zipper.Write(buffer, 0, len);
                        }
                        fsIn.Close();
                        zipper.CloseEntry();
                    }

                    i++;
                }

                zipper.Close();

    }

    A few notes about the zip code above: the folder being compressed contains directories and files, and some of the directories are empty. Also, for this example I simply size the buffer to the whole file, because I know that no file in that folder is above 10 KB, so memory is not an issue.

    Below is the code used to extract the data. It fails with "Wrong Local header signature" because, not having detected Zip64, it reads leftover size bytes from the previous entry as the next header.

    public void unzip(){ 

                FileStream fsIn = new FileStream("C:\\Adrian Programming\\Test Data Output\\test.zip", FileMode.Open, FileAccess.Read);

                ZipInputStream zipInput = new ZipInputStream(fsIn);
                ZipEntry entry = zipInput.GetNextEntry();

                while (entry != null) {
                    String fileName = Path.Combine("C:\\Adrian Programming\\Test Data Output\\test", entry.Name);

                    if (entry.IsDirectory) {
                        Directory.CreateDirectory(fileName);
                    } else {
                        Directory.CreateDirectory(Path.GetDirectoryName(fileName));
                        FileStream curFsOut = new FileStream(fileName, FileMode.Create, FileAccess.Write);

                        int len = 0;
                        // Fixed-size buffer: fsIn.Length would throw on a truly non-seekable input stream
                        int bufferSize = 4096;
                        byte[] buffer = new byte[bufferSize];
                        while ((len = zipInput.Read(buffer, 0, bufferSize)) > 0) {
                            curFsOut.Write(buffer, 0, len);
                        }

                        curFsOut.Close();
                    }

                    entry = zipInput.GetNextEntry();
                }

                zipInput.Close();

    }

    The unzip method above fails, but if I use the ZipFile class and iterate through the entries in that file, extraction works. The problem is that the input stream I am reading the zip file from is also non-seekable, which the ZipFile class does not support. So my only option is to use ZipInputStream directly.

    Having dug into the SharpZipLib code, I tried many things to fix the problem and finally came up with a solution. It changes how the zip file is written, plus one place in the ZipFile class that needs to be modified to account for it.

    In the ZipOutputStream.PutNextEntry() method, the following lines currently read:

                    // For local header both sizes appear in Zip64 Extended Information
                    if (entry.LocalHeaderRequiresZip64 && patchEntryHeader) {
                        WriteLeInt(-1);
                        WriteLeInt(-1);
                    } else {
                        WriteLeInt(0);    // Compressed size
                        WriteLeInt(0);    // Uncompressed size
                    }

    I changed this to (only change being the && to a ||):

                    // For local header both sizes appear in Zip64 Extended Information
                    if (entry.LocalHeaderRequiresZip64 || patchEntryHeader) {
                        WriteLeInt(-1);
                        WriteLeInt(-1);
                    } else {
                        WriteLeInt(0);    // Compressed size
                        WriteLeInt(0);    // Uncompressed size
                    }

    Then, further down in the same method, the code currently reads:

    if (entry.LocalHeaderRequiresZip64 && (headerInfoAvailable || patchEntryHeader)) {
                    ed.StartNewEntry();
                    if (headerInfoAvailable) {
                        ed.AddLeLong(entry.Size);
                        ed.AddLeLong(entry.CompressedSize);
                    } else {
                        ed.AddLeLong(-1);
                        ed.AddLeLong(-1);
                    }
                    ed.AddNewEntry(1);

                    if (!ed.Find(1)) {
                        throw new ZipException("Internal error cant find extra data");
                    }

                    if (patchEntryHeader) {
                        sizePatchPos = ed.CurrentReadIndex;
                    }
                } else {
                    ed.Delete(1);
                }

    I changed this to (again, only the && changed to ||):

     if (entry.LocalHeaderRequiresZip64 || (headerInfoAvailable || patchEntryHeader)) {
                    ed.StartNewEntry();
                    if (headerInfoAvailable) {
                        ed.AddLeLong(entry.Size);
                        ed.AddLeLong(entry.CompressedSize);
                    } else {
                        ed.AddLeLong(-1);
                        ed.AddLeLong(-1);
                    }
                    ed.AddNewEntry(1);

                    if (!ed.Find(1)) {
                        throw new ZipException("Internal error cant find extra data");
                    }

                    if (patchEntryHeader) {
                        sizePatchPos = ed.CurrentReadIndex;
                    }
                } else {
                    ed.Delete(1);
                }
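    For reference, the local-header bytes the patched PutNextEntry is described as writing for an entry whose sizes are not yet known can be sketched as follows. This is an illustration built with BinaryWriter, not SharpZipLib's code, and the class and method names are hypothetical. Per the analysis in this post, the 0xFFFFFFFF size fields plus a Zip64 extra record (header ID 0x0001, 16 data bytes) are what tell a streaming reader to expect 8-byte sizes in the data descriptor.

```csharp
using System.IO;

// Illustration only (not SharpZipLib code): local-header size fields and the
// Zip64 extra record for an entry whose sizes are unknown when the header is written.
public static class Zip64LocalHeaderSketch
{
    const ushort Zip64ExtraTag = 0x0001; // "Zip64 extended information" header ID

    // 32-bit size fields set to 0xFFFFFFFF: "look in the Zip64 extra field".
    public static byte[] SizeFields()
    {
        var ms = new MemoryStream();
        using (var w = new BinaryWriter(ms)) // BinaryWriter writes little-endian
        {
            w.Write(uint.MaxValue); // compressed size
            w.Write(uint.MaxValue); // uncompressed size
        }
        return ms.ToArray();
    }

    // Zip64 extra record: tag, data length (16), then uncompressed and compressed
    // sizes. -1 is a placeholder because the real sizes are only known later.
    public static byte[] Zip64ExtraField(long size = -1, long compressedSize = -1)
    {
        var ms = new MemoryStream();
        using (var w = new BinaryWriter(ms))
        {
            w.Write(Zip64ExtraTag);
            w.Write((ushort)16);
            w.Write(size);           // uncompressed size first
            w.Write(compressedSize); // then compressed size
        }
        return ms.ToArray();
    }
}
```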

    The reason for this change is that the original code wrote the entry to the file without any indication at all that Zip64 was actually being used. When ZipInputStream reads the first entry, it reads and decompresses it fine, but then reads the data descriptor at the end incorrectly because it assumes 32-bit sizes (reading 4 bytes and then 4 bytes instead of 8 and then 8). The remaining 8 bytes of the descriptor are still next in the stream, so when it reads the header for the next entry it is 8 bytes off and fails with an invalid header.
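    To make the 8-byte drift concrete, here is a small sketch (not SharpZipLib code; the class name is hypothetical) that writes a Zip64 data descriptor, assuming it begins with the optional 0x08074b50 signature, and then reads it back with 4-byte size fields. 24 bytes are written but only 16 are consumed, leaving 8 bytes where the reader expects the next local header signature.

```csharp
using System.IO;

// Illustration only: why reading a Zip64 data descriptor with 32-bit size
// fields leaves the reader 8 bytes short.
public static class DescriptorDrift
{
    // Optional data descriptor signature "PK\x07\x08", assumed present here.
    const uint DataDescriptorSignature = 0x08074b50;

    // Zip64 descriptor: signature, CRC-32, then 8-byte compressed and uncompressed sizes.
    public static byte[] WriteZip64Descriptor(uint crc, long compressedSize, long size)
    {
        var ms = new MemoryStream();
        using (var w = new BinaryWriter(ms)) // little-endian, as ZIP requires
        {
            w.Write(DataDescriptorSignature);
            w.Write(crc);
            w.Write(compressedSize);
            w.Write(size);
        }
        return ms.ToArray(); // ToArray still works after the stream is closed
    }

    // Reads the descriptor as if sizes were 4 bytes each and reports how many
    // bytes are left over; those bytes would be misread as the next header.
    public static long BytesLeftAfter32BitRead(byte[] descriptor)
    {
        using (var r = new BinaryReader(new MemoryStream(descriptor)))
        {
            r.ReadUInt32(); // signature
            r.ReadUInt32(); // CRC-32
            r.ReadUInt32(); // compressed size, read as 4 bytes instead of 8
            r.ReadUInt32(); // uncompressed size, read as 4 bytes instead of 8
            return r.BaseStream.Length - r.BaseStream.Position;
        }
    }
}
```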

    Making this change fixes all of that. The zip file is not corrupted and can still be read by WinZip, WinRAR, and Windows XP Explorer (the only three third-party apps I tested with). As a further test, I unzipped the file with SharpZipLib's ZipFile class, and that started failing with my changes. I toyed with leaving the locations above unchanged and making the input stream read the file correctly instead, but what I landed on was a small change to the ZipFile class as well, since the problem wasn't with the file at all but with the verification steps inside the TestLocalHeader method. All I added there was a way to skip the size == entry.Size and compressedSize == entry.CompressedSize checks, because the sizes read from the header's Zip64 extra data are -1 for both, while the entry's sizes (now that the extra data is populated) are the file's actual sizes.

    ZipFile.TestLocalHeader code before:

    // Extra data / zip64 checks
                    if (ed.Find(1)) {
                        // TODO Check for tag values being distinct..  Multiple zip64 tags means what?

                        // Zip64 extra data but 'extract version' is too low
                        if (extractVersion < ZipConstants.VersionZip64) {
                            throw new ZipException(
                                string.Format("Extra data contains Zip64 information but version {0}.{1} is not high enough",
                                extractVersion / 10, extractVersion % 10));
                        }

                        // Zip64 extra data but size fields dont indicate its required.
                        if (((uint)size != uint.MaxValue) && ((uint)compressedSize != uint.MaxValue)) {
                            throw new ZipException("Entry sizes not correct for Zip64");
                        }

                        size = ed.ReadLong();
                        compressedSize = ed.ReadLong();
                    } else {
                        // No zip64 extra data but entry requires it.
                        if ((extractVersion >= ZipConstants.VersionZip64) &&
                            (((uint)size == uint.MaxValue) || ((uint)compressedSize == uint.MaxValue))) {
                            throw new ZipException("Required Zip64 extended information missing");
                        }
                    }

    Code after:

     // Extra data / zip64 checks
                    if (ed.Find(1)) {
                        // TODO Check for tag values being distinct..  Multiple zip64 tags means what?

                        // Zip64 extra data but 'extract version' is too low
                        if (extractVersion < ZipConstants.VersionZip64) {
                            throw new ZipException(
                                string.Format("Extra data contains Zip64 information but version {0}.{1} is not high enough",
                                extractVersion / 10, extractVersion % 10));
                        }

                        // Zip64 extra data but size fields dont indicate its required.
                        if (((uint)size != uint.MaxValue) && ((uint)compressedSize != uint.MaxValue)) {
                            throw new ZipException("Entry sizes not correct for Zip64");
                        }

                        size = ed.ReadLong();
                        compressedSize = ed.ReadLong();

                        size = (size == -1 ? 0 : size);
                        compressedSize = (compressedSize == -1 ? 0 : compressedSize);
                    } else {
                        // No zip64 extra data but entry requires it.
                        if ((extractVersion >= ZipConstants.VersionZip64) &&
                            (((uint)size == uint.MaxValue) || ((uint)compressedSize == uint.MaxValue))) {
                            throw new ZipException("Required Zip64 extended information missing");
                        }
                    }

    The change above sets the sizes to 0 if -1 was read from the extra data, so that the equality checks on these fields later in the same method are skipped. Another option would be to set them to the actual values in entry.Size and entry.CompressedSize and let the equality checks pass. I prefer skipping, because it avoids equality checks whose outcome is already known.
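    A minimal sketch of the normalization idea, with hypothetical names and InvalidOperationException in place of ZipException so it compiles without SharpZipLib. It assumes, as described above, that the later checks only compare a size when it is non-zero; the real TestLocalHeader logic may differ.

```csharp
using System;

// Hypothetical helper mirroring the TestLocalHeader change: a -1 read from the
// Zip64 extra field means "not recorded here", so it is mapped to 0 and the
// comparison against the entry's sizes is skipped.
public static class LocalHeaderSizeCheck
{
    public static long Normalize(long value)
    {
        return value == -1 ? 0 : value;
    }

    public static void Check(long headerSize, long headerCompressedSize,
                             long entrySize, long entryCompressedSize)
    {
        long size = Normalize(headerSize);
        long compressedSize = Normalize(headerCompressedSize);

        // Assumption: 0 means "unknown", so only non-zero values are compared.
        if (size != 0 && size != entrySize)
            throw new InvalidOperationException("Size mismatch");
        if (compressedSize != 0 && compressedSize != entryCompressedSize)
            throw new InvalidOperationException("Compressed size mismatch");
    }
}
```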

    All of the above changes fix everything I had, and still allow files >4GB to be compressed and decompressed to and from non-seekable input and output streams.

    Whew... I know that was long; I hope you're still reading!

    BTW, GREAT job on this library. It makes my life so much easier (this recent headache notwithstanding) ;)

  • 06-08-2008 11:11 AM In reply to

    Re: Unseekable output stream to zip to can not unzip due to Zip64

    Great work!  Thanks for the feedback.

    I even got to the still reading bit :-)

    I have updated the code based largely on what you have presented here.

     

    Cheers, -jr-

     

  • 07-29-2008 2:47 PM In reply to

    Re: Unseekable output stream to zip to can not unzip due to Zip64

    Any idea when this code update will be out? At the moment I am unable to use the library.

     regards

    Anders 

  • 08-07-2008 12:53 PM In reply to

    Re: Unseekable output stream to zip to can not unzip due to Zip64

    Very soon.  Before the end of the ice age for sure :-)

  • 10-25-2008 6:45 PM In reply to

    • Stas

    Re: Unseekable output stream to zip to can not unzip due to Zip64

    Something is wrong with this code:

        // Extra data / zip64 checks
        if (localExtraData.Find(1))
        {
         // TODO Check for tag values being distinct..  Multiple zip64 tags means what?

         // Zip64 extra data but 'extract version' is too low
         if (extractVersion < ZipConstants.VersionZip64)
         {
          throw new ZipException(
           string.Format("Extra data contains Zip64 information but version {0}.{1} is not high enough",
           extractVersion / 10, extractVersion % 10));
         }
         // ...
        }

    Because of it, I cannot access the entries of a certain .zip (actually .jar) file, which WinZip, WinRAR, and other programs handle perfectly. I get the error on this line of code:

    Dim zStream As Stream = zFile.GetInputStream(zEntry)

    This is file info from WinZip:

     Testing ...
    Current Location part 1 offset 266878
    Archive: C:\...\BDMV\JAR\00000.jar   266900 bytes   2008-07-22 04:30:48
    End central directory record PK0506 (4+18)
    ==========================================
        current  location of end-of-central-dir record: 266878 (0x0004127e) bytes
        expected location of end-of-central-dir record: 266878 (0x0004127e) bytes
          based on the size of the central directory of
          19099 and its relative offset of 247779 bytes
        part number of this part (0000):                1
        part number of start of central dir (0000):     1
        number of entries in central dir in this part:  267
        total number of entries in central dir:         267
        size of central dir:                            19099 (0x00004a9b) bytes
        relative offset of central dir:                 247779 (0x0003c7e3) bytes
        zipfile comment length:                         0
    Current Location part 1 offset 247779
    Central directory entry PK0102 (4+42): #1
    ======================================
        part number in which file begins (0000):        1
        relative offset of local header:                0 (0x00000000) bytes
        version made by operating system (11):          NTFS
        version made by zip software (23):              2.3
        operat. system version needed to extract (00):  MS-DOS, OS/2, NT FAT
        unzip software version needed to extract (20):  2.0
        general purpose bit flag (0x0000) (bit 15..0):  0000.0000 0000.0000
          file security status  (bit 0):                not encrypted
          extended local header (bit 3):                no
        compression method (08):                        deflated
          compression sub-type (deflation):             normal
        file last modified on (0x000038f5 0x00005354):  2008-07-21 10:26:40
        32-bit CRC value:                               0xf5c3a4e5
        compressed size:                                350 bytes
        uncompressed size:                              661 bytes
        length of filename:                             7 characters
        length of extra field:                          17 bytes
        length of file comment:                         0 characters
        internal file attributes:                       0x0000
          apparent file type:                           binary
        external file attributes:                       0x81b60020
          non-MSDOS external file attributes:           0x81b600
          MS-DOS file attributes (0x20):                arc
    Current Location part 1 offset 247825
        filename:a.class
    Current Location part 1 offset 247832
        extra field 0x4453 (Security Descriptor), 4 header and 4 data bytes:
        bc 00 00 00                                     ¼...
        extra field 0x5455 (universal time), 4 header and 5 data bytes:
        07 d0 c6 84 48                                  .ÐÆ„H
    Current Location part 1 offset 247849
    Central directory entry PK0102 (4+42): #2
    ======================================
        part number in which file begins (0000):        1

     And for local header:

    Current Location part 1 offset 0
    Local directory entry PK0304 (4+26): #1
    ------------------------------------
        operat. system version needed to extract (00):  MS-DOS, OS/2, NT FAT
        unzip software version needed to extract (20):  2.0
        general purpose bit flag (0x0000) (bit 15..0):  0000.0000 0000.0000
          file security status  (bit 0):                not encrypted
          extended local header (bit 3):                no
        compression method (08):                        deflated
          compression sub-type (deflation):             normal
        file last modified on (0x000038f5 0x00005354):  2008-07-21 10:26:40
        32-bit CRC value:                               0xf5c3a4e5
        compressed size:                                350 bytes
        uncompressed size:                              661 bytes
        length of filename:                             7 characters
        length of extra field:                          142 bytes
    Current Location part 1 offset 30
        filename:a.class
    Current Location part 1 offset 37
        extra field 0x4453 (Security Descriptor), 4 header and 101 data bytes:
        bc 00 00 00 00 08 00 d0 ad fd 5a 63 64 60 69 10 ¼......ЭýZcd`i.
        61 60 60 30 60 80 00 1f 20 66 64 05 33 59 45 81 a``0`... fd.3YE.
        84 b7 ce 1e a9 1f f3 b7 94 38 8b 73 19 7d 61 c4 „·Î.©.ó·”8‹s.}aÄ
        2d c7 c8 c4 c0 c0 c4 50 c0 c0 02 96 96 60 f8 cf -ÇÈÄÀÀÄPÀÀ.––`øÏ
        28 cf 00 12 03 a9 55 00 12 0a 60 b6 08 44 9c 11 (Ï...©U...`¶.Dœ.
        22 2e 04 a6 54 20 62 78 ec 04 99 b7 92 41 08 c5 "..¦T bxì..·’A.Å
        3c 45 20 1b 00                                  <E ..
        extra field 0x5455 (universal time), 4 header and 13 data bytes:
        07 d0 c6 84 48 c5 c7 84 48 d0 c6 84 48          .ÐÆ„HÅÇ„HÐÆ„H
        extra field 0x0001 (ZIP64 Tag), 4 header and 16 data bytes:
        95 02 00 00 00 00 00 00 5e 01 00 00 00 00 00 00 •.......^.......
        ZIP64 Tag Value(s):
          Value #1: 661
          Value #2: 350
    Current Location part 1 offset 179
    Testing a.class                  OK

     

    As you can see, this file does contain Zip64 extensions, but it declares version 2.0 as needed to extract, so a program reading it does not have to support Zip64, contrary to what SharpZipLib's check assumes.
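    As a cross-check of the dump above: the three extra records account for all 142 bytes of the local extra field (4+101, 4+13, 4+16), which puts the data at offset 30 + 7 + 142 = 179, as WinZip reports. The Zip64 values decode to 661 and 350, the same as the ordinary 32-bit size fields. A sketch of walking an extra field, similar in spirit to ZipExtraData.Find(1) but not SharpZipLib's implementation (class and method names are hypothetical):

```csharp
using System;
using System.Collections.Generic;

// Sketch: walk a ZIP extra field, a sequence of (2-byte tag, 2-byte length, data)
// records, all little-endian.
public static class ExtraFieldWalker
{
    // Returns (tag, data) pairs in the order they appear. Trailing bytes too
    // short to form a record header are ignored.
    public static List<KeyValuePair<ushort, byte[]>> Parse(byte[] extra)
    {
        var records = new List<KeyValuePair<ushort, byte[]>>();
        int pos = 0;
        while (pos + 4 <= extra.Length)
        {
            ushort tag = (ushort)(extra[pos] | (extra[pos + 1] << 8));
            int length = extra[pos + 2] | (extra[pos + 3] << 8);
            pos += 4;
            if (pos + length > extra.Length)
                throw new FormatException("Extra field record overruns the buffer");
            byte[] data = new byte[length];
            Array.Copy(extra, pos, data, 0, length);
            records.Add(new KeyValuePair<ushort, byte[]>(tag, data));
            pos += length;
        }
        return records;
    }

    // Reads a little-endian 64-bit value; a Zip64 record in a local header holds
    // the uncompressed size at offset 0 and the compressed size at offset 8.
    public static long ReadLeLong(byte[] data, int offset)
    {
        long value = 0;
        for (int k = 7; k >= 0; k--)
            value = (value << 8) | data[offset + k];
        return value;
    }
}
```

    In the test below the Security Descriptor payload is zero-filled, since only its length matters for the walk.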

    You can download the file for testing from here.

  • 10-26-2008 12:59 PM In reply to

    • Stas

    Re: Unseekable output stream to zip to can not unzip due to Zip64

    OK, I changed the code as below, recompiled the dll, and now it works perfectly! :)

                     // Extra data / zip64 checks
                    if (localExtraData.Find(1))
                    {
                        if (((uint)size == uint.MaxValue) || ((uint)compressedSize == uint.MaxValue))
                        {
                            size = localExtraData.ReadLong();
                            compressedSize = localExtraData.ReadLong();

                            if ((localFlags & (int)GeneralBitFlags.Descriptor) != 0)
                            {
                                // These may be valid if patched later
                                if ( (size != -1) && (size != entry.Size)) {
                                    throw new ZipException("Size invalid for descriptor");
                                }

                                if ((compressedSize != -1) && (compressedSize != entry.CompressedSize)) {
                                    throw new ZipException("Compressed size invalid for descriptor");
                                }
                            }
                        }
                    }
                    else
                    {
                        // No zip64 extra data but entry requires it.
                        if (((uint)size == uint.MaxValue) || ((uint)compressedSize == uint.MaxValue))
                        {
                            throw new ZipException("Required Zip64 extended information missing");
                        }
                    }
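    The rule this patch implements can be isolated into a small pure function for testing. This is a hypothetical restatement (names and the nullable parameters are illustrative), using InvalidOperationException instead of ZipException and leaving out the data-descriptor checks: the Zip64 record is consulted only when a 32-bit size field holds 0xFFFFFFFF, and the "version needed to extract" value is no longer checked.

```csharp
using System;

// Hypothetical distillation of the relaxed Zip64 rule in the patch above.
public static class RelaxedZip64Rule
{
    // zip64Size / zip64CompressedSize are null when the entry has no Zip64 extra record.
    public static void ResolveSizes(uint headerSize, uint headerCompressedSize,
                                    long? zip64Size, long? zip64CompressedSize,
                                    out long size, out long compressedSize)
    {
        bool needsZip64 = headerSize == uint.MaxValue || headerCompressedSize == uint.MaxValue;
        if (needsZip64)
        {
            // Header says "see Zip64", so the record must be present.
            if (zip64Size == null || zip64CompressedSize == null)
                throw new InvalidOperationException("Required Zip64 extended information missing");
            size = zip64Size.Value;
            compressedSize = zip64CompressedSize.Value;
        }
        else
        {
            // 32-bit sizes are authoritative; a Zip64 record, if present, is ignored.
            size = headerSize;
            compressedSize = headerCompressedSize;
        }
    }
}
```

    With the .jar above (32-bit sizes 661 and 350, plus a Zip64 record), the 32-bit values are used and no exception is thrown.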
