Hi,
I am using SharpZipLib to zip to an output stream that happens to be unseekable, which forces Zip64 on me. The problem is that when the resulting file is read with the ZipInputStream class (not the ZipFile class), GetNextEntry() has no way of determining that Zip64 was used: the compressed size and size in the local header are set to 0, and at the time PutNextEntry() in ZipOutputStream wrote the header, the extra data size was zero too. So no extra data exists to tell the reader that the file contains Zip64 information. As a result, ZipInputStream reads the two long values in the data descriptor as two int values and ends up off by 8 bytes, which causes the "Wrong Local header signature" exception.
Below is sample code that writes a file and reproduces the problem. IMPORTANT: the underlying stream MUST be non-seekable. To simulate that, I created a tiny class (Class1) that extends FileStream and overrides the CanSeek property to return false.
// Requires: using System; using System.Collections.Generic; using System.IO; using ICSharpCode.SharpZipLib.Zip;
public void zip(){
    String sourceDir = "C:\\Adrian Programming\\Test Data\\directory with sub directories\\subdirectory 2";
    // Class1 is a FileStream whose CanSeek returns false
    Class1 fsOut = new Class1("C:\\Adrian Programming\\Test Data Output\\test.zip", FileMode.Create, FileAccess.Write);
    ZipOutputStream zipper = new ZipOutputStream(fsOut);
    int i = 0;
    // Work list of paths relative to sourceDir; it grows as directories are expanded
    List<String> files = new List<String>();
    foreach (String dir in Directory.GetDirectories(sourceDir)) {
        files.Add(Path.GetFileName(dir));
    }
    foreach (String file in Directory.GetFiles(sourceDir)) {
        files.Add(Path.GetFileName(file));
    }
    while (i < files.Count) {
        FileInfo file = new FileInfo(Path.Combine(sourceDir, files[i]));
        if ((file.Attributes & FileAttributes.Directory) == FileAttributes.Directory) {
            DirectoryInfo di = new DirectoryInfo(file.FullName);
            DirectoryInfo[] subDirs = di.GetDirectories();
            FileInfo[] subFiles = di.GetFiles();
            if (subDirs.Length == 0 && subFiles.Length == 0) {
                // Empty directory: write an explicit directory entry
                ZipEntry ze = new ZipEntry(files[i] + "\\");
                zipper.PutNextEntry(ze);
                zipper.CloseEntry();
            } else {
                // Queue everything in this directory for processing
                foreach (DirectoryInfo subdir in subDirs) {
                    files.Add(Path.Combine(files[i], subdir.Name));
                }
                foreach (FileInfo subfile in subFiles) {
                    files.Add(Path.Combine(files[i], subfile.Name));
                }
            }
        } else {
            FileStream fsIn = file.OpenRead();
            ZipEntry ze = new ZipEntry(ZipEntry.CleanName(files[i]));
            if (!TimeZone.CurrentTimeZone.IsDaylightSavingTime(file.LastWriteTime)) {
                ze.DateTime = file.LastWriteTime.AddHours(1.0);
            } else {
                ze.DateTime = file.LastWriteTime;
            }
            zipper.PutNextEntry(ze);
            // Buffer is sized to the whole file (test files are all small)
            int bufferInBytes = (int)file.Length;
            byte[] buffer = new byte[bufferInBytes];
            int len = 0;
            while ((len = fsIn.Read(buffer, 0, bufferInBytes)) > 0) {
                zipper.Write(buffer, 0, len);
            }
            fsIn.Close();
            zipper.CloseEntry();
        }
        i++;
    }
    zipper.Close();
}
A few notes about the code above: the folder being compressed contains directories and files, and some of the directories are empty. Also, for this example I simply size the buffer to the whole file, because I know that no file in that folder is above 10k, so there are no memory issues.
Below is the code used to extract the data. It fails with "Wrong Local header signature" because, not having detected Zip64, it reads the file size value where it expects the next header.
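For anyone reproducing this, here is a minimal sketch of such a non-seekable wrapper. The name NonSeekableFileStream is a hypothetical stand-in for Class1, and the sketch assumes that overriding CanSeek alone is enough to put the writer on its non-seekable code path.

```csharp
using System.IO;

// Hypothetical stand-in for "Class1": a FileStream that reports itself
// as non-seekable so that consumers treat it as forward-only.
public class NonSeekableFileStream : FileStream
{
    public NonSeekableFileStream(string path, FileMode mode, FileAccess access)
        : base(path, mode, access)
    {
    }

    // CanSeek is a virtual property on Stream; only its reported value changes here.
    public override bool CanSeek
    {
        get { return false; }
    }
}
```

Note that this only changes what the stream reports. Seek and Position on the underlying FileStream are left untouched, so it simulates a non-seekable stream rather than enforcing one.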
public void unzip(){
    FileStream fsIn = new FileStream("C:\\Adrian Programming\\Test Data Output\\test.zip", FileMode.Open, FileAccess.Read);
    ZipInputStream zipInput = new ZipInputStream(fsIn);
    ZipEntry entry = zipInput.GetNextEntry();
    while (entry != null) {
        String fileName = Path.Combine("C:\\Adrian Programming\\Test Data Output\\test", entry.Name);
        if (entry.IsDirectory) {
            Directory.CreateDirectory(fileName);
        } else {
            // Make sure the parent directory exists before creating the file
            Directory.CreateDirectory(Path.GetDirectoryName(fileName));
            FileStream curFsOut = new FileStream(fileName, FileMode.Create, FileAccess.Write);
            int len = 0;
            int bufferSize = (int)fsIn.Length;
            byte[] buffer = new byte[bufferSize];
            while ((len = zipInput.Read(buffer, 0, bufferSize)) > 0) {
                curFsOut.Write(buffer, 0, len);
            }
            curFsOut.Close();
        }
        // This is the call that throws once the data descriptor has been misread
        entry = zipInput.GetNextEntry();
    }
    zipInput.Close();
}
The result is that the above method for unzipping the file fails, whereas using the ZipFile class and iterating through the ZipEntries in the file works. The problem, however, is that the input stream I am using to read the zip file is also non-seekable, which the ZipFile class does not support. So my only option is to use ZipInputStream directly.
Having dug into the SharpZipLib code, I tried many things to fix the problem and have finally come up with a solution. The fix is in how the zip file is written, plus one place where the ZipFile class needs to be modified to account for it.
In the ZipOutputStream.PutNextEntry() method, the following lines currently read:
// For local header both sizes appear in Zip64 Extended Information
if (entry.LocalHeaderRequiresZip64 && patchEntryHeader) {
    WriteLeInt(-1);
    WriteLeInt(-1);
} else {
    WriteLeInt(0); // Compressed size
    WriteLeInt(0); // Uncompressed size
}
I changed this to (only change being the && to a ||):
// For local header both sizes appear in Zip64 Extended Information
if (entry.LocalHeaderRequiresZip64 || patchEntryHeader) {
    WriteLeInt(-1);
    WriteLeInt(-1);
} else {
    WriteLeInt(0); // Compressed size
    WriteLeInt(0); // Uncompressed size
}
Then, lower in that same routine, the code currently reads:
if (entry.LocalHeaderRequiresZip64 && (headerInfoAvailable || patchEntryHeader)) {
    ed.StartNewEntry();
    if (headerInfoAvailable) {
        ed.AddLeLong(entry.Size);
        ed.AddLeLong(entry.CompressedSize);
    } else {
        ed.AddLeLong(-1);
        ed.AddLeLong(-1);
    }
    ed.AddNewEntry(1);
    if (!ed.Find(1)) {
        throw new ZipException("Internal error cant find extra data");
    }
    if (patchEntryHeader) {
        sizePatchPos = ed.CurrentReadIndex;
    }
} else {
    ed.Delete(1);
}
I changed this to (again, the only change is the first && becoming ||):
if (entry.LocalHeaderRequiresZip64 || (headerInfoAvailable || patchEntryHeader)) {
    ed.StartNewEntry();
    if (headerInfoAvailable) {
        ed.AddLeLong(entry.Size);
        ed.AddLeLong(entry.CompressedSize);
    } else {
        ed.AddLeLong(-1);
        ed.AddLeLong(-1);
    }
    ed.AddNewEntry(1);
    if (!ed.Find(1)) {
        throw new ZipException("Internal error cant find extra data");
    }
    if (patchEntryHeader) {
        sizePatchPos = ed.CurrentReadIndex;
    }
} else {
    ed.Delete(1);
}
The reason for this change is that the original code wrote the entry to the file without any indication that Zip64 was being used. When ZipInputStream reads the first entry, it reads and decompresses it fine, but the data descriptor at the end is read incorrectly because the reader assumes Zip32 (reading 4 bytes then 4 bytes instead of 8 then 8). The remaining 8 bytes of the descriptor are therefore still next in the stream, so when it comes time to read the header for the next entry, the real header is 8 bytes away and the read fails with an invalid header signature.
Making this change fixes all of that. The zip file is not corrupted and can still be read by WinZip, WinRAR, and Windows XP Explorer (the only three third-party apps I tested with). As a further test I tried to unzip the file using the ZipFile class in SharpZipLib, and this began to fail with my changes. I toyed with reverting the changes above and making the input stream read the file correctly instead, but what I landed on was a small change to the ZipFile class as well, since the problem wasn't with the file at all but with the verification steps inside the TestLocalHeader method. All I added was a way to skip the tests for size == entry.Size and compressedSize == entry.CompressedSize, because the sizes read from the header are now -1 for both, while the entry's sizes (now that the extra data is populated) are the actual sizes of the file.
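To make the 8-byte misalignment concrete, here is a small self-contained sketch that does not use SharpZipLib at all. The class and method names (DescriptorDemo, BuildSample, ReadNextSignature) are made up for illustration, and the CRC is a placeholder. It writes a Zip64-sized data descriptor followed by a local header signature, then reads it back both ways.

```csharp
using System;
using System.IO;

// Illustration only: these names are hypothetical, not SharpZipLib APIs.
public static class DescriptorDemo
{
    const uint DescriptorSignature = 0x08074b50;  // "PK\x07\x08"
    const uint LocalHeaderSignature = 0x04034b50; // "PK\x03\x04"

    // Writes a data descriptor with 8-byte sizes, followed by the
    // signature of the next local header.
    public static byte[] BuildSample(long compressedSize, long size)
    {
        var ms = new MemoryStream();
        var w = new BinaryWriter(ms);   // BinaryWriter writes little-endian, as ZIP requires
        w.Write(DescriptorSignature);
        w.Write((uint)0x12345678);      // placeholder CRC
        w.Write(compressedSize);        // 8 bytes
        w.Write(size);                  // 8 bytes
        w.Write(LocalHeaderSignature);
        w.Flush();
        return ms.ToArray();
    }

    // Reads the descriptor with either 8-byte or 4-byte size fields and
    // returns the 4 bytes a reader would then take for the next signature.
    public static uint ReadNextSignature(byte[] data, bool zip64)
    {
        var r = new BinaryReader(new MemoryStream(data));
        r.ReadUInt32();                 // descriptor signature
        r.ReadUInt32();                 // CRC
        if (zip64) { r.ReadInt64(); r.ReadInt64(); }
        else       { r.ReadInt32(); r.ReadInt32(); }
        return r.ReadUInt32();
    }
}
```

With BuildSample(100, 300), reading with zip64 set to true returns the local header signature, while reading with zip64 set to false returns 300: the low half of the uncompressed size lands exactly where the next signature is expected.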
ZipFile.TestLocalHeader Code before:
// Extra data / zip64 checks
if (ed.Find(1)) {
    // TODO Check for tag values being distinct.. Multiple zip64 tags means what?
    // Zip64 extra data but 'extract version' is too low
    if (extractVersion < ZipConstants.VersionZip64) {
        throw new ZipException(
            string.Format("Extra data contains Zip64 information but version {0}.{1} is not high enough",
            extractVersion / 10, extractVersion % 10));
    }
    // Zip64 extra data but size fields dont indicate its required.
    if (((uint)size != uint.MaxValue) && ((uint)compressedSize != uint.MaxValue)) {
        throw new ZipException("Entry sizes not correct for Zip64");
    }
    size = ed.ReadLong();
    compressedSize = ed.ReadLong();
} else {
    // No zip64 extra data but entry requires it.
    if ((extractVersion >= ZipConstants.VersionZip64) &&
        (((uint)size == uint.MaxValue) || ((uint)compressedSize == uint.MaxValue))) {
        throw new ZipException("Required Zip64 extended information missing");
    }
}
Code After:
// Extra data / zip64 checks
if (ed.Find(1)) {
    // TODO Check for tag values being distinct.. Multiple zip64 tags means what?
    // Zip64 extra data but 'extract version' is too low
    if (extractVersion < ZipConstants.VersionZip64) {
        throw new ZipException(
            string.Format("Extra data contains Zip64 information but version {0}.{1} is not high enough",
            extractVersion / 10, extractVersion % 10));
    }
    // Zip64 extra data but size fields dont indicate its required.
    if (((uint)size != uint.MaxValue) && ((uint)compressedSize != uint.MaxValue)) {
        throw new ZipException("Entry sizes not correct for Zip64");
    }
    size = ed.ReadLong();
    compressedSize = ed.ReadLong();
    size = (size == -1 ? 0 : size);
    compressedSize = (compressedSize == -1 ? 0 : compressedSize);
} else {
    // No zip64 extra data but entry requires it.
    if ((extractVersion >= ZipConstants.VersionZip64) &&
        (((uint)size == uint.MaxValue) || ((uint)compressedSize == uint.MaxValue))) {
        throw new ZipException("Required Zip64 extended information missing");
    }
}
The change above sets the size to 0 if a -1 was read from the stream, so that the equality checks on these fields later in the same routine are skipped. Another option would be to set them to the actual values found in entry.Size and entry.CompressedSize and let the equality checks run. I prefer setting them to 0 because it skips unnecessary equality checks when you know the values will be the same.
All of the above changes fix the problems I had and still allow files >4GB to be compressed and decompressed to and from non-seekable input and output streams.
-Whew.. I know that was long, I hope you're still reading!
BTW, GREAT job on this library. It makes my life so much easier, this recent headache notwithstanding ;)