SharpDevelop Community

Getting the most from SharpZipLib

Last post 01-11-2011 9:05 AM by DavidPierson. 3 replies.
  • 01-07-2011 12:44 AM

    Getting the most from SharpZipLib

    We've been using SharpZipLib on our project for a while now. Among one or two other uses, our closed-box application uses SharpZipLib to collect log files from a variety of places into a support zip. Along the way, there were things we couldn't figure out how to do, so we punted on them. We're now going back to see what we can do about them.

    Right now, our code looks roughly like this - exception handling, return value logic, etc. are trimmed out for clarity:

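    Create() {
        // Create the archive and open a batch update
        _zipfile = ZipFile.Create(_name);
        _zipfile.BeginUpdate();
    }

    AddEntry(string name, string filepath) {
        // Queue the file; nothing is written until CommitUpdate()
        _zipfile.Add(filepath, name);
    }

    Dispose() {
        // All of the actual zipping happens here
        _zipfile.CommitUpdate();
        _zipfile.Close();
    }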

    Brain-dead simple, right?

    Issues:

    • This implementation currently does everything interesting during the CommitUpdate() and gives me no progress feedback.
    • I can't use FastZip to get feedback because the files I am collecting are spread out here and there all over the file system. 
    • Because the zip gets to be quite large, I need to create it in one shot, not have it repeatedly copied and grown, so I can't break up the operation. 
    • It doesn't read from files with the sharing bits set, so I can't grab logs that are open for write, for instance. I have to copy them first. Since I have a lot of files, this is a lot of copying and it takes significant time.

    I would like to be able to do the following:

     

    • Open files with read sharing. 
    • Get some kind of event as each file is opened.
    • Get notified of any file open failures, so that I can log them without aborting the process of adding files to the zip.
    • Copy files from all over the disks into a single zip file.
    • In a second pass, without re-copying the entire zip, add a file that includes a list of the paths of the files it couldn't copy and why each one couldn't be opened.
    I also have the following questions:
    • Handles: Does SharpZipLib open the file handles at AddEntry time or right before it processes the file inside CommitUpdate?
    • Errors: If I want to append an error file to the zip after the rest of the processing is complete, is there a way I can do that without forcing it to copy the zip file again? (Perhaps if I supply a zero-length file as the last file initially and then update it?)
    • Scale: How big will SharpZipLib permit these files to be? (Assume the 64 flag is on.) How many files can they contain? Will a single Zip manage hundreds of gigabytes of 100 KB files, for instance? Or are there system resources that are held onto during the creation of the zip that limit how big we can go?
    • Version: I am currently using version 0.86 from back in May when I first put it into my project. Does the CommitUpdate bug with silent file open failures in that version affect both ZipFile and FastZip? Or just FastZip? It might help explain some odd behaviors we've seen.
    If there are things I simply can't do, I need to know that so I can make alternate plans. If I need to make some architectural adjustments, my project needs to tackle them sooner rather than later. 
    Thanks,
    Lupestro

     

  • 01-07-2011 2:45 AM In reply to

    Re: Getting the most from SharpZipLib

    Hi Lupestro,

    Firstly, thank you for a most interesting post. I wish all our posters were this good at describing things :-)

    I don't have a lot of time spare today so this post will be brief, but hope to give you something to go on.

    Have you seen the sample code in the wiki? I have done a lot of work on these, and I'm sure they would not have been there when you were writing your application originally. The most recent sample covers "Create a zip with FastZip using progress events".

    SharpZipLib - Code Reference
    FastZip
    Create a Zip with full control over contents

    You could use the progress events as illustrated in the FastZip sample, but that still won't solve the "I can't grab logs that are open for write" problem.

    If it were me, I would change to using ZipFile or ZipOutputStream instead. Of those two, ZipFile is the better choice for your case.

    I suggest using the code in the "Create a Zip with full control over contents" sample. You can see how straight away it gives you total control over every aspect of your operation. You can change the File.Open parameters so you can read the in-use files, you can add error handling in a much easier way, and you can have progress events in whichever level of granularity suits you.
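A minimal sketch of that approach, assuming ZipOutputStream is used to write the entries. The names `SupportZip`, `Collect` and `onFileOpened` are hypothetical and not taken from the wiki sample, and the buffer size and compression level are arbitrary choices. It opens each file with read sharing, reports each successful open through a callback, and records open failures instead of aborting:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using ICSharpCode.SharpZipLib.Core;
using ICSharpCode.SharpZipLib.Zip;

public static class SupportZip
{
    // Hypothetical helper. 'files' maps entry name (key) to file path (value).
    // Returns one "path: reason" string per file that could not be opened.
    public static List<string> Collect(
        string zipPath,
        IEnumerable<KeyValuePair<string, string>> files,
        Action<string> onFileOpened)
    {
        var failures = new List<string>();
        byte[] buffer = new byte[4096];

        using (FileStream fsOut = File.Create(zipPath))
        using (ZipOutputStream zipStream = new ZipOutputStream(fsOut))
        {
            zipStream.SetLevel(3);            // 0 = store only, 9 = best compression
            zipStream.UseZip64 = UseZip64.On; // the "64 flag" from the question

            foreach (KeyValuePair<string, string> file in files)
            {
                FileStream input;
                try
                {
                    // FileShare.ReadWrite lets us read files that another
                    // process still has open for writing.
                    input = File.Open(file.Value, FileMode.Open,
                                      FileAccess.Read, FileShare.ReadWrite);
                }
                catch (IOException ex)
                {
                    failures.Add(file.Value + ": " + ex.Message);
                    continue; // log and keep going
                }
                catch (UnauthorizedAccessException ex)
                {
                    failures.Add(file.Value + ": " + ex.Message);
                    continue;
                }

                using (input)
                {
                    if (onFileOpened != null) onFileOpened(file.Value);

                    // Size is deliberately not set: a log that is still being
                    // written may grow between open and copy.
                    ZipEntry entry = new ZipEntry(ZipEntry.CleanName(file.Key));
                    entry.DateTime = File.GetLastWriteTime(file.Value);
                    zipStream.PutNextEntry(entry);
                    StreamUtils.Copy(input, zipStream, buffer);
                    zipStream.CloseEntry();
                }
            }
        }
        return failures;
    }
}
```

Read errors that occur after an entry has been started are not handled here; a real implementation would need to decide what to do with a partially written entry.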

    Re your questions:

    • Handles: Does SharpZipLib open the file handles at AddEntry time or right before it processes the file inside CommitUpdate?
      - will get back to you on that.
    • Errors: If I want to append an error file to the zip after the rest of the processing is complete, is there a way I can do that without forcing it to copy the zip file again? (Perhaps if I supply a zero-length file as the last file initially and then update it?)
      - with ZipFile you just add that file at the end of the process. No problem.
    • Scale: How big will SharpZipLib permit these files to be? (Assume the 64 flag is on.) How many files can they contain? Will a single Zip manage hundreds of gigabytes of 100 KB files, for instance? Or are there system resources that are held onto during the creation of the zip that limit how big we can go?
      - The limit is gigantic, way bigger than any disk drive, since Zip64 stores sizes as 64-bit values.
    • Version: I am currently using version 0.86 from back in May when I first put it into my project. Does the CommitUpdate bug with silent file open failures in that version affect both ZipFile and FastZip? Or just FastZip? It might help explain some odd behaviors we've seen.
      - I first thought the bug was FastZip only, but my mistake: it was in CommitUpdate, which is in ZipFile. It only affected the BeginUpdate / CommitUpdate path. And the bug is fixed in 0.86, in case that's not clear.
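As a rough illustration of the "Errors" answer, here is a sketch of adding a report entry to an existing archive with ZipFile. `StringDataSource`, `ErrorReport` and `Append` are hypothetical names invented for this example. The sketch only shows the calls involved; it does not settle whether the library rewrites the archive during CommitUpdate, which is worth measuring on a large zip.

```csharp
using System.IO;
using System.Text;
using ICSharpCode.SharpZipLib.Zip;

// Hypothetical data source that serves an in-memory string as entry content.
public class StringDataSource : IStaticDataSource
{
    private readonly byte[] _bytes;

    public StringDataSource(string text)
    {
        _bytes = Encoding.UTF8.GetBytes(text);
    }

    public Stream GetSource()
    {
        return new MemoryStream(_bytes);
    }
}

public static class ErrorReport
{
    // Opens an existing zip and adds one text entry holding the report.
    public static void Append(string zipPath, string entryName, string reportText)
    {
        ZipFile zipFile = new ZipFile(zipPath);
        try
        {
            zipFile.BeginUpdate();
            zipFile.Add(new StringDataSource(reportText), entryName);
            zipFile.CommitUpdate();
        }
        finally
        {
            zipFile.Close();
        }
    }
}
```

An alternative that avoids a second pass altogether is to write the report as the last entry before the output stream is closed, once all open failures are known.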

    Got to go. More later.
    Regards,
    David

  • 01-08-2011 4:10 AM In reply to

    Re: Getting the most from SharpZipLib

    DavidPierson:

    Have you seen the sample code in the wiki? I have done a lot of work on these, and I'm sure they would not have been there when you were writing your application originally. The most recent sample covers "Create a zip with FastZip using progress events".

    I hadn't. The samples are great.

    DavidPierson:

    I suggest using the code in the "Create a Zip with full control over contents" sample. You can see how straight away it gives you total control over every aspect of your operation. You can change the File.Open parameters so you can read the in-use files, you can add error handling in a much easier way, and you can have progress events in whichever level of granularity suits you.

    Thanks! That was precisely what I needed.

    I reworked my code today to operate that way and tested it with 50,000 files of 100 KB to 150 KB each. It took about half an hour to collect the 6 GB of files. The files are already individually compressed media, and I verified this afternoon that with a compression level of 3 on the zip, no further significant compression occurs.

    Question: If I were to drop the compression from 3 to 0 on the zip, should I expect a significant improvement in the time it takes to produce the zip file? Or is it unlikely to matter? (Yes, I will try it Monday.) Any suggestions on other measures I might try for squeezing the most speed I can out of the operation?

  • 01-11-2011 9:05 AM In reply to

    Re: Getting the most from SharpZipLib

    Lupestro:
    If I were to drop the compression from 3 to 0 on the zip, should I expect a significant improvement in the time it takes to produce the zip file? Or is it unlikely to matter? (Yes, I will try it Monday.) Any suggestions on other measures I might try for squeezing the most speed I can out of the operation?

    I'll have to admit I have not done any comparative timings, but setting the level to 0 would have to help. Ultimately it will depend on the relative bottlenecks of CPU versus bus and disk. But I would recommend setting it to zero when you know the file type won't compress. Do let us know what happens if you try it!
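One way to get comparative timings is a small harness like the sketch below. `ZipTiming` and `TimeZip` are hypothetical names, and the 64 KB buffer is an arbitrary choice. It zips the same files at a given level and returns the elapsed time, so levels 0 and 3 can be compared on the real data:

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using ICSharpCode.SharpZipLib.Core;
using ICSharpCode.SharpZipLib.Zip;

public static class ZipTiming
{
    // Zips 'filePaths' into 'zipPath' at the given level (0 = store only,
    // 9 = best compression) and returns how long it took.
    public static TimeSpan TimeZip(string zipPath, IEnumerable<string> filePaths, int level)
    {
        byte[] buffer = new byte[64 * 1024];
        Stopwatch watch = Stopwatch.StartNew();

        using (FileStream fsOut = File.Create(zipPath))
        using (ZipOutputStream zipStream = new ZipOutputStream(fsOut))
        {
            zipStream.SetLevel(level);

            foreach (string path in filePaths)
            {
                ZipEntry entry = new ZipEntry(ZipEntry.CleanName(path));
                zipStream.PutNextEntry(entry);
                using (FileStream input = File.Open(path, FileMode.Open,
                                                    FileAccess.Read, FileShare.ReadWrite))
                {
                    StreamUtils.Copy(input, zipStream, buffer);
                }
                zipStream.CloseEntry();
            }
        }

        watch.Stop();
        return watch.Elapsed;
    }
}
```

Running each level more than once helps, because the operating system's file cache can make the second run over the same files look faster regardless of the level.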
