1. Let's talk about zip files

Another thing I'm working on and have half-completed (along with structs and hashes) is zip files. I've got miniz implemented in the backend and I'm exposing its API to the frontend via a sequence of pointers and define_c_func/proc.

Question is: what does a good, simple, Euphoria-like zip API look like? Let's discuss! (For reference, the miniz archive API lives here: miniz_zip.h) I'm thinking that a zip file library is basically a combination of two things: file system (dirs/files, names/sizes, etc.) and file I/O (open/close, read/write, etc.).

With that, the typical operations are going to be: creating zip files, inspecting zip files, and extracting zip files. So to start with, here's what I propose. (There will be additional routines for doing more advanced things with zip files, but I think this covers most use cases.)

Zip file routines

include std/zip.e 
 
--** 
-- Open a zip file for reading or writing. 
public function zip_open( sequence filename, sequence mode, integer flags=0, atom archive_start=0, atom archive_size=0 ) 
 
--** 
-- Close (and if writing, finalize) a zip file. 
public function zip_close( atom zip, integer finalize=TRUE ) 
 
--** 
-- Add an entry to a zip file. 
public function zip_add( atom zip, sequence source_name, sequence archive_name="", integer flags=ZIP_DEFAULT_COMPRESSION, sequence comment="" ) 
 
--** 
-- Extract an entry from a zip file. 
public function zip_extract( atom zip, sequence archive_name, sequence destination_name="", integer flags=0 ) 
 
--** 
-- Locate an entry in a zip file. 
public function zip_find( atom zip, sequence filename, integer flags=0, sequence comment="" ) 
 
--** 
-- List the entries in a zip file, filtered by an optional pattern. 
public function zip_dir( atom zip, sequence pattern="" ) 

Creating zip files

include std/zip.e 
 
-- open zip file for writing 
atom zip = zip_open( "test.zip", "w" ) 
 
-- add entry using its base name "file1.txt" 
zip_add( zip, "path/to/file1.txt" ) 
 
-- add entry using specific name "path/file2.txt" 
zip_add( zip, "path/to/file2.txt", "path/file2.txt" ) 
 
-- add entry using its base name and no compression 
zip_add( zip, "path/to/file3.txt",, ZIP_NO_COMPRESSION ) 
 
-- finalize (write directory) and close the zip file 
zip_close( zip ) 

Inspecting zip files

include std/zip.e 
include std/filesys.e -- for D_NAME and D_SIZE, unless std/zip.e re-exports them 
 
-- open zip file for reading 
atom zip = zip_open( "test.zip", "r" ) 
 
-- find a specific file (returns ordinal number or -1 if not found) 
? zip_find( zip, "file3.txt" ) 
 
-- list all entries, with optional wildcard (same input/output as dir()) 
sequence files = zip_dir( zip ) 
for i = 1 to length( files ) do 
    printf( 1, "name: \"%s\", size: %d bytes\n", {files[i][D_NAME],files[i][D_SIZE]} ) 
end for 
 
-- close the zip file 
zip_close( zip ) 

Extracting zip files

include std/zip.e 
 
-- open zip file for reading 
atom zip = zip_open( "test.zip", "r" ) 
 
-- extract the entry (and its path) to the current directory 
zip_extract( zip, "path/file2.txt" ) 
 
-- extract the entry to a specific destination (to ignore path) 
zip_extract( zip, "path/file2.txt", "file.txt" ) 
 
-- close the zip file 
zip_close( zip ) 
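
For comparison while discussing the API shape: Python's standard zipfile module splits along the same two lines mentioned at the top of this post, a "file system" side (entry names and sizes) and a "file I/O" side (open and read an entry). This is only a sketch of how another language exposes that split, not the proposed std/zip.e; the entry names are made up for illustration and the archive is built in memory so nothing touches disk.

```python
import io
import zipfile

# Build a small archive in memory so the example needs no files on disk.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("path/file2.txt", "hello from file2\n")
    archive.writestr("file3.txt", "hello from file3\n")

with zipfile.ZipFile(buffer, "r") as archive:
    # "File system" side: enumerate entries with their names and sizes.
    listing = [(info.filename, info.file_size) for info in archive.infolist()]
    # "File I/O" side: open one entry and read from it like a stream.
    with archive.open("path/file2.txt") as entry:
        first_line = entry.readline()

print(listing)
print(first_line)
```

The listing half corresponds loosely to zip_dir() and zip_find() above, and the stream half to what zip_extract() would do internally.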

Thoughts, comments, opinions?

-Greg


2. Re: Let's talk about zip files

I was curious what Pete was doing for zip files in Phix and I found LiteZip. I can offer a few advantages with what I'm planning:

  • Miniz is more liberally licensed (MIT) than LiteZip (LGPL3)
  • Miniz compiles directly into the backend of Euphoria on all platforms
  • Miniz is much smaller (adds ~100KB to the backend and ~50KB to the library)
  • Miniz includes zlib routines which will also be included in the library (std/zlib.e)
  • Miniz is still actively maintained by its developers (LiteZip is from 2008?)

-Greg


3. Re: Let's talk about zip files

That's pretty cool, Greg. Having zip support will be nice. Although I think making a zip/compression algorithm that could, say, compress a 1GB file down to a mere 500KB-1MB file would be amazing.


4. Re: Let's talk about zip files

ghaberek said...

I was curious what Pete was doing for zip files in Phix and I found LiteZip. I can offer a few advantages with what I'm planning:

  • Miniz is more liberally licensed (MIT) than LiteZip (LGPL3)
  • Miniz compiles directly into the backend of Euphoria on all platforms
  • Miniz is much smaller (adds ~100KB to the backend and ~50KB to the library)
  • Miniz includes zlib routines which will also be included in the library (std/zlib.e)
  • Miniz is still actively maintained by its developers (LiteZip is from 2008?)

-Greg

I expect you've also seen https://github.com/kuba--/zip (which wraps miniz)
Apparently that is easily compiled into a dll (/so?) but for me it's just an exercise in frustration.
I would quite probably jump ship to that or similar, and happily, if only I could actually build it.
As I've noted there is an annoying niggle with LiteZip that will probably never ever be fixed,
plus it's still 32-bit only and completely untested on Linux, both of which are not exactly ideal.
Not that it really matters but technically LiteZip is 99K all in, or just 59K for extract-only,
though I accept that argument would be blown out of the water for anyone shipping a 64-bit app.
Regarding "directly into the backend" I trust you've considered and covered the eu2c implications.

PS Good find, it astonishes me how hard it is to find a decent zip component.

PPS One thing you may have missed is the ability to delete entries from a zip file? (no biggie)


5. Re: Let's talk about zip files

petelomax said...

I expect you've also seen https://github.com/kuba--/zip (which wraps miniz)

Yes I came across that as well. I'm basically doing the same thing by using miniz for the "low level" functions and crafting the "high level" functions directly in std/zip.e.

petelomax said...

Apparently that is easily compiled into a dll (/so?) but for me it's just an exercise in frustration.

Miniz and "kubazip" (that doesn't really have a name, does it?) both use CMake, but it's not really necessary.

Kubazip already includes the amalgamated source for miniz which makes it easier. A quick test on my Ubuntu system:

$ git clone https://github.com/kuba--/zip kubazip 
$ cd kubazip 
$ gcc -shared -fPIC -O2 -s -D_GNU_SOURCE -Isrc/ -o libkubazip.so src/zip.c 

If you just want miniz without "kubazip" you could build with all the source files. You just need to create miniz_export.h first (running CMake would do this).

$ git clone https://github.com/richgel999/miniz 
$ cd miniz 
$ printf "#ifndef MINIZ_EXPORT\n#define MINIZ_EXPORT\n#endif\n" > miniz_export.h 
$ gcc -shared -fPIC -O2 -s -D_GNU_SOURCE -o libminiz.so miniz*.c 

Note: Adding -D_GNU_SOURCE is required for large file support on 64-bit. I don't think it causes any harm to leave in when building for 32-bit either.

petelomax said...

I would quite probably jump ship to that or similar, and happily, if only I could actually build it.

Here's a Makefile that should work on Windows or Linux. Hope that helps.

CC = gcc$(EXE_EXT) 
CFLAGS = -fPIC -O2 -s 
TARGET = $(LIB_PRE)miniz$(LIB_EXT) 
 
ifeq ($(OS),Windows_NT) 
    EXE_EXT = .exe 
    LIB_EXT = .dll 
else 
    LIB_PRE = lib 
    LIB_EXT = .so 
    CFLAGS += -D_GNU_SOURCE 
endif 
 
$(TARGET) : $(wildcard miniz*.c) 
	$(CC) -shared $(CFLAGS) -o $@ $^ 

petelomax said...

As I've noted there is an annoying niggle with LiteZip that will probably never ever be fixed,
plus it's still 32-bit only and completely untested on Linux, both of which are not exactly ideal.

Miniz hasn't had an actual tagged release in a while, but it still sees very frequent commits. Pretty sure it'll run on darn near anything.

petelomax said...

Not that it really matters but technically LiteZip is 99K all in, or just 59K for extract-only,
though I accept that argument would be blown out of the water for anyone shipping a 64-bit app.

Using the Makefile I provided above, libminiz.so is 95K on my 64-bit Ubuntu system. When building into the interpreter, be_miniz.o is only 86K. libkubazip.so is about 120K.

Keep in mind that miniz also provides the lower-level zlib deflate/inflate functions as well, which I'm going to wrap separately in std/zlib.e.

petelomax said...

Regarding "directly into the backend" I trust you've considered and covered the eu2c implications.

Yes, indeed. What I've done is package the amalgamated files into be_miniz.h and be_miniz.c, which get compiled into the interpreter directly and into the translator's static library.

petelomax said...

PS Good find, it astonishes me how hard it is to find a decent zip component.

There are a few out there, but I wanted to provide some functionality directly in Euphoria and leave the "Swiss Army tool" functionality to shared libraries.

If you're looking for a one-stop library for all your archiving and compression needs, I'd recommend libarchive. For encryption and hashing, I'd recommend libtomcrypt.

petelomax said...

PPS One thing you may have missed is the ability to delete entries from a zip file? (no biggie)

I wouldn't say I've missed it; none of this is fully-baked yet. What I showed in the first post was just an example of what I'm putting together.

Zip files are weird in that "deleting" entries isn't really a thing. You basically have three options:

  1. Remove the entry from the central directory and zero-out its data in the file (does not reduce the file size at all).
  2. Perform step #1 above but then "shift" all of the other entries down to close the gap and rewrite the central directory (complicated, possibly destructive).
  3. Create a new zip file and copy all the entries, excluding what you're removing, and then delete the original zip file (simpler, slower, uses more disk space).

Obviously #3 is the safest approach so I'll probably implement that in std/zip.e and document it similarly to db_compress() which basically does the same thing.
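
To make option #3 concrete, here is a minimal sketch using Python's standard zipfile module rather than the planned std/zip.e (whose final API isn't settled). The function name zip_remove is hypothetical. It copies every entry except the ones being removed into a temporary archive in the same directory, then swaps that copy in for the original. Note that this simple version decompresses and recompresses each entry instead of copying the raw compressed bytes.

```python
import os
import tempfile
import zipfile

def zip_remove(zip_path, names_to_remove):
    """Rewrite zip_path without the named entries (copy, then swap)."""
    names_to_remove = set(names_to_remove)
    dir_name = os.path.dirname(os.path.abspath(zip_path))
    # Create the copy next to the original so the final rename stays on one volume.
    fd, tmp_path = tempfile.mkstemp(suffix=".zip", dir=dir_name)
    os.close(fd)
    try:
        with zipfile.ZipFile(zip_path, "r") as src, \
             zipfile.ZipFile(tmp_path, "w") as dst:
            for info in src.infolist():
                if info.filename in names_to_remove:
                    continue  # skip the entries being "deleted"
                # Reusing the ZipInfo keeps the name, timestamp and compression method.
                dst.writestr(info, src.read(info.filename))
        os.replace(tmp_path, zip_path)  # swap the copy in for the original
    except BaseException:
        # Clean up the partial copy and leave the original archive untouched.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

Calling zip_remove("test.zip", ["path/file2.txt"]) would leave test.zip with every entry except that one. The trade-off is the one listed above: simple and safe, but it temporarily needs disk space for a second archive.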

-Greg


6. Re: Let's talk about zip files

Icy_Viking said...

That's pretty cool, Greg. Having zip support will be nice. Although I think making a zip/compression algorithm that could, say, compress a 1GB file down to a mere 500KB-1MB file would be amazing.

This is only using the basic DEFLATE compression algorithm, so I wouldn't expect huge gains. For really good compression we'd have to look at something like LZMA (LZ4 is tuned for speed rather than compression ratio).
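
How much DEFLATE gains depends almost entirely on the input. Here is a small sketch using Python's standard zlib module, which implements DEFLATE; the helper name deflate_ratio and the sample data are made up for illustration. Highly repetitive data shrinks dramatically, while random data barely shrinks at all. DEFLATE's best case is roughly 1000:1, so 1GB down to about 1MB is only reachable for almost perfectly redundant input.

```python
import os
import zlib

def deflate_ratio(data, level=9):
    """Return compressed size divided by original size using zlib's DEFLATE."""
    return len(zlib.compress(data, level)) / len(data)

repetitive = b"Euphoria " * 100_000   # highly redundant input (900,000 bytes)
random_like = os.urandom(900_000)     # effectively incompressible input

print(f"repetitive:  {deflate_ratio(repetitive):.4f}")
print(f"random-like: {deflate_ratio(random_like):.4f}")
```

Typical files (text, source code, executables) land somewhere between these two extremes.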

As I mentioned to Pete, libarchive is a good solution for this that someone should be able to wrap for use with Euphoria.

-Greg


7. Re: Let's talk about zip files

ghaberek said...

Here's a Makefile that should work on Windows or Linux. Hope that helps.

Sadly, and I could almost have predicted this, or at least the brash attitude part of it:

makefile:15: *** missing separator.  Stop. 


8. Re: Let's talk about zip files

petelomax said...
ghaberek said...

Here's a Makefile that should work on Windows or Linux. Hope that helps.

Sadly, and I could almost have predicted this, or at least the brash attitude part of it:

makefile:15: *** missing separator.  Stop. 

Ah, Makefiles are a fickle beast. Apparently I didn't copy/paste hard enough and lost a tab.

Sorry about that. Replace the leading spaces on that line with a hard tab and it should work.

I've updated my post with the correct tabbage.

-Greg
