1. compression problem...
- Posted by sephiroth _ <euman2376 at yahoo.com> Sep 25, 2001
- 387 views
i'm developing a new compression algorithm designed for plain text documents (no ASCII values above 7 bits, or ASCII 127). here's how it works: my program takes all the bits of every byte, replaces the last bit with the first bit of the next byte. then that first bit of the next byte is removed and a 0 inserted at the end. this process is repeated until the last byte - 1. this should leave some ASCII 0 bytes after a while, right? well for some odd reason, it doesn't...i dunno, maybe i'm having an off day and made a mistake in my code. in either case, here it is:

[comp1.ex]

    include file.e
    include get.e
    include machine.e

    function lof(integer fn)
        integer pos,eof
        pos=where(fn)
        if seek(fn,-1) then end if
        eof=where(fn)
        if seek(fn,pos) then end if
        return eof
    end function

    with trace
    sequence stuff,cmd,inf,outf,stuff2,tmp,tmp2
    integer fn,decompress
    decompress=0
    cmd=command_line()
    if length(cmd) < 4 then
        puts(1,"Experimental Plain Text Compression\n")
        puts(1,"usage: ex eptc [-d] infile outfile\n")
        puts(1,"\nDefault behavior is compression. -d causes it to decompress\n")
        abort(0)
    end if
    if equal(cmd[3],"-d") then
        decompress=1
        cmd=cmd[1..2]&cmd[4..length(cmd)]
    end if
    inf=cmd[3]
    outf=cmd[4]
    if not decompress then
        puts(1,"Compressing...\n")
        fn=open(inf,"rb")
        if fn=-1 then
            printf(1,"Unable to open `%s'\n",inf)
            abort(1)
        end if
        stuff=get_bytes(fn,lof(fn))
        close(fn)
        stuff2=""
        for i = 1 to length(stuff)-1 do
            tmp=int_to_bits(stuff[i],8)
            if tmp[8] != 0 then
                puts(1,"Error: this file is not all plain text. Convert it to plain text(if it's a Word document or something) and remove all graphics characters, then try again\n")
                abort(3)
            end if
            tmp2=int_to_bits(stuff[i+1],7)
            tmp=tmp[1..7]&tmp2[1]
            stuff[i+1]=bits_to_int(tmp2[2..length(tmp2)]&0)
            stuff2 &= bits_to_int(tmp)
        end for
        stuff2 &= stuff[length(stuff)]
        stuff={}
        for i = 1 to length(stuff2) do
            stuff2[i]=int_to_bits(stuff2[i],8)
            ? stuff2[i]
            if not equal(stuff2[i],repeat(0,8)) then
                stuff &= bits_to_int(stuff2[i])
            end if
        end for
        fn=open(outf,"wb")
        if fn=-1 then
            printf(1,"Unable to open `%s' for writing\n",outf)
            abort(2)
        end if
        puts(fn,stuff)
        close(fn)
    else
        puts(1,"Decompressing...")
        fn=open(inf,"rb")
        if fn=-1 then
            printf(1,"Unable to open `%s'\n",inf)
            abort(1)
        end if
        stuff=get_bytes(fn,lof(fn))
        close(fn)
        for i = 1 to length(stuff) do
            stuff[i]=int_to_bits(stuff[i],8)
        end for
        for i = 1 to length(stuff)-1 do
            stuff[i+1]=stuff[i][8]&stuff[i+1]
            stuff[i]=stuff[i][1..length(stuff[i])-1]&0
        end for
        for i = 1 to length(stuff) do
            stuff[i]=bits_to_int(stuff[i])
        end for
        fn=open(outf,"wb")
        puts(fn,stuff)
        close(fn)
    end if
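For anyone following along, the per-byte step of the compression loop above can be restated with plain arithmetic, which makes it easier to see what each output byte contains. This is only a sketch: the name shuffle_bits is made up for illustration, and it assumes every input byte is below 128.

```euphoria
-- Hypothetical helper: the same byte-by-byte step as the compression loop
-- in comp1.ex, written with arithmetic instead of int_to_bits()/bits_to_int().
-- Assumes every element of s is in the range 0..127.
function shuffle_bits(sequence s)
    sequence out
    integer low
    out = {}
    for i = 1 to length(s)-1 do
        low = remainder(s[i+1], 2)   -- lowest bit of the next byte
        out &= s[i] + 128 * low      -- that bit becomes bit 8 of this byte
        s[i+1] = floor(s[i+1] / 2)   -- next byte drops that bit, 0 enters at the top
    end for
    if length(s) > 0 then
        out &= s[length(s)]          -- last byte is copied as it stands
    end if
    return out
end function

? shuffle_bits("AB")   -- prints {65,33}
```

Written this way, the result always has exactly as many bytes as the input. Each byte after the first comes out as half of the original value (rounded down) plus 128 times the bit borrowed from its neighbour, so for printable characters (32 and up) no byte can come out as 0.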
3. Re: compression problem...
- Posted by Kat <gertie at PELL.NET> Sep 25, 2001
- 364 views
On 25 Sep 2001, at 15:04, Sabal.Mike at notations.com wrote:

> Maybe I'm missing something here, but it looks like all you're trying to do is
> left-shift every word. I don't see where the compression would come in that.
> Also, if you're trying to bit-shift yourself into a number of null-bytes, this
> method would only work on byte=#80.

Ditto.

> It seems what you'd rather do for this simple algorithm is compress 8-bit bytes
> into 7-bit bytes by bit-shifting across the entire file, not just a single word.
> It would guarantee you a minimum 12.5% compression, and leave a greater chance
> of null or further compressible bytes. It might look something like this:

This would be useful only on those languages that use only the lower 7 bits, too. How many languages would this leave out?

Kat
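The 8-bit to 7-bit packing idea from the quoted message can be sketched roughly as below. This is not the code from that message, which is not reproduced here; the names pack7 and unpack7 are hypothetical, and the sketch assumes every character is below 128. The unpacker is given the original character count explicitly, because the padding bits in the last packed byte could otherwise be mistaken for an extra character.

```euphoria
-- Hypothetical sketch: pack 7-bit characters into a continuous bit stream.
-- Assumes every element of text is in the range 0..127.
function pack7(sequence text)
    sequence out
    integer acc, nbits, p
    out = {}
    acc = 0      -- bits waiting to be written
    nbits = 0    -- how many bits acc currently holds
    for i = 1 to length(text) do
        acc = acc * 128 + text[i]        -- append 7 bits
        nbits += 7
        while nbits >= 8 do              -- write out every full byte
            nbits -= 8
            p = power(2, nbits)
            out &= floor(acc / p)
            acc = remainder(acc, p)
        end while
    end for
    if nbits > 0 then
        out &= acc * power(2, 8 - nbits) -- pad the final byte with 0 bits
    end if
    return out
end function

-- Reverse of pack7; n is the number of characters originally packed.
function unpack7(sequence packed, integer n)
    sequence text
    integer acc, nbits, p, pos
    text = {}
    acc = 0
    nbits = 0
    pos = 1
    while length(text) < n do
        if nbits < 7 then                -- need another packed byte
            acc = acc * 256 + packed[pos]
            pos += 1
            nbits += 8
        end if
        nbits -= 7
        p = power(2, nbits)
        text &= floor(acc / p)           -- take the top 7 bits
        acc = remainder(acc, p)
    end while
    return text
end function

? pack7("AB")                            -- prints {131,8}
puts(1, unpack7(pack7("AB"), 2) & '\n')  -- prints AB
```

Every 8 characters pack into exactly 7 bytes, which is where the 12.5% figure comes from; inputs whose length is not a multiple of 8 save slightly less because of the padding in the last byte.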