1. compression problem...

I'm developing a new compression algorithm designed for plain-text
documents (no byte values above 7 bits, i.e. nothing above ASCII 127).
Here's how it works: for every byte, my program replaces the last bit
with the first bit of the next byte. That first bit is then removed from
the next byte and a 0 is inserted at its end. This is repeated up to the
second-to-last byte. This should leave some ASCII 0 bytes after a while,
right? Well, for some odd reason it doesn't... I dunno, maybe I'm having
an off day and made a mistake in my code. In either case, here it is:

[comp1.ex]
include file.e
include get.e
include machine.e
function lof(integer fn)
    integer pos,eof
    pos=where(fn)
    if seek(fn,-1) then end if
    eof=where(fn)
    if seek(fn,pos) then end if
    return eof
end function
with trace
sequence stuff,cmd,inf,outf,stuff2,tmp,tmp2
integer fn,decompress
decompress=0
cmd=command_line()
if length(cmd) < 4 then
    puts(1,"Experimental Plain Text Compression\n")
    puts(1,"usage: ex eptc [-d] infile outfile\n")
    puts(1,"\nDefault behavior is compression. -d causes it to decompress\n")
    abort(0)
end if
if equal(cmd[3],"-d") then
    decompress=1
    cmd=cmd[1..2]&cmd[4..length(cmd)]
end if
inf=cmd[3] outf=cmd[4]
if not decompress then
    puts(1,"Compressing...\n")
    fn=open(inf,"rb")
    if fn=-1 then
	printf(1,"Unable to open `%s'\n",{inf})
	abort(1)
    end if
    stuff=get_bytes(fn,lof(fn))
    close(fn)
    stuff2=""
    for i = 1 to length(stuff)-1 do
	tmp=int_to_bits(stuff[i],8)
	if tmp[8] != 0 then
	    puts(1,"Error: this file is not all plain text. Convert it to plain text (if it's a Word document or something) and remove all graphics characters, then try again\n")
	    abort(3)
	end if
	tmp2=int_to_bits(stuff[i+1],7)
	tmp=tmp[1..7]&tmp2[1]
	stuff[i+1]=bits_to_int(tmp2[2..length(tmp2)]&0)
	stuff2 &= bits_to_int(tmp)
    end for
    stuff2 &= stuff[length(stuff)]
    stuff={}
    for i = 1 to length(stuff2) do
	stuff2[i]=int_to_bits(stuff2[i],8)
	? stuff2[i]
	if not equal(stuff2[i],repeat(0,8)) then
	    stuff &= bits_to_int(stuff2[i])
	end if
    end for
    fn=open(outf,"wb")
    if fn=-1 then
	printf(1,"Unable to open `%s' for writing\n",{outf})
	abort(2)
    end if
    puts(fn,stuff)
    close(fn)
else
    puts(1,"Decompressing...\n")
    fn=open(inf,"rb")
    if fn=-1 then
	printf(1,"Unable to open `%s'\n",{inf})
	abort(1)
    end if
    stuff=get_bytes(fn,lof(fn))
    close(fn)
    for i = 1 to length(stuff) do
	stuff[i]=int_to_bits(stuff[i],8)
    end for
    for i = 1 to length(stuff)-1 do
	stuff[i+1]=stuff[i][8]&stuff[i+1]
	stuff[i]=stuff[i][1..length(stuff[i])-1]&0
    end for
    for i = 1 to length(stuff) do
	stuff[i]=bits_to_int(stuff[i])
    end for
    fn=open(outf,"wb")
    if fn=-1 then
	printf(1,"Unable to open `%s' for writing\n",{outf})
	abort(2)
    end if
    puts(fn,stuff)
    close(fn)
end if
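
To see what the compression loop above actually does, it may help to run it on a short string in memory. The following is only a sketch: `shift_pass` is a hypothetical name, and the body is the same bit shuffling as the loop in the listing, minus the file handling and the step that drops zero bytes.

```euphoria
include machine.e

-- Hypothetical helper: the same per-byte bit shuffling as the loop in
-- the listing, applied to a sequence in memory. Assumes a non-empty
-- sequence of 7-bit values.
function shift_pass(sequence bytes)
    sequence out, cur, nxt
    out = {}
    for i = 1 to length(bytes) - 1 do
        cur = int_to_bits(bytes[i], 8)      -- least significant bit first
        nxt = int_to_bits(bytes[i+1], 7)
        cur = cur[1..7] & nxt[1]            -- borrow one bit from the next byte
        bytes[i+1] = bits_to_int(nxt[2..7] & 0)  -- next byte loses that bit
        out &= bits_to_int(cur)
    end for
    out &= bytes[length(bytes)]             -- last byte is copied as it is
    return out
end function

sequence text, packed
text = "hello world"
packed = shift_pass(text)
printf(1, "in: %d bytes, out: %d bytes\n", {length(text), length(packed)})
? packed
```

The loop appends exactly one output byte per input byte, so the two lengths printed are always the same. A byte only comes out as 0 when what is left of it after the shift is 0 and the borrowed bit is 0 as well, which ordinary text characters should rarely if ever produce.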

2. Re: compression problem...


3. Re: compression problem...

On 25 Sep 2001, at 15:04, Sabal.Mike at notations.com wrote:

> 
> Maybe I'm missing something here, but it looks like all you're trying to do is
> left-shift every word.  I don't see where the compression would come in that. 
> Also, if you're trying to bit-shift yourself into a number of null-bytes, this
> method would only work on byte=#80.  

Ditto.

> It seems what you'd rather do for this simple algorithm is compress 8-bit
> bytes into 7-bit bytes by bit-shifting across the entire file, not just a
> single word. It would guarantee you a minimum 12.5% compression, and leave
> a greater chance of null or further compressible bytes. It might look
> something like this:

This would only be useful for languages that use just the lower 7 bits,
too. How many languages would that leave out?
 
Kat
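
The code from the quoted message isn't reproduced here, so the following is not it. It is only a rough sketch of the idea as described (packing 7-bit characters end to end across the whole input), with `pack7` and `unpack7` as made-up names. It assumes every input value is 127 or less; anything higher would silently lose its top bit.

```euphoria
include machine.e

-- Hypothetical helper: join the low 7 bits of every byte into one long
-- bit sequence, then cut it back into 8-bit bytes.
function pack7(sequence bytes)
    sequence bits, out
    integer pad
    bits = {}
    for i = 1 to length(bytes) do
        bits &= int_to_bits(bytes[i], 7)    -- least significant bit first
    end for
    pad = remainder(8 - remainder(length(bits), 8), 8)
    bits &= repeat(0, pad)                  -- pad to a whole number of bytes
    out = {}
    for i = 1 to length(bits) by 8 do
        out &= bits_to_int(bits[i..i+7])
    end for
    return out
end function

-- Hypothetical helper: the reverse. count is the original number of
-- characters; it has to be stored somewhere, because the padding makes
-- it impossible to recover from the packed length alone.
function unpack7(sequence packed, integer count)
    sequence bits, out
    bits = {}
    for i = 1 to length(packed) do
        bits &= int_to_bits(packed[i], 8)
    end for
    out = {}
    for i = 1 to count do
        out &= bits_to_int(bits[i*7-6..i*7])
    end for
    return out
end function

sequence text, packed
text = "hello world"
packed = pack7(text)
printf(1, "%d bytes -> %d bytes\n", {length(text), length(packed)})
if equal(unpack7(packed, length(text)), text) then
    puts(1, "round trip ok\n")
end if
```

For the 11-character test string this gives 10 bytes (77 bits rounded up to a whole byte); the saving only approaches 12.5% on longer input. Text that needs the eighth bit can't go through this at all, so a real program would have to check for it first, as the original listing does.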

4. Re: compression problem...

