Re: compression
Lewis wrote:
>Just in case you or anyone else is interested, I was wanting
>this function for a compression algorithm I have been working on.
>Here is how it works:
> -- snip--
>I have got up to 17%
>compression with this. Has anyone done this before? Can
>anyone see any potential flaws and/or innefficiencies?
If the data to be compressed is a text file, then how about compressing the
text in the headers? i.e.:
a to z = 26
A to Z = 26
total 52 (5.7 bits)
If the output is currently written to a disk file as 1 character per byte
(8 bits), then packing each letter into about 5.7 bits will save some space.
Of course, you will need a length or delimiter character of some sort.
Maybe you could group the header text by length and do away with a local
delimiter in favour of a length index at the very start of the compressed
file, or will this approach spike the algorithm? If not, then perhaps the
principle could be applied to the compressed data itself.
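To put rough numbers on the saving, here is a minimal sketch. It assumes the
letters are packed as one big base-52 number, which is the only way to actually
reach a fractional 5.7 bits per letter; the function names are made up for
illustration and the length/delimiter overhead is not counted.

```python
def bits_needed(n_chars, alphabet_size):
    """Exact bits needed to store n_chars symbols packed as one big
    base-`alphabet_size` number (no per-character padding)."""
    return (alphabet_size ** n_chars - 1).bit_length()


def packed_bytes(n_chars, alphabet_size):
    """Round the bit count up to whole bytes, as a disk file would need."""
    return (bits_needed(n_chars, alphabet_size) + 7) // 8


if __name__ == "__main__":
    # 1000 letters from a-z/A-Z, versus 1000 bytes at 1 char per byte
    print(packed_bytes(1000, 52))  # -> 713
```

So 1000 letters would take 713 bytes instead of 1000, before any length index
is added.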
Actually,
>I have got up to 17% compression with this.
Does that mean that a 100k file was reduced to 83k, or that 100k -> 17k?
If the former is meant, then a pure text file of, say,
a to z = 26
A to Z = 26
0 to 9 = 10
!@#$%^&*()-_=+\|]}[{;:'",<.>/? = 30
<space> <tab> = 2
TOTAL 94 (about 6.55 bits)
could be compressed to about 82% of its size just by 'packing' each char into
the said number of bits. To make it easy, the total number of unique values
could be boosted to 96 (about 6.58 bits). Note that 2 chars will not quite fit
in 13 bits, since 96 x 96 = 9216 is more than 2^13 = 8192, but 3 chars do fit
in 20 bits (96^3 = 884736, 2^20 = 1048576), which is about 6.67 bits per char.
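The packing idea could be sketched roughly like this. It is only an
illustration: the 96-symbol alphabet below is my own choice (Python's letters,
digits and punctuation plus space and tab happen to total exactly 96), and the
names pack/unpack are hypothetical. The text is treated as one big base-96
number, and the character count is kept separately, much like the length index
mentioned above.

```python
import string

# Assumed alphabet: 52 letters + 10 digits + 32 punctuation + space + tab = 96
ALPHABET = string.ascii_letters + string.digits + string.punctuation + " \t"
BASE = len(ALPHABET)
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def pack(text):
    """Pack text into bytes as a single base-96 integer.
    Returns (char_count, packed_bytes). Raises KeyError for any
    character that is not in ALPHABET."""
    n = 0
    for ch in text:
        n = n * BASE + INDEX[ch]
    nbits = (BASE ** len(text) - 1).bit_length()
    return len(text), n.to_bytes((nbits + 7) // 8, "big")


def unpack(char_count, data):
    """Reverse of pack(): peel off base-96 digits, last character first."""
    n = int.from_bytes(data, "big")
    chars = []
    for _ in range(char_count):
        n, digit = divmod(n, BASE)
        chars.append(ALPHABET[digit])
    return "".join(reversed(chars))


if __name__ == "__main__":
    count, data = pack("Hello, World! 123")
    print(count, len(data))
    print(unpack(count, data))
```

Working on one huge number gets slow for big files, so a real version would
probably pack fixed-size groups of chars instead.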
>It finds all substrings that are repeated in a string of bytes
>and sorts them by how many matches were found in descending
> order.(This was done with Michael's code) Then I re-sort these
>strings based on a "score"..
Would it be possible to amalgamate the 2 sorting processes into a single
comprehensive sort? i.e., loop through each matched group, calculate the
"score", then sort them (once).
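Something along these lines is what I have in mind. The actual "score" was
snipped from the quoted post, so the formula below (bytes saved if every repeat
after the first is replaced) is purely a stand-in, and the function names and
the substring-to-count dictionary are assumptions for illustration.

```python
def score(substring, count):
    """Stand-in score (an assumption, not the original formula):
    bytes saved if every occurrence after the first is replaced."""
    return (count - 1) * len(substring)


def rank_matches(matches):
    """Sort repeated substrings by score in one pass.
    `matches` maps each substring to its number of occurrences."""
    return sorted(
        matches.items(),
        key=lambda item: score(item[0], item[1]),
        reverse=True,
    )


if __name__ == "__main__":
    found = {b"ab": 10, b"abcdef": 5, b"xyz": 2}
    for substring, count in rank_matches(found):
        print(substring, count, score(substring, count))
```

The score is calculated once per group as the sort key, so the first sort by
match count is not needed at all.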
Yours Truly
Mike
vulcan at win.co.nz