Re: find repeated sub-strings
----- Original Message -----
From: "Lewis Townsend" <keroltarr at HOTMAIL.COM>
To: <EUPHORIA at LISTSERV.MUOHIO.EDU>
Sent: Friday, April 28, 2000 1:32 PM
Subject: Re: find repeated sub-strings
> Hello Kat,
>
> >Look at the strtoks.e file in WIN Sockets Code for Mirc on the User
> >Contributions page. It was written specifically for playing with the words
> >in a sentence. You could:
> >(UNtested code)
> >
> >wordnum = 1
> >wordcount = numtok(text, 32)  -- 32 is the ASCII code for a space
> >while wordnum <= wordcount do  -- <= so the last word is included
> >    oneword = gettok(text, wordnum, 32)
> >    printf(1, "%s: %d\n", {oneword, findtok(text, oneword, 0, 32)})
> >    wordnum += 1
> >end while
> >
> >That will list each word with how many matches there are for it. There is
> >also a wildmatch, a rem(ove)tok, a ins(ert)tok, etc for manipulating the
> >words in the sentence.
> >
> >Kat
>
> I need something that will ignore words and delimiters such as spaces and
> carriage returns.
> In this example: "the coyote ate the cat"
> the segments "the c" and "te " would be repeated, which doesn't consider
> whole words. Does strtoks.e allow this sort of pattern
> matching?
You can use wildmatch to find tokens (words) containing "c" and to find
words with "te" in them. You'd need to write the loop, picking out which
parts of words you wish to look for. I suggest looking for the biggest parts
first, and being recursive. As with all compression schemes, the tighter the
compression, the longer it takes to compress, because there are more particles
you have to look for in the entire text.
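As a rough illustration of the wildcard idea, here is a minimal sketch in Python rather than Euphoria. It does not use strtoks.e; the helper name words_matching is hypothetical, and Python's standard fnmatch module stands in for wildmatch. It only shows which space-separated words contain a given fragment.

```python
from fnmatch import fnmatchcase

def words_matching(text, pattern):
    """Return the space-separated words of text that match a wildcard pattern.

    Hypothetical helper: fnmatchcase stands in for strtoks.e's wildmatch.
    """
    return [word for word in text.split(" ") if fnmatchcase(word, pattern)]

text = "the coyote ate the cat"
print(words_matching(text, "*te*"))  # words containing "te"
print(words_matching(text, "*c*"))   # words containing "c"
```

Because the text is split on spaces first, a fragment that crosses a space, such as "the c", can never match a single word this way. That is why the scan has to be driven by a loop over the fragments you choose.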
> in this example: "the coyote ate the cat"
you'd scan for:
*the*
*th*
*he*
*coyote*
*coyot*
*coyo*
*coy*
*co*
*oyote*
*yote*
*ote*
*te*
*oyot*
*oyo*
*oy*
*yo*
*ote*
*ate*
*at*
*the*
*th*
*cat*
*ca*
*te*
....etc....
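The scan above can also be done by brute force over the raw text, ignoring word boundaries. The following is a sketch in Python under that assumption, not the strtoks.e approach: the function name repeated_substrings and the min_len parameter are made up for illustration. It tries the longest pieces first and counts overlapping occurrences.

```python
def repeated_substrings(text, min_len=2):
    """Map each substring that occurs at least twice to its occurrence count.

    Hypothetical helper. Longer substrings are examined first, spaces are
    treated like any other character, and overlapping occurrences count.
    """
    found = {}
    for size in range(len(text) - 1, min_len - 1, -1):
        last_start = len(text) - size
        for start in range(last_start + 1):
            piece = text[start:start + size]
            if piece in found:
                continue  # already counted this piece
            count = sum(1 for i in range(last_start + 1)
                        if text.startswith(piece, i))
            if count > 1:
                found[piece] = count
    return found

text = "the coyote ate the cat"
repeats = repeated_substrings(text)
for piece, count in repeats.items():
    print(repr(piece), count)
```

For the example sentence this reports both "the c" and "te " as occurring twice, with "the c" the longest repeat. The nested scanning gets slow quickly on long text, which is the time cost mentioned above.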
I'd compress the whole words, including words with affixes, then compress the
affixes. Only then would I compress the insides of the words.
Kat