Re: find repeated sub-strings


----- Original Message -----
From: "Lewis Townsend" <keroltarr at HOTMAIL.COM>
To: <EUPHORIA at LISTSERV.MUOHIO.EDU>
Sent: Thursday, April 27, 2000 10:06 PM
Subject: find repeated sub-strings


> Hello all,
>
> I need a function that finds the most repeated segments in a string.
> For example:
> If I had a string: "the quick brown fox jumped over the lazy brown dog"
> our hypothetical function would find the repeated sub-strings:
> " brown " and "the "
> I would like this function to also keep track of how many times each
> multiple match was matched; like so:
> {{2," brown "}, {2, "the "}} -- preferred return format
> Also, don't bother returning a string that is less than 2 characters
> long. Am I making sense?
> Does anyone have code that does this or something very similar?
> As you might have guessed, it is for a compression algorithm I have
> in mind but I am stumped at this first vital function.
> I always run up against possible problems and try to redesign all
> over again just to realize another possible flaw in my algorithm.

Look at the strtoks.e file in WIN Sockets Code for Mirc on the User
Contributions page. It was written specifically for playing with the words
in a sentence. You could:
(UNtested code)

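include strtoks.e  -- from the WIN Sockets Code for Mirc contribution

sequence oneword   -- text (the sentence) is assumed to be defined already
integer wordnum, wordcount

wordnum = 1
wordcount = numtok(text, 32)   -- 32 = ASCII space, the separator
while wordnum <= wordcount do  -- <= so the last word is checked too
  oneword = gettok(text, wordnum, 32)
  -- a 0 index is assumed to make findtok return the match count
  printf(1, "%s: %d\n", {oneword, findtok(text, oneword, 0, 32)})
  wordnum = wordnum + 1
end while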

That will list each word along with the number of matches for it in the
text. There is also a wildmatch, a rem(ove)tok, an ins(ert)tok, etc., for
manipulating the words in the sentence.

Kat
