Re: find/match not working


>
>On 20 Aug 2004, at 16:10, Derek Parnell wrote:
>
> >
> > posted by: Derek Parnell <ddparnell at bigpond.com>
> >
> > Kat wrote:
> > >
> > > On 20 Aug 2004, at 2:14, don cole wrote:
> > >
> > > > > Kat wrote:
> > > > > Like i said, no. The code is running as is *now*, looping thru the list
> > > > > and using match on each item in the list. I don't want to interrupt it and
> > > > > waste internet bandwidth on a test, in case someone is waiting on the
> > > > > results of the code run.
> > > >
> > > > I'm fairly new to this board and might be missing something here, but
> > > > I don't see how you expect to resolve this issue if you are unwilling to
> > > > CHANGE YOUR CODE.
> > >
> > > I DID bloody change the code, i am not using find() anymore, i am using
> > > match(). I have said that repeatedly, what's the problem?
> >
> > Problem: Kat says find() doesn't work. Gives example using match(),
> > which also is faulty.
>
>How is that faulty?? For the strings i am using, it is working.
>
> > Solution: Multiple examples in which find() is shown to work.
> >
> > Result: Kat refuses to try find() again. That is Kat refuses to change her
> > code (yet again) to use find() as per any of the solutions.
>
>See below.
>
> > Consequence: Confusion as to whether or not Kat really wants help.
> >
> > Kat, if you don't want help with find(), why bring it up? If you do want
> > help with find(), why don't you accept it?
>
>I had a problem with find(). I reported it. I recoded the apps to use match().
>Programs are now running fine, as of when i gave up on find() and began
>using match() and reported the problem here. My mistakes were reporting
>the problem here, and not saving the code and data that didn't run; but i was
>in a hurry (i noticed the dupes after 4 hrs of it running, which put me 4 hrs
>behind), and gave up on find() after 30 min or so, and simply over-wrote the
>bad code with something which works. As an aside, the person i was
>counting on to have working demo code for the data i obtained, hasn't written
>a line of code YET. So much for a tight schedule and being reliable. And i
>am pretty sure i won't report any problems here again if i am in a hurry.
>This thread really takes the cake.
>
>I just sent Derek a screen shot of how busy the computer is. At this time, i
>don't have any free cpu clocks, memory, or bandwidth to test anything.
>Unless i shut things down, and become the person who isn't getting things
>done.
>
>Kat

-- NODUPES.EX
include sort.e
include misc.e


sequence text, fname
atom t
integer fn, l
object line
--fname = "findmatch not working.txt"
--fname = "DUPES.TXT"
fname = "FINDMACH.TXT" --file of URLs Kat posted
--fname = "randdata.txt"
fn = open(fname, "r")
if fn = -1 then
         printf(1, "Unable to open %s\n", {fname})
         abort(1)  -- non-zero exit status signals the failure
end if

-- with trace
-- trace(1)
text = {}
-- remove new-line
while 1 do
         line = gets(fn)
         if atom(line) then
                 exit
         end if
         l = length(line)
         if equal(line[l], '\n') then
                 line = line[1..l - 1]
         end if
         -- putting all the data into this is, of course, unnecessary.
         -- I just did it to aid development.
         text = append(text, line)
end while
printf(1, "length(text) = %d\n", {length(text)})

-- I don't know the significance of "<done>" in the data, so I'm removing it.
for i = 1 to length(text) do
         line = text[i]
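         l = length(line) - 7
         -- guard l >= 0: on a 6-character line, l + 1 is 0, which would
         -- equal match()'s "not found" result and produce a bad slice
         if l >= 0 and match(" <done>", line) = l + 1 then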
                 text[i] = line[1..l]
         end if
end for

sequence uniqueUrls
uniqueUrls = {}
printf(1, "collecting uniqueUrls...\n", {})
t = time()
for i = 1 to length(text) do
         if not find(text[i], uniqueUrls) then
                 uniqueUrls = append(uniqueUrls, text[i])
         else
                 printf(1, "DUP FOUND: %s\n", {text[i]})
         end if
end for
t = time() - t
printf(1, "elapsed time is %f seconds.\n", {t})

printf(1, "length(text) = %d\n", {length(text)})
printf(1, "length(uniqueUrls) = %d\n", {length(uniqueUrls)})

uniqueUrls = sort(uniqueUrls)
pretty_print(1, uniqueUrls, {2})
puts(1, "\n\n")



This spits out one duplicate: DUP FOUND: http://www.ed.gov
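
The find() loop above rescans uniqueUrls for every line, so the work grows roughly with the square of the list length. A minimal sketch of a sort-first alternative follows, assuming the original order of the URLs does not matter. The name removeDupes and the sample URLs are made up for illustration and are not part of NODUPES.EX.

```euphoria
-- Sketch only: dedupe by sorting first, then comparing neighbours.
-- removeDupes is a hypothetical helper name, not part of NODUPES.EX.
include sort.e

function removeDupes(sequence items)
    sequence sorted, result
    sorted = sort(items)
    result = {}
    for i = 1 to length(sorted) do
        -- after sorting, equal items sit next to each other,
        -- so one comparison with the previous item is enough
        if i = 1 or not equal(sorted[i], sorted[i - 1]) then
            result = append(result, sorted[i])
        end if
    end for
    return result
end function

-- illustrative data only
sequence urls
urls = removeDupes({"http://www.ed.gov", "http://a.example", "http://www.ed.gov"})
printf(1, "length(urls) = %d\n", {length(urls)})
```

The trade-off is that the result comes back in sorted order rather than file order, which is fine here since the program sorts uniqueUrls before printing anyway.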

Also, I noticed such things as:

"http:// www.buydirectory.com"      has an embedded space;
"http://www.hub..terc.edu"          has 2 periods in a row;
"http://www.iee org.uk"             has an embedded space;
"http://www.ipl,irg/reading/books"  has an embedded comma
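
Entries like these could be flagged automatically. The sketch below is only a rough check for the three patterns listed above, not a real URL validator; looksMalformed is a hypothetical helper name. It also happens to show the difference between the two routines in this thread: find() looks for a single element (here a character), while match() looks for a subsequence.

```euphoria
-- Sketch only: flag the kinds of malformed entries listed above.
-- looksMalformed is a hypothetical helper, not part of NODUPES.EX.
function looksMalformed(sequence url)
    if find(' ', url) then      -- find(): search for one element
        return 1
    end if
    if find(',', url) then
        return 1
    end if
    if match("..", url) then    -- match(): search for a subsequence
        return 1
    end if
    return 0
end function

sequence samples
samples = {"http:// www.buydirectory.com",
           "http://www.hub..terc.edu",
           "http://www.ed.gov"}
for i = 1 to length(samples) do
    if looksMalformed(samples[i]) then
        printf(1, "SUSPECT: %s\n", {samples[i]})
    end if
end for
```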

Sorry for the delay.  I was trying to come up with a solution that wouldn't take hours to run.
Please tell me if you want me to continue.


Bob
