Re: find/match not working
- Posted by Bob Elia <bobelia200 at netzero.net> Aug 21, 2004
>On 20 Aug 2004, at 16:10, Derek Parnell wrote:
>
> > posted by: Derek Parnell <ddparnell at bigpond.com>
> >
> > Kat wrote:
> > >
> > > On 20 Aug 2004, at 2:14, don cole wrote:
> > >
> > > > Kat wrote:
> > > > > Like i said, no. The code is running as is *now*, looping thru the list
> > > > > and using match on each item in the list. I don't want to interrupt it and
> > > > > waste internet bandwidth on a test, in case someone is waiting on the
> > > > > results of the code run.
> > > >
> > > > I'm a fairly newbbie to this board and might be missing something here, but
> > > > I don't see how do you expect to resolve this issue if you are unwilling to
> > > > CHANGE YOUR CODE.
> > >
> > > I DID bloody change the code, i am not using find() anymore, i am using
> > > match(). I have said that repeatedly, what's the problem?
> >
> > Problem: Kat says find() doesn't work. Gives example using match(),
> > which also is faulty.
>
>How is that faulty?? For the strings i am using, it is working.
>
> > Solution: Multiple examples in which find() is shown to work.
> >
> > Result: Kat refuses to try find() again. That is Kat refuses to change her
> > code (yet again) to use find() as per any of the solutions.
>
>See below.
>
> > Consequence: Confusion as to whether or not Kat really wants help.
> >
> > Kat, if don't want help with find(), why bring it up? If you do want
> > help with find(), why don't do accept it?
>
>I had a problem with find(). I reported it. I recoded the apps to use
>match(). Programs are now running fine, as of when i gave up on find() and
>began using match() and reported the problem here. My mistakes were
>reporting the problem here, and not saving the code and data that didn't
>run; but i was in a hurry (i noticed the dupes after 4 hrs of it running,
>which put me 4 hrs behind), and gave up on find() after 30 min or so, and
>simply over-wrote the bad code with something which works. As an aside, the
>person i was counting on to have working demo code for the data i obtained,
>hasn't written a line of code YET. So much for a tight schedule and being
>reliable. And i am am pretty sure i won't report any problems here again if
>i am in a hurry. This thread really takes the cake.
>
>I just sent Derek a screen shot of how busy the computer is. At this time, i
>don't have any free cpu clocks, memory, or bandwidth to test anything.
>Unless i shut things down,, and become the person who isn't getting things
>done.
>
>Kat

-- NODUPES.EX
include sort.e
include misc.e

sequence text
atom t
integer fn, l
sequence fname
object line

--fname = "findmatch not working.txt"
--fname = "DUPES.TXT"
fname = "FINDMACH.TXT" --file of URLs Kat posted
--fname = "randdata.txt"

fn = open(fname, "r")
if fn = -1 then
    printf(1, "Unable to open %s\n", {fname})
    abort(0)
end if

-- with trace
-- trace(1)

text = {}
-- remove new-line
while 1 do
    line = gets(fn)
    if atom(line) then
        exit
    end if
    l = length(line)
    if equal(line[l], '\n') then
        line = line[1..l - 1]
    end if
    -- putting all the data into this is, of course, unnecessary.
    -- I just did it to aid development.
    text = append(text, line)
end while

printf(1, "length(text) = %d\n", {length(text)})

-- I dont know the significance of "<done>" in the data, so I'm removing it.
for i = 1 to length(text) do
    line = text[i]
    l = length(line) - 7
    if match(" <done>", line) = l + 1 then
        text[i] = line[1..l]
    end if
end for

sequence uniqueUrls
uniqueUrls = {}

printf(1, "collecting uniqueUrls...\n", {})
t = time()
for i = 1 to length(text) do
    if not find(text[i], uniqueUrls) then
        uniqueUrls = append(uniqueUrls, text[i])
    else
        printf(1, "DUP FOUND: %s\n", {text[i]})
    end if
end for
t = time() - t

printf(1, "elapsed time is %f seconds.\n", {t})
printf(1, "length(text) = %d\n", {length(text)})
printf(1, "length(uniqueUrls) = %d\n", {length(uniqueUrls)})
uniqueUrls = sort(uniqueUrls)
pretty_print(1, uniqueUrls, {2})
puts(1, "\n\n")

This spits out one duplicate:

DUP FOUND: http://www.ed.gov

Also, I noticed such things as:
"http:// www.buydirectory.com" has an embedded space;
"http://www.hub..terc.edu" has 2 periods in a row;
"http://www.iee org.uk" has an embedded space;
"http://www.ipl,irg/reading/books" has an embedded comma.

Sorry for the delay. I was trying to come up with a solution that wouldn't take hours to run. Please tell me if you want me to continue.

Bob
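
P.S. A minimal sketch, in case it is useful, of how entries like the four above could be flagged automatically. looks_malformed() is a hypothetical helper and is not part of NODUPES.EX; it only checks for the three problems listed (a space, a comma, two periods in a row) and makes no attempt at real URL validation. It also shows the difference between the two routines this thread is about: find() looks for a single element of a sequence, while match() looks for a run of consecutive elements.

```euphoria
-- Hypothetical helper (not part of NODUPES.EX).
-- Returns 1 if url contains a space, a comma, or two periods in a row,
-- otherwise 0.
function looks_malformed(sequence url)
    -- find() searches for one element; here, a single character
    if find(' ', url) then
        return 1
    end if
    if find(',', url) then
        return 1
    end if
    -- match() searches for a slice; here, the two-character string ".."
    if match("..", url) then
        return 1
    end if
    return 0
end function

sequence samples
samples = {"http://www.ed.gov",
           "http:// www.buydirectory.com",
           "http://www.hub..terc.edu",
           "http://www.iee org.uk",
           "http://www.ipl,irg/reading/books"}

-- print a 0/1 flag next to each sample URL
for i = 1 to length(samples) do
    printf(1, "%d  %s\n", {looks_malformed(samples[i]), samples[i]})
end for
```

In NODUPES.EX the same split applies: find(text[i], uniqueUrls) treats each whole URL string as one element of uniqueUrls, while match(" <done>", line) looks for that text inside a single line.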