Re: find/match not working
>
>On 20 Aug 2004, at 16:10, Derek Parnell wrote:
>
> >
> > posted by: Derek Parnell <ddparnell at bigpond.com>
> >
> > Kat wrote:
> > >
> > > On 20 Aug 2004, at 2:14, don cole wrote:
> > >
> > > > > Kat wrote:
> > > > > Like i said, no. The code is running as is *now*, looping thru the list
> > > > > and using match on each item in the list. I don't want to interrupt it and
> > > > > waste internet bandwidth on a test, in case someone is waiting on the
> > > > > results of the code run.
> > > >
> > > > I'm fairly new to this board and might be missing something here, but
> > > > I don't see how you expect to resolve this issue if you are unwilling to
> > > > CHANGE YOUR CODE.
> > >
> > > I DID bloody change the code, i am not using find() anymore, i am using
> > > match(). I have said that repeatedly, what's the problem?
> >
> > Problem: Kat says find() doesn't work. Gives example using match(),
> > which also is faulty.
>
>How is that faulty?? For the strings i am using, it is working.
>
> > Solution: Multiple examples in which find() is shown to work.
> >
> > Result: Kat refuses to try find() again. That is Kat refuses to change her
> > code (yet again) to use find() as per any of the solutions.
>
>See below.
>
> > Consequence: Confusion as to whether or not Kat really wants help.
> >
> > Kat, if you don't want help with find(), why bring it up? If you do want
> > help with find(), why don't you accept it?
>
>I had a problem with find(). I reported it. I recoded the apps to use
>match().
>Programs are now running fine, as of when i gave up on find() and began
>using match() and reported the problem here. My mistakes were reporting
>the problem here, and not saving the code and data that didn't run; but i was
>in a hurry (i noticed the dupes after 4 hrs of it running, which put me 4 hrs
>behind), and gave up on find() after 30 min or so, and simply over-wrote the
>bad code with something which works. As an aside, the person i was
>counting on to have working demo code for the data i obtained, hasn't written
>a line of code YET. So much for a tight schedule and being reliable. And i
>am pretty sure i won't report any problems here again if i am in a hurry.
>This thread really takes the cake.
>
>I just sent Derek a screen shot of how busy the computer is. At this time, i
>don't have any free cpu clocks, memory, or bandwidth to test anything.
>Unless i shut things down, and become the person who isn't getting things
>done.
>
>Kat
-- NODUPES.EX
include sort.e
include misc.e
sequence text, fname
atom t
integer fn, l
object line
--fname = "findmatch not working.txt"
--fname = "DUPES.TXT"
fname = "FINDMACH.TXT" --file of URLs Kat posted
--fname = "randdata.txt"
fn = open(fname, "r")
if fn = -1 then
    printf(1, "Unable to open %s\n", {fname})
    abort(1) -- non-zero exit status signals failure
end if
-- with trace
-- trace(1)
text = {}
-- remove new-line
while 1 do
    line = gets(fn)
    if atom(line) then
        exit -- gets() returns -1 at end of file
    end if
    l = length(line)
    if equal(line[l], '\n') then
        line = line[1..l - 1]
    end if
    -- putting all the data into this is, of course, unnecessary.
    -- I just did it to aid development.
    text = append(text, line)
end while
close(fn)
printf(1, "length(text) = %d\n", {length(text)})
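-- I don't know the significance of " <done>" in the data, so I'm removing it.
for i = 1 to length(text) do
    line = text[i]
    l = length(line) - 7 -- 7 = length(" <done>")
    -- l >= 0 guards short lines: for a 6-character line, match() returns 0
    -- and l + 1 is also 0, which would lead to the invalid slice line[1..-1].
    if l >= 0 and match(" <done>", line) = l + 1 then
        text[i] = line[1..l]
    end if
end for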
sequence uniqueUrls
uniqueUrls = {}
printf(1, "collecting uniqueUrls...\n", {})
t = time()
for i = 1 to length(text) do
    if not find(text[i], uniqueUrls) then
        uniqueUrls = append(uniqueUrls, text[i])
    else
        printf(1, "DUP FOUND: %s\n", {text[i]})
    end if
end for
t = time() - t
printf(1, "elapsed time is %f seconds.\n", {t})
printf(1, "length(text) = %d\n", {length(text)})
printf(1, "length(uniqueUrls) = %d\n", {length(uniqueUrls)})
uniqueUrls = sort(uniqueUrls)
pretty_print(1, uniqueUrls, {2})
puts(1, "\n\n")
This spits out one duplicate:
DUP FOUND: http://www.ed.gov
Also, I noticed a few malformed entries in the data:
"http:// www.buydirectory.com" has an embedded space;
"http://www.hub..terc.edu" has 2 periods in a row;
"http://www.iee org.uk" has an embedded space;
"http://www.ipl,irg/reading/books" has an embedded comma.
Sorry for the delay. I was trying to come up with a solution that wouldn't
take hours to run.
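If the find() loop ever gets slow on a much bigger list, one alternative (not what the program above does) is to sort first and then compare each item with its neighbour. This is only a rough sketch; remove_dupes() is a name I made up for illustration, and the result comes back in sorted order rather than in the original order.

```euphoria
include sort.e

-- Hypothetical helper: returns the items of a list with duplicates removed.
-- After sorting, equal items sit next to each other, so one pass is enough.
function remove_dupes(sequence items)
    sequence sorted, result
    if length(items) = 0 then
        return items
    end if
    sorted = sort(items)
    result = {sorted[1]}
    for i = 2 to length(sorted) do
        if not equal(sorted[i], sorted[i - 1]) then
            result = append(result, sorted[i])
        end if
    end for
    return result
end function

sequence urls
urls = remove_dupes({"http://www.ed.gov", "http://www.iee org.uk", "http://www.ed.gov"})
printf(1, "%d unique\n", {length(urls)})
```

Unlike the find() version, this does not report which items were duplicated; that would need an extra printf() in the loop.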
Please tell me if you want me to continue.
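In case it helps, the kinds of malformed entries I listed above could be flagged with something like the following. It is a minimal sketch that only checks for those three symptoms (space, comma, two periods in a row); is_suspect() is a hypothetical helper, not a library routine, and it is not a real URL validator.

```euphoria
-- Hypothetical helper: returns 1 if the URL shows one of the problems
-- noticed in the data, otherwise 0.
function is_suspect(sequence url)
    return find(' ', url) or find(',', url) or match("..", url)
end function

sequence samples
samples = {"http:// www.buydirectory.com",
           "http://www.hub..terc.edu",
           "http://www.iee org.uk",
           "http://www.ipl,irg/reading/books",
           "http://www.ed.gov"}

for i = 1 to length(samples) do
    if is_suspect(samples[i]) then
        printf(1, "SUSPECT: %s\n", {samples[i]})
    end if
end for
```

With these samples the first four entries are flagged and http://www.ed.gov is not.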
Bob