Re: Duplicates Pattern Search
- Subject: Re: Duplicates Pattern Search
- From: "J. R. Fox" jr_fox@xxxxxxxxxx
- Date: Wed, 19 Dec 2001 13:38:18 -0800
Carl Distefano wrote:
> XPL can do this handily. Try the attached frame FDU
> (for Find Duplicate URLs).
Thanks for putting this together, Carl -- I do appreciate your
efforts. Based on a couple of trial runs, though, I find this approach
awfully slow. (Contrast that with the way SPELL works on a named but
unopened file: a Spell.Tmp file is generated before you know it.) Also,
there are no brakes on this thing, as far as I could see; once it
starts, you can't bail out of it. And if I move down past the point
where it stops and initiate a new run (as you suggested), the routine
returns to TOF and covers the same ground as the prior pass. Then
again, maybe I'm doing something wrong . . . . Please excuse me if I
was just too dense in that regard.
> It doesn't spit out a list; it stops when it finds the first
> duplicate.
Given my observations above, it feels like I would be better off with a
generated list a la SPELL.TMP, and then removing the duplicates
manually.
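
To make the "generated list" idea concrete, here is a rough sketch in
Python rather than XPL. It is not Carl's FDU frame, only an
illustration of the kind of report meant here: one pass over the file,
then a list of every URL seen more than once with the line numbers
where it occurs. The function name, the URL pattern, and the
trailing-punctuation cleanup are all assumptions made for this example.

```python
import re
from collections import defaultdict

# Assumed pattern: anything starting with http:// or https:// up to the
# next whitespace, angle bracket, or quote character.
URL_RE = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)

def find_duplicate_urls(lines):
    """Return {url: [line numbers]} for every URL seen more than once."""
    seen = defaultdict(list)
    for lineno, line in enumerate(lines, start=1):
        for url in URL_RE.findall(line):
            # Drop punctuation that commonly trails a URL in running text.
            seen[url.rstrip(".,;)")].append(lineno)
    return {url: nums for url, nums in seen.items() if len(nums) > 1}
```

Passing an open file object as `lines` and printing the result would
give a list to work from by hand, much as one works from SPELL.TMP.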
I am also wondering about near-matches and variations. Most -- but not
all -- of these are apt to occur in the part of the URL _after_
".com/", ".net/", or whatever.
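
One possible way to surface such near-matches, again sketched in
Python and resting on assumptions of my own: group the URLs by host
name, so that entries differing only in what follows ".com/" or
".net/" end up side by side for a manual look. Treating "www.foo.com"
and "foo.com" as the same host is a guess about what counts as a
variation, and the function name is made up for the example.

```python
from collections import defaultdict
from urllib.parse import urlsplit

def group_by_host(urls):
    """Return {host: sorted URLs} for hosts that have more than one distinct URL."""
    groups = defaultdict(set)
    for url in urls:
        host = urlsplit(url).netloc.lower()
        # Assumption: a leading "www." is not a meaningful difference.
        if host.startswith("www."):
            host = host[4:]
        groups[host].add(url)
    return {host: sorted(found) for host, found in groups.items() if len(found) > 1}
```

This only narrows the field; deciding which entries under one host are
really the same page would still be a judgment call.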
Jordan