
Re: Duplicates Pattern Search



Carl Distefano wrote:

> XPL can do this handily. Try the attached frame FDU
> (for Find Duplicate URLs).

Thanks for putting this together, Carl -- I do appreciate your
efforts. Based on a couple of trial runs, I find this approach awfully
slow. (Contrast that with the way SPELL works on a named but unopened
file: a SPELL.TMP file is generated before you know it.) Secondly, there
are no brakes on this thing, as far as I can see; once it starts, you
can't bail out of it. And if I move down past the point where it stops
to initiate a new run (as you suggested), the routine returns to TOF
and goes back over ground already covered in the prior pass. Then
again, maybe I'm doing something wrong . . . . Please excuse me if I
was just too dense in that regard.

> It doesn't spit out a list; it stops when it finds the first
> duplicate.

Given my observations above, I think I would be better off generating
a list a la SPELL.TMP and then removing the duplicates manually.

I am also wondering about near-matches and variations. Most -- but not
all -- of these are apt to occur in the part of the URL _after_
".com/", ".net/", or whatever.
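
To make concrete what I mean by a generated list, here is a rough
sketch in Python rather than XPL. It is only an illustration, not a
replacement for the FDU frame. It assumes the URLs begin with http://
or https:// and contain no spaces; the names find_duplicates and
group_by_host are my own inventions. The first function reports each
exact duplicate with the line numbers where it occurs; the second
groups distinct URLs under the same host, which is where I would look
for near-matches after the ".com/" part.

```python
import re
from collections import defaultdict
from urllib.parse import urlsplit

# Assumption: a URL starts with http:// or https:// and runs to the next
# whitespace, angle bracket, or quote character.
URL_RE = re.compile(r"https?://[^\s<>\"']+")
TRAILING = ".,;)"  # punctuation that often trails a URL in running text


def find_duplicates(text):
    """Return {url: [line numbers]} for every URL that occurs more than once."""
    seen = defaultdict(list)
    for lineno, line in enumerate(text.splitlines(), start=1):
        for url in URL_RE.findall(line):
            seen[url.rstrip(TRAILING)].append(lineno)
    return {url: lines for url, lines in seen.items() if len(lines) > 1}


def group_by_host(text):
    """Return {host: sorted distinct URLs} for hosts with several distinct URLs."""
    hosts = defaultdict(set)
    for url in URL_RE.findall(text):
        url = url.rstrip(TRAILING)
        hosts[urlsplit(url).netloc.lower()].add(url)
    return {host: sorted(urls) for host, urls in hosts.items() if len(urls) > 1}
```

Both functions take the whole file as one string, so the file never
has to be open in the editor, and the results could be written out to
a scratch file for manual cleanup.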

Jordan