Re: Duplicates Pattern Search

Subject: Re: Duplicates Pattern Search
From: cld@xxxxxxxx (Carl Distefano)
Date: Tue, 18 Dec 2001 20:06:17 +0000

Jordan writes:
≪ I was wondering if there might not be some .U2
function that could parse a large file like this and spit
out a list of duplicated or very-near-duplicated URLs?
(Dates, descriptions, etc. around the URL are probably
irrelevant.) I guess this amounts to a search where
there is no search string that is known or specified in
advance -- maybe too tall an order for XPL? ≫

XPL can do this handily. Try the attached frame FDU
(for Find Duplicate URLs). It doesn't spit out a list;
it stops at the first duplicate it finds. Delete the
dupe manually, then run FDU again, repeating until you get
a "No duplicates" report. I opted to flag dupes rather than
list them with, say, CP locations. I also decided
against automatically deleting dupes, because you'll
probably want to delete some surrounding text as well.
It seemed prudent to let the user do this manually rather
than try to guess what should be deleted.

FDU uses Jumbo U2 function FindNextURL (and therefore
requires the Jumbo U2). Follow the instructions I posted
yesterday for adding a frame to U2. To run it, open the
subject file and issue FDU.
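
For anyone who wants to see the logic outside XyWrite, here is a
rough Python sketch of the same stop-at-the-first-duplicate idea.
It is not the FDU frame itself, and it does not reproduce how
FindNextURL recognizes URLs: the URL pattern, the normalize() rules
for "very-near" duplicates, and all the names are assumptions made
up for illustration.

```python
import re

# Assumed URL pattern for illustration only; FindNextURL's actual
# matching rules are not described in this message.
URL_RE = re.compile(r"""(?:https?|ftp)://[^\s<>"']+""", re.IGNORECASE)


def normalize(url):
    """Fold trivial differences so near-duplicates compare equal.

    Assumed rules: ignore case and trailing punctuation or slashes.
    """
    return url.rstrip(".,;:)/").lower()


def find_first_duplicate(text):
    """Return (url, first_offset, duplicate_offset) for the first
    repeated URL in text, or None if there are no duplicates."""
    seen = {}  # normalized URL -> offset of its first occurrence
    for match in URL_RE.finditer(text):
        key = normalize(match.group())
        if key in seen:
            return match.group(), seen[key], match.start()
        seen[key] = match.start()
    return None


if __name__ == "__main__":
    sample = "See http://a.example/x and later HTTP://A.example/x/."
    hit = find_first_duplicate(sample)
    if hit is None:
        print("No duplicates")
    else:
        print("Duplicate at offset", hit[2], "first seen at offset", hit[1])
```

As with FDU, the sketch reports only the first duplicate and leaves
the deleting to you; rerun it after each manual edit until it
reports none.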

--
Carl Distefano
cld@xxxxxxxx
http://users.datarealm.com/xywwweb/


Attachment: fdu.u2 (binary data)