Re: Duplicates Pattern Search



Jordan writes:
≪ I was wondering if there might not be some .U2
function that could parse a large file like this and spit
out a list of duplicated or very-near-duplicated URLs ?
(Dates, descriptions, etc. around the URL are probably
irrelevant.) I guess this amounts to a search where
there is no search string that is known or specified in
advance -- maybe too tall an order for xpl ? ≫

XPL can do this handily. Try the attached frame FDU
(for Find Duplicate URLs). It doesn't spit out a list;
it stops when it finds the first duplicate. Delete the
dupe manually and then run FDU again until you get a "No
duplicates" report. I opted to flag dupes rather than
list them with, say, CP locations. I also decided
against automatically deleting dupes, because you'll
probably want to delete some surrounding text as well.
It seemed prudent to let the user do this manually rather
than try to guess what should be deleted.
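
For anyone reading this without the attachment, the stop-at-first-duplicate
logic described above can be sketched roughly as follows in Python. This is
not the FDU frame and not how FindNextURL works internally. The URL pattern,
the normalize() helper and the function name are all assumptions made up for
illustration, and lowercasing the whole URL is a crude way to catch
near-duplicates that may conflate URLs whose paths differ only by case.

```python
import re

# Hypothetical URL pattern; the real FindNextURL may match differently.
URL_RE = re.compile(r"""(?:https?|ftp)://[^\s<>"']+""", re.IGNORECASE)

def normalize(url):
    """Fold trivial variations so near-duplicates compare equal (an assumption)."""
    return url.rstrip(".,;:)/").lower()

def find_first_duplicate(text):
    """Return (url, first_offset, duplicate_offset) for the first repeated URL.

    Returns None when no URL occurs twice, mirroring a "No duplicates" report.
    """
    seen = {}  # normalized URL -> offset of its first occurrence
    for match in URL_RE.finditer(text):
        key = normalize(match.group())
        if key in seen:
            # Stop at the first duplicate so it can be reviewed by hand.
            return match.group(), seen[key], match.start()
        seen[key] = match.start()
    return None
```

As with FDU, the idea is to call it, fix the reported duplicate by hand, and
call it again until it returns None.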

FDU uses Jumbo U2 function FindNextURL (and therefore
requires the Jumbo U2). Follow the instructions I posted
yesterday for adding a frame to U2. To run it, open the
subject file and issue FDU.

--
Carl Distefano
cld@xxxxxxxx
http://users.datarealm.com/xywwweb/


Attachment: fdu.u2
Description: Binary data