Re: Cleaning up html
- Subject: Re: Cleaning up html
- From: cld@xxxxxxxx (Carl Distefano)
- Date: Tue, 4 Jun 2002 21:31:54 -0400
Reply to note from Jay McNally Tue, 04 Jun 2002
09:28:44 -0400
> I often need to take text from a web document that has been
> saved in html.
> ...
> Can I run a CHange command with wildcards that would erase the
> whole string between the brackets, such as the following junk?
If you load the Jumbo U2, run DELTAGS with the subject file
in the current window. Working on a copy of the file, it deletes
all HTML tags, including embedded scripts, comments, etc., and
associated white space (carriage returns, spaces and tabs), while
preserving "preformatted" text (text bounded by <PRE>...</PRE>
tags).
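
For readers without XyWrite at hand, the kind of cleanup described above
can be roughly illustrated in Python. This is only a sketch under stated
assumptions, not how DELTAGS is implemented: the names TagStripper and
strip_tags are hypothetical, and it relies on the standard library's
html.parser rather than anything in U2. It drops tags, comments, and
script/style bodies, collapses runs of white space to a single space, and
leaves text inside <pre> untouched.

```python
import re
from html.parser import HTMLParser


class TagStripper(HTMLParser):
    """Hypothetical helper: collect text, drop markup, keep <pre> text as-is."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []        # text fragments kept so far
        self.skip_depth = 0    # > 0 while inside <script> or <style>
        self.pre_depth = 0     # > 0 while inside <pre>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1
        elif tag == "pre":
            self.pre_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip_depth:
            self.skip_depth -= 1
        elif tag == "pre" and self.pre_depth:
            self.pre_depth -= 1

    def handle_data(self, data):
        if self.skip_depth:
            return  # discard embedded script/style bodies
        if self.pre_depth:
            self.parts.append(data)  # preformatted text: keep white space
        else:
            # outside <pre>, collapse CRs, spaces and tabs to one space
            self.parts.append(re.sub(r"\s+", " ", data))

    # Comments arrive via handle_comment, whose default does nothing,
    # so they are dropped without extra code.


def strip_tags(html):
    """Return the text of an HTML string with the markup removed."""
    parser = TagStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip()
```

Calling strip_tags() on the contents of a saved web page returns the
plain text; the input string itself is not modified.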
Switch to the alternate screen to view the (unmodified) original
file. You can then command HLINKS to extract a list of all
URLs in the file, in HTML or XyWrite file format. The next release
of U2 will include LISTURLS, which produces a sorted list of URLs,
grouping duplicates and similar URLs together -- the utility I wrote
for Jordan late last year.
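
As a rough illustration of the URL listing idea, here is a Python sketch.
It is an assumption-laden stand-in, not the HLINKS or LISTURLS code: the
names LinkCollector and list_urls are hypothetical, only href and src
attributes are examined, and "grouping" is reduced to sorting the URLs
and collapsing exact duplicates.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper: gather href/src attribute values from tags."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # value is None for attributes written without "=..."
            if name in ("href", "src") and value:
                self.urls.append(value)


def list_urls(html):
    """Return the URLs found in an HTML string, sorted, duplicates removed."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return sorted(set(collector.urls))
```

Because the result is sorted, URLs that share a prefix (same host, same
directory) end up next to each other in the list.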
--
Carl Distefano
cld@xxxxxxxx
http://users.datarealm.com/xywwweb/