
Re: Cleaning up html



Reply to note from Jay McNally, Tue, 04 Jun 2002 09:28:44 -0400

> I often need to take text from a web document that has been
> saved in html.
> ...
> Can I run a CHange command with wildcards that would erase the
> whole string between the brackets, such as the following junk?

If you load the Jumbo U2, run DELTAGS with the subject file
in the current window. Working on a copy of the file, it deletes
all tags, including embedded scripts, comments, etc., and
associated white space (carriage returns, spaces and tabs), while
preserving "preformatted" text (text bounded by <PRE>...</PRE>
tags). Switch to the alternate screen to view the (unmodified)
original file.

You can then command HLINKS to extract a list of all URLs in the
file, in HTML or XyWrite file format. The next release of U2 will
include LISTURLS, which produces a sorted list of URLs, grouping
duplicates and similar URLs together -- the utility I wrote for
Jordan late last year.

-- 
Carl Distefano
cld@xxxxxxxx
http://users.datarealm.com/xywwweb/
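
For anyone without U2 at hand, the behavior described above for DELTAGS and HLINKS can be roughly sketched outside XyWrite. This is not the U2 code (that is written in XPL); it is a minimal Python illustration using the standard html.parser module. The names TagStripper and strip_tags are made up for this example, and the plain sorted-and-deduplicated link list is only a stand-in for what LISTURLS is said to do.

```python
from html.parser import HTMLParser
import re


class TagStripper(HTMLParser):
    """Hypothetical helper: keeps text outside tags, drops script/style
    bodies and comments, and leaves <pre> text untouched."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []      # text fragments that survive
        self.links = []      # href values seen on <a> tags
        self.skip_depth = 0  # > 0 while inside <script> or <style>
        self.pre_depth = 0   # > 0 while inside <pre>

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        elif tag == "pre":
            self.pre_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1
        elif tag == "pre" and self.pre_depth:
            self.pre_depth -= 1

    def handle_data(self, data):
        if self.skip_depth:
            return  # script or style body: discard
        if self.pre_depth:
            self.parts.append(data)  # preformatted: keep white space as-is
        else:
            # Outside <pre>, collapse runs of CRs, spaces and tabs.
            self.parts.append(re.sub(r"\s+", " ", data))

    # Comments arrive via handle_comment, whose default does nothing,
    # so they are dropped without extra code.


def strip_tags(html):
    """Return (plain_text, sorted_unique_links) for an HTML string."""
    parser = TagStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip(), sorted(set(parser.links))
```

Call strip_tags() on the contents of the saved page; the original file on disk is never modified, so you can still compare the result against it.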