RE: Cleaning up html
- Subject: RE: Cleaning up html
- From: Jay McNally jmcnally@xxxxxxxx
- Date: Tue, 04 Jun 2002 10:28:15 -0400
Many thanks Brian,
This works. I tried a version of this yesterday and screwed it up.
Jay
At 07:00 AM 6/4/02 -0700, you wrote:
Jay, you can use CH \<"wild-string">\\ and everything in brackets,
including the brackets, will be deleted. I just tried this on a 2-meg SGML
file and only one set of brackets was left (it had 53 characters and it
wrapped; maybe that had something to do with it). This technique can be
somewhat dangerous if there's a missing close-bracket.
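
For anyone working outside XyWrite, the same idea can be sketched in Python.
This is only an illustration, not the XyWrite command above; the names TAG,
strip_tags, sample and broken are made up for the example. It deletes every
bracketed span, and also shows what a missing close-bracket does.

```python
import re

# Matches "<", then any run of characters that are not ">", then ">".
# [^>] also matches newlines, so a tag that wraps across lines is caught.
TAG = re.compile(r"<[^>]*>")


def strip_tags(text: str) -> str:
    """Delete every <...> span, brackets included."""
    return TAG.sub("", text)


sample = '<p class="x">Hello, <b>world</b>!</p>'
cleaned = strip_tags(sample)

# With a missing close-bracket the match runs on to the next ">" it finds,
# so the real text between the two is deleted along with the tag.
broken = "<b>keep this <i broken tag and this text is lost</i> end"
damaged = strip_tags(broken)

print(cleaned)
print(damaged)
```

The second case is the danger mentioned above: everything from the unclosed
bracket up to the next ">" disappears, text included.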
Then there's always the (blasphemous) technique where you open the HTML
file in a browser and use it to save the file as text...that'll do the
trick too.
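
A rough scripted equivalent of the browser route, again as a Python sketch
rather than anything XyWrite does: the standard library's html.parser reads
the tags properly instead of pattern-matching on brackets. The class name
TextOnly and the function html_to_text are hypothetical, and skipping
script and style blocks is my own assumption about what counts as junk.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collect the text of an HTML document and ignore the markup."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into characters.
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip = 0  # greater than zero while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a script or style block.
        if not self._skip:
            self.parts.append(data)


def html_to_text(html: str) -> str:
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


print(html_to_text("<p>Fish &amp; chips</p><script>var x = 1;</script>"))
```

Unlike the bracket-deleting approach, this also converts character
entities, which otherwise have to be cleaned up by hand.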
Brian Henderson
Print Composition Dept.
Mitchell Repair Information Co.
San Diego, CA
www.mitchell1.com
brian.henderson@xxxxxxxx
(858) 391-5000 - x.6533
-----Original Message-----
From: Jay McNally [mailto:jmcnally@xxxxxxxx]
Sent: Tuesday, June 04, 2002 6:29 AM
To: xywrite@xxxxxxxx
Subject: Cleaning up html
Can anyone offer me some advice for this problem?
I often need to take text from a web document that has been saved as HTML.
My somewhat tedious but simple process for some years has been to loop an
XPL routine that defines and then deletes everything from the first "less
than" bracket to the next "greater than" bracket. I then manually clean up
the rest of the junk. It works.
Can I run a CHange command with wildcards that would erase the whole string
between the brackets, such as the following junk?
I'm thinking it would be nice to have one command that would clean out
everything between the brackets, then another command deleting the
brackets. I tinkered with it briefly yesterday and got nowhere.
Is there a simpler way around this problem?
Thanks
Jay