
Re: Cleaning up html




Thanks Nicholas.
Whenever I can, I also copy and paste into XyWrite, and this works, as you said, like a charm. But some web sites put protection of some kind on the page that prevents copying and pasting. The Detroit News (detnews.com) is particularly difficult in this regard.
The only way to get the text, from what I can figure out, is to open the
htm file in Xy and strip out the html tags.
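For anyone who would rather do the stripping outside XyWrite, here is a minimal sketch in Python (not XPL), assuming the saved page is ordinary HTML. The names TextExtractor and html_to_text are made up for illustration. It uses the standard library's html.parser, drops the contents of script and style blocks, and converts entities such as &amp; back into plain characters. It is only one possible approach and has not been tried against detnews.com pages.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the text between tags, skipping <script> and <style> bodies."""

    def __init__(self):
        # convert_charrefs=True turns entities like &amp; into plain characters
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0  # > 0 while inside a script or style element

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def html_to_text(html: str) -> str:
    """Return only the text content of an HTML string (hypothetical helper)."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)
```

To use it on a saved file, read the file into a string (for example with open("article.htm", encoding="latin-1").read(), where the file name and encoding are assumptions), pass it to html_to_text, and write the result to a plain text file that XyWrite can open.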





Some newspapers do not allow you to copy and paste.


At 10:59 AM 6/4/02 -0400, you wrote:
Your solution is far more ingenious than mine, but I do this all the time from downloaded newspaper and journal articles. I simply call up the saved htm file in my browser, select what I want, copy it to the Windows clipboard with Control-C, open a file in XyWrite, hit Alt-Enter to show XyWrite in that little box (a Windows box?) and then use the paste command on the box. In goes all the text. Works like a charm, usually.
For some reason, though I save most of the stuff through my Netscape
browser, when I call it up to copy and paste it to XyWrite, I use Mr.
Gates's Internet Explorer, and that seems to work better.

Nicholas Clifford
clifford@xxxxxxxx
PS: It would be nice if Windows programs like NBWin or Word for Windows could translate html files. They say they can, but in fact they can't, and all the html junk appears in the word processing file. At least it does for me.

Jay McNally wrote:

Can anyone offer me some advice for this problem?

I often need to take text from a web document that has been saved in html.
My somewhat tedious but simple process for some years has been to loop an XPL routine that defines and then deletes everything from the first "less than" bracket to the next "greater than" bracket. I then manually clean up the rest of the junk. It works. . . .
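
The loop described above can be sketched in Python for comparison. This is not the XPL routine itself, only an illustration of the same idea, and the function name strip_tags is made up: find the first "less than" bracket, find the next "greater than" bracket after it, delete everything between them inclusive, and repeat until no tags remain.

```python
def strip_tags(text: str) -> str:
    """Repeatedly delete everything from the first '<' to the next '>'."""
    while True:
        start = text.find("<")
        if start == -1:
            break  # no more opening brackets
        end = text.find(">", start)
        if end == -1:
            break  # unmatched '<': leave the remainder for manual cleanup
        # Cut out the tag, brackets included, and loop again
        text = text[:start] + text[end + 1:]
    return text
```

As with the original routine, this leaves entities (such as &amp;nbsp;), script contents, and stray blank lines behind, so some manual cleanup is still needed afterward.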