
OT: saving complete web pages



[Warning: Long-ish Post]

I wanted to revisit an earlier thread, which begins at
 http://xywrite.org/msg00785.htm , to
make sure there is no further comment and that I have
not missed any great solutions that may be out there
which remain unknown to me. This will incidentally
illustrate the great value of XySearch, which some of
us (myself included) have a tendency to overlook from
time to time.

The thread started with consideration of the .CFM and
.MHT formats for saving *entire* web pages, which
turned out *not* to be the answer I hoped they might
be.

Martin J. Osborne said:
> Unless you have some way of converting the web
> document to PDF.

David Auerbach replied saying:
> In the Mac world (I'm not sure about the Windows
> world), if you are looking at a web page and hit
> Print, then you have the option, instead of
> printing, to save-as-pdf.

If that caught all the page elements -- charts,
drawings, screenshots, etc. -- and we could do this in
Windows, that could be what I'm looking for.

Then, in
http://xywrite.org/msg00785.htm Carl
mentioned the Firefox (and Mozilla) choice between
"Save as HTML" and "Save as Web Page, Complete." I
have in fact used the latter a few times, mostly by
accident. The problem is that it scatters the page
elements across separate files, which renders the
result kind of useless.
It is almost like handing someone a fully disassembled
Swiss watch and saying, 'O.K. -- here's your
timepiece.'
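
For what it's worth, the watch can in principle be
reassembled. Below is a minimal Python sketch, assuming
the saved page refers to its images by relative paths in
plain src="..." attributes. The function name
inline_images and the file names in the usage note are
hypothetical, and it handles only <img> tags, not
stylesheets or scripts. It embeds each local image in the
HTML as a base64 data: URI, so that the page becomes a
single file.

```python
import base64
import mimetypes
import re
from pathlib import Path
from urllib.parse import unquote

# Matches the src="..." attribute of an <img> tag (double-quoted values only).
IMG_SRC = re.compile(r'(<img\b[^>]*?\bsrc=")([^"]*)(")', re.IGNORECASE)


def inline_images(html_path):
    """Return the HTML text with local <img> files embedded as data URIs."""
    html_path = Path(html_path)
    html = html_path.read_text(encoding="utf-8", errors="replace")

    def embed(match):
        src = match.group(2)
        # Leave remote and already-embedded images alone.
        if src.startswith(("http:", "https:", "data:", "//")):
            return match.group(0)
        # Saved pages may percent-encode spaces and the like in file names.
        image_file = html_path.parent / unquote(src)
        if not image_file.is_file():
            return match.group(0)
        mime = mimetypes.guess_type(image_file.name)[0] or "application/octet-stream"
        payload = base64.b64encode(image_file.read_bytes()).decode("ascii")
        return f"{match.group(1)}data:{mime};base64,{payload}{match.group(3)}"

    return IMG_SRC.sub(embed, html)
```

Usage would be something like
Path("page_single.htm").write_text(inline_images("page.htm"), encoding="utf-8"),
after which page_single.htm no longer depends on the
folder of loose image files.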

Someone else (I've lost track of the citation)
suggested:
> Does printing to a file using a Postscript printer
> and then using Ghostview to convert to pdf not
> work?

I haven't tried this, but I doubt it's going to pick
up the aforementioned graphic elements.
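
If anyone wants to test that suggestion: the conversion
itself is done by Ghostscript, for which Ghostview is only
the viewer front end. Here is a minimal sketch in Python,
assuming Ghostscript is installed and on the PATH as gs
(on Windows the console binary is typically gswin32c or
gswin64c). The function names and file names are
hypothetical, and I have not verified how well a given
browser's PostScript output carries the graphics.

```python
import shutil
import subprocess


def ps_to_pdf_command(ps_file, pdf_file, gs_binary="gs"):
    """Build the Ghostscript command line that converts PostScript to PDF."""
    return [
        gs_binary,
        "-dBATCH",            # exit when finished instead of waiting for input
        "-dNOPAUSE",          # do not pause between pages
        "-sDEVICE=pdfwrite",  # select the PDF output device
        f"-sOutputFile={pdf_file}",
        ps_file,
    ]


def convert(ps_file, pdf_file, gs_binary="gs"):
    """Run the conversion; raise if Ghostscript is missing or reports failure."""
    if shutil.which(gs_binary) is None:
        raise FileNotFoundError(f"{gs_binary} not found on PATH")
    subprocess.run(ps_to_pdf_command(ps_file, pdf_file, gs_binary), check=True)
```

After printing the page to a file (say page.ps),
convert("page.ps", "page.pdf") would produce the PDF to
inspect.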

Patricia cautioned:
> The gotcha there is that some Web pages aren't
> really HTML. I have one that I tried to save that
> way, and what I got was a javascript and a string
> of jpgs and gifs. No way to know which to open
> first without the jScript

And Brian Henderson zeroed in on another aspect of the
problem and possible solutions:
> Converting web pages to PDF is not difficult.
> There are plenty of freeware apps that'll do it.
> The problem is that web pages are generally not
> formatted as 8 1/2 x 11 (or much of anything that
> conforms to non-web conventions), and PDF
> conversion will break a continuous page into
> print-sized pieces...well, the page-breaks can be
> very "unsatisfactory". Ugly breaks in the middle
> of paragraphs (sometimes parts disappear), breaks
> in the middle of graphics (more than annoying),
> and breaks in the middle of wrapped links (gives
> literal meaning to "break"). Tools that will fix
> or prevent these problems are WAY too expensive
> for the casual user.

Yes, but I've seen the end result: articles that were
originally online as some kind of HTML or XML (?)
pages -- in any case, *not* PDF -- whose hosting site
bit the dust, after which someone turned the article
into PDF and made it available elsewhere. Same
article, same graphics included, no apparent changes.
I hope no one spent weeks of toil in order to
accomplish this.

I agreed with David and with Brian that it might have
to come down to the old low-tech solution of 'Just
Print it Out and File It.'

Finally, I had managed to find something, which I'd
since forgotten about, in the form of an extension for
Moz | FF:
http://xywrite.org/msg00792.htm

So, I guess that is what I'll check out next, unless
someone has a better idea.

This was a useful exercise, but following the thread's
trail of breadcrumbs in XySearch was perhaps a bit
more difficult than it needed to be . . . .


Jordan