Re: About all xy



** Reply to note from xywrite@xxxxxxxx Wed, 26 May 1999 00:06:24 +0200

> Mr. Holmgren wrote: "Frame "importMSWord" in XYWWWEB.U2 creates, from your
> source file, a new file called MSWRD2XY.### (leaving the original
> untouched), which effectively imports Word texts (without formatting) into
> EDITOR. Simple, fast." I have been unable to find this frame in XyWWWEB.U2
> version 043. Am I wrong? Manuel Castelao

My apologies. You're right; the routine is something I'd been developing
privately and hadn't yet committed to XYWWWEB.U2. It all melds together in
my mind, I'm afraid. Next public version will contain it...

Conversion of MSWord documents is an interesting exercise. The final text of
all MSWord docs commences at 600h, but the preceding code contains a lot
of Ascii-0 characters, which XyWrite ignores when it reads these texts.
Therefore, from XyWrite's perspective, the final text begins at a variable
position: 600h minus the number of Ascii-zeros before it, and that number
is unknown in advance. So instead of just jumping to position 600h, you need
to perform some sort of intelligent analysis of the characters in the file
to determine where binary code ends and human speech begins. I do this by
sampling the characters: where a run of 50 characters has at least 95% in
the Ascii range 32-127, I take that to be the beginning of speech. Where the
final text of MSWord documents _ends_ is easy: at the first occurrence of
Ascii-4. Anyway, the routine was fun to develop, and it lets you see the
basic text quickly, even though there may be 2-4 characters of garbage at
the very beginning, or (less frequently) a couple of characters at the
beginning might be omitted.
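For readers who want to experiment, here is a minimal Python sketch of the heuristic described above. It is not the XPL routine itself. The function names (`visible_offset_of_text`, `find_text_start`, `extract_text`) are hypothetical, and it assumes that the 50-character run is a sliding window over the file with the Ascii-0 characters already dropped (as XyWrite would see it), that 32-127 is inclusive, and that the search for Ascii-4 starts at the detected beginning of the text rather than at the top of the file.

```python
# Hypothetical sketch of the MSWord text-extraction heuristic; names and
# window semantics are assumptions, not the original XPL routine.

WINDOW = 50          # characters sampled per run
THRESHOLD = 0.95     # fraction of the run that must be "speech"
TEXT_OFFSET = 0x600  # raw file offset where the final text commences


def is_speech(b: int) -> bool:
    """True if the byte falls in the Ascii range 32-127 (inclusive)."""
    return 32 <= b <= 127


def visible_offset_of_text(raw: bytes) -> int:
    """Where raw offset 600h lands once Ascii-0 bytes are ignored."""
    return TEXT_OFFSET - raw[:TEXT_OFFSET].count(0)


def find_text_start(visible: bytes) -> int:
    """First index whose next WINDOW bytes are at least THRESHOLD speech."""
    need = THRESHOLD * WINDOW
    for i in range(len(visible) - WINDOW + 1):
        window = visible[i:i + WINDOW]
        if sum(1 for b in window if is_speech(b)) >= need:
            return i
    return -1  # no run qualifies


def extract_text(raw: bytes) -> bytes:
    """Drop Ascii-0, find where speech begins, stop at the first Ascii-4."""
    visible = raw.replace(b"\x00", b"")
    start = find_text_start(visible)
    if start < 0:
        return b""
    end = visible.find(b"\x04", start)
    if end < 0:
        end = len(visible)  # assumption: no terminator means take the rest
    return visible[start:end]
```

With these parameters a qualifying window needs at least 48 in-range bytes out of 50, so up to two out-of-range bytes can sit at its front. That is one way a few garbage characters can appear at the start of the extracted text, and binary bytes that happen to fall in 32-127 can add more.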


---------
Robert Holmgren
holmgren@xxxxxxxx
---------