Re: Not entirely OT: ODT format
- Subject: Re: Not entirely OT: ODT format
- From: Harry Binswanger hb@xxxxxxxx
- Date: Thu, 04 Oct 2007 02:09:16 -0400
What's interesting (and I did it twice to make sure I wasn't dreaming), is
that the ODT format was actually smaller than TXT. Short of compression,
anyone know how that's possible?
In principle, you can pseudo-compress a fair amount by using the 8th bit.
Wordstar used to do this in the DOS days: instead of putting in spaces
between the words, it "turned on" the (normally unused) 8th bit of the last
letter. Below, I use capitals (illicitly) to indicate a character with its
8th bit turned on, to show you how it saved space, using the sentence:
Now is the time for all good men to come to the aid of their party
NoWiSthEtimEfoRalLgooDmeNtOcomEtOthEaiDoFtheiRpartY
You understand that it isn't really caps--each thing I've represented as a
capital letter (other than the "N" at the start) would be a unique
"garbage" character.
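
To make the trick concrete, here is a minimal Python sketch of the scheme as described above. It is only an illustration of the idea, not a claim about WordStar's actual file format. The function names pack_words and unpack_words are made up for this example, and it assumes plain ASCII text with single spaces between words.

```python
def pack_words(text: str) -> bytes:
    """Drop the spaces and set bit 7 on the last byte of every word."""
    out = bytearray()
    for word in text.split():
        raw = bytearray(word.encode("ascii"))  # assumption: 7-bit ASCII only
        raw[-1] |= 0x80                        # "turn on" the 8th bit
        out += raw
    return bytes(out)


def unpack_words(data: bytes) -> str:
    """Rebuild the text: a byte with bit 7 set ends the current word."""
    words = []
    current = bytearray()
    for b in data:
        current.append(b & 0x7F)               # clear the marker bit
        if b & 0x80:
            words.append(current.decode("ascii"))
            current = bytearray()
    return " ".join(words)


sentence = "Now is the time for all good men to come to the aid of their party"
packed = pack_words(sentence)
print(len(sentence), len(packed))              # 66 bytes before, 51 after
print(unpack_words(packed) == sentence)
```

The saving is exactly one byte per space: the 66-character sentence packs into 51 bytes, the same length as the run-together line shown above.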
You can do more than this without turning to mathematical compression
algorithms, if you use the extra bits made available by characters not on
the normal typewriter keyboard. For instance, the first 32 characters
(0-31, decimal) can be used for various purposes. (They were originally
"control-characters" that governed such things as onscreen cursor movement.)
The overall idea is that a byte is 8 bits, which gives 256 possibilities,
but text only uses 26 different letters (not counting caps separately), 10
digits, maybe 12 punctuation marks. That is about 48 symbols, which fit in
6 bits rather than 8. So encoding bit-wise, rather than byte-wise, can save
a lot of space without mathematical compression.
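
A rough Python sketch of that bit-wise idea follows. Everything in it is an assumption made for illustration: the 48-symbol ALPHABET (lowercase letters, digits, space, and a few punctuation marks), the choice of 6 bits per character, and the names pack6 and unpack6. The character count has to be stored separately, because the padding bits at the end could otherwise be mistaken for one more character.

```python
# Hypothetical alphabet: 26 letters + 10 digits + space + 11 punctuation marks.
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789 .,;:!?'\"-()"
BITS = 6                                   # 2**6 = 64 codes, enough for 48 symbols
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def pack6(text: str) -> bytes:
    """Pack each character into 6 bits, padding the tail to a whole byte."""
    acc = 0
    for ch in text:
        acc = (acc << BITS) | INDEX[ch]    # KeyError if ch is not in ALPHABET
    nbits = BITS * len(text)
    pad = (-nbits) % 8                     # zero bits needed to fill the last byte
    acc <<= pad
    return acc.to_bytes((nbits + pad) // 8, "big")


def unpack6(data: bytes, count: int) -> str:
    """Read back `count` 6-bit codes from the front of the packed bytes."""
    acc = int.from_bytes(data, "big")
    total_bits = len(data) * 8
    chars = []
    for i in range(count):
        shift = total_bits - BITS * (i + 1)
        chars.append(ALPHABET[(acc >> shift) & 0x3F])
    return "".join(chars)


sample = "now is the time for all good men to come to the aid of their party"
packed6 = pack6(sample)
print(len(sample), len(packed6))           # 66 characters fit in 50 bytes
print(unpack6(packed6, len(sample)) == sample)
```

Note that this keeps the spaces and still saves a quarter of the bytes, at the price of giving up capitals and any character outside the chosen alphabet.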
As I understand it, mathematical compression would do things like
representing a line with 20 e's as "20e" which is only 3 characters instead
of 20. Of course, that is much, much cruder than the actual compression
routines. But it is clear that that is a different animal from bit-wise
encoding.
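
The "20e" idea is usually called run-length encoding. A bare-bones Python sketch, with the hypothetical names rle_encode and rle_decode, might look like the following. It keeps the runs as (count, character) pairs rather than as a string like "20e", so that digits in the text cannot be confused with counts.

```python
from itertools import groupby


def rle_encode(text: str) -> list:
    """Collapse each run of identical characters into a (count, char) pair."""
    return [(len(list(group)), ch) for ch, group in groupby(text)]


def rle_decode(runs: list) -> str:
    """Expand (count, char) pairs back into the original text."""
    return "".join(ch * count for count, ch in runs)


line = "e" * 20
print(rle_encode(line))                    # [(20, 'e')]
print(rle_decode(rle_encode("aaabccdddd")) == "aaabccdddd")
```

On ordinary prose, where long runs of one letter are rare, this would usually make the data larger, which is one reason real compression routines are more elaborate.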
----- Original Message ----
From: Patricia M. Godfrey
To: xywrite@xxxxxxxx
Sent: Monday, September 24, 2007 5:50:32 PM
Subject: Not entirely OT: ODT format
ODT is the famous "open document format" that the OASIS
consortium has been pushing. Now anything that annoys M$ I tend
to think has merit. And this is alleged to be based on XML. So I
assumed it would be open, not merely in the sense of not being
proprietary, but in the sense that it would be readable as plain
text (even if cluttered with formatting), as HTML, XML, and even
RTF are. But no such thing. Might as well be Lower Slobbovian or
Outer Impish.
After an attempt to export a Xy file (first to RTF, then import
that to Open Office Writer, then save as .ODT) yielded gibberish,
I tried creating a simple text file ("This is a test. Times
Roman, 12 point, italic and bold." and so on with a
few other fonts, including Junicode, which I suspected might have
been the spanner in the works). The resulting ODT file, viewed in
a plain text editor (not Xy, for fear there might be some
high-order chars that could provoke trouble), has a couple of
snippets of plain text (apparently part of a doctype sort of
specification:
mimetypeapplication/vnd.oasis.opendocument.textPK), with some
other characters that when I tried to copy them into Tbird sent
it into a tizzy. There the find feature of the text editor could
find NOTHING like test, Times, italic, or any actual words in the
file. I'm not going to risk copying and pasting again, but where
the body of the text should have been was something like this:
cap E grave, cap AE ligature, i, script f, cap A umlaut, cap A
acute, lowercase d, hyphen, tilde, cap A something (too small for
me to see; maybe the overcircle?)... As I said, Outer Impish.
This is an open standard?
--
Patricia M. Godfrey
PriscaMG@xxxxxxxx
Harry Binswanger
hb@xxxxxxxx