
Re: Not entirely OT: ODT format




What's interesting (and I did it twice to make sure I wasn't dreaming) is that the ODT format was actually smaller than TXT. Short of compression, anyone know how that's possible?
In principle, you can pseudo-compress a fair amount by using the 8th bit. WordStar used to do this in the DOS days: instead of putting in spaces between the words, it "turned on" the (normally unused) 8th bit of the last letter of each word. Below, I use capitals (illicitly) to indicate a character with its 8th bit turned on, to show you how it saved space, using the sentence:

Now is the time for all good men to come to the aid of their party

NoWiSthEtimEfoRalLgooDmeNtOcomEtOthEaiDoFtheiRpartY

You understand that it isn't really caps--each thing I've represented as a capital letter (other than the "N" at the start) would be a unique "garbage" character.

You can do more than this without turning to mathematical compression algorithms, if you use the extra bits made available by characters not on the normal typewriter keyboard. For instance, the first 32 characters (0-31, decimal) can be used for various purposes. (They were originally "control characters" that governed such things as onscreen cursor movement.)

The overall idea is that a byte is 8 bits, which gives 256 possibilities, but text only uses 26 different letters (not counting caps separately), 10 digits, maybe 12 punctuation marks. So encoding bit-wise, rather than byte-wise, can save a lot of space without mathematical compression.

As I understand it, mathematical compression would do things like representing a line with 20 e's as "20e", which is only 3 characters instead of 20. Of course, that is much, much cruder than the actual compression routines. But it is clear that that is a different animal from bit-wise encoding.
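For anyone who wants to try the two ideas above, here is a rough Python sketch. It follows only the description in this message and makes no claim to match WordStar's actual file format or any real compressor; the function names (pack_high_bit, unpack_high_bit, show_as_caps, run_length_encode) are made up for illustration, and the packing assumes plain 7-bit ASCII text with words separated by single spaces.

```python
from itertools import groupby


def pack_high_bit(text: str) -> bytes:
    """Drop the spaces and set bit 7 on the last letter of every word.

    Assumes 7-bit ASCII input; non-ASCII text raises UnicodeEncodeError.
    """
    out = bytearray()
    for word in text.split():
        data = word.encode("ascii")
        out += data[:-1]
        out.append(data[-1] | 0x80)  # "turn on" the 8th bit
    return bytes(out)


def unpack_high_bit(data: bytes) -> str:
    """Reverse pack_high_bit: a byte with bit 7 set ends a word."""
    words = []
    current = bytearray()
    for b in data:
        current.append(b & 0x7F)  # clear the 8th bit to recover the letter
        if b & 0x80:
            words.append(current.decode("ascii"))
            current = bytearray()
    if current:  # tolerate a final word whose last byte was not marked
        words.append(current.decode("ascii"))
    return " ".join(words)


def show_as_caps(data: bytes) -> str:
    """Display packed bytes using capitals for bytes with bit 7 set."""
    return "".join(
        chr(b & 0x7F).upper() if b & 0x80 else chr(b) for b in data
    )


def run_length_encode(text: str) -> str:
    """Crude run-length encoding: each run becomes <count><char>."""
    return "".join(f"{len(list(run))}{ch}" for ch, run in groupby(text))


sentence = "Now is the time for all good men to come to the aid of their party"
packed = pack_high_bit(sentence)
print(len(sentence), len(packed))
print(show_as_caps(packed))
print(run_length_encode("e" * 20))
```

The packed form saves exactly one byte per space, and unpack_high_bit(packed) gives back the original sentence. The run-length function is only the "20e" illustration; it would make ordinary prose longer, since most runs have length 1.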
----- Original Message ----
From: Patricia M. Godfrey
To: xywrite@xxxxxxxx
Sent: Monday, September 24, 2007 5:50:32 PM
Subject: Not entirely OT: ODT format

ODT is the famous "open document format" that the OASIS consortium has been pushing. Now anything that annoys M$ I tend to think has merit. And this is alleged to be based on XML. So I assumed it would be open, not merely in the sense of not being proprietary, but in the sense that it would be readable as plain text (even if cluttered with formatting), as HTML, XML, and even RTF are. But no such thing. Might as well be Lower Slobbovian or Outer Impish.

After an attempt to export a Xy file (first to RTF, then import that to Open Office Writer, then save as .ODT) yielded gibberish, I tried creating a simple text file ("This is a test. Times Roman, 12 point, italic and bold." and so on with a few other fonts, including junicode, which I suspected might have been the spanner in the works).

The resulting ODT file, viewed in a plain text editor (not Xy, for fear there might be some high-order chars that could provoke trouble), has a couple of snippets of plain text (apparently part of a doctype sort of specification: mimetypeapplication/vnd.oasis.opendocument.textPK), with some other characters that when I tried to copy them into Tbird sent it into a tizzy. There the find feature of the text editor could find NOTHING like test, Times, italic, or any actual words in the file. I'm not going to risk copying and pasting again, but where the body of the text should have been was something like this: cap E grave, cap AE ligature, i, script f, cap A umlaut, cap A acute, lowercase d, hyphen, tilde, cap A something (too small for me to see; maybe the overcircle?)...

As I said, Outer Impish. This is an open standard?

--
Patricia M. Godfrey
PriscaMG@xxxxxxxx
Harry Binswanger hb@xxxxxxxx