Re: New XYENC 1/13/09 release
- Subject: Re: New XYENC 1/13/09 release
- From: wbass@xxxxxxxx
- Date: Thu, 22 Jan 2009 01:49:19 -0700 (MST)
Harry Binswanger wrote on Tue, 20 Jan 2009 22:15:52 -0500
>Wally,
>I have a couple of questions [about XYENC/XYDEC] to which
>you must have obvious answers.
>1. Why are underlines used instead of spaces?
A basic rule of the XYDEC program is that you can add whitespace (blanks,
tabs, CRLFs) willy-nilly to its XYENC encoded input file without affecting
the decode back to the original. The XYENC encoder doesn't itself actually
add whitespace, but the provision for allowing whitespace is what allows
the user to do so, and allows for XPL programs like QDF1.PM to do so, to
"format" the material and make it more readable, if and when that is
desired.
In getting back to the original, XYDEC then throws away blanks and other
white space, also willy nilly. But blanks are generally significant in XPL
programs, so XYDEC must preserve exactly those blanks THAT WERE IN THE
ORIGINAL. So whitespace that was in the original has to be translated to
something other than blanks, to differentiate it from blanks that users
might have added to the encoded file for formatting purposes.
Underscore is, I think, the best character to use for that translation, to
allow the user to be able to see where there actually were spaces in the
original. Generally, there aren't many blanks in XPL code, so the
underscores aren't that hard to digest.
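The blank handling described above can be illustrated with a minimal Python sketch. This is not the actual XYENC/XYDEC code: the function names are hypothetical, and the toy assumes the original contains no literal underscores and no whitespace other than blanks, since how the real tools treat those cases is not covered here.

```python
def toy_encode_blanks(original: str) -> str:
    """Turn every blank of the original into a visible underscore."""
    if "_" in original:
        # Assumption: literal underscores are out of scope for this toy.
        raise ValueError("toy sketch: literal underscores are not handled")
    return original.replace(" ", "_")


def toy_decode_blanks(encoded: str) -> str:
    """Discard all whitespace added for formatting, then restore blanks."""
    without_whitespace = "".join(encoded.split())  # drops blanks, tabs, CRLFs
    return without_whitespace.replace("_", " ")


original = "SV 01,A B"
encoded = toy_encode_blanks(original)

# A user (or a formatting program) may add whitespace anywhere in the
# encoded text without changing what it decodes to.
reformatted = encoded[:4] + "\r\n\t  " + encoded[4:]
assert toy_decode_blanks(reformatted) == original
```

Because only underscores stand for original blanks, any whitespace found in the encoded text can be assumed to be formatting and thrown away.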
>2. Why is ;*; encoded to something other than ;*; and,
>given that it is encoded, why to: ',*','^
Well firstly, you should understand that XYENC doesn't "understand" and
encode sequences like ";*;" -- it only encodes individual characters. So
it doesn't encode "the sequence" at all. XYENC encodes individual
"characters" in a uniform way regardless of context. (By "characters" in
the preceding, I'm also including the 3 byte encodings that XyWrite treats
as single characters.)
(Almost) any character in XyWrite XPL can be either 1 byte encoded or 3
byte encoded, and the way a character is encoded in an XPL program can
make a difference in what the XPL program does. In encoding, XYENC must do
away with all 3 byte encodings, because they're not ASCII, and because
they're not useable or acceptable ANYWHERE except inside XyWrite.
Conceptually, XYENC deals with that by simply translating all 3 byte
encodings in its input to the 1 byte representation of the same character.
But in order to get the 3 byte version back during decode, XYENC has to
"mark" the characters in the encoded file that were originally 3 byte
encodings. This allows XYDEC to convert them back to 3 byte encodings
during decode.
The way that XYENC "marks" bytes that were originally encoded as three
bytes is to precede them in the encoded file with a ":" character (if they
were originally the old fashioned XyWrite III+ 3 byte encodings), or a ";"
character (if they were encoded as 3 byte XyWrite IV "red pseudo-wildcard"
characters). Hence, XYENC can no longer allow actual ":" or ";" characters
in the original text to simply be passed through during encoding. If it
allowed them to pass through at encode time, it would have no way at
decode time to tell which ":"'s and ";"'s were "real" ":"'s and ";"'s, vs.
which ":"'s and ";"'s were generated by the 3 byte encode "marking"
strategy.
So XYENC has to translate "real" ":" and ";" input characters to something
else. There aren't any free characters to use, so a multibyte sequence has
to be used. For ";", that sequence just happens to be "',", which is
visually about as close to ";" as I could get.
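The marking strategy can be sketched in Python as follows. This is a toy model under stated assumptions, not the real XYENC algorithm: a "character" is represented as a (char, kind) pair, the names are hypothetical, and literal ":" and "'" characters are rejected because their real translations are not given above.

```python
ONE_BYTE, III_3BYTE, IV_3BYTE = "1-byte", "III+ 3-byte", "IV 3-byte"


def toy_mark_encode(chars):
    """Encode (char, kind) pairs one character at a time, ignoring context."""
    out = []
    for ch, kind in chars:
        if kind == III_3BYTE:
            out.append(":" + ch)      # ":" marks an old III+ 3 byte encoding
        elif kind == IV_3BYTE:
            out.append(";" + ch)      # ";" marks a XyWrite IV 3 byte encoding
        elif ch == ";":
            out.append("',")          # a "real" semicolon becomes quote, comma
        elif ch in ":'":
            # Assumption: the real translations for these are not modeled.
            raise ValueError("toy sketch: literal %r is not handled" % ch)
        else:
            out.append(ch)
    return "".join(out)


def toy_mark_decode(text):
    """Rebuild the (char, kind) pairs from the toy encoding."""
    out, i = [], 0
    while i < len(text):
        c = text[i]
        if c in ":;":
            kind = III_3BYTE if c == ":" else IV_3BYTE
            out.append((text[i + 1], kind))
            i += 2
        elif text.startswith("',", i):
            out.append((";", ONE_BYTE))
            i += 2
        else:
            out.append((c, ONE_BYTE))
            i += 1
    return out


source = [(";", ONE_BYTE), ("*", ONE_BYTE), (";", ONE_BYTE), ("A", IV_3BYTE)]
encoded = toy_mark_encode(source)
assert toy_mark_decode(encoded) == source
```

Each character is handled independently, so a sequence such as ;*; is never recognized as a unit; it simply comes out as three separately encoded characters.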
Keep in mind that, although XYDEC will translate a "'," (quote, comma)
sequence back to a ";" (semicolon), there are other encodings that it will
also translate back to a semicolon. One of those encodings is a XyWrite 3
byte encoded semicolon. So, IF THE ENVIRONMENT WHERE THE TRANSLATION IS TO
BE VIEWED IS KNOWN TO BE LIMITED TO XYWRITE, you can use things like the
QDF2.PM XPL program to translate "'," pairs to XyWrite 3 byte encoded ";"
characters, without affecting the ability to decode back to the original.
The same is true for various other translations. Doing so does away with
most of the VISUAL differences of this kind, between an encoded file and
the original.
Now, you may say "Gee, I started with 3 byte encoded stuff, ran XYENC to
get rid of it, and after running QDF2, I once again have 3 byte encoded
stuff in the encoded file. What, then, did I gain in the process?" The
answer to that is (a) you now have a file that you can read in XyWrite
normal mode, rather than having to read it in expanded mode (which
destroys all formatting), and (b) you have a new format where you can add
whitespace for better visual formatting, without changing the "meaning" of
the file. In fact, you now even have tools which automatically add
whitespace for improved visual formatting. And, there are provisions for
adding XyWrite markup (embedded commands) and comments to the encoded
version -- again, without changing the "meaning" or decoding of the
program back to the original.
>3. Encoding U2 routines went flawlessly. But when I try to
>encode something very simple, I get a lot of apparent
>garbage in the .ENC file produced. E.g., this:
> «SV01,A»«EX»
> gave this:
> '01-20-2009 22:09:04
> ~~~Z~GH~209...
> '01-20-2009 22:09:04
> ~~~Z~GH~209~224~254~014~139~_~163~203Y~142
> ~F~159~X3~219~137~^6Y~232~128~@X~162~139~_~195~247~F~215
> ~M~F~@u~U~161~209~M',~F4Yu~M~142~F~159~X~198~F~156~K~A
> ~232~E~@~195~232x .......
In the encoding you provided, the first line is a time stamp, and is not
relevant to what we are discussing. So I'll ignore that.
I don't know what tool was used to create the file that ostensibly
contained the "«SV01,A»«EX»", but it appears that whatever tool it was
left a lot of garbage in the file, following the EOF mark. There are
programs which do that -- in fact, your XYCOMP program is one of them.
The ~Z about 16 characters into the second line shows where there was an
EOF (end-of-file, or "Ctrl-Z") mark in the original file after the
"«SV01,A»«EX»", as one might expect. But it would appear that the file
didn't stop there. Whatever program you were using to create the file
containing "«SV01,A»«EX»" apparently leaves (lotsa) garbage following the
EOF marker.
EOF markers have been obsolete for about 15 years now, and no modern
program that I know of recognizes them any more (other than perhaps NB
8.0, if you want to call that modern). For example, none of today's web
browsers stop reading at an EOF mark -- they merely include the EOF mark
as part of the data (and typically display it as a small square). Notepad
also simply displays EOFs as data. Etc.
Nowadays, the end of a file is properly defined by the file length, as
indicated in the directory. XyWrite, though, being very old, still uses
EOF marks. In so doing, XyWrite (III+ at least) won't show any junk that
might be present following the first EOF character in a file. Nor does it
show you the EOF char itself. So it may not be immediately apparent when
junk is even there. Which I presume is why you didn't see that there was
junk there, even though (I believe) there was junk there.
XYENC doesn't stop processing at an EOF marker -- it encodes it, and
continues on processing until the real end of the file, as defined by the
file length in the directory entry. On decode, it reconstructs EOF markers
exactly as they were in the original. It doesn't add EOFs, and it doesn't
remove EOFs, and it doesn't stop on EOFs -- doing any of those things
would "break" some XPL programs.
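The difference between stopping at an EOF mark and reading to the file length can be shown with a short Python sketch. The helper name is hypothetical, the leading bytes are the values given for the source text in the listing at the end of this message, and the three trailing bytes are made-up placeholders standing in for the junk.

```python
EOF_MARK = b"\x1a"  # Ctrl-Z


def split_at_first_eof(data: bytes):
    """Return (bytes before the first EOF mark, the mark, bytes after it)."""
    pos = data.find(EOF_MARK)
    if pos < 0:
        return data, b"", b""
    return data[:pos], EOF_MARK, data[pos + 1:]


# 0AEh 'SV01,A' 0AFh 0AEh 'EX' 0AFh, then an EOF mark, then placeholder junk.
sample = b"\xaeSV01,A\xaf\xaeEX\xaf" + EOF_MARK + b"\x01\x02\x03"

visible, mark, trailing = split_at_first_eof(sample)

# A program that honors EOF marks shows only `visible`.
# A program that goes by file length sees all of `sample`,
# including the mark itself and whatever follows it.
print(len(visible), len(mark), len(trailing), len(sample))
```

An encoder that must reproduce the original byte for byte has to take the second view, which is why junk after the mark shows up in the encoded output.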
I looked at the garbage following the EOF marker, and it appears to be 16
bit binary executable code. I've disassembled and listed it below, for
what you included. It's clearly code, and pretty clearly old 16 bit code,
but I can't identify it. It's not code from the XYENC or XYDEC modules.
>Interestingly, despite its length--166,432 bytes--it
>decoded back to the original!
For typical binary data, like executable code, XYENC will encode the data
without trouble, but increase the length of the encoded file by a factor
of 2.5 or so. So it looks like the file you encoded was about 64K bytes in
length. XYENCLH would typically increase the length of such a file by
about 18%.
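The size estimate above is simple arithmetic, worked here in Python. The 2.5 factor and the 18% figure are the rough, typical values quoted above, not exact properties of the encoders, so the results are only approximate.

```python
encoded_size = 166_432        # bytes in the .ENC file, as reported
binary_expansion = 2.5        # rough XYENC growth factor for binary data
xyenclh_growth = 0.18         # rough XYENCLH growth for the same kind of data

estimated_original = encoded_size / binary_expansion
estimated_xyenclh = estimated_original * (1 + xyenclh_growth)

print(round(estimated_original))            # estimated original size in bytes
print(round(estimated_original / 1024, 1))  # the same estimate in KB
print(round(estimated_xyenclh))             # estimated XYENCLH output in bytes
```

The first figure comes out at roughly 65 KB, in line with the "about 64K" estimate given the looseness of the 2.5 factor.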
Given that EOF characters are obsolete in the modern world, but still
potentially important in the XyWrite world, my general advice would be
that you ought to obtain a toolset that treats EOF characters rationally,
rather than dumping garbage behind the EOF characters and expecting other
programs to hide the garbage. Programs which do use EOF characters
should write exactly one at the end of the file, and write no
additional garbage after that (XyWrite III meets that requirement -- I
don't know that much about XyWrite IV.) Failing that, you probably at
least ought to be familiar enough with your tools to know what those tools
do in this regard.
Wally Bass
Here's a listing of what apparently followed the «SV01,A»«EX» in your
source file.
org 100h
x100 label near
db 0AEh ;left guillemet
db 'SV01,A'
db 0AFh ;right guillemet
db 0AEh ;left guillemet
db 'EX'
db 0AFh ;right guillemet
db 1Ah ;End of File Marker
;garbage starts here
scasw
sbb al,ds:[bx]
dec ax
shl ax,1
dec byte ptr ds:[1F8Bh]
mov ds:[59CBh],ax
mov es,ds:[189Fh]
xor bx,bx
mov ds:[5936h],bx
call x1A5
pop ax
mov ds:[1F8Bh],al
ret
test word ptr ds:[0DD7h],6
jnz x147
mov ax,ds:[0DD1h]
cmp ax,ds:[5934h]
jne x148
mov es,ds:[189Fh]
mov byte ptr ds:[0B9Ch],1
call x14C
x147: ret
x148: call 0C3h
ret
;
x14C proc near
call 77Ch
pushf
xchg dx,ax
push dx
call x2A4
push bx
mov ax,ds:[0DC9h]
dec ax
shl ax,1
cmp bx,ax
jbe x162
mov bx,ax
x162: cmp bx,ds:[59C9h]
jnb x16C
mov bx,ds:[59C9h]
x16C: mov ds:[59CBh],bx
mov word ptr ds:[59C9h],0
pop bx
pop ax
popf
jc x17E
call x2A4
x17E: push ds
pop es
mov cx,bx
shr cx,1
mov di,5936h
cld
xor ax,ax
repne scasw
shl cx,1
sub bx,cx
x190: sub bx,+2
jbe x1A1
mov si,ds:[bx+5936h]
or si,si
jz x190
call x1B4
ret
x1A1: call x1A5
ret
x14C endp
;
x1A5 proc near
xor bx,bx
mov si,ds:[0DD1h]
mov byte ptr ds:[0B9Ch],1
call x1B4
ret
x1A5 endp
;
x1B4 proc near
cmp si,ds:[0DD3h]
jnb x1C2
mov si,ds:[0DD3h]
mov ds:[0DD1h],si
x1C2: cmp si,ds:[0DD5h]
jbe x1CC
mov si,ds:[0DD5h]
x1CC: push bx
call 51h
adc ax,5B36h
call 2D4h
xchg di,ds:[59C4h]
mov ax,ds:[0DC9h]
cmp ax,0FFFFh
jne x1FE
xor ah,ah
mov al,ds:[0E1Dh]
cmp byte ptr ds:[1882h],0Bh
je x1FE
mov al,ds:[1893h]
cmp byte ptr ds:[0DD0h],1Fh
jne x1FA
dec al
x1FA: sub al,ds:[0E1Dh]
x1FE: shl ax,1
mov ds:[59D5h],ax
mov ax,ds:[188Fh]
shl ax,1
mov ds:[59D7h],ax
call 51h
adc al,36h ;"6"
xchg si,ds:[bx+5936h]
mov di,ds:[59C4h]
mov si,189Fh
call 51h
or ds:[bx],bp
mov es,ds:[189Fh]
jc x259
mov ax,ds:[0DC9h]
inc ax
jnz x24B
mov ax,ds:[3621h]
shr ax,1
mov dx,ds:[1893h]
cmp byte ptr ds:[0DD0h],1Fh
jne x23D
dec dx
x23D: sub dx,ds:[0DC5h]
dec dx
cmp dx,ax
jnb x248
mov ax,dx
x248: mov ds:[0DC9h],ax
x24B: cmp byte ptr ds:[0B9Ch],0
je x259
test byte ptr ds:[0B9Ch],2
jz x25A
x259: ret
x25A: mov si,ds:[bx+5936h]
cmp si,ds:[0DD5h]
jb x275
push si
call 0B0h
mov ax,ds:[5936h]
mov ds:[0DD1h],ax
pop si
jc x275
call 0C3h
ret
x275: cmp si,ds:[0E48h]
jb x282
xor si,si
mov ds:[0E48h],si
dec si
x282: mov ds:[0E46h],si
mov ax,ds:[0DE1h]
cmp ax,ds:[0E1Fh]
jnb x292
mov ax,ds:[0E1Fh]
x292: mov ds:[59C7h],ax
mov al,ds:[0E23h]
mov ds:[59C6h],al
clc
ret
mov ax,ds:[0E1Fh]
call x2AD
ret
;x1B4 endp
;
x2A4 proc near
cmp ax,ds:[????]
x2A4 endp
db '[BIGSNIP]'
db 0AEh ;left guillemet
db 'SV01,A'
db 0AFh ;right guillemet
db 0AEh ;left guillemet
db 'EX'
db 0AFh ;right guillemet
db 1Ah ;End of File Marker
end x100