
Re: New XYENC 1/13/09 release



Harry Binswanger wrote on Tue, 20 Jan 2009 22:15:52 -0500
>Wally,

>I have a couple of questions [about XYENC/XYDEC] to which
>you must have obvious answers.

>1. Why are underlines used instead of spaces?

A basic rule of the XYDEC program is that you can add whitespace (blanks,
tabs, CRLFs) willy-nilly to its XYENC encoded input file without affecting
the decode back to the original. The XYENC encoder doesn't itself actually
add whitespace, but the provision for allowing whitespace is what allows
the user to do so, and allows for XPL programs like QDF1.PM to do so, to
"format" the material and make it more readable, if and when that is
desired.

In getting back to the original, XYDEC then throws away blanks and other
white space, also willy nilly. But blanks are generally significant in XPL
programs, so XYDEC must preserve exactly those blanks THAT WERE IN THE
ORIGINAL. So whitespace that was in the original has to be translated to
something other than blanks, to differentiate it from blanks that the
users might have added to the encoded file for formatting purposes.
Underscore is I think the best character to use for that translation, to
allow the user to be able to see where there actually were spaces in the
original. Generally, there aren't many blanks in XPL code, so the
underscores aren't that hard to digest.
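
To make the whitespace rule concrete, here is a toy sketch in Python. It is NOT the real XYENC/XYDEC encoding: the function names are made up for illustration, and the sketch simply refuses input containing characters (underscore, tab, CR, LF) whose real encodings aren't described in this note. It only shows why a stand-in for blanks lets the decoder throw away every blank it sees.

```python
# Toy model only: the real XYENC alphabet and escape rules are not reproduced.
FORMATTING_WHITESPACE = " \t\r\n"


def toy_encode(text: str) -> str:
    """Replace each original blank with '_' so it survives whitespace stripping."""
    for ch in "_\t\r\n":
        if ch in text:
            raise ValueError("toy model does not handle %r in the original" % ch)
    return text.replace(" ", "_")


def toy_decode(encoded: str) -> str:
    """Discard all formatting whitespace, then turn '_' back into blanks."""
    stripped = "".join(ch for ch in encoded if ch not in FORMATTING_WHITESPACE)
    return stripped.replace("_", " ")


original = "one two three"
encoded = toy_encode(original)

# Reflow the encoded text the way a user (or a formatting program) might.
reflowed = encoded[:4] + "\r\n    " + encoded[4:]

assert toy_decode(reflowed) == original
```

Because the only blanks left in the encoded text are ones added for layout, the decoder never has to guess which blanks matter.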

>2. Why is ;*; encoded to something other than ;*; and,
>given that it is encoded, why to: ',*','^

Well firstly, you should understand that XYENC doesn't "understand" and
encode sequences like ";*;" -- it only encodes individual characters. So
it doesn't encode "the sequence" at all. XYENC encodes individual
"characters" in a uniform way regardless of context. (By "characters" in
the preceding, I'm also including the 3 byte encodings that XyWrite treats
as single characters.)

(Almost) any character in XyWrite XPL can be either 1 byte encoded or 3
byte encoded, and the way a character is encoded in an XPL program can
make a difference in what the XPL program does. In encoding, XYENC must do
away with all 3 byte encodings, because they're not ASCII, and because
they're not useable or acceptable ANYWHERE except inside XyWrite.

Conceptually, XYENC deals with that by simply translating all 3 byte
encodings in its input to the 1 byte representation of the same character.
But in order to get the 3 byte version back during decode, XYENC has to
"mark" the characters in the encoded file that were originally 3 byte
encodings. This allows XYDEC to convert them back to 3 byte encodings
during decode.

The way that XYENC "marks" bytes that were originally encoded as three
bytes is to precede them in the encoded file with a ":" character (if they
were originally the old fashioned XyWrite III+ 3 byte encodings), or a ";"
character (if they were encoded as 3 byte XyWrite IV "red pseudo-wildcard"
characters). Hence, XYENC can no longer allow actual ":" or ";" characters
in the original text to simply be passed through during encoding. If it
allowed them to pass through at encode time, it would have no way at
decode time to tell which ":"'s and ";"'s were "real" ":"'s and ";"'s, vs.
which ":"'s and ";"'s were generated by the 3 byte encode "marking"
strategy.

So XYENC has to translate "real" ":" and ";" input characters to something
else. There aren't any free characters to use, so a multibyte sequence has
to be used. For ";", that sequence just happens to be "',", which is
visually about as close to ";" as I could get.
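
The marking idea can be sketched as a toy in Python. This is a simplified illustration, not XYENC's actual algorithm: each input character is modeled as a (character, marker) pair, the names PLAIN, OLD3 and NEW3 are invented here, and literal ":" and "'" are rejected because their real replacement sequences aren't given in this note. Only the ";" to "'," translation comes from the description above.

```python
# Toy model of the "marking" strategy; names and data layout are assumptions.
PLAIN = ""   # character was 1 byte encoded in the original
OLD3 = ":"   # character was an old-style 3 byte encoding
NEW3 = ";"   # character was a newer-style 3 byte encoding


def toy_encode(chars):
    """chars is a list of (character, marker) pairs; returns the encoded text."""
    parts = []
    for ch, marker in chars:
        if marker:
            parts.append(marker + ch)   # prefix marks a former 3 byte encoding
        elif ch == ";":
            parts.append("',")          # a "real" semicolon gets a 2 char stand-in
        elif ch in ":'":
            raise ValueError("toy model does not cover literal %r" % ch)
        else:
            parts.append(ch)
    return "".join(parts)


def toy_decode(encoded):
    """Inverse of toy_encode: returns the list of (character, marker) pairs."""
    out = []
    i = 0
    while i < len(encoded):
        ch = encoded[i]
        if ch in (OLD3, NEW3):
            out.append((encoded[i + 1], ch))   # next char was 3 byte encoded
            i += 2
        elif ch == "'":
            if encoded[i + 1:i + 2] != ",":
                raise ValueError("unexpected sequence after quote")
            out.append((";", PLAIN))           # stand-in for a real semicolon
            i += 2
        else:
            out.append((ch, PLAIN))
            i += 1
    return out


sample = [("a", NEW3), (";", PLAIN), ("b", OLD3), ("c", PLAIN)]
assert toy_decode(toy_encode(sample)) == sample
```

The point of the sketch is that once ";" is reserved as a marker, a literal semicolon can only survive the round trip if it is written as something else.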

Keep in mind that, although XYDEC will translate a "'," (quote, comma)
sequence back to a ";" (semicolon), there are other encodings that it will
also translate back to a semicolon. One of those encodings is a XyWrite 3
byte encoded semicolon. So, IF THE ENVIRONMENT WHERE THE TRANSLATION IS TO
BE VIEWED IS KNOWN TO BE LIMITED TO XYWRITE, you can use things like the
QDF2.PM XPL program to translate "'," pairs to XyWrite 3 byte encoded ";"
characters, without affecting the ability to decode back to the original.
The same is true for various other translations. Doing so does away with
most of the VISUAL differences of this kind, between an encoded file and
the original.

Now, you may say "Gee, I started with 3 byte encoded stuff, ran XYENC to
get rid of it, and after running QDF2, I once again have 3 byte encoded
stuff in the encoded file. What, then, did I gain in the process?" The
answer to that is (a) you now have a file that you can read in XyWrite
normal mode, rather than having to read it in expanded mode (which
destroys all formatting), and (b) you have a new format where you can add
whitespace for better visual formatting, without changing the "meaning" of
the file. In fact, you now even have tools which automatically add
whitespace for improved visual formatting. And, there are provisions for
adding XyWrite markup (embedded commands) and comments to the encoded
version -- again, without changing the "meaning" or decoding of the
program back to the original.

>3. Encoding U2 routines went flawlessly. But when I try to
>encode something very simple, I get a lot of apparent
>garbage in the .ENC file produced. E.g., this:

> «SV01,A»«EX»

> gave this:

> '01-20-2009 22:09:04
> ~~~Z~GH~209...
> '01-20-2009 22:09:04
> ~~~Z~GH~209~224~254~014~139~_~163~203Y~142
> ~F~159~X3~219~137~^6Y~232~128~@X~162~139~_~195~247~F~215
> ~M~F~@u~U~161~209~M',~F4Yu~M~142~F~159~X~198~F~156~K~A
> ~232~E~@~195~232x .......

In the encoding you provided, the first line is a time stamp, and is not
relevant to what we are discussing. So I'll ignore that.

I don't know what tool was used to create the file that ostensibly
contained the "«SV01,A»«EX»", but it appears that whatever tool it was
left a lot of garbage in the file, following the EOF mark. There are
programs which do that -- in fact, your XYCOMP program is one of them.

The ~Z about 16 characters into the second line shows where there was an
EOF (end-of-file, or "Ctrl-Z") mark in the original file after the
"«SV01,A»«EX»", as one might expect. But it would appear that the file
didn't stop there. Whatever program you were using to create the file
containing "«SV01,A»«EX»" apparently leaves (lotsa) garbage following the
EOF marker.

EOF markers have been obsolete for about 15 years now, and no modern
program that I know of recognizes them any more (other than perhaps NB
8.0, if you want to call that modern). For example, none of today's web
browsers stop reading at an EOF mark -- they merely include the EOF mark
as part of the data (and typically display it as a small square). Notepad
also simply displays EOFs as data. Etc.

Nowadays, the end of a file is properly defined by the file length, as
indicated in the directory. XyWrite, though, being very old, still uses
EOF marks. In so doing, XyWrite (III+ at least) won't show any junk that
might be present following the first EOF character in a file. Nor does it
show you the EOF char itself. So it may not be immediately apparent when
junk is even there. Which I presume is why you didn't see that there was
junk there, even though (I believe) there was junk there.

XYENC doesn't stop processing at an EOF marker -- it encodes it, and
continues on processing until the real end of the file, as defined by the
file length in the directory entry. On decode, it reconstructs EOF markers
exactly as they were in the original. It doesn't add EOFs, and it doesn't
remove EOFs, and it doesn't stop on EOFs -- doing any of those things
would "break" some XPL programs.
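
If you want to check a file for this kind of hidden junk yourself, a few lines of Python will do it. This is just a sketch: the function name split_at_ctrl_z is made up, and the sample bytes are invented stand-ins (0xAE and 0xAF for the guillemets, two arbitrary bytes for the junk), not data taken from your file.

```python
EOF_MARK = 0x1A  # Ctrl-Z, the old end-of-file marker


def split_at_ctrl_z(data: bytes):
    """Return (bytes before the first Ctrl-Z, bytes after it).

    If there is no Ctrl-Z at all, everything counts as visible content
    and the trailing part is empty.
    """
    pos = data.find(bytes([EOF_MARK]))
    if pos == -1:
        return data, b""
    return data[:pos], data[pos + 1:]


# Invented sample: a short command, a Ctrl-Z, then two junk bytes.
sample = b"\xaeSV01,A\xaf\xaeEX\xaf" + bytes([EOF_MARK]) + b"\x90\x90"

visible, trailing = split_at_ctrl_z(sample)
print(len(visible), "bytes before the EOF mark")
print(len(trailing), "bytes hidden after the EOF mark")
```

For a real file, read it in binary mode (for example with pathlib.Path(...).read_bytes()) and pass the result in. A nonzero trailing count means an editor that stops at Ctrl-Z is hiding data that length-based programs will still process.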

I looked at the garbage following the EOF marker, and it appears to be 16
bit binary executable code. I've disassembled and listed it below, for
what you included. It's clearly code, and pretty clearly old 16 bit code,
but I can't identify it. It's not code from the XYENC or XYDEC modules.

>Interestingly, despite its length--166,432 bytes--it
>decoded back to the original!

For typical binary data, like executable code, XYENC will encode the data
without trouble, but increase the length of the encoded file by a factor
of 2.5 or so. So it looks like the file you encoded was about 64K bytes in
length. XYENCLH would typically increase the length of such a file by
about 18%.
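
The arithmetic behind that estimate, as a quick Python check. The 2.5 and 18% figures are the rough "typical" factors quoted above, so the results are ballpark numbers only.

```python
encoded_size = 166_432      # bytes, the .ENC size you reported
xyenc_expansion = 2.5       # rough factor for binary data, per the text above
xyenclh_growth = 0.18       # rough XYENCLH growth, per the text above

# Working backwards from the encoded size to the original size.
estimated_original = encoded_size / xyenc_expansion

# What XYENCLH might have produced for a file of that size.
estimated_xyenclh = estimated_original * (1 + xyenclh_growth)

print(round(estimated_original), "bytes, estimated original size")
print(round(estimated_xyenclh), "bytes, estimated XYENCLH output")
```

The first figure comes out a little over 65,536, which is where "about 64K" comes from.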

Given that EOF characters are obsolete in the modern world, but still
potentially important in the XyWrite world, my general advice would be
that you ought to obtain a toolset that treats EOF characters rationally,
rather than dumping garbage behind the EOF characters and expecting other
programs to hide the garbage. For programs which do use EOF characters,
they should write exactly one at the end of the file, and write no
additional garbage after that (XyWrite III meets that requirement -- I
don't know that much about XyWrite IV.) Failing that, you probably at
least ought to be familiar enough with your tools to know what those tools
do in this regard.

Wally Bass

Here's a listing of what apparently followed the «SV01,A»«EX» in your
source file.

    org   100h
x100  label  near
    db   0AEh  ;left guillemet
    db   'SV01,A'
    db   0AFh  ;right guillemet
    db   0AEh  ;left guillemet
    db   'EX'
    db   0AFh  ;right guillemet
    db   1Ah   ;End of File Marker
;garbage starts here
    scasw
    sbb   al,ds:[bx]
    dec   ax
    shl   ax,1
    dec   byte ptr ds:[1F8Bh]
    mov   ds:[59CBh],ax
    mov   es,ds:[189Fh]
    xor   bx,bx
    mov   ds:[5936h],bx
    call  x1A5
    pop   ax
    mov   ds:[1F8Bh],al
    ret
    test  word ptr ds:[0DD7h],6
    jnz   x147
    mov   ax,ds:[0DD1h]
    cmp   ax,ds:[5934h]
    jne   x148
    mov   es,ds:[189Fh]
    mov   byte ptr ds:[0B9Ch],1
    call  x14C
x147:  ret
x148:  call  0C3h
    ret
;
x14C  proc  near
    call  77Ch
    pushf
    xchg  dx,ax
    push  dx
    call  x2A4
    push  bx
    mov   ax,ds:[0DC9h]
    dec   ax
    shl   ax,1
    cmp   bx,ax
    jbe   x162
    mov   bx,ax
x162:  cmp   bx,ds:[59C9h]
    jnb   x16C
    mov   bx,ds:[59C9h]
x16C:  mov   ds:[59CBh],bx
    mov   word ptr ds:[59C9h],0
    pop   bx
    pop   ax
    popf
    jc   x17E
    call  x2A4
x17E:  push  ds
    pop   es
    mov   cx,bx
    shr   cx,1
    mov   di,5936h
    cld
    xor   ax,ax
    repne  scasw
    shl   cx,1
    sub   bx,cx
x190:  sub   bx,+2
    jbe   x1A1
    mov   si,ds:[bx+5936h]
    or   si,si
    jz   x190
    call  x1B4
    ret
x1A1:  call  x1A5
    ret
x14C  endp
;
x1A5  proc  near
    xor   bx,bx
    mov   si,ds:[0DD1h]
    mov   byte ptr ds:[0B9Ch],1
    call  x1B4
    ret
x1A5  endp
;
x1B4  proc  near
    cmp   si,ds:[0DD3h]
    jnb   x1C2
    mov   si,ds:[0DD3h]
    mov   ds:[0DD1h],si
x1C2:  cmp   si,ds:[0DD5h]
    jbe   x1CC
    mov   si,ds:[0DD5h]
x1CC:  push  bx
    call  51h
    adc   ax,5B36h
    call  2D4h
    xchg  di,ds:[59C4h]
    mov   ax,ds:[0DC9h]
    cmp   ax,0FFFFh
    jne   x1FE
    xor   ah,ah
    mov   al,ds:[0E1Dh]
    cmp   byte ptr ds:[1882h],0Bh
    je   x1FE
    mov   al,ds:[1893h]
    cmp   byte ptr ds:[0DD0h],1Fh
    jne   x1FA
    dec   al
x1FA:  sub   al,ds:[0E1Dh]
x1FE:  shl   ax,1
    mov   ds:[59D5h],ax
    mov   ax,ds:[188Fh]
    shl   ax,1
    mov   ds:[59D7h],ax
    call  51h
    adc   al,36h   ;"6"
    xchg  si,ds:[bx+5936h]
    mov   di,ds:[59C4h]
    mov   si,189Fh
    call  51h
    or   ds:[bx],bp
    mov   es,ds:[189Fh]
    jc   x259
    mov   ax,ds:[0DC9h]
    inc   ax
    jnz   x24B
    mov   ax,ds:[3621h]
    shr   ax,1
    mov   dx,ds:[1893h]
    cmp   byte ptr ds:[0DD0h],1Fh
    jne   x23D
    dec   dx
x23D:  sub   dx,ds:[0DC5h]
    dec   dx
    cmp   dx,ax
    jnb   x248
    mov   ax,dx
x248:  mov   ds:[0DC9h],ax
x24B:  cmp   byte ptr ds:[0B9Ch],0
    je   x259
    test  byte ptr ds:[0B9Ch],2
    jz   x25A
x259:  ret
x25A:  mov   si,ds:[bx+5936h]
    cmp   si,ds:[0DD5h]
    jb   x275
    push  si
    call  0B0h
    mov   ax,ds:[5936h]
    mov   ds:[0DD1h],ax
    pop   si
    jc   x275
    call  0C3h
    ret
x275:  cmp   si,ds:[0E48h]
    jb   x282
    xor   si,si
    mov   ds:[0E48h],si
    dec   si
x282:  mov   ds:[0E46h],si
    mov   ax,ds:[0DE1h]
    cmp   ax,ds:[0E1Fh]
    jnb   x292
    mov   ax,ds:[0E1Fh]
x292:  mov   ds:[59C7h],ax
    mov   al,ds:[0E23h]
    mov   ds:[59C6h],al
    clc
    ret
    mov   ax,ds:[0E1Fh]
    call  x2AD
    ret
;x1B4  endp
;
x2A4  proc  near
    cmp   ax,ds:[????]
x2A4  endp
    db   '[BIGSNIP]'
    db   0AEh  ;left guillemet
    db   'SV01,A'
    db   0AFh  ;right guillemet
    db   0AEh  ;left guillemet
    db   'EX'
    db   0AFh  ;right guillemet
    db   1Ah   ;End of File Marker
    end   x100