
Re: SEarch across files (long & boring for non-programmers)



** Reply to note from Harry Binswanger  Thu, 6 Mar 97
08:48:55 +0000

Harry asks:
-> What is that three-byte sixteenth note doing?

It's a part of a trick I use to perform right-side parsing, i.e., to
grab a given number of characters from the end of a string, when the
contents of the string are unknown. (In the following examples, < >
stand for the guillemets, and + stands for the Escape character,
Ascii-27.)

The XS parser is a southpaw. It will readily hand off the leftmost n
chars of a given string ...

  puts "Ha" into S/G 03

.. but to get the same number of characters from the right side of the
string, you have to know either something about its contents ...

  puts "ry" into S/G 04

.. or its length:

 ;*; (assume we know that the length of S/G 01 is 5)
  also puts "ry" into S/G 04.

Of course, you can always get the length of the subject string, and then
concatenate the number of +X's you'll need to leave the desired number
of right-side characters. This is a cumbersome procedure, however, and,
for very long strings, unacceptably slow.

Enter the trick. It's based on the notion that, in many situations, you
can reasonably suppose that the subject string will *not* contain a
certain character, so that when this foreign character is added to the
end of the subject string to create a NewString, you're sure (for
practical purposes) that it's the *first instance* of the foreign
character in NewString. On that assumption, getting any number of
rightmost characters of the original string is a straightforward matter of
parsing NewString with the appropriate number of wildcard +X's prefixed
to the foreign character, then parsing out the foreign character from
the result. In the previous example, if we were sure that S/G 01 would
contain letters but no numbers, then a number could play the role of the
foreign character. So (still assuming that S/G 01 contains "Harry") ...

 +"9">;*; 03 has "ry9"
 

.. again puts "ry" into S/G 04.
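
For readers who don't speak XPL, here is the same idea as a rough Python analogy. It is not the XPL code itself: the function name `rightmost` is made up for illustration, and the regex `.` stands in for the +X wildcard. The search only ever finds the leftmost match, much as the XS parser does; appending a foreign character that the subject can't contain forces that leftmost match to land at the very end.

```python
import re

def rightmost(subject: str, n: int, foreign: str = "9") -> str:
    """Return the last n characters of subject using only a leftmost search.

    Hypothetical helper: `foreign` plays the role of the foreign character
    and is assumed never to occur in `subject`.
    """
    if foreign in subject:
        raise ValueError("foreign character already occurs in the subject")
    new_string = subject + foreign
    # n single-character wildcards prefixed to the foreign character.
    pattern = "." * n + re.escape(foreign)
    match = re.search(pattern, new_string, flags=re.DOTALL)
    if match is None:
        raise ValueError("subject is shorter than n characters")
    # Parse the foreign character back out of the result.
    return match.group(0)[:-len(foreign)]

print(rightmost("Harry", 2))
```

With "Harry" as the subject, NewString is "Harry9", the match is "ry9", and stripping the 9 leaves "ry". Note that the length of the subject is never computed.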

In SSE.PM, the problem was how to take the user's input (argument) and
separate the pathname+filespec from the SEarch statement, bearing in
mind that the filespec might contain any character permitted by DOS and
the SEarch statement might contain just about anything. Now, if the
user's SEarch statement is well-constructed (SSE assumes that it is),
then we know that its final (rightmost) character is the chosen
separator. (There is additional code to detect the use of a space as the
separator.) We also know that parsing the entire argument around the
first instance of the SEarch separator (or, better still, around
"separator+Wseparator" -- I should make that change!) teases out the
filespec. In sum, the key to the solution was to get the rightmost
character of the user's input. That's where the trick comes in.
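
In Python terms, the splitting step might look something like the sketch below. This is only an illustration under assumptions: the name `split_argument` and the sample argument are invented, the argument is assumed to be a filespec followed by a well-formed SEarch statement, and the special handling for a space used as separator is left out.

```python
def split_argument(argument: str) -> tuple:
    """Split a hypothetical 'filespec /search/' argument into its two parts.

    Assumes the SEarch statement is well-constructed, so the final
    character of the argument is the chosen separator.
    """
    if not argument:
        raise ValueError("empty argument")
    separator = argument[-1]
    # Parse around the FIRST instance of the separator: everything before
    # it is the pathname+filespec, everything from it onward is the search.
    filespec, _, rest = argument.partition(separator)
    return filespec.strip(), separator + rest

print(split_argument(r"C:\DOCS\*.TXT |budget|"))
```

As the text above concedes, splitting on the first bare separator goes wrong if the filespec itself happens to contain that character, which is why parsing around the fuller separator-wildcard-separator pattern would be the safer choice.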

In this application, the foreign character is played by the 3-byte,
reverse-video Ascii-14, which, as you note, consists of Ascii
255+252+142. I chose it because, though legal, it's highly unlikely to
occur in a filespec, and because, in SEarch statements, it's useless,
either as a separator -- 3-byters crash as separators -- or as a search
term -- 255+252 in the first two bytes causes Xy to interpret it as a
(meaningless) SEarch wildcard, so that neither BX se /255+252+142/Q2 nor
BX se/255+"FF"+252+142/Q2 flags the 3-byter in text. So the trick works,
and the problem was solved.

Are you sorry you asked?


--------------
Carl Distefano * * * CLDistefano@xxxxxxxx
--------------