
Re: XY search question



Reply to note from "Yo Intl. YK"  Mon, 17 Dec 2001
12:49:57 +0900

> sometimes I have to search for text in messy files, ... I often
> know the text string is there, but I do not know where it might
> be interrupted by some stupid hard return.

So the question, as I understand it, is this: can you search for a
phrase, say "Happy New Year", in one of these messy files? It might
be sloppily typed as "Happy   New  Year" (with more than one space
between words), or it might straddle two lines, like "Happy
New Year" (with a carriage return after "Happy"), or the end of the
line might be padded out with spaces, like "Happy[space][space][Cr]
New Year", or it might be written idiosyncratically as "Happy, New-
Year", or ... what have you.

In other words, can you SEarch for a phrase even if the words are
separated by an indeterminate number of separators, not necessarily
spaces? The answer (surprise!) is Yes.

You'll need to formulate your SEarch statement using, instead of
literal spaces, the separator wildcard (which looks like a reverse-
video uppercase S) preceded by an appropriate *numeric* wildcard
(which looks like a reverse-video number). Thus, instead of
commanding

SE "Happy New Year"

you'd do, for example,

SE "Happy[5][S]New[5][S]Year"

If you load XY4.KBD, [S] is the wildcard produced by Alt-Shift-S and
[5] is the wildcard produced by Alt-Shift-5. What this command
says, in plain English, is to find the words "Happy New Year"
separated by anywhere from 1 to 5 separators (including spaces,
punctuation and carriage returns). It finds all of the variants
described above.
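
For anyone who wants to try the same idea outside XyWrite, here is a
rough analogue using Python regular expressions. It is only a sketch,
not how XyWrite implements SEarch. The assumptions are mine: a
"separator" is taken to be any character that is not a letter or
digit (XyWrite's own separator set may differ), the count means "1 up
to N" as described above, the match ignores case, and phrase_pattern
is a hypothetical helper name.

```python
import re

# Assumption: a "separator" is any character that is not a letter or digit
# (spaces, punctuation, line breaks). XyWrite's own separator set may differ.
SEP = r"[^A-Za-z0-9]"

def phrase_pattern(words, max_seps=5):
    """Build a regex matching the words with 1..max_seps separators between them."""
    gap = SEP + "{1,%d}" % max_seps
    return re.compile(gap.join(re.escape(w) for w in words), re.IGNORECASE)

# Rough counterpart of SE "Happy[5][S]New[5][S]Year"
pattern = phrase_pattern(["Happy", "New", "Year"])

samples = [
    "Happy New Year",        # clean
    "Happy   New  Year",     # extra spaces
    "Happy\nNew Year",       # line break after "Happy"
    "Happy  \r\nNew Year",   # trailing spaces, then a line break
    "Happy, New-\nYear",     # punctuation and a line break
]
for text in samples:
    print(repr(text), bool(pattern.search(text)))
```

Raising max_seps plays the same role as raising the numeric wildcard:
the messier the text, the larger the allowance.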

The "messier" the text, the higher the number should be. You can go
as high as 999 (three [9] wildcards in a row), like this:

SE "Happy[9][9][9][S]New[9][9][9][S]Year"

If you're dealing with someone who's not only a slob but a
nincompoop -- someone who might write, say, "Ha3pppy  Neuw Yaer"
-- you can broaden the search by using [L] (any letter), [A] (any
alphanumeric character) or [X] (any character) instead of, or in
conjunction with, [S]. The ridiculous locution quoted above is
flagged, for example, by:

SE "h[1][0][X]n[1][0][X]y[3][A]r"
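
The same broadened search can be approximated with a Python regular
expression. Again this is a sketch under my own assumptions, not
XyWrite's behavior: [X] is modeled as any character including line
breaks, [A] as a letter or digit, each count as "1 up to N", and the
match ignores case.

```python
import re

# Rough analogue of SE "h[1][0][X]n[1][0][X]y[3][A]r".
# Assumptions: [X] ~ any character (including line breaks), [A] ~ a letter
# or digit, each count means "1 up to N", and the search ignores case.
ANY = r"."                # stands in for [X]
ALNUM = r"[A-Za-z0-9]"    # stands in for [A]

sloppy = re.compile(
    "h" + ANY + "{1,10}" + "n" + ANY + "{1,10}" + "y" + ALNUM + "{1,3}" + "r",
    re.IGNORECASE | re.DOTALL,
)

print(bool(sloppy.search("Ha3pppy  Neuw Yaer")))
print(bool(sloppy.search("Happy New Year")))
```

A pattern this loose also matches plenty of unrelated text (for
instance "hen yer"), so expect to skip past some false hits.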

Keep your command as simple as possible. Don't put two numeric
wildcard expressions in a row; separate them by at least one literal
character. Thus, if you want to find "Ha3pppy  Neuw", a command
like SE "h[7][A][5][S]n" will fail, whereas SE "h[7][A] [5][S]n"
(with an intermediate literal space) succeeds.

There's no need for any preliminary "cleanup" of messy files when
you have such powerful search tools at your disposal.

--
Carl Distefano
cld@xxxxxxxx
http://users.datarealm.com/xywwweb/