
COUNTING WORD






XY-> Jack Shafer wrote:
 -> >My first computer, a Kaypro that ran CP/M, came bundled with a sweet little
 -> >program called Word Freq, which would build an alphabetical list of all the
 -> >words in a file and count them. Does anyone know of an XPL program that takes
 -> >advantage of XyWrite's (DOS and Win) SORT function to build such a list the


What follows is a sloppy little program that counts words and
lists their frequency. I had already written the last half when
Nathan asked me about the recurring save-gets in the DLG. Getting
each word onto its own line took longer to program.

It's a little slow: About 4.5 minutes to process a 30K file on my
DX2-66. Probably my cursor moves. Also, it's sloppy because I
didn't take into account another person's system. Calls to
redlining off, insert mode on, and so forth probably would have
been wise. Still, anyone on this list would probably be aware of
such things and would first test such a program on something simple.

I suppose a more elaborate version would save it as a separate
file to avoid the danger of accidentally overwriting the original
version.

{LB-count-al.pm to count identical numbers}DX {SU91,BC es 0XC BC
DO }BC es 1XC {GL-attribute}

At the beginning the display is turned off, error beep is turned
off, and I set these two features to be turned back on at exit,
which allows multiple exits without having a lot to retype.


{LB-attribute}TF XP BC ci |{WW }||XC {IF{ER}}{GL-wordbreak}{EI}{GL-attribute}

Attributes such as bold & italic or margins can confuse issues,
so we drop them.

{LB-wordbreak}TP BC ci |WS |0A|XC {IF{ER}}{PRNo separators? End}{GT91}{EX}{EI}{GL-clean}

We put each word on a separate line. This step also contains the
one thing that can kill the program: a file without a single
separator. In that case it is better to stop.
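For readers who don't run XyWrite, here is a rough Python sketch of the same step. It is not a translation of the XPL: the function name and the SEPARATORS set are my own assumptions, and the XyWrite separator wildcard may well match a different set of characters.

```python
# Assumed separator set; the XyWrite wildcard may match other characters.
SEPARATORS = " \t.,;:!?"


def one_word_per_line(text: str) -> str:
    """Replace every separator character with a newline.

    Raises ValueError when the text contains no separator at all,
    mirroring the "No separators? End" exit in the XPL program.
    """
    if not any(ch in SEPARATORS for ch in text):
        raise ValueError("No separators? End")
    # Each separator becomes its own newline, so runs of separators
    # leave blank lines behind; those are cleaned up in a later step.
    return "".join("\n" if ch in SEPARATORS else ch for ch in text)
```

Calling one_word_per_line("the cat, the dog") leaves a blank line after "cat", because the comma and the space each turn into a newline.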

{LB-clean}BC ci |0A0A|0A|XC {IF{ER}}{GL-alpha}{EI}{GL-clean}

Then we go ahead and eliminate all the blank lines. WordPerfect
macros, incidentally, don't handle the Xy trick of going back to
the same command repeatedly until all instances are wiped out.
Pity.
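The "go back to the same command until nothing is left" idea can be sketched in Python as follows. The function name is hypothetical; the point is only that one pass of a double-newline replacement is not enough when three or more newlines sit in a row, so the replacement is repeated until it finds nothing.

```python
def drop_blank_lines(text: str) -> str:
    """Collapse runs of newlines by repeating one simple replacement.

    A single pass turns four newlines into two, not one, so the
    replacement loops until no doubled newline remains.
    """
    while "\n\n" in text:
        text = text.replace("\n\n", "\n")
    return text
```

With "a", four newlines, and "b" as input, the first pass still leaves one blank line; the second pass removes it.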

{LB-alpha}TF DF BF DF BC lcXC BC sortXC XD {GL-var}

The whole file is defined as a block; it is changed to lowercase
so that the same word compares as equal regardless of case; and
then it's sorted.
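In Python terms, and only as a sketch with a made-up function name, the lowercase-then-sort step looks like this. Lowercasing first matters: otherwise "The" and "the" sort to different places and never end up on adjacent lines.

```python
def sorted_lowercase_words(text: str) -> list[str]:
    """Lowercase every non-empty line, then sort the lines.

    Sorting puts identical words on adjacent lines, which is what
    the counting step relies on.
    """
    return sorted(line.lower() for line in text.splitlines() if line)
```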

{LB-var}{SX22,1}{SV33, = }{SV99,ooooops22
}BF {PV99}TF DL {GL-begin}

The variables are the counter, 22, set to 1 to stand for one
word; the equal sign, 33, to separate the count from the word;
and "ooooops22", 99, to show the program when it has reached its
end. Come to think of it: if the last line doesn't end with a
carriage return or some other separator before ooooops22 is
inserted, the program may not end on its own. Oh well.

Then we go to the top and define the first line.

{LB-begin}{SV17}{IF({IS17}=={IS99})}{GL-end}{EI}XD {GL-two}

Define the first half, and test end-state.

{LB-two}DL {SV18}{GL-compare}

Define the second half.

{LB-compare}{IF{IS17}=={IS18}}RD {SX22,{PV22}+1}{GL-two}{EI}{SX22,{IS33}+{IS22}}{GL-place}

The two "lines" are compared; for a successful match, the count
is increased and the duplicate word is deleted.

{LB-place}CU CU ER {PV22}{SX22,1}CD CD {GL-begin}

If the match isn't successful, we return to the relative position
of the first half of the match pair, insert the number, reset the
counter, and return to where we left off. We then turn the second
half of the comparison into the first half by returning to the
label begin.
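The begin/two/compare/place loop above can be sketched in Python like this. It is an illustration, not a port: the function name is hypothetical, and I am assuming the result should read "word = N", with the count appended after the word. The end marker plays the same role as in the XPL, stopping the loop without a separate end-of-file test.

```python
SENTINEL = "ooooops22"  # end marker; assumed never to occur as a real word


def count_runs(sorted_words: list[str]) -> list[str]:
    """Count runs of identical adjacent lines in an already sorted list."""
    lines = list(sorted_words) + [SENTINEL]
    result = []
    i = 0
    while lines[i] != SENTINEL:        # test the end state (label begin)
        first = lines[i]
        count = 1                      # counter starts at one word
        j = i + 1
        while lines[j] == first:       # compare first half with second half
            count += 1                 # match: bump the count, skip the duplicate
            j += 1
        result.append(f"{first} = {count}")  # no match: place the count
        i = j                          # the second half becomes the first half
    return result
```

For the input ["apple", "cat", "the", "the"] this returns ["apple = 1", "cat = 1", "the = 2"].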

{LB-end}RD TF {GT91}{PRdone}{EX}

The terminal ooooops22 is deleted and we go back to the top of
the file. Lots of numbers should remain. Note I didn't put in any
error checking, such as whether you're in insert or overstrike
mode. I assumed insert.

--Chet
---
 ? SLMR 2.1a ? Art + write + dtp = chet.gottfried@xxxxxxxx