
Re: Automatic Indexing (fwd)



I sent this message several days ago, but don't believe it made it
through. If it did, I apologize for resending.


---
Dorothy Day			
School of Library and Information Science
Indiana University
day@xxxxxxxx	


---------- Forwarded message ----------
Date: Wed, 16 Oct 1996 16:47:02 -0500 (EST)
From: Dorothy Day 
To: xywrite@xxxxxxxx
Subject: Re: Automatic Indexing



On Wed, 16 Oct 1996, James Enterline wrote:

> A couple months ago I asked for help regarding automatic indexing, and got
> useful replies from David Auerbach and Phil Ferreira. But nobody mentioned
> an easy way to get started with a list of all indexable words. I think an
> initial list of all unique words in the document would be a good start. It
> seems to me such a list could then be pared down leaving only the words one
> wants to subject to indexing (and add sub-headings to). Maybe the answer
> was too obvious for anyone to mention, but I finally realized how and will
> mention it: Do a SPELL command on the entire document while all dictionaries
> have been unloaded. What you get will be an alphabetical list of all unique
> words. It took my 40 MHz 386 only 7 1/2 minutes to do a 90,000 word book
> ms. (ca. 6,000 unique words).
>
> Even better would be to SPELL check with a dictionary containing all common,
> non-interesting words, leaving much less paring down to be done. Anybody
> know of such a list in adaptable electronic form?    Jim
>

If you own Orbis, you already have a basic list of words considered too
common to index in a textbase (OMIT.LST), which you could substitute for
the spelling dictionary.
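For anyone who would rather do this outside XyWrite, here is a rough sketch of the same idea: collect every distinct word in the document, alphabetize the result, and drop anything on a stop list such as OMIT.LST. This is a modern stand-in, not how SPELL or Orbis work internally. The word pattern, the case-folding rule (keep the first spelling seen), and the one-word-per-line stop-list layout read by the hypothetical `load_stop_list` are all my assumptions.

```python
import re

# Assumption: a "word" is a run of letters, optionally joined by internal
# apostrophes or hyphens. XyWrite's SPELL may tokenize differently.
WORD_RE = re.compile(r"[A-Za-z]+(?:['-][A-Za-z]+)*")

def unique_words(text, stop_words=frozenset()):
    """Return an alphabetical list of the distinct words in text,
    skipping any word whose lowercase form is in stop_words."""
    stop = {w.lower() for w in stop_words}
    seen = {}
    for match in WORD_RE.finditer(text):
        word = match.group()
        key = word.lower()
        if key in stop:
            continue
        # Keep the first spelling encountered for each case-folded word.
        seen.setdefault(key, word)
    return sorted(seen.values(), key=str.lower)

def load_stop_list(path):
    """Read a stop list, assuming (hypothetically) one word per line.
    The actual layout of OMIT.LST may differ."""
    with open(path, encoding="latin-1") as f:
        return {line.strip() for line in f if line.strip()}

if __name__ == "__main__":
    sample = "The index lists the words; the words are indexed."
    print(unique_words(sample, {"the", "are"}))
    # ['index', 'indexed', 'lists', 'words']
```

The list this produces is only a starting point: you would still pare it down by hand and add phrases, just as with the SPELL output.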

Orbis can also create a word list for a textbase (Display keywords), and
the textbase could be set up to consist of just one file. See the
chapter in the Orbis manual on Displaying Vocabulary. You should
make the textbase case sensitive when you create it, or your list will
be entirely uppercase. You specify the sorting parameters.

Choosing "Keyword frequency" will report the number of entries (lines,
paragraphs, delimited sections) in which a given word occurs. So it isn't
a word frequency list per se, but it will give you some information.
You could use it to identify words that occur in more than a certain
number of entries. Or you could simply delete words that seem
insignificant to you, add phrases, and then use that list as a basis
for automatically generating an index.
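To make the difference between entry counts and true word frequency concrete, here is a small sketch that computes both and filters by a threshold. It is only an approximation of what "Keyword frequency" reports, not Orbis's actual method. I am assuming that entries are paragraphs separated by blank lines and that words are compared case-insensitively; `split_entries`, `entry_frequency` and the other names are my own.

```python
import re
from collections import Counter

# Assumption: same rough notion of a "word" as letters plus internal ' or -.
WORD_RE = re.compile(r"[A-Za-z]+(?:['-][A-Za-z]+)*")

def split_entries(text):
    """Assumption: entries are paragraphs separated by blank lines."""
    return [p for p in re.split(r"\n\s*\n", text) if p.strip()]

def word_frequency(text):
    """Total number of occurrences of each lowercased word."""
    return Counter(w.lower() for w in WORD_RE.findall(text))

def entry_frequency(text):
    """Number of entries containing each lowercased word at least once,
    roughly the kind of count a keyword-frequency display gives."""
    counts = Counter()
    for entry in split_entries(text):
        # A set, so a word repeated inside one entry is counted once.
        counts.update({w.lower() for w in WORD_RE.findall(entry)})
    return counts

def frequent_words(counts, min_count):
    """Words whose count exceeds min_count, most frequent first,
    ties broken alphabetically."""
    return sorted((w for w, n in counts.items() if n > min_count),
                  key=lambda w: (-counts[w], w))

if __name__ == "__main__":
    text = "Indexing indexing indexing.\n\nAn index helps.\n\nIndexing again."
    print(word_frequency(text)["indexing"])   # 4
    print(entry_frequency(text)["indexing"])  # 2
    print(frequent_words(entry_frequency(text), 1))  # ['indexing']
```

A word repeated many times within a single paragraph scores high on the first count but only 1 on the second, which is why the entry count is "not a word frequency list per se."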
