
Re: Automated Clean-up of Ragged Text ?



Reply to note from jr_fox@xxxxxxxx Wed, 01 Nov 2000 18:10:10 -0800

Jordan:

As some have already mentioned, the Jumbo U2 has any number of tools
that can be used to clean up "ragged" text. There is, however, no
single frame that harnesses these facilities; such a frame would
indeed be a useful addition. The following CLEANUP frame is a first stab
at such a utility.

First, issue DECODE to convert the code below into working
XPL. MErge the result into your U2 file, and issue LH to
reLOAD U2.

The usage is, simply, CLEANUP. Operation is from Top to
Bottom of File, so you'll need to move your subject text into a
separate window. Text width is resized using the margin settings in
effect at the top of the file. If you do nothing, the routine
adopts your default left and right margins. If you want different
margins, embed the appropriate commands at the top of the file
*before* running CLEANUP.

This is a quick & dirty first try; refinements will no doubt be
necessary. My instinct is to keep this simple, since we know
already that it won't be possible to account for the infinite
variety of pretty-printing that could be mistaken for ragged text.
(But if the text were already "pretty", you wouldn't be running the
routine in the first place, no?)

For the record, CLEANUP does the following:
 - Regularizes lone Ascii-10's and -13's into CrLf's
 - Preserves short lines ending with a colon (:)
 - Preserves double spaces after end-of-sentence punctuation
 - Reduces ALL other multiple space chars to single spaces
   (this might produce anomalous results with some text!)
 - Deletes white space (spaces and tabs) at the end of lines
 - Resizes text width to fit current margins
 - Wraps lines with hard carriage returns
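For readers who don't speak XPL, the steps listed above can be sketched in Python. This is a minimal sketch under stated assumptions, not a translation of the CLEANUP frame itself: the function names cleanup and squeeze_spaces, the left/right column parameters standing in for the margin settings, the default right margin of 65, and the choice to keep any line ending in a colon (with no length test for "short") are all hypothetical. Paragraphs are assumed to be separated by blank lines, and interior tabs are treated as spaces.

```python
# Hypothetical Python sketch of the CLEANUP steps; not a translation of the XPL frame.
import re
import textwrap

SENTENCE_END = ".!?"  # assumption: punctuation treated as ending a sentence


def squeeze_spaces(s: str) -> str:
    """Collapse space runs: two after sentence-ending punctuation, one elsewhere."""
    s = s.replace("\t", " ")  # assumption: interior tabs count as spaces

    def repl(m):
        after_sentence = m.start() > 0 and s[m.start() - 1] in SENTENCE_END
        return "  " if after_sentence else " "

    # Only runs of 2+ spaces are touched, so a single space after "." stays single.
    return re.sub(r" {2,}", repl, s)


def cleanup(text: str, left: int = 0, right: int = 65) -> str:
    """Reflow ragged text between columns left and right (hypothetical defaults)."""
    # Regularize CrLf, lone LF (Ascii-10) and lone CR (Ascii-13) to one break.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    indent = " " * left
    out = []   # finished output lines
    para = []  # ragged lines of the paragraph being collected

    def flush():
        # Join pending lines, normalize spaces, and rewrap to the margins.
        if para:
            joined = squeeze_spaces(" ".join(para))
            out.extend(textwrap.wrap(joined, width=right,
                                     initial_indent=indent,
                                     subsequent_indent=indent))
            para.clear()

    for raw in text.split("\n"):
        line = raw.rstrip(" \t")  # delete trailing spaces and tabs
        if not line:              # blank line: paragraph break, kept as is
            flush()
            out.append("")
        elif line.endswith(":"):  # colon line: kept whole, never joined to the next
            flush()
            out.append(line)
        else:
            para.append(line.strip())
    flush()
    # Emit hard carriage returns (CrLf) between output lines.
    return "\r\n".join(out)
```

Python's textwrap suits this because it wraps on whitespace runs without collapsing them, so the two spaces left after a full stop survive the rewrap. Its fix_sentence_endings option is deliberately left off: it would add double spaces after every detected sentence end rather than merely preserve existing ones.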

Give it a spin and let me know.

XPLeNCODE v2.0
b-gin [UNTITLED]
{{;5cleanup}} Clean up "ragged" Ascii text (TF to BF) [CLD][c
r|lf]{2}{<}IF{<}VA$WS{>}<>1{>}{<}PRNo file{>}{<}EX{>}{<}EI{>}
[BX_]es 1[Q2_][TF_];*;[cr|lf][JM_]2.from132cr[Q2_][JM_]2.from
102cr[Q2_];*;[cr|lf][BX_]ch :[wC] :[254+233+2][wC][wC][Q2_][
BX_]wait[Q2_];*;[cr|lf][BX_]ch "[255+192+174] [wA]"[255+192+
174][254+234+2][wA]"[Q2_][BX_]wait[Q2_];*;[cr|lf]{<}SU01,[TF_
]{<}LBa{>}[BX_]ch " " "[Q2_]{<}IF@not({<}ER{>}){>}[BX_]wait[
Q2_]{<}GLa{>}{<}EI{>}{<}LBb{>}[BX_]ch " [wC]"[wC]"[Q2_]{<}IF@
not({<}ER{>}){>}[BX_]wait[Q2_]{<}GLb{>}{<}EI{>}{<}LBc{>}[BX_]
ch "{tab}[wC]"[wC]"[Q2_]{<}IF@not({<}ER{>}){>}[BX_]wait[Q2_]{
<}GLc{>}{<}EI{>}{>}{<}GT01{>}[JM_]2.HIDE:01[Q2_][JM_]2.dfa[Q2
_][JM_]2.repcr[Q2_][YD_][JM_]2.UNHIDE[Q2_]{<}GT01{>};*;[cr|lf
][BX_]ch :[254+233+2][wC][wC] :[wC][Q2_][BX_]wait[Q2_];*;[cr
|lf][BX_]ch "[255+192+174][254+234+2][wA]"[255+192+174] [wA]
"[Q2_][BX_]wait[Q2_]{<}PRDone{>}{<}EX{>}{2}[cr|lf][cr|lf]
-nd
XPLeNCODE

--
Carl Distefano
cld@xxxxxxxx
http://users.datarealm.com/xywwweb/