
Re: Editing a 2GB+ file



Reply to note from Bill Troop  Sun, 26 Oct 2008
17:21:04 +0000

Bill:

> What you'll find is that the delimiting From has a capital F
> and the others do not. Clever, eh?

You're jesting, surely. Even if the delimiter were the RFC 822
header "From: ", preceded by two CRLFs, there's no guarantee that
"From:" would be the first header in the message source -- unless
Eudora rearranges the headers so that "From:" always comes first
(which strikes me as a bad idea, forensically and otherwise). (In
fact, the first header is usually "Return-Path:". RFC 822 says: "It
is recommended that, if present, headers be sent in the order
'Return-Path', 'Received', 'Date', 'From', 'Subject', 'Sender',
'To', 'cc', etc.") The message delimiter has got to be more specific
than simply the word "From". It's got to be uniquely identifiable
as a delimiter.

OK... I found the answer in David Wood's "Programming Internet
Email" (http://tinyurl.com/Mbox-format). According to Wood, mbox
format has its origins in Unix and is used (with variations) by many
mail programs, including Eudora and Pine. The message delimiter, the
so-called "From_ line", is distinct from the "From:" header. In
standard mbox format, the From_ line is preceded by a blank line and
consists of the word "From" followed by a space (no colon), then an
email address, then a formatted date and time, like this:

[newline]
From you@xxxxxxxx Sun Oct 26 17:21:04 2008
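For illustration, a minimal Python sketch that builds such a line. It assumes the date is written in the C asctime() layout (weekday, month, space-padded day, time, year), which matches the example above; the description summarized here says only "a formatted date and time". The function name format_from_line is hypothetical, and writing the blank line that must precede the From_ line is left to the caller.

```python
from datetime import datetime


def format_from_line(sender: str, when: datetime) -> str:
    """Build an mbox From_ line: "From", a space, the sender, a space, the date.

    Assumption: the date uses the asctime() layout. datetime.ctime() produces
    exactly that, e.g. "Sun Oct 26 17:21:04 2008", with English day and month
    names regardless of the current locale.
    """
    return f"From {sender} {when.ctime()}"


# Hypothetical usage:
# format_from_line("you@example.com", datetime(2008, 10, 26, 17, 21, 4))
# gives "From you@example.com Sun Oct 26 17:21:04 2008"
```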

The RFC 822 headers follow immediately, with no intervening blank
line. If a line in the message body begins with "From " (or with
">From ", as when a From_ line is quoted), an mbox-compliant server
is expected to "escape" it by prepending ">", like this:

>From you@xxxxxxxx Sun Oct 26 17:21:04 2008
>>From you@xxxxxxxx Sun Oct 26 17:21:04 2008
>>>From you@xxxxxxxx Sun Oct 26 17:21:04 2008
etc.
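A minimal Python sketch of the escaping step. It assumes the variant usually called mboxrd, in which any line matching ">*From " gains one more ">", which is the pattern the example above follows. (The older mboxo variant escaped only lines beginning with "From ", so a body line that genuinely began with ">From " could not be told apart from an escaped one.) The function name escape_body is hypothetical.

```python
import re

# Start of a line, zero or more ">" characters, then "From " (mboxrd rule).
_FROM_LINE_START = re.compile(r"^(>*From )", re.MULTILINE)


def escape_body(body: str) -> str:
    """Prepend one ">" to every body line that starts with ">*From "."""
    # Only matches at line starts; "From" elsewhere in a line is untouched.
    return _FROM_LINE_START.sub(r">\1", body)
```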

The user's e-mail program should remove the first ">" from these
lines when it displays the message, so that it reads as the sender
intended.
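The reverse operation, again a hypothetical sketch that assumes mboxrd-style escaping: strip one ">" from each line matching ">+From " and leave ordinary quoted lines such as "> some text" alone. The name unescape_body is illustrative only.

```python
import re

# One ">" followed by zero or more ">" and then "From " at the start of a line.
_ESCAPED_FROM = re.compile(r"^>(>*From )", re.MULTILINE)


def unescape_body(body: str) -> str:
    """Remove the first ">" from each escaped From line before display."""
    return _ESCAPED_FROM.sub(r"\1", body)
```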

According to Wood, Eudora uses the string ???@??? instead of the
sender's e-mail address in the From_ line.
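Putting the pieces together, here is a rough Python sketch of how a reader might split an mbox (or Eudora .MBX) file into messages. It treats a line as a From_ line only when it follows a blank line (or starts the file) and looks like "From", a sender token, and an asctime-style date. The sender token is matched as any run of non-space characters, so Eudora's ???@??? placeholder is accepted. The regex and the name split_mbox are assumptions for illustration; real-world mbox files vary, and Python's standard mailbox module is the safer choice for serious use.

```python
import re

# Assumed From_ line shape: "From", a space, a sender token with no spaces
# (an address or Eudora's ???@??? placeholder), a space, and an asctime-style
# date such as "Sun Oct 26 17:21:04 2008".
FROM_LINE = re.compile(
    r"^From \S+ [A-Z][a-z]{2} [A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2} \d{4}$"
)


def split_mbox(text: str) -> list[str]:
    """Split mbox text into messages, dropping each From_ line.

    Escaped body lines (">From ...") are returned unchanged.
    """
    messages: list[list[str]] = []
    previous_blank = True  # the start of the file counts as a boundary
    for line in text.splitlines():  # handles both LF and CRLF line endings
        if previous_blank and FROM_LINE.match(line):
            messages.append([])  # a From_ line opens a new message
        elif messages:
            messages[-1].append(line)
        previous_blank = line == ""
    for lines in messages:
        if lines and lines[-1] == "":
            lines.pop()  # drop the blank line that separated messages
    return ["\n".join(lines) for lines in messages]
```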

So an mbox a/k/a .MBX file is indeed plain text. That's good. Open
source. As God intended e-mail to be.

--
Carl Distefano
cld@xxxxxxxx