6 Hato-fetch

Hato-fetch is a Swiss-Army knife tool for moving mail back and forth between different sources and formats, and optionally running in a daemon mode to periodically poll the same sources (i.e. it can do what fetchmail does - but better).

The basic usage is like the Unix cp(1) command:

$ hato-fetch source1 source2 ... dest

This will copy all of the messages in each of the source folders to the final destination folder. Folders may be actual local files (in any of mbox, mh or maildir format), or any of a number of URI schemes, described in detail below.

If you specify only one argument, it is treated as the source and the default destination is used. The default destination is your default Hato MTA filter ~/.hato/filter, if that file exists, or otherwise your local mail spool. This default filter can also be specified explicitly with the hato: URI, which you’ll need if you want to fetch multiple sources into the default.

If you specify zero arguments, then it uses the same default destination along with the default sources as specified in ~/.hato/fetch/config. This is the usual way to run it like fetchmail.
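
The three argument-count cases above can be summarized in a small sketch. This is written in Python purely for illustration and is not hato-fetch's own code; the function name `resolve_arguments` and its parameters are hypothetical, and `hato:` simply stands in for the default destination.

```python
def resolve_arguments(args, config_sources, default_destination="hato:"):
    """Split folder arguments into (sources, destination).

    Hypothetical helper: `config_sources` stands for the default sources
    from the config file, `default_destination` for the default filter
    or mail spool.
    """
    if len(args) == 0:
        # No arguments: poll the configured sources into the default destination.
        return list(config_sources), default_destination
    if len(args) == 1:
        # One argument: a single source, delivered to the default destination.
        return [args[0]], default_destination
    # Two or more: the last argument is the destination, the rest are sources.
    return list(args[:-1]), args[-1]
```

With two sources and an explicit `hato:` as the last argument, both sources go to the default destination, which is why the explicit URI is needed in that case.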

6.1 Hato-fetch Usage

-h	--help	print a help message and exit
-V	--version	print version and exit
-c	--config=FILE	specify config file (default ~/.hato/fetch/conf)
	--no-config	don’t use any config file
-n	--no-run	trial run, verify servers but don’t fetch
-d	--daemon	run the command periodically as a daemon
-k	--kill	kill running daemon
-i	--interval=N	interval to use in seconds (default 60)
	--delete	delete fetched messages (i.e. mv instead of cp)
	--delete-after=N	delete old messages after N days
-f	--filter key[=val]	fetch only messages where the key (header) matches
-r	--remove key[=val]	exclude messages where the key (header) matches
	--cc=MBOX	CC to an extra destination mbox
	--allow-relay	enable relaying to external mail addresses

The options are fairly straightforward, and any potentially “dangerous” options must be spelled out in full - there are no short forms.

The --no-run option means to verify any POP or IMAP servers, prompting for a password if needed, and also checking mailboxes in the case of IMAP, but not to actually retrieve any messages. It can be useful when you want to verify your configuration.

--daemon repeats the request periodically in the given interval (identical to fetchmail), and --kill can be used to terminate a running daemon.

--delete expunges any fetched messages from the source, effectively making the command behave like mv(1) rather than cp(1) - or even rm(1) if you use a null output destination.

--delete-after removes old messages after a certain number of days. This can be useful if you want to keep messages on a server, for remote access or to enable fetching from multiple clients, but you want to avoid using up all the server space.

--filter and --remove are analogues of the SRFI-1 procedures of the same names. Only messages passing all filters and not removed by any of the removes will be fetched. In the case of IMAP, some or all of the filtering may be handled server-side - otherwise we first fetch the message and then decide whether or not to keep it. The keywords test the (case-insensitive) values of MIME headers in the message, or simply the existence of the header if no value is specified. Two special keywords are “larger” and “smaller”, which instead act on the size of the message in bytes. Other special keywords may be added later.
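
The keep-or-skip decision described above can be sketched as follows. This is a Python illustration only, not hato-fetch's implementation; the names `matches` and `should_fetch` are hypothetical. It assumes a header "matches" when the value occurs as a case-insensitive substring, and that "larger" and "smaller" are strict comparisons; the text does not specify either detail.

```python
from email import message_from_string


def matches(message, size, key, value=None):
    """Return True if one key[=val] test holds for the message."""
    keyword = key.lower()
    if keyword == "larger":
        return size > int(value)   # assumption: strictly larger
    if keyword == "smaller":
        return size < int(value)   # assumption: strictly smaller
    header = message.get(key)      # header names are looked up case-insensitively
    if header is None:
        return False
    if value is None:
        return True                # no value given: only test existence
    return value.lower() in header.lower()  # assumption: substring match


def should_fetch(raw_text, filters=(), removes=()):
    """Keep a message only if every filter passes and no remove matches."""
    message = message_from_string(raw_text)
    size = len(raw_text.encode("utf-8"))
    passes_all = all(matches(message, size, k, v) for k, v in filters)
    removed = any(matches(message, size, k, v) for k, v in removes)
    return passes_all and not removed
```

For example, `should_fetch(raw, filters=[("List-Id", "chicken-users")])` keeps only messages whose List-Id header mentions that list, while passing the same pair in `removes` would skip them.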

--cc allows you to specify multiple outputs, since the syntax only allows one output by default. It uses the general output URI syntax and is not limited to email addresses.

--allow-relay is required if you want to specify an external email address as an output destination (or as a result of Hato filtering). Fetching to an external address seems a somewhat uncommon case, and it is best to avoid accidentally spamming 5000 messages from a local mail spool. Local email addresses are always allowed as destinations, however.

6.2 Input Sources

/path/to/file	mbox, mh or maildir recognized
file:/path/to/file	same as above
pipe:command	use piped I/O to/from a command
|command	same as above
alias:name	a named mbox from the config file
:name	same as above
imap[s]://user@host/[mailbox]	fetch from an IMAP server
pop[s]://user@host	fetch from a POP server
test:[subject]	generate a dummy message for testing
null:	the bit bucket
-	stdin/stdout

Most of the inputs are straightforward. pipe: will run the given command and copy the output to the destination(s). If the output is not a valid message beginning with MIME headers, then it will automatically be encapsulated as a message with a Subject: line of “output of command”.
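
The encapsulation behaviour of pipe: can be approximated with the sketch below. It is a Python illustration and not how hato-fetch itself is written; `encapsulate` and `read_pipe_source` are hypothetical names, and the check for "begins with MIME headers" is a simple heuristic (first line looks like `Name:`) chosen here as an assumption.

```python
import re
import subprocess
from email.message import EmailMessage

# Heuristic (assumption): the text already is a message if its first line
# starts with a header field name followed by a colon.
HEADER_LINE = re.compile(r"^[!-9;-~]+:")


def encapsulate(command, output):
    """Wrap command output in a message unless it already has headers."""
    if HEADER_LINE.match(output):
        return output
    message = EmailMessage()
    message["Subject"] = f"output of {command}"
    message.set_content(output)
    return message.as_string()


def read_pipe_source(command):
    """Run the command through the shell and return a message string."""
    result = subprocess.run(
        command, shell=True, capture_output=True, text=True, check=True
    )
    return encapsulate(command, result.stdout)
```

A heuristic this simple would misread output such as `Error: ...` as a header, so a real implementation presumably checks more carefully.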

An alias: source just refers to a named source defined in your config file, as explained below.

The test: source just generates a time-stamped dummy message, which can be very handy for testing your output sources. The null: source doesn’t generate a message at all.

imap: and pop: fetch from the given server with the IMAP and POP3 protocols respectively, optionally over SSL if the trailing “s” is included in the URI scheme. User defaults to the current user name. IMAP will by default fetch from the standard “INBOX” mailbox, but you may override this with a path specification after the host. This may include IMAP mailbox patterns such as “%” to fetch from all top-level mailboxes. Both of these protocols will prompt for a password.
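
To make the URI conventions concrete, here is a sketch of how such a URI breaks down into its parts. It is written in Python for illustration only; `parse_mail_uri` and the returned dictionary keys are hypothetical and do not reflect hato-fetch's internals.

```python
import getpass
from urllib.parse import unquote, urlsplit


def parse_mail_uri(uri, default_user=None):
    """Break an imap[s]:// or pop[s]:// URI into its components."""
    parts = urlsplit(uri)
    use_ssl = parts.scheme.endswith("s")          # trailing "s" selects SSL
    protocol = parts.scheme[:-1] if use_ssl else parts.scheme
    # The user defaults to the current user name when omitted.
    user = parts.username or default_user or getpass.getuser()
    # Percent-escapes in the path are decoded, so "%25" becomes "%".
    mailbox = unquote(parts.path.lstrip("/")) or None
    if protocol == "imap" and mailbox is None:
        mailbox = "INBOX"                         # standard default mailbox
    return {
        "protocol": protocol,
        "ssl": use_ssl,
        "user": user,
        "host": parts.hostname,
        "mailbox": mailbox,
    }
```

Because the last `@` separates the user from the host, a user name that itself contains `@` (as with `me@gmail.com@imap.gmail.com`) still parses as expected in this sketch.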

6.3 Output Sources

You can output to any of the input sources except for pop: (which doesn’t allow uploading) and test: (which wouldn’t make sense). In addition, you can relay each fetched message to an email address with any of the following forms:

smtp:user@host	relay to an address (on a possibly remote host)
user@host	same as above
smtp:[user]	forward to local host

Note that to send to a remote host you need to specify the --allow-relay option.

In the case of output filters, the pipe: scheme has a default command of procmail(1), so just “pipe:” alone is allowed.

6.4 Hato-fetch Command-line Examples

Convert from mbox format to maildir:

$ hato-fetch mbox:foo maildir:bar

Copy from POP to IMAP (add --delete to move instead):

$ hato-fetch pop3://me@pop.myhost.com imaps://me@imap.myhost.com

Fetch the message with the given message-id over IMAP and write it to standard output:

$ hato-fetch -f 'Message-Id=<blah>' imaps://me@imap.myhost.com -

Encapsulate the host information in a message:

$ ./hato-fetch.scm pipe:uname%20-a - 
Message-Id: <1201854657.76446.g185@chernushka> 
From: foof@chernushka 
To: foof@chernushka 
Subject: output of uname -a 
Date: Fri Feb  1 17:30:57 2008 
Mime-Version: 1.0 
Content-Type: text/plain 
 
Darwin chernushka 9.1.0 Darwin Kernel Version 9.1.0: Wed Oct 31 17:46:22 PDT 2007; root:xnu-1228.0.2~1/RELEASE_I386 i386

Split a mailbox in two, moving everything from a specified mailing list into a separate folder:

$ hato-fetch --delete -f List-Id=chicken-users ~/Mail/inbox ~/Mail/chicken

6.5 Hato-fetch Config

The configuration for hato-fetch is in ~/.hato/fetch/config (unless specified otherwise with the --config option). This file should just hold a single alist. The recognized keywords are:

`auto`	a list of sources to poll by default
`interval`	the polling interval
`default-format`	the format to use for newly created folders, either `'mbox` or `'mh` or `'maildir`
`smtp-host`	the default outgoing smtp-host
`debug?`	log debug information
`no-flag-seen?`	don’t flag IMAP messages as seen after you fetch them
`sources`	an alist of sources and their configurations
`destination`	the default output (as a URI, symbol alias or source-style specification) if the user has no ~/.hato/filter file

The URIs you can use on the command-line are just a convenient shorthand for the sources list in your config file. These source specifications allow more options. The one required keyword for any mail source is the protocol (equivalent to the URI scheme). Other keywords include:

`host`	the host for POP3, IMAP and other network protocols
`port`	the port number if different from the protocol’s default
`username`	the username used for authentication
`password`	the password (will prompt at startup if missing)
`mailboxes`	a list of mailboxes for IMAP servers
`delete`	remove fetched messages from this source like --delete
`delete-after`	remove old messages from this source
`interval`	override the default interval
`filter`	an alist of (header-symbol value-string) filters
`remove`	an alist of (header-symbol value-string) filters

The mailboxes keyword takes a list of mailbox pattern names as strings, and also allows a list of the form (except mailbox ...). For example, if you want to fetch all of your mailboxes from Gmail, you might first try a trial run with the * pattern:

$ ./hato-fetch.scm -n imaps://me@gmail.com@imap.gmail.com/\* 
Password for me@gmail.com@imap.gmail.com:  
[2008/01/01 (Fri) 18:14:24] [notice] polling imaps://me@gmail.com@imap.gmail.com ("Chicken" "INBOX" "[Gmail]" "[Gmail]/All Mail" "[Gmail]/Drafts" "[Gmail]/Sent Mail" "[Gmail]/Spam" "[Gmail]/Starred" "[Gmail]/Trash")

Oops, we certainly don’t want to fetch all those folders, especially the "Spam" that Google so thoughtfully filtered for us, and "All Mail" that will just contain duplicates of all the other folders.

Now, a * in an IMAP mailbox pattern basically means the .* regexp, whereas % means to only match at the current level, like [^/]*. So if we try that we see:

$ ./hato-fetch.scm -n imaps://me@gmail.com@imap.gmail.com/%25 
Password for me@gmail.com@imap.gmail.com:  
[2008/01/01 (Fri) 18:14:24] [notice] polling imaps://me@gmail.com@imap.gmail.com ("Chicken" "INBOX" "[Gmail]")

That’s better. There’s still the pesky "[Gmail]" folder, but that’s set to non-selectable on the Gmail IMAP server, so all it will do is log an error when it tries that and continue processing. However, if we used a config rule of

(mailboxes "%" (except "[Gmail]*"))

then the folder would be filtered out to begin with.
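
The pattern semantics used in this walkthrough can be checked with a small sketch. It is written in Python for illustration and is not hato-fetch's code; `pattern_to_regex` and `select_mailboxes` are hypothetical names, and "/" is assumed as the hierarchy delimiter, matching the [^/]* reading above.

```python
import re


def pattern_to_regex(pattern, delimiter="/"):
    """Translate an IMAP mailbox pattern: * is .*, % is [^/]*."""
    pieces = []
    for char in pattern:
        if char == "*":
            pieces.append(".*")                          # any depth
        elif char == "%":
            pieces.append(f"[^{re.escape(delimiter)}]*")  # current level only
        else:
            pieces.append(re.escape(char))               # literal character
    return re.compile("".join(pieces))


def select_mailboxes(available, patterns, excluded=()):
    """Keep names matching any pattern and none of the (except ...) patterns."""
    include = [pattern_to_regex(p) for p in patterns]
    exclude = [pattern_to_regex(p) for p in excluded]
    return [
        name
        for name in available
        if any(rx.fullmatch(name) for rx in include)
        and not any(rx.fullmatch(name) for rx in exclude)
    ]
```

Applied to the nine Gmail folders listed earlier, `select_mailboxes(folders, ["%"])` leaves "Chicken", "INBOX" and "[Gmail]", and adding `excluded=["[Gmail]*"]` leaves only "Chicken" and "INBOX".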

6.6 Sample hato-fetch Config

;; -*- scheme -*-

((auto gmail school)
 (default-format maildir)
 (interval 300) ; 5 minutes
 (sources
  (gmail
   (protocol imaps)
   (host "imap.gmail.com")
   (username "me@gmail.com")
   (password "secret")
   (mailboxes "%" (except "[Gmail]*"))
   )
  (school
   (protocol pop3)
   (host "alma-mater.edu")
   (password "sesame")
   (delete-after 28)     ; expunge old mails from server after 28 days
   (interval 3600)       ; low priority, fetch only once an hour
   )
  (old-work ; not in auto so not fetched by default
   (protocol pop3)
   (host "somewhere.com")
   (username "me"))
  ))