6  Hato-fetch

Hato-fetch is a Swiss-Army knife tool for moving mail back and forth between different sources and formats, and optionally running in a daemon mode to periodically poll the same sources (i.e. it can do what fetchmail does - but better).

The basic usage is like the Unix cp(1) command:

$ hato-fetch source1 source2 ... dest

This will copy all of the messages in each of the source folders to the final destination folder. Folders may be actual local files (in any of mbox, mh or maildir format), or any of a number of URI schemes, described in detail below.

If you specify only one argument, it is treated as a source and the default destination is used: your default Hato MTA filter ~/.hato/filter if that file exists, or otherwise your local mail spool. This default filter can also be specified explicitly with the hato: URI, which you’ll need if you want to fetch multiple sources into the default.

If you specify zero arguments, then it uses the same default destination along with the default sources as specified in ~/.hato/fetch/config. This is the usual way to run it like fetchmail.
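The three argument cases above can be summarized in a short Python sketch. It is an illustration only, not hato-fetch's own code: the function name resolve_arguments, its parameters and the placeholder string "local-mail-spool" are hypothetical.

```python
def resolve_arguments(args, config_sources, has_filter_file):
    """Return (sources, destination) for a list of command-line arguments.

    config_sources stands in for the default sources from the config file,
    and has_filter_file says whether the user's Hato filter file exists.
    """
    # Default destination: the Hato filter if present, else the mail spool.
    # "local-mail-spool" is a placeholder name, not a real hato-fetch URI.
    default_dest = "hato:" if has_filter_file else "local-mail-spool"

    if len(args) == 0:
        # No arguments: default sources into the default destination.
        return list(config_sources), default_dest
    if len(args) == 1:
        # One argument: it is a source, fetched into the default destination.
        return [args[0]], default_dest
    # Two or more: like cp(1), the last argument is the destination.
    return list(args[:-1]), args[-1]
```

With several sources and the default filter as the target, the explicit hato: URI takes the last position, for example resolve_arguments(["a", "b", "hato:"], [], True).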

6.1  Hato-fetch Usage

-h  --help              print a help message and exit
-V  --version           print version and exit
-c  --config=FILE       specify config file (default ~/.hato/fetch/conf)
    --no-config         don’t use any config file
-n  --no-run            trial run, verify servers but don’t fetch
-d  --daemon            run the command periodically as a daemon
-k  --kill              kill running daemon
-i  --interval=N        interval to use in seconds (default 60)
    --delete            delete fetched messages (i.e. mv instead of cp)
    --delete-after=N    delete old messages after N days
-f  --filter key[=val]  fetch only messages where the key (header) matches
-r  --remove key[=val]  exclude messages where the key (header) matches
    --cc=MBOX           CC to an extra destination mbox
    --allow-relay       enable relaying to external mail addresses

The options are fairly straightforward, and any potentially “dangerous” options must be spelled out in full - there are no short forms.

The --no-run option verifies any POP or IMAP servers, prompting for a password if needed and, in the case of IMAP, also checking mailboxes, but does not actually retrieve any messages. It can be useful when you want to verify your configuration.

--daemon repeats the request periodically at the given interval (as fetchmail does), and --kill can be used to terminate a running daemon.

--delete expunges any fetched messages from the source, effectively making the command behave like mv(1) rather than cp(1) - or even rm(1) if you use a null output destination.

--delete-after removes old messages after a certain number of days. This can be useful if you want to keep messages on a server, for remote access or to enable fetching from multiple clients, but you want to avoid using up all the server space.

--filter and --remove are analogues of the SRFI-1 procedures of the same names. Only messages that pass all of the filters and are not matched by any of the removes will be fetched. In the case of IMAP, some or all of the filtering may be handled server-side - otherwise we first fetch the message and then decide whether or not to keep it. The keywords test the (case-insensitive) values of MIME headers in the message, or simply the existence of the header if no value is specified. Two special keywords, “larger” and “smaller”, instead act on the size of the message in bytes. Other special keywords may be added later.
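The combination of filters and removes can be sketched in Python as follows. This is a hedged illustration rather than the tool's implementation: the names header_matches and should_fetch are hypothetical, and comparing the whole header value for equality is an assumption, since the exact matching rule is not spelled out here.

```python
def header_matches(headers, size, key, value=None):
    """Test one key[=val] rule against a message's headers and byte size."""
    key = key.lower()
    # The two special keywords act on the message size in bytes.
    if key == "larger":
        return size > int(value)
    if key == "smaller":
        return size < int(value)
    lowered = {name.lower(): text for name, text in headers.items()}
    if key not in lowered:
        return False
    if value is None:
        # No value given: the header only has to exist.
        return True
    # Assumption: case-insensitive equality on the whole header value.
    return lowered[key].lower() == value.lower()


def should_fetch(headers, size, filters=(), removes=()):
    """Keep a message that passes every filter and matches no remove."""
    passes = all(header_matches(headers, size, k, v) for k, v in filters)
    removed = any(header_matches(headers, size, k, v) for k, v in removes)
    return passes and not removed
```

A message with no rules at all is always fetched, since an empty list of filters passes and an empty list of removes matches nothing.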

--cc allows you to specify multiple outputs, since the syntax only allows one output by default. It uses the general output URI syntax and is not limited to email addresses.

--allow-relay is required if you want to specify an external email address as an output destination (or as a result of Hato filtering). This is because fetching to an address seems a somewhat uncommon case, and it would be best to avoid accidentally spamming 5000 messages from a local mail spool. Local email addresses are always allowed as destinations, however.

6.2  Input Sources

/path/to/file                  mbox, mh or maildir recognized
file:/path/to/file             same as above
pipe:command                   use piped I/O to/from a command
|command                       same as above
alias:name                     a named mbox from the config file
:name                          same as above
imap[s]://user@host/[mailbox]  fetch from an IMAP server
pop[s]://user@host             fetch from a POP server
test:[subject]                 generate a dummy message for testing
null:                          the bit bucket
-                              stdin/stdout

Most of the inputs are straightforward. pipe: will run the given command and copy the output to the destination(s). If the output is not a valid message beginning with MIME headers, then it will automatically be encapsulated as a message with a Subject: line of “output of command”.
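The encapsulation rule for pipe: output can be sketched in Python as follows. This is only an illustration, not hato-fetch's own code: the name ensure_message is hypothetical, the header test (a first line of the form "Name: value") is an assumed approximation of "beginning with MIME headers", and reading the Subject as "output of" followed by the command text is also an assumption.

```python
import re

# Assumption: a header line is a name made of printable ASCII characters
# other than the colon, followed by a colon and the rest of the line.
HEADER_LINE = re.compile(r"[!-9;-~]+:.*")


def ensure_message(output, command):
    """Return output unchanged if it looks like a message, else wrap it."""
    first_line = output.split("\n", 1)[0].rstrip("\r")
    if HEADER_LINE.fullmatch(first_line):
        return output
    # Hypothetical wrapping: one Subject header, a blank line, then the body.
    return "Subject: output of " + command + "\n\n" + output
```

A check on the first line alone is deliberately loose; command output that happens to start with a "Name: value" line would be passed through untouched under this sketch.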

An alias: source just refers to a named source defined in your config file, as explained below.

The test: source just generates a time-stamped dummy message, which can be very handy for testing your output sources. The null: source doesn’t generate a message at all.

imap: and pop: fetch from the given server with the IMAP and POP3 protocols respectively, optionally over SSL if the trailing “s” is included in the URI scheme. User defaults to the current user name. IMAP will by default fetch from the standard “INBOX” mailbox, but you may override this with a path specification after the host. This may include IMAP mailbox patterns such as “%” to fetch from all top-level mailboxes. Both of these protocols will prompt for a password.
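As a rough illustration of how such a URI breaks down, here is a Python sketch using the standard urllib.parse module. The function name parse_mail_uri and the shape of the returned dictionary are hypothetical, and the default user is passed in as a parameter rather than looked up from the system. Splitting on the last "@" is what lets a user name that itself contains "@" work.

```python
from urllib.parse import urlsplit, unquote


def parse_mail_uri(uri, default_user):
    """Split an imap[s]: or pop[s]: URI into its parts (illustrative only)."""
    parts = urlsplit(uri)
    scheme = parts.scheme
    if scheme not in ("imap", "imaps", "pop", "pops"):
        raise ValueError("not an IMAP or POP URI: " + uri)
    # A trailing "s" on the scheme selects SSL.
    use_ssl = scheme.endswith("s")
    protocol = scheme[:-1] if use_ssl else scheme
    # Split on the LAST "@" so "me@gmail.com@imap.gmail.com" works.
    userinfo, at, host = parts.netloc.rpartition("@")
    user = userinfo if at else default_user
    mailbox = None
    if protocol == "imap":
        # Percent-decode the path; an empty path means the standard INBOX.
        mailbox = unquote(parts.path.lstrip("/")) or "INBOX"
    return {"protocol": protocol, "ssl": use_ssl, "user": user,
            "host": host, "mailbox": mailbox}
```

For example, parse_mail_uri("imap://bob@example.org", "alice") yields the mailbox "INBOX", while a path of %25 decodes to the "%" pattern.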

6.3  Output Sources

You can output to any of the input sources except for pop: (which doesn’t allow uploading) and test: (which wouldn’t make sense). In addition, you can relay each fetched message to an email address with any of the following forms:

smtp:user@host  relay to an address (on a possibly remote host)
user@host       same as above
smtp:[user]     forward to local host

Note that to send to a remote host you need to specify the --allow-relay option.

In the case of output filters, the pipe: scheme has a default command of procmail(1), so just “pipe:” alone is allowed.

6.4  Hato-fetch Command-line Examples

6.5  Hato-fetch Config

The configuration for hato-fetch is in ~/.hato/fetch/config (unless specified otherwise with the --config option). This file should just hold a single alist. The recognized keywords are:

auto            a list of sources to poll by default
interval        the polling interval
default-format  the format to use for newly created folders, either 'mbox or 'mh or 'maildir
smtp-host       the default outgoing smtp-host
debug?          log debug information
no-flag-seen?   don’t flag IMAP messages as seen after you fetch them
sources         an alist of sources and their configurations
destination     the default output (as a URI, symbol alias or source-style specification) if the user has no ~/.hato/filter file

The URIs you can use on the command line are just a convenient shorthand for entries in the sources list in your config file. These source specifications allow more options. The one required keyword for any mail source is the protocol (equivalent to the URI scheme). Other keywords include:

host          the host for POP3, IMAP and other network protocols
port          the port number if different from the protocol’s default
username      the username used for authentication
password      the password (will prompt at startup if missing)
mailboxes     a list of mailboxes for IMAP servers
delete        remove fetched messages from this source like --delete
delete-after  remove old messages from this source
interval      override the default interval
filter        an alist of (header-symbol value-string) filters
remove        an alist of (header-symbol value-string) filters

The mailboxes keyword takes a list of mailbox pattern names as strings, and also allows a list of the form (except mailbox ...). For example, if you want to fetch all of your mailboxes from Gmail, you might try:

$ ./hato-fetch.scm -n imaps://me@gmail.com@imap.gmail.com/\* 
Password for me@gmail.com@imap.gmail.com:  
[2008/01/01 (Fri) 18:14:24] [notice] polling imaps://me@gmail.com@imap.gmail.com ("Chicken" "INBOX" "[Gmail]" "[Gmail]/All Mail" "[Gmail]/Drafts" "[Gmail]/Sent Mail" "[Gmail]/Spam" "[Gmail]/Starred" "[Gmail]/Trash") 

Oops, we certainly don’t want to fetch all those folders, especially the "Spam" that Google so thoughtfully filtered for us, and "All Mail" that will just contain duplicates of all the other folders.

Now, a * in an IMAP mailbox pattern basically means the .* regexp, whereas % means to only match at the current level, like [^/]*. So if we try that (writing % as %25 in the URI) we see:

$ ./hato-fetch.scm -n imaps://me@gmail.com@imap.gmail.com/%25 
Password for me@gmail.com@imap.gmail.com:  
[2008/01/01 (Fri) 18:14:24] [notice] polling imaps://me@gmail.com@imap.gmail.com ("Chicken" "INBOX" "[Gmail]") 

That’s better. There’s still the pesky "[Gmail]" folder, but that’s set to non-selectable on the Gmail IMAP server, so all it will do is log an error when it tries that and continue processing. However, if we used a config rule of

(mailboxes "%" (except "[Gmail]*"))

then the folder would be filtered out to begin with.
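The wildcard and except behaviour described above can be modelled with a small Python sketch. It is not hato-fetch's code: the function names are hypothetical, and the hierarchy delimiter is assumed to be "/" as in the Gmail listing (a real IMAP server reports its own delimiter).

```python
import re


def imap_pattern_to_regex(pattern, delimiter="/"):
    """Translate an IMAP mailbox pattern into a compiled regular expression."""
    pieces = []
    for ch in pattern:
        if ch == "*":
            # "*" matches anything, including the hierarchy delimiter.
            pieces.append(".*")
        elif ch == "%":
            # "%" matches anything except the hierarchy delimiter.
            pieces.append("[^" + re.escape(delimiter) + "]*")
        else:
            # Everything else, including "[" and "]", is literal.
            pieces.append(re.escape(ch))
    return re.compile("".join(pieces))


def select_mailboxes(available, patterns, excluded=()):
    """Keep names matching any pattern and none of the excluded patterns."""
    include = [imap_pattern_to_regex(p) for p in patterns]
    exclude = [imap_pattern_to_regex(p) for p in excluded]
    return [name for name in available
            if any(rx.fullmatch(name) for rx in include)
            and not any(rx.fullmatch(name) for rx in exclude)]
```

Applied to the nine folders from the Gmail listing, the pattern "%" alone selects "Chicken", "INBOX" and "[Gmail]", and adding the exclusion "[Gmail]*" leaves just "Chicken" and "INBOX".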

6.6  Sample hato-fetch Config

;; -*- scheme -*-

((auto gmail school)
 (default-format maildir)
 (interval 300) ; 5 minutes
 (sources
  (gmail
   (protocol imaps)
   (host "imap.gmail.com")
   (username "me@gmail.com")
   (password "secret")
   (mailboxes "%" (except "[Gmail]*"))
   )
  (school
   (protocol pop3)
   (host "alma-mater.edu")
   (password "sesame")
   (delete-after 28)     ; expunge old mails from server after 28 days
   (interval 3600)       ; low priority, fetch only once an hour
   )
  (old-work ; not in auto so not fetched by default
   (protocol pop3)
   (host "somewhere.com")
   (username "me"))
  ))