Mailsync
--------

  1. Compiling mailsync
  2. Configuring mailsync
    2.1 Mailbox specification
    2.2 Stores
    2.3 Channels
  3. How does mailsync work?
  4. Running mailsync
    4.1 Verbosity
    4.2 Various messages from mailsync
  5. Limitations
    5.1 Problems with concurrent mailbox access
    5.2 Problems with "New" messages (Mutt)
    5.3 Empty Message-ID headers
  6. More about the algorithm
    6.1 Message id algorithms
  7. History
  8. Further reading
  9. Support, development, new versions and homepage
  10. Author, disclaimer, etc.

1. Compiling mailsync
---------------------

Mailsync has been built on Linux, FreeBSD and SGI Irix 6.5. With a
little coaxing it should compile on any unix-like system on which
c-client compiles. On a non-unix-like system, you may have to come up
with alternatives to a few functions like stat() and getenv().

  1. Download, unpack, and compile the c-client from
     ftp://ftp.cac.washington.edu/imap. Currently I'm using
     imap-2002.RC8, which allows linking from a C++ program.

     If you already have an older version of c-client installed, you
     can still link it to mailsync. Just get c-client.h from either
     the latest IMAP distribution or from
     http://mailsync.sourceforge.net.

  2. Only if you're building mailsync from CVS: you'll need an
     automake version later than 1.6. The following should do the
     deal:

       ./autogen.sh

     After that the build procedure is the same as building from a
     released source package.

  3. ./configure && make && make install

2. Configuring mailsync
-----------------------

To use mailsync, you first have to make a configuration file, which by
default is `$HOME/.mailsync'. The config file specifies two kinds of
things: "stores" and "channels". Lines in the mailsync configuration
file starting with a `#' are treated as comments and ignored.

A "store" describes a mailbox and which parts of that mailbox you want
to have synchronized.

A "channel" is a pair of stores which you want to keep synchronized,
along with a file where mailsync can save synchronization info.

Additionally, mailsync initializes and reads the c-client configuration
to fine-tune its behaviour. You will very likely not need to bother
with that. Before you tinker with it, make sure to read the imaprc.txt
documentation file from c-client.

2.1 Mailbox specification
-------------------------

Mailsync uses the c-client library for manipulation of mailboxes.
Please have a look at the c-client library documentation for details of
the format of a mailbox specification, especially docs/naming.txt and
docs/drivers.txt.

Briefly, a mailbox specification looks like this:

  {imap.unc.edu/user=culver}           refers to INBOX by default
  {imap.unc.edu/user=culver}INBOX.foo  a mailbox on a Cyrus server
  {imap.unc.edu/user=culver}foo        a mailbox on a UW or Netscape
                                       server
  mbox                                 the file $HOME/mbox
  Mail/foo                             a mailbox in $HOME/Mail
  /tmp/foo                             some other file

2.2 Stores
----------

A store is a collection of mail folders. Exactly what kind of
collection you can use depends on your IMAP server, but one thing that
always works is a bunch of folders in a single directory.
Unfortunately there is no general, concise way to specify even this
kind of store. I have settled on a fairly general but redundant
method. Here are examples of store specifiers that work with some
servers I have access to.

  store cyrus {
    server {imap.unc.edu/user=culver}
    ref {imap.unc.edu}
    pat INBOX.sync.%
    prefix INBOX.sync.
    passwd secret
  }

  store netscape-or-uw {
    server {imap.cs.unc.edu/user=culver}
    ref {imap.cs.unc.edu}
    pat sync/%
    prefix sync/
  }

  store localdirectory {
    pat Mail/%
    prefix Mail/
  }

To test a store specification, put it in your .mailsync and run

  mailsync <store>

Mailsync will list the mailboxes it thinks are in the store. (In this
mode, it will not touch anything or even open the mailboxes.) The
names it returns should be stripped of any store-specific information.

`Pat' describes the pattern of the boxes you want to synchronize.
It matches names exactly except for:

  * the delimiter character, which separates hierarchy levels (folders)
    in your mailbox and which depends on the mailstore you use (unix
    filesystems use `/'; in the examples above, the Cyrus store uses
    `.' and the UW/Netscape store uses `/').
  * `%' is a globbing operator. It matches all the items in a hierarchy
    level (folder) but does not descend further down the hierarchy
    (into subfolders).
  * `*' acts like `%' but descends into the hierarchy. Be careful with
    `*', as it can take a lot of time to traverse a deep hierarchy.

If you omit the `prefix' specification, you will see full mailbox
names. Whatever you specify in `prefix' is stripped off of these names
to form what I shall call a "boxname". Two mailboxes on different
stores will be synchronized if and only if they have the same
"boxname". Finally, the full c-client name is formed by
<server><prefix><boxname>.

`passwd' is the password to use when accessing the box. If you omit it
and the store requires a password, mailsync will ask you for it.

If you want to use MH files, or some other format, you should check out
the "docs/" directory in the imap distribution, particularly
naming.txt, drivers.txt, and formats.txt.

2.3 Channels
------------

A channel just specifies two stores that are to be synchronized, and
one c-client mailbox `msinfo' which mailsync uses to remember which
messages it has seen (it stores sets of message-ids there between
runs). Any c-client mailbox specification is fine. If one or both of
the stores is a local file, then you might as well use a local file for
`msinfo':

  channel local-cyrus localdirectory cyrus {
    msinfo .msinfo
  }

The message-id list is kept as a message in the mailbox, with the
channel tag ("local-cyrus") as the subject. So you can use the same
`msinfo' file for many channels, as long as the channels have different
names.

If, on the other hand, both stores are on remote imap servers, you may
consider putting `msinfo' on one or the other server:

  channel uw-cyrus netscape-or-uw cyrus {
    msinfo {imap.cs.unc.edu/user=culver}msinfo
    passwd secret
  }

The `msinfo' mailbox stores a bunch of message-ids in a mail message.
The message is formed without a message-id, so you can even put the
`msinfo' inside the store if you like, and mailsync will not copy or
delete the `msinfo' message. (This may change, though. Better to keep
it separate from the stores.)

As for stores, `passwd' can be omitted; if a password is required,
mailsync will ask for it.

In the channel specification you can optionally set a message size
limit. Messages whose size exceeds the limit (in bytes) will be skipped
when synchronizing:

  channel uw-cyrus netscape-or-uw cyrus {
    msinfo {imap.cs.unc.edu/user=culver}msinfo
    passwd secret
    sizelimit 81920
  }

The example above will not synchronize messages bigger than 80k.

3. How does mailsync work?
--------------------------

First, it loads the state of the stores at the last sync from the
channel's `msinfo' mailbox. If this "lasttime" mailbox doesn't exist,
it is assumed empty. This is the right thing to do: with an empty
record every message counts as "new", so messages are only copied and
nothing is deleted.

Then it iterates through every mailbox on either store. In each
mailbox, it applies the following 3-way diff algorithm:

  * If a message exists on both stores, it is left alone.
  * If a message exists on one store but not the other, and it is a
    "new" message (it's not recorded in `msinfo'), it is copied to the
    other store.
  * If a message exists on one store but not the other, and it is an
    "old" message (it was recorded in `msinfo' at the last
    synchronization), it is assumed that the message was deliberately
    deleted from one store, and it is removed from the other.

Finally, it saves the set of remaining messages to the `msinfo' file.

4. Running mailsync
-------------------

There are three modes.
"Sync" mode: mailsync [options] Synchronizes the two stores specified by the channel, doing a 3-way diff with the message-ids stashed in the channel's msinfo box. "List" mode: mailsync [options] Simply list the boxnames specified by the given store. Don't change or write anything. Useful when writing the .mailsync config file. "Diff" mode: mailsync [options] or mailsync [options] Compare the messages in the store to the information in the channel's msinfo file. This mode does not disturb any mail. This is useful if, for instance, is local, 's msinfo is a local file, but 's other store is remote and you're not dialed up. Mailsync will tell you how many new messages and deletions would be propagated away from through . It also reports duplicate messages (without deleting them). The options change from time to time, and are described if you just type "mailsync". The `-D' option deletes empty mailboxes after synchronizing (and works only in "Sync" mode). Mailsync doesn't differentiate between empty and missing mailboxes. Suppose you delete a mailbox on store A but not on store B. Without `-D', mailsync will delete all messages on B, but it will also resurrect the mailbox on A. With `-D', both mailboxes will be deleted. Your choice. 4.1 Verbosity ------------- Mailsync knowns a series of log_levels. If you encounter problem with mailsync you should enable those to let mailsync tell you more about what's going on. Run: mailsync --help to see what logging options mailsync offers. The different log levels are described in the c-client documentation "internal.txt" under mm_log. 4.2 Various messages from mailsync ---------------------------------- When running mailsync with the "-m" you'll can see what messages mailsync is copying forth and back. Below is an example session with an explanation of the actions that mailsync is taking: $ mailsync -vw -m inbox Synchronizing stores "local-inbox" <-> "remote-inbox"... 
  Authorizing against {[192.168.1.2]/imap}

   *** INBOX ***

    deleted >  Wed Jun 9 tony kwami FROM MR TONY KWAMI.
    duplicate  Wed Jun 9 Some Hero Re: comment
  1 duplicate for deletion in INBOX/remote-inbox
  Authorizing against {[192.168.1.2]/imap}
    ign. del < Wed Jun 9 Mail Delivery System Mail failure - no recipient a
    copied  <- Wed Jun 9 Debian Bug Tracking Sy Bug#253290 acknowledged by de
  1 copied remote-inbox->local-inbox.
  1 deleted on remote-inbox.
  69 messages remain in INBOX
  Expunged 2 mail(s) in store remote-inbox

Above you can see that:

  * a spam from some "tony kwami" was deleted on the remote mailbox,
    since I had deleted it locally
  * "Some Hero" sent me the same message twice, probably once to me and
    once to some mailing list in the "Cc:". Since I'm using the msg-id
    algorithm, the duplicate is detected and automatically deleted.
  * a message from a mail daemon was not copied, since it is marked as
    deleted in the remote mailbox and mailsync by default doesn't copy
    deleted mails
  * a mail from the Debian Bug Tracking System was copied to the local
    mailbox.

5. Limitations
--------------

Mailsync assumes that the message-id is a globally unique identifier.
If it isn't, then it won't work. Here are some situations where the
message-id assumption doesn't quite hold.

  1. Two different messages in the same mailbox with the same
     message-id. Mailsync will delete the one that comes later in the
     mailbox.

  2. Two different messages in different mailboxes with the same
     message-id. Mailsync will never notice; everything works.

  3. A message with no message-id. Mailsync prints a warning and
     leaves the message alone. It will never be copied to any other
     store, nor deleted for any reason.

  4. A message that resides on both stores which is then edited on one
     store. The edit will not be propagated to the other store. A
     workaround is to edit the message-id when you edit the message.
     This will be interpreted by mailsync as a delete and a new
     message, and everything will work.
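
Cases 1 and 3 in the list above can be illustrated with a small Python
sketch. This is not mailsync's own code (mailsync is written in C++);
the function name `classify_by_message_id' and the (message-id,
subject) tuples are made up for the illustration, and the sketch only
mirrors the behaviour described in the list.

```python
def classify_by_message_id(mailbox):
    """Split one mailbox into kept, duplicate and skipped messages.

    `mailbox` is a list of (message_id, subject) tuples in mailbox order.
    Hypothetical helper for illustration only, not part of mailsync.
    """
    seen = set()
    keep, duplicates, skipped = [], [], []
    for message_id, subject in mailbox:
        if not message_id:
            # Case 3: no (or empty) message-id, leave the message alone.
            skipped.append(subject)
        elif message_id in seen:
            # Case 1: same id seen earlier, the later copy is the duplicate.
            duplicates.append(subject)
        else:
            seen.add(message_id)
            keep.append(subject)
    return keep, duplicates, skipped


if __name__ == "__main__":
    box = [
        ("<a@example.org>", "first"),
        ("<b@example.org>", "second"),
        ("<a@example.org>", "first again"),
        ("", "unsent draft"),
    ]
    keep, duplicates, skipped = classify_by_message_id(box)
    print("keep:      ", keep)
    print("duplicates:", duplicates)
    print("skipped:   ", skipped)
```

Note that the sketch, like the description above, decides purely on the
message-id: two genuinely different messages that share an id are
indistinguishable to it.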

Another limitation is that when a message is moved from one mailbox to
another, mailsync interprets this as a deletion from one box and a new
message in the other. This simplifies the algorithm a lot, but wastes
some network bandwidth if you move large messages around between
syncs. Workaround: try to put large messages in their final resting
place before you run mailsync.

5.1 Problems with concurrent mailbox access
-------------------------------------------

Mailsync is not coded with concurrent access in mind. It shouldn't
break in a horrible way, since it tries to double-check before it
deletes something, but it is not really fit for the task either.
Efforts to make it concurrency-capable are certainly appreciated.

5.2 Problems with "New" messages (Mutt)
---------------------------------------

The mail header line "Status: O" is interpreted differently depending
on which mail client you're using. Pine and its mailbox manipulation
library c-client assume that a new message is new as long as you have
not read it. Mutt and possibly some other clients interpret only a
mail with "Status: " as unseen.

If you want mutt to see all "new" messages as "new", then you must use
the "-n" (do not delete messages) option of mailsync. This will have
the effect of not deleting any messages, not even duplicates, on either
store you're synchronizing.

Explanation: c-client's author says the following:

  "The definition of "Recent" [that is "Status: " - tpo] is that the
  message has arrived since the last time the mailbox was opened
  readwrite."

Strangely enough, appending new messages to a mailbox can be done
without opening the mailbox read-write. However, any other operation
that changes anything in a mailbox, such as tagging a message as
deleted or expunging old messages, needs to be done in read-write mode.
As soon as that happens *all* messages will be tagged with "Status: O"
and hence lose their "Recent" status.
That means that if we want to maintain that status, we are not allowed
to tag messages as deleted, to expunge messages, or to do anything
similar.

5.3 Empty Message-ID headers
----------------------------

Some mail software, in particular Microsoft Outlook, produces messages
with no or empty Message-ID headers, in particular for draft or unsent
messages. That means that mailsync will not be able to uniquely
identify such a message using the Message-ID header.

If you need to synchronize such messages you should use the md5 message
id algorithm. You can achieve this by passing "-t md5" to mailsync.

Mailsync deletes duplicate messages by default, assuming that they're
the same. However, if it finds multiple messages with empty
Message-IDs it will display the message:

  "Not deleting duplicate message with empty Message-ID - see README"

and leave that message alone, supposing that this is just another email
produced by weird software.

6. More about the algorithm
---------------------------

The sync operation is symmetric: it doesn't matter which store you
specify first. This is fundamentally different from the IMAP
disconnected-mode mail reading model, where there is a primary store,
and a client is responsible for synchronizing its local copy with the
primary store.

There is no limit to the number of stores you sync with each other.
Think of the stores as nodes of a graph, and put links between the
stores you want to sync between. If there are no cycles, then
everything will work. If there are cycles, some weird things can
happen that I haven't totally worked out yet: I think that a message
could be passed back as a new message to a store which deleted it, for
example.

Since both stores can be local, and in different formats, you can also
use mailsync as a method for keeping all of your messages accessible by
two MUAs, even if neither is IMAP-aware. So mailsync could help you
transition between MUAs.
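
The 3-way diff from section 3 and the symmetry claimed above can be
made concrete with a short Python sketch. It is only a model under
simplifying assumptions, not mailsync's implementation: one mailbox per
store, each store and the msinfo record reduced to a plain set of
message-ids, and a made-up function name `three_way_sync'.

```python
def three_way_sync(store_a, store_b, msinfo):
    """Model one sync run on sets of message-ids for a single mailbox.

    Returns (new_a, new_b, new_msinfo). Hypothetical helper for
    illustration only, not part of mailsync.
    """
    a, b, seen = set(store_a), set(store_b), set(msinfo)

    # "New" messages (not recorded in msinfo) are copied across.
    copy_to_b = (a - b) - seen
    copy_to_a = (b - a) - seen

    # "Old" messages (recorded in msinfo) missing on one side were
    # deleted there on purpose, so they are removed from the other side.
    delete_from_a = (a - b) & seen
    delete_from_b = (b - a) & seen

    new_a = (a | copy_to_a) - delete_from_a
    new_b = (b | copy_to_b) - delete_from_b

    # The remaining messages are what gets saved to msinfo.
    return new_a, new_b, new_a & new_b


if __name__ == "__main__":
    a = {"m1", "m2", "m4"}        # m4 arrived on A since the last sync
    b = {"m1", "m3", "m5"}        # m5 arrived on B since the last sync
    seen = {"m1", "m2", "m3"}     # state recorded at the last sync

    new_a, new_b, new_seen = three_way_sync(a, b, seen)
    print(sorted(new_a))   # ['m1', 'm4', 'm5']
    print(sorted(new_b))   # ['m1', 'm4', 'm5']

    # Symmetry: swapping the two stores gives the same end state.
    swapped_b, swapped_a, swapped_seen = three_way_sync(b, a, seen)
    print((swapped_a, swapped_b, swapped_seen) == (new_a, new_b, new_seen))
```

In the example m2 was deleted on B and m3 was deleted on A since the
last sync, so both disappear everywhere, while m4 and m5 are new and
end up on both stores.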

6.1 Message id algorithms
-------------------------

By default mailsync uses the Message-ID header to uniquely identify
email messages. In case this is not possible (see 5.3), mailsync can
use an md5 hash of a number of mail headers (defined in msgid.c) to
identify a message.

7. History
----------

Mailsync-1 was a bunch of shell scripts. Mail was parsed using awk and
transferred using ftp.

Mailsync-2 was a Java program that suffered from the Second System
Effect. It was never completed.

Mailsync-3 was written in C over the c-client library. I used it for
over a year without any surprises.

Mailsync-4 is a C++ program based on mailsync-3. I was able to remove
a lot of cruft and make things a little more sensible, safe, and
efficient.

Mailsync-4.2 and later have seen extended documentation and some
reengineering efforts to make the code easier to read, patch and
understand.

8. Further reading
------------------

  RFC 2076 - Common Internet Message Headers
  RFC 1176 - IMAP v2
  RFC 1203 - IMAP v3
  RFC 2060 - Internet Message Access Protocol - Version 4rev1
             - being updated: http://www.imc.org/draft-crispin-imapv
  RFC 2342 - NAMESPACES (discovery of where my "*&/%* mailboxes are)
  http://www.iana.org/assignments/imap4-capabilities
  http://www.ics.uci.edu/~mh/book/mh/tocs/jump.htm

9. Support, development, new versions and homepage
--------------------------------------------------

Mailsync's homepage is at http://sourceforge.net/projects/mailsync.
There's also a CVS repository there in which mailsync development
happens.

Announcements about new versions are usually posted on
http://freshmeat.net.

If you need help or have a patch or suggestion to contribute, subscribe
to the mailsync mailing list and post it there. Instructions on how to
do so can be found on mailsync's homepage.

Tomas Pospisek will also provide commercial support if needed (see
below).

10. Author, disclaimer, etc.
----------------------------

Mailsync's author is Tim Culver.
Since version 4.2 Tomas Pospisek maintains the code.

Mailsync is copylefted under the GNU General Public License:

  http://www.gnu.org/copyleft/gpl.html

Linking with openssl is explicitly permitted!