James at Linagora
James is a mail server from the Apache Software Foundation. It supports many mail protocols, such as IMAP4, POP3, SMTP and LMTP, and it is highly customisable.
For instance, with James you can customize the way incoming e-mails are handled (through the system of mailets and matchers we will see later in this post) or build a scalable deployment by implementing a distributed mailbox.
This post summarizes and explains what I have done over the past two months.
Cassandra mailbox improvement
How does a mailbox work?
As its name indicates, it is the place where your mails are stored. Each time you run a command that interacts with your stored mails, the mailbox is called.
One very good point in James is that you can implement your own mailbox quite easily, as the general high-level code is already written. You just need to write mappers that perform the low-level, implementation-specific tasks.
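To illustrate this split between generic code and mappers, here is a minimal sketch. All names (SketchMessageMapper, InMemoryMessageMapper, SketchMailbox) are hypothetical and much simpler than the real James interfaces; the in-memory map only stands in for a storage back-end.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical low-level contract: one implementation per storage back-end.
interface SketchMessageMapper {
    void save(String mailbox, String message);

    List<String> list(String mailbox);
}

// Back-end specific part. A Cassandra mapper would run queries here instead.
class InMemoryMessageMapper implements SketchMessageMapper {
    private final Map<String, List<String>> store = new HashMap<>();

    @Override
    public void save(String mailbox, String message) {
        store.computeIfAbsent(mailbox, key -> new ArrayList<>()).add(message);
    }

    @Override
    public List<String> list(String mailbox) {
        // Return a copy so callers cannot modify the stored list.
        return new ArrayList<>(store.getOrDefault(mailbox, new ArrayList<>()));
    }
}

// Generic high-level code: written once, unaware of the back-end in use.
class SketchMailbox {
    private final String name;
    private final SketchMessageMapper mapper;

    SketchMailbox(String name, SketchMessageMapper mapper) {
        this.name = name;
        this.mapper = mapper;
    }

    void append(String message) {
        mapper.save(name, message);
    }

    int count() {
        return mapper.list(name).size();
    }
}
```

Swapping the back-end then only means passing another mapper implementation to the same high-level class.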
At Linagora, we want to use Cassandra as a back-end. Cassandra is a highly scalable, high-performance, table-oriented NoSQL database inspired by Bigtable concepts. With the Cassandra mailbox, we want a mailbox back-end that can scale very easily.
What have I done?
CRUD (Create, Read, Update, Delete) operations were already implemented.
When I arrived on this project, the Cassandra mailbox implementation was working, but:
- ACLs were not stored in Cassandra and so were not persistent.
- The same was true of group membership.
- A good quota system was missing.
I also found a few bugs:
- Concurrency issues in MODSEQ and UID generation
- A concurrency issue in flag updates
- Setting flags globally did not work
- An error in the store mailbox when performing a move call with a positive batch size
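Concurrency bugs in counter generation typically come from a read-then-write sequence that two clients interleave, so that both obtain the same value. One common remedy is a conditional update that is retried on conflict (Cassandra exposes conditional writes as lightweight transactions). The sketch below illustrates the idea only: an in-memory AtomicLong stands in for the database, all names are hypothetical, and it is not necessarily the fix that was applied in James.

```java
import java.util.concurrent.atomic.AtomicLong;

// Stand-in for a store that offers a conditional update (compare-and-set).
class SketchUidStore {
    private final AtomicLong lastUid = new AtomicLong(0);

    long read() {
        return lastUid.get();
    }

    // Succeeds only if the stored value is still the one we read earlier.
    boolean compareAndSet(long expected, long next) {
        return lastUid.compareAndSet(expected, next);
    }
}

class SketchUidProvider {
    private final SketchUidStore store;
    private final int maxRetries;

    SketchUidProvider(SketchUidStore store, int maxRetries) {
        this.store = store;
        this.maxRetries = maxRetries;
    }

    long nextUid() {
        for (int attempt = 0; attempt < maxRetries; attempt++) {
            long current = store.read();
            long candidate = current + 1;
            // If another client won the race, the update is refused and we retry.
            if (store.compareAndSet(current, candidate)) {
                return candidate;
            }
        }
        throw new IllegalStateException(
            "Could not allocate a UID after " + maxRetries + " attempts");
    }
}
```

Because a stale write is refused rather than silently applied, two clients can never be handed the same UID; the price is a retry loop with an upper bound.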
In the coming month, I will work on:
- Cassandra optimization through better indexing and better lock management
ElasticSearch message search index
Concepts
One fairly complex command is the IMAP SEARCH command. We cannot perform any search in our Cassandra mailbox. Fortunately, James introduces the concept of a message search index.
This is an event-based component that listens to mailbox events (which occur when you perform actions), indexes the messages related to these events, and performs searches on the indexed messages.
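The idea of an index kept up to date by events can be sketched as follows. This is an illustration under assumptions: the class names are hypothetical, the "index" is a plain in-memory map rather than ElasticSearch, and search is a simple case-insensitive substring match.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical event: a message was added to or removed from a mailbox.
class SketchMessageEvent {
    enum Type { ADDED, REMOVED }

    final Type type;
    final long uid;
    final String body;

    SketchMessageEvent(Type type, long uid, String body) {
        this.type = type;
        this.uid = uid;
        this.body = body;
    }
}

class SketchSearchIndex {
    // UID -> lower-cased message body.
    private final Map<Long, String> indexed = new HashMap<>();

    // Called for every message event so the index follows the mailbox.
    void onEvent(SketchMessageEvent event) {
        if (event.type == SketchMessageEvent.Type.ADDED) {
            indexed.put(event.uid, event.body.toLowerCase(Locale.ROOT));
        } else {
            indexed.remove(event.uid);
        }
    }

    // Returns the UIDs of messages containing the term, in ascending order.
    List<Long> search(String term) {
        String needle = term.toLowerCase(Locale.ROOT);
        List<Long> uids = new ArrayList<>();
        for (Map.Entry<Long, String> entry : indexed.entrySet()) {
            if (entry.getValue().contains(needle)) {
                uids.add(entry.getKey());
            }
        }
        Collections.sort(uids);
        return uids;
    }
}
```

The mailbox itself never needs to know how to search: it only emits events, and the index answers queries from what it has seen.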
We want to perform distributed search, so we naturally chose ElasticSearch as a back-end.
But we know that indexing can sometimes be slow, which is why we chose to send indexing messages to ElasticSearch through Kafka. On the ElasticSearch side, a river will listen and execute the commands.
So our mailbox architecture will look like this:
We also chose to index mailbox events (and not just message events), as we may want to use this data in other applications.
Mailets and matchers
Mailets and matchers are called when a mail is received through SMTP. This is a highly tunable system that allows you to define complex mail processing in XML.
If you are interested in the way mailets work, you can read this.
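The matcher/mailet pairing can be pictured with the sketch below: a matcher decides whether a mail is concerned, and the mailet paired with it does the processing. The interfaces here are deliberately simplified and hypothetical; they are not the actual Apache Mailet API, and the pairs are wired in code rather than in XML.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, minimal mail representation.
class SketchMail {
    final String recipient;
    String subject;

    SketchMail(String recipient, String subject) {
        this.recipient = recipient;
        this.subject = subject;
    }
}

// Decides whether a mail should be handled by the paired mailet.
interface SketchMatcher {
    boolean matches(SketchMail mail);
}

// Performs the actual processing on a matched mail.
interface SketchMailet {
    void service(SketchMail mail);
}

// Runs each (matcher, mailet) pair in the order it was declared.
class SketchProcessor {
    private final List<SketchMatcher> matchers = new ArrayList<>();
    private final List<SketchMailet> mailets = new ArrayList<>();

    void add(SketchMatcher matcher, SketchMailet mailet) {
        matchers.add(matcher);
        mailets.add(mailet);
    }

    void process(SketchMail mail) {
        for (int i = 0; i < matchers.size(); i++) {
            if (matchers.get(i).matches(mail)) {
                mailets.get(i).service(mail);
            }
        }
    }
}
```

For example, a pair could match recipients of one domain and tag the subject of those mails, leaving all other mails untouched.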
Our custom mailets and matchers
As we want to integrate James with other applications, we are developing our own custom mailets. They can easily be added without recompiling anything.
For instance, we want to give users the ability to interact with our Enterprise Social Network by mail. If they opt in, they are notified of new messages by e-mail. They just need to reply to this e-mail to respond to the original message in the enterprise social network.
Other mailets will come.
Events management
We have talked a lot about events. But how do they really work?
How do events work?
Events are constructed in a dispatcher that is called when operations are performed in the mailbox manager. They are then sent to the mailbox delegating listener, with which you can register event-aware components: mailbox listeners. It is that simple.
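The flow described above (dispatcher, then delegating listener, then registered listeners) can be sketched in a few lines. The names are hypothetical and events are reduced to plain strings to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;

// An event-aware component.
interface SketchMailboxListener {
    void event(String event);
}

// Forwards every event it receives to all registered listeners.
class SketchDelegatingListener implements SketchMailboxListener {
    private final List<SketchMailboxListener> listeners = new ArrayList<>();

    void register(SketchMailboxListener listener) {
        listeners.add(listener);
    }

    @Override
    public void event(String event) {
        for (SketchMailboxListener listener : listeners) {
            listener.event(event);
        }
    }
}

// Called by the mailbox manager; builds events and hands them over.
class SketchDispatcher {
    private final SketchMailboxListener target;

    SketchDispatcher(SketchMailboxListener target) {
        this.target = target;
    }

    void messageAdded(String mailbox, long uid) {
        target.event("ADDED " + mailbox + " " + uid);
    }

    void mailboxRenamed(String from, String to) {
        target.event("RENAMED " + from + " " + to);
    }
}
```

A search index or an event publisher is then just one more listener registered with the delegating listener.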
Event publishing
We want to send formatted events outside James. This will allow us to integrate mail completely into the applications we want. Events need to be filtered (to reduce the total size and number of events sent, and because we can already access the information we want in James). This is what filters do. We need to be able to chain filters, to keep each of them as simple as possible. We also need to enrich the information they contain and convert them into an understandable format: this is what event filters do. Finally, we send these formatted messages outside James through publishers.
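A rough sketch of such a chain follows, assuming that each filter either drops an event or passes on a possibly enriched copy, and that the last step formats the event and hands it to a publisher. Every name here (SketchEventFilter, SketchFilters, SketchPublisher, SketchEventPipeline) and the key=value output format are assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// A step either drops the event (empty) or passes on a possibly enriched copy.
interface SketchEventFilter {
    Optional<Map<String, String>> apply(Map<String, String> event);
}

// Sends a formatted event outside the server (message queue, HTTP, ...).
interface SketchPublisher {
    void publish(String payload);
}

class SketchFilters {
    // Keeps only the events of the given type.
    static SketchEventFilter onlyType(String type) {
        return event -> type.equals(event.get("type"))
            ? Optional.of(event)
            : Optional.empty();
    }

    // Adds one field to the event, leaving the original map untouched.
    static SketchEventFilter enrich(String key, String value) {
        return event -> {
            Map<String, String> copy = new LinkedHashMap<>(event);
            copy.put(key, value);
            return Optional.of(copy);
        };
    }
}

class SketchEventPipeline {
    private final List<SketchEventFilter> filters;
    private final SketchPublisher publisher;

    SketchEventPipeline(List<SketchEventFilter> filters, SketchPublisher publisher) {
        this.filters = filters;
        this.publisher = publisher;
    }

    void handle(Map<String, String> event) {
        Map<String, String> current = event;
        for (SketchEventFilter filter : filters) {
            Optional<Map<String, String>> result = filter.apply(current);
            if (!result.isPresent()) {
                return; // Dropped by a filter: nothing is published.
            }
            current = result.get();
        }
        publisher.publish(format(current));
    }

    // Formats the event as "key=value" pairs separated by ';'.
    static String format(Map<String, String> event) {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, String> entry : event.entrySet()) {
            if (out.length() > 0) {
                out.append(';');
            }
            out.append(entry.getKey()).append('=').append(entry.getValue());
        }
        return out.toString();
    }
}
```

Keeping each filter to a single concern is what makes chaining worthwhile: a new rule is a new small filter in the list, not a change to an existing one.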
We can summarize it with this diagram:
What will I work on tomorrow?
On my roadmap, I first need to implement a convincing quota system for James.
We will also look into implementing distributed Sieve scripts.
We may also be interested in distributed IMAP subscription (you will be notified of incoming messages whichever James server you are connected to).
Of course, we will also need to store rejected e-mails, rewrite tables and the like in a distributed way.
We need a distributed Kafka mail queue (for consistency).
Well, let's say it: there are many hours of development ahead, but it is for awesome features.