Apache James with Cassandra
Linagora's customer base grows every day. To keep pace, our projects need a scalable mail server. Apache James fits that need, and we already have experience with Apache Cassandra, the NoSQL database of the moment. We have therefore decided to use James with Cassandra as the mailbox backend. As there was neither a driver nor a plugin for James-server, we have started developing our own.
This article describes our approach to developing a James driver for Cassandra with the Datastax Java driver.
Overview
This section gives an overview of the design and implementation of Mailbox Cassandra. This module provides a mailbox implementation that persists mailboxes, messages and subscriptions in a Cassandra cluster.
Cassandra table
Cassandra organizes data into keyspaces and tables. The current design uses the following structure:
The name of the default keyspace is defined in CassandraMailboxSessionMapperFactory; its default value is ‘rse’.
The default tables are:
- rse.mailbox, described in CassandraMailboxTable
- rse.message, described in CassandraMessageTable
- rse.mailboxCounters, described in CassandraMailboxCountersTable
- rse.subscription, described in CassandraSubscriptionTable
Mailbox UID generation
Each mailbox is identified by a UUID.
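As a minimal sketch of the idea, assuming that a random (version 4) UUID is acceptable as the identifier. The class and method names below are hypothetical and are not taken from the module, which may generate its identifiers differently:

```java
import java.util.UUID;

// Hypothetical helper, for illustration only.
final class MailboxIdSketch {

    // Returns a fresh random (version 4) UUID to use as a mailbox identifier.
    static UUID newMailboxId() {
        return UUID.randomUUID();
    }

    public static void main(String[] args) {
        UUID first = newMailboxId();
        UUID second = newMailboxId();
        // Two calls give two different identifiers, with no coordination between nodes.
        System.out.println(first + " / " + second);
    }
}
```

Random UUIDs need no shared state, which is convenient when several James nodes create mailboxes at the same time.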
Message UID generation
The IMAP RFC (RFC 3501) requires message UIDs to be unique within a mailbox and assigned in strictly ascending order. Mailbox Cassandra satisfies this in CassandraUidProvider by incrementing a counter column, with the query built through com.datastax.driver.core.querybuilder.QueryBuilder.
Installation
For the mailbox implementation to work, copy apache-james-mailbox-cassandra-0.6-SNAPSHOT-jar-with-dependencies.jar into the lib folder of the James server.
Putting cassandra-site.xml on the classpath should be enough. You also need to select cassandra in mailbox.xml.
At the moment, only our version of james-server accepts the cassandra option.
To download the project, please refer to our Stash.
Mailbox Cassandra Classes
This is an overview of the most important classes in the implementation.
CassandraMailboxManager
CassandraMailboxManager extends the StoreMailboxManager class. Its implementation is simple: it overrides the doCreateMailbox method to return a SimpleMailbox implementation, and the createMessageManager method to return a CassandraMessageManager implementation. Other than that, it relies on the default StoreMailboxManager implementation.
CassandraMailboxMapper
CassandraMailboxMapper extends StoreaMailboxMapper, a high-level API for writing your own server-side mailbox implementation. This class defines methods such as login, logout, createMailbox and mailboxExists. We extend this API and override some of its methods using the Datastax driver.
CassandraMessageMapper
This class is where a mailbox request ends: the save method builds the query sent to the Cassandra cluster. Another important method is getLastUid, whose value comes from CassandraUidProvider. The message method builds a message object from the data stored in Cassandra.
This class must implement the MessageMapper interface in order to create, read, delete and query messages.
CassandraSubscriptionMapper
This mapper manages the subscriptions of a user to their folders. The data is stored in the subscription table, defined in CassandraSubscriptionTable. It can add, delete and find subscriptions.
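The add, delete and find operations can be pictured with an in-memory stand-in. This is only a sketch of the contract: the class name, the method signatures and the use of plain strings for users and mailboxes are assumptions, and the real mapper reads and writes the subscription table instead of a map.

```java
import java.util.Collections;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical in-memory stand-in for the subscription table (not the module's code).
final class SubscriptionStoreSketch {

    // One set of subscribed mailbox names per user.
    private final ConcurrentMap<String, Set<String>> byUser = new ConcurrentHashMap<>();

    // Subscribes the user to the mailbox; subscribing twice has no further effect.
    void add(String user, String mailbox) {
        byUser.computeIfAbsent(user, u -> ConcurrentHashMap.newKeySet()).add(mailbox);
    }

    // Removes the subscription if it exists.
    void delete(String user, String mailbox) {
        Set<String> mailboxes = byUser.get(user);
        if (mailboxes != null) {
            mailboxes.remove(mailbox);
        }
    }

    // Returns a sorted copy of the user's subscriptions, empty if there are none.
    Set<String> find(String user) {
        Set<String> mailboxes = byUser.get(user);
        if (mailboxes == null) {
            return Collections.emptySet();
        }
        return new TreeSet<>(mailboxes);
    }

    public static void main(String[] args) {
        SubscriptionStoreSketch store = new SubscriptionStoreSketch();
        store.add("alice", "INBOX");
        store.add("alice", "Sent");
        store.delete("alice", "Sent");
        System.out.println(store.find("alice"));
    }
}
```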
CassandraMailboxSessionMapperFactory
This is the first class built by Spring, which makes it an important class for Spring injection. The injection is defined in ./src/main/resources/META-INF/spring/mailbox-cassandra.xml
All classes built by Spring are singletons, and you can inject your own class if necessary. For example, at the moment, we have replaced the configuration with a CassandraSession that uses a static configuration, in order to test the new methods.
CassandraSession
This class holds the connection to Cassandra. The session is defined by the IP address of the Cassandra server, the port and the default keyspace. We inject this class wherever a connection to the database is required.
For testing purposes, this class is also responsible for the keyspace: the session drops and rebuilds the default keyspace each time it is created (that is, whenever you restart James).
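The drop-and-rebuild step can be sketched as the two CQL statements it amounts to. The helper below is hypothetical and only builds the statement text, without contacting a cluster. The replication settings (SimpleStrategy with a replication factor of 1) are an assumption suited to a single test node, not the module's actual configuration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: builds the CQL text only, it does not talk to a cluster.
final class KeyspaceResetSketch {

    // Returns the CQL statements that drop and recreate a keyspace, in execution order.
    // The replication options are an assumption for a single-node test setup.
    static List<String> resetStatements(String keyspace) {
        return Arrays.asList(
            "DROP KEYSPACE IF EXISTS " + keyspace,
            "CREATE KEYSPACE " + keyspace
                + " WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}");
    }

    public static void main(String[] args) {
        // 'rse' is the default keyspace name mentioned in this article.
        for (String cql : resetStatements("rse")) {
            System.out.println(cql);
        }
    }
}
```

Because every restart wipes the keyspace, this behaviour only makes sense while testing. All stored mail is lost each time.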
CassandraModSeqProvider
This is the implementation of ModSeqProvider, which supplies the mod-sequences. It has only two methods, nextModSeq and highestModSeq. The class stores this information in the mailboxCounters table, described in CassandraMailboxCountersTable.
CassandraUidProvider
This class also uses the mailboxCounters table. It likewise has two methods, nextUid and lastUid.
nextUid returns the first free UID, to be used when creating a new message, and lastUid returns the last UID used by a message. The UID returned by nextUid must always be higher than the last one used.
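The contract of these two methods can be illustrated with an in-memory stand-in for the per-mailbox counter. This is a sketch under assumptions: the class name and method signatures are hypothetical, an AtomicLong replaces the Cassandra counter column, and a mailbox without messages is assumed to report 0 as its last UID.

```java
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical in-memory stand-in for the counter row of each mailbox (not the module's code).
final class UidCounterSketch {

    // One counter per mailbox, keyed by the mailbox UUID.
    private final ConcurrentMap<UUID, AtomicLong> counters = new ConcurrentHashMap<>();

    // Atomically increments the mailbox counter and returns the new value,
    // so each returned UID is strictly higher than every previous one.
    long nextUid(UUID mailboxId) {
        return counters.computeIfAbsent(mailboxId, id -> new AtomicLong()).incrementAndGet();
    }

    // Returns the last UID handed out, or 0 if the mailbox has no message yet (assumed convention).
    long lastUid(UUID mailboxId) {
        AtomicLong counter = counters.get(mailboxId);
        return counter == null ? 0L : counter.get();
    }

    public static void main(String[] args) {
        UidCounterSketch provider = new UidCounterSketch();
        UUID mailboxId = UUID.randomUUID();
        long first = provider.nextUid(mailboxId);
        long second = provider.nextUid(mailboxId);
        System.out.println(first + " < " + second + ", last = " + provider.lastUid(mailboxId));
    }
}
```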
Current status
At the moment, we have implemented the CRUD operations for messages, mailboxes and subscriptions.
Next steps
The next steps are:
- add unit tests
- support Hadoop configuration
- UUID generation based on ZooKeeper
- support Elasticsearch
Hi,
I am really interested in this project of yours. Would you be prepared to share approximately how many users you are supporting?
Thanks,
Jon
Hi Jon,
We don’t have load tests yet, as we are still in the process of development. We are currently plugging James into Elasticsearch, to be able to support the IMAP search feature. However, the long-term plan is to use it in our open source products obm.org and open-paas.org, so we’ll need stress tests in the near future. Please do not hesitate to contact us if you’re willing to help on that topic, or on any other topic regarding the james/cassandra/elasticsearch work!
Regards,
Philippe