LeanXcale runs in a Hadoop cluster and reuses two pieces from the Hadoop ecosystem. In the figure it is depicted the architecture of LeanXcale in terms of subsystems. At the bottom, there is the persistence layer for which we use Hadoop file system that is known be able to scale to petabytes and provides data high availability. On top of it, we use HBase. HBase acts as LeanXcale storage engine. On top of HBase we have added our ultra-scalable transactional processing. With this, a transactional HBase with full ACID guarantees is provided. The interface is the same as HBase with three additional methods to begin, commit and rollback a transaction. On top of this, we have added the SQL layer. The SQL layer enables to access the data using SQL. It provides full SQL. The SQL layer has taken the SQL compiler, SQL query plans and SQL optimizer from Apache Derby database. This approach enables to provide a very mature and trusted SQL query engine. LeanXcale provides a JDBC driver to access the database. As aforementioned, the database provides both OLTP and OLAP capabilities.
Principles for Scaling
Traditional transactional databases even modern transactional system do not scale or scale in a limited because at some point of the transactional processing they perform some action in a sequential manner what inherently creates a single node bottleneck or they have a centralized component that performs the transactional processing.
LeanXcale is able to scale out to 100s of nodes thanks to its radical new approach in which transactions are processed and committed fully in parallel without no coordination across them. This is why LeanXcale can scale so much. Consistency is achieved instead of delaying, processing sequentially or in a centralized component by making visible snapshots of the committed data whenever they are consistent, basically whenever there is a longer prefix with no gaps, new committed state is made visible to the applications. It is like instead of executing transactions sequentially or centrally to enforce consistency, it allows to perform transactions their work in advance fully in parallel and whenever a new consistent snapshot of the data is available is made visible.
Interestingly, LeanXcale in order to be able to scale had to avoid using a centralized transactional manager, since obviously it becomes a single node bottleneck (see below figure). In order to achieve it, the ACID properties have been decomposed into its more basic components.
Basically a transactional database has to care about Atomicity, Isolation and Durability. Consistency in the context of the ACID properties means that an application transaction that gets a consistent database should transform it into another consistent state, that is, that the application is correct. That is, the application should be correct (the C of ACID) and the database should provide the other three properties: AID. LeanXcale has decomposed isolation even further into the isolation of reads and isolation of writes, since they require different mechanisms. Then, it has scaled out each of the properties individually, but in a composable way.
An important feature of LeanXcale transactional processing is that it has been conceived to scale to very large rates, but to scale efficiently. There are many technologies that scale a lot, such as map-reduce, but scale inefficiently, that is, what they attain with a number of nodes could be achieved with a much smaller system. LeanXcale is 20 times more efficient in scaling transactions than competing approaches thanks to its inherent asynchronous processing of transactions that enables to perform batching systematically and process efficiently thousands of items or transactions with a single message, extremely reducing the distribution overhead. Additionally, it uses a novel architecture that is NUMA and multi-core aware and it is able to save the overhead of current multithreaded databases.