Discover the disruptive technology behind LeanXcale

LeanXcale offers an exceptional set of capabilities thanks to novel technology, the result of 20 years of research by some of the most brilliant minds in the database field, led by former UPM professor Ricardo Jimenez-Peris.

Download the white paper or keep reading to discover all the features of LeanXcale’s next gen database.

 

Scalability


Traditional ACID databases do not scale out linearly or do not scale out at all. Companies have to develop complex architectures, with many problems, or scale up on expensive hardware.

LeanXcale has developed the patented Iguazu technology to scale out linearly with no bottleneck from a single server to hundreds. It uses a distributed algorithm that processes transactions massively in parallel, maintaining all the ACID properties.

LeanXcale has a shared-nothing architecture that enables it to run either on a commodity cluster or in the cloud. It is ready to manage any volume just by adding new nodes, with excellent performance per node. Thanks to its linear scalability, fifty nodes provide fifty times the performance of a single node. No more bottlenecks or sub-linear scalability.

With LeanXcale, your architecture is ready for any future growth: it removes the bottlenecks of traditional RDBMSs and avoids complex architectures built on NoSQL systems, which trade off essential features such as data coherence and ease of querying with SQL.

LeanXcale can be used as an alternative when you have a scale-up-only traditional database running on costly hardware (e.g., a mainframe). You can offload it partially, as a first step, or replace it completely.

 

Hybrid Transactional Analytical Processing (OLAP+OLTP)

Traditional operational databases do not support analytical queries, so companies have to resort to a data warehouse and implement ETL processes that copy data overnight from the operational database into the data warehouse.

LeanXcale has a distributed data warehouse engine designed to run analytical queries on operational data and deliver analytical results in real time. This capability avoids ETLs, which can save up to 80% of the average business analytics cost.

It also enables real-time analytics, so that decisions can be made in real time. The business is no longer deaf and blind for hours or days.


Contention-Free High Availability

High availability (active-active replication) is a typical bottleneck for many traditional databases and, in any case, creates a very high overhead. These databases rely on a coordination protocol such as two-phase commit or consensus (e.g., Paxos) that is very costly, introduces severe bottlenecks, or both.

LeanXcale has developed a new replication algorithm that has minimal overhead (LeanXcale simply executes each write on all replicas) and is bottleneck-free.

High availability is crucial for storing business-critical data, where reliability is a must.

 

Non-Intrusive Elasticity

Companies have to overprovision for the highest peaks they expect. But this overprovisioning is expensive, since you have to pay for it 24x7.

Additionally, overprovisioning can still fall short in some cases (e.g., during Black Friday or other flash sales), causing the application to collapse, with the resulting outage and angry customers.

LeanXcale has a novel non-intrusive data migration algorithm that allows moving data from one server to another without disrupting operations, even while the data is being updated, and while keeping full ACID consistency.

Since a LeanXcale cluster can grow or shrink according to the current needs with zero downtime, it minimizes operational costs (cloud cost/on-premise operations and operational team shifts) by reducing the used hardware resources to actual needs.


Dual Interface

Some use cases require ingesting data at very high rates that traditional SQL databases cannot sustain. Key-value data stores are typically chosen because they can ingest data at very high throughput. However, this comes with very important sacrifices, such as data coherence (ACID properties) and ease of query (SQL).

This approach often creates complex architectures and company silos. Retrieving the full information about an entity frequently requires requests to several systems, which creates artificial complexity: loss of transactionality, complex joins, and loss of coherence and synchronicity.

The KiVi storage engine is a relational key-value data store. Users can access the data through the standard JDBC/SQL API, but also through a direct ACID key-value interface. This interface enables data ingestion at very high rates (key-value performance), very efficiently, by avoiding the SQL processing overhead. The direct API provides all the operations available in SQL except joins: insertions, predicate filtering, aggregation, grouping, and sorting.

Since LeanXcale is hybrid, analytical queries may be run over the data inserted through this key-value direct API with no delay.

KiVi is the answer for operational applications that demand high insertion rates, without adding architectural complexity, since both interfaces provide the same visibility over the same data and the same ACID properties.
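To make the list of direct-API operations more concrete, here is a minimal in-memory sketch in Python. It is not the KiVi API: the class `ToyTable` and its methods `put`, `get`, `scan`, and `sum_by` are hypothetical names invented for this illustration. It only shows what insertion, predicate filtering, sorting, grouping, and aggregation look like when issued as direct calls rather than as SQL text.

```python
from collections import defaultdict


class ToyTable:
    """Hypothetical in-memory table accessed through key-value style calls."""

    def __init__(self, key_field):
        self.key_field = key_field
        self.rows = {}  # primary key -> row dict

    def put(self, row):
        # Insert or overwrite by primary key, with no SQL parsing or planning.
        self.rows[row[self.key_field]] = dict(row)

    def get(self, key):
        return self.rows.get(key)

    def scan(self, predicate=None, sort_by=None):
        # Predicate filtering plus optional sorting.
        out = [r for r in self.rows.values() if predicate is None or predicate(r)]
        if sort_by is not None:
            out.sort(key=lambda r: r[sort_by])
        return out

    def sum_by(self, group_field, value_field, predicate=None):
        # Grouping and aggregation in one pass over the filtered rows.
        totals = defaultdict(int)
        for r in self.scan(predicate):
            totals[r[group_field]] += r[value_field]
        return dict(totals)


readings = ToyTable(key_field="id")
readings.put({"id": 1, "sensor": "a", "value": 10})
readings.put({"id": 2, "sensor": "b", "value": 7})
readings.put({"id": 3, "sensor": "a", "value": 5})

high = readings.scan(predicate=lambda r: r["value"] > 6, sort_by="value")
totals = readings.sum_by("sensor", "value")
```

Joins are deliberately absent from the sketch, matching the statement that the direct API covers everything SQL does except joins.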

In summary, LeanXcale combines the capabilities of SQL operational databases, data warehouses, and key-value data stores in a single database manager.

 

Ultra-Efficient Storage Engine

KiVi is LeanXcale's storage engine. It has been designed from scratch with a brand-new architecture.

This radically different architecture is designed to minimize the overheads found in most storage engines. As a result, it can even run on a Raspberry Pi.

It avoids expensive context switches, thread synchronization, and NUMA remote memory accesses, taking advantage of more than 20 years of operating systems research.

This new storage engine underpins all the value of LeanXcale, making its massively parallel transactional processing efficient.


Polyglot Support

In recent years, NoSQL vendors have appeared with highly specialized products that solve particular problems. Around them, a whole portfolio of new architectures has been designed, creating silos and making systems harder to develop and maintain.

To solve this problem, LeanXcale provides:

• Polyglot Queries: LeanXcale performs queries across its own SQL data and other data stores. In this way, organizations can break their data silos and query across all their databases. LeanXcale supports queries across MongoDB, HBase, Neo4j and any SQL RDBMS. Queries can combine the ease of SQL with the power of the native APIs/query languages of the underlying data stores.

• Integration with Data Lakes: By defining metadata and parsing rules for data lake files (e.g., in HDFS), these files become read-only SQL tables. SQL queries can then query and correlate operational data with historical data stored in data lakes.

LeanXcale reduces the total cost of ownership by reducing the time-to-value in development and simplifying the maintenance.

 

Online Aggregations

LeanXcale has another innovation that enables data to be aggregated in real time without any conflict.

Since aggregates are computed online at insertion time, they are already pre-calculated. Getting an aggregate therefore requires just reading the row from the relevant aggregate table. Analytical aggregation queries are replaced by single-row queries, making LeanXcale unbeatable in these scenarios.

This elegant mechanism allows a fully persisted aggregation, avoiding expensive analytical queries.
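The idea of maintaining aggregates at insertion time can be sketched in a few lines of Python. This is only a single-threaded toy under assumed names (`OnlineSum`, `insert`, `total`, `total_by_scan` are hypothetical), and it does not show how LeanXcale avoids conflicts between concurrent transactions. It illustrates why reading a pre-computed aggregate row is cheaper than scanning the detail rows.

```python
class OnlineSum:
    """Toy table that keeps a per-group aggregate row up to date on insert."""

    def __init__(self):
        self.detail = []     # every inserted (group, amount) row
        self.aggregate = {}  # group -> [row count, running total]

    def insert(self, group, amount):
        self.detail.append((group, amount))
        entry = self.aggregate.setdefault(group, [0, 0])
        entry[0] += 1
        entry[1] += amount

    def total(self, group):
        # Single-row read from the aggregate table.
        return self.aggregate[group][1]

    def total_by_scan(self, group):
        # Equivalent analytical query: scan and sum all detail rows.
        return sum(amount for g, amount in self.detail if g == group)


sales = OnlineSum()
sales.insert("es", 10)
sales.insert("fr", 4)
sales.insert("es", 5)

fast = sales.total("es")
slow = sales.total_by_scan("es")
```

Both calls return the same value, but `total` touches one row regardless of how many detail rows exist, while `total_by_scan` grows with the table.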


Multi-Workload

Until now, there has been a duality between SQL databases and key-value data stores:

• SQL databases are more performant for range queries.

• Key-value data stores are more efficient for data ingestion.

This duality results from the underlying data structures used by the SQL engines and key-value engines. SQL databases use B+ Trees, while key-value data stores use LSM Trees (Log-Structured Merge Trees).

B+ trees are perfect for range queries: logarithmic access to get the first key in the range, and sequential access to read the rest of the keys in the range.

LSM trees are excellent for ingesting data: they buffer data in memory and, when the buffer is full, serialize it and write it to persistent storage as a sorted file.

However, B+ trees are inefficient for ingesting data (random updates/inserts). They store the data in the leaves and, when the tree does not fit in memory (the most common case), saving each new row requires one or more I/Os. Doing this per row is costly, and data can only be inserted at the speed at which I/O can be performed. LSM trees, in turn, are bad at range queries. To find the first key, one has to perform as many searches as there are files covering the targeted range (ten to a few tens of files is quite common for a data region). The search thus becomes more than an order of magnitude more expensive.
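The cost of range queries on an LSM tree can be seen in a small Python sketch. It models only the generic LSM behaviour described above (an in-memory buffer flushed to sorted runs), not LeanXcale's own data structure; `ToyLSM`, `memtable_limit`, and the returned search counter are hypothetical names chosen for this illustration, and sorted lists stand in for files on disk.

```python
import bisect


class ToyLSM:
    """Toy LSM tree: writes go to a buffer that is flushed as sorted runs."""

    def __init__(self, memtable_limit=3):
        self.memtable_limit = memtable_limit
        self.memtable = {}
        self.runs = []  # each run is a sorted list of (key, value), oldest first

    def put(self, key, value):
        # Ingestion only touches memory until the buffer fills up.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.runs.append(sorted(self.memtable.items()))
            self.memtable = {}

    def range(self, lo, hi):
        """Return (sorted rows with lo <= key <= hi, number of runs searched)."""
        merged = {}
        searches = 0
        for run in self.runs:  # newer runs overwrite older values
            searches += 1      # one binary search per sorted run
            i = bisect.bisect_left(run, (lo,))
            while i < len(run) and run[i][0] <= hi:
                merged[run[i][0]] = run[i][1]
                i += 1
        for key, value in self.memtable.items():
            if lo <= key <= hi:
                merged[key] = value
        return sorted(merged.items()), searches


store = ToyLSM(memtable_limit=3)
for k in [5, 1, 9, 3, 7, 2, 8, 4]:
    store.put(k, f"v{k}")

rows, searches = store.range(2, 5)
```

With eight inserts and a buffer of three, two runs exist and both must be searched to answer the range query, whereas a single sorted structure such as a B+ tree would need one search.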

LeanXcale uses a novel data structure that is as efficient as B+ trees for range queries, and as efficient as LSM trees for random updates/inserts.

This novel structure provides versatility to LeanXcale, making it a great choice with excellent behavior for any usage.

 

Costless Multiversion Concurrency Control

Modern databases use multi-version concurrency control (MVCC) to avoid conflicts between reads and writes. However, MVCC requires obsolete versions to be removed.

Some databases allocate an area on data pages to store older versions, but this approach runs out of space when update rates are high, which causes most transactions to abort. Other databases clean up obsolete versions periodically, but this results in a stop-the-world process that halts operations while the table is copied with only the latest version of each row.
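The following Python sketch shows the generic MVCC mechanics that create this cleanup problem: each write appends a version, readers see the newest version at or before their snapshot, and versions no active snapshot can see must eventually be purged. It is a textbook-style toy with hypothetical names (`ToyMVCC`, `purge`, `oldest_active_snapshot`) and does not represent LeanXcale's algorithm.

```python
class ToyMVCC:
    """Toy multi-version store with snapshot reads and version purging."""

    def __init__(self):
        self.versions = {}  # key -> list of (commit_ts, value), ascending
        self.clock = 0

    def write(self, key, value):
        # Every write creates a new version instead of overwriting in place.
        self.clock += 1
        self.versions.setdefault(key, []).append((self.clock, value))
        return self.clock

    def snapshot(self):
        return self.clock

    def read(self, key, snapshot_ts):
        # Newest version committed at or before the reader's snapshot.
        for ts, value in reversed(self.versions.get(key, [])):
            if ts <= snapshot_ts:
                return value
        return None

    def purge(self, oldest_active_snapshot):
        """Drop versions that no snapshot at or after the given one can see."""
        removed = 0
        for key, chain in self.versions.items():
            keep_from = 0
            for i, (ts, _) in enumerate(chain):
                if ts <= oldest_active_snapshot:
                    keep_from = i
            removed += keep_from
            self.versions[key] = chain[keep_from:]
        return removed


db = ToyMVCC()
db.write("x", "a")
old_snapshot = db.snapshot()
db.write("x", "b")
db.write("x", "c")

seen_by_old_reader = db.read("x", old_snapshot)
seen_by_new_reader = db.read("x", db.snapshot())
removed = db.purge(oldest_active_snapshot=db.snapshot())
```

In this toy, `purge` walks every version chain; the paragraph above describes why doing this kind of cleanup in a real system is where the cost usually appears.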

LeanXcale's MVCC uses a new approach that is almost costless and does not cause issues at any update rate.

This algorithm delivers stable throughput, which many scenarios need.


Bidimensional Partitioning

On the one hand, some application workloads are very intensive in terms of data insertions. They store events or logs with a timestamp.

On the other hand, database performance depends on memory usage. As soon as the workload no longer fits in memory, I/O increases and throughput goes down.

LeanXcale is optimized for handling time series, both at insertion and at query time, by making smart use of its cache.

This approach is optimal for data keyed by timestamps or auto-increment values, such as time series, log information, streaming events or IoT streaming data.
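The text does not describe the mechanism in detail. One plausible reading of "bidimensional" is that data is partitioned on two dimensions at once, a key and a time window, so that recent inserts and time-bounded queries touch only a few partitions that can stay in cache. The Python sketch below illustrates that reading only; `ToyBidimensionalStore`, `key_buckets`, and `window_seconds` are hypothetical names, and the scheme is an assumption rather than LeanXcale's documented design.

```python
from collections import defaultdict


class ToyBidimensionalStore:
    """Toy store partitioned by (key bucket, time window). Assumed scheme."""

    def __init__(self, key_buckets=2, window_seconds=60):
        self.key_buckets = key_buckets
        self.window_seconds = window_seconds
        self.partitions = defaultdict(list)  # (bucket, window) -> rows

    def partition_of(self, device_id, ts):
        return (device_id % self.key_buckets, ts // self.window_seconds)

    def insert(self, device_id, ts, value):
        # New events land in the partition of the current time window.
        part = self.partition_of(device_id, ts)
        self.partitions[part].append((device_id, ts, value))

    def query(self, device_id, ts_from, ts_to):
        """Return (matching rows, number of partitions touched)."""
        bucket = device_id % self.key_buckets
        first = ts_from // self.window_seconds
        last = ts_to // self.window_seconds
        touched = [(bucket, w) for w in range(first, last + 1)
                   if (bucket, w) in self.partitions]
        rows = [r for p in touched for r in self.partitions[p]
                if r[0] == device_id and ts_from <= r[1] <= ts_to]
        return rows, len(touched)


store = ToyBidimensionalStore(key_buckets=2, window_seconds=60)
store.insert(1, 10, "a")
store.insert(1, 70, "b")
store.insert(2, 75, "c")
store.insert(1, 130, "d")

rows, touched = store.query(device_id=1, ts_from=60, ts_to=119)
```

Here four partitions exist but the query touches only one of them, which is the kind of locality that keeps the working set small enough to remain in memory.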

 
 

Enterprise-ready

LeanXcale gives you everything you need to go into production with confidence. This section describes some of the main features that LeanXcale provides for integration into a standard enterprise environment.

 

BI integration

LeanXcale can integrate with the most popular BI tools, such as Qlik, Tableau or Power BI, through a standard OData interface.

In addition, any other BI tool supporting JDBC or OData connectivity can be integrated.

Hot backup

Continuous backup and consistent snapshots of distributed clusters allow seamless data recovery in the event of system failures or application errors. Even when distributed, LeanXcale provides point-in-time hot backup capabilities.

Hot backup means you can back up your database at any time without disrupting operations and, when needed, restore a fully consistent view of your database at that point in time.

Machine Learning integration

Integrating with your favorite machine learning toolkit (R, Pandas, TensorFlow or Spark) is simple through the JDBC interface.

Additionally, we provide a low-level integration that exposes query results as an Apache Arrow/Plasma shared object that Python and Spark can use. It gives partitioned access to the dataset, so that machine learning jobs can run in parallel over LeanXcale.

 

Monitoring

LeanXcale provides an integrated monitoring dashboard based on Prometheus and Grafana out of the box.

Additionally, LeanXcale exposes a series of metrics to third-party systems through JMX and as Prometheus custom exporters.

 

Security

Some critical data is subject to security restrictions because of business or legal requirements (e.g., in banking, insurance or health).

LeanXcale is ready to manage them by providing:

• Access control: LeanXcale can provide role-based access control, per user and at the individual permission level. LeanXcale can integrate authorization with an enterprise-level LDAP.

• Communication encryption: You can activate SSL/TLS encryption for any external connection and, depending on the deployment and security level of your application, you can also enable SSL/TLS for connections between internal database components.

• Data storage encryption: The storage layer can encrypt data. This encryption may use an FPGA or an Intel coprocessor to avoid burning CPU cycles on encryption.

LeanXcale can run smoothly in any scenario while fulfilling all the security requirements.

 

Business recovery system

We support several strategies:

• On the one hand, it is possible to use LeanXcale's standard database replication capabilities.

• On the other hand, there is the option to keep an up-to-date copy using the event loggers. This option has a small footprint, which makes it an excellent choice.

 
 

Do you have any questions? Do you want to request an enhancement?