Higher throughput and lower latency than a traditional RDBMS
Unlike traditional disk-based database architectures, SQLFire manages its data structures and indexes in memory, optimizing for distributed data access rather than for disk I/O. SQLFire throughput can be measured as the number of operations on a single node, or as the number of aggregated operations across an entire cluster. Even when persisting data to disk, SQLFire does not seek to a disk location or flush data all the way to disk. The SQLFire architecture introduces no single point of contention, whether in disk seeks, CPU cycles, or network access.
In comparison, relational databases typically seek and flush every transaction to disk, limiting the maximum throughput to approximately 200 writes per second on commodity hard disks. Even solid state drives cannot improve throughput significantly, because of the lock contention that is inherent in traditional RDBMS architectures.
All SQLFire data can be managed in memory, with one or more copies spread across the cluster. Tables can be replicated in memory across several SQLFire instances, or partitioned horizontally across instances with redundancy. SQLFire replicates data in memory using a distribution protocol with minimal overhead. Clustered applications can access data concurrently on multiple nodes when the data is infrequently updated. For large data sets, or data sets that require frequent updates, you can partition the table across many server instances and configure redundant copies for high read and write throughput.

When CPU or memory utilization in the cluster exceeds a certain threshold, you simply add more servers. The additional servers can automatically rebalance the data and application behavior so that the entire application scales linearly. In applications where data access is balanced across the entire data set, doubling the cluster capacity usually doubles the aggregate throughput. A linear increase in throughput is limited only by the backbone network capacity, typically the network switch rather than a node's full-duplex Gigabit Ethernet capacity.

SQLFire uses highly optimized data structures designed to minimize context switches between threads and processes. Contention points are minimized by managing data in concurrent structures. Communication to peer members is synchronous, which minimizes the latency of data distribution.
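As a sketch, the replication and partitioning choices described above map onto SQLFire's DDL extensions roughly as follows. The table names and columns here are hypothetical; `REPLICATE`, `PARTITION BY COLUMN`, and `REDUNDANCY` are SQLFire extensions to standard SQL.

```sql
-- Small, rarely updated reference table: keep a full copy on every server.
CREATE TABLE countries (
    country_id  CHAR(2) PRIMARY KEY,
    name        VARCHAR(50)
) REPLICATE;

-- Large, frequently updated table: hash-partition rows across servers
-- and keep one redundant copy of each partition for availability.
CREATE TABLE orders (
    order_id     INT PRIMARY KEY,
    customer_id  INT,
    amount       DECIMAL(10,2)
) PARTITION BY COLUMN (customer_id)
  REDUNDANCY 1;
```

Partitioning on `customer_id` also colocates all of a customer's rows on the same server, which keeps queries and procedures scoped to one customer local to a single node.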
Parallel, data-aware stored procedures
As with common relational databases, SQLFire enables applications to execute stored procedures in the database to move data-intensive logic close to where that data resides. SQLFire extends this functionality by making application logic, as well as data, highly available. Procedure execution is routed transparently to the appropriate data nodes, which avoids moving data across the network and increases application throughput.
Parallel execution is implemented in the following ways:
- SQLFire enables applications to "hint" at the location of the data upon which a procedure is dependent. When hints are provided, SQLFire executes the operation on the server that owns the data.
- When data is partitioned across many nodes, SQLFire enables an application to execute stored procedures on specific data nodes or in parallel on all the nodes of a server group. An application can also target specific nodes based on the data requirements for the procedure.
When stored procedures are invoked, the application hint can be a single key, a set of keys, or a set of filters ("where" clauses). If the data set is partitioned across several servers, the procedure invocation is parallelized across the servers that host the hinted data. Each parallel execution can stream its results back to a coordinating server, which in turn aggregates the results using an application-supplied callback. This process is similar to the "map-reduce" pattern, in which data-intensive operations execute in parallel and a reduction phase computes a single result. The application that invokes the stored procedure is insulated from the details of parallel execution and reduction. Any failure in procedure execution (such as a server crash) automatically results in idempotent re-execution of the procedure.

SQLFire servers execute in a Java virtual machine, and all stored procedures must be written in Java. Stored procedures execute in-process with the data they operate on, providing direct access to data in the process heap. SQLFire can also be embedded as a data server within popular containers such as Spring or Java EE. When used within a Spring container, for example, aspects can be developed so that bean method invocations are routed to the node that manages the data set required by the bean.
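A data-aware invocation of this kind can be sketched with SQLFire's extended `CALL` syntax. The procedure, table, and processor class below are hypothetical; the `ON TABLE ... WHERE` clause is the "hint" that routes execution to the servers hosting the matching partitions, and `WITH RESULT PROCESSOR` names the Java class that performs the reduction step.

```sql
-- Execute only on the servers that host the hinted customer rows;
-- TotalsProcessor (a Java class implementing the result-processor
-- contract) aggregates the streamed partial results.
CALL compute_order_totals(?)
  WITH RESULT PROCESSOR com.example.TotalsProcessor
  ON TABLE orders WHERE customer_id IN (?, ?, ?);
```

Without the `ON TABLE` clause, a procedure can instead be targeted at all members or at a server group (`ON ALL`, `ON SERVER GROUPS (...)`) for cluster-wide parallel execution.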
Dynamic, linear scalability
SQLFire achieves scalability by partitioning data dynamically across many member nodes, distributing the load uniformly across servers, and supporting "elastic" data clusters. Servers can go online and offline as necessary to serve a growing or shrinking demand pattern. When a new server announces its availability, SQLFire initiates a non-blocking algorithm to determine the subset of data that should be moved to the new server, without introducing contention points for current application clients. Data is copied to the new member while any concurrent updates to that data are applied simultaneously, ensuring coherency. By using SQLFire's built-in instrumentation, which captures throughput, latency metrics, and CPU, network, and memory utilization, you can program applications to detect changing performance patterns, proactively add resources, and rebalance the data. This enables you to provide predictable data access throughput and latency without having to over-provision the cluster.
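Rebalancing after adding capacity can also be triggered explicitly. SQLFire exposes a system procedure for this; the call below is a sketch (verify the procedure name against your release's system-procedure reference).

```sql
-- Redistribute the buckets of all partitioned tables across the
-- current set of servers, e.g. after new members join the cluster.
-- Runs without blocking in-flight reads and writes.
CALL SYS.REBALANCE_ALL_BUCKETS();
```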
Shared-nothing disk persistence
You can manage replicated and partitioned tables completely in memory, or both in memory and on disk. In SQLFire, each member node persists data in disk files independently of other members. A disk or cache failure on one node does not prevent other instances from operating safely on their own disk files. This "shared nothing" persistence architecture lets you configure applications so that tables are persisted on different nodes across the cluster, reducing the impact of disk latencies.
SQLFire data management on disk is fundamentally different from disk-oriented databases where each update results in at least two writes to disk: one write to a "redo" or "undo" log file and one write to the data files themselves. SQLFire does not manage data and transaction logs in separate files. Instead, all data updates are appended to files that are similar to the transaction logs of traditional databases.
Table update operations are only appended to rolling log files, using an algorithm that requires neither flushing all the way to disk nor seeking to a disk location. Query operations primarily use in-memory indexes and in most cases do not require disk access. Writes are buffered by the I/O subsystem in the kernel, allowing the I/O scheduler to merge and sort disk writes to achieve the highest possible disk throughput. You mitigate the risk of catastrophic machine failure by ensuring that at least two machines are synchronously writing to their disk buffers. The result is throughput that approaches the maximum disk transfer rate on each node in the cluster.
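In DDL terms, this persistence model is enabled per table via a named disk store. The store and table below are hypothetical; `PERSISTENT` appends updates to the store's rolling operation-log files, and `ASYNCHRONOUS` lets the kernel buffer and batch the writes as described above (check the exact clause forms against your release).

```sql
-- A disk store names the directory-backed set of rolling log files.
CREATE DISKSTORE store1;

-- Persist this partitioned table to store1; writes are appended to the
-- logs and buffered by the OS rather than flushed per transaction.
CREATE TABLE trades (
    trade_id  INT PRIMARY KEY,
    symbol    VARCHAR(8),
    qty       INT
) PARTITION BY PRIMARY KEY
  REDUNDANCY 1
  PERSISTENT 'store1' ASYNCHRONOUS;
```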
Caching options for existing data
You can use SQLFire as a data store or as a middle-tier cache for an existing database system. SQLFire provides a caching framework that has the following capabilities:
- Read-through caching—SQLFire automatically loads data from an RDBMS when requested data is missing from the cache.
- Write-through caching—SQLFire synchronously applies a change to the RDBMS before changing that data in the cache.
- Write-behind caching—When data in the cache changes, SQLFire asynchronously queues those changes and applies them to the RDBMS in batches. Queuing and batching updates allows applications to operate at very high speeds, unconstrained by database contention or availability.

When you configure SQLFire as a cache, you can configure tables to evict the least frequently used rows when the entire data set cannot fit into the available memory. Multiple eviction strategies are supported, including expiry of data after a period of time.
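Eviction and expiry are declared per table. The table below is hypothetical, and the clause values are illustrative; SQLFire's `EVICTION BY` clause supports LRU-style strategies (by row count, memory size, or heap percentage), and `EXPIRE` removes entries after a time-to-live or idle period.

```sql
-- Cache table capped at roughly 100,000 in-memory rows; least recently
-- used rows are destroyed on overflow, and any entry idle for an hour
-- (3600 seconds) is expired.
CREATE TABLE product_cache (
    sku     VARCHAR(20) PRIMARY KEY,
    detail  VARCHAR(500)
) EVICTION BY LRUCOUNT 100000 EVICTACTION DESTROY
  EXPIRE ENTRY WITH IDLETIME 3600 ACTION DESTROY;
```

With read-through configured, an evicted or expired row is simply reloaded from the backing RDBMS the next time it is requested.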
Continuous availability of data within and across data centers
SQLFire enables you to replicate or partition any table while maintaining one or more redundant copies. To provide this level of availability without slowing down applications, SQLFire quickly detects and processes machine failures, unresponsiveness, and even network partitions. SQLFire uses multiple TCP and UDP-based protocols to detect failure conditions within a predictable time period. It uses a coordinator to maintain a consistent view of the distributed system. When a server fails or leaves the cluster unexpectedly, other servers each recognize the change and maintain coherency across the system.
SQLFire attempts to maintain the configured level of redundancy for tables automatically even when clustered servers fail. If a failed server does not reappear within a configurable time period, the lost copy of data is restored among the other cluster members if sufficient capacity is available. SQLFire uses three main features to enable continuous availability:
- In-memory replication. The number of table replicas can be increased or decreased dynamically. A table can be replicated across all servers or within a configured subset of the servers. Similarly, multiple copies of data can be maintained when data is partitioned. All data changes are propagated synchronously to replicas using an ACK-based protocol. The protocol ensures that data is transmitted safely to the replica and to the local data store.
- Optional storage to disk. SQLFire enables applications to synchronously or asynchronously persist data to disk on one or more nodes using a shared-nothing disk architecture. This ensures very high levels of data availability. Asynchronous (store-and-forward) events are always managed on at least two member nodes, so that if one server fails, the redundant node takes over. Clients connect to logical servers, and a client automatically fails over to alternate servers in a group when servers fail or become unresponsive.
- Asynchronous replication to remote clusters for disaster recovery. All data changes can be replicated asynchronously and in batch to one or more remote clusters. Clients can programmatically determine when servers in the primary cluster are unavailable, and then fail over to another available cluster.
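The WAN replication above is configured declaratively through a gateway sender attached to a table. The DDL below is an approximation of SQLFire's WAN syntax with hypothetical names (verify the clause forms, particularly `REMOTEDSID`, against your release's reference).

```sql
-- A gateway sender queues local changes and ships them in batches to the
-- remote distributed system identified by REMOTEDSID.
CREATE GATEWAYSENDER dr_sender
  ( REMOTEDSID 2 )                -- ID of the remote (disaster-recovery) cluster
  SERVER GROUPS ( primary_sg );   -- members that host the sender's queue

-- Changes to this table flow asynchronously to the remote cluster.
CREATE TABLE accounts (
    account_id  INT PRIMARY KEY,
    balance     DECIMAL(12,2)
) PARTITION BY PRIMARY KEY
  REDUNDANCY 1
  GATEWAYSENDER ( dr_sender );
```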
Unlike traditional disaster-recovery solutions, where the backup site sits in standby mode, SQLFire supports active-active clustering across WAN boundaries. This means that application clients can read and write to either cluster, and can also fail over to either cluster. Given the asynchronous nature of replication, applications in such an environment should be designed to handle the potential for some inconsistency during failover.