Distributed Database Design – Notes

  • Most distributed systems(ex: HBase, VoltDB) require coordination among-st its nodes for several/all of its tasks. Coordination can be either taken up by the system itself or the system can give this responsibility to a separate coordination system ex: Zookeeper.
  • The coordination system itself needs to be fault tolerant, preferably very high throughput, scalable (preferably horizontally scalable) system i.e. it is a distributed system on its own right.
  • Zookeeper Paper (Wait free coordination for large distributed systems) has these key ideas
    • Provide wait-free base features and let clients of zookeeper implement higher order primitives on-top of these simpler primitives ex: read/write locks, simple locks without herd effect, group membership, configuration management.
    • The base features are
      1. sequential/linearizable writes at leader node
      2. All reads are locally executed by the client
      3. FIFO execution of client requests
      4. Optimistic locking i.e. Version based operations
    • The underlying mechanisms for linearizables writes is
      • A protocol for Atomic broadcast — zookeeper defines  this as guaranteed delivery of a message to all nodes (ex: a write message from leader)
      • ZAB(Zookeeper Atomic Broadcast) seems to be essentially a variant of two phase commit protocol. An article explaining it further is here.
  • VoltDB is a in-memory, fault tolerant, ACID compliant, horizontal scale system. Some of its architecture choices are here. The key ideas are
    • Single thread operations
    • No client controlled transactions. instead each stored procedure runs in a transaction.
    • Partitioning and shared-nothing cluster (with replication for fault tolerancve)
    • leveraging/support deterministic operations. Instead of writing at master and propagating changes to all nodes containing replicas, each node which has a replica executes same transaction in parallel.
  • VoltDB is based on this paper which tries to analyze operations involved the OLTP systems and its key observations are
    • buffer management, locking, logging and latching are the costliest operations in OLTP….unless one strips out all of these components, the performance of a main memory-optimized database is unlikely to be much better than a conventional database where most of the data fit into RAM.
    • In a fully stripped down system — e.g., that is single threaded, implements recovery via copying state from other nodes in the network, fits in memory, and uses reduced functionality transactions — the performance is orders of magnitude better than an unmodified system.
  • VoltDB uses  Zookeeper for leader election. More here.
Some more notes on VoltDB: