Java 8 Streams(Parallel), Lambda expressions and Amdahl’s law observations


Java has fantastic language features ex: Generics, Annotations and now Lambda expressions. These are complimented very well by several core capabilities packaged within a JDK ex: collections (also newer concurrency package), multi-threading.

“Streams” – a feature that Java made available in 8 in combination with lambda expressions can be extremely expressive, almost akin to functional programming (leverages Functional interfaces). It also makes writing code to leverage muti-processors, for parallel execution, much more simpler.

Purely for illustrative purposes I thought of putting together a scenario to understand these features better.

Below is the description of the scenario:

  • 100,000 products have monthly sales over one year period.
  • Requirement is to sort products by their sales of a specific month.
  • If we were to do this on a single processor – the filter, sort and collect operations will happen sequentially with sort being the slowest step.
  • If we were to do this in a multi-processor system – we have a choice to either do them in parallel or sequentially.

(Caveat all operations do not uniformly benefit with parallel execution ex: Sorting needs specialized techniques to improve performance while filtering should theoretically execute much faster straight away. Concurrency and List implementations esp. ArrayList is again a non-trivial consideration.In addition streams add a different dimension. NOTESorting in parallel, using streams or otherwise, is unstable Stable refers to the expectation that two equal valued entries in a list appear in their original order post sorting as well.)

To simplify things we could break this into 4 smaller test cases

  1. In stream execute filter and collect and then sort the list
  2. In parallel stream execute filter and collect and then sort the list
  3. In stream execute filter, sort and collect the list
  4. In parallel stream execute filter, sort and collect operations.(NOTE: don’t try this as noted above)

Timing 100 runs of each of the above 4 test cases and making observations against Amdahl’s law, which helps predict the theoretical speedup when using multiple processors, felt like a fun illustrative way to understand – streams, parallel streams, sorting, lambda expressions and multi-processor runs.

Here’s the code for it and below are results of one complete execution.

  • For case 2 vs case 1: Average Speedup = 1.2291615907722224 Amdahls percentage = 0.18190218936593688 and Speedup = 1.1000508269148064
  • For case 4 vs case 3:Average Speedup = 2.1987344172707037 Amdahls percentage = 0.8987062837645722 and Speedup = 1.8160459562383011

Distributed Database Design – Notes

  • Most distributed systems(ex: HBase, VoltDB) require coordination among-st its nodes for several/all of its tasks. Coordination can be either taken up by the system itself or the system can give this responsibility to a separate coordination system ex: Zookeeper.
  • The coordination system itself needs to be fault tolerant, preferably very high throughput, scalable (preferably horizontally scalable) system i.e. it is a distributed system on its own right.
  • Zookeeper Paper (Wait free coordination for large distributed systems) has these key ideas
    • Provide wait-free base features and let clients of zookeeper implement higher order primitives on-top of these simpler primitives ex: read/write locks, simple locks without herd effect, group membership, configuration management.
    • The base features are
      1. sequential/linearizable writes at leader node
      2. All reads are locally executed by the client
      3. FIFO execution of client requests
      4. Optimistic locking i.e. Version based operations
    • The underlying mechanisms for linearizables writes is
      • A protocol for Atomic broadcast — zookeeper defines  this as guaranteed delivery of a message to all nodes (ex: a write message from leader)
      • ZAB(Zookeeper Atomic Broadcast) seems to be essentially a variant of two phase commit protocol. An article explaining it further is here.
  • VoltDB is a in-memory, fault tolerant, ACID compliant, horizontal scale system. Some of its architecture choices are here. The key ideas are
    • Single thread operations
    • No client controlled transactions. instead each stored procedure runs in a transaction.
    • Partitioning and shared-nothing cluster (with replication for fault tolerancve)
    • leveraging/support deterministic operations. Instead of writing at master and propagating changes to all nodes containing replicas, each node which has a replica executes same transaction in parallel.
  • VoltDB is based on this paper which tries to analyze operations involved the OLTP systems and its key observations are
    • buffer management, locking, logging and latching are the costliest operations in OLTP….unless one strips out all of these components, the performance of a main memory-optimized database is unlikely to be much better than a conventional database where most of the data fit into RAM.
    • In a fully stripped down system — e.g., that is single threaded, implements recovery via copying state from other nodes in the network, fits in memory, and uses reduced functionality transactions — the performance is orders of magnitude better than an unmodified system.
  • VoltDB uses  Zookeeper for leader election. More here.
Some more notes on VoltDB:

Graph processing in query languages – Part III using Hive and HBase


  • HBase is great at large number of (on-demand) columns, random writes and reads and maintaining history
  • Against each node(row key), treat every neighbor as a column with a fixed (can be extended for weighted graphs) value of 1. Include self loop.
  • Pick node and least neighbor and insert into another hbase table with just node and component-id(least neighbor)
  • Use joins (based on GIMV) to refine component-ids. Iterate till convergence is achieved. (In several ways join is also similar to a BSP approach in Pregel/Giraph)
  • Use hive as the language for above steps

Solution is here


  • Illustrative log on a small graph is here. Its summary is here
  • Benchmark has not been done


  • Incremental processing of graphs is feasible
  • Should theoretically scale for web scale graphs – sparse or dense.
  • Though based on GIMV – Matrix Vector multiplication isnt needed. GIMV is a very useful base to model solutions like above for other graph problems that can be solved by it.



Gimv hbase