Java 8 Streams(Parallel), Lambda expressions and Amdahl’s law observations

 

Java has fantastic language features ex: Generics, Annotations and now Lambda expressions. These are complimented very well by several core capabilities packaged within a JDK ex: collections (also newer concurrency package), multi-threading.

“Streams” – a feature that Java made available in 8 in combination with lambda expressions can be extremely expressive, almost akin to functional programming (leverages Functional interfaces). It also makes writing code to leverage muti-processors, for parallel execution, much more simpler.

Purely for illustrative purposes I thought of putting together a scenario to understand these features better.

Below is the description of the scenario:

  • 100,000 products have monthly sales over one year period.
  • Requirement is to sort products by their sales of a specific month.
  • If we were to do this on a single processor – the filter, sort and collect operations will happen sequentially with sort being the slowest step.
  • If we were to do this in a multi-processor system – we have a choice to either do them in parallel or sequentially.

(Caveat all operations do not uniformly benefit with parallel execution ex: Sorting needs specialized techniques to improve performance while filtering should theoretically execute much faster straight away. Concurrency and List implementations esp. ArrayList is again a non-trivial consideration.In addition streams add a different dimension. NOTESorting in parallel, using streams or otherwise, is unstable Stable refers to the expectation that two equal valued entries in a list appear in their original order post sorting as well.)

To simplify things we could break this into 4 smaller test cases

  1. In stream execute filter and collect and then sort the list
  2. In parallel stream execute filter and collect and then sort the list
  3. In stream execute filter, sort and collect the list
  4. In parallel stream execute filter, sort and collect operations.(NOTE: don’t try this as noted above)

Timing 100 runs of each of the above 4 test cases and making observations against Amdahl’s law, which helps predict the theoretical speedup when using multiple processors, felt like a fun illustrative way to understand – streams, parallel streams, sorting, lambda expressions and multi-processor runs.

Here’s the code for it and below are results of one complete execution.

  • For case 2 vs case 1: Average Speedup = 1.2291615907722224 Amdahls percentage = 0.18190218936593688 and Speedup = 1.1000508269148064
  • For case 4 vs case 3:Average Speedup = 2.1987344172707037 Amdahls percentage = 0.8987062837645722 and Speedup = 1.8160459562383011