Should you really use Streams?

Most of us know how to use Java Streams. If you don’t, lucky you, because I’ll share some basic use cases where I found them useful.

Iterating over an array

String[] character = {"captain marvel", "batman", "superman", "wolverine"};

int length = character.length;

//old imperative style
for (int i = 0; i < length; i++){
    System.out.println(character[i]);
}

//new functional style
IntStream.range(0, length)
        .forEach(i -> System.out.println(character[i]));

Iterating over a List 

List<String> countries = new LinkedList<>();

countries.add("India");
countries.add("Germany");
countries.add("Brazil");
countries.add("Spain");

//old imperative style
for(String country : countries) {
    System.out.println(country);
}

//new functional style
countries.stream().forEach(System.out::println);

Working with Streams 

Stream operations fall into two categories:

  1. Intermediate operations and
  2. Terminal operations.

An intermediate operation transforms one stream into another, for example by filtering or mapping. A terminal operation is the final operation: it produces a result or a side effect, and the stream is consumed once it completes. Keep in mind that streams are lazily evaluated, i.e. they are only processed when we apply a terminal operation to them.
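To make the lazy evaluation concrete, here is a minimal sketch. The class name LazyStreamDemo and the helper countFilterCalls are made up for this illustration; the idea is simply to count how often the filter predicate has run before and after the terminal operation.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LazyStreamDemo {

    // Returns {predicate calls before the terminal op, calls after it, number of matches}.
    static int[] countFilterCalls(List<String> names) {
        AtomicInteger calls = new AtomicInteger();

        // Only intermediate operations here, so nothing is executed yet.
        Stream<String> pipeline = names.stream()
                .filter(name -> {
                    calls.incrementAndGet();
                    return name.length() > 6;
                })
                .map(String::toUpperCase);
        int before = calls.get();

        // The terminal operation pulls every element through the pipeline.
        List<String> matches = pipeline.collect(Collectors.toList());
        int after = calls.get();

        return new int[] {before, after, matches.size()};
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("captain marvel", "batman", "superman", "wolverine");
        int[] result = countFilterCalls(names);
        System.out.println("calls before terminal op: " + result[0]);
        System.out.println("calls after terminal op:  " + result[1]);
        System.out.println("matches:                  " + result[2]);
    }
}
```

The predicate should not run at all until collect is called, which is exactly what "lazy" means here.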

Intermediate Operations

+--------------+--------------+--------------+
| Method | Purpose | Return Value |
+--------------+--------------+--------------+
| filter | Returns a stream containing the elements matching a predicate | Stream<T> |
| map | Applies an operation to each element in the stream, possibly transforming it | Stream<R> |
| flatMap | Flattens out a stream of streams | Stream<R> |
| sorted | Sorts the elements of a stream | Stream<T> |
| distinct | Returns a stream which contains no duplicate elements | Stream<T> |
| limit | Restricts the number of elements in the returned stream | Stream<T> |
+--------------+--------------+--------------+

Terminal Operations

+--------------+--------------+--------------+
| Method | Purpose | Return Value |
+--------------+--------------+--------------+
| forEach | Performs some operation on each element of the stream and terminates the stream | void |
| reduce | Combines stream elements together using a BinaryOperator | T (Optional<T> when no identity value is given) |
| collect | Puts the stream’s elements into a container such as a collection | R |
| min | Finds the smallest element as determined by a comparator | Optional<T> |
| max | Finds the largest element as determined by a comparator | Optional<T> |
| count | Determines the number of elements in a stream | long |
| anyMatch | Determines whether at least one element of the stream matches a predicate | boolean |
| allMatch | Determines whether every element of the stream matches a predicate | boolean |
| noneMatch | Determines whether no element of the stream matches a predicate | boolean |
| findFirst | Returns the first element of a stream | Optional<T> |
| findAny | Returns any element of a stream | Optional<T> |
| iterator | Returns an iterator for the stream | Iterator<T> |
| toArray | Creates an array based on the elements of the stream | Object[] or A[] |
+--------------+--------------+--------------+
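Here is a small sketch that chains several of the operations listed above. The class StreamOperationsDemo and its method names are only illustrative, and the sample data is invented.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

class StreamOperationsDemo {

    // filter -> map -> distinct -> sorted -> limit, finished by collect.
    static List<String> firstDistinctSorted(List<String> names, int howMany) {
        return names.stream()
                .filter(name -> !name.isEmpty())   // drop empty strings
                .map(String::toLowerCase)          // normalise case
                .distinct()                        // remove duplicates
                .sorted()                          // natural (alphabetical) order
                .limit(howMany)                    // keep at most howMany elements
                .collect(Collectors.toList());
    }

    // reduce with an identity value returns a plain value, not an Optional.
    static int totalLength(List<String> names) {
        return names.stream().map(String::length).reduce(0, Integer::sum);
    }

    // max returns an Optional because the stream may be empty.
    static Optional<String> longest(List<String> names) {
        return names.stream().max(Comparator.comparingInt(String::length));
    }

    public static void main(String[] args) {
        List<String> countries = Arrays.asList("India", "Germany", "Brazil", "Spain", "india", "");
        System.out.println(firstDistinctSorted(countries, 3));
        System.out.println(totalLength(countries));
        System.out.println(longest(countries).orElse("none"));
    }
}
```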

Be aware that once you consume a stream it is no longer available for further processing. If you need to apply another operation, you have to create a new stream from the source.
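A quick sketch of what "consumed" means in practice. The class StreamReuseDemo and the helper secondUseFails are hypothetical names; the behaviour shown is that a second terminal operation on the same Stream object throws an IllegalStateException.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;

class StreamReuseDemo {

    // Returns true if running a second terminal operation on the same stream fails.
    static boolean secondUseFails(List<String> items) {
        Stream<String> stream = items.stream();
        stream.forEach(item -> { });          // first terminal operation consumes the stream
        try {
            stream.forEach(item -> { });      // second use of the same stream object
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<String> countries = Arrays.asList("India", "Germany", "Brazil", "Spain");
        System.out.println("second use fails: " + secondUseFails(countries));

        // Asking the source for a fresh stream works as often as you like.
        System.out.println(countries.stream().count());
        System.out.println(countries.stream().count());
    }
}
```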

To keep it simple, streams work on a pipelining model: each element is processed one at a time and passed on to the next stage of the pipeline for further processing.
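The one-by-one flow can be made visible by recording every step. This is a sketch for a sequential stream with stateless stages only; PipelineOrderDemo and trace are names I made up for the illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PipelineOrderDemo {

    // Records the order in which each stage sees each element.
    static List<String> trace(List<String> words) {
        List<String> events = new ArrayList<>();
        words.stream()
                .filter(word -> {
                    events.add("filter " + word);
                    return word.length() > 5;
                })
                .map(word -> {
                    events.add("map " + word);
                    return word.toUpperCase();
                })
                .forEach(word -> events.add("forEach " + word));
        return events;
    }

    public static void main(String[] args) {
        // An element travels through all stages before the next one is touched.
        trace(Arrays.asList("India", "Germany", "Brazil", "Spain"))
                .forEach(System.out::println);
    }
}
```

In the trace, "Germany" is filtered, mapped and consumed before "Brazil" is even looked at, instead of the whole list being filtered first and mapped afterwards.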

Using a stream instead of looping over a collection by hand means:

  • No need to declare intermediate variables or storage explicitly.
  • Potential for lazy evaluation.
  • More flexibility in design using a more natural style (pipeline).
  • Potential for automatic parallelization.

Java 8 is cool!!

Isn’t it cool how much boilerplate code goes away?
This has a real impact on an application, most visibly in its lines of code: we can now express more business logic in less code.
I once had a discussion (more of a Java vs Python debate) with a freelance developer from the Netherlands who had switched from Java to Python. He made a point that clearly highlighted a weakness of Java: “The less code you write, the fewer chances you have to screw up”.
The point is simple: the business will surprise you in ways that you cannot even think of. So, to write maintainable and readable code, it is better to have small, compact functions that clearly express what they do, which is sometimes hard to achieve in Java. Let’s look at an example:

/**
 * A simple imperative solution to get a Map of persons
 * with key as his/her mobile number
 *
 */
        private HashMap<String, Person> getPersonsWithKeyAsMobileNumber(HashMap<String, Person> personsWithKeyAsEmail) {
            // start with an empty map so we never return null
            HashMap<String, Person> personsWithKeyAsMobile = new HashMap<>();
            // a typed iterator lets us call getValue() without casting
            Iterator<Map.Entry<String, Person>> iterator = personsWithKeyAsEmail.entrySet().iterator();
            while (iterator.hasNext()) {
                Map.Entry<String, Person> pair = iterator.next();
                // skip persons without a mobile number
                if (null != pair.getValue().getMobileNumber()) {
                    personsWithKeyAsMobile.put(pair.getValue().getMobileNumber(), pair.getValue());
                }
            }
            return personsWithKeyAsMobile;
        }

/**
 * A simple functional solution to get a Map of persons
 * with key as his/her mobile number
 *
 */
        private HashMap<String, Person> getPersonsWithKeyAsMobileNumber(HashMap<String, Person> personsWithKeyAsEmail) {
            HashMap<String, Person> personsWithKeyAsMobile = new HashMap<>();
            personsWithKeyAsEmail.entrySet().stream()
                    .filter(entry -> null != entry.getValue().getMobileNumber())
                    .forEach(entry -> personsWithKeyAsMobile.put(entry.getValue().getMobileNumber(), entry.getValue()));
            return personsWithKeyAsMobile;
        }

The second function is more intuitive and easier to read. You may think this makes no difference, but believe me, once the business keeps growing and the requirements keep changing, the first function will become unreadable unless it is properly refactored, which generally does not happen.
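The same re-keying can also be written without mutating a map from inside forEach, by letting a collector build the result. This is only a sketch: the Person class below is a hypothetical stand-in (the real one is not shown in this post), and PersonIndexDemo and byMobileNumber are names I made up.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.stream.Collectors;

class PersonIndexDemo {

    // Hypothetical minimal Person, for illustration only.
    static class Person {
        private final String email;
        private final String mobileNumber; // may be null

        Person(String email, String mobileNumber) {
            this.email = email;
            this.mobileNumber = mobileNumber;
        }

        String getEmail() { return email; }
        String getMobileNumber() { return mobileNumber; }
    }

    // Re-keys persons by mobile number, skipping those without one.
    static Map<String, Person> byMobileNumber(Map<String, Person> byEmail) {
        return byEmail.values().stream()
                .filter(person -> person.getMobileNumber() != null)
                .collect(Collectors.toMap(
                        Person::getMobileNumber,       // key
                        person -> person,              // value
                        (first, second) -> second,     // on duplicate keys keep the later one, like put()
                        HashMap::new));
    }

    public static void main(String[] args) {
        Map<String, Person> byEmail = new HashMap<>();
        byEmail.put("a@example.com", new Person("a@example.com", "111"));
        byEmail.put("b@example.com", new Person("b@example.com", null));
        byEmail.put("c@example.com", new Person("c@example.com", "333"));
        System.out.println(byMobileNumber(byEmail).keySet());
    }
}
```

The merge function is there because toMap without one throws on duplicate keys, whereas a plain put silently overwrites.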

Java 8 is cool!! Is that right…??

For me, that is what Java 8 gives us at the moment: intuitive and readable code.
Let me now highlight some of the issues with the Streams API.

The Streams API can hurt your system and, in some benchmarks, slow your code down by as much as 5x. That is alarming, considering how much optimisation effort goes into making backend systems respond faster.
According to the benchmarking done by Angelika Langer and Nicolai Parlog, a simple for loop can be about 5 times faster than access via the sequential Streams API. I have provided a code snippet which you can run, modify and verify yourself.

We should always remember that compilers have 40+ years of experience optimising loops, and the virtual machine’s JIT compiler is especially good at optimising for-loops over arrays. Streams, on the other hand, are a very recent addition to Java, and the JIT compiler does not perform any particularly sophisticated optimisations on them.

Let’s take a deep breath and be very clear about this: sequential streams DO NOT provide any kind of performance enhancement. They are best used if you like readable code or a functional style of programming, or simply if your application is not so performance-critical that you cannot trade some speed for maintainability and readability.

This is the point where a good developer should ask: why Java 8, then? Where are the performance enhancements that Streams were invented for?

My simple answer would be that sequential streams are a baby step towards the next level of Java as a programming language.
Keeping that aside, the answer lies in parallel streams!

Parallel streams give developers the power to write code that can be executed on multiple cores without the severe headache of managing all the complexity that usually requires.
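As a minimal sketch of how little it takes to go parallel: the only difference between the two variants below is the call to parallel(). ParallelSumDemo and sumOfSquares are made-up names, and the workload is deliberately trivial, so do not read any timing conclusions into it.

```java
import java.util.stream.LongStream;

class ParallelSumDemo {

    // Sums the squares of 1..n, either sequentially or in parallel.
    static long sumOfSquares(long n, boolean parallel) {
        LongStream numbers = LongStream.rangeClosed(1, n);
        if (parallel) {
            numbers = numbers.parallel();   // the framework splits the range across worker threads
        }
        // No shared mutable state: sum() combines the partial results for us.
        return numbers.map(x -> x * x).sum();
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(1_000, false));
        System.out.println(sumOfSquares(1_000, true));
    }
}
```

Both calls should print the same number. The pipeline has no side effects, which is what makes it safe to run on several threads.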

import java.util.*;
import java.util.stream.*;

public class Benchmarking {

    public static void main(String[] args) {
        List<String> numbers = new ArrayList<>();
        Random random = new Random();
        StringBuilder sb = new StringBuilder();
        for (int i=0; i<SIZE; i++) {
            for (int j=0; j<10; j++) {
                sb.append(random.nextInt(10));
            }
            numbers.add(sb.toString());
            sb.setLength(0);
        }
        System.out.println(imperitiveFilter(numbers));
        System.out.println(sequentialFunctionalFilter(numbers));
        System.out.println(parallelFunctionalFilter(numbers));
    }

    private static final String OFFER_SEQUENCE = "007";
    private static final int SIZE = 5000000;

    private static long imperitiveFilter(List<String> numbers) {
        List<String> offerNumbers = new ArrayList<>(SIZE);
        long start = System.currentTimeMillis();
        Iterator<String> iterator = numbers.iterator();
        while (iterator.hasNext()) {
            String number = iterator.next();
            if (number.contains(OFFER_SEQUENCE)) {
                offerNumbers.add(number);
            }
        }
        System.out.println("imperative Count => " + offerNumbers.size());
        return System.currentTimeMillis() - start;
    }

    private static long sequentialFunctionalFilter(List<String> numbers) {
        List<String> offerNumbers = new ArrayList<>(SIZE);
        long start = System.currentTimeMillis();
        numbers.stream()
            .filter(number -> number.contains(OFFER_SEQUENCE))
            .forEach(number -> offerNumbers.add(number));
        System.out.println("seq stream Count => " + offerNumbers.size());
        return System.currentTimeMillis() - start;
    }

    private static long parallelFunctionalFilter(List<String> numbers) {
        long start = System.currentTimeMillis();
        // ArrayList is not thread-safe, so adding to it from forEach on a parallel
        // stream can lose elements or throw; collect() merges the partial results safely.
        List<String> offerNumbers = numbers.parallelStream()
            .filter(number -> number.contains(OFFER_SEQUENCE))
            .collect(Collectors.toList());
        System.out.println("parallel stream Count => " + offerNumbers.size());
        return System.currentTimeMillis() - start;
    }

}

Below are the runtime stats for the benchmark above:

RUN 1:
+--------------+--------------+
| Type | Time |
+--------------+--------------+
| Imperative | 68ms |
| Sequential | 128ms |
| Parallel | 46ms |
+--------------+--------------+

RUN 2:
+--------------+--------------+
| Type | Time |
+--------------+--------------+
| Imperative | 67ms |
| Sequential | 119ms |
| Parallel | 50ms |
+--------------+--------------+

RUN 3:
+--------------+--------------+
| Type | Time |
+--------------+--------------+
| Imperative | 64ms |
| Sequential | 115ms |
| Parallel | 45ms |
+--------------+--------------+

Below is the configuration of my system:
Processor : 2.3 GHz Intel Core i5
Memory : 8 GB 2133 MHz LPDDR3

I’m gonna use all the cores every time!

Parallel streams should ideally be N times faster than sequential streams (N being the number of cores), but in reality there is always the overhead of splitting the problem, creating subtasks, running them on multiple threads, gathering their partial results and producing the overall result.
Even then, a speed-up is not guaranteed; consider parallel streaming over a linked list.

With arrays (or ArrayList, which uses an array under the hood), it is very easy to slice the data structure and process the slices in parallel, but that is not the case for a linked list. Parallel streams use the fork/join pool to split the problem and join the individual results. Slicing a linked list means walking it node by node, which costs so much that the splitting effectively becomes a single-threaded job in itself.
For this reason, parallel stream usage should also be limited to specific use cases.
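One way to peek at the splitting behaviour is to ask each list’s Spliterator for its first split, since that is the mechanism parallel streams use to divide the work. This is a rough sketch: SplitDemo and firstSplitSize are made-up names, and the exact chunk sizes are implementation details of the JDK you run on, so treat the printed numbers as an observation rather than a guarantee.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.Spliterator;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class SplitDemo {

    // Size of the chunk handed off by the first split, or 0 if the source refuses to split.
    static long firstSplitSize(List<Integer> list) {
        Spliterator<Integer> rest = list.spliterator();
        Spliterator<Integer> prefix = rest.trySplit();
        return prefix == null ? 0 : prefix.estimateSize();
    }

    public static void main(String[] args) {
        List<Integer> source = IntStream.range(0, 10_000).boxed().collect(Collectors.toList());

        // An array-backed list can split by index, so the halves come out balanced.
        System.out.println("ArrayList first split:  " + firstSplitSize(new ArrayList<>(source)));

        // A linked list has to walk its nodes to carve off a chunk,
        // so the first split is typically much smaller than half.
        System.out.println("LinkedList first split: " + firstSplitSize(new LinkedList<>(source)));
    }
}
```

Balanced halves are what let the fork/join workers share the load evenly. Small chunks that need node-by-node walking are what make the linked list a poor fit.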

Back to Java 7?

I would say no!
You should always use Java 8 features if they fit your use case. DO NOT use them just because they are new and in demand. Choose wisely, because in the end your response time may increase, and if you operate at high scale, this will hurt.

Below is the response time for List size of 50 elements (Strings):

RUN 1:
+--------------+--------------+
| Type | Time |
+--------------+--------------+
| Imperative | 1ms |
| Sequential | 61ms |
| Parallel | 4ms |
+--------------+--------------+

RUN 2:
+--------------+--------------+
| Type | Time |
+--------------+--------------+
| Imperative | 0.8ms |
| Sequential | 57ms |
| Parallel | 5ms |
+--------------+--------------+

RUN 3:
+--------------+--------------+
| Type | Time |
+--------------+--------------+
| Imperative | 0.7ms |
| Sequential | 56ms |
| Parallel | 4ms |
+--------------+--------------+

So yes, try not to use streams when the list is small: the overhead of streams is much, much higher, and it is better to use a plain old Java for-loop.

 
