
Application of Stream Technology in Real-World Scenarios

This article provides theoretical explanations and demonstrates how to apply the Stream API to solve common programming problems through practical code examples.


By Jiangsheng

In daily development, scenarios like object transformations, linked list deduplication, and batch service calls can lead to lengthy, error-prone, and inefficient code when using for loops or if-else statements. By reviewing project code and with guidance from senior developers, I've learned several new ways to use Streams. Here, I will summarize my findings on Streams.

Case Introduction

In Java, when working with elements in arrays or Collection classes, we typically process them one by one using loops or handle them using Stream.

Suppose we have a requirement: return a list of words longer than five characters from a given sentence, output them in descending order by length, and return no more than three words.

At school, or before learning about Stream, we might write a function like this:

public List<String> sortGetTop3LongWords(@NotNull String sentence) {
    // Split the sentence to obtain specific word information
    String[] words = sentence.split(" ");
    List<String> wordList = new ArrayList<>();
    // Loop to check the length of the words, and filter out those that meet the length requirement
    for (String word : words) {
        if (word.length() > 5) {
            wordList.add(word);
        }
    }
    // Sort the list that meets the conditions by length
    wordList.sort((o1, o2) -> o2.length() - o1.length());
    // Check the length of the results in the list; if it is greater than three, return a sublist containing the first three data
    if (wordList.size() > 3) {
        wordList = wordList.subList(0, 3);
    }
    return wordList;
}

However, using Streams:

public List<String> sortGetTop3LongWordsByStream(@NotNull String sentence) {
    return Arrays.stream(sentence.split(" "))
            .filter(word -> word.length() > 5)
            .sorted((o1, o2) -> o2.length() - o1.length())
            .limit(3)
            .collect(Collectors.toList());
}

It is clear that the code is nearly halved in length. Therefore, understanding and effectively using Stream can make our code much clearer and more readable.

1. Stream Basics

In general, Stream operations can be categorized into three types:

• Create a Stream
• Intermediate Stream Processing
• Terminate a Stream


Each Stream pipeline operation consists of several methods. Here is a list of the various API methods:

1.1 Starting Pipeline

The starting pipeline is primarily responsible for creating a new Stream, or generating one from an existing array, List, Set, Map, or other collection-type object.

• stream(): Create a new sequential Stream object.
• parallelStream(): Create a Stream object that can be executed in parallel.
• Stream.of(): Create a new sequential Stream object from a given series of elements.
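For example, the three creation methods can be sketched as follows (the class name and sample values are illustrative):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;

public class CreateStreamDemo {
    public static void main(String[] args) {
        List<String> list = Arrays.asList("a", "b", "c");

        // stream(): sequential stream from a collection
        System.out.println(list.stream().count());          // 3

        // parallelStream(): parallel stream from a collection
        System.out.println(list.parallelStream().count());  // 3

        // Stream.of(): stream from a given series of elements
        System.out.println(Stream.of("x", "y").count());    // 2

        // Arrays.stream(): stream from an array
        System.out.println(Arrays.stream(new int[]{1, 2, 3}).sum()); // 6
    }
}
```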

1.2 Intermediate Pipeline

The intermediate pipeline is responsible for processing the Stream and returning a new Stream object. Intermediate pipeline operations can be superimposed.

• filter(): Filter elements based on a condition, and return a new stream.
• map(): Convert existing elements to another object type, following a one-to-one logic, and return a new stream.
• flatMap(): Convert existing elements to another object type, following a one-to-many logic; that is, an original element may be converted to one or multiple new elements of a different type. Returns a new stream.
• limit(): Retain only the specified number of elements from the beginning of the stream, and return a new stream.
• skip(): Skip the specified number of elements from the beginning of the stream, and return a new stream.
• concat(): Merge the data from two streams into one new stream, and return the new stream.
• distinct(): Remove duplicate elements from the Stream, and return a new stream.
• sorted(): Sort all elements in the stream according to the specified rule, and return a new stream.
• peek(): Traverse each element in the stream, and return the processed stream.
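As a quick sketch of two of these methods, skip() and concat() (the class name and sample values are illustrative):

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class SkipConcatDemo {
    public static void main(String[] args) {
        // skip(): drop the first two elements, keep the rest
        List<Integer> skipped = Stream.of(1, 2, 3, 4, 5)
                .skip(2)
                .collect(Collectors.toList());
        System.out.println(skipped); // [3, 4, 5]

        // concat(): merge two streams into one new stream
        List<String> merged = Stream.concat(Stream.of("a", "b"), Stream.of("c"))
                .collect(Collectors.toList());
        System.out.println(merged); // [a, b, c]
    }
}
```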

1.3 Terminal Pipeline

As the name suggests, after performing terminal pipeline operations, the Stream ends. These operations may execute certain logical processing or return specific results obtained from the execution.

• count(): Return the final number of elements after stream processing.
• max(): Return the maximum element after stream processing.
• min(): Return the minimum element after stream processing.
• findFirst(): Terminate stream processing when the first element that meets the condition is found.
• findAny(): Exit stream processing when any element that meets the condition is found. This behaves like findFirst for sequential streams but is more efficient for parallel streams, as it stops as soon as any matching element is found in any shard.
• anyMatch(): Return a boolean, similar to contains(), indicating whether any element meets the condition.
• allMatch(): Return a boolean indicating whether all elements meet the condition.
• noneMatch(): Return a boolean indicating whether no elements meet the condition.
• collect(): Convert the stream to a type specified by a Collector.
• toArray(): Convert the stream to an array.
• iterator(): Convert the stream to an Iterator object.
• forEach(): Traverse each element in the stream and execute the specified processing logic, without returning a value.
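A small sketch of the Optional-returning and boolean-returning terminal operations (the class name and sample values are illustrative):

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class TerminalOpsDemo {
    public static void main(String[] args) {
        List<String> ids = Arrays.asList("205", "10", "308", "49");

        // max()/min() take a Comparator and return an Optional
        ids.stream()
                .max(Comparator.comparingInt(Integer::parseInt))
                .ifPresent(s -> System.out.println("max: " + s)); // max: 308
        ids.stream()
                .min(Comparator.comparingInt(Integer::parseInt))
                .ifPresent(s -> System.out.println("min: " + s)); // min: 10

        // allMatch()/noneMatch() return a boolean for the whole stream
        System.out.println(ids.stream().allMatch(s -> s.length() >= 2)); // true
        System.out.println(ids.stream().noneMatch(String::isEmpty));     // true
    }
}
```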

2. Stream Usage

2.1 map and flatMap

In projects, map and flatMap are frequently seen and used.

What are the similarities and differences between the two?

Both map and flatMap are used to convert existing elements into other elements. The difference lies in:

• map must be one-to-one, meaning each element can only be converted into one new element.

• flatMap can be one-to-many, meaning each element can be converted into one or multiple new elements.

The following two figures illustrate the difference between the two:

map:

[Figure: map converts each element to exactly one new element]

flatMap:

[Figure: flatMap converts each element to one or more new elements and flattens the results]

map example

There is a list of string IDs, which now needs to be converted into a list of other objects.

/**
 * Purpose of map: one-to-one conversion
 */
List<String> ids = Arrays.asList("205", "105", "308", "469", "627", "193", "111");
// Use stream operations
List<NormalOfferModel> results = ids.stream()
        .map(id -> {
            NormalOfferModel model = new NormalOfferModel();
            model.setCate1LevelId(id);
            return model;
        })
        .collect(Collectors.toList());
System.out.println(results);

After execution, each element is converted into a corresponding new element, and the total number of elements before and after the conversion is the same:

[
NormalOfferModel{cate1LevelId='205', imgUrl='null', linkUrl='null', ohterurl='null'},
NormalOfferModel{cate1LevelId='105', imgUrl='null', linkUrl='null', ohterurl='null'},
NormalOfferModel{cate1LevelId='308', imgUrl='null', linkUrl='null', ohterurl='null'}, 
NormalOfferModel{cate1LevelId='469', imgUrl='null', linkUrl='null', ohterurl='null'}, 
NormalOfferModel{cate1LevelId='627', imgUrl='null', linkUrl='null', ohterurl='null'},
NormalOfferModel{cate1LevelId='193', imgUrl='null', linkUrl='null', ohterurl='null'}, 
NormalOfferModel{cate1LevelId='111', imgUrl='null', linkUrl='null', ohterurl='null'}
]

flatMap example:

Given a list of sentences, we need to extract each word in the sentences to get a list of all words:

List<String> sentences = Arrays.asList("hello world","Hello Price Info The First Version");
// Use stream operations
List<String> results2 = sentences.stream()
        .flatMap(sentence -> Arrays.stream(sentence.split(" ")))
        .collect(Collectors.toList());
System.out.println(results2);

Output:

[hello, world, Hello, Price, Info, The, First, Version]

It is worth noting that during the flatMap operation, each element is first mapped to a new Stream, and then the resulting Streams are flattened and merged into one complete new Stream:

[Figure: flatMap first maps each element to a Stream, then flattens the Streams into one]

2.2 The peek and forEach Methods

Both peek and forEach can be used to traverse elements and process them one by one.

However, as previously mentioned, peek is an intermediate operation, while forEach is a terminal operation. This means that peek can only be used as a step in the middle of the pipeline and cannot produce a result on its own; it must be followed by a terminal operation before it executes. forEach, on the other hand, being a terminal operation without a return value, executes directly.

public void testPeekAndForeach() {
    List<String> sentences = Arrays.asList("hello world","Jia Gou Wu Dao");
    // Demo point 1: Only peek operation, which will not be executed in the end
    System.out.println("----before peek----");
    sentences.stream().peek(sentence -> System.out.println(sentence));
    System.out.println("----after peek----");
    // Demo point 2: Only foreach operation, which will be executed in the end
    System.out.println("----before foreach----");
    sentences.stream().forEach(sentence -> System.out.println(sentence));
    System.out.println("----after foreach----");
    // Demo point 3: Add a terminal operation after peek, and peek will be executed
    System.out.println("----before peek and count----");
    sentences.stream().peek(sentence -> System.out.println(sentence)).count();
    System.out.println("----after peek and count----");
}

The output shows that when peek is called alone, it is not executed, but when a terminal operation is added after peek, it is executed. Meanwhile, forEach executes directly:

----before peek----
----after peek----
----before foreach----
hello world
Jia Gou Wu Dao
----after foreach----
----before peek and count----
hello world
Jia Gou Wu Dao
----after peek and count----

2.3 filter, sorted, distinct, and limit

These are the commonly used intermediate methods for Stream operations, and their meanings are explained in the table above. When using them, you can select one or more based on your needs, or combine multiple instances of the same method:

public void testGetTargetUsers() {
    List<String> ids = Arrays.asList("205","10","308","49","627","193","111", "193");
    // Use stream operations
    List<OfferModel> results = ids.stream()
            .filter(s -> s.length() > 2)
            .distinct()
            .map(Integer::valueOf)
            .sorted(Comparator.comparingInt(o -> o))
            .limit(3)
            .map(id -> new OfferModel(id))
            .collect(Collectors.toList());
    System.out.println(results);
}

The processing logic of the above code snippet:

1. Use filter to remove data that does not meet the condition.
2. Use distinct to eliminate duplicate elements.
3. Use map to convert strings to integers.
4. Use sorted to sort the numbers in ascending order.
5. Use limit to extract the first three elements.
6. Use map again to convert IDs to OfferModel objects.
7. Use collect to terminate the operation and collect the final processed data into a list.

Output:

[OfferModel{id=111},  OfferModel{id=193},  OfferModel{id=205}]

2.4 Terminal Stream with Simple Results

As described earlier, terminal methods such as count, max, min, findAny, findFirst, anyMatch, allMatch, and noneMatch are considered terminal operations with simple results. The term "simple" refers to the fact that these methods return results in the form of numbers, Boolean values, or Optional object values.

public void testSimpleStopOptions() {
    List<String> ids = Arrays.asList("205", "10", "308", "49", "627", "193", "111", "193");
    // Count the remaining elements after stream operations
    System.out.println(ids.stream().filter(s -> s.length() > 2).count());
    // Check if there is an element with the value of 205
    System.out.println(ids.stream().filter(s -> s.length() > 2).anyMatch("205"::equals));
    // The findFirst operation
    ids.stream().filter(s -> s.length() > 2)
            .findFirst()
            .ifPresent(s -> System.out.println("findFirst:" + s));
}

Output:

6
true
findFirst:205

Once a Stream is terminated, you can no longer read the stream to perform other operations. Otherwise, an error will be reported. See the following example:

public void testHandleStreamAfterClosed() {
    List<String> ids = Arrays.asList("205", "10", "308", "49", "627", "193", "111", "193");
    Stream<String> stream = ids.stream().filter(s -> s.length() > 2);
    // Count the remaining elements after stream operations
    System.out.println(stream.count());
    System.out.println("----- An error will be reported below -----");
    // Check if there is an element with the value of 205
    try {
        System.out.println(stream.anyMatch("205"::equals));
    } catch (Exception e) {
        e.printStackTrace();
        System.out.println(e.toString());
    }
    System.out.println("----- An error will be reported above -----");
}

Output:

----- An error will be reported below -----
java.lang.IllegalStateException: stream has already been operated upon or closed
----- An error will be reported above -----
java.lang.IllegalStateException: stream has already been operated upon or closed
  at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:229)
  at java.util.stream.ReferencePipeline.anyMatch(ReferencePipeline.java:516)
  at Solution_0908.main(Solution_0908.java:55)

Since the stream was already terminated by the count() method, an attempt to call anyMatch on the same stream results in the error "stream has already been operated upon or closed". This is something to be aware of when working with Streams.

2.5 Terminal Methods of Result Collection

Since Stream is primarily used for processing collection data, in addition to the terminal methods with simple results mentioned above, more common scenarios involve obtaining a collection-type result object, such as List, Set, or HashMap.

This is where the collect method comes into play. It can support generating the following types of result data:

  1. A collection type, such as List, Set, or HashMap.
  2. A StringBuilder object, which supports concatenating multiple strings and outputting the concatenated result.
  3. An object that can record counts or calculate totals (for batch data operation statistics).

2.5.1 Generate Collection Objects

Generating collection objects is perhaps the most common use case for collect:

List<NormalOfferModel> normalOfferModelList = Arrays.asList(new NormalOfferModel("11"),
                new NormalOfferModel("22"),
                new NormalOfferModel("33"));

// Collect to a List
List<NormalOfferModel> collectList = normalOfferModelList
        .stream()
        .filter(offer -> offer.getCate1LevelId().equals("11"))
        .collect(Collectors.toList());
System.out.println("collectList:" + collectList);

// Collect to a Set
Set<NormalOfferModel> collectSet = normalOfferModelList
        .stream()
        .filter(offer -> offer.getCate1LevelId().equals("22"))
        .collect(Collectors.toSet());
System.out.println("collectSet:" + collectSet);

// Collect to a HashMap, with key being ID and value being the Dept object
Map<String, NormalOfferModel> collectMap = normalOfferModelList
        .stream()
        .filter(offer -> offer.getCate1LevelId().equals("33"))
        .collect(Collectors.toMap(NormalOfferModel::getCate1LevelId, Function.identity(), (k1, k2) -> k2));
System.out.println("collectMap:" + collectMap);

Output:

collectList:[NormalOfferModel{cate1LevelId='11', imgUrl='null', linkUrl='null', ohterurl='null'}]
collectSet:[NormalOfferModel{cate1LevelId='22', imgUrl='null', linkUrl='null', ohterurl='null'}]
collectMap:{33=NormalOfferModel{cate1LevelId='33', imgUrl='null', linkUrl='null', ohterurl='null'}}

2.5.2 Generate Concatenated Strings

It is a familiar scenario for many to concatenate values from a List or an array into a single string, with each value separated by a comma. If you use a for loop and a StringBuilder to concatenate the values, you also have to consider how to handle the final comma, which can be quite cumbersome:

public void testForJoinStrings() {
    List<String> ids = Arrays.asList("205", "10", "308", "49", "627", "193", "111", "193");
    StringBuilder builder = new StringBuilder();
    for (String id : ids) {
        builder.append(id).append(',');
    }
    // Remove the extra comma at the end
    builder.deleteCharAt(builder.length() - 1);
    System.out.println("Concatenated:" + builder.toString());
}

However, now with Streams, using collect makes this task much simpler:

public void testCollectJoinStrings() {
    List<String> ids = Arrays.asList("205", "10", "308", "49", "627", "193", "111", "193");
    String joinResult = ids.stream().collect(Collectors.joining(","));
    System.out.println("Concatenated:" + joinResult);
}

Both approaches yield the same result, but the Stream approach is more elegant:

Concatenated: 205,10,308,49,627,193,111,193

2.5.3 Batch Mathematical Operations

There is another scenario that might be less common in practice - using collect to generate summary statistics for numerical data. Here is how it can be done:

public void testNumberCalculate() {
    List<Integer> ids = Arrays.asList(10, 20, 30, 40, 50);
    // Calculate the average
    Double average = ids.stream().collect(Collectors.averagingInt(value -> value));
    System.out.println("Average:" + average);
    // Summary Statistics
    IntSummaryStatistics summary = ids.stream().collect(Collectors.summarizingInt(value -> value));
    System.out.println("Summary Statistics:" + summary);
}

In the example above, the collect method is used to perform mathematical operations on the elements of the list. The results are as follows:

Average: 30.0
Summary Statistics: IntSummaryStatistics{count=5, sum=150, min=10, average=30.000000, max=50}

3. Parallel Stream

3.1 Explanation of Parallel Stream Mechanism

A parallel stream can effectively leverage multi-core hardware to speed up the execution of logic. It works by dividing the entire stream into multiple shards, processing each shard in parallel, and then aggregating the per-shard results into a single overall result.
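As a minimal illustration of this shard-and-aggregate behavior (the class name is illustrative): a reduction such as sum() produces the same result sequentially and in parallel, because the combining operation is associative, so the order in which shards are merged does not matter:

```java
import java.util.stream.LongStream;

public class ParallelSumDemo {
    public static void main(String[] args) {
        // Sequential reduction
        long seq = LongStream.rangeClosed(1, 1_000_000).sum();
        // Parallel reduction: the range is split into shards, each shard is
        // summed on a ForkJoinPool worker, and the partial sums are combined
        long par = LongStream.rangeClosed(1, 1_000_000).parallel().sum();
        System.out.println(seq);        // 500000500000
        System.out.println(seq == par); // true
    }
}
```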


Looking at the source code of parallelStream, we can see that at the underlying layer, parallel streams split tasks and delegate them to the global ForkJoinPool that ships with JDK 8 (the common pool). Each worker thread in the pool maintains its own task queue; subtasks split from a large task are pushed onto these queues, and each thread executes tasks from its own queue. When a thread finishes its own tasks, it steals tasks from the queues of busier threads, so faster threads take on more work and a thread becomes idle only when no tasks remain anywhere. This is known as work stealing.

The following figure better illustrates this divide-and-conquer strategy:

[Figure: divide-and-conquer task splitting and merging in a parallel stream]

3.2 Constraints and Limitations

1.  In parallelStream(), the forEach() operation must be thread-safe.

Many developers have become accustomed to stream processing and often replace plain for loops with forEach() directly. This is not always appropriate: in a parallel stream, forEach() is executed concurrently by multiple threads, so any shared state it modifies must be thread-safe. Moreover, if the loop body is simple, there is no need for stream processing at all, as the Stream machinery encapsulates complex logic that may not offer performance advantages.

2.  In parallelStream(), keep forEach() off the default thread pool when isolation matters.

By default, parallelStream() runs on the shared common ForkJoinPool. To avoid contending with the rest of the process, submit the work to a dedicated ForkJoinPool instead:

ForkJoinPool customerPool = new ForkJoinPool(n);
customerPool.submit(() ->
        customerList.parallelStream().forEach(item -> {
            // specific operations
        })
).get();

3.  Try to avoid time-consuming operations when using parallelStream().

If you encounter operations that are time-consuming or involve a large number of I/O or thread sleep, you need to avoid using parallel streams.

4. Pitfalls During the Use of Streams

4.1 ForkJoinPool is shared by parallelStream and the entire Java process

Using parallelStream().forEach will by default run on the global common ForkJoinPool, which is shared across the whole Java process: other parallel streams, CompletableFuture's default async tasks, and parallel array operations all use the same pool. If long-running tasks saturate its workers or queue, any part of the program that relies on the common pool can be blocked.
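To see the pool that parallelStream() shares by default, you can inspect the common pool directly; its parallelism defaults to one less than the number of available processors and can only be changed at JVM startup (the class name is illustrative):

```java
import java.util.concurrent.ForkJoinPool;

public class CommonPoolDemo {
    public static void main(String[] args) {
        // The shared pool used by parallelStream() by default
        System.out.println("common pool parallelism: "
                + ForkJoinPool.commonPool().getParallelism());
        System.out.println("available processors: "
                + Runtime.getRuntime().availableProcessors());
        // Tunable only at JVM startup, e.g.:
        // -Djava.util.concurrent.ForkJoinPool.common.parallelism=8
    }
}
```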

4.2 ThreadLocal data becomes empty after using parallelStream

When you create a parallel stream with parallelStream, the ForkJoin framework uses multiple pooled threads to execute the stream in parallel. Since ThreadLocal values are not inherited by other threads (and even InheritableThreadLocal only copies values at thread creation time, which does not help with already-created pool threads), the worker threads cannot see the ThreadLocal data set on the calling thread.
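A minimal sketch of the pitfall (the class name and values are illustrative): a value set via ThreadLocal on the calling thread is invisible to the ForkJoinPool workers that execute parts of the parallel stream:

```java
import java.util.Arrays;
import java.util.List;

public class ThreadLocalPitfallDemo {
    private static final ThreadLocal<String> CONTEXT = new ThreadLocal<>();

    public static void main(String[] args) {
        CONTEXT.set("user-123"); // set on the main thread only
        List<Integer> ids = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8);

        ids.parallelStream().forEach(id ->
                // Shards executed on ForkJoinPool workers print null;
                // only shards that happen to run on the main thread see "user-123"
                System.out.println(Thread.currentThread().getName()
                        + " -> " + CONTEXT.get()));
    }
}
```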

4.3 Overlooking duplicate keys when converting to a map

When converting a Stream to a Map with Collectors.toMap, you need to be aware of potential key duplication: if two elements map to the same key and no merge function is supplied, toMap throws an IllegalStateException.
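This pitfall can be sketched as follows (the class name and sample values are illustrative); the third argument to Collectors.toMap is a merge function that resolves key collisions:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class ToMapDuplicateKeyDemo {
    public static void main(String[] args) {
        List<String> ids = Arrays.asList("205", "193", "111", "193"); // "193" appears twice

        // Without a merge function, the duplicate key throws IllegalStateException
        try {
            ids.stream().collect(Collectors.toMap(Function.identity(), String::length));
        } catch (IllegalStateException e) {
            System.out.println("duplicate key: " + e.getMessage());
        }

        // Supplying a merge function ((v1, v2) -> v2 keeps the later value) avoids the error
        Map<String, Integer> safe = ids.stream()
                .collect(Collectors.toMap(Function.identity(), String::length, (v1, v2) -> v2));
        System.out.println(safe.size()); // 3
    }
}
```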

5. Summary of Stream Processing

5.1 Advantages

  1. The code is more concise. The declarative coding style makes it easier to express the logical intent of the code.
  2. Logics are decoupled from each other. The intermediate processing logic of a stream pays no attention to upstream or downstream content; it simply needs to implement its own logic according to the specified requirements.
  3. Parallel streams can be more efficient than iterators looping one by one.
  4. With functional interface and delayed execution features, intermediate pipeline operations do not execute until a terminal operation is encountered, avoiding unnecessary intermediate operations.

5.2 Disadvantages

Debugging can be challenging.


Disclaimer: The views expressed herein are for reference only and don't necessarily represent the official views of Alibaba Cloud.
