Utilizing Java 8 Parallel Streams for Faster Results in Selenium Automation

Feb 26, 2023

The Stream API was introduced in Java 8 and is used to process collections of objects. A stream is a sequence of objects that supports various methods, which can be pipelined to produce the desired result.

Different Operations On Streams:

Intermediate Operations: map, filter, sorted.
Terminal Operations: collect, forEach, reduce.
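
As a minimal sketch of such a pipeline (the class name, method name, and sample links are made up for illustration), the following chains the intermediate operations filter, map, and sorted, and finishes with the terminal operation collect:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class StreamPipelineDemo {

    // Keeps only https links, lower-cases them, sorts them, and collects the result.
    static List<String> secureLinksSorted(List<String> links) {
        return links.stream()
                .filter(link -> link.startsWith("https://")) // intermediate operation
                .map(String::toLowerCase)                     // intermediate operation
                .sorted()                                     // intermediate operation
                .collect(Collectors.toList());                // terminal operation
    }

    public static void main(String[] args) {
        // Illustrative sample data, not taken from a real page.
        List<String> links = Arrays.asList(
                "https://www.Oracle.com/java/",
                "http://example.com/",
                "https://docs.oracle.com/");
        System.out.println(secureLinksSorted(links));
    }
}
```

Nothing runs until the terminal operation is called; the intermediate operations only describe the pipeline.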

Today, however, we are going to look at Java parallel and sequential streams and how we can benefit from them.

Parallel streams are a feature of Java 8 and higher, introduced to make use of the multiple cores of the processor. Normally, Java code has a single stream of processing and is executed sequentially.

By using parallel streams, we can divide the work into multiple streams that are executed in parallel on separate cores; the final result is the combination of the individual outcomes.

There are two ways to create parallel streams in Java:

Using the parallel() method on a stream
Using parallelStream() on a Collection
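
Both ways can be sketched as follows. The class name, method names, and word list are hypothetical and only serve the illustration:

```java
import java.util.Arrays;
import java.util.List;

public class ParallelCreationDemo {

    // Way 1: switch an existing sequential stream to parallel mode with parallel().
    static int sumOfLengthsViaParallel(List<String> words) {
        return words.stream().parallel().mapToInt(String::length).sum();
    }

    // Way 2: ask the collection for a parallel stream directly.
    static int sumOfLengthsViaParallelStream(List<String> words) {
        return words.parallelStream().mapToInt(String::length).sum();
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("selenium", "java", "stream");

        // isParallel() reports whether a terminal operation would run in parallel.
        System.out.println(words.stream().isParallel());
        System.out.println(words.stream().parallel().isParallel());

        // Both variants compute the same sum of word lengths.
        System.out.println(sumOfLengthsViaParallel(words));
        System.out.println(sumOfLengthsViaParallelStream(words));
    }
}
```

The two variants are interchangeable here; parallel() is handy when the stream does not come from a Collection.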

Now let’s see how we can use this in automation testing to get faster results:

Scenario:

1. We need to get all the search result anchor links from google.com.
2. The search page we use is https://www.google.com/search?q=java
3. Now, we need to check each URL's status code, that is, whether the URL is working or not.

Code:

Here we use Selenium to extract the URLs from the page and return them as a List<String> to the calling method.

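import io.github.bonigarcia.wdm.WebDriverManager;
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.chrome.ChromeOptions;

import java.util.ArrayList;
import java.util.List;

// Collects the href of every anchor element on the search results page.
public static List<String> getLinkData() {
    List<String> dataList = new ArrayList<>();
    WebDriver driver = null;
    try {
        WebDriverManager.chromedriver().setup();
        ChromeOptions options = new ChromeOptions();
        options.addArguments("--headless");

        driver = new ChromeDriver(options);
        driver.get("https://www.google.com/search?q=java");
        List<WebElement> el = driver.findElements(By.tagName("a"));
        for (WebElement e : el) {
            String href = e.getAttribute("href");
            if (href != null) {
                dataList.add(href);
            }
        }
        System.out.println("Found url = " + dataList.size());
    } catch (Exception ex) {
        // Fall through and return whatever was collected before the failure.
        System.err.println("Failed to collect links: " + ex.getMessage());
    } finally {
        if (driver != null) {
            driver.quit(); // close the browser so repeated calls do not leak Chrome processes
        }
    }
    return dataList;
}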

We then use the HttpURLConnection class to get the status of each URL:

Code:

import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

// Returns the HTTP status code for the given address, or 0 if the request fails.
public static int getResponseCode(String address) {
    try {
        URL url = new URL(address);
        HttpURLConnection connection = (HttpURLConnection) url.openConnection();
        connection.setConnectTimeout(5000); // milliseconds
        connection.setReadTimeout(5000);    // milliseconds
        int responseCode = connection.getResponseCode();
        System.out.println("URL = " + address);
        System.out.println("Response Code = " + responseCode);
        connection.disconnect();
        return responseCode;
    } catch (IOException ex) {
        return 0;
    }
}

Now let’s execute the code and see the difference:

public static void main(String[] args) {

    // Sequential execution
    long startTime = System.currentTimeMillis();
    List<String> list1 = getLinkData();
    list1.stream().forEach(url -> getResponseCode(url));
    long endTime = System.currentTimeMillis();

    // Parallel execution
    long startTimeParallel = System.currentTimeMillis();
    List<String> list2 = getLinkData();
    list2.parallelStream().forEach(url -> getResponseCode(url));
    long endTimeParallel = System.currentTimeMillis();

    // Label each result so the two timings can be told apart in the output
    System.out.print("Sequential: ");
    printExecutionTime(endTime - startTime);
    System.out.print("Parallel: ");
    printExecutionTime(endTimeParallel - startTimeParallel);
}

public static void printExecutionTime(long milliseconds){
    long minutes = (milliseconds / 1000) / 60;
    long seconds = (milliseconds / 1000) % 60;
    // Print the output
    System.out.println(minutes + " minutes and "
            + seconds + " seconds.");
}

A total of 120 URLs were found on the page.

Now just look at the difference below:

Benchmarking:

You can clearly see the difference in execution time between the two runs: using a parallel stream speeds up the whole process. There are only 120 URLs in this case; imagine a situation where we have to process or check a much larger data set.
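
To try the comparison without a browser or network access, here is a rough sketch that replaces the real HTTP call with a hypothetical fakeResponseCode method, which simply sleeps for 50 ms to mimic latency. The class, the method names, the placeholder URLs, and the 50 ms delay are all assumptions for illustration. The timings it prints are simulated and say nothing about the numbers measured above.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

public class SimulatedLatencyDemo {

    // Hypothetical stand-in for getResponseCode: sleeps to mimic network latency.
    static int fakeResponseCode(String url) {
        try {
            Thread.sleep(50);
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
        }
        return 200;
    }

    // Checks every URL, either sequentially or in parallel, and returns the codes.
    static List<Integer> checkAll(List<String> urls, boolean parallel) {
        Stream<String> stream = parallel ? urls.parallelStream() : urls.stream();
        return stream.map(SimulatedLatencyDemo::fakeResponseCode)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // 40 placeholder URLs; they are never actually requested.
        List<String> urls = IntStream.range(0, 40)
                .mapToObj(i -> "https://example.com/page/" + i)
                .collect(Collectors.toList());

        long start = System.currentTimeMillis();
        checkAll(urls, false);
        System.out.println("Sequential: " + (System.currentTimeMillis() - start) + " ms");

        start = System.currentTimeMillis();
        checkAll(urls, true);
        System.out.println("Parallel:   " + (System.currentTimeMillis() - start) + " ms");
    }
}
```

How large the gap is depends on how many cores the machine has, since that determines how many URLs are checked at the same time.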

Things to remember when using parallel streams:

Because the elements are processed in parallel, the order of execution cannot be controlled.
Parallel streams use the default ForkJoinPool.commonPool(), whose default parallelism is one less than the number of processors returned by Runtime.getRuntime().availableProcessors(). (This leaves one processor for the calling thread, which also takes part in the work.)
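
Both points can be observed with a small sketch like the one below. The class name, the doubledInOrder method, and the number list are illustrative assumptions. It contrasts forEach with forEachOrdered on a parallel stream and prints the parallelism of the common pool on the current machine.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.stream.Collectors;

public class ParallelCaveatsDemo {

    // collect() keeps the encounter order of an ordered source, even in parallel.
    static List<Integer> doubledInOrder(List<Integer> numbers) {
        return numbers.parallelStream()
                .map(n -> n * 2)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8);

        // forEach on a parallel stream: the printed order may differ between runs.
        numbers.parallelStream().forEach(n -> System.out.print(n + " "));
        System.out.println();

        // forEachOrdered: always follows the order of the source list.
        numbers.parallelStream().forEachOrdered(n -> System.out.print(n + " "));
        System.out.println();

        System.out.println(doubledInOrder(numbers));

        // Compare the processor count with the common pool's parallelism.
        System.out.println("Processors: " + Runtime.getRuntime().availableProcessors());
        System.out.println("Common pool parallelism: "
                + ForkJoinPool.getCommonPoolParallelism());
    }
}
```

If the order of the results matters, prefer collect or forEachOrdered over forEach; the processing still runs in parallel, only the final ordering is preserved.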