In the cases where your application can benefit from parallelizing simple operations over a large data set stored in a collection, `parallel()` is fine. It's even fine when you're pulling data from a file or another low-latency sequential data source, assuming the cost of filling a spliterator buffer is lower than the cost of processing its elements. But there's a list of gotchas, all of them more dangerous than the "magic make-it-faster" button of `.parallel()` implies:

- For the sequential data source case, if the cost of filling the spliterator buffers is higher than the cost of processing, you're just wasting a ton of overhead trying to use parallel.
- You have to be aware that by default every parallel stream runs on the same thread pool (`ForkJoinPool.commonPool()`), which makes it a potential time bomb if someone uses it in the context of, say, a web server where multiple requests might each process streams in parallel. This also means blocking operations during stream processing are very dangerous: a blocked task ties up a worker thread that every other parallel stream in the JVM is sharing.
- Mutating an external variable goes from being fine for a sequential stream to a race condition for a parallel one.
- You can't hand out Streams that you intend to be executed sequentially, because your callers can just call `parallel()` whenever they want.

And, yes, all of these considerations make the API more complicated than one operating over plain old iterators.
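To make the shared-thread-pool point concrete, here's a minimal sketch (the class and method names are hypothetical, not from any library). It runs a parallel stream from an ordinary thread and records which `ForkJoinPool` each helper thread belongs to. On OpenJDK, every helper is a worker of `ForkJoinPool.commonPool()`, and the calling thread also takes part in the work. The pool's size depends on the machine (typically one less than the number of available processors), so the printed parallelism will vary.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ForkJoinWorkerThread;
import java.util.stream.IntStream;

public class SharedPool {

    // Runs a parallel stream and records which ForkJoinPools its helper threads come from.
    // Assumes it's called from a thread that is not itself a ForkJoinPool worker (e.g. main).
    static Set<ForkJoinPool> poolsUsed() {
        Set<ForkJoinPool> pools = ConcurrentHashMap.newKeySet(); // thread-safe set
        IntStream.range(0, 10_000).parallel().forEach(i -> {
            Thread t = Thread.currentThread();
            if (t instanceof ForkJoinWorkerThread) {
                pools.add(((ForkJoinWorkerThread) t).getPool());
            }
            // The calling thread also processes some elements; it isn't a pool worker.
            // If this lambda blocked (say, on I/O), it would be holding one of the
            // shared common-pool workers that every other parallel stream relies on.
        });
        return pools;
    }

    public static void main(String[] args) {
        System.out.println("common pool parallelism: "
                + ForkJoinPool.commonPool().getParallelism());
        boolean onlyCommon = poolsUsed().stream()
                .allMatch(p -> p == ForkJoinPool.commonPool());
        System.out.println("helpers all from the common pool: " + onlyCommon);
    }
}
```

Because there's only one common pool per JVM, ten concurrent requests each running a parallel stream are all competing for the same handful of workers.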
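For the mutation gotcha, here's a small sketch (hypothetical names). Java lambdas can't reassign local variables, so "external variable" in practice means a field or a mutable object captured by the lambda, such as an `ArrayList`. Adding to one from a parallel `forEach` is a data race: elements can be lost, or `add()` can throw while the backing array is being resized. Letting the stream build the result with a collector avoids the shared state entirely.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class ParallelMutation {

    // UNSAFE: ArrayList is not thread-safe, so concurrent add() calls race.
    // Fine with a sequential stream, broken as soon as .parallel() is added.
    static List<Integer> unsafeCollect(int n) {
        List<Integer> out = new ArrayList<>();
        IntStream.range(0, n).parallel().forEach(i -> out.add(i));
        return out;
    }

    // SAFE: the collector gives each worker its own partial list and merges
    // them, preserving encounter order because IntStream.range is ordered.
    static List<Integer> safeCollect(int n) {
        return IntStream.range(0, n).parallel()
                .boxed()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        int n = 100_000;
        System.out.println("safe size: " + safeCollect(n).size()); // always 100000
        try {
            // Frequently less than 100000; the exact result varies from run to run.
            System.out.println("unsafe size: " + unsafeCollect(n).size());
        } catch (RuntimeException e) {
            // The race can also surface as an exception (e.g. ArrayIndexOutOfBoundsException).
            System.out.println("unsafe run threw: " + e);
        }
    }
}
```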
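On handing out Streams, here's a sketch with a hypothetical API method `items()`. Even if the author explicitly calls `sequential()` before returning, the caller's `parallel()` still wins, because the most recent `sequential()`/`parallel()` call applies to the entire pipeline, including the `map()` stage the API added.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;

public class HandedOutStream {

    // Hypothetical API method: the author expects callers to process these
    // items one at a time and even asks for sequential mode explicitly.
    static Stream<String> items() {
        List<String> data = Arrays.asList("a", "b", "c");
        return data.stream()
                .map(String::toUpperCase)
                .sequential();
    }

    public static void main(String[] args) {
        // As returned, the pipeline is sequential.
        System.out.println(items().isParallel());            // false

        // But a caller can flip the whole pipeline, upstream stages included.
        System.out.println(items().parallel().isParallel()); // true
    }
}
```

If sequential processing really matters, returning something like a `List` or an `Iterator` forces callers to opt in explicitly before they can go parallel.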