Adding the parallelism library takes two extra lines of code (and probably one more to pull it in above), and it only happens at the end of the actual work, after they've matched C's performance and beaten C's memory footprint. Using it at the very end hardly invalidates the rest of the exercise.