Can you share any more details on how the image diffing is done? The BBC has a GitHub repo called Wraith that uses ImageMagick and PhantomJS to accomplish a similar task (without the awesomeness of on-demand testing environments). I'm always curious to learn more about how people are solving the GUI testing problem on the web.
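For anyone following along, my rough understanding of the general idea behind tools like Wraith is a per-pixel comparison between a baseline screenshot and a new one. It uses a tolerance ("fuzz") so that tiny color shifts are ignored, plus a threshold on the percentage of changed pixels before anything gets flagged. Below is a minimal pure-Python sketch of that idea. It is not Wraith's code, since Wraith hands the comparison off to ImageMagick. It assumes the screenshots are already loaded as same-sized grids of RGB tuples, and the names `diff_images` and `has_regression`, along with the default fuzz and threshold values, are hypothetical.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each channel 0-255
Image = List[List[Pixel]]     # a screenshot as rows of pixels


def pixel_differs(a: Pixel, b: Pixel, fuzz: float) -> bool:
    """True if any channel differs by more than `fuzz` (a fraction of 255)."""
    limit = fuzz * 255
    return any(abs(ca - cb) > limit for ca, cb in zip(a, b))


def diff_images(base: Image, candidate: Image, fuzz: float = 0.0):
    """Compare two screenshots pixel by pixel.

    Returns (mask, percent_changed): mask[y][x] is True where the pixels differ
    beyond the fuzz tolerance, and percent_changed is the share of such pixels.
    """
    if len(base) != len(candidate) or any(
        len(r1) != len(r2) for r1, r2 in zip(base, candidate)
    ):
        raise ValueError("screenshots must have identical dimensions")
    mask = [
        [pixel_differs(p, q, fuzz) for p, q in zip(r1, r2)]
        for r1, r2 in zip(base, candidate)
    ]
    total = sum(len(row) for row in mask)
    changed = sum(sum(row) for row in mask)  # True counts as 1
    percent = 100.0 * changed / total if total else 0.0
    return mask, percent


def has_regression(base: Image, candidate: Image,
                   fuzz: float = 0.05, threshold: float = 1.0) -> bool:
    """Flag a visual regression when more than `threshold` percent of pixels changed."""
    _, percent = diff_images(base, candidate, fuzz)
    return percent > threshold
```

For example, on a 2x2 white image where one pixel turns black and another shifts to (250, 250, 250), a 5% fuzz ignores the slight shift and reports 25% changed, while a fuzz of 0 counts both pixels and reports 50%. The mask is also what you would render as a highlighted "diff image" for a human to review. ImageMagick's `compare` command, with its `-fuzz` and `-metric AE` options, does the same kind of counting natively on real image files.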