The developer describes how the training data was generated here: http://www.danvk.org/wp/2013-04-20/generating-training-data/