| [Teaser](http://zafar.cc/images/letters.png) [Personal Blog](http://zafar.cc/not-notmnist-dataset-generation/) [GitHub Link](https://github.com/zafartahirov/not_notMNIST) I wrote a small script that generates image datasets for classification, in the style of MNIST or notMNIST. It takes the fonts you have and produces images plus a pickle of labels and features that you can load into Python. A more detailed explanation is here: http://zafar.cc/not-notmnist-dataset-generation/ I would really appreciate any critique, issue reports, and pull requests on GitHub: https://github.com/zafartahirov/not_notMNIST The main benefit I see is that you can test your classifier on datasets of Unicode characters. The drawback is that you need a lot of fonts to generate a decent dataset. If you have a lot of fonts for your language, I would appreciate it if you could share the resulting dataset :) I generated some using Hiragana, but I don't have a license for many fonts, so it is more of a demo (check GitHub). I would really love to have datasets for Chinese, Arabic, Hebrew, Cyrillic, etc. |