HN Mirror


	by aheifets 4163 days ago
	Thank you for the kind wishes! Over the past few years, there's been a huge increase in the amount of data available for this kind of machine learning. We curate our data from a number of private and public sources. For example, as part of my doctoral work (http://en.wikipedia.org/wiki/SCRIPDB), I learned how to parse chemical information out of U.S. Patent data, which is public domain. That said, if you're interested in working on something like this and need a quick million data points, I'd point you to PubChem as a first step: https://pubchem.ncbi.nlm.nih.gov/