by danielhfrank 5774 days ago
Hey, I wrote the piece, so I'm happy to address your question. First off, the training data was kept completely separate from the test data. The test data was painstakingly gathered by hand: we wanted to make sure the data we were testing against was classified as accurately as possible. For training data, volume is more of a priority. Your other point is very well taken; we'll probably use some metrics like that ourselves to look for improvements! Let me know if you've got any other questions. I'm happy to clear things up.

Cool, thanks! Very interesting stuff. Nice of you to share some of the "secret sauce."