Hacker News
by neoncontrails 1302 days ago
This is _really_ cool. Early in the pandemic I released a local news aggregation tool that aimed to aggregate COVID-related content and score it for relevance using an ensemble of ML classification models, including one that would attempt to infer an article's geographic coordinates. Accuracy peaked at about 70-80%, which was just not quite high enough for this use case. With a large enough dataset of geotagged documents I'm pretty sure we could've improved that by another 10-15%, which would've likely been "good enough" for our purposes. But one of the surprising things I took away from the project was that there's not a well-defined label for this category of classification problems, and as a result there are few datasets or benchmarks to encourage progress.
1 comment

Thanks! COVID is a great example of the use case, and we agree problems like this need more attention. We've shared some data already, and will continue to share more with the public to encourage collaboration on this. Hope you will find something useful there for your future projects :)