by neoncontrails
1302 days ago
This is _really_ cool. Early in the pandemic I released a local news aggregation tool that aimed to aggregate COVID-related content and score it for relevance using an ensemble of ML classification models, including one that would attempt to infer an article's geographic coordinates. Accuracy peaked at roughly 70-80%, which was just not quite high enough for this use case. With a large enough dataset of geotagged documents, I'm pretty sure we could've improved that by another 10-15%, which would likely have been "good enough" for our purposes. But one of the surprising things I took away from the project was that there's no well-defined label for this category of classification problem, and as a result there are few datasets or benchmarks to encourage progress.