by violet13 751 days ago
It sounds like an impressive list, but many of these are ghost towns and are of little value to machine learning. Wikiversity is a mess... Wikipedia is the crown jewel and probably the only thing of unique commercial value for ML.

Disagree, this is too dismissive. Commons, Wikidata, and Wiktionary are all useful. Especially Wiktionary, which is probably one of the best online dictionaries imo. It often has unique info that's hard to find even in other dictionaries, plus very good etymologies. All useful in ML.
Wikipedia indeed seems the most valuable for ML, by far. Wikidata, Wikimedia Commons, and Wiktionary also seem useful there.
Wikivoyage is underrated, and that was not helped by the acrimonious split with Wikitravel (which was acquired by a predatory marketing company), but it finally seems to be pulling ahead.
One of my favorite LLM applications is getting them to write Wikidata queries. The data is amazing, but the query language is nothing but pure hell.
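For anyone who hasn't seen one: Wikidata queries are written in SPARQL and sent to the public query service. Below is a rough sketch of what a small one looks like, wrapped in Python. The helper names (`build_instance_query`, `build_request_url`) are made up for illustration, and Q146 ("house cat") is just the usual example class; treat this as an assumption-laden sketch, not a recommended client.

```python
from urllib.parse import urlencode

# Public Wikidata Query Service SPARQL endpoint.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def build_instance_query(class_qid: str, limit: int = 10) -> str:
    """Hypothetical helper: SPARQL for items that are 'instance of' (P31) a class."""
    # Only accept plain item IDs such as "Q146" so nothing odd lands in the query.
    if not (class_qid.startswith("Q") and class_qid[1:].isdigit()):
        raise ValueError(f"not a Wikidata item ID: {class_qid!r}")
    return (
        "SELECT ?item ?itemLabel WHERE {\n"
        f"  ?item wdt:P31 wd:{class_qid} .\n"
        # The label service fills in ?itemLabel with the English label.
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }\n'
        "}\n"
        f"LIMIT {int(limit)}"
    )


def build_request_url(query: str) -> str:
    """Hypothetical helper: GET URL that asks the endpoint for JSON results."""
    return f"{WDQS_ENDPOINT}?{urlencode({'query': query, 'format': 'json'})}"


if __name__ == "__main__":
    query = build_instance_query("Q146", limit=5)  # Q146 = house cat
    print(query)
    print(build_request_url(query))
```

Nothing here touches the network; it only builds the query text and the URL. Even this toy case needs prefixes, a property ID, an item ID, and a label service clause, which is a fair illustration of why people hand the job to an LLM.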