by jobigoud 1414 days ago
From the article: "I landed on the freeDictionary API that uses the Wiktionary as a source."

Why didn't they just download the dumps via https://dumps.wikimedia.org/enwiktionary/ (as explained in https://en.wiktionary.org/wiki/Help:FAQ#Downloading_Wiktiona...)?

Scraping, even via an API, is way less efficient imho.
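
For what it's worth, reading a dump locally can be sketched in a few lines of Python. This is only a rough sketch: it assumes the dump is a bz2-compressed MediaWiki XML export with `<page>`, `<title>` and `<text>` elements, and `iter_pages`, `iter_dump` and the file path are names I made up for illustration.

```python
import bz2
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(stream):
    """Yield (title, wikitext) pairs from a MediaWiki XML export stream."""
    title, text = None, None
    # iterparse reads incrementally, so the whole dump is never loaded at once.
    for _event, elem in ET.iterparse(stream, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            yield title, text
            title, text = None, None
            elem.clear()  # drop the finished page's subtree


def iter_dump(path):
    """Stream pages from a .xml.bz2 dump at a (hypothetical) local path."""
    with bz2.open(path, "rb") as stream:
        yield from iter_pages(stream)
```

Something like `for title, text in iter_dump("enwiktionary-dump.xml.bz2"): ...` would then walk every entry without a single HTTP request (the file name here is a placeholder for whatever you downloaded).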

They’re in wikitext, which looks to be considerably less semantic than the crawled data. I’m not sure that’s the reason, but it could be a reason.
I'd say that's not the reason, since the wikitext is pretty semantic. The wiki source of https://en.wiktionary.org/wiki/subbureau#English is:

  ==English==

  ===Etymology===
  {{prefix|en|sub|bureau}}

  ===Noun===
  {{en-noun|s|subbureaux}}

  # A [[district]]-level public security bureau in [[China]].
So as long as one can parse wikitext, it's split up pretty well!
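
To illustrate, here is a minimal sketch of pulling that structure out with nothing but regexes. It assumes simple entries like the one above: no nested templates, headings on their own line, definitions starting with `# `. `parse_entry` and the dictionary keys are my own names, not anything from Wiktionary; real entries are messier and probably want a proper wikitext parser.

```python
import re

# "==English==" -> level 2, "===Noun===" -> level 3
HEADING = re.compile(r"^(=+)\s*(.+?)\s*\1\s*$")
# "{{prefix|en|sub|bureau}}" (non-nested templates only)
TEMPLATE = re.compile(r"\{\{([^{}]*)\}\}")
# "[[China]]" or "[[target|label]]" -> keep the visible text
LINK = re.compile(r"\[\[(?:[^\]|]*\|)?([^\]]*)\]\]")


def parse_entry(wikitext):
    """Split an entry into sections with their templates and definitions."""
    sections = []
    current = None
    for raw in wikitext.splitlines():
        line = raw.strip()
        if not line:
            continue
        heading = HEADING.match(line)
        if heading:
            current = {
                "level": len(heading.group(1)),
                "title": heading.group(2),
                "templates": [],
                "definitions": [],
            }
            sections.append(current)
        elif current is not None:
            # Each template becomes [name, arg1, arg2, ...].
            for body in TEMPLATE.findall(line):
                current["templates"].append(body.split("|"))
            if line.startswith("# "):
                current["definitions"].append(LINK.sub(r"\1", line[2:]))
    return sections
```

Fed the source above, this gives three sections (English, Etymology, Noun), with `["prefix", "en", "sub", "bureau"]` under Etymology and the definition "A district-level public security bureau in China." under Noun.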