by nologic01 1055 days ago
This is wrong on two counts: 1) translation is not the same as abstraction and 2) having the world's encyclopedia translated by an advertising company is not exactly everybody's idea of how things should be organized.

Of course, wrong criticism doesn't mean the project is a success (I think it's been going for a few years now). The documentation in particular does not highlight what this infrastructure is good for.


Denny Vrandečić — the lead developer of Wikifunctions, former Germany PM of Wikidata, co-developer of Semantic Wikipedia, and former member of the Wikimedia Foundation Board of Trustees — also helped develop Google's Knowledge Graph from 2013 to 2020. None of this is hidden, it's even on his Wikipedia article.[1]

The "having the world's encyclopedia translated by an advertising company" ship sailed years ago. All of these projects are supported, directly and indirectly, by exactly that motivation. The ultimate goal of commercial enterprises is to take zero-cost volunteer projects like Wikipedia and OpenStreetMap and make them cheaper for enterprises to associate user input with compatible monetization. It's now just a bonus side-effect, rather than their mission, that any public good comes from these projects.

1: https://en.wikipedia.org/wiki/Denny_Vrande%C4%8Di%C4%87

"translated by an advertising company" is akin to "Tor was funded by the US government" - it's basically organizational ad hominem.

Google's translations are fine and high quality, and they don't inject ad copy into the translations, now or in the foreseeable future (like they do on e.g. Google Maps for POIs).

That's apples and oranges though. Tor is out of the control of the US military at this point (+/- your tinfoil hat level), whereas Google Translate was created and is owned solely by Google. I'm not saying GP is fully correct, but context is important.

I personally think using transformers for, well, transforming input into another language is going to be a great approach once hardware catches up for local offline use at a reasonable speed and hallucinations are minimized.

Corporate entities come and go. They bait-and-switch at will, as they ultimately answer only to legal obligations and, in particular, shareholders. It would be odd to overlay such a liability and uncertainty on top of Wikipedia.

While abstraction is not the same as translation, if the Wikipedia community specifically wants a translation service that is more tightly integrated into the platform, imho it should be a fully open source project.

My point is about translating after the fact, by the end user solving the problem. Now you can use Google Translate for free, later you can use your own LLM. Abstracting the knowledge away is wasted work. We already have it in a definitive source language (English for most things, local languages for local things).

This abstract Wikipedia sounds like Esperanto to me.

> Abstracting the knowledge away is wasted work

Translation solves an immediate problem of giving human users a glimpse of Wikipedia's knowledge base, but it is still strictly wrapped in textual data. It is still a content black box that, e.g., an LLM would not make more transparent.

Abstraction builds a mathematical representation. It's a new product and it opens up new use cases that have nothing to do with translation. It may on occasion be more factually correct than a translation, or may be used in conjunction with translation, but is potentially a far more flexible and versatile technology.

The challenge is really matching ambition and vision with resources and execution. Especially if it is to attract volunteers to crowdsource the enormous task, it needs to have a very clear and attractive onboarding ramp. The somewhat related Wikidata / Wikibase projects seem to have a reasonable fan base, so there is precedent.

What is the value in Wikipedia abstracting the language in its articles apart from translation?

It's similar to abstracting maps and geography into GIS data and getting things like geographic proximity and POI-type filtering with lower overhead than creating a category tree for place articles in Wikipedia.

For instance, Wikipedia right now relies heavily on manual tagging (authored categories) for classifying related subjects. If you want a list of all notable association footballers, for instance, then the best way to get one is to go to Category:Association football players. But then you're stuck in a very human, flawed, and often in-flux attempt to reach a consensus definition of that, and the list remains out of reach. (Hell, American players are categorized as "soccer players" under the same tree, confounding things like search, because that's the kind of thing Americans do.)

With abstraction, you get classification for much less, and the consensus problem moves from an arbitrary, authored category tree to a much narrower space. If an article is about a footballer, the abstract data for that subject contains occupation Q937857 (association football player). The dialect and language don't matter: a footballer is a footballer. If you just want a list of footballers, you can get just a list of footballers without even going near things like SPARQL: https://www.wikidata.org/w/index.php?title=Special:WhatLinks...
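
To make that concrete, here's a rough sketch of the idea in Python. It is not how Wikidata or Abstract Wikipedia actually stores or serves anything; the `Item` class, the `with_occupation` helper and the sample entries (including IDs like `Q_A`) are made up for illustration. The only real identifiers are Q937857 (association football player) and P106, which I'm assuming here as the Wikidata property for occupation. The point is just that the filter looks at the claim, never at the labels.

```python
from dataclasses import dataclass, field

OCCUPATION = "P106"     # assumed: Wikidata property for "occupation"
FOOTBALLER = "Q937857"  # association football player (from the comment above)


@dataclass
class Item:
    """Toy stand-in for a Wikidata-style item (hypothetical structure)."""
    qid: str
    labels: dict                                  # language code -> label
    claims: dict = field(default_factory=dict)    # property ID -> set of item IDs


def with_occupation(items, occupation_qid):
    """Return the items whose occupation claims include occupation_qid."""
    return [
        item for item in items
        if occupation_qid in item.claims.get(OCCUPATION, set())
    ]


# Made-up sample data: the item IDs and names are placeholders, not real entries.
items = [
    Item(
        "Q_A",
        {"en-gb": "Player A (footballer)", "en-us": "Player A (soccer player)"},
        {OCCUPATION: {FOOTBALLER}},
    ),
    Item("Q_B", {"en": "Writer B"}, {OCCUPATION: {"Q_WRITER"}}),
    Item("Q_C", {"de": "Spieler C"}, {OCCUPATION: {FOOTBALLER, "Q_COACH"}}),
]

# The labels disagree on "footballer" vs "soccer player"; the claim does not.
footballers = with_occupation(items, FOOTBALLER)
print([item.qid for item in footballers])  # ['Q_A', 'Q_C']
```

Whether an entry is labelled "footballer", "soccer player" or something in German makes no difference to the result, which is the whole appeal compared with an authored category tree.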