by guy98238710 1055 days ago
Wikifunctions is primarily intended to support Wikimedia projects, especially Abstract Wikipedia. It is the code complement to Wikidata lexemes. It might be used for cross-wiki templates to reduce existing duplication and other auxiliary tasks, but Abstract Wikipedia is the reason it was proposed.

Abstract Wikipedia is in my opinion fully wasted work. Translation is free and instant for web pages. I've lived for 6 years in different countries where I don't speak the local language (and am also not a native English speaker), and you can get all the information you need by translating. This works totally fine already today with Google Translate on top of pages.

And the pages that are in fact missing from "the other language wikis" are local myths, local detailed history, things that wouldn't even be in the English Wikipedia or in the "abstract" version in the first place.

> Translation is free and instant for web pages.

And also very often quite incorrect, and you don't know where.

I think the general idea of a "universal language" Wikipedia, that gets flawlessly rendered into local languages, is laudable.

But I don't think anybody would ever edit in it directly -- what I want to see is that when somebody edits Wikipedia to add a new sentence, it attempts to translate into the "universal language" and prompt you to select from ambiguities.

E.g. if you wrote:

  I saw someone on the hill with a telescope.

It would ask you to confirm which of the following was intended:

  [ ] "with a telescope" modifies "I saw"
  [ ] "with a telescope" modifies "someone on the hill"

And it would also ask to clarify meanings, e.g.:

  [ ] "saw" - spotted visually
  [ ] "saw" - dated romantically

It would be a real dream to have translated outputs that were guaranteed to be correct, because the intermediate representation was correct, because the translation from someone's native language to that intermediate translation was verified in this way.
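A minimal sketch of that confirmation step, assuming some parser has already produced the candidate readings. The `Ambiguity` class, `render_prompt`, `resolve`, and the hand-written data are all hypothetical, invented only to illustrate the flow:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ambiguity:
    """One point where the editor must pick a reading (hypothetical structure)."""
    question: str
    options: tuple


# Candidate readings for the example sentence, written by hand here;
# a real system would have to derive them from a parser.
SENTENCE = "I saw someone on the hill with a telescope."
AMBIGUITIES = [
    Ambiguity('"with a telescope" modifies', ('"I saw"', '"someone on the hill"')),
    Ambiguity('"saw" means', ("spotted visually", "dated romantically")),
]


def render_prompt(ambiguities):
    """Format the checkbox-style prompt shown to the editor."""
    lines = []
    for ambiguity in ambiguities:
        lines.append(ambiguity.question + ":")
        lines.extend(f"  [ ] {option}" for option in ambiguity.options)
    return "\n".join(lines)


def resolve(ambiguities, choices):
    """Map each question to the option the editor selected (by index)."""
    if len(choices) != len(ambiguities):
        raise ValueError("one choice is required per ambiguity")
    return {a.question: a.options[i] for a, i in zip(ambiguities, choices)}


print(SENTENCE)
print(render_prompt(AMBIGUITIES))
print(resolve(AMBIGUITIES, [0, 0]))
```

Only the resolved readings would be stored in the intermediate representation; the hard part, finding the ambiguities in the first place, is not addressed here.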
I would still invest those resources into documenting more knowledge that currently doesn't exist online in its original languages and immediately translating it to English. For better or for worse, English is the "abstract" representation of language online, and there's so much absent stuff that worrying about another universal format seems pointless.

It's not either/or. Different groups of people can do different things at once. And of the two things you're comparing, one is expert technical/engineering and the other requires expert archivists/translators. They're totally different groups.

Exactly that! I have a design mockup that is quite similar to that.

> This works totally fine already today with Google Translate on top of pages.

How would anyone even know? By definition, if someone is using Google Translate, he already doesn't know the language, so how can he judge the quality of the results?

My company spends millions on professional translators because products like Google Translate are so bad for anything beyond the most basic uses.

This is wrong on two counts: 1) translation is not the same as abstraction, and 2) having the world's encyclopedia translated by an advertising company is not exactly everybody's idea of how things should be organized.

Of course, wrong criticism doesn't mean the project is a success (I think it's been going for a few years now). The documentation in particular does not highlight what this infrastructure is good for.

Denny Vrandečić — the lead developer of Wikifunctions, former Germany PM of Wikidata, co-developer of Semantic Wikipedia, and former member of the Wikimedia Foundation Board of Trustees — also helped develop Google's Knowledge Graph from 2013 to 2020. None of this is hidden, it's even on his Wikipedia article.[1]

The "having the world's encyclopedia translated by an advertising company" ship sailed years ago. All of these projects are supported, directly and indirectly, by exactly that motivation. The ultimate goal of commercial enterprises is to take zero-cost volunteer projects like Wikipedia and OpenStreetMap and make them cheaper for enterprises to associate user input with compatible monetization. It's now just a bonus side-effect, rather than their mission, that any public good comes from these projects.

1: https://en.wikipedia.org/wiki/Denny_Vrande%C4%8Di%C4%87

"translated by an advertising company" is akin to "Tor was funded by the US government" - it's basically organizational ad hominem.

Google's translations are fine and high quality, and they don't inject ad copy into the translations now or in the foreseeable future (like they do on, e.g., Google Maps for POIs).

That's apples and oranges though. Tor is out of the control of the US military at this point (+/- your tinfoil hat level), whereas Google Translate was created and is owned solely by Google. I'm not saying GP is fully correct, but context is important.

I personally think using transformers for, well, transforming input into another language is going to be a great approach once hardware catches up for local offline use at a reasonable speed and hallucinations are minimized.

Corporate entities come and go. They bait-and-switch at will, as they ultimately answer only to legal obligations and, in particular, shareholders. It would be odd to overlay such a liability and uncertainty on top of Wikipedia.

While abstraction is not the same as translation, if the Wikipedia community specifically wants a translation service that is more tightly integrated into the platform, IMHO it should be a fully open source project.

My point is about translation after the fact, with the end user solving the problem. Now you can use Google Translate for free; later you can use your own LLM. Abstracting the knowledge away is wasted work. We already have it in a definitive source language (English for most things, local languages for local things).

This abstract Wikipedia sounds like Esperanto to me.

> Abstracting the knowledge away is wasted work

Translation solves an immediate problem of giving human users a glimpse of Wikipedia's knowledge base, but it is still strictly wrapped in textual data. It is still a content black box that, e.g., an LLM would not make more transparent.

Abstraction builds a mathematical representation. It's a new product and it opens up new use cases that have nothing to do with translation. It may on occasion be more factually correct than a translation, or may be used in conjunction with translation, but is potentially a far more flexible and versatile technology.

The challenge is really matching ambition and vision with resources and execution. Especially if it is to attract volunteers to crowdsource the enormous task, it needs to have a very clear and attractive onboarding ramp. The somewhat related Wikidata / Wikibase projects seem to have a reasonable fan base, so there is precedent.

What is the value in Wikipedia abstracting the language in its articles apart from translation?

Similar to abstracting maps and geography into GIS data and getting things like geographic proximity and POI-type filtering with lower overhead than creating a category tree for place articles in Wikipedia.

For instance, Wikipedia right now relies almost entirely on manual tagging (authored categories) for classifying related subjects. If you want a list of all notable association footballers, for instance, then the best way to get one is to go to Category:Association football players. But then you're stuck in a very human, flawed, and often in-flux attempt to reach a consensus definition of that, and the list remains out of reach. (Hell, American players are categorized as "soccer players" under the same tree, confounding things like search, because that's the kind of thing Americans do.)

With abstraction, you get classification for much less, and the consensus problem moves from an arbitrary, authored category tree to a much narrower space. If an article is about a footballer, the abstract data for that subject contains occupation Q937857 (association football player). The dialect and language don't matter: a footballer is a footballer. If you just want a list of footballers, you can get just a list of footballers without even going near things like SPARQL: https://www.wikidata.org/w/index.php?title=Special:WhatLinks...
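As a rough sketch of why this is cheaper than a category tree, assume simplified in-memory records instead of real Wikidata entities. P106 is Wikidata's "occupation" property and Q937857 is the ID given above; the item IDs, labels, and the `with_occupation` helper are placeholders made up for illustration:

```python
OCCUPATION = "P106"     # Wikidata property: occupation
FOOTBALLER = "Q937857"  # association football player (ID from the comment above)

# Hypothetical, hand-written records. The item IDs and labels are placeholders,
# not real Wikidata entries.
ITEMS = [
    {"id": "Q_A",
     "labels": {"en-gb": "Footballer A", "en-us": "Soccer player A"},
     "claims": {OCCUPATION: [FOOTBALLER]}},
    {"id": "Q_B",
     "labels": {"en-gb": "Novelist B"},
     "claims": {OCCUPATION: ["Q_NOVELIST"]}},
    {"id": "Q_C",
     "labels": {"es": "Futbolista C"},
     "claims": {OCCUPATION: [FOOTBALLER, "Q_COACH"]}},
]


def with_occupation(items, occupation_id):
    """Return the IDs of items whose occupation claims include occupation_id."""
    return [item["id"] for item in items
            if occupation_id in item["claims"].get(OCCUPATION, [])]


print(with_occupation(ITEMS, FOOTBALLER))
```

The filter never looks at the labels, so "footballer" versus "soccer player" has no effect on the result.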

You might well be right. Furthermore, English is on its way to becoming the universal language everyone speaks. You are, however, wrong to compare AW to translators, which are probabilistic algorithms, whereas AW is intended to be as exact as Wolfram Alpha. AW should also be able to use Wikidata to generate unique articles that do not exist even in English.

BTW, translation tech is not as good as you paint it here. I regularly translate my English blog posts to Slovak and every blog post requires 20-30 corrections. DeepL is marginally better than Google Translate. GPT-4 cannot even get word inflection right, an embarrassing fail for such a large model.

Not to mention that machine translation became dramatically better with LLMs like GPT-4 and will just get better over time... rapidly.

I fully expect by next year for AI translation of Wiki-like content to be essentially perfect.

Not everyone wants to use Google services to be able to read information on Wikipedia.

And not everyone wants to donate to Wikipedia to solve a solved problem.

Improving language (and communication) is never a solved problem.

Talk about over-engineering.

Yeah, you're not the only one to think that: https://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2...

Wow. This feels like someone has taken a Borges parody and run with it:

> What is the scope of the new "Wikipedia of functions"?

> [...] Vrandečić explained the concept of Abstract Wikipedia and a "wiki for functions" using an example describing political happenings involving San Francisco mayor London Breed:

> "Instead of saying "in order to deny her the advantage of the incumbent, the board votes in January 2018 to replace her with Mark Farrell as interim mayor until the special elections", imagine we say something more abstract such as elect(elector: Board of Supervisors, electee: Mark Farrell, position: Mayor of San Francisco, reason: deny(advantage of incumbency, London Breed)) – and even more, all of these would be language-independent identifiers, so that thing would actually look more like Q40231(Q3658756, Q6767574, Q1343202(Q6015536, Q6669880)).

> [...] We still need to translate [this] abstract content to natural language. So we would need to know that the elect constructor mentioned above takes the three parameters in the example, and that we need to make a template such as {elector} elected {electee} to {position} in order to {reason} (something that looks much easier in this example than it is for most other cases). And since the creation of such translators has to be made for every supported language, we need to have a place to create such translators so that a community can do it.
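A minimal sketch of the constructor-plus-template idea in the quote, using readable English labels in place of the Q-identifiers. The `Constructor` class, the `render` function, and the "deny" template are assumptions made for this illustration and are not taken from Wikifunctions; only the "elect" template comes from the quoted text:

```python
from dataclasses import dataclass


@dataclass
class Constructor:
    """Language-independent content: a constructor name plus named arguments."""
    name: str
    args: dict


# Templates per language and constructor. The "elect" template is the one
# quoted above; the "deny" template is an assumption made for this sketch.
TEMPLATES = {
    "en": {
        "elect": "{elector} elected {electee} to {position} in order to {reason}",
        "deny": "deny {beneficiary} the {thing}",
    },
}


def render(node, lang):
    """Recursively render a constructor tree with one language's templates."""
    if isinstance(node, str):
        # Plain label; a real system would look up a per-language label instead.
        return node
    template = TEMPLATES[lang][node.name]
    rendered = {key: render(value, lang) for key, value in node.args.items()}
    return template.format(**rendered)


content = Constructor("elect", {
    "elector": "Board of Supervisors",
    "electee": "Mark Farrell",
    "position": "Mayor of San Francisco",
    "reason": Constructor("deny", {
        "thing": "advantage of incumbency",
        "beneficiary": "London Breed",
    }),
})

print(render(content, "en"))
```

Supporting another language would mean adding another entry to `TEMPLATES`, and plain string templates like these ignore agreement, inflection, and word order, which is where the quoted caveat about harder cases applies.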

I'm not sure I'm smart enough to decide if this is all really stupid or not. If I had to summarize my feelings it would probably be along the lines of Q6767574, (Q6015536, Q654880), Q65660.

THAT's the reason? Conveying a sentence as a series of propositions or a tree with case labels has been tried in the previous century, without success. It does not offer a good basis for translation, as e.g. Philips' Rosetta project showed. It works for simple cases, but as soon as the text becomes more complex, it runs into all the horrible little details that make up language.

A simple example: in Spanish you don't say "I like X" but "X pleases me". In Dutch you say, "I find X tasty" or "X is good" or something else entirely, depending on what X is. Those are three fairly close languages. How can you encode that simple sentence in such a way that it translates properly for all languages, now and in the future?

Symbolic representation isn't going to cut it outside a very narrow subset of language. It might work for highly technical, unambiguous, simple content, but not in general. Whatever you think of ChatGPT, it shows that a neural network can't be beaten for linguistic representation.

> It might work for highly technical, unambiguous, simple content

I mean, the goal is basically Wikipedia lite, so they are targeting technical, unambiguous, simple content.

My understanding is that the goal is to target small languages where it is unlikely anyone is ever going to put in the effort (or have a big enough corpus) to do the statistical translation methods. Sort of a "this will be better than nothing" approach.

The original paper [0] envisages a much wider scope. Vrandečić literally quotes "a world in which every single human being can freely share in the sum of all knowledge".

It also makes the task of the editor much, much more difficult than it is now.

[0] https://arxiv.org/pdf/2004.04733.pdf

But it seems like a huge amount of work to achieve that goal.

I suspect a large proportion of the realistic target audience are bilingual.

Reminds me of this section of Cryptonomicon:

"""

RIST 9E03 is the RIST that RIST 11A4 denotes by the arbitrarily chosen bit-pattern that, construed as an integer, is 9E03 (in hexadecimal notation). Click here for more about the system of bit-pattern designators used by RIST 11A4 to replace the obsolescent nomenclature systems of "natural languages." Click here if you would like the designator RIST 9E03 to be automatically replaced by a conventional designator (name) as you browse this web site.

Click.

From now on, the expression RIST 9E03 will be replaced by the expression Andrew Loeb. Warning: we consider such nomenclature fundamentally invalid, and do not recommend its use, but have provided it as a service to first-time visitors to this Web site who are not accustomed to thinking in terms of RISTs.

... Click.

RIST stands for Relatively Independent Sub-Totality.

... Click.

A hive mind is a social organization of RISTs that are capable of processing semantic memes ("thinking"). These could be either carbon-based or silicon-based. RISTs who enter a hive mind surrender their independent identities (which are mere illusions anyway). For purposes of convenience, the constituents of the hive mind are assigned bit-pattern designators.

Click.

A bit-pattern designator is a random series of bits used to uniquely identify a RIST.

"""

Feels a lot like RDF, especially in terms of how I expect the underlying utopian dream to play out.

Vrandečić was Google's consultant on the old Freebase's RDF export. Wikidata, which he helped create, succeeded it. It's the same people pushing the same solution under different names.

My takeaway from this is that Wikimedia clearly has way, way too much money.

Reads even worse than Ulillillia literature, at least he doesn’t fully yield to scientific measurements

this is the kind of unhinged make-work schemes all those wikipedia beg banners are funding

This particular work is mostly funded through a set of large restricted donations, not through the general funds.

So Wikipedia asks me for donations but does company off-sites in Switzerland with Google?

Google.org donated money and staffing to support the Abstract Wikipedia project. Two of the seven Google.org fellows who were supporting the Abstract Wikipedia team are permanently based in Zurich[1], and Google was able to provide space to meet. It was the most practical place to hold an off-site.

  [1]: https://diff.wikimedia.org/2022/04/14/google-org-fellowship-with-abstract-wikipedia-and-wikifunctions/

Do you think they should have to embrace austerity because they’ve asked for donations? Or do you think they can use donations in lieu of advertising dollars and otherwise function like any other similar company? Do you think it’s possible they were invited by google.org or received donations for the off-site itself?

I guess I’m not sure why this is remotely worth commenting on, but it seems to have struck a nerve. It’s like being upset that NPR takes donations but then gives its staff 15 minutes off to watch a tiny desk concert sometimes.

>Do you think they should have to embrace austerity because they’ve asked for donations?

"Not embracing austerity" is one thing, "asking for donations" is another thing, "what Wikimedia currently does" is something completely different from these two things.

When you get a banner featuring Jimmy Wales with the words "Please read: A personal appeal from Wikipedia founder Jimmy Wales" and then something like this:

>To all our readers in the UK,

>Please don’t scroll past this. This Friday, for the 1st time recently, we interrupt your reading to humbly ask you to support Wikipedia’s independence. Only 2% of our readers give. Many think they’ll give later, but then forget. If you donate just £2, or whatever you can this Friday, Wikipedia could keep thriving for years.

The impression is that Wikipedia (NOT Wikimedia) is in need of money to keep operating, which is simply not true.

Wikipedia has got more than enough money to keep operating. If Wikipedia, ever in our lifetimes, goes under, it won't be because they weren't given enough money but because they mishandled it.

It's like having a beggar come to you saying that he needs to eat, then seeing him 20 minutes later driving a Porsche. I consider this to be abhorrent behavior. I donated once and will NEVER, EVER do it again, and I advise that nobody does. If you want to do a good deed donate to the Internet Archive.

> if Wikipedia, ever in our lifetimes, goes under, it won't be because they weren't given enough money

I agree, I think it will be because they'll accept more money from commercial actors on the terms of whoever these actors are – Google currently does not seem to force any conditions on WP, as far as I can tell.

> If you want to do a good deed donate to the Internet Archive.

I agree with this as well but I consider both Wikimedia and the Internet Archive as extremely important.

Charitable causes always are at risk of "wasting" money. But the reason for that is that in a purely capitalistic sense the cause itself is not profitable.

Because nobody who survives on donations or taxes should have the luxury of consuming more than 800 calories per day.

I think a lot of Abstract Wikipedia funding is coming from restricted grants, not the general donation pool.

It's a weird thing in the nonprofit world that it's often easier to get money for pie-in-the-sky things than for keeping the lights on.

Why is that so hard to believe? It’s a global organisation.

This is a much more interesting link than the actual article link.