by ojo-rojo 791 days ago
> now that a team of international researchers revisited the site and published their findings in English in the journal PLOS One, scientists around the world are learning about the boats for the first time

It's regrettable that research had to be published in a specific language for scientists to leverage. Maybe recent advances with language models can change the game here and make knowledge more accessible across all fields.

>It's regrettable that research had to be published in a specific language for scientists to leverage.

Not really. ~70 million people read Italian. Billions read English. The biggest journals are English-language. Most of Europe learns English. Chinese academics learn to write in English, and English is an official language of India.

It'd be regrettable if they had to publish in Science/Nature to get noticed, but PLOS One is pretty open.

> ~70 million people read Italian. Billions read English.

It does not matter. Researchers interested in old Rome or in old canoes will be able to find it. All articles have abstracts in English and academics make extensive use of keywords and publishing databases.

I once had to translate a very old paper from Dutch before citing it, and it wasn't an insurmountable problem given the right motivation. Dictionaries were made for this.

A dictionary is only one small piece of translation. Dutch is linguistically very similar to English so it's relatively easy to learn and translate. Something like Russian is far more difficult because it comes from a more distant branch of the language family and uses a different writing system. There is a treasure trove of Russian journal articles which have never been properly translated and represent something like "scientific dark matter". LLM translation tools can help a lot to make those more accessible.

The Dutch were in an excellent position at the beginning of the quantum revolution: they could read and translate English and German, and played a key role in sharing ideas between the two centers (Berlin and London), which were not highly aware of each other's progress.

I don't think they were complaining that the world didn't read the Italian research, just that it's a shame that people from non-English speaking countries have to either be good enough at a second language to write their research in English, or need to waste time waiting for someone else to translate. Along with hope that machine translation can fix this problem and remove hurdles for international collaboration.

It's near impossible to work as a professional academic without learning English well enough to publish in it. Furthermore, in many fields you need to move countries during your academic journey and learn a language. For example my previous PI was Spanish (Catalonian actually) but did his PhD in Paris, so he needed to learn three languages in addition to his native tongue. Other lab members were French, Italian and German, all of whom again had to learn English so they could communicate with each other and also publish. The international collaboration is already there, and it's aided, not hindered, by the use of English as a common tongue.

It's a two-way street as well: if an academic can't read English, then they effectively cut themselves off from 95% of the research in their field, and most certainly the most impactful research. Lastly, most journals offer paid translation services and a lot of universities similarly offer a service, so it's almost a moot point.

I appreciate that a lot of people successfully get round the issue of learning English, but isn't this submission a direct example that there are still cases where language is a barrier? And it's not like it's a one-off case; surely there are huge amounts of research published in languages like Chinese, and probably smaller amounts in various fields published in the languages of pretty much any country by people who want to work on science even if they don't want to learn English?

I'm not arguing that it's high on the list of things that could be improved in the world of research, just that it's something that would be worth improving if and when computers are good enough to remove this friction.

Recent advances aren't even necessary; translation software has been a thing for two decades now, and even back then it should have been good enough to create a translation that can at least be indexed and interpreted as "this is relevant to my interests".

But it's down to the scientists and their publishers to decide whether to publish the papers as readable text, feed them into translators and republish them in different languages, or at least the abstract / things that scientists use to find papers.

English is the new lingua franca.

AI does a piss poor job translating nuance and context. If you cannot understand the language you are publishing an article in, you should not publish.

Currently the language of the world is English. Maybe that won’t always be true. Maybe English will gain so many loanwords that it will look completely foreign 1000 years from now. But learning a language isn’t particularly hard, and the only real difficulty seems to be Korean/Chinese/Japanese <-> English.

> But learning a language isn’t particularly hard

Actually, it is probably one of the hardest things you can attempt. You can speak a language every day for 30 years, but if you start after you're maybe 14, native speakers will be able to spot you in (literally) less than a second.

Speaking (and thus more or less thinking in) foreign languages massively stimulates your brain. You can't do much more for your (or your child's) mental development than make it learn and keep using multiple languages. Age doesn't matter, you just arrive at the destination a bit slower; e.g. I am in my forties and learning French, slowly but surely.

If they spot you, 2 things may happen - they will be nice like all normal people anywhere do and maybe even appreciate that other people are learning their non-trivial and probably a bit obscure language.

Or you hit the rest, and they will either laugh at you, ignore you or just switch to English if they want/can. This is very common in France in my decade and a half of experience going there almost every weekend, and also very frequent among the young. Their own shame and mistake: instead of embracing the future and improving themselves, they choose the other direction and watch the world slowly pass by. I think it comes from their deeply flawed education in this regard, not on languages per se but their whole view on exceptionalism and what the current world actually revolves around.

I know people say that, but I also know quite a few people who learned English after that age who sound "native". All of them now live in English-speaking countries.

So? They can't hear your accent in a journal article. Unless you're an actor the goal of an adult learner probably shouldn't be a native accent. You can speak competently but with a non-native accent, it's fine.

That's because you're only aware of the "popular" ones.

Lots of lesser-known non-PIE (Proto-Indo-European) languages out there that will make you reconsider your opinion.

There are roughly 6,000 languages in the world. My own first language, Danish, has about 6 million native speakers, and it is something like the 50th most spoken language in the world. The world is a big place.

Did you know that there are languages where you can't form a sentence without describing what direction something happens in? Like you can't just "see the house", you're either doing it up or down the mountain. And you know how in English you can't really say anything without saying if it's going on now or happened in the past? Other languages don't really care about that at all; you can speak all day without specifying if it "is" or "was".

I'm not familiar with Danish, so this background of yours may be influencing your comment, but I'm not sure what you think you are referring to with your last two sentences on the temporal aspect of English. With the Romance languages the tense is often explicitly built into the conjugation of verbs. Korean and Hindi also have conjugation systems to specify the tense. English does not have a similar system, so auxiliary verbs are relied on. Romance languages and others that have explicit tense conjugation systems are still specifying whether something "is" or "was", but through conjugation, rather than purely auxiliary verbs. Of note, though, is that in Japanese, base words do not change conjugation between tenses, but are modified with an auxiliary verb system, where the tense is specified with auxiliary verbs, as with English.

If we described tense as something like the grammaticalization of the temporal relationship between the state of affairs described in the utterance and the time of the utterance itself, then it is my understanding that this isn't really a meaningful category in e.g. Sino-Tibetan or Austronesian languages. You can specify "today" or "tomorrow", but most sentences will be ambiguous as far as time goes.

(I'm also slightly confused by your suggestion that English does not have tense outside of auxiliary verbs? "He goes" vs. "He went"?)

Imagine a world where every piece of poetry, novel or literature would be forced to be written in English. Our literature would be much poorer and many new points of view would be dismissed just because "not enough Anglo Saxon". This does not happen because editors take care of the translation, hire professional translators for that, and leave the creator alone to keep creating while happily selling the novel in 40 different languages. That is the right way to do it.

Science still lives in a previous, less sophisticated, age. We should be grateful for them not expecting us to knit our own tunics also.

The idea that somebody somewhere could have a possible solution to our urgent problems but we'll need to wait until these people learn English and "earn the right to be heard", is a disaster.

> Imagine a world where every piece of poetry, novel or literature would be forced to be written in English.

Nobody wants that world.

> because "not enough Anglo Saxon"

What is this, the year 1055? "Anglo Saxon" ceased to be a cultural group later that century. Now "Anglo-Saxon" is just a phony geopolitical spectre, mostly invoked by despotic governments as a scapegoat.

> hire professional translators for that, and let the creator alone...that is the right thing to do.

It would be if scientific publications had audiences where it made economic sense to do such a translation. Additionally, the cost of such translation for scientific literature is much higher owing to the need for great accuracy.

Most scientists work for free for the journals as reviewers, and those same journals obviously also earn money selling the product, and they aren't cheap. They didn't spend a cent on the research done. Somebody else paid for it.

How could the cost of a translator be too high when everybody is working for free for you?

> What is this, the year 1055?

If your surname is Brown your work will be treated clearly differently by journals than if it is Gutierres or Coulibaly. The editors probably don't even perceive that there is a bias here.

Including at least one member with an English name in your team is a known trick that makes it easier to be accepted by publishers.

>How could the cost of a translator be too high when everybody is working for free for you?

I don't necessarily disagree that translation would be beneficial, but it isn't just a matter of cost. As an author, I would absolutely not be comfortable with a journal translating my research without my direct involvement. Translation would require too much expert knowledge of the specific fields involved. I'd even be uncomfortable myself with translation of specific terms, even for languages I might know in non-scholarly contexts: it would require knowing the literature of the specific field in the language.

So it wouldn't be enough to have the journal hire a translator. It would be more work for the authors, and likely for others in their field who would need to be brought in.

> Imagine a world where every piece of poetry, novel or literature would be forced to be written in English. Our literature would be much poorer and many new points of view would be dismissed just because "not enough Anglo Saxon"

In such a world “written in English” wouldn’t imply “Anglo Saxon”, just as today, it doesn’t imply “British” anymore.

I think people would be better off if they all spoke the same language. The reason is that, statistically, the best literature is written in a language with many writers. So, if you grow up monolingual speaking a minority language, the best stuff you can read won’t be as good as the best stuff written in English, Spanish, Mandarin Chinese, etc. The worst stuff you can read won’t be as bad, either, but nobody reads that stuff (some may argue Hollywood is the exception that proves the rule).

Getting there will have pain points, though. Going cold turkey would cut off people from their culture (imagine children not being able to read what their parents wrote). However, a few centuries of bilingual education would, IMO, be a fairly smooth way to get there. People would no longer be able to read what their forefathers wrote, but that already is the case with most languages (few people can read Middle English fluently, for example)

That’s all assuming the “English” people would speak would be universally intelligible, though. That’s far from guaranteed to happen. Subcultures with their own words and grammar changes would still form.

> I think people would be better off if they all spoke the same language.

This has some problems related to the resulting intellectual monoculture.

I don’t see how that follows. As a counterexample, do you think there’s a monoculture in the English-speaking part of the USA?

Yes? It isn't subtle.

>some may argue Hollywood is the exception that proves the rule

Anyone who argues that has no understanding of how proof works.

"The exception that proves the rule" is an idiom that depends on a different meaning of the word "prove" from what you have in mind.

No, that sense of the expression is just nonsense based on a misunderstanding of the actual expression.

> The idea that somebody somewhere could have a possible solution to our urgent problems but we'll need to wait until these people learn English and "earn the right to be heard", is a disaster.

Gauss, Leibniz, Euler, and the Bernoullis (all of them) wrote their theses in Latin.

Methinks you aren't so much against a lingua franca but want it to be Latin(ate) and not English.

Imagine a world where all scientific articles from most regions of the world were written in a language that was only used by scientists and a few religious zealots.

Then you get something historians call the Age of Enlightenment...

> But learning a language isn’t particularly hard

I'm very curious about what makes you think that. I believe it's true only if you already speak one or two other similarly rooted languages, or if you learn the language before you're 10.

Did you learn your first foreign language as an adult and find it "not particularly hard"?

People often move to places where a language is needed to exist in that place, and then manage to learn that language. I would say becoming proficient in a language with small effort and no especial facility in languages should take no more than 2 years. With a good deal of intensive effort, focus and natural ability, 3-6 months.

This also depends a lot on the language; many of the Romance languages are relatively easy to learn, while many of the Nordic ones seem quite difficult.

Difficulty of course may depend on what you are moving from to what you will be using.

Abilities to do this range widely with individuals.

Leaving that aside, there is a concept of linguistic distance from people's first language, which makes it easier or harder to learn another language. French and English are very close to each other, as are Arabic and Hebrew. The Scandinavian languages are also all very close to each other (a little further from English), except for Finnish of course, which is kind of out there on its own. There is a nice recent paper looking at this in the academic context:

https://www.sciencedirect.com/science/article/pii/S004873332...

Essentially how far your native tongue is from the language you are attempting to learn will drastically affect how long it takes to learn it for most people.

For the other direction, based on a couple of centuries of data: https://www.state.gov/foreign-language-training/

Some languages take English L1 speakers 3x as long to learn as others.

I agree with GP's 2 years for Cat I-II languages (assuming you're mostly living there and learning the language in your spare time, not intensively).

I spoke absolutely fluent French, when I was a kid, living in Morocco. You could say a sentence; half in English, and half in French, and I'd not notice the difference.

I've forgotten almost every word. It would be quite difficult for me to relearn, at 61, and I'd likely not have anywhere near the efficacy, that I had, then.

People say how easy it is to learn new programming languages. I've probably written in a half-dozen different ones, over the years.

IME, learning the basics takes a week or two, but it takes years to really grok the language, at the fundamental level.

Chances are you haven't forgotten as much as you think and it would come back super fast if you had to use it daily again.

Having said that, it is one thing to be able to learn a language well enough to interact with people and live in a country where another language is spoken. It is a different thing to use that language to understand completely scientific/medical/law/tax forms and texts.

> It is a different thing to use that language to understand completely scientific/medical/law/tax forms and texts.

Jargon has to be learned independently. This is true of every kind of jargon, not just academic and legal stuff. If you want to talk to junkies and sound like one of them, you'll have to learn how first.

The technical term for this kind of concern is usually "register", as in "writing in an academic register".

A Chinese college student once asked me to review a paper of theirs for English quality, because their professor had criticized the English in a prior paper and they trusted me to be a native speaker (which I am). But being a native speaker didn't really help; once I saw the paper, I had to say "I'm sorry, but I don't have academic business training; I can't guarantee that anything I said would be phrased correctly."

> I've forgotten almost every word. It would be quite difficult for me to relearn, at 61

Have you tried? It would probably be a lot easier than you imagine.

Not really. Maybe you’re right.

It would not be useful, unless I lived in an area where I would use it. Maybe if I moved to Canada.

Same goes for computer languages. At one time, I was quite fluent in C++, but I can hardly recognize it, anymore.

With both programming languages and spoken ones you also have the language moving on without you.

If you remembered pre-C++0x, or god forbid, pre-Standard C++, it would probably hurt as much as it helped, as your old idioms would either be outdated by new std features, or outright dangerous by modern coding standards.

At least with old spoken languages, you'll just either sound like a small child or a character from an old movie, using slang that no one uses anymore.

Erm, what? English is literally the most important language in the world. You're a nobody if you can't speak English in most of the world.