Here is another way to look at the problem. There is a team of 5 people who are passionate about their indigenous language and want to keep it from disappearing. They are using AI+Coding tools to:

(1) process and prepare a ton of varied datasets for training custom text-to-speech, speech-to-text, and wake word models (because foundational models don't know this language), along with the pipelines and tooling for the contributors;
(2) design and develop an embedded device (running ESP32-S3) that acts as a smart speaker running on the edge;
(3) design and develop a backend in golang to orchestrate hundreds of these speakers;
(4) build a whole bunch of Python agents (essentially glorified RAGs over folklore and stories);
(5) build a set of websites for teachers to create course content and exercises, making them available to these edge devices.

All that, just so that kids in a few hundred kindergartens and schools would be able to practice their own native language, listen to fairy tales and songs, or ask questions. This project was acknowledged by the UN (AI for Good programme). They are now extending their help to more disappearing languages.

None of that was possible before. This sounds like good progress to me.

Edit: added newlines.
Protecting and preserving dying languages and culture is a great application for natural language processing.
For the record, I'm neither against LLMs nor against AI. What I'm primarily against is how LLMs are trained and how they use the internet via their agents without giving any citations, stripping information left and right and crying "fair use!" in the process.
Also, Go and Python are nice languages (which I use), but there are other nice ways to build agents, which also allow them to migrate, communicate, and work in other cooperative or competitive ways.
So, AI is nice and LLMs are cool, but hyping something to earn money and deskill people, and pointing to something ethically questionable and technically inferior as the only silver bullet, is not.
IOW, we should handle this thing way more carefully and stop ripping off people's work in the name of "fair use" without consent. This is nuts.
Disclosure: I'm an HPC sysadmin sitting on top of a datacenter which runs some AI workloads, too.