by LunarAurora
1228 days ago
Sadly, the very best publicly available datasets seem to be for the Gulf Arabic dialect (where the money is) [1].

I suggest you contact https://www.icompass.tn/, a (Tunisian) startup specialized in Natural Language Processing that processes Arabic dialects and African languages.

On a general note, I believe this kind of work should be (urgently) nationally funded, because otherwise these countries will be forced to use second languages like French or literary Arabic when AI/NLP becomes the dominant computing paradigm (bots, prompts...). A model in this respect is what Sweden is doing [2].

For mostly "oral" dialects (like Algerian, I guess), collaborating with big names to adapt the best transcription models (like Whisper) to them first is the key, IMO.

[1] https://nyuad.nyu.edu/en/research/faculty-labs-and-projects/...

[2] https://news.ycombinator.com/item?id=34492572
The tricky part with this kind of project is the outcome. I was thinking of it mostly as a personal side project. But if it requires more resources and effort than that, then it's a different matter.
It's not clear who'd benefit from this, beyond an interesting curiosity to toy with here and there.