HN Mirror


	by bertil 1445 days ago
	Are there online corpora, like Wikipedia, that could be used to train the models? Are they under a license permissive enough to allow model training? If the languages are spoken, then with enough budget a library of voices could be recorded. I think you’d prefer that collection to be gathered and maintained by a non-profit rather than Meta.

1 comment

For Nahuatl, I found the Nahuatl-language Wikipedia: https://nah.wikipedia.org/wiki/Cal%C4%ABxatl

I wonder whether 7,065 articles are enough to train the model.