| HN Mirror



	by algoth1 72 days ago
	This really makes me wonder whether it would be feasible to train an LLM exclusively on Toki Pona (https://en.wikipedia.org/wiki/Toki_Pona).

2 comments

MarkusQ 72 days ago

There isn't enough training data though, is there? The "secret sauce" of LLMs is the vast amount of training data available + the compute to process it all.


algoth1 72 days ago

I think you could probably feed a copy of a Toki Pona grammar book to a big model and have it produce ‘infinite’ training data.


MarkusQ 72 days ago

This is essentially distillation of the bigger model; you'd wind up surfacing a lot of artifacts from the host model and amplifying them, much as repeated photocopying compounds errors.

https://dailyai.com/2025/05/create-a-replica-of-this-image-d...
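
A rough way to see the photocopying effect is a toy resampling loop. The sketch below is not real distillation: it assumes a made-up vocabulary of 120 placeholder words with Zipf-like frequencies, samples a small corpus from it, re-estimates the word frequencies from that corpus alone, and repeats. The function name `resample_generations`, the vocabulary, and the corpus size are all hypothetical choices for illustration.

```python
import random
from collections import Counter


def resample_generations(probs, corpus_size, generations, seed=0):
    """Sample a corpus from a word distribution, re-estimate it, and repeat.

    Returns the number of distinct words alive at the start and after
    each generation.
    """
    rng = random.Random(seed)
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    support_sizes = [len(tokens)]
    for _ in range(generations):
        # "Train" the next generation only on what the previous one produced.
        corpus = rng.choices(tokens, weights=weights, k=corpus_size)
        counts = Counter(corpus)
        tokens = list(counts)
        weights = [counts[t] / corpus_size for t in tokens]
        support_sizes.append(len(tokens))
    return support_sizes


# Hypothetical toy vocabulary: 120 placeholder words, Zipf-like frequencies.
raw = {f"w{i}": 1.0 / (i + 1) for i in range(120)}
total = sum(raw.values())
probs = {w: p / total for w, p in raw.items()}

sizes = resample_generations(probs, corpus_size=200, generations=30)
print(sizes)
```

A word that fails to show up in one generation's corpus has zero weight from then on, so the count of surviving words can only stay flat or shrink, and each generation's sampling noise gets baked into the next. Whether a real LLM pipeline degrades the same way depends on far more than this toy captures.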


eden-u4 72 days ago

There are not enough samples in that book to generate new "infinite" data.


mudkipdev 72 days ago

People have made Toki Pona translation models before, though none trained exclusively on it.
