Hacker News
by wongarsu 1250 days ago
Addendum: One possible challenge is that so far large language models are trained on a large sample of all text that has been published, while what we have of cuneiform is a decent sample of all text that has been written. That means most cuneiform tablets are inventories, invoices, requests for payment, contracts, tablets from students practicing writing, etc. These are types of documents that are underrepresented in traditional training data.