by monkeybutton 1607 days ago
There are plenty of ML models that are trained unsupervised on text. What you would do next with your dead-language BERT, I don't know. But you could definitely make one.
1 comment

One issue is that we don't have a lot of text, not even a megabyte of it (represented as Unicode characters). So you could get a language model, but how could you judge its output? Maybe it would be really good at generating more similar text, but that text probably isn't very representative of the things we would want to be able to read.
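
For what it's worth, here is a rough sketch of the one thing you could measure: how well a model predicts held-out text from the same corpus. It uses a character bigram model with add-one smoothing as a stand-in for a real language model, and the corpus string is a made-up placeholder, not actual data. The function names are mine. Note that a low perplexity only says the model is good at "more similar text"; it says nothing about whether the output means anything.

```python
import math
from collections import Counter

def train_bigram(text):
    """Count character bigrams and how often each character appears as context."""
    pair_counts = Counter(zip(text, text[1:]))
    context_counts = Counter(text[:-1])
    return pair_counts, context_counts

def perplexity(model, text, vocab_size):
    """Per-character perplexity of `text` under an add-one smoothed bigram model."""
    pair_counts, context_counts = model
    log_prob = 0.0
    n = 0
    for prev, cur in zip(text, text[1:]):
        p = (pair_counts[(prev, cur)] + 1) / (context_counts[prev] + vocab_size)
        log_prob += math.log(p)
        n += 1
    return math.exp(-log_prob / n)

# Placeholder corpus (an assumption for illustration only).
corpus = "abracadabra " * 200

# Corpus size in bytes when encoded as UTF-8.
size_bytes = len(corpus.encode("utf-8"))

# Hold out the last 10% of the text for evaluation.
split = int(len(corpus) * 0.9)
train, held_out = corpus[:split], corpus[split:]

vocab_size = len(set(corpus))
model = train_bigram(train)
ppl = perplexity(model, held_out, vocab_size)

print(f"corpus size: {size_bytes} bytes")
print(f"held-out perplexity: {ppl:.2f} (uniform guessing would give {vocab_size})")
```

With well under a megabyte of text, the held-out slice is tiny too, so any number like this would be noisy.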