|
|
|
|
|
by agentcoops
295 days ago
|
|
Not an answer to your original question, but I think you’d be surprised how much high quality historical linguistic content was hiding in the dusty old corners of the internet. I’ve been doing some work recently with LLMs on historical languages (various forms of Latin, Ancient Greek and medieval European languages) and the out-of-the-box performance of state of the art LLMs is shockingly good. It isn’t that surprising when you remember all these archive digitization projects that took place in the early 00s, but ended up either as stale links, preserved only by archive.org, or stored in arcane CRMs essentially unusable by humans. I assume the same is especially true for various historical Yiddish corpora. I ran some tests and, without fine-tuning, GPT can translate medieval German, for example, considerably better than well-known scholars today. |
|