Hacker News
Can you LLM a custom language?
1 points by campervans 840 days ago
If token limit and accuracy are important, it seems English (or any other spoken language) is not optimal.

They're a butchered product of history and easy verbal noises.

A new custom language seems inevitable: one that is concise, unambiguous, and built from custom words. It would replace common sentences with short strings, e.g. "Once upon a time..." becomes "a1".

Most likely alphanumeric, to minimise tokens and get an order-of-magnitude increase in effective context window.

Followed by translation back to {language}

Is this possible? Anyone working on it?

(here to be educated)

1 comments

> Replacing common sentences with simple strings

This is what byte-pair encoding does. It doesn't go quite so far as to allocate only a single token to "Once upon a time", because that string isn't actually that common, but in principle it could.
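To make the merge idea concrete, here is a toy sketch of byte-pair-style training in Python. It is not how any particular tokenizer is implemented: it works on characters rather than bytes, the corpus is made up, and the function names are mine. Real tokenizers often also pre-split text so merges don't cross word boundaries, which this sketch skips.

```python
from collections import Counter

def most_frequent_pair(tokens):
    """Return the most common adjacent pair of tokens, or None if there is none."""
    pairs = Counter(zip(tokens, tokens[1:]))
    return pairs.most_common(1)[0][0] if pairs else None

def merge_pair(tokens, pair):
    """Replace each non-overlapping occurrence of `pair` with one merged token."""
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            merged.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged

def train_bpe(text, num_merges):
    """Learn up to `num_merges` merges, starting from single characters."""
    tokens = list(text)
    merges = []
    for _ in range(num_merges):
        pair = most_frequent_pair(tokens)
        if pair is None:
            break
        tokens = merge_pair(tokens, pair)
        merges.append(pair)
    return tokens, merges

# Toy corpus (an assumption for illustration): the phrase is very frequent here,
# so repeated merges keep gluing its pieces together into longer tokens.
corpus = "once upon a time " * 4
tokens, merges = train_bpe(corpus, num_merges=20)
print(len(corpus), "characters ->", len(tokens), "tokens")
print(tokens)
```

The more often a string shows up in the training text, the sooner its pieces get merged, so whether a phrase ends up as one token comes down to frequency and the merge budget.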

Trying to get humans to produce content directly in such a concise representation is a waste of time, since LLMs heavily rely on the ability to take whatever content is already available on the internet, which drastically reduces the labor cost of acquiring training data.

Makes sense, thanks for this