Hacker News
by embedding-shape 102 days ago
> given there is no training dataset, any other language would work better with AI.

I guess it depends on what "would work better" really means, but I don't think it's always a given. I've made my own languages, and although there is no training set available for exactly those, an AI given a prompt seems to figure out how to use them about as effectively as any other language. I guess it helps that most languages are more similar to each other than different, but even experimenting with new syntax seems to work out OK for me.

2 comments

To me it seems like a pretty strong given, because the context window is an important constraint.

I can tell an LLM "write hello world in C", and it will produce a valid program with just that context, without needing the C language spec or stdlib definitions in the context window, because they're baked into the model weights.

As such, I can use the context window to provide, for example, information about my own function signatures, libraries, and objectives.

For a language not well represented in the training dataset, a chunk of my context has to be permanently devoted to the stdlib and syntax, and while coding, the model will have to look up stdlib function signatures and such, using up additional context.

Perhaps you're trying to argue that the number of tokens needed to describe the language, the stdlib, the basic tooling to look up function signatures, the commands to compile, etc. is too small to have a meaningful impact on the context window overall?
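
To put the trade-off in concrete terms, here is a rough back-of-the-envelope sketch. Every number in it is made up for illustration (the window size, the size of the language reference, the cost of lookups); none of it is measured, and the function names are hypothetical.

```python
def remaining_context(window_tokens: int,
                      language_reference_tokens: int,
                      lookup_tokens: int = 0) -> int:
    """Tokens left for project-specific context after the language overhead."""
    overhead = language_reference_tokens + lookup_tokens
    if overhead > window_tokens:
        raise ValueError("language overhead alone exceeds the context window")
    return window_tokens - overhead


def overhead_fraction(window_tokens: int,
                      language_reference_tokens: int,
                      lookup_tokens: int = 0) -> float:
    """Share of the context window spent just on describing the language."""
    return (language_reference_tokens + lookup_tokens) / window_tokens


# All values below are hypothetical, chosen only to show the arithmetic.
WINDOW = 200_000        # assumed context window size, in tokens
SPEC_AND_STDLIB = 30_000  # assumed cost of syntax + stdlib docs for a new language
LOOKUPS = 10_000        # assumed cost of signature lookups during a session

# Well-known language: the spec lives in the weights, so no overhead is assumed.
print(remaining_context(WINDOW, 0))
# Novel language: the spec and lookups are paid for out of the window.
print(remaining_context(WINDOW, SPEC_AND_STDLIB, LOOKUPS))
print(overhead_fraction(WINDOW, SPEC_AND_STDLIB, LOOKUPS))
```

Whether that overhead is "meaningful" depends entirely on the real values for a given language and model, which is exactly the open question here.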

In basic scenarios I agree it’s possible. I’ve similarly toyed with building a small language, and the LLM picks it up. But that takes additional context, and it has never felt like it worked “out of the box” the way, say, TypeScript does. I'm just saying that it’s an adoption barrier established languages don’t have.