In addition, I think token efficiency will continue to be a problem. So you could imagine very terse programming languages that are roughly readable for a human, but optimized to be read by LLMs.
That's an interesting idea. But IMO the real 'token saver' isn't in the language keywords but it's in the naming of things like variables, classes, etc.
There are languages that are already pretty sparse with keywords. e.g in Go you can write 'func main() string', no need to define that it's public, or static etc. So combining a less verbose language with 'codegolfing' the variables might be enough.
I'm not an expert in LLMs, but I don't think character length matters. Text is deterministically tokenized into byte sequences before being fed as context to the LLM, so in theory `mySuperLongVariableName` uses the same number of tokens as `a`. Happy to be corrected here.
Running it through https://platform.openai.com/tokenizer "mySuperLongVariableName" takes 5 tokens. "a", takes 1. mediumvarname is 3 though. "though" is 1.
You're more likely to save tokens in the architecture than the language. A clean, extensible architecture will communicate intent more clearly, require fewer searches through the codebase, and take up less of the context window.
I think there's a huge range here - ChatGPT to me seems extra verbose on the web version, but when running with Codex it seems extra terse.
Claude seems more consistently _concise_ to me, both in web and cli versions.
But who knows, after 12 months of stuff it could be me who is hallucinating...
Its not verbose to some of us. It is explicit in what it does, meaning I don't have to wonder if there's syntatic sugar hiding intent. Drastically more minimal than equivalent code in other languages.
Code readability is another, correlating one, but this is more subjective. To me go scores pretty low here - code flow would be readable were it not for the huge amount of noise you get from error "handling" (it is mostly just syntactic ceremony, often failing to properly handle the error case, and people are desensitized to these blocks so code review are more likely to miss these).
For function signatures, they made it terser - in my subjective opinion - at the expense of readability. There were two very mainstream schools of thought with relation to type signature syntax, `type ident` and `ident : type`. Go opted for a third one that is unfamiliar to both bases, while not even having the benefits of the second syntax (e.g. easy type syntax, subjective but that : helps the eye "pattern match" these expressions).
I would be very interested in this research... I'm trying to write a language that is simple and concise like Python, but fast and statically typed. My gutfeeling is that more concise than Python (J, K, or some code golfing language) is bad for readability, but so is the verbosity of Rust, Zig, Java.
There are languages that are already pretty sparse with keywords. e.g in Go you can write 'func main() string', no need to define that it's public, or static etc. So combining a less verbose language with 'codegolfing' the variables might be enough.