by trjordan 340 days ago
Seems like it could easily be training data set size as well.

I'd love to see some quantification of errors in q/kdb+ (or hebrew) vs. languages of similar size that are left-to-right.

4 comments

>Seems like it could easily be training data set size as well.

I'm convinced that's the case. On any major LLM I can carpet bomb Java/Python boilerplate without issue. For Rust, at least the last time I checked, it comes up with nonexistent traits, hallucinates more often, and generally struggles to use the context effectively. In agent mode it turns into a fist fight with the compiler, often ending in credit-destroying loops.

And don't get me started when using it for Nix...

So not surprised about something with orders of magnitude smaller public corpus.

I realized this too, and it led me to the conclusion that LLMs really can't program. I did some experiments to find what a programming language would look like, instead of e.g. python, if it were designed to be written and edited by an LLM. It turns out that it's extremely verbose, especially in variable names, function names, class names, etc. Actually, it turned out that classes were very redundant. But the real insight was that LLMs are great at naming things, and performing small operations on the little things they named. They're really not good at any logic that they can't copy paste from something they found on the web.
Is this really a surprise? I'd hazard a guess that the ability to program and beyond that - to create new programming languages - requires more than just probabilistic text prediction. LLMs work for programming languages where they have enough existing corpus to basically ape a programmer having seen similar enough text. A real programmer can take the concepts of one programming language and express them in another, without having to have digested gigabytes of raw text.

There may be emergent abilities that arise in these models purely due to how much information they contain, but I'm unconvinced that their architecture allows them to crystallize actual understanding. E.g. I'm sceptical that there'd be an area in the LLM weights that encodes the logic behind arithmetic and gives rise to the model actually modelling arithmetic as opposed to just probabilistically saying that the text `1+1=` tended to be followed by the letter `2`.

> I did some experiments to find what a programming language would look like, instead of e.g. python, if it were designed to be written and edited by an LLM.

Did your experiment consist of asking an LLM to design a programming language for itself?

Yes. ChatGPT 4 and Claude 3.7. They led me to similar conclusions, but they produced very different syntax, which led me to believe that they were not just regurgitating from a common source.
Great, so your experiment just consisted of having an LLM hallucinate.

That's not really an experiment is it? You basically just used them to create a hypothesis but you never actually proved anything

They're great at writing text and code so the fact that the other LLM was able to use that syntax to presumably write code that worked (which you had no way of proving since you can't actually run that code) doesn't really mean anything

It would be similar to having it respond in a certain JSON format, they are great at that too. Doesn't really translate to a real world codebase

  > That's not really an experiment is it? You basically just used them to create a hypothesis but you never actually proved anything
The experiment was checking how well another unrelated LLM could write code using the syntax. And then in the reverse direction in new sessions.

  > They're great at writing text and code so the fact that the other LLM was able to use that syntax to presumably write code that worked (which you had no way of proving since you can't actually run that code) doesn't really mean anything
Of course I could check the code. I had no compiler for it, but "running" code in one's head without a compiler is something first year students get very good at in their Introduction To C course. I also checked how the models edit and modify the code.

This isn't a published study, it was an experiment. And it influenced how I use LLMs for work, for the better. I'd even call that a successful experiment, now that I better understand the strengths and limitations of LLMs in this field.

Is there a reason you believe the models can accurately predict this sort of thing?
There wasn't, but after taking the syntax that I developed with one model to another model, and having it write some code in that syntax, it did very well. Same in the other direction.

LLMs need all their context within easy reach. An LLM-first (for editing) language still has code comments and docstrings. Identifier names are long, and functions don't really need optional parameters. Strict typing is a must.
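
To make those properties concrete, here is a rough sketch in Python rather than in the syntax from the experiment, which isn't shown in this thread. The function name, the parameters, and the invoice scenario are all made up for illustration. Python also doesn't enforce the annotations by itself, so "strict typing" here assumes a separate type checker.

```python
def calculate_invoice_total_in_cents(
    unit_prices_in_cents: list[int],
    quantities_purchased: list[int],
    sales_tax_rate_in_percent: int,
) -> int:
    """Return the invoice total in whole cents, including sales tax.

    Every parameter is required (no optional parameters), and every name
    says what it holds and in which unit, so the context needed to edit
    this function sits right next to the code.
    """
    if len(unit_prices_in_cents) != len(quantities_purchased):
        raise ValueError("each unit price needs exactly one matching quantity")

    # Subtotal: price times quantity, summed over all line items.
    subtotal_in_cents: int = sum(
        unit_price_in_cents * quantity_purchased
        for unit_price_in_cents, quantity_purchased in zip(
            unit_prices_in_cents, quantities_purchased
        )
    )

    # Integer arithmetic only; the tax is rounded down to whole cents.
    sales_tax_in_cents: int = subtotal_in_cents * sales_tax_rate_in_percent // 100
    return subtotal_in_cents + sales_tax_in_cents


invoice_total_in_cents = calculate_invoice_total_in_cents([1000, 250], [2, 4], 10)
print(invoice_total_in_cents)
```

Whether this style actually helps a model is the claim under discussion; the sketch only shows what "long names, docstrings, no optional parameters, explicit types" looks like in practice.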

In my experience, Claude works well at writing Rust, and Gemini is terrible. Gemini writes Rust as if it's a C++ programmer who has spent one day learning the basics of Rust.
I tried Gemini, OpenAI, Copilot, and Claude on a reasonably big Rust project. Claude worked well for fixing use declarations, clippy warnings, renames, refactorings, and CI. I used the highest-cost Claude with custom context per crate. I was never able to get it to write new code well.

For Nix, it is a nice template engine to start from or to search with. I did not try big Nix changes.

Yep. I had similar issues asking Gemini for help with F#; I assume lack of training data is the cause.
Hebrew is still written sequentially in Unicode. The right-to-left aspect there is simply about how the characters get displayed. On mixed documents, there is U+200E and U+200F to change the text direction mid stream.

From the perspective of an LLM learning from Unicode, this would appear as a delimiter that needs to be inserted on language direction boundaries; but everything else should work the same.

I know I'm being pedantic, but I just want to point out that even U+200E/U+200F are generally not needed. If you put a Hebrew word in the middle of an English sentence, it displays correctly all by itself. This is due to the Unicode bidirectional algorithm, which defines a super sensible default behavior. You only need the RTL control characters in weird circumstances, perhaps ones involving punctuation marks or unusual uses of special characters.
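
A quick way to see the "stored sequentially, displayed right to left" point is to look at the code points directly. This is a minimal sketch using Python's standard `unicodedata` module; the sample string is made up. It only inspects each character's bidirectional class, which is the input the bidi algorithm works from, and does not run the algorithm itself.

```python
import unicodedata

# "abc", then the Hebrew word "shalom", then "def".
# The Hebrew letters are stored in the order they are read: shin first.
text = "abc \u05e9\u05dc\u05d5\u05dd def"


def bidi_classes(s):
    """Return the Unicode bidirectional class of each character in s."""
    return [unicodedata.bidirectional(ch) for ch in s]


classes = bidi_classes(text)

# Latin letters are class 'L', Hebrew letters are class 'R', spaces are 'WS'.
# A renderer uses these classes to decide the visual order; memory order
# is unchanged and no control characters are needed for this string.
for ch, cls in zip(text, classes):
    print(f"U+{ord(ch):04X} {cls}")

# The explicit marks mentioned above are ordinary zero-width characters
# that carry a strong direction of their own.
print(unicodedata.bidirectional("\u200e"), unicodedata.bidirectional("\u200f"))
```

So from a tokenizer's point of view the Hebrew word is just four more code points in reading order, sitting between the two English words.
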
> Hebrew is still written sequentially

Everything is written sequentially in the sense that the character that is written first can only be followed by the character that is written next. In this sense writing non-sequentially is logically impossible.

An older Hebrew encoding actually encoded the last character first, then the penultimate character, then the character preceding that, etc.

Exercise to the reader to guess how line breaks, text wrapping, and search algorithms worked.
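
For anyone who wants a hint at the exercise, here is a small sketch. It simulates last-character-first storage by simply reversing a Python string, which is an assumption for illustration and not a model of any specific legacy encoding. The sample phrase and helper names are made up.

```python
# "shalom olam" in logical (reading) order, as modern Unicode stores it.
logical = "\u05e9\u05dc\u05d5\u05dd \u05e2\u05d5\u05dc\u05dd"


def to_visual_order(s):
    """Simulate visual-order storage: the last character is stored first."""
    return s[::-1]


def naive_wrap(s, width):
    """Break the stored character stream into fixed-width lines."""
    return [s[i:i + width] for i in range(0, len(s), width)]


visual = to_visual_order(logical)
query = "\u05e9\u05dc\u05d5\u05dd"  # "shalom", typed in reading order

# Search: a query in reading order matches logical storage, but misses
# visual storage unless the query is reversed first.
found_in_logical = query in logical
found_in_visual = query in visual
found_with_reversed_query = query[::-1] in visual
print(found_in_logical, found_in_visual, found_with_reversed_query)

# Wrapping: cutting the visual stream at a fixed width puts the second
# word of the sentence on the first line.
first_line = naive_wrap(visual, 5)[0]
print(first_line.strip() == "\u05e2\u05d5\u05dc\u05dd"[::-1])
```

In other words, with visual-order storage every operation that cares about reading order (search, wrapping, sorting) has to undo the reversal first.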

Multiple characters can be written at once, they can also be done in reverse or out of order.
No no, the second character you write must always be temporally preceded by the character you wrote first. Otherwise the second wouldn't have been the second, but the first, and moreover, the first would have been the second, which it wasn't.
You could write multiple characters simultaneously. CRTs sort-of did that, for example, starting characters with ascenders before those without and finishing the characters without descenders before those with descenders.

So, in the word “gif”, they would start writing the “f” first and finish writing the “i” first (just before writing the last part of the “f”). For “if”, writing the “f” would start before writing the “i” started and finish after writing the “i” finished.

In traditional printing “writing” can happen simultaneously for an entire page, but colour printing can make things more complex.

I encourage you to find some place that still uses a Hebrew typewriter. When they have to type numbers, they'll type the number in backwards. And an old Hebrew encoding also encoded characters in reverse order.
I think parent just means that "backwards" is a relative term. Your backwards is someone else's "forward". For someone who is used to reading Hebrew, they would be used to reading right to left and this would seem completely natural, no?

Basically, the numbers 1234 and 4321 are identical assuming one is written left to right and the other is right to left. Then it's just a convention which way you are used to reading.

I know nothing of Old (or New) Hebrew unfortunately so I may be completely off base.

> (or hebrew)

W.r.t. natural languages, TFA clarifies it a bit:

> And it’s not the same as translation to Arabic or Hebrew; direction here refers to the temporal order in which the tokens are produced; even for right-to-left languages, the order in which the tokens get produced remains unchanged; rather, a thin display layer handles the visual presentation.

That’s what I thought. Lack of training data might be a reason.