by gertlabs 23 days ago

When you're working on something difficult that requires a model to reason intelligently, lower-level and strongly typed languages often outperform on the same problems [0]. We have a few hypotheses about why, with a moderately high correlation between performance and token density of the output program -- i.e. more token-dense languages are more difficult for models to reason about.

Most models come up with the least effective solutions when writing Python.

[0] https://gertlabs.com/rankings

4 comments

WatchDog 23 days ago

Your theory about token density seems reasonable, but your data doesn't seem to really match it.

Very little difference between TypeScript and JavaScript, which are essentially the same language, just one has more tokens.

Functional languages like Clojure and OCaml are pretty dense, I would have expected them to feature lower.

Kotlin is in some ways a more token dense version of Java, yet Kotlin leads, and Java is almost last.

ajb 23 days ago

That's a very interesting page, but the language ranking is wildly different for "average percentage" (Python bottom) and "success rate" (Python second). Sounds like there is some subtlety about this.

gertlabs 23 days ago

Success rate is essentially loading/compilation success + ability to adhere to the environments' rules.

For one-shot responses, the majority of failures are environmental/syntax, which naturally favors interpreted languages. For longer agentic coding sessions, models solve the environment issues quickly and it becomes a fair comparison of who comes up with the smarter solution. You can filter for that here: https://gertlabs.com/rankings?mode=agentic_coding

christophilus 23 days ago

I ran a little test with Go, TypeScript, Clojure, F#, Haskell, and Rust. Token count was roughly in the same ballpark, but it used the fewest for TypeScript, then Go. The rest required a bit more. Clojure always won in terms of lines of code though, generally coming in at half the size of the Go or TypeScript solutions.

iLemming 23 days ago

I bet you have not tested it with a live Clojure REPL. When you give the LLM a living, breathing REPL, it stops guessing, starts empirically analyzing the current state of things, and produces a working solution faster, costing far fewer tokens.

This article doesn't seem to mention it either https://martinalderson.com/posts/which-programming-languages...

even though it states that Clojure is the most token-efficient. Personally, I honestly don't care much. In my opinion (having used LLMs with multiple different languages), the specifics of PLs don't matter to the point of declaring a "clear winner". It's not the language that matters but the "stories you tell" with the language. And the greatest stories are sometimes told in languages people have long forgotten.

mamidon 23 days ago

That's ... an interesting observation. I've found that LLMs work great when they can check their work and have a clear notion of what correct looks like. e.g. a good test suite they can rely on.

I'm sure a REPL helps out with this sort of thing quite a bit!

iLemming 23 days ago

Yeah, most kids have no idea what it is like. Here's my longer comment further down in the thread. https://news.ycombinator.com/item?id=48287062

jaggederest 23 days ago

This is really cool, it's one of my goals. I think LLM programming is essentially simulated annealing and the more you can do to constrain the problem space, the better.

I wonder how LLMs would do with something like an image based system - it seems like you could pin the image and get a perfectly reproduced environment to get the LLM to make changes to, each time.

iLemming 23 days ago

We have a few backend services written in Clojure. We expose nREPL ports in our testing SDE k8s clusters. Our UI is all in TypeScript, but I use nbb with Playwright - that gives me another REPL. The LLM can poke through things in the UI REPL while simultaneously running subagents that monitor the situation in the cluster and can then dynamically change backend behavior. This all happens dynamically, without recompiling anything, without restarting, re-deploying, or even saving the code (until it works), and without losing the state.

christophilus 23 days ago

No, I didn’t. It occurred to me that I should, but I ran out of time. It’s been a hot minute since I last wrote Clojure, but it was by far my favorite codebase to read out of this experiment. It’s a great language.

pyinstallwoes 22 days ago

As someone who hasn’t felt that magic with a REPL and LLMs, how do you integrate the two? Does it have to be in Emacs, with every buffer accessible to an LLM?

iLemming 22 days ago

It depends. If you're building an Emacs package or extending your own customizations, having "every buffer accessible" helps, because that way the LLM is not dealing with Emacs (so to speak), but with the Emacs REPL, which has access to everything running within it.

That, however, doesn't really work well with other languages like Clojure. The LLM can poke into the clj REPL from the Emacs REPL by tapping into it through emacsclient, but in practice these layers start leaking pretty quickly. For Clojure (or Lisps in general), you need something that changes the fundamental model of how agents work, which is roughly the Unix/pipe model: the agent spawns a process -> reads stdout/stderr -> spawns the next process. State lives in files. Each tool invocation is stateless. This works for languages with batch-style toolchains (compile, test, run - pretty much any non-Lispy language). An agent that edits files and re-runs `clj` is doing something fundamentally different from what a Clojure developer does.
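
To make the "each tool invocation is stateless" point concrete, here is a minimal sketch in Python (chosen only for illustration; it is not what any particular agent does internally). The helper name `run_stateless` is hypothetical. Each call spawns a fresh interpreter, so a value defined in one step no longer exists in the next:

```python
import subprocess
import sys


def run_stateless(snippet: str) -> str:
    """Run one snippet in a fresh interpreter and return its combined output.

    Every call spawns a new process, so nothing defined by an earlier
    snippet is visible to a later one.
    """
    result = subprocess.run(
        [sys.executable, "-c", snippet],
        capture_output=True,
        text=True,
    )
    return (result.stdout + result.stderr).strip()


# Step 1: the "agent" defines a value. The process exits and the value is gone.
first = run_stateless("x = 41; print(x)")

# Step 2: a new process knows nothing about x, so this fails with a NameError.
second = run_stateless("print(x + 1)")

print(first)
print("NameError" in second)
```

The only way to carry `x` from step 1 to step 2 in this model is to write it to a file and read it back, which is the "state lives in files" part.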

For Lisps, you need a persistent nREPL connection, eval-in-namespace as the primary tool, the ability to inspect live values, and hot-reload awareness - not "run clj test and parse the output". For that you need a specialized MCP. There are plenty of existing solutions. I built mine in Clojure (in babashka)¹. But that's because I use Emacs. If I were using VSCode, I'd probably use BackSeatDriver².

The main point is - Lisp REPLs are great and powerful, and there's no significant obstacle to using them with LLMs. If you're using a homoiconic language and not utilizing the full power of the REPL, really, why?
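
As a rough analogy for what "eval-in-namespace as the primary tool" buys you, here is a toy sketch in Python. It is not nREPL and not any existing MCP tool; `ReplSession`, `eval_in` and `inspect` are hypothetical names, and the namespaces are plain dicts. The point is only that definitions and live values persist between calls and that a function can be redefined without losing state:

```python
import contextlib
import io


class ReplSession:
    """A toy long-lived evaluation session; an analogy, not nREPL.

    Each named namespace is a dict that survives between eval calls,
    so definitions and live values accumulate instead of being lost.
    """

    def __init__(self):
        self.namespaces = {}

    def eval_in(self, ns, code):
        """Evaluate code inside namespace ns; return printed output or the error."""
        env = self.namespaces.setdefault(ns, {})
        out = io.StringIO()
        try:
            try:
                # Expressions are evaluated and their value echoed, REPL-style.
                compiled, is_expr = compile(code, "<repl>", "eval"), True
            except SyntaxError:
                # Statements (def, assignment, ...) are executed for effect.
                compiled, is_expr = compile(code, "<repl>", "exec"), False
            with contextlib.redirect_stdout(out):
                if is_expr:
                    value = eval(compiled, env)
                    if value is not None:
                        print(repr(value))
                else:
                    exec(compiled, env)
        except Exception as exc:
            return f"{type(exc).__name__}: {exc}"
        return out.getvalue().strip()

    def inspect(self, ns, name):
        """Return the live object bound to name in namespace ns."""
        return self.namespaces[ns][name]


session = ReplSession()
session.eval_in("app.core", "counter = {'hits': 0}")
session.eval_in("app.core", "def hit():\n    counter['hits'] += 1\n    return counter['hits']")
session.eval_in("app.core", "hit()")
second_hit = session.eval_in("app.core", "hit()")

# "Hot reload": redefine the function in place; the live counter is untouched.
session.eval_in("app.core", "def hit():\n    counter['hits'] += 10\n    return counter['hits']")
after_reload = session.eval_in("app.core", "hit()")

live = session.inspect("app.core", "counter")
isolated = session.eval_in("app.other", "counter")

print(second_hit, after_reload, live)
print(isolated)
```

Here `app.other` cannot see `counter`, while `app.core` keeps it across every call, including the redefinition of `hit`. A real Lisp REPL goes well beyond this, but the shape of the tool (send a form, get a value back, keep the process alive) is the same.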

---

¹ https://github.com/agzam/death-contraptions/tree/main/tools/...

² https://github.com/BetterThanTomorrow/awesome-backseat-drive...

pyinstallwoes 20 days ago

Thank you for sharing. I never played with Clojure before but I’m very intrigued by this REPL-driven workflow from an agentic perspective, given how it changes the nature of, and the relationship to, the program/code.

I’ll look into the links and mentioned programs. Thank you.

iLemming 19 days ago

I've been programming for over four decades as a hobby and more than twenty years professionally. I have gone through dozens of languages and am still trying to learn more. Getting into Clojure remains one of the best decisions I've made in my life. I don't know why mainstream programming media - publishers, meetups, major conferences - don't talk about Lisp (as if it doesn't matter at all). Common Lisp perhaps has created this aura of elitist culture (similar to Haskell) - "you have to be this tall to ride this rollercoaster", etc. Turns out, Lisp really is not that difficult. You don't need years of accumulated knowledge to start programming in it. If you know just a single mainstream PL, you already have almost everything you need to start, because pretty much every single programming language has been influenced by Lisp.

And Clojure, unlike Haskell, is way more down-to-earth and enormously practical. I'm not saying Haskell is not, but let's be honest: it would take anyone weeks, not hours, to start writing production-grade Haskell, while Clojure needs minutes. These days you can just download the Calva VSCode extension and start playing with it.

Even if one thinks "there's just no way to use it at work", there are so many smaller things they could use it for to improve their personal workflows. "Of course, when you only know a hammer, everything looks like a nail", someone might say. Yet, for me, who has seen, learned and used dozens of different "hammers", this one does look quite interesting, for a bunch of pragmatic reasons. And my message to any "hammer-wielding craftsman" - you really don't need to try dozens of hammers to see the value in a good one, and Clojure is a pretty darn good one, I promise.

e12e 23 days ago

Curious what that would look like - and whether JavaScript or Ruby would benefit equally?

How do you work with an LLM and a REPL?

e12e 23 days ago

On a side note, for other helix users, I found two approaches to improve repl interaction from helix:

https://github.com/a3lem/replink

https://github.com/waddie/nrepl.hx

iLemming 23 days ago

When I say REPL, I specifically mean "Lisp REPL". Every step in Read.Eval.Print.Loop differs slightly in homoiconic languages like Clojure, CL, Fennel, Elisp, etc. JavaScript and Ruby in that sense can only "benefit" if they have a homoiconic language on top - e.g. ClojureScript.

e12e 23 days ago

How so? I don't think anyone would consider Smalltalk to be homoiconic? But would consider it to support a repl-like development process?

Smalltalk has system images - which AFAIK Clojure lacks (as do Python and Ruby).

I wonder if it would be possible to pair Python's ZODB with storing Python code alongside the pickled objects... And effectively create an unholy image-like workflow with IPython and ZODB?

But at any rate, I was more curious about how you mix REPL, Clojure and LLMs in practice?
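
A very rough sketch of that idea, using the standard-library `pickle` module rather than ZODB (a deliberate swap to keep it self-contained; nothing here is ZODB's API). The helpers `save_image` and `load_image` are hypothetical names. The "image" is just the source text plus the data it operates on, stored together and rebuilt on load:

```python
import os
import pickle
import tempfile


def save_image(path, source, state):
    """Persist source code and the data it operates on as one pickled 'image'."""
    with open(path, "wb") as fh:
        pickle.dump({"source": source, "state": state}, fh)


def load_image(path):
    """Rebuild a namespace: re-run the stored source, then restore the stored state."""
    with open(path, "rb") as fh:
        image = pickle.load(fh)
    namespace = {}
    exec(image["source"], namespace)  # recreate functions from their source text
    namespace.update(image["state"])  # put the saved data back
    return namespace


SOURCE = """
def total(cart):
    return sum(price * qty for price, qty in cart.values())
"""

path = os.path.join(tempfile.mkdtemp(), "session.image")
save_image(path, SOURCE, {"cart": {"tea": (3.5, 2), "mug": (8.0, 1)}})

restored = load_image(path)
print(restored["total"](restored["cart"]))  # prints 15.0
```

This is far from a Smalltalk image (no running threads, open connections or call stacks are captured), and both `pickle.load` and `exec` should only ever see files you trust.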

thenobsta 23 days ago

I would looove to see an analysis of this.

iLemming 23 days ago

I can only share my empirical anecdata - I deal with code in different languages daily. Used with an LLM in the usual way, Lisps differ little from any other compile-and-run language. But when you hook them up to a live Lisp REPL (which I admit requires some work) - it all gets very interesting.

nylonstrung 23 days ago

Even though Python code may use more characters/LoC than, say, Rust in text form, it's not necessarily more token-dense, because LLM tokenizers are good at "compressing" its English keywords.

In contrast, langs with symbol-heavy syntax (APL as an extreme example) use fewer characters but don't tokenize well in practice, so they aren't as efficient as one would think.
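
A toy illustration of the characters-versus-tokens distinction, in Python. This is not a real LLM tokenizer: the vocabulary is a hypothetical stand-in for the merged tokens a BPE tokenizer learns from frequent English-like text, and the tokenizer is a simple greedy longest-match. The two snippets are only loosely comparable (a Python-style sum and an APL-style reduction):

```python
# Hypothetical vocabulary: a few multi-character entries standing in for the
# merged tokens a real BPE tokenizer learns from frequent English-like text.
VOCAB = {"def", "return", "for", "in", "range", "sum", " ", "(", ")", ":"}


def tokenize(text, vocab=VOCAB):
    """Greedy longest-match tokenizer; unknown characters become one token each."""
    max_len = max(len(entry) for entry in vocab)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, falling back to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


def chars_per_token(text):
    """Average number of characters covered by each token."""
    return len(text) / len(tokenize(text))


wordy = "return sum(x for x in range(n))"  # keyword-heavy, Python-style
terse = "+/⍳n"                             # symbol-heavy, APL-style

for snippet in (wordy, terse):
    print(len(snippet), len(tokenize(snippet)), round(chars_per_token(snippet), 2))
```

In this toy setup the keyword-heavy line packs more characters into each token, while every symbol in the terse line costs a full token. Real byte-level tokenizers may even split a rare glyph like `⍳` into several tokens, so actual numbers need to be measured with the tokenizer of the model in question.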