by diggan 350 days ago
> If you exceed the context window the remote LLM endpoint will throw you an error which you probably want to catch

Not every endpoint works the same way. I'm pretty sure LM Studio's OpenAI-compatible endpoints will silently (from the client's perspective) truncate the context rather than throw an error. It's up to the client to make sure the context fits in those cases.

OpenAI's own endpoints do return an error and refuse the request if you exceed the context length, though. I think I've seen others use the "finish_reason" attribute to signal that the context length was exceeded, rather than setting an error status code on the response.

Overall, even "OpenAI-compatible" endpoints often aren't 100% faithful reproductions of the OpenAI endpoints, sadly.
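
To illustrate the "it's up to the client" case, here is a minimal sketch of trimming a chat history to a token budget before sending it. Everything in it is an assumption for illustration: the function names, the message shape (dicts with "role" and "content"), and especially the crude 4-characters-per-token estimate, which you would replace with the real tokenizer for your model.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def approx_tokens(text: str) -> int:
    # Crude assumption: roughly 4 characters per token.
    # Replace with the actual tokenizer for the model you are calling.
    return max(1, len(text) // 4)


def fit_to_budget(
    messages: List[Message],
    budget: int,
    count: Callable[[str], int] = approx_tokens,
) -> List[Message]:
    """Drop the oldest non-system messages until the estimated total fits.

    System messages are always kept and are returned first.
    Raises ValueError instead of silently sending an oversized request.
    """
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    total = sum(count(m["content"]) for m in system + rest)

    while rest and total > budget:
        dropped = rest.pop(0)  # oldest message goes first
        total -= count(dropped["content"])

    if total > budget:
        raise ValueError("system messages alone exceed the token budget")
    return system + rest
```

Dropping the oldest turns is only one policy (summarizing them is another); the point is that the client decides and fails loudly, instead of relying on whatever the endpoint happens to do.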

That seems like terrible API design, to just truncate without telling the caller. Anthropic, Google and OpenAI will all fail very loudly if you exceed the context window, and that's how it should be. But fair enough, this shouldn't happen anyway: the context should be actively managed before it blows up.
> That seems like terrible API design to just truncate without telling the caller

Agree, confused me a lot the first time I encountered it.

It would be great if implementations/endpoints could converge. But with OpenAI moving to the Responses API while the rest of the ecosystem seemingly still implements ChatCompletion with various small differences (like how to do structured outputs), it feels like it's getting further away, not closer...

It's complicated. For example, some models (o3) will throw an error if you set temperature.

What do you do if you want to support multiple models in your LLM gateway? Do you throw an error if a user sets temperature for o3, thus dumping the problem on them? Or do you just ignore it, potentially creating confusion because temperature will seem not to work for some models?
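
A rough sketch of the two options, with the choice left to the caller. The capability table, the function name and the exception class are all hypothetical; the only fact taken from this thread is that o3 rejects temperature.

```python
import warnings
from typing import Any, Dict, Set

# Hypothetical capability table. Only the o3/temperature entry comes from
# the discussion above; a real gateway would maintain this per model.
UNSUPPORTED_PARAMS: Dict[str, Set[str]] = {"o3": {"temperature"}}


class UnsupportedParameterError(ValueError):
    """Raised in strict mode when a model does not accept a parameter."""


def prepare_params(
    model: str, params: Dict[str, Any], strict: bool = True
) -> Dict[str, Any]:
    """Return the parameters to forward to `model`.

    strict=True  -> fail loudly on unsupported parameters.
    strict=False -> drop them, but emit a warning so it is not silent.
    """
    rejected = params.keys() & UNSUPPORTED_PARAMS.get(model, set())
    if not rejected:
        return dict(params)
    if strict:
        raise UnsupportedParameterError(
            f"{model} does not accept: {sorted(rejected)}"
        )
    warnings.warn(f"dropping {sorted(rejected)} for {model}", stacklevel=2)
    return {k: v for k, v in params.items() if k not in rejected}
```

Defaulting to strict keeps the surprising behavior opt-in, and the lenient path still leaves a trace rather than ignoring the parameter quietly.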

I'm a big fan of fail early and fail loudly.
Me too, and I'm always battling with the LLM's obsession with lazily writing reams of ridiculously defensive code and masking errors in the code it generates and calls, instead of finding the root cause and solving that.

(Yes, I'm referring to the code LLMs generate, not the API for generating code itself, but "fail early and spectacularly" should apply to all code and APIs.)

But you have to draw the line at failures that happen in the real world, or in code you can't control. I'm a huge fan of Dave Ackley's "Robust First" computing architecture, and his Moveable Feast Machine.

His "Robust First" philosophy is extremely relevant and has a lot of applications to programming with LLMs, not just hardware design.

Robust First | A conversation with Dave Ackley (T2 Tile Project) | Functionally Imperative Podcast

https://www.youtube.com/watch?v=Qvh1-Dmav34

Robust-first computing: Beyond efficiency

https://www.youtube.com/watch?v=7hwO8Q_TyCA

Bottom up engineering for robust-first computing

https://www.youtube.com/watch?v=y1y2BIAOwAY

Living Computation: Robust-first programming in ULAM

https://www.youtube.com/watch?v=I4flQ8XdvJM

https://news.ycombinator.com/item?id=22304063

DonHopkins on Feb 11, 2020 | parent | context | favorite | on: Growing Neural Cellular Automata: A Differentiable...

Also check out the "Moveable Feast Machine", Robust-first Computing, and this Distributed City Generation example:

https://news.ycombinator.com/item?id=21858577

DonHopkins on Oct 26, 2017 | parent | favorite | on: Cryptography with Cellular Automata (1985) [pdf]

A "Moveable Feast Machine" is a "Robust First" asynchronous distributed fault tolerant cellular-automata-like computer architecture. It's similar to a cellular automaton, but it differs in several important ways, for the sake of "Robust First Computing". These differences give some insight into what CA really are, and what their limitations are.

Cellular Automata are synchronous and deterministic, and can only modify the current cell: all cells are evaluated at once (so the evaluation order doesn't matter), so it's necessary to double buffer the "before" and "after" cells, and the rule can only change the value of the current (center) cell. Moveable Feast Machines are like asynchronous non-deterministic cellular automata with large windows that can modify adjacent cells.
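
To make that contrast concrete, here is a toy 1D sketch. It is not the Moveable Feast Machine or ULAM, just an illustration under my own assumptions: the synchronous step uses an XOR-of-neighbors rule with wraparound and a second buffer, while the asynchronous event picks a random site and swaps it with a neighbor in place, so it writes to an adjacent cell and needs no second buffer.

```python
import random
from typing import List


def sync_step(cells: List[int]) -> List[int]:
    """Classic synchronous CA step: every cell is computed from the
    'before' buffer and written to a separate 'after' buffer."""
    n = len(cells)
    after = [0] * n  # the second buffer
    for i in range(n):
        # Each rule application may only set its own (center) cell.
        after[i] = cells[(i - 1) % n] ^ cells[(i + 1) % n]
    return after


def async_event(cells: List[int], rng: random.Random) -> None:
    """One asynchronous event: a randomly chosen site acts on its
    window in place, and is allowed to modify a neighboring cell."""
    n = len(cells)
    i = rng.randrange(n)
    j = (i + rng.choice((-1, 1))) % n
    # Illustrative rule: swap with a random neighbor (a diffusion step).
    cells[i], cells[j] = cells[j], cells[i]


if __name__ == "__main__":
    world = [0, 0, 1, 0, 0]
    print(sync_step(world))  # deterministic: same input, same output

    rng = random.Random()
    for _ in range(10):
        async_event(world, rng)  # order of events is random
    print(world)
```

The swap rule moves things around without creating or destroying them, which is something a center-cell-only rule cannot express in a single step.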

Here's a great example with an amazing demo and explanation, and some stuff I posted about it earlier:

https://news.ycombinator.com/item?id=14236973

Robust-first Computing: Distributed City Generation:

https://www.youtube.com/watch?v=XkSXERxucPc