by danielbln 350 days ago
That seems like terrible API design to just truncate without telling the caller. Anthropic, Google and OpenAI all will fail very loudly if you exceed the context window, and that's how it should be. But fair enough, this shouldn't happen anyway and the context should be actively handled before it blows up either way.
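To make the "fail very loudly" behaviour concrete, here is a minimal sketch of a pre-flight check that raises instead of truncating. Everything in it is an assumption for illustration: `ContextOverflowError`, `check_context` and `rough_token_count` are hypothetical names, and the word-based count is only a crude stand-in for a real tokenizer.

```python
class ContextOverflowError(ValueError):
    """Raised when a prompt would not fit in the model's context window."""


def rough_token_count(text):
    # Crude stand-in for a real tokenizer: one token per whitespace-separated word.
    return len(text.split())


def check_context(messages, context_limit, count_tokens=rough_token_count):
    """Return the total token count, or raise instead of silently truncating."""
    total = sum(count_tokens(m) for m in messages)
    if total > context_limit:
        raise ContextOverflowError(
            f"prompt needs {total} tokens but the limit is {context_limit}"
        )
    return total
```

Calling something like this before every request means the caller learns about the overflow at the point where it can still trim or summarize the context.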

> That seems like terrible API design to just truncate without telling the caller

Agree, confused me a lot the first time I encountered it.

It would be great if implementations/endpoints could converge, but with OpenAI moving to the Responses API rather than ChatCompletion, while the rest of the ecosystem seemingly still implements ChatCompletion with various small differences (like how to do structured outputs), it feels like it's getting further away, not closer...

It's complicated. For example, some models (o3) will throw an error if you set temperature.

What do you do if you want to support multiple models in your LLM gateway? Do you throw an error if a user sets temperature for o3, thus dumping the problem on them? Or do you just ignore it, potentially creating confusion because temperature will seem not to work for some models?
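One way to frame the two options is as an explicit switch in the gateway. This is only a sketch under assumptions: `UNSUPPORTED_PARAMS`, `prepare_request` and `UnsupportedParameterError` are hypothetical names, the `o3` entry just mirrors the example above, and `example-chat-model` is made up.

```python
import warnings

# Hypothetical table of parameters each model rejects (illustrative only).
UNSUPPORTED_PARAMS = {
    "o3": {"temperature"},
    "example-chat-model": set(),
}


class UnsupportedParameterError(ValueError):
    """Raised when a request sets a parameter the target model rejects."""


def prepare_request(model, params, strict=True):
    """Return the params to forward, handling unsupported ones explicitly."""
    rejected = UNSUPPORTED_PARAMS.get(model, set()) & set(params)
    if not rejected:
        return dict(params)
    if strict:
        # Fail early and loudly: the caller finds out before anything is sent.
        raise UnsupportedParameterError(
            f"{model} does not accept: {', '.join(sorted(rejected))}"
        )
    # Lenient mode: drop the parameters, but never silently.
    warnings.warn(f"Dropping {sorted(rejected)} for {model}", stacklevel=2)
    return {k: v for k, v in params.items() if k not in rejected}
```

With `strict=True` the user gets the error; with `strict=False` the request goes through, but the warning at least leaves a trace of why temperature appeared to do nothing.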

I'm a big fan of fail early and fail loudly.
Me too, and I'm always battling with the LLM's obsession with lazily writing reams of ridiculously defensive code and masking errors in the code it generates and calls, instead of finding the root cause and solving that.

(Yes, I'm referring to the code LLMs generate, not the API for generating code itself, but "fail early and spectacularly" should apply to all code and APIs.)

But you have to draw the line at failures that happen in the real world, or in code you can't control. I'm a huge fan of Dave Ackley's "Robust First" computing architecture, and his Moveable Feast Machine.

His "Robust First" philosophy is extremely relevant and has a lot of applications to programming with LLMs, not just hardware design.

Robust First | A conversation with Dave Ackley (T2 Tile Project) | Functionally Imperative Podcast

https://www.youtube.com/watch?v=Qvh1-Dmav34

Robust-first computing: Beyond efficiency

https://www.youtube.com/watch?v=7hwO8Q_TyCA

Bottom up engineering for robust-first computing

https://www.youtube.com/watch?v=y1y2BIAOwAY

Living Computation: Robust-first programming in ULAM

https://www.youtube.com/watch?v=I4flQ8XdvJM

https://news.ycombinator.com/item?id=22304063

DonHopkins on Feb 11, 2020 | on: Growing Neural Cellular Automata: A Differentiable...

Also check out the "Moveable Feast Machine", Robust-first Computing, and this Distributed City Generation example:

https://news.ycombinator.com/item?id=21858577

DonHopkins on Oct 26, 2017 | on: Cryptography with Cellular Automata (1985) [pdf]

A "Moveable Feast Machine" is a "Robust First" asynchronous distributed fault tolerant cellular-automata-like computer architecture. It's similar to a cellular automaton, but it differs in several important ways, for the sake of "Robust First Computing". These differences give some insight into what CA really are, and what their limitations are.

Cellular Automata are synchronous and deterministic, and can only modify the current cell: all cells are evaluated at once (so the evaluation order doesn't matter), so it's necessary to double buffer the "before" and "after" cells, and the rule can only change the value of the current (center) cell. Moveable Feast Machines are like asynchronous non-deterministic cellular automata with large windows that can modify adjacent cells.
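A toy 1D sketch of that contrast, with the caveat that it is not ULAM and not the actual Moveable Feast Machine: `ca_step`, `async_event` and the hopping-particle rule are made up for illustration, and the window here is only one cell wide on each side.

```python
import random


def ca_step(cells, rule):
    """Synchronous CA step: every cell is computed from the old buffer."""
    n = len(cells)
    # Double buffering: read only from `cells`, write only to the new list.
    return [
        rule(cells[(i - 1) % n], cells[i], cells[(i + 1) % n])
        for i in range(n)
    ]


def async_event(cells, rng):
    """One asynchronous event: a randomly chosen site updates in place.

    Unlike a CA rule, the event may write to a neighbor as well as the
    center: here a particle (1) hops into an adjacent empty site (0).
    """
    n = len(cells)
    i = rng.randrange(n)
    j = (i + rng.choice((-1, 1))) % n
    if cells[i] == 1 and cells[j] == 0:
        cells[i], cells[j] = 0, 1  # two sites change in a single event


def rule_90(left, center, right):
    # Elementary CA rule 90: the new state is the XOR of the two neighbors.
    return left ^ right


cells = [0, 0, 0, 1, 0, 0, 0]
print(ca_step(cells, rule_90))  # deterministic, and `cells` is left untouched

rng = random.Random(0)  # seeded only to make the demo repeatable
for _ in range(20):
    async_event(cells, rng)
print(cells)  # still exactly one particle; where it ends up depends on the RNG
```

The synchronous version needs the second buffer precisely because all cells update "at once"; the asynchronous version has no global step, so each event simply mutates the shared grid.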

Here's a great example with an amazing demo and explanation, and some stuff I posted about it earlier:

https://news.ycombinator.com/item?id=14236973

Robust-first Computing: Distributed City Generation:

https://www.youtube.com/watch?v=XkSXERxucPc