Hacker News new | ask | show | jobs
by all2 62 days ago
I wonder if stuffing tool call formatting into an engram layer (see Deepseek's engram paper) that could be swapped at runtime would be a useful solution here.

The idea would be to encode tool calling semantics once on a single layer, and inject as-needed. Harness providers could then give users their bespoke tool calling layer that is injected at model load-time.

Dunno, seems like it might work. I think most open source models can have an engram layer injected (some testing would be required to see where the layer best fits).

1 comments

The engram idea is actually technically clever but imo sees the solution from a bottom-up approach while Louf's real argument is a top-down view. His solution (declarative specs) solves that by centralizing the spec, making it versioned and composable, independent of any actual model.

Engram layers just move the coordination problem earlier and lock it in. Coordination problems between models & providers would still exist, requiring a layer injection in each open source model and another variant produced for each. Users would still need to chose between "Qwen-8b" and "Qwen-8b-engram" x model families and sizes. Is that cleaner?

Fair point. I don't know if it is cleaner or not.

The issue with a top-level spec, that I can see, is that models fall back to their training when it comes to tools. This is why I recommended the engram approach, because as far as I can tell the problem is a model problem not a systems problem.