by all2
62 days ago
I wonder if stuffing tool-call formatting into an engram layer (see DeepSeek's engram paper) that could be swapped at runtime would be a useful solution here. The idea would be to encode tool-calling semantics once in a single layer and inject it as needed. Harness providers could then give users their own bespoke tool-calling layer, injected at model load time. Dunno, seems like it might work. I think most open source models could have an engram layer injected (some testing would be required to see where in the stack the layer fits best).
Engram layers just move the coordination problem earlier and lock it in. Coordination problems between models and providers would still exist: someone would have to inject a layer into each open source model and publish a separate variant of each. Users would still need to choose between "Qwen-8b" and "Qwen-8b-engram", multiplied across every model family and size. Is that cleaner?