by dutchbookmaker
497 days ago
Not trying to be ironic, but it would be interesting to see what the passage below would look like in the strange mixed form: "If the model's actions involve generating tokens (like in language models), then optimizing these token outputs to maximize reward could lead the model to develop a consistent, efficient way of using tokens that's specific to the problem domain. This might look like a DSL because the tokens are used in a structured, perhaps abbreviated or symbolic way that's efficient for the task, not necessarily human-readable but effective for the model's internal processing."