HN Mirror

	by dutchbookmaker 545 days ago
	Not trying to be ironic, but it would be interesting to see what the passage below would look like in the strange mixed form: "If the model's actions involve generating tokens (like in language models), then optimizing these token outputs to maximize reward could lead the model to develop a consistent, efficient way of using tokens that's specific to the problem domain. This might look like a DSL because the tokens are used in a structured, perhaps abbreviated or symbolic way that's efficient for the task, not necessarily human-readable but effective for the model's internal processing."