| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by vessenes 239 days ago
	I’ll be interested to see benchmarks. My expectation is that accuracy will take a hit on mid or longer context prompts: I’d bet that the heavy use of JSON in fine tuning will end up impacting quality of a more terse (less reasoning space) novel encoding. That said: I like the idea!

3 comments

mattcollins 238 days ago

FWIW, I ran a test comparing LLM accuracy with TOON versus JSON, CSV and a variety of other formats when using them to represent tabular data: https://www.improvingagents.com/blog/is-toon-good-for-table-...

I've only looked at one model (gpt-4.1-nano) so far. I'm hoping to run similar tests on some other models but it gets challenging to discern statistically significant differences with better models as their accuracy tends to be a lot better across the board.

link

mattcollins 237 days ago

Results from some further tests here: https://www.improvingagents.com/blog/toon-benchmarks

link

brian-bk 239 days ago

There are a very light benchmarks in the Readme, or are you looking for more?

link

Mumps 239 days ago

Do you mean the [0] Token Benchmarks section? I only see token count numbers.

Which doesn't address the question: do LLMs understand TOON the same as they would JSON? It's quite likely that this notation is not interpreted the same by most LLM, as they would JSON. So benchmarks on, say, data processing tasks, would be warranted.

[0] https://github.com/johannschopplich/toon?tab=readme-ov-file#...

link

tujux 239 days ago

I think they're talking about these sections:

1. Retrieval Accuracy - https://github.com/johannschopplich/toon?tab=readme-ov-file#...

2. Performance by dataset - https://github.com/johannschopplich/toon?tab=readme-ov-file#...

link

saretup 238 days ago

I would assume the next iterations/fine-tuned variants of current models would reach similar accuracy for TOON as they do for JSON.

The current models unfortunately do not have TOON in their training set, so they would probably require additional input tokens to grok the notation, and even then probably won’t have the same accuracy as they do for JSON.

link