| Fair point, but I'd push back on "none of these alternative formats exist in training data." ISON isn't inventing new syntax. It's CSV/TSV with a header, which LLMs have seen billions of times. The table format:
table.users
id name email
1 Alice alice@example.com
...is structurally identical to the markdown tables and CSVs that dominate training corpora.
On the "3x translation overhead": ISON isn't meant for LLM-to-code interfaces where you need JSON for an API call. It's for context stuffing: RAG results, memory retrieval, multi-agent state passing. If I'm injecting 50 user records into context for an LLM to reason over, I never convert back to JSON. The LLM reads ISON directly, reasons over it, and responds.
The benchmark: same data, same prompt, same task. ISON uses fewer tokens and gets equivalent accuracy. Happy to share the test cases if you want to verify. |
If your real data is already JSON (and in JS/TS apps it always is at runtime, since plain objects and arrays are the language's native data structures), it makes no sense to ever convert it, period.
Besides, the corporate-report-style CSVs in the training data don't have data shapes anything like JSON, or like most business software. You're crippling an established and useful data carrier in order to save pennies on tokens. Tokens are getting cheaper, so it's the wrong optimization.
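
To make the shape mismatch concrete, here's a rough Python sketch. `to_table` is a hypothetical helper that only mimics the three-line example quoted above (a `table.<name>` line, a column line, then space-separated rows); it is not based on any official ISON spec, and the sample records are made up for illustration. A flat record maps cleanly onto a row, but as soon as a record has a nested field there's no natural column for it, so the sketch falls back to stuffing JSON into the cell.

```python
import json


def to_table(name, records):
    """Render records in the header-plus-rows shape quoted above.

    Hypothetical helper: it assumes every record has the same keys as the
    first one and that scalar values contain no spaces.
    """
    columns = list(records[0].keys())
    lines = [f"table.{name}", " ".join(columns)]
    for rec in records:
        cells = []
        for col in columns:
            value = rec[col]
            # Scalars fit in a cell; nested values have no column form,
            # so this sketch embeds compact JSON inside the cell instead.
            if isinstance(value, (dict, list)):
                cells.append(json.dumps(value, separators=(",", ":")))
            else:
                cells.append(str(value))
        lines.append(" ".join(cells))
    return "\n".join(lines)


# Illustrative data only.
flat = [{"id": 1, "name": "Alice", "email": "alice@example.com"}]
nested = [{"id": 1, "name": "Alice", "orders": [{"sku": "A1", "qty": 2}]}]

print(to_table("users", flat))
print(to_table("users", nested))
```

The second call ends up with a JSON array inside a table cell, which is roughly the point: once the data is nested, you're either carrying JSON anyway or inventing a second table and a join convention for the model to follow.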