by maheshvaikri99 181 days ago
Fair point, but I'd push back on "none of these alternative formats exist in training data."

ISON isn't inventing new syntax. It's CSV/TSV with a header, which LLMs have seen billions of times. The table format:

  table.users
  id name email
  1 Alice alice@example.com

...is structurally identical to markdown tables and CSVs that dominate training corpora.
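A minimal Python sketch of serializing flat records into that layout. It assumes only what the example shows (a `table.<name>` line, a line of field names, then one space-separated row per record). `to_ison_table` is a hypothetical helper, and quoting cells that contain spaces with JSON-style string literals is a guess, not something taken from an ISON spec.

```python
import json

def to_ison_table(name, records):
    """Serialize flat dicts into an ISON-style table block (assumed layout)."""
    if not records:
        return f"table.{name}\n"
    fields = list(records[0])  # header comes from the first record's keys
    lines = [f"table.{name}", " ".join(fields)]
    for rec in records:
        cells = []
        for field in fields:
            value = rec.get(field, "")
            if isinstance(value, (dict, list)):
                # A single header row has no place for nested structure.
                raise ValueError(f"nested value in field {field!r}")
            text = str(value)
            # Hypothetical quoting rule: quote empty cells and cells that
            # contain whitespace or quotes so the row still splits cleanly.
            if text == "" or '"' in text or any(c.isspace() for c in text):
                text = json.dumps(text)
            cells.append(text)
        lines.append(" ".join(cells))
    return "\n".join(lines) + "\n"

users = [
    {"id": 1, "name": "Alice", "email": "alice@example.com"},
    {"id": 2, "name": "Bob Smith", "email": "bob@example.com"},
]
print(to_ison_table("users", users), end="")
# table.users
# id name email
# 1 Alice alice@example.com
# 2 "Bob Smith" bob@example.com
```

Raising on nested values, rather than silently stringifying them, keeps the "flat data only" assumption explicit.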

On the "3x translation overhead": ISON isn't meant for LLM-to-code interfaces where you need JSON for an API call. It's for context stuffing: RAG results, memory retrieval, multi-agent state passing.

If I'm injecting 50 user records into context for an LLM to reason over, I never convert back to JSON. The LLM reads ISON directly, reasons over it, and responds.

The benchmark: same data, same prompt, same task. ISON uses fewer tokens and gets equivalent accuracy. Happy to share the test cases if you want to verify.


That's exactly the problem. Why convert anything, especially if it's as lossy as CSVs are? You lose nesting and the rest of your structure in favor of a single header row. That's not a benefit.

If your real data is in JSON (and in JS/TS apps it always is at runtime, as only JSON objects exist in that language), it makes no sense to ever convert it, period.

Besides, the corporate-report-style CSVs in training materials don't have data shapes anything like JSON, or even like most business software. You're crippling an established and useful data carrier in order to save pennies on tokens. Tokens are getting cheaper, so it's the wrong optimization.

Fair enough. Let me clarify the use case:

ISON isn't meant to replace JSON in your application. Your JS/TS code still uses JSON objects internally. ISON is specifically for the LLM context window.

The flow: App (JSON) → serialize to ISON → inject into prompt → LLM reasons → response → your app
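The flow above, sketched in Python under a few assumptions: `build_prompt` and `ison_table` are hypothetical helpers, the prompt template is made up for illustration, the serializer does no quoting (it assumes cell values contain no spaces), and the actual model call is left out since no particular LLM API is implied.

```python
import json

def ison_table(name, records):
    # Minimal ISON-style table, assuming the layout from the example above:
    # a table.<name> line, a header line, then space-separated rows.
    fields = list(records[0])
    rows = [" ".join(str(r[f]) for f in fields) for r in records]
    return "\n".join([f"table.{name}", " ".join(fields), *rows])

def build_prompt(app_json, question):
    """App (JSON) -> serialize to ISON -> inject into prompt."""
    data = json.loads(app_json)  # the app's own JSON, e.g. RAG results
    context = "\n\n".join(ison_table(name, recs) for name, recs in data.items())
    # Hypothetical prompt template; the LLM call itself is left out.
    return f"Context:\n{context}\n\nQuestion: {question}"

app_json = json.dumps({"users": [
    {"id": 1, "name": "Alice", "email": "alice@example.com"},
    {"id": 2, "name": "Bob", "email": "bob@example.com"},
]})
prompt = build_prompt(app_json, "What is the email of user 2?")
print(prompt)
```

The model's answer comes back as plain text, so nothing in this path converts ISON back to JSON.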

You're right that nesting is lost. But for LLM reasoning, flat structures often work better. LLMs struggle with deeply nested JSON: they lose track of parent-child relationships 4+ levels deep.

On "tokens are getting cheaper": True for API costs. But context windows are still limited. When you're stuffing RAG results, memory, agent state, and user history into 128K tokens, every token matters. It's not about saving money; it's about fitting more context.
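A toy illustration of the budget argument: with a fixed window, a smaller encoding per chunk means more chunks survive. `pack_context` is a hypothetical greedy packer, and the characters-per-token estimate is a crude assumption rather than a real tokenizer.

```python
def estimate_tokens(text):
    # Rough rule of thumb (assumption): about 4 characters per token for
    # English text. Use the target model's tokenizer for real counts.
    return max(1, len(text) // 4)

def pack_context(chunks, budget_tokens):
    """Greedily keep chunks, in priority order, while they fit the budget."""
    packed, used = [], 0
    for chunk in chunks:
        cost = estimate_tokens(chunk)
        if used + cost > budget_tokens:
            continue  # too big for what's left; a later, smaller chunk may fit
        packed.append(chunk)
        used += cost
    return packed, used

chunks = ["r" * 400, "m" * 800, "a" * 200]  # e.g. RAG hits, memory, agent state
packed, used = pack_context(chunks, budget_tokens=250)
print(len(packed), used)  # 2 150
```

Here the 200-token middle chunk is dropped; if it were encoded more compactly, it might fit alongside the others.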

On "wrong optimization": I ran the benchmark. Same data, same task. ISON: 88.3% accuracy. JSON: 84.7%. The LLM actually performed better with the tabular format, not just "equivalent for fewer tokens."

## BENCHMARK STATS:

  TOKEN EFFICIENCY:
    ISON: 3,550 tokens
    JSON: 12,668 tokens
    ISON vs JSON: 72.0% reduction

  LLM ACCURACY (300 Questions):
    ISON: 265/300 (88.3%)
    JSON: 254/300 (84.7%)

  EFFICIENCY (Acc/1K):
    ISON: 24.88
    JSON: 6.68
    ISON is 272.3% MORE EFFICIENT than JSON!
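The derived figures can be reproduced from the raw counts, assuming "Acc/1K" means accuracy in percent divided by thousands of tokens (that reading matches every number in the stats):

```python
ison_tokens, json_tokens = 3_550, 12_668
ison_correct, json_correct, questions = 265, 254, 300

reduction = 1 - ison_tokens / json_tokens      # share of tokens saved
ison_acc = 100 * ison_correct / questions      # accuracy in percent
json_acc = 100 * json_correct / questions
ison_eff = ison_acc / (ison_tokens / 1000)     # accuracy points per 1K tokens
json_eff = json_acc / (json_tokens / 1000)
gain = ison_eff / json_eff - 1                 # relative efficiency gain

print(f"reduction {reduction:.1%}")                   # reduction 72.0%
print(f"accuracy  {ison_acc:.1f}% vs {json_acc:.1f}%")  # accuracy  88.3% vs 84.7%
print(f"acc/1K    {ison_eff:.2f} vs {json_eff:.2f}")    # acc/1K    24.88 vs 6.68
print(f"gain      {gain:.1%}")                        # gain      272.3%
```

Most of the efficiency gap comes from the token count, not the accuracy difference: the token ratio alone is about 3.57x, and the 88.3% vs 84.7% accuracy adds only a small factor on top.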

But I hear you: if your data is deeply nested and that nesting carries semantic meaning the LLM needs, JSON might be the right choice. ISON works best for relational/tabular data going into context.
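That rule of thumb could be encoded as a pre-serialization check. A rough sketch, assuming "tabular" means every record is a dict with the same keys and only scalar values. `is_tabular` and `pick_format` are hypothetical helpers, not part of any ISON library.

```python
SCALARS = (str, int, float, bool, type(None))

def is_tabular(records):
    """True if all records are dicts with identical keys and scalar values."""
    if not records or not all(isinstance(r, dict) for r in records):
        return False
    keys = set(records[0])
    return all(
        set(r) == keys and all(isinstance(v, SCALARS) for v in r.values())
        for r in records
    )

def pick_format(records):
    # Flat, relational rows go into an ISON table; anything with nesting
    # (or ragged keys) stays JSON so its structure isn't lost.
    return "ison" if is_tabular(records) else "json"

print(pick_format([{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]))  # ison
print(pick_format([{"id": 1, "tags": ["admin", "ops"]}]))                  # json
```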