Hacker News
by readitalready 44 days ago
I moved from Markdown to JSON for all spec writing about 9 months ago. Although not HTML, it still has the same benefits. Claude and the other models are just so much more reliable in a structured format like JSON/HTML/XML.

The most important thing is that I can run static analysis on a structured format, and that matters even for my spec documents. I can write data fields and have a checker analyze them, for example to confirm that database fields match across various spec documents. The static analysis is also why you use JSON/XML instead of HTML, since you can define your own custom schema.
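To make the cross-document check concrete, here is a minimal sketch of what such a checker could look like. The `entities`/`fields` layout, the two example specs, and the `field_mismatches` helper are all made up for illustration; a real setup would load the specs from files and use whatever schema you settled on.

```python
import json

# Two hypothetical spec documents; the "entities"/"fields" layout is an assumed schema.
ORDERS_SPEC = json.loads("""
{"entities": {"user": {"fields": {"id": "uuid", "email": "text"}}}}
""")
BILLING_SPEC = json.loads("""
{"entities": {"user": {"fields": {"id": "integer", "email": "text", "plan": "text"}}}}
""")


def field_mismatches(spec_a, spec_b):
    """Return readable problems for entities that both specs define."""
    problems = []
    shared = sorted(set(spec_a["entities"]) & set(spec_b["entities"]))
    for entity in shared:
        fields_a = spec_a["entities"][entity]["fields"]
        fields_b = spec_b["entities"][entity]["fields"]
        for name in sorted(set(fields_a) | set(fields_b)):
            type_a, type_b = fields_a.get(name), fields_b.get(name)
            if type_a is None or type_b is None:
                problems.append(f"{entity}.{name}: defined in only one spec")
            elif type_a != type_b:
                problems.append(f"{entity}.{name}: {type_a} vs {type_b}")
    return problems


if __name__ == "__main__":
    for problem in field_mismatches(ORDERS_SPEC, BILLING_SPEC):
        print(problem)
```

Because the specs are plain data, the same loop can run in CI over every pair of spec files, which is hard to do reliably against free-form Markdown.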

Also, don't use YAML, as it's far less reliable. (If you chop a YAML file in half, it's often still valid.)
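A quick sketch of the truncation point, using only the standard library. The `parses_as_json` helper and the sample document are illustrative. A JSON object that gets cut off fails to parse, so a truncated model response is caught immediately; a block-style YAML mapping cut at a line boundary will typically still load as a smaller but valid document.

```python
import json


def parses_as_json(text):
    """Return True if text is a complete, valid JSON document."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


full = json.dumps({"table": "users", "fields": ["id", "email", "created_at"]})
truncated = full[: len(full) // 2]  # simulate output that was cut off halfway

if __name__ == "__main__":
    print("full document parses:", parses_as_json(full))
    print("truncated document parses:", parses_as_json(truncated))
```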

2 comments

I think this is super interesting, but I think you and the OP are talking about two different problems: presenting text to end users and structuring text for agents.
This flow is really for LLM consumption, since Markdown spec documents are for LLMs anyway. And you can always write a JSON-to-Markdown converter for human use (actually, LLMs remember Markdown content better than JSON, so you should use that in your flow as well).
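A JSON-to-Markdown converter along those lines can be very small. This is only a sketch: the `title`/`sections`/`heading`/`body`/`requirements` keys are an assumed spec layout, not anything standard, and `spec_to_markdown` is a hypothetical name.

```python
def spec_to_markdown(spec):
    """Render a spec dict (assumed layout) as a Markdown document."""
    lines = [f"# {spec['title']}", ""]
    for section in spec.get("sections", []):
        lines.append(f"## {section['heading']}")
        lines.append("")
        if section.get("body"):
            lines.append(section["body"])
            lines.append("")
        for item in section.get("requirements", []):
            lines.append(f"- {item}")
        if section.get("requirements"):
            lines.append("")
    # Normalize to exactly one trailing newline.
    return "\n".join(lines).rstrip() + "\n"


EXAMPLE_SPEC = {
    "title": "Auth",
    "sections": [
        {
            "heading": "Login",
            "body": "Users sign in with email.",
            "requirements": ["Rate-limit attempts", "Hash passwords"],
        }
    ],
}

if __name__ == "__main__":
    print(spec_to_markdown(EXAMPLE_SPEC))
```

The JSON stays the source of truth, and the Markdown is a generated view for humans (or for feeding back into a prompt).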

The real change is on the generation side: the spec docs are now LLM-generated JSON based on other spec docs or human prompts. LLMs seem to write JSON better than Markdown or YAML, if you tell them to follow a schema.

For my schemas, I found LLMs really wanted to just use Markdown embedded in the strings, so I've been considering doing away with the schema. I also figure that embedding Markdown in a string may make the model perform worse, as it has to juggle nested formats, and thus escaping and such (wanted: eval for this). By replacing the JSON tool call with basic Markdown extraction, I'd lose some structured data but gain flexibility (HTML would be even more flexible).
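To illustrate the nested-format juggling, here is a small sketch of what happens when Markdown is carried inside a JSON string. The sample text and the `body` key are invented. Every newline, double quote, and backslash has to be escaped on the way in, and the model has to produce those escapes correctly while also getting the Markdown itself right.

```python
import json

# Hypothetical Markdown body with newlines, quotes, and a backslash path.
markdown = (
    "Use the `users` table.\n"
    "\n"
    "- Match paths like C:\\data\n"
    '- Quote "id" exactly\n'
)

# What the model would have to emit as a JSON tool-call argument.
encoded = json.dumps({"body": markdown})

# Decoding restores the original Markdown unchanged.
decoded = json.loads(encoded)["body"]

if __name__ == "__main__":
    print(encoded)
    print("round-trips cleanly:", decoded == markdown)
```

This only shows the mechanical overhead; whether it actually hurts output quality is the part that would still need an eval.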

Wondering if you are referring to adherence to required data in a schema when you say LLMs do better with json vs markdown, or something else? Or perhaps to tool calls and/or strict json output being more reliably formatted for clean extraction?

You can embed JSON objects in a <script> tag if you need to.
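For example, a page can carry its data in a `<script type="application/json">` block and a tool can pull it back out. The sketch below uses Python's standard `html.parser`; the `JsonScriptExtractor` class, the `spec-data` id, and the sample page are made up for illustration.

```python
import json
from html.parser import HTMLParser


class JsonScriptExtractor(HTMLParser):
    """Collect the parsed contents of <script type="application/json"> tags."""

    def __init__(self):
        super().__init__()
        self._in_json_script = False
        self._buffer = []
        self.documents = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/json":
            self._in_json_script = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_json_script:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_script:
            self.documents.append(json.loads("".join(self._buffer)))
            self._in_json_script = False


PAGE = """
<article>
  <h1>Users spec</h1>
  <script type="application/json" id="spec-data">
    {"table": "users", "fields": {"id": "uuid", "email": "text"}}
  </script>
</article>
"""

extractor = JsonScriptExtractor()
extractor.feed(PAGE)
extractor.close()

if __name__ == "__main__":
    print(extractor.documents)
```

One caveat: a literal `</script>` inside the JSON would end the tag early, so it needs escaping (for instance as `<\/script>`, which is still valid JSON).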