HN Mirror



	by dtagames 409 days ago
	They do build those relationships, and they do it by being trained on large, general data sets rather than specialized ones. There's no need for special markup to achieve that.


evanjrowley 408 days ago

Right. So I think the potential value here is not using a special markup to enable LLMs, but leveraging LLMs to build the special markup so that it can be applied towards other uses.


dtagames 408 days ago

I guess. If you really wanted something that took some text and wrapped it in markup, you can ask for that already with ChatGPT. It's easy to say, "look up the top selling DSLR cameras and make me a list of their features in JSON."

I can't see anyone making a special tool just for that as a general use case. Everybody wanting such output would want their own format for the markup, and we're right back to XML.

Today, it's simpler to use RAG, which is just using the LLM to figure out the "English" part, then using regular tools (ordinary procedural code) to put things in boxes, data storage, markdown, and so on. If you really want consistent output, you can't have the LLM generate it. You would need to RAG that output.
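A minimal sketch of that split, assuming the LLM only extracts loose values and ordinary code owns the output format. `fake_llm_extract`, the `Camera` fields, and the sample values are all hypothetical stand-ins, not a real API or real data:

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Camera:
    # Fixed schema owned by the program, not by the LLM.
    model: str
    megapixels: float
    price_usd: int


def fake_llm_extract(text: str) -> dict:
    # Hypothetical stand-in for an LLM call that handles the "English" part.
    # A real call could return inconsistent types, so values arrive as strings.
    return {"model": " ExampleCam X1 ", "megapixels": "24.2", "price_usd": "899"}


def to_record(raw: dict) -> Camera:
    # Procedural step: coerce loose values into the fixed schema.
    # Raises KeyError or ValueError if the extraction is missing or malformed.
    return Camera(
        model=str(raw["model"]).strip(),
        megapixels=float(raw["megapixels"]),
        price_usd=int(raw["price_usd"]),
    )


def render(records: list) -> str:
    # Deterministic serialization: same records in, same JSON out.
    return json.dumps([asdict(r) for r in records], indent=2, sort_keys=True)


record = to_record(fake_llm_extract("some product page text"))
print(render([record]))
```

The point of the design is that the JSON shape never depends on what the model happens to emit; a bad extraction fails loudly in `to_record` instead of producing a differently shaped document.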


evanjrowley 408 days ago

The markup in this case would be some type of semantic web format, like JSON-LD[0] or OWL[1], or some database that can process SPARQL[2] queries. The goal is the "inverse" of something like OWL2Vec[3].

EDIT1: A few weeks ago, a team of Brazilian researchers published a report about using ChatGPT to enhance an agriculture-focused OWL dataset[4].

EDIT2: In addition to training LLMs on ontologies, it looks like Palantir is using ontologies as guardrails to prevent LLM hallucinations[5]. Makes sense.
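To make the guardrail idea concrete, here is a rough sketch, assuming the ontology is reduced to a table of allowed properties per class and the output is JSON-LD using the schema.org context. The `ONTOLOGY` table and both helper functions are made up for illustration; this is not how Palantir or any of the linked papers do it:

```python
import json

# Toy "ontology": allowed properties per class (an assumption for illustration).
ONTOLOGY = {"Product": {"name", "brand", "description"}}


def check_against_ontology(type_name: str, props: dict) -> list:
    # Return a list of problems; an empty list means the statement is allowed.
    allowed = ONTOLOGY.get(type_name)
    if allowed is None:
        return [f"unknown class: {type_name}"]
    return [f"unknown property: {p}" for p in sorted(props) if p not in allowed]


def to_jsonld(type_name: str, props: dict) -> str:
    # Refuse to emit anything the ontology does not know about.
    problems = check_against_ontology(type_name, props)
    if problems:
        raise ValueError("; ".join(problems))
    doc = {"@context": "https://schema.org/", "@type": type_name}
    doc.update(props)
    return json.dumps(doc, indent=2)


# Pretend these properties came back from an LLM extraction step.
print(to_jsonld("Product", {"name": "ExampleCam X1", "brand": "ExampleCo"}))
```

A property the model invents (anything outside the table) is rejected before it reaches the dataset, which is the "guardrail" part; a real setup would check against an actual OWL ontology rather than a hand-written dict.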

[0] https://json-ld.org/

[1] https://www.w3.org/TR/owl2-syntax/

[2] https://www.w3.org/TR/sparql11-query/

[3] https://arxiv.org/abs/2009.14654

[4] https://arxiv.org/abs/2504.18651

[5] https://blog.palantir.com/reducing-hallucinations-with-the-o...
