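The markup in this case would be some type of semantic web format, like JSON-LD[0] or OWL[1], or something loadable into a database that can process SPARQL[2] queries. The goal would be the "inverse" of something like OWL2Vec[3]: instead of embedding an ontology into vectors, the model would produce the ontology markup.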
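To make the target concrete, here is a rough sketch of the kind of JSON-LD a model might be asked to emit, plus a tiny stdlib-only routine that turns it into RDF-style (subject, predicate, object) triples. The `ex:` vocabulary (`Crop`, `growsIn`, `TropicalClimate`) is made up for illustration, and the parser handles only a small subset of JSON-LD (a flat `@context`, `@id`, `@type`, one node). It is not a conforming JSON-LD processor; for real use you'd reach for something like rdflib or pyld.

```python
import json

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical LLM output; the ex: vocabulary is invented for this example.
llm_output = """
{
  "@context": {
    "ex": "http://example.org/agro#",
    "name": "http://schema.org/name",
    "growsIn": {"@id": "ex:growsIn", "@type": "@id"}
  },
  "@id": "ex:Soybean",
  "@type": "ex:Crop",
  "name": "Soybean",
  "growsIn": "ex:TropicalClimate"
}
"""

def expand_iri(value, context):
    """Expand a compact IRI like 'ex:Soybean' using string prefixes in the context."""
    prefix, sep, suffix = value.partition(":")
    # A suffix starting with '//' means this is already an absolute IRI (e.g. http://...).
    if sep and isinstance(context.get(prefix), str) and not suffix.startswith("//"):
        return context[prefix] + suffix
    return value

def to_triples(doc):
    """Convert a flat, single-node JSON-LD document into (s, p, o) tuples.

    Simplified subset only: literals and IRIs are both plain strings here.
    """
    ctx = doc.get("@context", {})
    subject = expand_iri(doc["@id"], ctx)
    triples = []
    for key, value in doc.items():
        if key in ("@context", "@id"):
            continue
        if key == "@type":
            triples.append((subject, RDF_TYPE, expand_iri(value, ctx)))
            continue
        term = ctx.get(key, key)
        if isinstance(term, dict):
            # Expanded term definition: '@type': '@id' marks an IRI-valued property.
            predicate = expand_iri(term["@id"], ctx)
            is_iri = term.get("@type") == "@id"
        else:
            predicate = expand_iri(term, ctx)
            is_iri = False
        obj = expand_iri(value, ctx) if is_iri else value
        triples.append((subject, predicate, obj))
    return triples

triples = to_triples(json.loads(llm_output))
for t in triples:
    print(t)
```

The point of going through triples is that they're what a SPARQL store ultimately ingests, so checking that generated markup parses into sensible triples is a cheap first sanity check on model output.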
EDIT1: A few weeks ago, a team of Brazilian researchers published a report on using ChatGPT to enhance an agriculture-focused OWL dataset[4].
EDIT2: In addition to training LLMs on ontologies, it looks like Palantir is using ontologies as guardrails to prevent LLM hallucinations[5]. Makes sense.
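I don't know how Palantir actually implements this, but the general idea of an ontology as a guardrail can be sketched roughly like this: accept an LLM-proposed triple only if its property exists in the ontology and its subject and object belong to the declared domain and range classes. The mini-ontology, individuals, and function names below are all hypothetical. Also note this treats domain/range as closed-world constraints, which is closer to SHACL-style validation than to RDFS/OWL semantics, where `rdfs:domain` would instead let a reasoner infer the subject's type.

```python
# Hypothetical mini-ontology: property -> (domain class, range class).
# A real system would load these from an OWL/RDFS file rather than hard-code them.
ONTOLOGY = {
    "growsIn": ("Crop", "Climate"),
    "treatedWith": ("Crop", "Pesticide"),
}

# Known individuals and their classes (also invented for illustration).
INSTANCES = {
    "Soybean": "Crop",
    "Maize": "Crop",
    "TropicalClimate": "Climate",
    "Glyphosate": "Pesticide",
}

def check_triple(subject, predicate, obj):
    """Return the reasons a triple violates the ontology (empty list if it passes)."""
    if predicate not in ONTOLOGY:
        return [f"unknown property {predicate!r}"]
    domain, range_ = ONTOLOGY[predicate]
    problems = []
    for role, node, expected in (("subject", subject, domain), ("object", obj, range_)):
        actual = INSTANCES.get(node)
        if actual is None:
            problems.append(f"unknown {role} {node!r}")
        elif actual != expected:
            problems.append(f"{role} {node!r} is a {actual}, expected {expected}")
    return problems

def filter_llm_triples(candidates):
    """Split LLM-proposed triples into accepted ones and rejected ones with reasons."""
    accepted, rejected = [], []
    for triple in candidates:
        problems = check_triple(*triple)
        if problems:
            rejected.append((triple, problems))
        else:
            accepted.append(triple)
    return accepted, rejected

candidates = [
    ("Soybean", "growsIn", "TropicalClimate"),     # consistent with the ontology
    ("Glyphosate", "growsIn", "TropicalClimate"),  # subject is not a Crop
    ("Maize", "curesDisease", "Rust"),             # property not in the ontology
]
accepted, rejected = filter_llm_triples(candidates)
```

Rejected triples come back with reasons, so they could be fed back into the prompt for a retry instead of just being dropped.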