
by dtagames 410 days ago

Text, whether "semantic" or not, just gets tokenized and stored as weighted numbers in a model. It loses all its "semantic-ness."
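Here is a toy illustration of the tokenization point. The regex tokenizer and the `toy_tokenize`/`to_ids` helpers are hypothetical; real models use learned subword tokenizers such as BPE. Wrapping a headline in `<h1>` tags just adds more integer IDs around the same IDs the plain headline gets.

```python
import re

def toy_tokenize(text: str) -> list[str]:
    # Hypothetical tokenizer: runs of word characters, or single non-space symbols.
    return re.findall(r"\w+|[^\w\s]", text)

def to_ids(tokens: list[str], vocab: dict[str, int]) -> list[int]:
    # Give each unseen token the next free integer ID, then map tokens to IDs.
    for tok in tokens:
        vocab.setdefault(tok, len(vocab))
    return [vocab[tok] for tok in tokens]

vocab: dict[str, int] = {}
plain = to_ids(toy_tokenize("Big News Today"), vocab)
tagged = to_ids(toy_tokenize("<h1>Big News Today</h1>"), vocab)
print(plain)   # [0, 1, 2]
print(tagged)  # [3, 4, 5, 0, 1, 2, 3, 6, 4, 5]
```

The tags become ordinary entries in the sequence (`<`, `h1`, `>`, `/`), with no special status compared to the headline words.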

So I would say the opposite is true. AI tools are removing the need for special declarative wrappers around a lot of text. For example, there's no need to surround a headline with <H1> when you can ask a GPT to "get the headlines from all these articles."

There are a couple of kinds of wrapping that do help when working with LLMs: Markdown in prompts, and JSON and XML in system instructions for MCP. But RAG refers to the non-LLM end of the process, getting data from files or a database, so the style of the training data doesn't directly affect how that works.
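A minimal sketch of that non-LLM retrieval step, assuming a tiny in-memory corpus and plain keyword overlap in place of the embedding search many RAG setups use. The `retrieve` and `build_prompt` helpers and the file names are hypothetical. Nothing here touches a model; only the final prompt string, with Markdown headings marking each source, would be sent to one.

```python
import re

def words(text: str) -> set[str]:
    # Lowercased word set; a stand-in for a real search index or embeddings.
    return set(re.findall(r"\w+", text.lower()))

def retrieve(query: str, docs: dict[str, str], k: int = 2) -> list[str]:
    # Rank documents by how many query words they share. No LLM involved.
    q = words(query)
    ranked = sorted(docs, key=lambda name: len(q & words(docs[name])), reverse=True)
    return ranked[:k]

def build_prompt(query: str, docs: dict[str, str], k: int = 2) -> str:
    # Only this last step produces LLM input; Markdown headings label each source.
    parts = [f"## {name}\n{docs[name]}" for name in retrieve(query, docs, k)]
    return "\n\n".join(parts) + f"\n\n## Question\n{query}"

docs = {
    "pricing.md": "The Pro plan costs 20 dollars per month.",
    "faq.md": "Refunds are available within 30 days.",
    "team.md": "Our team is based in Berlin.",
}
question = "How much does the Pro plan cost per month?"
print(retrieve(question, docs, k=1))  # ['pricing.md']
print(build_prompt(question, docs, k=1))
```

Swapping keyword overlap for a vector database changes how documents are ranked, but it is still ordinary lookup code that runs before the model sees anything.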