| A small additional note for context: I’m not arguing that “LLMs will replace browsing” in any absolute sense, but for many users the entry point for information is visibly shifting from search → assistant.
When you actually inspect how models consume real websites today, the results are pretty uneven:

- pages with clean HTML and predictable structure get parsed reliably
- JSON-LD is used surprisingly often (but only if it’s correct and minimal)
- heavy client-side rendering breaks extraction more than people expect
- semantic markup still beats any “AI-enabled” tool by a mile
- models hallucinate less when the source has clear hierarchy and meaning

This project isn’t trying to reinvent SEO. It’s more like exploring the minimum structural guarantees that make an LLM treat a page as a trustworthy, cite-able source instead of ignoring it or misreading it.

If anyone here has done experiments with:

- how GPT, Claude, Gemini, Llama, etc. read arbitrary web pages
- failure cases in parsing / hallucination caused by layout
- the effect of metadata vs full-text signal
- or even prompt strategies for web ingestion

…I’d genuinely love to compare notes. |
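To make "minimum structural guarantees" a bit more concrete, here is a rough sketch of the kind of check I have in mind. It is not the project's actual tooling: `StructureProbe`, `probe`, and the specific signals (parseable JSON-LD, heading-level skips, landmark tags) are names and heuristics I picked for illustration, and it uses only Python's standard library on raw HTML.

```python
import json
from html.parser import HTMLParser


class StructureProbe(HTMLParser):
    """Hypothetical helper: collects JSON-LD blocks, headings, and landmark tags."""

    LANDMARKS = {"main", "article", "nav", "header", "footer", "section"}
    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.json_ld = []         # successfully parsed JSON-LD payloads
        self.json_ld_errors = 0   # JSON-LD blocks that were not valid JSON
        self.headings = []        # (level, text) in document order
        self.landmarks = set()    # semantic container tags seen
        self._in_json_ld = False
        self._heading = None
        self._buf = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "script" and attrs.get("type") == "application/ld+json":
            self._in_json_ld = True
            self._buf = []
        elif tag in self.HEADINGS:
            self._heading = int(tag[1])
            self._buf = []
        elif tag in self.LANDMARKS:
            self.landmarks.add(tag)

    def handle_data(self, data):
        # Only buffer text while inside a JSON-LD script or a heading.
        if self._in_json_ld or self._heading is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self._in_json_ld = False
            try:
                self.json_ld.append(json.loads("".join(self._buf)))
            except json.JSONDecodeError:
                self.json_ld_errors += 1
        elif tag in self.HEADINGS and self._heading is not None:
            self.headings.append((self._heading, "".join(self._buf).strip()))
            self._heading = None


def probe(html: str) -> dict:
    """Summarize a few structural signals of a static HTML document."""
    parser = StructureProbe()
    parser.feed(html)
    parser.close()
    levels = [level for level, _ in parser.headings]
    # A "skip" is a jump of more than one level down, e.g. h1 followed by h3.
    skips = sum(1 for a, b in zip(levels, levels[1:]) if b - a > 1)
    return {
        "json_ld_types": [o.get("@type") for o in parser.json_ld if isinstance(o, dict)],
        "json_ld_errors": parser.json_ld_errors,
        "h1_count": levels.count(1),
        "heading_skips": skips,
        "landmarks": sorted(parser.landmarks),
    }


# Made-up sample page, for illustration only.
SAMPLE = """<html><head>
<script type="application/ld+json">{"@type": "Article", "headline": "Hi"}</script>
</head><body><main><article>
<h1>Title</h1><h3>Oops</h3><p>text</p>
</article></main></body></html>"""

if __name__ == "__main__":
    print(probe(SAMPLE))
```

Note that this only looks at the HTML as served and does not execute JavaScript, so a heavily client-rendered page would mostly come back empty. It also says nothing about how any particular model actually weighs these signals; that part is exactly what I'd like to compare notes on.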