Hacker News new | ask | show | jobs
by klodolph 269 days ago
This does not make any sense to me. Can you elaborate on this?

It seems “obvious” to me that if you have a tool which can request a web page, you can make it so that this tool extracts the main content from the page’s HTML. Maybe there is something I’m missing here that makes this more difficult for LLMs, because before we had LLMs, this was considered an easy problem. It is surprising to me that the addition of LLMs has made this previously easy, efficient solution somehow unviable or inefficient.

I think we should also assume here that the web site is designed to be scraped this way—if you don’t, then “Accept: text/markdown” won’t work.

1 comments

If you have a website and you're optimizing it for GEO, you can't assume that the agents are going to have the glue. So as the person maintaining the website you implement as much of the glue as possible.
That sounds completely backwards. It seems, again, obvious to me that it would be easier to add HTML->markdown converters to agents, given that there are orders of magnitude more websites out there compared to agent.

If your agent sucks so bad that it isn’t capable of consuming HTML without tokenizing the whole damn thing, wouldn’t you just use an agent that isn’t such a mess?

This whole thing kinda sounds crazy inefficient to me.