by throwaway314155 442 days ago
> with a "browser MCP" it is now possible: ChatGPT has a way to tell your browser "open Google maps", "show me a screenshot", "click at that position", etc

It seems strange to me to focus on this sort of standard well in advance of models being reliable enough to, ya know, actually perform these operations on behalf of the user at the level you would need for widespread adoption.

Cryptocurrency "if you build it they'll come" vibes.


I think MCPs compensate for the unreliability issue by providing a minimal, well-defined interface to a controlled set of actions. That way, the LLM doesn't have to be as reliable at working out what it needs to do and then carrying it out; it only has to choose an action from a short list.
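
To make that concrete, here's a rough sketch of what I mean by a "short list". This is plain Python, not the actual MCP SDK, and every name in it (open_url, take_screenshot, click_at, dispatch) is made up for illustration. The model only proposes a tool name plus arguments, and anything outside the list gets rejected:

```python
from typing import Callable, Dict

# Hypothetical tools; a real server would drive a browser here.
def open_url(url: str) -> str:
    return f"opened {url}"

def take_screenshot() -> str:
    return "screenshot taken"

def click_at(x: int, y: int) -> str:
    return f"clicked at ({x}, {y})"

# The entire action space the model is allowed to choose from.
TOOLS: Dict[str, Callable[..., str]] = {
    "open_url": open_url,
    "take_screenshot": take_screenshot,
    "click_at": click_at,
}

def dispatch(call: dict) -> str:
    """Run a model-proposed call only if it names a registered tool."""
    name = call.get("name")
    if name not in TOOLS:
        return f"error: unknown tool {name!r}"
    try:
        return TOOLS[name](**call.get("arguments", {}))
    except TypeError as exc:
        # Wrong or missing arguments are reported back instead of crashing.
        return f"error: bad arguments for {name}: {exc}"
```

The model can still pick the wrong tool or the wrong coordinates, of course. The interface only limits what kinds of mistakes are possible.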
You can provide an MCP for Pokemon Red, but Claude will still flounder for weeks, making absurd mistakes on a game literally designed for children.

Believe me. It's not there yet.

Is there an MCP for Pokemon Red?
Not that I'm aware of, but that actually would be an interesting project.

I was referring more broadly to ClaudePlaysPokemon, a Twitch stream where Claude is given tool calling into a Game Boy Color emulator in order to try to play Pokemon. It has slowly made progress, and I recommend looking at the stream to see just how flawed LLMs currently are at planning, even over the shortest of time horizons.

I compared the two because the tool calling API here is similar enough to an MCP configuration with the same hooks/tools (happy to be corrected on that though).

The speed with which every major LLM foundation model provider has jumped on this bandwagon feels VERY artificial and astroturfed...
Maybe because the LLM improvements haven't been that good in the last year, they needed some new thing to hype it/market it.

EDIT: Don't get me wrong, the benchmark scores are indeed higher, but in my personal experience LLMs make as many mistakes as they did before and are still too unreliable for cases where you actually need a factually correct answer.

In my opinion, this is exactly what it is. A bunch of people throwing stuff at the wall trying to show "impact."