by amiantos 783 days ago
I think some commenters are missing the point of this post when they point out that an LLM is the wrong tool for the Harry Potter task. That's literally the point of the post: people are actively trying to use LLMs this way, right now. A lot of hucksters and former/current salespeople, specifically.

Long context windows are advertised because there are an awful lot of people out there trying to make money replacing customer service agents with LLM-powered chatbots. To power them naively (which is the only way salespeople who have never built software know how), you need to feed them a context window full of all your industry- and product-specific knowledge and then hope that the LLM answers the same way.

But they don't, and they can't, so you have to spend a lot of time figuring out how to tie the LLM down so it responds exactly the way you want, which sort of ruins the hype and mystery that LLMs hold for ordinary people. It's been sold as a miracle that can replace people, but the truth is, well, not that, which really hampers the sales process. I think we're seeing this with Elon Musk desperately trying to push people into "FSD". People aren't impressed by AIs that aren't at least as good as people, if not better, at whatever task they are supposed to do.

3 comments

I've made AI assistants that are perfectly accurate with products, pricing, etc. yet still maintain a human quality: https://github.com/bennyschmidt/ragdoll-studio/tree/master/e...

You can accomplish this with RAG.

Your overall point is taken, though: the LLM itself is not enough, fine-tuning is not always feasible, and I think that no matter how good an AI persona gets at, say, teaching yoga, for some yoga students it will never replace an in-person instructor.

However, for a game NPC, online agent, Discord bot, etc., not to mention research, translation, tutorials, and summarizing, there is a lot of present-day utility for LLMs.
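For anyone unfamiliar with RAG, here is a toy sketch of the retrieval step in Python. It is not how Ragdoll or any particular library does it: the catalog entries, the function names, and the word-overlap scoring are all made up for illustration (real systems typically use embeddings and a vector index). The idea is just that you look up the relevant facts first and put only those in the prompt, rather than stuffing everything into the context window.

```python
import re

def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query, documents, k=1):
    """Return the k documents sharing the most words with the query.

    Word overlap is a stand-in for real similarity search (an assumption
    made to keep the sketch dependency-free).
    """
    query_words = tokenize(query)
    ranked = sorted(
        documents,
        key=lambda doc: len(query_words & tokenize(doc)),
        reverse=True,
    )
    return ranked[:k]

def build_prompt(query, documents, k=1):
    """Build a prompt containing only the retrieved facts plus the question."""
    context = "\n".join(retrieve(query, documents, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

# Hypothetical product facts, invented for this example.
catalog = [
    "The Basic plan costs $10 per month.",
    "The Pro plan costs $25 per month.",
    "Support is available by email on weekdays.",
]

print(build_prompt("How much does the Pro plan cost?", catalog))
```

The prompt that comes out would then be sent to whatever model you use; the pricing comes from your own data instead of from whatever the model happens to remember.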

I find that the level of hype is strongly correlated with how involved the author is in making their own LLM-chatbot-based products, which is not surprising.

Interesting idea with ragdoll, but I'd hate to try to compete with Tavern/Pygmalion cards + Lorebooks; it seems like there is a critical mass there already for RP chatbots.

I could do a better job with messaging, but Ragdoll isn't about RP chatbots, although you can chat with the personas to test whether they know how to speak correctly. You could form friendships with them, I suppose, if you want. But the main purpose is creative deliverables: concept art, music, videos, copy, voiceovers, SFX, etc. It's like you're building an AI cast and crew for your story, game, or video, or to commission as an artist or musician.

There is a chat mode where the UI looks like a chat (communication is key with these guys), but there are also views like Picture mode where you paste images to create concept art or upload a still to generate a movie from - so it's more like a creative studio with different views (one of which is chat).

The best part of it all is you're like the boss or conductor, directing a staff of 1 or 2 or hundreds of AI personas all with specific knowledge and abilities to create whatever you need.

We're focused on the bad example because it's literally the title of the article, and the model's inability to solve that problem has nothing to do with context windows and everything to do with "when all you have is a hammer".

It doesn't matter whether the context window is large or small: the Harry Potter Problem as formulated is going to be just as hard, because it's not a problem of false advertising about context window sizes, it's a problem inherent to the computing paradigm.

A version of the Harry Potter Problem that was formulated around a model's ability to recall specific scenes of a novel would be much more useful as an illustration of the limitations of the supposedly-large context windows.

Well the same principle of false advertising re: context window sizes also applies to its inability to count, no? AI companies claim that their models can do math, so wouldn't a regular developer assume that they can also count?

And if I can't trust a so-called SOTA model to partially answer - say, recall each mention of the word "wizard" instead of just giving me the wrong answer - then why should I trust it to list out specific scenes? That's even harder to benchmark.
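For reference, the counting half of this is a few lines of ordinary code, which is the "wrong tool" point made upthread. A minimal Python sketch, assuming the book is available as plain text; the sample string and the function name are made up for illustration:

```python
import re

def count_mentions(text, word):
    """Count whole-word, case-insensitive occurrences of `word` in `text`."""
    pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
    return len(pattern.findall(text))

# Made-up sample text, not a quote from the book.
sample = "The wizard met another Wizard. Wizards are not counted as whole-word matches."

print(count_mentions(sample, "wizard"))  # prints 2
```

Whether "Wizards" or "wizardry" should count is a choice you'd have to make either way; the difference is that here the choice is explicit and the answer is the same on every run.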

> It's been sold as a miracle that can replace people, but the truth is, well, not that, which really hampers the sales process

How soon until the hype fades and we enjoy our next AI winter?