by Fade_Dance
321 days ago
Agreed, and big context windows are key to mass adoption in wider use cases beyond chatbots (random ex: in knowledge management apps, being able to parse the entire note library/section and hook it into global AI search), but those use cases are decidedly not areas where $200-per-month subscriptions can work. I'd hazard that cost and context window size are the two key metrics for bridging that chasm with acceptable results. As for software engineering though, that cohort will be demanding on all fronts for the foreseeable future, especially because there's a bit of a competitive element. Nobody wants to be the vibecoder using sub-par tools compared to everyone else showing off their GitHub results and making sexy blog posts about it on HN.
|
For example, a chatbot doing recipe work should have a RAG DB that, by default, returns entire recipes. A vector DB is actually not the solution here; any number of traditional DBs (relational or even a document store) would work fine. Sure, do a vector search across the recipe texts, but then fetch the entire recipe from someplace else. Current RAG solutions can do this, but the majority of RAG deployments I have seen don't bother; they just abuse large context windows.
Which looks like it works, except what you actually have in your context window is 15 different recipes all stitched together. Or if you put an entire recipe book into the context (which is perfectly doable nowadays!), you'll end up with the chatbot mixing up ingredients and proportions between recipes, because you just voluntarily polluted its context with irrelevant info.
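To make the "search with vectors, fetch the whole recipe from someplace else" idea concrete, here's a rough sketch. Everything in it is made up for illustration: the sample recipes, the `embed`, `build_store`, and `retrieve` names, and especially the bag-of-words "embedding", which is only a stand-in for a real embedding model. SQLite plays the boring traditional DB that holds the full recipes.

```python
import math
import sqlite3
from collections import Counter

# Hypothetical sample data; real recipes would already live in your database.
RECIPES = {
    1: ("Pancakes", "Ingredients: 200 g flour, 2 eggs, 300 ml milk. "
                    "Whisk into a batter and fry in butter."),
    2: ("Tomato soup", "Ingredients: 1 kg tomatoes, 1 onion, 500 ml stock. "
                       "Simmer for 20 minutes, then blend."),
    3: ("Omelette", "Ingredients: 3 eggs, 20 g butter, salt. "
                    "Beat the eggs and cook gently in butter."),
}


def embed(text):
    """Toy bag-of-words vector; an assumption standing in for a real model."""
    return Counter(word.strip(".,:").lower() for word in text.split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def build_store():
    """Put full recipes in SQLite and keep a separate id -> vector index."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE recipes (id INTEGER PRIMARY KEY, title TEXT, body TEXT)"
    )
    db.executemany(
        "INSERT INTO recipes VALUES (?, ?, ?)",
        [(rid, title, body) for rid, (title, body) in RECIPES.items()],
    )
    index = {rid: embed(f"{title} {body}") for rid, (title, body) in RECIPES.items()}
    return db, index


def retrieve(query, db, index, k=1):
    """Rank recipe ids by similarity, then fetch each whole recipe by id."""
    q = embed(query)
    top_ids = sorted(index, key=lambda rid: cosine(q, index[rid]), reverse=True)[:k]
    return [
        db.execute("SELECT title, body FROM recipes WHERE id = ?", (rid,)).fetchone()
        for rid in top_ids
    ]


db, index = build_store()
context = retrieve("tomato soup with onion", db, index, k=1)
```

The vector index only ever answers "which recipe?"; what goes into the prompt is one complete row from the database, not a pile of loosely related chunks. Keeping `k` small is the whole point: one intact recipe beats fifteen stitched-together fragments.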
Large context windows allow for sloppy practices that end up making for worse results. Kind of like when we decided web servers needed 16 cores and gigs of RAM to run IBM WebSphere back in the early 2000s, to serve up mostly static pages. The availability of massive servers taught bad habits (huge complicated XML deployment and configuration files, oodles of processes communicating with each other to serve a single page, etc).
Meanwhile, in the modern world, I've run mission-critical, high-throughput services for giant companies on a K8s cluster consisting of 3 machines, each with 0.25 CPU and a couple hundred megs of RAM allocated.
Sometimes more is worse.