by appenz 1148 days ago
Chat history may work; it depends on how long it is and on the business model.

I don't quite understand how general summarization would work. If you use an LLM simply to summarize text in order to feed it into a prompt, the summarization needs to be specific to the query, i.e. "summarize what this text says about topic X". You can't summarize long text in a generic way without losing information. Or do I misunderstand the comment?

If you have a perfect table of contents (or better, an index by topic) you may not need semantic search. But for the typical use case we are seeing, you have unstructured data without an index (e.g. tech support knowledge base entries, company reports, emails). For that, semantic search works quite well.
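
As a rough illustration of the retrieval step described above (a sketch, not any particular product's implementation): each unstructured document is turned into a vector, the query is turned into a vector the same way, and the closest documents are returned. The `embed`, `cosine` and `top_k` names and the sample documents are made up for this example, and the bag-of-words `embed` is only a toy stand-in for a learned embedding model, so it matches on shared words rather than on meaning.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector.

    A real system would call a learned embedding model here (assumption).
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, docs, k=2):
    """Return the k documents most similar to the query."""
    q = embed(query)
    scored = [(cosine(q, embed(d)), d) for d in docs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in scored[:k]]


# Hypothetical unstructured entries with no index or table of contents.
docs = [
    "How to reset your account password from the login page",
    "Quarterly revenue report for the hardware division",
    "Troubleshooting VPN connection drops on laptops",
]

hits = top_k("I forgot my password and cannot login", docs, k=1)
print(hits[0])
```

Only the retrieved entries would then go into the prompt, which is the point when the full corpus is far larger than the context length.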

For the sizes, the observation is that the data people want to search over (e.g. your email, a wiki, JIRA, a knowledge base) is far larger than the context length. You are correct that we assume that inference cost and speed won't improve sufficiently quickly in the near future. Why is a longer topic, but in a nutshell, GPU speed increases by ~2.5x gen/gen, and other than overtraining vs. Chinchilla we don't see immediate model gains. But that is speculative; we don't know what's in store.

To some degree we are just reacting to user adoption in the market. We don't build these systems, but if we see enough of them, eventually we recognize the pattern. And while I am optimistic, we could be wrong. AI is a major revolution and we are all students.

edit: disclaimer, I work for a16z.

1 comment

Yeah, everything here seems basically reasonable, I'd quibble with a couple things but it's debatable. And we might be talking past each other a little bit on use cases. Anyway, it's a fun space.

edit: To me this is a better summary of what a vector db is useful for: https://cloud.google.com/blog/topics/developers-practitioner...

And if someone is building a chat interface which is effectively a search product then they are going to find these things useful. But it's not a generic LLM memory layer or something.