by Ozzie_osman 1212 days ago
It's more like "prompt augmentation" or "prompt orchestration". The classic example is doing Q&A over a corpus. You can't feed the entire corpus into a GPT3 prompt. So you embed snippets of the corpus in a vector space, then when you get a query, you vectorize it, find the nearest-neighbor snippets, and send the question plus those snippets to GPT3 so it can answer the question with the snippets as context.
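To make the flow concrete, here's a minimal sketch of the embed / nearest-neighbor / prompt steps. Big assumption: `embed` is a toy bag-of-words stand-in so the example runs offline, where a real setup would call an embeddings API. The names `top_k` and `build_prompt`, the sample corpus, and the prompt wording are all made up for illustration.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embeddings API (assumption): word-count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query, snippets, k=2):
    """Return the k snippets whose vectors are closest to the query."""
    q = embed(query)
    ranked = sorted(snippets, key=lambda s: cosine(q, embed(s)), reverse=True)
    return ranked[:k]


def build_prompt(question, snippets, k=2):
    """Assemble the text that would be sent to the completion model."""
    context = "\n".join(f"- {s}" for s in top_k(question, snippets, k))
    return (
        "Answer using only this context:\n"
        f"{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Hypothetical corpus, already split into snippets.
corpus = [
    "The refund window is 30 days from the date of purchase.",
    "Shipping to Canada takes five to seven business days.",
    "Support is available by email on weekdays.",
]

prompt = build_prompt("How many days is the refund window?", corpus, k=1)
print(prompt)
```

In practice you'd embed the corpus once and store the vectors, rather than re-embedding every snippet per query like this sketch does.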

OP's example is a little different, because he's not even using GPT3 completions. He's just using their embeddings API to vectorize product names, then when he gets a new product name, he maps it into the same space to find the nearest existing product names.
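Roughly, the embeddings-only version looks like the sketch below. This is not OP's code: `embed` here is a character-trigram stand-in (an assumption, so the example runs without an API key), and unlike real embeddings it only captures spelling overlap, not meaning. The catalog and the `nearest` helper are hypothetical.

```python
import math
from collections import Counter


def embed(name, n=3):
    """Stand-in for an embeddings API (assumption): character n-gram counts."""
    padded = f"  {name.lower()}  "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical catalog; vectors are computed once, up front.
catalog = [
    "Apple iPhone 14 Pro",
    "Samsung Galaxy S23",
    "Sony WH-1000XM5 Headphones",
]
index = [(name, embed(name)) for name in catalog]


def nearest(new_name):
    """Map a new product name into the space and return the closest known name."""
    q = embed(new_name)
    best_name, _ = max(index, key=lambda pair: cosine(q, pair[1]))
    return best_name


match = nearest("iphone 14 pro 128gb")
print(match)
```

No completion call anywhere: the whole thing is embed once, then nearest-neighbor lookup.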


Wouldn't this approach be quite brittle? For example, where would one define snippet boundaries - isn't it possible that extracting a snippet at arbitrary points may change the information within that snippet?

But then you have the issue of GPT3 token limits, so you're limited in how many of these relevant snippets you can fit into a prompt. Wondering if there's a better way to go about this (for your first example, rather than OP's use case).
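For what it's worth, two common mitigations for the boundary and token-limit concerns are overlapping chunks (so a sentence cut at one boundary still appears whole in the neighboring chunk) and greedily packing the best-ranked snippets until a budget is hit. A minimal sketch, assuming word counts as a rough proxy for tokens (real tokenizers count differently) and with `chunk_words` and `pack` as hypothetical helper names:

```python
def chunk_words(text, size=50, overlap=10):
    """Split text into word windows of `size`, each sharing `overlap` words
    with the previous window."""
    step = size - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than size")
    words = text.split()
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def pack(ranked_snippets, budget, count_tokens=lambda s: len(s.split())):
    """Greedily keep snippets, best first, while staying within the budget.
    `count_tokens` defaults to a word count, which is only an approximation."""
    chosen, used = [], 0
    for snippet in ranked_snippets:
        cost = count_tokens(snippet)
        if used + cost > budget:
            continue  # too big for the remaining room; try a smaller one
        chosen.append(snippet)
        used += cost
    return chosen


text = " ".join(str(i) for i in range(10))
chunks = chunk_words(text, size=4, overlap=1)
kept = pack(["a b c", "d e f g", "h"], budget=4)
print(chunks)
print(kept)
```

Overlap doesn't eliminate the boundary problem, it just makes it less likely that the one sentence you needed got cut in half.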

It works surprisingly well, and you can see examples if you look up the documentation of GPT-Index or Langchain (both are libraries designed to enable these types of use cases, among others). You can also get fancy. For instance, you can have GPT3 (or any LLM) create multiple "layers" of snippets: snippets of the actual text, then summaries of each section, then summaries of each chapter, and you embed all of those and pull in the relevant pieces. Or you can go back and forth with the prompt multiple times to give/get more information.
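A rough sketch of the "layers" idea, not how GPT-Index or Langchain actually implement it. Assumptions: `summarize` is a stand-in that just keeps the first sentence (in a real setup it would be an LLM call), the chapter summary is simply the section summaries joined together, and `build_layers` plus the sample data are hypothetical. Every entry it returns would then be embedded and searched like any other snippet.

```python
def summarize(text):
    """Stand-in for an LLM summary call (assumption): keep the first sentence."""
    return text.split(". ")[0].rstrip(".") + "."


def build_layers(chapters):
    """chapters maps a chapter title to a list of section texts.
    Returns one flat list of entries across all three layers."""
    entries = []
    for title, sections in chapters.items():
        section_summaries = []
        for text in sections:
            # Layer 1: the raw text itself.
            entries.append({"level": "snippet", "chapter": title, "text": text})
            # Layer 2: a summary of that section.
            summary = summarize(text)
            section_summaries.append(summary)
            entries.append(
                {"level": "section_summary", "chapter": title, "text": summary}
            )
        # Layer 3: a chapter-level summary built from the section summaries.
        entries.append(
            {
                "level": "chapter_summary",
                "chapter": title,
                "text": " ".join(section_summaries),
            }
        )
    return entries


chapters = {
    "Billing": [
        "Refunds take 30 days. Contact support first.",
        "Invoices are monthly. They are emailed.",
    ]
}
entries = build_layers(chapters)
for entry in entries:
    print(entry["level"], "|", entry["text"])
```

The point is that a broad question can match a chapter summary while a narrow one matches a raw snippet, all through the same nearest-neighbor lookup.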

I'm sure the techniques will evolve over time, but for now, these sorts of patterns (pre-index, then augmenting the prompt at query-time) seem to work best for feeding information/context into the model that it doesn't know about. The other broad family of techniques is around trying to train the model with your custom information ("fine-tuning", etc), but I think most practitioners will agree that's currently less effective for these sorts of use-cases. (Disclaimer: I'm not an expert by any means, but I've played around with both techniques and try to keep up-to-date on what the experts are saying).

Excited to see what comes of it. Lots of people will have a private corpus, and the idea that we can semantically query it sounds so interesting.

Like asking 'what streaming services am I paying for and how much have I spent on them to date?', and some tool going over your bank statements to pick out Spotify, Netflix, etc. I could see that being useful.

https://simonwillison.net/2023/Jan/13/semantic-search-answer...

IMO, prompt engineering is a good umbrella term for all of these kinds of augmentations!