by blizzardman 820 days ago
Thank you

I am imagining something more complicated, like having a chatbot with the personality of a character in the book answering hypothetical questions. Like asking Gandalf why he didn't send the eagles to drop the ring! Or asking Dumbledore why he doesn't create a Horcrux himself and fight Voldemort, since he was able to defeat him once at the Ministry of Magic.

So here is my understanding of how pre-GPT-3 models like BERT are used:

1. BERT or any sentence transformer model generates embeddings for the entire book (the search space)

2. You pipe your query into the same model to generate an embedding (the query)

3. You run ANN (LSH, PQ) or brute-force KNN over the search-space embeddings with your query, essentially finding the passages with the highest dot product (or, equivalently for normalized vectors, the smallest distance)
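The three steps above can be sketched as a brute-force search in plain Python. This is only a rough illustration: `embed` here is a toy bag-of-words stand-in for a real sentence transformer, and `embed`, `dot`, `top_k` and the sample passages are all hypothetical names and data made up for the example, not taken from any book or library.

```python
import math
from collections import Counter

def embed(text):
    """Toy stand-in for a sentence transformer (an assumption for this sketch):
    L2-normalised bag-of-words counts stored as a sparse dict."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {word: c / norm for word, c in counts.items()}

def dot(a, b):
    """Dot product of two sparse vectors; higher means more similar."""
    return sum(value * b.get(word, 0.0) for word, value in a.items())

def top_k(query, passages, k=2):
    """Brute-force KNN: score every passage, keep the k highest dot products."""
    q = embed(query)
    scored = [(dot(q, embed(p)), p) for p in passages]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]

# Made-up passages standing in for chunks of a book.
passages = [
    "the eagles carried the travellers away from the mountain",
    "the ring was forged in the fires of the mountain",
    "the wizard arrived late to the party",
]
results = top_k("why not send the eagles", passages, k=1)
print(results[0][1])
```

Note that the search only returns the closest passages from the book. It never writes a new answer, which is why an LLM is still needed for the in-character reply.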

What I am having trouble understanding is that a sentence transformer does not give you an answer in the voice of the book's character, but an LLM does.

How do I build a chat app that does that? Do I just use the OpenAI API? Or can I train my own LLM, or use an off-the-shelf LLM like Llama?

1 comments

There is an AI app for this already:

https://chatfai.com/characters/book

I haven't tried it, but if I had to guess how it is built they're probably just setting up RAG vector databases at a per-book level and then augmenting a given character's context window with information from the vector database relevant to the conversation.
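To be clear, this is a guess at the general pattern and not how that app is actually built. "Augmenting a character's context window" could be as simple as pasting the retrieved passages into a persona prompt. `build_prompt`, its wording, and the placeholder passages are all hypothetical:

```python
def build_prompt(character, book, retrieved_passages, question):
    """Assemble a persona prompt from passages returned by the vector search.

    The instruction wording is an assumption for illustration; tune it for
    whichever model you end up using.
    """
    context = "\n\n".join(
        f"[{i + 1}] {passage}" for i, passage in enumerate(retrieved_passages)
    )
    return (
        f"You are {character} from {book}. "
        "Stay in character and answer in the first person.\n"
        "Treat the excerpts below as your memory of events.\n\n"
        f"Excerpts:\n{context}\n\n"
        f"Question: {question}\n"
        f"{character}:"
    )

prompt = build_prompt(
    character="Gandalf",
    book="The Lord of the Rings",
    # Placeholders: in a real app these come from the vector database.
    retrieved_passages=["(retrieved passage 1)", "(retrieved passage 2)"],
    question="Why didn't you send the eagles?",
)
print(prompt)
```

The resulting string is what you would send to the LLM (llama.cpp, a hosted API, etc.) as the prompt.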

It would be relatively trivial (weekend project) to roll your own using Streamlit + Qdrant/pgvector + ggerganov's llama.cpp and a suitable model such as Vicuna/Mistral/etc. The hardest part would be splitting an entire book into a well-represented set of embeddings.

Thank you for the recommendation. RAG would make sense. Here is my understanding of how to do it:

1. Use a sentence transformer to turn the entire Harry Potter or Lord of the Rings book into embeddings

2. Transform the query into an embedding -> "why didn't Gandalf send the eagles"

3. Find the most relevant text by running ANN search with the query embedding

4. Pipe the context + query into Llama

However, the results are not very good. Am I missing something in RAG?

Make sure you're using a SOTA embedding model (UAE, text-embedding-ada-002, etc.) that can create a vector from a reasonably long token sequence; see here for comparisons: https://huggingface.co/spaces/mteb/leaderboard

Experiment with a "sliding scale" of chunk sizes across the book (paragraphs, pages, etc.). Try using a graph to relate book sections to one another.
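One way to read the chunk-size suggestion is overlapping word windows whose size you can vary. A minimal sketch, assuming whitespace tokenization is good enough for experimenting; `sliding_chunks` and its parameters are hypothetical names:

```python
def sliding_chunks(text, size=200, overlap=50):
    """Split text into windows of `size` words, each sharing `overlap`
    words with the previous window."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks

# Tiny demo with numbers as "words" so the overlap is easy to see.
demo_text = " ".join(str(i) for i in range(10))
demo_chunks = sliding_chunks(demo_text, size=4, overlap=2)
print(demo_chunks)
```

Embedding the same book at a few different `size` values (roughly sentence, paragraph, page) and comparing retrieval quality is the experiment being suggested.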

Consider setting up a tuner with well-defined questions and answers to search for the embedding setup that retrieves best.
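A tuner like that might look roughly like this: a small set of questions, each paired with a phrase the retrieved chunk should contain, and a loop over chunk sizes that reports the hit rate. Everything here is an assumption for illustration, including the toy `embed` function (swap in your real embedding model) and the names `hit_rate` and `tune`:

```python
import math
from collections import Counter

def embed(text):
    """Toy stand-in for a real embedding model: normalised word counts."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {word: c / norm for word, c in counts.items()}

def similarity(a, b):
    """Dot product of two sparse vectors."""
    return sum(value * b.get(word, 0.0) for word, value in a.items())

def chunk(text, size):
    """Non-overlapping chunks of `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def hit_rate(text, qa_pairs, size, k=1):
    """Fraction of questions whose expected phrase appears in the top-k chunks."""
    chunks = chunk(text, size)
    vectors = [embed(c) for c in chunks]
    hits = 0
    for question, expected_phrase in qa_pairs:
        q = embed(question)
        ranked = sorted(
            range(len(chunks)),
            key=lambda i: similarity(q, vectors[i]),
            reverse=True,
        )[:k]
        hits += any(expected_phrase in chunks[i] for i in ranked)
    return hits / len(qa_pairs)

def tune(text, qa_pairs, sizes, k=1):
    """Grid search over chunk sizes; returns {size: hit rate}."""
    return {size: hit_rate(text, qa_pairs, size, k) for size in sizes}

# Made-up corpus and question set, just to show the shape of the data.
corpus = "alpha beta gamma delta epsilon zeta eta theta"
qa_pairs = [("where is gamma", "gamma"), ("where is theta", "theta")]
scores = tune(corpus, qa_pairs, sizes=[2, 8])
print(scores)
```

The same loop extends to other knobs (overlap, embedding model, `k`); the important part is having a fixed question set so that runs are comparable.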