I built something similar using a variety of YouTube channels focused on NLP, AI, etc. The app is here: https://huggingface.co/spaces/jamescalam/ask-youtube. You can ask things like "what is a transformer model?" or "what is semantic search?"

The way I built it is documented here: https://www.pinecone.io/learn/openai-whisper/

AFAIK it's the same approach as Riley's, that is:

- Scrape the audio of YouTube videos
- Transcribe it to text with OpenAI's Whisper
- Use a sentence transformer to create embeddings of the text
- Index the embeddings (with the transcribed text, timestamps, and video URL attached) in Pinecone's vector database
- Wrap the querying functionality in a nice UI

That covers the search functionality. If you want to replicate the Q&A part, I also built something similar and wrote about it (https://youtu.be/coaaSxys5so). It's essentially the same process, but we pass the retrieved text snippets to GPT-3 along with the original question, and it generates an answer.
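For anyone who wants to see the moving parts, here is a minimal, self-contained sketch of the index/query step and the Q&A prompt assembly. It is not the code behind the app. Assumptions: a toy bag-of-words `embed` stands in for the sentence transformer, `InMemoryIndex` stands in for Pinecone, and `Record`, `build_prompt`, and the example.com URLs are hypothetical. The actual GPT-3 call is left out; you would send `prompt` to the completion endpoint.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Record:
    text: str     # transcribed snippet
    start: float  # timestamp (seconds) where the snippet begins
    url: str      # source video URL (hypothetical placeholders below)


def embed(text: str) -> Counter:
    # Hypothetical stand-in for a sentence-transformer embedding:
    # a sparse bag-of-words vector of lowercase word counts.
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse vectors (0.0 if either is empty).
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class InMemoryIndex:
    """Toy stand-in for a vector database: stores (vector, metadata) pairs."""

    def __init__(self) -> None:
        self.items: list[tuple[Counter, Record]] = []

    def upsert(self, record: Record) -> None:
        self.items.append((embed(record.text), record))

    def query(self, question: str, top_k: int = 3) -> list[Record]:
        # Rank every stored record by similarity to the question.
        q = embed(question)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[0]), reverse=True)
        return [record for _, record in ranked[:top_k]]


def build_prompt(question: str, records: list[Record]) -> str:
    # Q&A step: retrieved snippets become context for the generative model.
    context = "\n\n".join(r.text for r in records)
    return (
        "Answer the question using the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


index = InMemoryIndex()
index.upsert(Record("A transformer model uses self-attention over tokens.", 12.0, "https://example.com/video-1"))
index.upsert(Record("Semantic search retrieves documents by meaning.", 95.5, "https://example.com/video-2"))

question = "what is a transformer model?"
hits = index.query(question, top_k=1)
prompt = build_prompt(question, hits)
print(hits[0].url, hits[0].start)
print(prompt)
```

Keeping the timestamp and URL next to each vector is what lets the search UI link straight to the relevant moment in a video; a real embedding model changes the quality of the ranking, not the shape of this flow.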
Typically you'd split the text into paragraph-sized chunks to stay within the maximum input length of sentence transformers. With GPT-3 embeddings, which accept much longer inputs, you naturally have more flexibility there.
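A rough sketch of that chunking step, assuming transcript segments shaped like Whisper's `segments` output (dicts with `start`, `end`, and `text`). The function names and the `max_words` budget are hypothetical, and whitespace word count is only a proxy: sentence-transformer limits are measured in word-piece tokens, which usually outnumber words, so pick a conservative budget.

```python
def merge_segments(group: list[dict]) -> dict:
    # Combine consecutive segments into one chunk, keeping the time span.
    return {
        "text": " ".join(seg["text"].strip() for seg in group),
        "start": group[0]["start"],
        "end": group[-1]["end"],
    }


def chunk_segments(segments: list[dict], max_words: int = 100) -> list[dict]:
    """Group consecutive transcript segments into paragraph-sized chunks.

    A single segment longer than max_words still becomes its own chunk.
    """
    chunks: list[dict] = []
    current: list[dict] = []
    words = 0
    for seg in segments:
        n = len(seg["text"].split())
        if current and words + n > max_words:
            # Adding this segment would exceed the budget: close the chunk.
            chunks.append(merge_segments(current))
            current, words = [], 0
        current.append(seg)
        words += n
    if current:
        chunks.append(merge_segments(current))
    return chunks


segments = [
    {"start": 0.0, "end": 4.0, "text": " one two three"},
    {"start": 4.0, "end": 8.0, "text": " four five"},
    {"start": 8.0, "end": 12.0, "text": " six seven eight nine"},
]
chunks = chunk_segments(segments, max_words=5)
for chunk in chunks:
    print(chunk)
```

Each chunk keeps the start time of its first segment, so a search hit can still link to the right point in the video. Overlapping neighbouring chunks by a segment or two is a common variation if answers tend to straddle chunk boundaries.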