Hacker News
by jamesbriggs 1275 days ago
I built something similar using a variety of YouTube channels focused on NLP, AI, etc. The app is here https://huggingface.co/spaces/jamescalam/ask-youtube - you can ask things like "what is a transformer model?" or "what is semantic search?"

The way I built it is documented here: https://www.pinecone.io/learn/openai-whisper/

Afaik it's the same approach as Riley, that is:

- Scrape audio of YouTube videos

- Transcribe to text with OpenAI's Whisper

- Use a sentence transformer to create embeddings of the text

- Index embeddings (with transcribed text, timestamps, and video URL attached) in Pinecone's vector database

- Wrap up the querying functionality in a nice UI

(this is for the search functionality)
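
A minimal sketch of the index-and-query steps above, not the code behind the app. Everything here is a stand-in: `embed` uses plain word counts where a sentence transformer would go, `ToyIndex` is an in-memory list where Pinecone would go, and the transcripts and URLs are made up for illustration.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


def embed(text: str) -> Counter:
    """Toy stand-in for a sentence transformer: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class Record:
    vector: Counter
    text: str             # transcribed snippet
    start_seconds: float  # timestamp of the snippet in the video
    url: str              # video URL


class ToyIndex:
    """In-memory stand-in for a vector database (hypothetical, not a real client)."""

    def __init__(self) -> None:
        self.records: list[Record] = []

    def upsert(self, text: str, start_seconds: float, url: str) -> None:
        # Store the embedding together with the metadata needed to show a result.
        self.records.append(Record(embed(text), text, start_seconds, url))

    def query(self, question: str, top_k: int = 3) -> list[tuple[float, Record]]:
        # Score every record against the question and keep the best top_k.
        q = embed(question)
        scored = [(cosine(q, r.vector), r) for r in self.records]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:top_k]


index = ToyIndex()
index.upsert("semantic search retrieves passages by meaning rather than keywords",
             12.0, "https://example.com/video-a")
index.upsert("a transformer model uses attention layers",
             48.5, "https://example.com/video-b")

score, best = index.query("what is semantic search?", top_k=1)[0]
print(best.url, best.start_seconds)
```

Keeping the text, timestamp, and URL next to each vector is what lets a search hit link straight to the right moment in the video.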

If you want to replicate the Q&A part, I also built something similar and wrote about it (https://youtu.be/coaaSxys5so). It's essentially the same process, but we pass the retrieved text snippets to GPT-3 along with the original question, and it generates an answer.
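
A rough sketch of that last step, assuming the snippets have already been retrieved. The function name `build_prompt`, the prompt wording, and the `max_chars` budget are all assumptions for illustration; the resulting string would then be sent to whichever completion model you use.

```python
def build_prompt(question: str, snippets: list[str], max_chars: int = 3000) -> str:
    """Pack retrieved snippets plus the original question into one prompt string."""
    context_parts: list[str] = []
    used = 0
    for snippet in snippets:
        # Stop adding context once the (assumed) character budget would be exceeded.
        if used + len(snippet) > max_chars:
            break
        context_parts.append(snippet)
        used += len(snippet)
    context = "\n---\n".join(context_parts)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


prompt = build_prompt(
    "what is semantic search?",
    ["semantic search retrieves passages by meaning rather than keywords"],
)
print(prompt)
```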

2 comments

I should add, Riley used the ada embedding model (rather than sentence transformers). Performance-wise they should be similar (in their ability to encode meaning accurately), but the ada model can encode a much larger chunk of text. I don't know the exact numbers, but it's something like 1-2 pages of text in a typical corporate PDF, whereas sentence transformers are typically limited to around a paragraph of text.

Typically you'd split the text into paragraph-sized chunks to handle this limit of sentence transformers. With GPT-3 embeddings you naturally have more flexibility there.
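
One simple way to get paragraph-sized chunks is a sliding window over words. This is only a sketch: `chunk_words` is a hypothetical helper, the 100-word window and 20-word overlap are arbitrary assumptions, and word count is just a rough proxy for the model's real token limit.

```python
def chunk_words(text: str, max_words: int = 100, overlap: int = 20) -> list[str]:
    """Split text into overlapping chunks of at most max_words words."""
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be >= 0 and smaller than max_words")
    words = text.split()
    step = max_words - overlap  # how far the window moves each time
    chunks: list[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        # Stop once the current window has reached the end of the text.
        if start + max_words >= len(words):
            break
    return chunks


transcript = " ".join(f"w{i}" for i in range(250))  # made-up 250-word transcript
chunks = chunk_words(transcript, max_words=100, overlap=20)
print(len(chunks))
```

The overlap means a sentence cut at a chunk boundary still appears whole in the neighbouring chunk. With a larger-context embedding model you could just raise `max_words`.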

Thank you :)