Show HN: Scribbler – Podcast Summaries Using GPT (app.scribbler.so)
91 points by _fill 1104 days ago
Hey, we're Phil and Ian, the founders of Scribbler.

We're huge podcast fans, but found we never had enough time to soak it all in. So, we built Scribbler - a tool that leverages GPT to condense podcast episodes into bite-sized summaries for when life's too busy.

Now, we can catch the best bits from any episode, discover new shows, and best of all, stop wasting valuable time figuring out what's worth listening to and what's not. We hope you'll find it useful!

11 comments

You've basically created a really great show notes generator. Kudos.

What I'd really value is a podcast-powered GPT chatbot, or at the very least, a very good search engine.

Podcasts like Peter Attia's or Paul Saladino's contain so much good knowledge on human biology in the context of nutrition, but it's buried in longform conversations. I often wish I could find a "soundbite", or in this case, a "textbite". Paul has had guests perfectly articulate the top 10 functions of insulin, or give perfect explanations of the value of saturated fat and its demonization by the sugar industry. Hell, there is a plethora of knowledge about basic salt that you don't find very easily on Google.

Being able to search for or rapidly recall things like this would be so useful.
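A minimal sketch of the kind of "textbite" search being asked for, assuming transcripts have already been split into timestamped snippets. The `Snippet` type, the toy snippets, and the plain TF-IDF scoring are illustrative assumptions, not how any product in this thread works; a real system would more likely use BM25 or embeddings.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Snippet:
    episode: str    # hypothetical episode title
    start_s: float  # offset into the episode, in seconds
    text: str


def tokenize(text: str) -> list[str]:
    # Lowercase and keep alphanumeric runs only.
    return re.findall(r"[a-z0-9]+", text.lower())


def search(snippets: list[Snippet], query: str, k: int = 3) -> list[Snippet]:
    """Rank snippets by a simple smoothed TF-IDF score against the query."""
    docs = [Counter(tokenize(s.text)) for s in snippets]
    n = len(docs)
    # Document frequency: how many snippets contain each term.
    df = Counter(term for doc in docs for term in doc)
    q_terms = set(tokenize(query))
    scored = []
    for snippet, doc in zip(snippets, docs):
        # Absent terms contribute 0 because Counter returns 0 for missing keys.
        score = sum(doc[t] * (math.log((n + 1) / (df[t] + 1)) + 1) for t in q_terms)
        if score > 0:
            scored.append((score, snippet))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [s for _, s in scored[:k]]


# Toy snippets, made up for illustration.
snippets = [
    Snippet("Episode A", 754.0, "Insulin signals cells to take up glucose from the blood."),
    Snippet("Episode A", 1320.0, "We talked about sleep and training volume."),
    Snippet("Episode B", 95.0, "Salt intake came up in the second half."),
]
for hit in search(snippets, "what does insulin do with glucose"):
    print(f"{hit.episode} @ {hit.start_s:.0f}s: {hit.text}")
```

Returning the snippet's start offset is what lets a result jump straight to the spot in the audio.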

I built a proof of concept [0] of exactly this because I listen to tons of podcasts, but retrieving that info later is a pain. I left it at that because I saw there were plenty of other efforts doing something similar (e.g. [1]).

[0] https://youtu.be/Q6G2m4xw3E4

[1] https://podsmart-frontend.vercel.app/

You should try https://dexa.ai/ (they're supporting a limited number of podcasts though)
Hey Solvency, thanks for the feedback. We definitely have more rigorous search, along with question & answer, on our roadmap.
That's kind of what's missing from all these summarisation services - what are the actual bits of content that are worth listening to amongst all the padding?
Can you describe anything in the UI that would help with that?
I know the podcast app Snipd (https://www.snipd.com/) offers some similar things. It will build chapters with AI generated summaries of each chapter.

Not sure if they are leveraging that for search and discovery, but it looks like they do (https://www.snipd.com/podcasters)

A user can request it for any podcast and it doesn't seem to take that long the times I have tried it.

Hey we've actually checked out snipd as well. We're excited that there are others that are also competing in this space. What are some of the features you like that we are lacking?
Thanks for having Latent Space on it! :) I noticed we have two entries (because we changed the full name of the podcast) and some episodes are there twice if we tweaked the title. Are you pulling from our RSS feed?

Also curious about what models you are using on the backend; we use Claude 100k to do timestamps generation for show notes and whisper-diarization [0] for transcription. The main post for each episode is manually written though, as we try to write a higher level summary of the episode + topics in it.

[0] https://replicate.com/thomasmol/whisper-diarization
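For LLM-based timestamp generation like this, one common approach (an assumption here, not necessarily what Latent Space does) is to give the model a transcript in which every line carries its start time, so the show-note timestamps it returns come from real offsets. The segment shape (`start`, `speaker`, `text`), the `SPEAKER_00` labels, and the prompt wording are hypothetical; check the actual output format of whichever diarization model you use.

```python
def hms(seconds: float) -> str:
    """Format a second offset as HH:MM:SS (fractional seconds are dropped)."""
    total = int(seconds)
    h, rem = divmod(total, 3600)
    m, s = divmod(rem, 60)
    return f"{h:02d}:{m:02d}:{s:02d}"


def timestamped_transcript(segments: list[dict]) -> str:
    """Render segments as '[HH:MM:SS] SPEAKER: text' lines.

    Consecutive segments from the same speaker are merged into one line,
    keeping the timestamp of the first segment in the run.
    """
    lines: list[str] = []
    prev_speaker = None
    for seg in segments:
        text = seg["text"].strip()
        if seg["speaker"] == prev_speaker and lines:
            lines[-1] += " " + text
        else:
            lines.append(f"[{hms(seg['start'])}] {seg['speaker']}: {text}")
        prev_speaker = seg["speaker"]
    return "\n".join(lines)


# Hypothetical prompt; the wording is an illustration only.
PROMPT = (
    "Below is a podcast transcript. Each line starts with its timestamp.\n"
    "List the main topics as show notes, one per line, in the form "
    "'HH:MM:SS - topic', using only timestamps that appear in the transcript.\n\n"
)

# Made-up segments standing in for diarization output.
segments = [
    {"start": 0.0, "speaker": "SPEAKER_00", "text": "Welcome back to the show."},
    {"start": 3.2, "speaker": "SPEAKER_00", "text": "Today we're talking about evals."},
    {"start": 3725.9, "speaker": "SPEAKER_01", "text": "Let's get into fine-tuning."},
]
print(PROMPT + timestamped_transcript(segments))
```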

Hey FanaHOVA, big fan! We were syncing from multiple sources and will be consolidating that in the near future (most likely via RSS).
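Consolidating on RSS can also fix the duplicate-episode problem if episodes are keyed on each item's `<guid>` (the RSS 2.0 element intended to identify an item uniquely) rather than on its title, which hosts may edit. A minimal standard-library sketch; the two feed snapshots are made up, and falling back to the enclosure URL when `<guid>` is missing is an assumption, not something Scribbler has said it does.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Two made-up snapshots of the same feed, before and after a title edit.
OLD_FEED = """<rss version="2.0"><channel><title>Example Podcast</title>
<item><title>Ep 1: Agents</title><guid>ep-001</guid>
  <enclosure url="https://example.com/ep1.mp3" type="audio/mpeg" length="0"/></item>
<item><title>Ep 2: Evals</title>
  <enclosure url="https://example.com/ep2.mp3" type="audio/mpeg" length="0"/></item>
</channel></rss>"""

NEW_FEED = """<rss version="2.0"><channel><title>Example Podcast</title>
<item><title>Ep 1: Building Agents</title><guid>ep-001</guid>
  <enclosure url="https://example.com/ep1.mp3" type="audio/mpeg" length="0"/></item>
<item><title>Ep 2: Evals</title>
  <enclosure url="https://example.com/ep2.mp3" type="audio/mpeg" length="0"/></item>
</channel></rss>"""


def episode_key(item: ET.Element) -> Optional[str]:
    """Prefer <guid>; fall back to the enclosure URL (an assumption)."""
    guid = item.findtext("guid")
    if guid:
        return guid.strip()
    enclosure = item.find("enclosure")
    if enclosure is not None:
        return enclosure.get("url")
    return None


def merge_feeds(*feeds: str) -> dict[str, str]:
    """Return {episode key: title}; later feeds override earlier titles."""
    episodes: dict[str, str] = {}
    for xml_text in feeds:
        channel = ET.fromstring(xml_text).find("channel")
        for item in channel.findall("item"):
            key = episode_key(item)
            if key is not None:
                episodes[key] = item.findtext("title", default="").strip()
    return episodes


print(merge_feeds(OLD_FEED, NEW_FEED))
```

The edited title replaces the old one instead of creating a second entry.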

We're using whisper and GPT-3.5 with the new 16K context. Eager to hear more feedback from you. Feel free to follow up with us at ian@scribbler.so & phil@scribbler.so

Very good product! I’ve been using it to catch up on certain podcasts. I was just wondering whether it might also work for transcribing things like hearings, CPI announcement videos, or statements made by politicians?

I can see this becoming a transcriber for any kind of generic media.

Do you have specific links / sources you had in mind? Including other media is certainly part of our roadmap.
Checked it out and it works great! Just thinking it would be even better if there were some way to make all that knowledge searchable, or to help users discover podcast summaries they might be into?
To clarify, by search do you mean search within the podcast or within the summary? We're also looking into better recommendations.
I like this idea! Thanks for sharing.

Will I be able to point it at any podcast? The ones I saw look interesting but are not what I normally listen to.

I assume you can take any audio sample (say, a monologue) and generate a summary of it. I wonder if students would do this with their lectures.

Yes, you can point it at "any" podcast for now. At this time, you have to purchase a membership to do so; however, we're thinking about allowing users with "registered" accounts to do it as well.
We are so close to AI-powered podcast sponsorblock I can taste it.
Maintainer of sponsorblock said they don't think it'll work https://github.com/ajayyy/SponsorBlock/issues/1766 but there's https://github.com/xenova/sponsorblock-ml.
I was referring to video, because there you need to take the visuals into account too. It should be simpler for audio-only content.
I hope this never happens. Untargeted sponsor segments in the middle of podcasts are totally fine. You want people to produce something for you while you consume it for free? If you don't want ads, pay for it.
I don't even fast-forward through ads that aren't obnoxious. When a half-hour show starts with three solid minutes of ads, then has a block of ads in the middle, then more at the end, I'm either skipping or not listening.

> If you don't want ads, pay for it.

I do. Funnily enough, a lot of creators seem to be too lazy to edit all the ads out of their "ad-free" feeds. In one instance, a creator I backed had more ads in their ad-free premium feed than I had in my original downloads of the show from when they got started. Screw that.

Not the one and only way for society to organize
Very cool! I'm curious--I'd imagine that some long tail podcasts have transcripts that are too long to fit within a standard context window. Do you have some strategy for handling these?
There are a few strategies in use today. All involve splitting the content to be summarized into chunks smaller than the context size, summarizing each, and building a full final summary from there (potentially in multiple steps).
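A minimal sketch of the split, summarize, and combine approach just described (often called map-reduce summarization), with the "multiple steps" handled by re-summarizing the joined summaries until they fit. Two assumptions for illustration: tokens are approximated by whitespace-separated words, and `summarize` is a stand-in for a real LLM call with a prompt such as "Summarize this part of a podcast transcript".

```python
from typing import Callable

Summarizer = Callable[[str], str]


def count_tokens(text: str) -> int:
    # Rough proxy: one token per whitespace-separated word (an assumption;
    # measure real budgets with the model's own tokenizer).
    return len(text.split())


def split_into_chunks(text: str, max_tokens: int) -> list[str]:
    """Pack words into consecutive chunks of at most max_tokens words."""
    words = text.split()
    return [" ".join(words[i:i + max_tokens]) for i in range(0, len(words), max_tokens)]


def map_reduce_summary(text: str, summarize: Summarizer, max_tokens: int) -> str:
    """Summarize each chunk (map), then summarize the joined summaries (reduce),
    recursing while the combined summaries are still over budget."""
    if count_tokens(text) <= max_tokens:
        return summarize(text)
    partials = [summarize(chunk) for chunk in split_into_chunks(text, max_tokens)]
    combined = "\n".join(partials)
    # Guard against infinite recursion if the summarizer doesn't shrink its input.
    if count_tokens(combined) >= count_tokens(text):
        raise ValueError("summaries are not shrinking; lower the chunk size")
    return map_reduce_summary(combined, summarize, max_tokens)


def first_words(text: str, n: int = 5) -> str:
    # Toy stand-in for an LLM call: keep the first n words.
    return " ".join(text.split()[:n])


transcript = " ".join(f"w{i}" for i in range(100))  # fake 100-word transcript
print(map_reduce_summary(transcript, first_words, max_tokens=20))  # w0 w1 w2 w3 w4
```

Here the first pass reduces 100 words to 25, a second pass reduces those to 10, and the final call fits in one window.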

I wouldn’t necessarily recommend _using_ LangChain, but their summarization docs might be of interest: https://python.langchain.com/en/latest/modules/chains/index_...

What would you use besides LangChain?
I’ve found it preferable to build directly on top of OpenAI’s API. (I’ve also written a simple API wrapper for llama.cpp hosted LLMs.) Over time I’ve built a small library of utilities, including for summarization. It’s not that much code.

I don’t know if this is a spicy or a generally agreed-upon take: my feeling is that, while LangChain was useful in that it helped the community codify some early intuitions about LLM invocation patterns, it’s basically a grab bag of partially complete, somewhat disconnected utilities. It nods to composability but, in practice, its pieces often don’t fit together. On the Python side, it suffers from poor typing: when creating a chain, it’s often impossible to know the full set of configuration options without digging deep into LangChain’s code. It’s catch-as-catch-can whether you can deeply configure specific sub-aspects of a chain.

There are other things I want in my own code at the moment, including keeping track of how many input/output tokens each of my actions takes, etc.
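As an illustration of that per-action token bookkeeping, here is a small hypothetical tracker. It relies on the response's `usage` object exposing `prompt_tokens` and `completion_tokens`, which is how OpenAI's Chat Completions API reports usage; `TokenLedger` and the action names are made up for this sketch, and the demo uses fake responses so it runs without an API key.

```python
from collections import Counter
from dataclasses import dataclass, field
from types import SimpleNamespace


@dataclass
class TokenLedger:
    """Hypothetical helper: accumulates input/output tokens per named action."""
    input_tokens: Counter = field(default_factory=Counter)
    output_tokens: Counter = field(default_factory=Counter)

    def record(self, action: str, response) -> None:
        # Assumes response.usage has prompt_tokens and completion_tokens.
        self.input_tokens[action] += response.usage.prompt_tokens
        self.output_tokens[action] += response.usage.completion_tokens

    def summary(self) -> dict[str, tuple[int, int]]:
        return {a: (self.input_tokens[a], self.output_tokens[a]) for a in self.input_tokens}


def fake_response(prompt_tokens: int, completion_tokens: int) -> SimpleNamespace:
    # Mimics the shape of a chat completion response, for offline demos only.
    usage = SimpleNamespace(prompt_tokens=prompt_tokens, completion_tokens=completion_tokens)
    return SimpleNamespace(usage=usage)


# With the official client (openai>=1.0) the call would look roughly like:
#   from openai import OpenAI
#   resp = OpenAI().chat.completions.create(model="gpt-3.5-turbo-16k", messages=[...])
#   ledger.record("summarize_chunk", resp)
ledger = TokenLedger()
ledger.record("summarize_chunk", fake_response(3000, 250))
ledger.record("summarize_chunk", fake_response(2800, 240))
ledger.record("final_summary", fake_response(600, 400))
print(ledger.summary())  # {'summarize_chunk': (5800, 490), 'final_summary': (600, 400)}
```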

I dunno, maybe I’m the only one here. Curious what others think.

At the moment we're still using LangChain, but it gets quite cumbersome in the long run. The library is developing quickly, and a feature that you expect to work one week might not work the next. Have you had better luck with others?
I am also interested in the answer to this
Not OP, but I have seen several cases where summarising parts first and then summarising the summaries was used.
All the strategies mentioned in this thread were ones we tried. You can check it out!
FWIW, I’ve tried to sign up, but the email verification code never arrived, even though I had it resent multiple times, checked spam, and confirmed the email address.
Hey Voisin, sorry to hear that you're having issues. Would you mind joining our Discord and following up in the help channel: https://discord.gg/9s8GNYSM ? Otherwise, feel free to email phil@scribbler.so or ian@scribbler.so
This looks great. What does your stack look like?
Nice one! Thanks for sharing.