Show HN: Scribbler – Podcast Summaries Using GPT | HN Mirror


Show HN: Scribbler – Podcast Summaries Using GPT (app.scribbler.so)

91 points by _fill 1104 days ago

Hey, we're Phil and Ian, the founders of Scribbler.

We're huge podcast fans, but found we never had enough time to soak it all in. So, we built Scribbler - a tool that leverages GPT to condense podcast episodes into bite-sized summaries for when life's too busy.

Now, we can catch the best bits from any episode, discover new shows, and best of all, stop wasting valuable time figuring out what's worth listening to and what's not. We hope you'll find it useful!

11 comments

Solvency 1104 days ago

You've basically created a really great show notes generator. Kudos.

What I'd really value is a podcast powered GPT chatbot, or at the very least, a very good search engine.

Podcasts like Peter Attia's or Paul Saladino's contain so much good knowledge on human biology in the context of nutrition, but it's buried in longform conversations. I often wish I could find a "soundbyte", or in this case, a textbyte. Paul has had guests perfectly articulate the top 10 functions of insulin, or pose perfect explanations for the value of saturated fat and its demonization via the sugar industry. Hell, there is a plethora of knowledge around basic salt that you don't find very easily in Google.

Being able to search for or rapidly recall things like this would be so useful.

huevosabio 1104 days ago

I built a proof of concept [0] of exactly this because I listen to tons of podcasts but then fetching back that info is a pain. I left it at that because I saw there were plenty of other efforts doing something similar (e.g. [1])

[0] https://youtu.be/Q6G2m4xw3E4

[1] https://podsmart-frontend.vercel.app/

saeedesmaili 1095 days ago

You should try https://dexa.ai/ (they're supporting a limited number of podcasts though)

_fill 1103 days ago

Hey Solvency, Thanks for the feedback. We definitely have it on our road map to do more rigorous search along with question & answer.

iamflimflam1 1104 days ago

That's kind of what's missing from all these summarisation services - what are the actual bits of content that are worth listening to amongst all the padding?

_fill 1103 days ago

Can you describe what, if anything, we could add to the UI to help with that?

fils 1104 days ago

I know the podcast app Snipd (https://www.snipd.com/) offers some similar things. It will build chapters with AI generated summaries of each chapter.

Not sure if they are leveraging that for search and discovery or not but it looks like they do (https://www.snipd.com/podcasters)

A user can request it for any podcast and it doesn't seem to take that long the times I have tried it.

_fill 1103 days ago

Hey we've actually checked out snipd as well. We're excited that there are others that are also competing in this space. What are some of the features you like that we are lacking?

FanaHOVA 1104 days ago

Thanks for having Latent Space on it! :) I noticed we have two entries (because we changed the full name of the podcast) and some episodes are there twice if we tweaked the title. Are you pulling from our RSS feed?

Also curious about what models you are using on the backend; we use Claude 100k to do timestamps generation for show notes and whisper-diarization [0] for transcription. The main post for each episode is manually written though, as we try to write a higher level summary of the episode + topics in it.

[0] https://replicate.com/thomasmol/whisper-diarization

_fill 1103 days ago

Hey FanaHOVA, big fan! We were syncing from multiple sources and will be consolidating that in the near future (most likely via RSS).

We're using whisper and GPT-3.5 with the new 16K context. Eager to hear more feedback from you. Feel free to follow up with us at ian@scribbler.so & phil@scribbler.so

edison0xyz 1104 days ago

Very good product! I’ve been using it to catch up on certain podcasts. But was just wondering if it might also work for some things like transcribing certain hearings like CPI announcement videos or statements made by politicians?

I can see this becoming a transcriber for any kind of generic media.

_fill 1103 days ago

Do you have specific links / sources in mind? It's certainly a part of our roadmap to include other media.

majimak 1104 days ago

Checked it out and it works great! Just wondering if it would be even better with some way to make all that knowledge searchable, or some way to help users discover podcast summaries that they might be into?

_fill 1103 days ago

To clarify, by search do you mean searching within the podcast or within the summary? We're also looking into better recommendations.

ramg 1104 days ago

I like this idea! Thanks for sharing.

Will I be able to point it at any podcast? The ones I saw look interesting but are not what I normally listen to.

I assume you can take any audio sample (say, a monologue) and generate a summary of it. I wonder if students would do this with their lectures.

_fill 1103 days ago

Yes, you can point it at "any" podcast for now. At this time you will have to purchase a membership; however, we're thinking about allowing users with "registered" accounts to do so as well.

causi 1104 days ago

We are so close to AI-powered podcast sponsorblock I can taste it.

dewey 1104 days ago

Maintainer of sponsorblock said they don't think it'll work https://github.com/ajayyy/SponsorBlock/issues/1766 but there's https://github.com/xenova/sponsorblock-ml.

ajayyy 1104 days ago

I was referring to video, because you need to take into account visuals too. It would be simpler when it is audio-only.

101008 1104 days ago

I hope this never happens. Sponsors in the middle of podcasts, that are not targeted, are totally fine. You want people to produce something for you and consume it for free? If you don't want ads pay for it.

causi 1104 days ago

I don't even fast forward through ads that aren't obnoxious. When it starts with three solid minutes of ads, then has a block of ads in the middle, then more at the end for a half-hour show, I'm either skipping or not listening.

> If you don't want ads pay for it.

I do. Funny enough a lot of creators seem to be too lazy to edit all the ads out of their "ad-free" feeds. In one instance a creator I backed had more ads in their ad-free premium feed than I had in my original downloads of the show from when they got started. Screw that.

wahnfrieden 1104 days ago

Not the one and only way for society to organize

alexcannan 1104 days ago

Very cool! I'm curious--I'd imagine that some long tail podcasts have transcripts that are too long to fit within a standard context window. Do you have some strategy for handling these?

davepeck 1104 days ago

There are a few strategies in use today. All involve splitting the content to be summarized into chunks smaller than the context size, summarizing each, and building a full final summary from there (potentially in multiple steps).
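As a rough illustration of the chunk-then-combine approach described above, here is a minimal Python sketch. It is not Scribbler's or LangChain's implementation. The `summarize` callable is a placeholder for whatever LLM call you use, the function names are hypothetical, and chunk size is measured in words as a crude stand-in for real token counting.

```python
from typing import Callable, List


def split_into_chunks(text: str, max_words: int) -> List[str]:
    """Split text into pieces of at most max_words words.

    Word count is only an approximation of token count (an assumption
    made to keep this sketch dependency-free).
    """
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def summarize_long_text(
    text: str,
    summarize: Callable[[str], str],  # hypothetical: wraps your LLM call
    max_words: int = 2000,
) -> str:
    """Summarize text that may not fit in one context window.

    Each chunk is summarized, the partial summaries are joined and
    re-chunked, and the process repeats until one chunk remains.
    """
    if max_words < 1:
        raise ValueError("max_words must be at least 1")

    chunks = split_into_chunks(text, max_words)
    if not chunks:
        return ""

    while len(chunks) > 1:
        partial_summaries = [summarize(chunk) for chunk in chunks]
        next_chunks = split_into_chunks("\n".join(partial_summaries), max_words)
        if len(next_chunks) >= len(chunks):
            # The summaries are not getting shorter, so stop instead of looping forever.
            raise RuntimeError("summaries did not shrink; cannot reduce further")
        chunks = next_chunks

    # Final pass over the single remaining chunk.
    return summarize(chunks[0])
```

To try it without an API key, pass any function that shortens its input, for example one that keeps only the first sentence of each chunk.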

I wouldn’t necessarily recommend _using_ LangChain, but their summarization docs might be of interest: https://python.langchain.com/en/latest/modules/chains/index_...

BigElephant 1104 days ago

What would you use besides LangChain?

davepeck 1104 days ago

I’ve found it preferable to build directly on top of OpenAI’s API. (I’ve also written a simple API wrapper for llama.cpp hosted LLMs.) Over time I’ve built a small library of utilities, including for summarization. It’s not that much code.

I don’t know if this is a spicy or a generally-agreed-upon take: my feeling is that, while LangChain was useful in that it helped the community codify some early intuitions about LLM invocation patterns, it’s basically a grab bag of partially complete, somewhat disconnected utilities. It nods to composability but, in practice, its pieces often don’t fit together. On the Python side, it suffers from poor typing: when creating a chain, it’s often impossible to know what the full set of configuration options is without digging deep into LangChain’s code. It’s catch-as-catch-can whether you can deeply configure specific sub-aspects of a chain.

There are other things I want in my own code at the moment, including keeping track of how many input/output tokens each of my actions takes, etc.
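For the token bookkeeping mentioned above, a small tracker along these lines might be enough. This is a sketch, not the commenter's actual code: the class names and action labels are hypothetical, and it assumes you pass in the input/output token counts yourself (for instance, from the usage figures your LLM API returns).

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class TokenUsage:
    """Running token counts for one kind of action."""
    input_tokens: int = 0
    output_tokens: int = 0
    calls: int = 0

    @property
    def total(self) -> int:
        return self.input_tokens + self.output_tokens


@dataclass
class UsageTracker:
    """Accumulates token usage per named action (names are up to the caller)."""
    by_action: Dict[str, TokenUsage] = field(
        default_factory=lambda: defaultdict(TokenUsage)
    )

    def record(self, action: str, input_tokens: int, output_tokens: int) -> None:
        """Add one call's token counts to the running totals for an action."""
        usage = self.by_action[action]
        usage.input_tokens += input_tokens
        usage.output_tokens += output_tokens
        usage.calls += 1

    def overall(self) -> TokenUsage:
        """Sum usage across all recorded actions."""
        combined = TokenUsage()
        for usage in self.by_action.values():
            combined.input_tokens += usage.input_tokens
            combined.output_tokens += usage.output_tokens
            combined.calls += usage.calls
        return combined
```

Calling `tracker.record("summarize_chunk", 1200, 150)` after each request and reading `tracker.overall().total` at the end gives a per-action and overall picture of where tokens go.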

I dunno, maybe I’m the only one here. Curious what others think.

_fill 1103 days ago

At the moment we're still using langchain but it is quite cumbersome in the long run. The library is developing quickly and a feature that you might expect to work one week might not the next. Have you had better luck with others?

thatcherthorn 1104 days ago

I am also interested in the answer to this

victorbjorklund 1104 days ago

Not OP, but I have seen several use cases where first summarising parts and then summarising the summaries has been used.

_fill 1103 days ago

All the strategies below were ones we tried. You can check it out!

voisin 1103 days ago

FWIW, I’ve tried to sign up and the email verification code has not been received despite having it resent multiple times and checked spam and confirmed the email address.

_fill 1103 days ago

Hey Voisin, sorry to hear that you're having issues. Do you mind joining our Discord and following up in help: https://discord.gg/9s8GNYSM Otherwise, feel free to email phil@scribbler.so or ian@scribbler.so

absk82 1101 days ago

This looks great. What does your stack look like?

nittanymount 1104 days ago

Nice one! Thanks for sharing.