Hacker News
by nicd 1017 days ago
It would be really neat if after pulling the captions, an LLM was used to reword the content into an idiomatic "blogpost" (since speech is typically different than writing). Using LLMs, we could even choose the level of summarization and the output tone!

As someone who strongly prefers reading to watching instructional videos, I'd pay for this service :)

8 comments

I made something sorta like that, specific to recipe videos. It converts the recipe into an idiomatic format (inlines ingredients, detects and renders timers) and links each step in the recipe to its timestamp in the video for easy indexing while you're busy in the kitchen. (I spent too much time trying to scrub to that one spot where "how it's supposed to look" is shown while busy making it look that way.)

See example: https://rexipie.com/watch?v=JiJXdoTjw8M

Just s/youtube/rexipie/ in any recipe video URL.
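
For anyone who wants to script that substitution rather than edit URLs by hand, here is a minimal sketch in Python. It assumes the only change needed is the host swap described above; the function name `to_rexipie` and the list of accepted YouTube hosts are my own assumptions, and whether the site handles other URL forms (short links, playlists) is not something stated here.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: only these YouTube hosts are rewritten; anything else is rejected.
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com"}


def to_rexipie(url: str) -> str:
    """Swap a YouTube host for rexipie.com, keeping path, query, and fragment."""
    parts = urlsplit(url)
    if parts.netloc.lower() not in YOUTUBE_HOSTS:
        raise ValueError(f"not a recognized YouTube URL: {url!r}")
    return urlunsplit(
        (parts.scheme, "rexipie.com", parts.path, parts.query, parts.fragment)
    )


if __name__ == "__main__":
    print(to_rexipie("https://www.youtube.com/watch?v=JiJXdoTjw8M"))
```

Parsing the URL instead of doing a plain string replace avoids touching a "youtube" that happens to appear in the query string.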

(Full disclosure: the step/transcript linking is paid-only, as it requires a GPT-4 call; everything else is available to demo on the free tier.)

That's really cool! For comparison, this is the recipe written by the same guy: https://www.allrecipes.com/toy-box-tomato-ricotta-cheese-tor...

I've gotta say, your website might be easier to use during cooking, since it provides the information in-line (especially serving sizes etc.)!

Cool website. Much better than the SEO spam I came across earlier this week when I did a web search for "pear qwerty horse" after seeing it in the tags under a Binging with Babish video.

Love the timers and jumping to sections of the video. Though, the second video I tried viewing didn't have linked steps.

The hyperlink "Food Wishes" at the top of the page is broken. It'd be nice too if there was a way on that page to request a new recipe (via video ID or whatever).
This is neat! Do you plan to support videos containing multiple recipes?
As a way to make that easier, maybe it would be nice to support a user-specified set of timestamps? Say recipe A: 0:00-7:46, recipe B: 7:47-15:33, and so on.
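
To make the timestamp suggestion concrete, here is a rough sketch of how such a user-supplied spec could be parsed. The input format ("name: start-end", comma-separated) is taken from the example in the comment above; the `Segment` type, the function names, and the validation rules are hypothetical and not how the site works.

```python
import re
from typing import List, NamedTuple


class Segment(NamedTuple):
    name: str
    start: int  # seconds from the start of the video
    end: int    # seconds from the start of the video


# One "name: M:SS-M:SS" (or H:MM:SS) entry, with optional surrounding spaces.
_SEGMENT = re.compile(
    r"\s*(?P<name>[^:,]+?)\s*:\s*"
    r"(?P<start>\d+(?::\d{2}){1,2})\s*-\s*"
    r"(?P<end>\d+(?::\d{2}){1,2})\s*"
)


def to_seconds(stamp: str) -> int:
    """Convert "M:SS" or "H:MM:SS" to a number of seconds."""
    seconds = 0
    for part in stamp.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def parse_segments(spec: str) -> List[Segment]:
    """Parse a comma-separated list of named time ranges."""
    segments = []
    for chunk in spec.split(","):
        match = _SEGMENT.fullmatch(chunk)
        if match is None:
            raise ValueError(f"cannot parse segment: {chunk!r}")
        start = to_seconds(match["start"])
        end = to_seconds(match["end"])
        if end <= start:
            raise ValueError(f"segment ends before it starts: {chunk!r}")
        segments.append(Segment(match["name"], start, end))
    return segments


if __name__ == "__main__":
    for seg in parse_segments("recipe A: 0:00-7:46, recipe B: 7:47-15:33"):
        print(seg)
```

Each parsed segment could then be treated as its own "video" when extracting a recipe, which sidesteps having to detect recipe boundaries automatically.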
I regularly use https://www.summarize.tech/ for this purpose.

Not exactly a blog post format, but it must've saved me a hundred hours, no joke!

You can absolutely already do what you describe with GPT-4 plugins (Plus membership required), using VoxScript and Video Insight:

https://chat.openai.com/share/229e3ac8-3924-48e4-abd5-35bcb2...

Except for the complaining GPT will do, and some censorship based on the whims of its programming group. No thanks; I'll stick to scripts, where the video dictates the content.
Just use something like llama2-uncensored then, they're on ollama.
It seems that many "my script can do [something] with [information in a different form]" projects can be superseded by LLMs already, or will be in the near future, and the quality is way better than what the scripts are capable of.

I just wonder what the price of this is. I can run most of these scripts on an old laptop. But for the LLM I need a pricey and beefy computer or (even worse) a paid subscription to a big tech company's service.

Honestly, I like the GPT 3.5 version I posted here way better.
There you go, friend:

Step right up, folks! Gather around and feast your eyes on the magnificent creatures before us – the elephants! Now, what makes these majestic beings so fascinating, you ask? Well, let me tell you – it's all about their incredibly, unbelievably long... um, trunks! Yes, you heard that right. These gentle giants sport trunks that seem to stretch on for ages, and let me tell you, it's nothing short of impressive. So, as we stand here in awe of these marvelous creatures, remember, it's the little (or should I say, not-so-little) things like their remarkable trunks that make them truly stand out. And that, my friends, wraps up the lowdown on our pachyderm pals – fascinating trunks and all!

I've been working on just such a tool [1] to help me digest podcasts and senate hearings.

[1] https://github.com/the-crypt-keeper/tldw

Did you consider directly taking the subs from yt?
>since speech is typically different than writing

Is a scripted video significantly different to a written blogpost? It might be a symptom of the type of YT videos I watch, but most of them seem to be essay-style "intro/thesis/points 1, 2, 3/counterpoint/conclusion", and the only thing that hints at speech is the umming-and-arring of the presenter.

It is to me...an example from a CNN transcript:

"Former chief-of-staff, Mark Meadows asking a federal judge to put his surrender on hold, while deciding whether to move his trial to federal court, and former DOJ official Jeffrey Clark, seeking the same, making a pretty remarkable argument in his filing."

That's someone doing a sort of play-by-play explanation of what viewers are seeing in a video. Compare to a purposefully written story:

"A federal judge in Georgia rejected a request by former White House chief of staff Mark Meadows to postpone his surrender and arrest in Fulton County, Georgia, as an attempt to move the case to federal court is litigated, according to a court order issued Wednesday."

It seems like there could be some value in an LLM that would rewrite the first into something more like the second.

Or at least chunk the transcription into logical groups.

Chunking on the example webpage[1] is poor.

[1] https://obra.github.io/Youtube2Webpage/example/
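
One cheap way to approximate "logical groups" without an LLM is to start a new chunk wherever the speaker pauses, with a size cap as a fallback. The sketch below is only that heuristic, not what Youtube2Webpage does; the `Cue` type, the function name, and the `max_gap`/`max_chars` defaults are all assumptions picked for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Cue:
    start: float  # seconds
    end: float    # seconds
    text: str


def chunk_cues(
    cues: List[Cue], max_gap: float = 1.5, max_chars: int = 400
) -> List[Tuple[float, str]]:
    """Group consecutive caption cues into paragraphs.

    A new paragraph starts when the silence before a cue exceeds max_gap
    seconds, or when adding the cue would push the paragraph past max_chars.
    Returns (start_time, paragraph_text) pairs.
    """
    groups: List[List[Cue]] = []
    current: List[Cue] = []
    for cue in cues:
        if current:
            gap = cue.start - current[-1].end
            # Length of the paragraph so far, counting one joining space per cue.
            length = sum(len(c.text) for c in current) + len(current)
            if gap > max_gap or length + len(cue.text) > max_chars:
                groups.append(current)
                current = []
        current.append(cue)
    if current:
        groups.append(current)
    return [(g[0].start, " ".join(c.text for c in g)) for g in groups]


if __name__ == "__main__":
    demo = [
        Cue(0.0, 2.0, "Hello there."),
        Cue(2.1, 4.0, "Today we cook."),
        Cue(7.0, 9.0, "First, the dough."),
    ]
    for start, text in chunk_cues(demo):
        print(f"{start:>6.1f}s  {text}")
```

Pause-based splitting will miss topic changes that happen without a pause, so it is a baseline at best; keeping the start time per chunk still lets each paragraph link back to its spot in the video.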

Can I self-promote here? We are not doing exactly the same thing, but we are transcribing videos ourselves (no auto YT captions). If you want to read a high-quality transcript and summarize videos, you can do that at https://alphy.app
This is a blank page on iOS Chrome.