Show HN: I made a website that converts YT videos into step-by-step guides (stepify.tech)
271 points by aka_sh 782 days ago
Hey HN,

I've been working on this side project for the past month. It generates a step-by-step tutorial guide for YouTube videos that you can follow along without watching long videos. Best suited for tutorial videos but can work for other videos as well. No BS. Just straight to the point.

The guides are generated from pure transcript so you don't have to worry about it being AI. It's my first project as a total beginner. Something I had to do in order to get out of tutorial hell.

Please let me know if you have any suggestions or if you face any problems or bugs. I would try to fix them to the best of my abilities and as soon as possible.

I would appreciate your feedback on this. Let me know what you think!

40 comments

This is a brilliant and useful application of LLM technology, I'm impressed.

One question- On the backend, is it downloading each video CC (closed-caption) transcript and feeding that into a tuned prompt? What happens for videos where this is missing? Asking because I've noticed CC is occasionally unavailable for some YouTube videos.

If you cared to have a fallback, a potentially interesting experiment / solution for such cases is to download the video, extract the audio to a WAV file, then run the audio through Whisper [1] to generate the transcript. Using CPUs, it will still be incredibly intensive and slow, generally not much faster than real-time (e.g. a 5 minute clip will take on the order of ~5 minutes to complete transcription). However, with Whisper running on a fancy GPU it is insanely faster, between 100-200x faster, meaning even for long videos, generating the transcripts will complete in only a few seconds.
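A rough sketch of that "captions first, speech-to-text second" flow, in case it helps. Everything here is an assumption for illustration: `fetch_captions` and `transcribe_audio` are hypothetical callables you would wire up to your caption API and to Whisper yourself, and the fake versions below exist only so the snippet runs without network or GPU access.

```python
from typing import Callable, Optional, Tuple


def get_transcript(
    video_id: str,
    fetch_captions: Callable[[str], Optional[str]],
    transcribe_audio: Callable[[str], str],
) -> Tuple[str, str]:
    """Return (transcript, source), preferring captions over speech-to-text."""
    captions = fetch_captions(video_id)
    if captions and captions.strip():
        return captions, "captions"
    # No usable captions: fall back to the slower speech-to-text path.
    return transcribe_audio(video_id), "speech-to-text"


# Hypothetical stand-ins so the sketch runs on its own.
def fake_fetch_captions(video_id: str) -> Optional[str]:
    return {"has_cc": "Step one: unplug the inverter."}.get(video_id)


def fake_transcribe_audio(video_id: str) -> str:
    return f"(transcribed audio for {video_id})"


print(get_transcript("has_cc", fake_fetch_captions, fake_transcribe_audio))
print(get_transcript("no_cc", fake_fetch_captions, fake_transcribe_audio))
```

Returning the source alongside the text lets the page say where a guide came from, which seems useful when the speech-to-text path is less reliable.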

Great job @aka_sh!

[1] https://github.com/openai/whisper

p.s. Is there any chance you'd open source your code? Or do you plan to turn this into a business? The code itself is exactly a huge moat, and it'd be cool to see how you did this. Cheers.

p.p.s. stepify.tech app is currently crashing out to a heroku error page when I try to submit a YT link.

Thank you! I'm getting the transcript through an API and feeding it to GPT. For now, the fallback for videos with no captions is just to make something out of the description of the video. I really appreciate the suggestion, I'll experiment with Whisper. Regarding open source or business, I don't really know about that yet. Maybe I'll lean towards the business side to cover the costs and see where this goes. And sorry for the downtime! API credits ran out. It should be fixed by now.
Eek, so many typos in my comment - but the most egregious was where I meant to convey the code itself is not a huge moat. Even still, no worries if you don't want to give it away, I totally understand.

Keep up the good execution.

Definitely try out Whisper after splitting out the audio as a fallback, and don't forget there are other models like WhisperFast that might be slightly less accurate but less resource intensive, and since you're not publishing the captions themselves you don't need it to literally get every word perfect.
It's epic how well that works. Even with Whisper locally, most of what I throw at it becomes readable.
Here is an example implementation you may find interesting (it also includes snapshots and links back to the original video) - https://github.com/Yannael/video2blogpost
Here is another resource on the same topic: https://news.ycombinator.com/item?id=39367264
Comparing yt transcript to open whisper transcripts could be interesting if it could pick up on something extra.

There is limited need to reinvent the wheel to process audio when other things can be solved.

The suggestion was to use Whisper as a fallback where no YT transcript exists.
I mean if CC is missing you just run it through whisper/whisperfast and you've got CC.
As someone who can’t stand the modern trend away from text and towards video, I can’t praise this idea enough. The number of circumstances where a video is better than text with some clarifying pictures is quite small
100% agree. Video can be helpful for supplementary illustration, to show exactly how to orient parts in an assembly, etc. but at the cost of (often) sitting through a lot of rambling monologue that is not.

I haven't tried this yet but it would be helpful if each step included a link to the spot in the video where that step is shown, so that in case you need it it's easy to find.
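For what it's worth, the link itself is cheap to build once each step has a start time. A minimal sketch, assuming the start offset in seconds is available per step (for example from caption cue timings, which is my assumption and not something the site is known to store). `timestamp_link` is a made-up helper name; the `t` query parameter is the usual way to start a YouTube watch URL at an offset.

```python
from urllib.parse import urlencode


def timestamp_link(video_id: str, start_seconds: float) -> str:
    """Build a watch URL that starts playback at the given offset."""
    # Whole seconds are enough for jumping to a step.
    query = urlencode({"v": video_id, "t": f"{int(start_seconds)}s"})
    return f"https://www.youtube.com/watch?{query}"


print(timestamp_link("O8eVxRVwlnw", 95.4))
```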

Yeah. The only way to find some written instructions these days is searching for reddit specifically. Which I'm not a big fan of, either.

I've had multiple instances where I had a simple issue with zero decent Google results, and a YouTube result with literally the exact question I had in the title. I had to sift through 12 minutes of "like and subscribe", a dude clicking around in various screens mumbling some stuff... I would have been very happy with a simple blog post

Totally agree with you on that. I hope this lives up to your expectations. Thank you!
Super interesting. I recently went down the DIY rabbit hole for solar, electricity, etc. I tested out https://stepify.tech/video/O8eVxRVwlnw and looks decent:

1. It took about 45 seconds for the page to load once I put the URL in. You should have a loader on the page showing that the website is "doing something" while the AI transcribes.

2. It would be great to sync the chapters in the YT video with the guide details.

3. Even more advanced would be if specific items like "Drill holes, insert expansion bolts, and secure the inverter to the wall using nuts and washers." showed a timestamp and thumbnail with a link to that part of the video.

4. It would be great to have a checklist functionality (maybe this is the "pro version"). I often do something, get halfway and then need to scrub the YT video to find the specific place where he talks about the action item.

EDIT:

5. IMO iFixit has the best "guide" formatting: https://www.ifixit.com/Guide/How+to+Recover+Data+From+a+MacB... if you could somehow generate this from the video, that would be insanely useful.

Great suggestions! I really appreciate your feedback. I'll work on implementing these as soon as possible.
Check out this app called Razzl. Pretty much does what you’ve described.
A web-page is way more convenient than an app.
Also, that requires the creator to build the whole step by step process. Being able to automatically generate the format would be HUGE.
Great work. A few ideas

1) Speed : the site is often showing heroku errors. Seems like you are running the entire processing in the request-response cycle. If not already done, please try to use a queueing system to perform async processing - and then let the user know when their video is ready to view as steps (probably via email or browser notifications). This will stop your site from crashing frequently and you'll be able to scale to many users very quickly.

2) Please add link-backs to the specific time in the video from where the step is shown.
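To make point 1 a bit more concrete, here is a minimal in-process sketch of the idea using only the Python standard library. It is an illustration under assumptions, not how the site works: `generate_guide` is a hypothetical placeholder for the slow transcript + LLM step, and a real deployment would more likely use a separate worker process and a persistent queue rather than a thread and a dict.

```python
import queue
import threading
import uuid

jobs: dict = {}                  # job_id -> {"status": ..., "result": ...}
pending: queue.Queue = queue.Queue()


def generate_guide(video_url: str) -> str:
    """Hypothetical placeholder for the slow transcript + LLM step."""
    return f"guide for {video_url}"


def submit(video_url: str) -> str:
    """Called from the request handler: enqueue the job and return at once."""
    job_id = uuid.uuid4().hex
    jobs[job_id] = {"status": "queued", "result": None}
    pending.put((job_id, video_url))
    return job_id


def worker() -> None:
    """Runs outside the request-response cycle and drains the queue."""
    while True:
        job_id, video_url = pending.get()
        jobs[job_id]["status"] = "processing"
        try:
            jobs[job_id]["result"] = generate_guide(video_url)
            jobs[job_id]["status"] = "done"
        except Exception as exc:
            jobs[job_id]["result"] = str(exc)
            jobs[job_id]["status"] = "failed"
        finally:
            pending.task_done()


threading.Thread(target=worker, daemon=True).start()

job_id = submit("https://www.youtube.com/watch?v=O8eVxRVwlnw")
pending.join()  # demo only: a real handler would return job_id and let the client poll
print(jobs[job_id])
```

The request handler only ever calls `submit`, so a slow or failing video can no longer time out the web request; the page can poll the job status and show a loader until it flips to "done".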

Cheers!

Also, +1 to chapters as someone mentioned in the comments.
Not sure if putting the site behind cloudflare or something could help.

Heroku just wants a bigger bill.

Noted! I'll look into that. Thank you.
Hi,

Is there a way to request that submitted items get removed? Can you provide a way to contact you such as an email address? There wasn't one posted on your site.

It's just a suggestion, I mean right now anyone can submit anyone's videos without their consent or ownership verification. How do you plan to handle that? I'm sure there will be folks out there who wouldn't feel comfortable that a site will be scraping their video content attempting to generate a large network of pages on 1 domain with loads of SEO terms. It provides a conflict of interest with the original creators. This conflict of interest is around SEO competition, reducing views from original creators and then there's the other can of worms of any future plans to monetize your site through subscriptions, paid features or ads where you'd be profiting from the content of others without their consent.

I posted one of my videos just to see what would happen and then it created a permanently hosted page on your domain with an AI generated recap of the video. I didn't realize that was going to happen. There was no warning, label of how it works, TOS that I agreed to or options available to make it private and there's no option to delete it. I put in the URL, hit submit and that was it.

It's nothing personal and I hope you don't see this as a deterrent. I'm all for building cool things and generally openly share almost everything for free (I've been blogging and making videos for ~9 years and don't have a single ad on anything I ever posted) but the idea of having inaccurate AI generated content does rub me the wrong way.

> The guides are generated from pure transcript so you don't have to worry about it being AI.

You mentioned it's generated from pure transcripts but most of the phrases used aren't what was mentioned in the video. It looks like a paraphrased version of it but it's also missing all of the details that would allow someone to follow along.

Directly under the video on the page it says "This response is AI generated". On one hand you say it's not AI generated but then on the other hand it is.

Well, this place is called hackernews, after all. Information should be free so if Youtube makes it public, public it should be.
Public doesn't mean it's available for someone else to use however they see fit.

That's why we have licenses and YouTube's default license ensures creators retain ownership of what they upload and are protected by copyright. The license allows YouTube to broadcast the content.

You're not the only one with this take on this thread and I'm really trying to understand it...

Why do some of you think it is not okay to put YouTube embeds on a website???

> Why do some of you think it is not okay to put YouTube embeds on a website?

YouTube embeds are a different story, that is an official YouTube feature which allows folks to embed a YouTube video on a 3rd party website. I have no problem with that. YouTube even allows creators to enable or disable that on a per video basis. I keep it enabled because it's useful and promotes sharing of the original content as it was delivered.

I have a problem with a 3rd party site taking a video and making a derivative of it without the consent of the copyright owner. It's violating the license that the video was uploaded under. They even went as far as explicitly claiming copyright ownership on all content on their site (at the time of this comment their footer reads: "© 2024 Stepify - All rights reserved.").

I don't like making assumptions but look at how responsive the original poster of this thread was to most comments. They replied to a ton of people, but not this comment. They've also made an explicit decision not to include any way to remedy this issue or even contact them through their website. I'll let you draw your own conclusions from that.

I wouldn't have even minded as much if the generated text was good but in this case it was wildly inaccurate and missed all of the details that would have let you follow along without the video. The site's official tagline is "Get a step-by-step tutorial of any video to follow along". If someone sees the text generated they might infer a video was of poor quality because this site claims it can produce a step by step tutorial of ANY video to follow along. That sheds negative light on folks who created the original video.

Yes and to be clear the creator gets the revenue, not this website: https://support.google.com/youtube/answer/132596?hl=en-do-i-...

> Only YouTube and the video owner will earn revenue from ads on embedded videos. The owner of the site where the video is embedded will not earn a share.

Furthermore, the YouTube creator can choose to not let their video be embedded if they wanted that.

Do you have a problem with every news website that has a video at the top, then an article describing what happens in the video? How would that violate the licensing? It's unrelated to licensing - they're using the official YouTube embed. YouTube manages the copyright of the embedded content and can even control whether or not the video can be viewed in your country, etc. based on such restrictions.

> look at how responsive the original poster of this thread was to most comments but they ignore this request

Irrelevant, but I think because it's obvious you're misunderstanding copyright, or because you wrote such a big paragraph with many separate points being made that it's a lot of work to reply to. The copyright in his footer is for his IP, it of course would not apply to the content inside a YouTube embed. And it's not IP theft to summarize a video in what is essentially a blog post.

Whoever did this is a prankster and hilarious: https://stepify.tech/video/co7KgV2edvI

I hope that didn’t wreck your compute costs

This one really made me laugh. Good thing the website takes in only transcript to produce the response. This video had none, otherwise it would've been a problem hah.
Yeah, definitely some interesting examples: https://stepify.tech/video/ikc6PUSwdK4
For the "Paid" or "Pro" version, let me have a browser extension that replaces ALL OF YOUTUBE with your text based breakdowns.

// I'm not really kidding! Because boy do I hate 15 minute videos with the one CLI command you need buried like a needle in a haystack. Seeing the nonsense distilled into a handful of straightforward steps is so refreshing. Awesome work!

So true, you're after a few seconds buried in the video.

Giving the 15 seconds up front and then explaining it in more and more detail can also be appreciated by users.

You’d have to be lucky to get the correct and complete CLI command from the transcript though, unless this is also doing OCR, which I don’t think it is.
Thank you! I'll try implementing something like that and get back to you.
Love how the AI turned “drop a comment below” into a project step:

“Seek feedback from stakeholders or viewers by encouraging questions and comments for further engagement.”

This is from a bathroom remodel video.

Sorry for that, I'm looking into it. The problem is for videos that have no transcript. Maybe it's because i'm feeding it the description of the video for now. I'll find some workaround for this. Thanks!
> The problem is for videos that have no transcript.

Whisper or other models can help with that too, but remember to preprocess to cut silence. The dataset tends to include ads in the captions, which results in hallucinated text during silence.
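A very rough sketch of what "cut silence" could look like, assuming the audio is already decoded to a list of float samples in the range -1 to 1. The plain amplitude threshold is my simplification for illustration (a proper voice-activity detector would do better), and the function names and default values are made up.

```python
from typing import List, Optional, Tuple


def non_silent_spans(
    samples: List[float],
    sample_rate: int,
    threshold: float = 0.02,
    min_silence_s: float = 0.5,
) -> List[Tuple[int, int]]:
    """Return (start, end) sample ranges containing audio above the threshold.

    Quiet gaps no longer than min_silence_s stay inside a span, so normal
    pauses between words are not cut.
    """
    min_gap = int(min_silence_s * sample_rate)
    spans: List[Tuple[int, int]] = []
    start: Optional[int] = None
    last_loud = 0
    for i, value in enumerate(samples):
        if abs(value) < threshold:
            continue
        if start is None:
            start = i
        elif i - last_loud > min_gap:
            # The quiet stretch was long enough: close the previous span.
            spans.append((start, last_loud + 1))
            start = i
        last_loud = i
    if start is not None:
        spans.append((start, last_loud + 1))
    return spans


def drop_silence(samples: List[float], sample_rate: int) -> List[float]:
    """Concatenate only the non-silent spans."""
    return [s for a, b in non_silent_spans(samples, sample_rate) for s in samples[a:b]]


# Toy signal at 10 samples/second: 1 s quiet, 0.5 s loud, 1 s quiet, 0.5 s loud.
audio = [0.0] * 10 + [0.5] * 5 + [0.0] * 10 + [0.5] * 5 + [0.0] * 2
print(non_silent_spans(audio, sample_rate=10))
print(len(drop_silence(audio, sample_rate=10)))
```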

You could also add a transcript-evaluation step which checks whether this actually looks like a step-by-step video, but I'd consider skipping it for cost and efficiency. Trying to be helpful by evaluating whether the video is instructions or not is added complexity where bugs and strange behavior can creep in.

Feels like you might have to explicitly ask it not to put "drop a comment below" or "like and subscribe" into the instructions (or strip it from transcripts), since most YouTubers who take YouTube seriously are going to ask...
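Stripping it from the transcript before it reaches the model could be as simple as a phrase filter. A minimal sketch, assuming the transcript arrives as a list of caption lines; the pattern list is illustrative only and would need tuning per channel and language.

```python
import re
from typing import List

# Illustrative patterns only, not an exhaustive list.
ENGAGEMENT_PATTERNS = re.compile(
    r"like and subscribe"
    r"|smash (?:that|the) like"
    r"|drop a comment"
    r"|hit the bell"
    r"|subscribe to (?:my|the|our) channel",
    re.IGNORECASE,
)


def strip_engagement_lines(transcript_lines: List[str]) -> List[str]:
    """Drop caption lines that look like calls to like, comment or subscribe."""
    return [line for line in transcript_lines if not ENGAGEMENT_PATTERNS.search(line)]


lines = [
    "Drill holes and insert the expansion bolts.",
    "Don't forget to like and subscribe!",
    "Secure the inverter to the wall using nuts and washers.",
    "Drop a comment below if you have questions.",
]
print(strip_engagement_lines(lines))
```

Dropping whole lines is blunt: a caption line that mixes a real instruction with a subscribe reminder would be lost, so a prompt instruction is probably still worth having as a second layer.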
Consider passing the video and transcripts through SponsorBlock (removing sponsor, self-promo, interaction reminder, intro and outro segments from the videos) before stepifying them, that might help.
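Once you have the segments to skip, applying them to a timed transcript is just an interval-overlap check. A sketch under assumptions: the cue and segment shapes below are made up for illustration, the sample data is invented, and fetching the real segments from SponsorBlock is left out entirely.

```python
from typing import List, Tuple

Cue = Tuple[float, float, str]   # (start_s, end_s, text)
Segment = Tuple[float, float]    # (start_s, end_s) to skip


def drop_cues_in_segments(cues: List[Cue], skip: List[Segment]) -> List[Cue]:
    """Keep only cues whose time range does not overlap any skipped segment."""

    def overlaps(cue: Cue, seg: Segment) -> bool:
        # Half-open intervals: touching at an endpoint does not count.
        return cue[0] < seg[1] and seg[0] < cue[1]

    return [c for c in cues if not any(overlaps(c, s) for s in skip)]


cues = [
    (0.0, 4.0, "Hey guys, welcome back to the channel."),
    (4.0, 9.0, "Today's video is sponsored by..."),
    (9.0, 15.0, "First, switch off the main breaker."),
]
skip_segments = [(0.0, 9.0)]  # made-up data standing in for intro/sponsor segments
print(drop_cues_in_segments(cues, skip_segments))
```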
It’s not a problem! Just funny sometimes what AI does
Don't sweat the small stuff.
I made something a little similar, but just as a little cli script that I run locally for myself. You can input a url for a YouTube video, podcast link or local audio/video file. It transcribes it with whisper and outputs the full transcript in one text file and I use another model to summarize it into a bullet list in a separate file.

I so appreciate these open source/access models allowing us to build these kinds of tools without having to pay and send our data to openai.

Doesn't youtube automatically transcribe every video with whisper?
Whisper is from a different company (OpenAI) than YouTube (Google). YouTube's transcription existed before Whisper too, so I'd suspect Google has had their own for some time.

Whisper's is supposed to be better in some cases, but Google's probably works very well at scale.

My knowledge of YT is limited, but it sounds a little strange that Google would use OpenAI?
This seems like something people on HN have asked for before. I clicked on one Recent video about how to create a simple Flask app in 5 minutes and the instructions seemed good on a cursory view.

I tried entering a new video but I got a Heroku application error. Maybe it's a limits thing.

When I look at the Recent videos, a lot of them are not for instructions/tutorials. Perhaps people do not understand the purpose of this project. Maybe they are just testing it out with non-tutorial content.

Maybe you could add representative videos towards the top so that people would get a better sense of the use of this project?

I don't know why this isn't more popular here. It's a good idea. (Maybe it has already been implemented elsewhere?) Reading is much faster than watching a video for many instruction-based tasks. Good luck!

Yeah, you just said what was on my mind since I launched it. The code I wrote is for tutorial videos. Non-tutorial video responses are just gibberish. Putting representative videos at the top is a great idea. I'll look into it.

Can you tell me more about the video you entered? Did it have a transcript? How many hours long was it?

If you continue down this road, you should plan to fund the creators that this is siphoning from, or allow them some form of consent to agree to this.

What you are doing is, whether you’ve considered this or not, at risk of harming people who are building around video because it is financially viable. People produce these guides as videos because that’s how they can make money from them, whereas it is much more difficult to do so on a website.

You need to consider the implications of what you’ve built.

Hm, is this the right take? The YouTube player is embedded on the page, giving the creator YouTube views and more exposure. And I think when a person uploads to YouTube the idea is their video will be out there - including in embeds on 3rd party sites.

I just wouldn't use the word "siphoning" here. There are countless blog posts, news articles, how-to guides, etc. that will embed a video like this yet also provide supporting text for readers. I think it's a pretty normal way of sharing content.

I for one am not a person who learns by watching videos, step-by-step guides work better for me. The idea that all those video tutorials could be made available as text-based guides sounds actually very useful - and I would still be very aware of who originated that content as their video is embedded right there.

It would actually be great if when I search for a tutorial and the most relevant result is a video, if my browser could summarize that video the way search engines summarize results at the top or in the side bar.

It’s literally describing the entire video in a way that is intended to replace the purpose of the video, and only displays the video minimally in the context of the website.

It pulls out the information without adding anything of value, while making it impossible for the creator to make money from it.

This happens with text content, too. Ask publishers about the large number of AI rewrites of their content going around.

The issue here is not “consumer value,” it is “publishers not able to make money on their work,” the entire point of my original comment, which your reply doesn’t mention once.

It by definition promotes the video and the creator.

> publishers not able to make money on their work

Again you're wrong, the creator still gets the view and ad revenue not this third-party site where it's embedded, from YouTube:

  Only YouTube and the video owner will earn revenue from ads on embedded videos. The owner of the site where the video is embedded will not earn a share.
https://support.google.com/youtube/answer/132596?hl=en-do-i-...
I don’t know how you’re missing the bigger picture. It is taking away any reason to actually watch the video, which means nobody makes revenue from it. It is not a promotional tool—it is a replacement for the video.
I understand your take but I don't agree. By your logic no news site could display a video at the top then summarize the video in an article. This is one of the main use cases of the YouTube embed - which gives revenue to the creator when it's played on a third-party site (and the third-party site host gets no revenue) and the YouTube creator has the option to disallow embedding their video if they don't want it embedded anywhere - it's in their control.

The idea that the number of embed plays will be 0 on this site is just unfounded, and untrue as I just watched a video in an embed on this site. That creator just got a view, where otherwise I would have never seen their content, thanks to this website.

I'm not sure if it replaces the video.

It can be an invitation to the entire video because a summary may not cover all the details.

Maybe you are noticing some of the summaries might be better than the videos themselves writing wise?

The folks taking relatively simple videos and extending it out into a video script to milk watch time probably just need to make better scripts.

The YouTube player is there, poorly inserted at the bottom of the page on mobile.

No attribution is provided. There's no credit given. The description from the video is not provided.

People make their living from producing YouTube videos. This isn't cool at all.

Credit is given, for example this one: https://stepify.tech/video/Y4v8abG94CU

Says "Channel: Panigale Enthusiast" and the video is at the top-right of the page very prominently displayed.

On mobile - I agree it should be at the top, but for UI/UX reasons not because they're "siphoning from creators" which they are not - they are literally promoting the creators.

The people making a living from YouTube will make even more money now that their embed is on this site, won't they? Are you implying they don't get the views?

Why isn't it a link to the channel? Where is the description from the video which often includes the credits for the video, including writers, editors, etc?

People won't get views from this. This is taking the work of others.

The site is meant to keep people on the site, not send everyone to YouTube. It's a different form to consume, and a way to narrow down what to watch instead of investing time in videos that don't have the information you might be after.

Many genres of videos don't really matter as much for a summary, so I'm not sure if this is super ubiquitous.

Isn't an embed better than a link? Click play vs click a link then click play?

It's embedded and says the channel name in a giant font size. The creator gets the views.

> People won't get views from this

Do you know how YouTube embeds work?

Seems like we're adding colour to bits again.
This is fantastic for recipe videos: https://stepify.tech/video/wUFbhygzbqQ
Recipe ones are the best lol
> The guides are generated from pure transcript so you don't have to worry about it being AI.

That just means you have to worry about voice recognition errors instead.

True, but voice recognition errors typically involve an oddly-out of place word or two which you can usually spot and mentally correct. That's less likely to make you take the wrong series of steps than a completely coherent and topic-relevant "hallucinated" sentence that just happens to not be part of the guide at all.

Edit: although in this instance the LLM pretty heavily editorialises the transcript anyway...

You know what would save me more time? ...if I could search a database of stepified videos.
This looks amazing! As a marketer, many times I struggle with repurposing long video interviews into shorter tactical videos, and what you built looks promising. I'm excited to check it out!
You should probably rework the recent video thing? Or not. I mean it's engagement, I guess, but I'm pretty sure people are intentionally putting silly videos on the page.
Tried it on one of my latest videos. Interesting results. My video is not quite a tutorial video, so I can understand why the results are not perfect. But it has invented quite a lot of content...

https://stepify.tech/video/1-Rm0mgg2RI

Here's the video for reference:

https://www.youtube.com/watch?v=1-Rm0mgg2RI

Thank you for trying this out!
This is a great & useful resource! So many guides on YouTube are unfortunately padded with so much silliness and fluff. Would be great to link out to time codes if possible.
Thank you! Great suggestion. I'll try adding timecodes ASAP.
I could have used this on the weekend. I was working on my car, and though I had watched a few videos about removing the door, and electrical connections, etc etc. I missed on some of the details, or had to make a mental note of "this, then this, not the other way around".

What I think might be a great addition is if you had a screenshot for each point? Though I'm not sure how you'd figure out which image would best capture the action.

This is cool. I have been doing this a bit more manually, by using a Chrome plugin that does YT summaries and shows transcriptions using Claude. I don’t like those summaries so I paste the transcriptions into ChatGPT (GPT4) with a prompt “Provide detailed study notes of the following video transcript”. That gives me a very similar format to yours. Will have to do some side-by-side comparisons.
It’s tricky when you don’t do editorial on your homepage tho:

https://stepify.tech/video/623AC6a6org

is the first featured video…

In any case, it’s doomed- google will cut off the access or integrate the feature on their side. They thank you for the proof of concept though.

It is less tricky than watching a video on the subject. Very funny, but might not be a video you would want to watch at work (or home).
It's funny though
Interesting idea, but not quite useful. I tried two: how to replace a fiberglass window screen and how to replace the "cycle clutch gear" on an IBM Selectric Typewriter.

https://stepify.tech/video/KafAn1h4x14

Neither were good enough to use.

Great idea and congrats on shipping the project!

I'm curious if you noticed certain models worked better for summarizing and converting to steps. For example, in my projects I've found that Gemini outperforms "better" models like GPT for similar use cases, which I guess makes sense given Google's interest in summarization.

It looks like someone is flooding the service with questionable content (maybe to get you deranked from Google?)
Interesting; this is similar to an idea suggested by a Scott Galloway/Section weekly email.

1) record an SOP using Loom while you narrate, 2) grab a transcript of your narration, 3) feed transcript into ChatGPT to write list of instructions.

Was billed as a way to easily hand off processes to contractors or subordinates.

This seems like a cool riff on that. Neat.

Heh, it did more or less what I was hoping it would for the song ‘How To Be A Heartbreaker’: https://stepify.tech/video/vKNcuTWzTVw
> Internal Server Error

> The server encountered an internal error and was unable to complete your request. Either the server is overloaded or there is an error in the application.

Hugs all around - I'd take it as a positive feedback. Congrats on the launch!

Very nice work :)) Guess we worked on similar topics in the past couple days haha

see: https://news.ycombinator.com/item?id=40112792

The difference is theirs can be used right away without needing any keys haha
Wow you might have done something, saved

How are you managing costs and offering this for free?

I am not. I'm from a 3rd world country and trust me when I say this: I've burned through half of my paycheck in a few hours, which is barely 3 digits.
I think, to be fairer to the people actually creating the content, you should make a much more obvious link back to the original video.
I will. Could you suggest a place where it would be more obvious?
I would suggest putting it at the top of the instructions. What would be really useful, as someone else suggested, would be to link directly to relevant parts of the video.
This is an ingenious and practical use of LLM technology, I'm thoroughly impressed.
Very awesome! Would be even neater if it pulled screenshots from the video for each step :)
It would be great if it was open source, as I might want to make some custom modifications.
Ha I had this idea a few months ago and didn’t pursue it. Love it
Looks like it might be down. Love this idea.
Hilarious thank you
Love the Filthy Frank survival guide!
that's a lot of sexual content in the front page... might want to moderate all that
this is super helpful, thanks for making it! bookmarked.
I've been looking for something like this for absolutely ages. If I want to figure out how to fix my cellphone, reset a warning sensor on my auto dashboard or more recently install a NAS box, there's always this long winded YouTube video packed full of ads. Thanks for helping cut through this nonsense.
Appreciate the kind words. This really means a lot.