Hacker News
by 35mm 639 days ago
As someone who has worked as a video editor, the most helpful AI tool would be prompt-based editing.

For example “find all the interview sections where people are talking about x and make a sequence”.

OpusClip claims to have this but it’s behind a waitlist.

6 comments

As an outsider: sounds like the main value lies in the AI extracting detailed and accurate (but heuristic) metadata from video: audio transcriptions, text, people, environment and objects.

Once that’s there, you can use it for organizing, searching, filtering, or whatever you want. It does not need to be coupled with an LLM-based interface.

ML models for e.g. face and object recognition have been deployed in both local and cloud-based photo organization for at least a decade. I very much welcome transformers to do a much better job, but I also very much reject the everything-is-a-prompt hammer as a solution to all problems. Especially in deep and professional workflows where details matter.
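
To make the decoupling concrete, here is a minimal sketch, assuming some model has already extracted per-clip metadata. The `Clip` record and the `find_clips` helper are hypothetical names for illustration, not any tool's actual schema. Once the metadata exists, plain filtering gets you quite far without a prompt interface:

```python
from dataclasses import dataclass, field


@dataclass
class Clip:
    """Hypothetical per-clip metadata record produced by an upstream model."""
    path: str
    transcript: str = ""
    people: set[str] = field(default_factory=set)
    objects: set[str] = field(default_factory=set)


def find_clips(clips, text=None, person=None, obj=None):
    """Return the clips that match every criterion that was supplied."""
    results = []
    for clip in clips:
        if text is not None and text.lower() not in clip.transcript.lower():
            continue
        if person is not None and person not in clip.people:
            continue
        if obj is not None and obj not in clip.objects:
            continue
        results.append(clip)
    return results


# Made-up example library; a real one would come from the extraction step.
library = [
    Clip("a.mov", "we talked about the budget", {"alice"}, {"desk"}),
    Clip("b.mov", "the bicycle crashed here", {"bob"}, {"bicycle"}),
    Clip("c.mov", "budget cuts hit hard", {"bob"}, {"desk"}),
]
budget_with_bob = find_clips(library, text="budget", person="bob")
```

Here `budget_with_bob` holds only the `c.mov` clip. The same records could just as well feed a timeline bin, a saved search, or an LLM front end.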

Author here.

Yes, this is a big feature I've been working on, should be ready for a beta by the end of the month.

I allude to it in the post, but good search (for editing) is a challenge, and necessitates a mix of embeddings/vector search and text models.
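
For readers wondering what "a mix" might look like, here is a rough sketch, not the actual implementation: it blends a cosine similarity over embeddings with a simple keyword overlap on transcripts. The function names, the `alpha` weight, and the toy three-dimensional vectors are all assumptions for illustration; real embeddings would come from a model.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def keyword_score(query, text):
    """Fraction of query words that appear in the text."""
    q_words = set(query.lower().split())
    t_words = set(text.lower().split())
    return len(q_words & t_words) / len(q_words) if q_words else 0.0


def hybrid_search(query, query_vec, clips, alpha=0.5):
    """Rank clip ids by a weighted mix of vector and keyword scores.

    clips: list of (clip_id, transcript, embedding) tuples.
    alpha: weight on the vector score (an assumed tuning knob).
    """
    scored = []
    for clip_id, transcript, emb in clips:
        vec = cosine(query_vec, emb)
        kw = keyword_score(query, transcript)
        scored.append((alpha * vec + (1 - alpha) * kw, clip_id))
    scored.sort(reverse=True)
    return [clip_id for _, clip_id in scored]


# Toy data: the embeddings are invented, not model output.
clips = [
    ("interview_1", "we discuss the new skate park", [0.9, 0.1, 0.0]),
    ("broll_4", "wide shot of the city", [0.1, 0.9, 0.0]),
    ("interview_2", "the park opened last year", [0.7, 0.2, 0.1]),
]
ranking = hybrid_search("skate park", [1.0, 0.0, 0.0], clips)
```

With this toy data the clip that matches on both signals ranks first, and the one that matches on neither ranks last.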

Derushing in general is the most time-consuming part, so it needs not only language pattern recognition but also image recognition: "From the rushes, extract all the sequences with bicycle crashes to give me a pile of clips to use in my edit"!
Yes, agreed.

I film a bunch of skateboarding, and it can take tens of tries to land a trick. Similarly, there's usually a unique sound that signals a trick was finally landed.

Good multi-modal search and discovery is a huge part of cracking the editing problem.

Looks like https://kino.ai addresses that derushing stage, but as a specialized tool rather than as a function inside a video editor - which makes a lot of sense to me.
Tens? It sometimes takes my crew hundreds of tries (all on DV tapes).

How far have you been able to come with search for trick variations? It would be interesting to see a system that can reliably recognize what’s switch, nollie vs fakie etc. Then have it generate a list of all tricks for each skater and perhaps outstanding fails. Just some thoughts.

Detect the cheer everyone makes when the trick lands. Lots of proxy indicators to key off of.
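
A rough sketch of that idea, assuming you already have a per-window loudness (RMS) series for the audio track. The function name, the threshold, and the sample values are made up for illustration; real footage would need tuning and probably a smarter classifier than raw loudness.

```python
def find_loud_segments(rms, threshold, min_len=2):
    """Return (start, end) index pairs where rms stays above threshold.

    rms: per-window loudness values; end is exclusive.
    min_len: ignore bursts shorter than this many windows.
    """
    segments = []
    start = None
    for i, value in enumerate(rms):
        if value > threshold:
            if start is None:
                start = i  # a loud run begins here
        elif start is not None:
            if i - start >= min_len:
                segments.append((start, i))
            start = None
    # Close a run that is still open at the end of the track.
    if start is not None and len(rms) - start >= min_len:
        segments.append((start, len(rms)))
    return segments


# Invented loudness values: one short spike, then two sustained bursts.
loudness = [0.1, 0.1, 0.8, 0.1, 0.2, 0.9, 0.9, 0.7, 0.1, 0.6, 0.8]
cheers = find_loud_segments(loudness, threshold=0.5)
```

The single-window spike at index 2 is dropped as too short, while the two sustained bursts are kept as candidate "landed it" moments to review.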
> I allude to it

And that’s why I read the comments to see if anyone mentioned it.

To be able to literally take the source files used to put the video together and edit each piece individually would be great.

I wanted to create a car driving down a road covered in arches of greenery. I got lots of great options but I wanted a particular combination of options. If I could do something like that with video, that would be terrific.

Not a personal jab, but I am astounded how every day, HN is full of discussion around how articles, newsletters, podcasts, and videos need to be aggregated and summarized for actual consumption. Repeat ad infinitum in both directions.

In my experience, I’ve always listened to live discussions or read long form blog posts, specifically for the story and obscure points being made. Summaries never capture that and always miss nuances.

It's approaching a very strange situation where people make overly wordy and bloated AI generated content and other people try to use AI to compress it back into useful pellets vaguely corresponding to the actual prompts used to generate the initial content. Which were the only bits anybody cared about in the first place.

One guy pays the AI to dig a hole, the other guy pays the AI to fill in the hole. Back and forth they go, raising the GDP but otherwise not accomplishing anything.

I haven't worried about search engines since I was trying to get my site into yahoo, but my understanding is that they rank long flowery prose far higher than things that are straight to the point.

There's then the added "benefit" of being able to put more adverts in such long text.

One of the main appeals of chatgpt is it just gives you the answer

*an answer

Not necessarily the answer

So no different to searching online and finding some random page then. In my experience chatgpt is usually far more accurate, and as it gets right to the point you have far more time to understand if the answer is reasonable
No one searches online for a random page. You search for something you may or may not find. You don’t go in a library looking for Jules Verne and get out with any random book. I can agree that search engines may be bad, but they don’t create web sites out of thin air.
It’s clearly different in that ChatGPT sounds authoritative but you still have to track down sources and make sure they’re correctly summarized and accurate. Search doesn’t give you the impression that you’re doing anything else but ChatGPT always sounds authoritative even when it’s wrong, which makes it a hazard for the people who need it the most because they don’t have the personal expertise to recognize when it goes off track.
Insightful :)
Not sure about articles, but people keep recommending multi-hour-long podcasts and videos far beyond the ability of any employed person to keep up with what they might want, so a summary is a useful tool to extract the salient points and possibly consider if something meets the threshold of being better than all the other hour-long things I might want to spend my free hour on.

It sometimes feels like media has bifurcated into hyper-dense (let me explain a whole field of law in a 30 second tiktok) versus hyper-fluffy (documentary with 30 minutes of material spread out into six episodes, with a recap before and after each commercial break), depending on whether the target audience has a job or not.

Sounds like you're suffering from FOMO if you feel the need to consume summaries of multi-hour content you don't have time to consume.
It’s also changes in market dynamics. Professional podcasters sell ads so they need lots of content, and the advertiser-driven pivot to video and podcasts means that things which a decade ago would have been a blog post taking 15 minutes to read are now an hour or more commitment for the same amount of information.

This is a common complaint here because HN is so text heavy that you’re not going to find many people here who can’t read much faster than the average speaker can present information.

Yeah that's what I meant by spam.
If that’s what you meant, you didn’t say it, and it’s not spam by the normal definition of that term.
Or they are just interested in the content?
I doubt it.
I generally agree with you when it comes to learning-focused content but there are definite cases where using an AI summary makes a lot of sense.

Imagine searching for a guide on how to disassemble your laptop. Unfortunately, you can only find a 30 minute video which is full of rambling, ads or other things irrelevant to you. You can at least in theory use AI to produce a textual summary which contains only the disassembly instructions and relevant snapshots of the video.

All professionals I've ever talked to seem to agree that videos are a terrible form of reference information (i.e. you need information to accomplish a task right now).

The same applies to recipe websites: an AI that can throw all the fluff away is useful considering the annoying habit of the authors to seemingly write about everything but ingredients and the steps necessary to cook the dish.

I think this relates to the https://nick.groenen.me/posts/the-4-types-of-technical-docum... as in any documentation that serves immediate work rather than learning should be straight to the point with as little clutter as possible.

>All professionals I've ever talked to seem to agree that videos are a terrible form of reference information

It really depends. For most software things, I'd prefer to have written documentation. If it's purely for reference, then yes I agree text is better.

For working on my bicycle or car, often I like watching videos because you pick up on little ways the pros make the jobs easier - for example, the steps might do a poor job of describing the angle and movement of tyre levers, but it's easily understood via video (just an example).

As a result, it can be a much richer experience when you are building skills as opposed to just following a checklist.

I totally agree. What is life living with just summaries?

Podcasts and blog posts fall into "unique value/view/information I am learning" or entertainment "something that feels like a (parasocial) friend - content I can predictably expect and get some dopamine/sense of socialness from".

Summaries for the former remove the eureka moments and brain connections between ideas, replacing them with takeaways, and summaries for the latter are like summarizing a TV episode in text: no entertainment tends to really come from it.

I think it comes from having many messages at work, and I get that. When you have 50-100 messages/documents a day, quick summaries are a lifesaver, they help you filter, avoid, or get to the facts. But for things I select listening to.. for those hours of rest or (scientific) curiosity in my life.. summaries are not a virtue.

(for Parasocial - the feeling is: This person won't update me on their relationship problems, they'll explain a cool thing about castles to me and share their opinion, etc.)

It has a lot to do with the kinds of articles that appear on HN and across the internet. And also, that spending time on something requires being interested in it, and so, there's a larger audience for summaries.

I think, in general, most people have areas of interest to them where it would not occur to them to summarise what they're having fun engaging with.

People use these summaries to generate spam which they sell to advertising networks, that's why they keep talking about it.
That's fair, and there will always be people who want summaries.
I don't read much online drivel, but the way I would describe my interest in AI summary/model building, is that I do read a few articles/books deeply, but these refer to many other things that it would be useful to have a general picture of in my mind, but I'm never going to put the manual effort into building that surrounding structure.

E.g. I'm interested in classical art, and come across a lot of "he painted this while he was in $X before he moved to $Y". I'd like information about $X and $Y to be also available, how far apart are they, were they ruled by the same people, etc. But I won't be doing that sort of digging myself, I'd like it to show up next to what I'm reading, because I (will) have an AI reading along and doing this work for me.

You don't understand! I need to procrastinate more efficiently!
That seems really hard.
You should check out scenery.video (disclaimer: I have a relationship with the company)
Check out https://kino.ai (YC S23)