Hacker News
Ask HN: Video recommendation platform as a way to compete with YouTube?
67 points by mIOAMDA 2101 days ago
Content creators don't want to leave the platform because of discoverability. The audience doesn't want to leave because of content creators. People say that we should do decentralization like blogs in the past. However, it's hard to be discovered.

How about making a platform that crawls the internet for videos and builds a recommendation algorithm, so that no matter where you host your videos you'll be easily discovered? This seems to fix the problem of discoverability while allowing decentralization or outright another platform to get traction.

EDIT: ghgr puts it nicely:

> Interesting concept. So you propose to have a landing page with a selection of videos tailored to the taste of the visitor? And seamlessly combining results from YouTube, Vimeo and others? You would be effectively relegating YouTube to a commoditized CDN for videos.

25 comments

Video hosting is actually a lot harder than it seems.

- For starters, you need the server space to host it. Videos can be large -- a 10 minute 4k video can be nearly 5 GB in size.

- Then you need the CPU to transcode it to various formats and bitrates (YouTube keeps more than 30 different formats sometimes)

- Then you need the bandwidth to actually serve all that data. Ideally you have servers geolocated close to the user.

As someone with a slow internet connection, I generally despise self hosted videos. If your video won't load because I don't have a >100mbit connection, I will likely leave.

This isn't really relevant to OP. There are lots of sites that host video already: YouTube, Vimeo, Facebook, Twitter, Instagram, Reddit, etc.

A successful discovery platform would allow content creators to publish to any platform they want (or their own site), and still maintain their viewership.

> As someone with a slow internet connection, I generally despise self hosted videos.

To this I would say that at this point, storage and CDNs are becoming a commodity, and "self hosted" does not have to mean slow. I can build and host my own site on a home server while still serving videos from GCP CDN. Now I have similar performance to YouTube without the YouTube.

Making mountains out of molehills, IMHO. Streaming at large scale isn't hard, you just gotta do it. Certainly wasn't the hardest problem I've worked on. All of this is now easier than ever to solve: AWS can transcode, and you don't even need to know anything about codecs anymore.

Source: used to manage some of the largest adult tube sites.

Your comment seems to make the assumption that this is a technical problem. You're right that it'd be possible to make a YouTube site by oneself. If one did it with GCP they could be using the exact same hardware and very similar software as YouTube.

However, the real problem the parent talks about is financial. YouTube gets all this hosting at cost for Google. A YouTube competitor has to pay the AWS, GCP, etc. markups. They'd also have significantly less advertising income than Google. Plus, many of the professional content creators wouldn't even consider a platform that doesn't pay out ad revenue.

That's the internet, it's just computers connected together and each computer belongs to someone. The more you have, the more traffic you have. The only way I can see this working is if the internet or parts of the internet become a public asset, a national cloud (like roads etc) paid by income taxes. Your videos would be on this public storage, common services would work like AWS services etc. It will be hard since this means additional taxes but at least the government will not be interested in competition (in theory).
Valid point, without money to burn it's not possible in any market.
>Videos can be large -- a 10 minute 4k video can be nearly 5gb in size.

Not on YouTube, with the way YouTube compresses video it's usually somewhere around 10 gigabytes for an hour of 4k@60 video.
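To put the two size figures above side by side, here is a rough sketch that converts each to an average bitrate. It assumes decimal gigabytes (10^9 bytes), and the function name is only illustrative.

```python
def average_bitrate_mbps(size_gb: float, duration_minutes: float) -> float:
    """Average bitrate in Mbit/s for a file of size_gb decimal gigabytes."""
    bits = size_gb * 1e9 * 8          # gigabytes -> bits
    seconds = duration_minutes * 60   # minutes -> seconds
    return bits / seconds / 1e6       # bits per second -> megabits per second

# "nearly 5 GB for a 10 minute 4k video" (an upload-quality file)
upload_mbps = average_bitrate_mbps(5, 10)
# "around 10 gigabytes for an hour of 4k@60" (after YouTube's compression)
youtube_mbps = average_bitrate_mbps(10, 60)

print(f"upload: {upload_mbps:.1f} Mbit/s, YouTube: {youtube_mbps:.1f} Mbit/s")
```

Under these assumptions the uploaded file averages about three times the bitrate of the compressed stream, so both figures can be right: one describes what creators upload, the other what viewers are served.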

>Then you need the bandwidth to actually serve all that data. Ideally you have servers geolocated close to the user.

I don't see any serious benefits from decreased latency, if we are talking about video streaming.

Most users won't watch the video, if they need to wait for too long before it begins.
Even if user is on the other side of the planet latency is somewhere near 500ms. It isn't that much.
Latency isn't important for video streaming, peering is.
The goal is not necessarily to make self hosting easier. The goal is that if you're able to self-host then you could benefit from this recommendation platform by being discoverable. It's your hassle to be viewable; the recommendation platform doesn't host anything.
And also text websites crash frequently when they go viral, so it's going to be a bigger challenge for people to scale video going from thousands of views a day in their niche to a million if they make something that spreads.
I like going on YouTube and seeing what gets recommended to me; sometimes I discover useful/entertaining stuff that way. But most times, the recommendations tend to be terrible:

- repetitive, as if they insisted on the same specific video dozens of times

- not using the full breadth of a given channel I've subscribed to, e.g. just recommending a few videos out of 100s

- recommending stuff that has absolutely nothing to do with my interests, simply because it's going viral or it's been a previous big hit in the mainstream

- similar channels almost never get recommended to me. This would be super useful, since maybe I'm subscribed to 5 channels for a specific topic, but I could be subscribed to 20 as well - I just can't find them naturally

A good recommendation algorithm might be worth paying for, as it could provide endless entertainment.

This mirrors my experience. My theory is that YouTube falsely equates a video’s relevance to you with your desire to receive notifications about it - e.g. whether you clicked the little ‘bell’ when subscribing. This isn’t what we expect - I personally go to my feed to see what’s new, but that’s not what YouTube creates the feed for. I don’t know what it makes the feed for, honestly.
I think their recommendations are really, really good.

To help the system you can click "not interested" on recommendations that you don't like, and you can also click the bell next to subscribe to always get recommendations and notifications for the creators you like.

Usually if I don't like the recommendations on the front page I just refresh, I get different ones.

Recommendations are hard because even for you, as an individual, they differ from one day to another and even from hour to hour, based on what you are in the mood to watch.

I've been trying to fine-tune my recommendation list by clicking on "not interested" and "don't recommend channel". From what I have observed, the "Not interested" option just removes a video and doesn't adjust anything. "Don't recommend channel" does something, but after several months the channel pops up on the recommendation list again. Especially annoying are the reaction videos, which I have never watched and never will.

Also when I try to refresh the front page I still see the same videos all over again in different order.

I feel like the rise of channel subscriptions has caused the recommendation algorithm to act like a dog chasing its tail. You get so silo'd with content on youtube reccs these days, it's really sad.
>Why don't we just make a platform that crawls the internet for videos and build a recommendation algorithm. Therefore no matter where you host your videos you'll be easily discovered?

Because there are fundamental tradeoffs of _information_ and _meta_information_ that a system has that factors into recommendations.

People upload about 500 hours of video to YouTube every minute.[1]

Yes, that means that centralization to that degree has negatives such as the guitar channel being deleted but it also has positives such as better (not foolproof -- but relatively better) detection of spam and mistitled content. Because Youtube servers actually have the real bytes of actual videos, they can scrub the content with machine learning algorithms augmented with some human oversight.

If you only have a platform that crawls for video urls and doesn't actually ingest the petabytes of video like YouTube, the "recommendations-only" service will be susceptible to gaming such as people putting "Emma Watson and Scarlett Johansson nudes" as the title when the actual video is an ad for somebody trying to sell a used car. Or for the first few days a video url really does serve "Tips to deal with COVID", but the video host later swaps the video at that same url for something else.

Also, a centralized service like Youtube can measure actual user behavior such as "watch time" (because Youtube's servers know they are sending bytes to the client) to see if the video is actually engaging viewers and this factors into future recommendations. If web surfers are closing out a 10 minute video after 20 seconds, the videos are appropriately penalized because users are "voting" that the content is bad. I.e. the viewers "voted" without even having to press the "thumbs down" button.
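As a minimal sketch of the implicit "vote" described above, the snippet below scores each video by the average fraction of it that viewers actually watched. The event format and function name are assumptions made for illustration; nothing here reflects how YouTube actually computes its signals.

```python
from collections import defaultdict

def watch_fraction_scores(events):
    """Average watched fraction per video.

    events: iterable of (video_id, seconds_watched, video_length_seconds).
    The tuple layout is hypothetical, chosen only for this example.
    """
    totals = defaultdict(lambda: [0.0, 0])  # video_id -> [sum of fractions, views]
    for video_id, watched, length in events:
        fraction = min(watched / length, 1.0)  # cap replays at one full view
        totals[video_id][0] += fraction
        totals[video_id][1] += 1
    return {vid: total / views for vid, (total, views) in totals.items()}

# Video "a": viewers close a 10 minute video after 20-30 seconds.
# Video "b": viewers watch nearly all of it.
events = [("a", 20, 600), ("a", 30, 600), ("b", 540, 600), ("b", 600, 600)]
scores = watch_fraction_scores(events)
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked, scores)
```

The point of the sketch is that the input is playback data, which only the party serving the video bytes (or a script embedded in the player) can observe.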

A new platform to recommend videos pointing to decentralized video hosting has advantages but also has unavoidable disadvantages that lower the quality of the recommendations.

EDIT to add my previous comments about the financial incentives (ads) that help video content creators which many techies overlook: https://news.ycombinator.com/item?id=21506992

[1] https://www.google.com/search?q=upload+about+~500+hours+of+v...

You bring some interesting points, but you are off the mark on what's the true challenge.

We [0] actually indexed "all" content on YouTube and dozens of other platforms. Not just metadata, but the videos.

We could easily run a search engine like this. In fact, we have explored it in the past. We also came up with a PageRank-like algorithm that allows us to bootstrap the experience and eventually move to usage-based sorting.

The true challenge is in customer acquisition. Google has such a strong hold on the browser market [1] that there is just no way to convince people to "go to pex" and search for videos. You can see it with DDG, whose penetration [2] after more than a decade is a fraction of Google's. Even Microsoft, which plunged billions of dollars into Bing, is not able to acquire more than just a few percent of the market [3].

Finally, there are literally no investors to back this. After the fall of Blekko [4], nobody will touch this market. We spoke to dozens of investors. Most have a default position to "never compete with Google".

Until this changes, there is no way to bring it to the market.

[0] https://pex.com

[1] https://www.theverge.com/2020/7/1/21310591/apple-google-sear...

[2] https://duckduckgo.com/traffic

[3] https://gs.statcounter.com/search-engine-market-share

[4] https://en.wikipedia.org/wiki/Blekko

>Not just metadata, but the videos.

But calculating the fingerprints of the videos rather than storing on disk and serving all the exabytes of video means you're still at a fundamental _information_ disadvantage.

You already know the following but I'll spell it out for readers who may not see the distinction: web surfers don't watch fingerprints... they watch the actual video bytes.

Youtube employees and researchers found out that user behavior such as actual watch time of the videos was a _stronger_ voting signal of quality than the user hitting the subscribe button. To replicate that measurement, how would Pex get similar usage data without actually storing the exabytes of videos or convincing millions of web surfers to install a browser plugin to spy on their youtube usage?

>The true challenge is in customer acquisition. Google has such a strong hold on the browser market [1], that there is just no way to convince people to "go to pex" and search for videos.

I'm not convinced of your claim that the Chrome browser is the competitive moat that prevents Pex from rising up. E.g. TikTok got very popular without Chrome browser help. As another example, I found out that a gardening expert[1] gets most of her views from Facebook-hosted videos instead of YouTube. She gets more than 3x the views on Facebook. I don't have a Facebook account so I watch her on YouTube but it turns out I'm actually in the minority of her audience.

[1] https://www.youtube.com/c/gardenanswer/videos

>But calculating the fingerprints of the videos rather than storing on disk and serving all the exabytes of video means you're still at a fundamental _information_ disadvantage.

>You already know the following but I'll spell it out for readers who may not see the distinction: web surfers don't watch fingerprints... they watch the actual video bytes.

This is like saying "nobody goes to Google to look at links, they want the actual pages". These are search engines. People are more than happy to use them to find what they are looking for.

>Youtube employees and researchers found out that user behavior such as actual watch time of the videos was a _stronger_ voting signal of quality than the user hitting the subscribe button. To replicate that measurement, how would Pex get similar usage data without actually storing the exabytes of videos or convincing millions of web surfers to install a browser plugin to spy on their youtube usage?

Exactly the same way Google does it in their own search. User clicks on a link based on a keyword which creates a loop that you feed into the system.
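The click loop described above could look roughly like the following sketch: record which results were shown and clicked for a query, then reorder by a smoothed click-through rate. The class, its method names, and the smoothing prior are all assumptions for illustration, not a description of Google's or Pex's systems.

```python
from collections import defaultdict

class ClickFeedback:
    """Hypothetical click-through feedback store keyed by (query, url)."""

    def __init__(self, prior_clicks=1.0, prior_impressions=10.0):
        # The prior keeps rarely shown results from jumping to the top
        # or bottom after a single click or a single skipped impression.
        self.prior_clicks = prior_clicks
        self.prior_impressions = prior_impressions
        self.impressions = defaultdict(int)
        self.clicks = defaultdict(int)

    def record(self, query, url, clicked):
        self.impressions[(query, url)] += 1
        if clicked:
            self.clicks[(query, url)] += 1

    def ctr(self, query, url):
        key = (query, url)
        return (self.clicks.get(key, 0) + self.prior_clicks) / (
            self.impressions.get(key, 0) + self.prior_impressions
        )

    def rerank(self, query, urls):
        return sorted(urls, key=lambda u: self.ctr(query, u), reverse=True)

feedback = ClickFeedback()
for i in range(10):
    feedback.record("fix faucet", "u1", clicked=(i < 1))  # 1 click in 10 views
    feedback.record("fix faucet", "u2", clicked=(i < 6))  # 6 clicks in 10 views
order = feedback.rerank("fix faucet", ["u1", "u2"])
print(order)
```

Note that this only captures whether a result was clicked, not what happened after the click, which is exactly the gap the watch-time argument in this thread is about.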

We also have a lot of information. For instance we know what video is deployed where on the web, which pieces (down to 1s) are being taken out and how they are utilized and also general performance of the content on each platform. This way we are able to show things like "here is the best part of this video" or "here is the first occurrence of this video" or "here is the longest version of the video".

>I'm not convinced of your claim that the Chrome browser is the competitive moat that prevents Pex from rising up

I can tell you didn't read the link I posted. I never claimed this because of Chrome. They are the default search engine on EVERY browser. Firefox, Safari, Chrome. The tyranny of defaults is quite substantial.

>TikTok got very popular without Chrome browser help.

This feeds into my second argument. No VC will fund it. TikTok spent billions on ads to promote their app. My point was exactly that. It's not the technology that is the issue, it's the marketing. We would never have stood a chance.

>As another example, I found out that a gardening expert[1] gets most of her views from Facebook-hosted videos instead of Youtube. She gets more than 3x the views on Facebook. I don't have a Facebook account so I watch her on Youtube but it turns out I'm actually in the minority of her audience.

YouTube is big, but a small part of the UGC world, which is quite diverse. That's why a search engine would make sense.

>This is like saying "nobody goes to Google to look at links, they want the actual pages". These are search engines. People are more than happy to use them to find what they are looking for.

That's not my point. Yes of course the websurfers click away from a search engine to the original source url to read a blog etc. My perspective was not the websurfer but the statistics aggregation of websurfers playback behavior for Youtube as a factor to recommend videos for other websurfers.

For non-video search engines like Google/Bing/CommonCrawl, the web spiders download the actual HTML text and also execute some of the pages' javascript to add into their own index, which is similar to users uploading actual bytes of video data. Google can then apply extra analysis on their copies of others' documents. Pex fingerprints of videos are not the same. Unlike html of text pages with url links for PageRank to cheaply extract and exploit, videos do not have a built-in "link" structure to other videos and don't form a graph for analysis. Therefore, signals like measuring actual user behavior drive a lot of the algorithm for recommendations. They have a lot of this data ... because they host the majority of videos... which happens because hosting videos is expensive.

A pure search engine without hosting the actual video bytes and only links to others doesn't have the same virtuous loop of data feedback. It's fundamentally missing information that Youtube has. If I search Pex for "how to fix a faucet" and I click on a search result url that takes me to "http://ugc.com/fixfaucet.mp4" and it turns out it's a bad video and I abort the playback after 10 seconds, how would Pex know of my dissatisfaction based on realtime behavior happening on another website you don't control? In contrast, Youtube knows about my dissatisfaction without me having to click "thumbs down" icon.

> For instance we know what video is deployed where on the web, which pieces (down to 1s) are being taken out and how they are utilized and also general performance of the content on each platform. This way we are able to show things like "here is the best part of this video"

If you don't have visibility into the actual play/pause/stop buttons of the Youtube's video player and the browser's close-tab button, how do you get the same user behavior data Youtube has?

>They are the default search engine on EVERY browser. Firefox, Safari, Chrome. The tyranny of defaults is quite substantial.

And yet Facebook's walled garden of video hosting has many content creators with higher audiences than YouTube regardless of Safari/etc browser defaults for Google. TikTok found success as well. I disagree that it was billions in marketing.

Back in 2005, Google was also the default search engine for AOL and Yahoo and yet a little upstart like Youtube (without billions) was beating Google's own Video service! The irony! Susan Wojcicki was the early employee of Google who convinced Larry Page that they were losing and to acquire Youtube instead.

>TikTok spent billions on ads to promote their app. [...] It's not the technology that is the issue, it's the marketing. We would never have stood a chance.

I disagree with a CEO who boils down his business disadvantages to "marketing" expenditures. Let's consider Google circa 2002. How did the underfunded Larry & Sergey _pay_ for the search default deals with AOL and Yahoo when they themselves didn't have billions to spend on marketing? Remember, they only had $25 million in VC capital. The way they partnered with the then much bigger AOL/Yahoo was to offer them a revenue sharing deal for the Adwords revenue. They had the better technology which AOL/Yahoo wanted and Google used a clever "arbitrage" to fund the deals.

As the CEO, you can use a similar playbook. You need to come up with a compelling technical product (maybe more than content fingerprints, etc) and then work out a clever financial arrangement (subscriptions, or licensing, or partnering with Bing/Apple/Amazon to beat Google, etc) -- that makes your lack of billions irrelevant.

>YouTube is big, but a small part of the UGC world, which is quite diverse. That's why a search engine would make sense.

Ok, that's a fine business thesis. (Seriously.) A universal search engine that's superior to Youtube because your algorithm would transcend the entire UGC landscape should easily prove to websurfers that they get much better recommendations.

If millions of others agree with you, that proves you have a compelling product. You don't need billions. Neither the early startup Google nor early Youtube had that type of money. Every company that seems formidable and unbeatable today because they have billions in the bank was previously a small company that didn't have billions. Likewise, you can be one of those companies that finds a way to billions instead of falling back on lack of marketing as the roadblock.

>Pex fingerprints of videos are not the same.

We also download the content to our servers. We analyze it, hence the fingerprint, and then delete the content. So yes, it's not apples to apples, but it works very similarly.

>They have a lot of this data ... because they host the majority of videos... which happens because hosting videos is expensive.

Fair, but not that necessary. Usage data is great, but YouTube has only data on YouTube. It would be great to have it, but search is about letting a person find what they are looking for. And for that the exact usage data is not necessary.

>A pure search engine without hosting the actual video bytes and only links to others doesn't have the same virtuous loop of data feedback

Yes, but again, that's the same thing as when Google points you to a page. Yes they hold the HTML page, but you are not reading it there. They don't know if you read what's on the page, what you did there, etc. Same as we don't know what you did with the video once you got there.

>I disagree that it was billions in marketing

https://www.wsj.com/articles/tiktoks-videos-are-goofy-its-st...

>Back in 2005, Google was also the default search engine for AOL and Yahoo and yet a little upstart like Youtube (without billions) was beating Google's own Video service! The irony! Susan Wojcicki was the early employee of Google who convinced Larry Page that they were losing and to acquire Youtube instead.

Nuances matter. Google bought YouTube not because it was beating their video service but because they were perceived by most users as a search engine and it posed a huge competitive challenge to their core business. Why did Facebook buy Instagram?

>I disagree with a CEO who boils down his business disadvantages to "marketing" expenditures.

Fair. You are free to go and run with it. You don't have to agree with me. I gave you my reasoning why I am not doing it.

>If millions of others agree with you, that proves you have a compelling product. You don't need billions. Neither the early startup Google nor early Youtube had that type of money. Every company that seems formidable and unbeatable today because they have billions in the bank was previously a small company that didn't have billions. Likewise, you can be one of those companies that finds a way to billions instead of falling back on lack of marketing as the roadblock.

I don't think you understood. We have already built quite a large and growing company. I have no interest in entering the market as I found a better business for myself and the company. I shared with you my analysis of why I decided against entering the market. Again, you don't have to agree with it.

Why not a chrome extension that would overlay recommendations on top of a regular youtube page? Then you just have to convince users to install it once.
There is no business there. Plus it's not just YouTube. The UGC world is significantly bigger than YouTube.
I don't feel these unavoidable disadvantages are as black and white as you describe. To circumvent the fake title/bait & switch practices, you could allow users to flag videos as misleading, vote them as spam, etc. Put the number of flags next to the video, add the ability to not even see flagged videos.

Further, scripts could be used to report video play statistics to the scrape site. Drawback is users would have to add the script to their own site, but adding analytics scripts is common practice.

>To circumvent the fake title/bait & switch practices, you could allow users to flag videos as misleading, vote them as spam, etc.

And now you've re-created the "Amazon 1-star ratings left by competitors instead of real buyers" and/or the "false DMCA takedown claims of public domain or original music".

E.g. an honest person creates a "How to sew a mask at home to protect from COVID" which is the real content but a bunch of coronavirus deniers falsely flag that video as "child pornography".

Game theory is hard and everybody has to think through all the future chess moves that adversaries will use.

Damn you're totally right. You'd have to have a subscription fee and a team of mods to fix that.
> To circumvent the fake title/bait & switch practices, you could allow users to flag videos as misleading, vote them as spam, etc. Put the number of flags next to the video, add the ability to not even see flagged videos.

And who would flag the bad users?

This just slightly increases the discomfort of bad guys, but does not stop them at all. Instead they will flag competitors' videos and such.

Nobody has solved "decentralized trust" yet, if ever.

What if instead we had the recommendations from the main sites (YouTube, Vimeo, Dailymotion), and if you want your personal video hosting site to be recommended you would fill in a form to add your site to the collection of sites the recommendation system crawls through?

This might be helpful to avoid bad quality recommendations?

We definitely need to decouple discovery from hosting.

I'm not even sure you would need to do any crawling. Maybe you could just let people submit links to videos hosted elsewhere. Kind of like you can submit any URL to the Internet Archive. But in this case you would use the frequency of duplicate submissions (along with in-house upvotes) to build out a recommendation platform.
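A minimal sketch of that idea, assuming submissions arrive as plain URLs and in-house upvotes as a per-URL count. The normalization rules, the `upvote_weight` parameter, and the example URLs are all made up for illustration.

```python
from collections import Counter
from urllib.parse import urlsplit, urlunsplit

def normalize(url):
    """Collapse trivially different spellings of the same link."""
    parts = urlsplit(url.strip())
    # Lowercase the host, drop a trailing slash and any #fragment.
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(),
         parts.path.rstrip("/"), parts.query, "")
    )

def rank_submissions(submitted_urls, upvotes, upvote_weight=2.0):
    """Score = number of duplicate submissions + weighted in-house upvotes."""
    counts = Counter(normalize(u) for u in submitted_urls)
    scores = {u: c + upvote_weight * upvotes.get(u, 0) for u, c in counts.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

submitted = [
    "https://example.com/v/1",
    "https://Example.com/v/1/",
    "https://example.com/v/1#t=10",
    "https://example.org/clip",
]
upvotes = {"https://example.org/clip": 2}
ranking = rank_submissions(submitted, upvotes)
print(ranking)
```

As other comments in this thread point out, raw submission counts are easy to game, so a real system would need some form of abuse resistance on top of this.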

I like the idea of Peertube for this, and a search engine popped up the other day too

Search: https://sepiasearch.org/

Hosting: https://joinpeertube.org/

Does it include videos from platforms other than PeerTube, though? If it doesn't support at least YouTube that's going to be a huge uphill battle to adoption.
We definitely need to decouple hosting from Google. It's lunacy to have a giant actor of debatable integrity (ie: Alphaboogle) hosting video for the entire internet — the entire internet!!!
Discovery is probably the only reason why I still use YouTube. If somebody could make something that provides a non-Google recommendation system with tools to categorize/organize the videos I've watched/want to watch, that'd be great. Assuming the UI is at least on par with Google's, I'd switch in a heartbeat. Not that their UI is particularly good, but the alternatives I've seen (like PeerTube) are frustrating to use on a basic moment-to-moment level.
The tricky part is getting enough adoption for network effects to make it generally useful. So you need it to be useful to individual users before it has lots of users. Sort of like how Goodreads is useful for keeping track of books you've read/want to read, even if you never read the reviews or look at the lists that are a result of network effects.

The question is what features of a video aggregator would be useful to the individual? Watched list? Playlist management? URL shortening for sharing videos? Better UI? Export list to youtube-dl?

Content creators don't want to leave the platform because they profit via ad-revenue, monthly subscriptions and superchats.
>Content creators don't want to leave the platform because of discoverability

Discovery is still incredibly hard on YouTube. YouTube is a search engine, adding another one won't fix that. I'm also skeptical that anyone could do search better than Google. When creators say "they can't leave because of discovery", they mean "I spent 10 years telling people my address is youtube.com/creator, and if I lose that, most people won't know where to find me."

Regardless, most Content Creators don't leave the platform because YouTube is/(was) the only platform paying people for content. While the type of content that is watched by tech professionals may do incredibly well with Patreon, that isn't true for several other types of content. This IMO, is why Facebook was unable to get people to create for FB Video (Unless you were a corporation that handled monetization already), it's a large reason why Vine creators complained about the platform and it's something Twitch and TikTok get right.

This, to me, is the real issue with starting a YouTube competitor. It's far easier to innovate on the format (Twitch = livestreams, TikTok = clips+ML) than it is to offer "video hosting" but "better". If you want creators to create for your platform, then you have to pay them, which means you either have to get advertisers (and deal with the same content cleanup YouTube does) or have people pay (and deal with that chicken and egg).

I believe Twitch was early enough to not have to go the "pay creators on day 1" route, but Musical.ly (what eventually became TikTok), Mixer, and Quibi all had to open their wallets and spend a ton on marketing and exclusives to get people on their platform (and two of them failed hard).

It's probably easier to start with a niche since there are many types of videos hosted on YouTube that differ a lot in the way and intention they were created. For instance shows created for YouTube are completely different from videos that exist independently from YouTube and have been uploaded there. Personally I rarely watch videos from the first group but at the same time I find it hard to discover entertaining content although I subscribed to a lot of channels. Tech talks I find good to discover though, the recommendations for that work really well I think.

Of course YouTube has gone a long way and is a well working compromise for most formats...

> "I spent 10 years telling people my address is youtube.com/creator

Channel urls are like /channel/UCoMdktPbSTixAyNGwb and I've never heard YouTubers spelling out usernames; rather, they tell people to search for the name.

If one were to start a video link service, they'd start with a short url (+stats) maker.

I think you're right about monetization though. As bad as Google is, at least they pay content makers, unlike the rest of big tech.

I think the key here is a deep misunderstanding of where YT's value to creators lies: it's not just recommendations, it's the fact that they take care of the ad pipeline, which ends up as pure revenue for the creator. Enabling monetization for videos is very simple on YT and you get a paycheck at the end.

Any competing platform needs to compete with this value proposition: you upload video, you get money.

I think this is a great idea! YouTube recommendations are extremely lacking and are leaving a ton on the table in terms of a great user experience. I don’t trust that I’m being recommended content that they think is best for me. Rather, is it based on whether the creator paid YouTube to promote their video?
I like this idea. There are going to be a million reasons why it won't work that people will bring up, but ultimately I suspect they're all addressable.

One concern that would have to be addressed upfront would be to avoid having this turn into another Youtube. For example, once you have a great video aggregation site, there will be a desire to start hosting videos yourself. At that point you can start monetizing them and will inevitably want to promote that content, turning yourself into another Youtube. Basically once you control the demand, the supply side is easy to displace. I guess making it open source could address some of this.

Also, should definitely call it opentube.com (already taken of course, but surprisingly not in use).

What about monetizing the recommendations themselves like some other commenters suggest. Would you pay for high quality recommendations?
Interesting concept. So you propose to have a landing page with a selection of videos tailored to the taste of the visitor? And seamlessly combining results from YouTube, Vimeo and others?

You would be effectively relegating YouTube to a commoditized CDN for videos.

Again, interesting concept.

This is exactly my idea!
But how would you make enough money to support a popular service? Will youtube allow you to place ads next to the videos on your site? The users would still receive the pre/mid/post roll ads.
I'm thinking of first targeting a very specific niche of content and making a recommendation client out of that. If the recommendations are really top notch then maybe people interested in that niche would be willing to pay to get them (this is what another commenter suggested).
That's the core problem with such ideas: Blekko, metacafe et al have tried it. It's getting people to pay for this service that seemed an impossible hurdle to cross.
I think the topic you may want to explore is "SEO for video" and I'm not sure there's really a definitive guide.

Google has this guide: https://support.google.com/webmasters/answer/156442

Moz has an article from 10 years ago. https://moz.com/blog/video-seo-basics-whiteboard-friday-1108...

My first thought would be that you'd need at least a text-based transcript for crawling, but something akin to audio description would probably be most valuable too.

Peertube is great, but discovery is definitely a problem. Search and recommendations are the way to go.

https://peertube-search.com/

> Why don't we just make a platform that crawls the internet for videos and build a recommendation algorithm.

You are just doing YouTube without the advantage of having to use YouTube's infrastructure. You'll still have a central authority deciding what to recommend and whatnot. The main issue with YouTube is not the fact that they host the content, it's that they also decide what gets shown and to whom.

This is a really great idea.

You can skip the complex mathematics to begin. You'll be millionaires before it matters.

I'd look at adult sites and why one site has not taken over. Could be a local maximum. Maybe they get paid off by advertisers before they get too big.

The hard bit is political. I want my Stormfront videos recommended (by default, like early Youtube) but not a dude shooting himself (gore). But I want to find things if I want.

For best user experience centralized is the way to go. Imagine a video goes viral hosted on a shared server. Not a good experience for anyone. You could build a cache layer on top of it but then how will you reward the creator?

Are you expecting people to set up and maintain their own servers to host videos? Those times are gone. Content creators want to upload where millions of users are already present.

No, the idea is to just make the recommendation platform. There is no hosting going on. If someone does decide to self-host, though, at least they'll be discoverable through the recommendation platform.
A national public cloud then. Paid by taxpayers for taxpayers. As a nation we are thousands of times wealthier than Youtube.
You are describing Metacafe from back in the day.
Videos from independent creators won't be able to go viral. Your viewership would be bounded by your network throughput. Even if you're already popular, a disproportionate number of your views will come as soon as you upload a video, so you would end up paying for a lot of bandwidth you don't need.
YouTube recommendation sucks. It sucks so much. But YouTube has so much content. If you can build a page to link to YouTube videos, you don't even need to host videos. I would use anything but YouTube's homepage, I think it is a good idea.
Not sure if you really want to crawl and make suggestions based on that. It might be better to let people self host videos and essentially provide your service with an RSS feed of videos with some metadata provided.
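
To make the feed idea concrete, here is a minimal sketch of how an aggregator might ingest such a feed. It assumes creators publish plain RSS 2.0 with the video attached as an `<enclosure>` and tags as `<category>` elements; the sample feed, the example.com URLs, and the `parse_video_feed` helper are all hypothetical, not something any existing service defines.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed a self-hosting creator might publish (plain RSS 2.0).
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Garage How-Tos</title>
    <link>https://example.com/</link>
    <item>
      <title>Fixing a bike chain</title>
      <link>https://example.com/videos/bike-chain</link>
      <description>Ten-minute repair walkthrough.</description>
      <category>how-to</category>
      <category>cycling</category>
      <enclosure url="https://example.com/media/bike-chain.mp4"
                 length="104857600" type="video/mp4"/>
    </item>
    <item>
      <title>Podcast episode (audio only)</title>
      <link>https://example.com/audio/ep1</link>
      <enclosure url="https://example.com/media/ep1.mp3"
                 length="1048576" type="audio/mpeg"/>
    </item>
  </channel>
</rss>"""


def parse_video_feed(xml_text):
    """Return one dict per feed item that carries a video enclosure."""
    root = ET.fromstring(xml_text)
    videos = []
    for item in root.iter("item"):
        enclosure = item.find("enclosure")
        if enclosure is None:
            continue  # item has no attached media at all
        if not enclosure.get("type", "").startswith("video/"):
            continue  # skip audio and other non-video attachments
        videos.append({
            "title": item.findtext("title", default=""),
            "page_url": item.findtext("link", default=""),
            "media_url": enclosure.get("url", ""),
            "tags": [c.text for c in item.findall("category") if c.text],
        })
    return videos


videos = parse_video_feed(SAMPLE_FEED)
```

The aggregator would only store the metadata and the link back to the creator's own host, so no video hosting is involved on its side.
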
Remember, convenience trumps all other concerns in capitalism. So your distributed solution has to be as convenient, or more so, to use than YouTube for content creators. You also didn't mention monetization, and again, it needs to be convenient. Allowing "roll your own" monetization is great, but putting a default revenue method is essential.

I think it can be done. The product looks to me like a cross-platform thick-client application that manages your content, both serving it locally, but also putting it on various platforms. I think it could be a real winner if done right! You might even be able to give it away, open source it, if you can sell default space, like Mozilla does.

The obvious starting point for this product would be the content creation tool itself. I'm sure Adobe (and Apple!) would love it if this was a reality, just to stick it to Google. The enemy of my enemy is my friend, after all.

Honestly, I think if you deliberately left monetization out, you could really improve the quality of videos you host. Gone would be all those "filler" videos that exist just to juice the ad algorithm. You'd also lose all that "Hey guys! Remember to Like and Subscribe!" junk because there'd be no monetary incentive.

When I go searching for videos, I really don't want to see all those videos people pumped out in order to make a buck. I have TV for that. I want to see silly cat videos and how-tos from people's garages.

Not sure why you're downvoted, I think it's a useful contribution. There is a TON of horrible content on youtube - text-to-voice slideshows that are clearly the output of someone clever enough to put it together, but not clever enough to actually contribute anything to society. If they had to host their own stuff, I think that would put a damper on it too. OTOH if you self host there would be NO gatekeeping at all, for good or ill (and honestly, I think it would turn out pretty badly if today's info-space is any indication).

The key, I think, is to overlay a network of blacklists over distributed content, and enforce one new law: communication requires consent. I can see a new industry of small, interconnected groups of curators helping individuals weed out the noise - all while maintaining an individual's right to consume as much noise as they want. Reputation would no longer be a single number, but rather in the context of these groups, to which individuals voluntarily belong.

The important thing would be to connect content to the creator, such that the blacklist is more effective. So, it may not be a real identity, but it should be consistent: that is, your id can't be used to find you in the real world, but it can (and should) be associated consistently with everything you make and distribute. (Although I would certainly want some sort of decay function there, since I don't think people should answer to the same degree for the stupid shit they did 10 or 20 years ago.)
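
A rough sketch of how the blacklist-plus-decay idea could work, purely as an illustration. The `Flag` record, the group names, the five-year half-life, and the threshold of 1.0 are all assumptions made up for this example; the exponential decay is just one possible choice of decay function.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flag:
    creator_id: str   # stable pseudonymous id, not a real-world identity
    group: str        # curator group that raised the flag
    age_years: float  # how long ago the flag was raised


def decayed_weight(age_years, half_life_years=5.0):
    """Exponential decay: a flag counts half as much after every half-life."""
    return 0.5 ** (age_years / half_life_years)


def is_blocked(creator_id, flags, subscribed_groups, threshold=1.0):
    """Block a creator only if groups this viewer opted into flag them enough."""
    score = sum(
        decayed_weight(f.age_years)
        for f in flags
        if f.creator_id == creator_id and f.group in subscribed_groups
    )
    return score >= threshold


flags = [
    Flag("creator-a", "no-spam", age_years=0.0),
    Flag("creator-a", "no-spam", age_years=1.0),
    Flag("creator-b", "no-spam", age_years=20.0),  # old flag, mostly decayed
    Flag("creator-c", "no-gore", age_years=0.0),
]
```

Because the score only counts flags from groups the viewer has subscribed to, the same creator can be hidden for one person and visible to another, which is the "reputation in the context of these groups" part.
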

This sounds an awful lot like metacafe.com, or mamma.com from even before - the so-called metasearch engines. You perhaps should try and see if you can monetize it.
Content creators probably don't want to host the videos themselves.

Apart from YouTube, what is there? Dailymotion? Vimeo?

Well, there is Nebula, which is subscription-only video hosting that some youtube creators are pushing, but it isn't general purpose hosting.
So, like Google Videos search?
Google video search does not recommend. You must make a search query. I'm talking about actual recommendations that you didn't explicitly search for which is where discoverability stems from.
So you intend to collect detailed view data to feed your model? Kind of like a browser extension spying on every video you watch anywhere, collecting detailed data about how you watch, etc? Don't count me in.
I'm thinking more about a website (client) where you get the recommendations. Then, if you click on one of the recommended videos (either by watching the embed or redirecting from the website), we'll assume that you liked it.

Outside of this website (client) you won't be tracked.
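
As a minimal sketch of the "a click means you liked it" approach: treat each click on the site as an implicit like and recommend videos that were co-clicked with the ones a user already clicked. This is a simple item co-occurrence recommender chosen for illustration, not a claim about what a real system would use; the user names and video ids such as `yt:cats` are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_cooccurrence(clicks_by_user):
    """Count how often each pair of videos was clicked by the same user."""
    co = defaultdict(Counter)
    for videos in clicks_by_user.values():
        for a, b in combinations(sorted(set(videos)), 2):
            co[a][b] += 1
            co[b][a] += 1
    return co


def recommend(user, clicks_by_user, co, k=3):
    """Rank unseen videos by how often they co-occur with the user's clicks."""
    seen = set(clicks_by_user.get(user, ()))
    scores = Counter()
    for video in seen:
        for other, count in co.get(video, {}).items():
            if other not in seen:
                scores[other] += count
    # Highest score first; ties broken by id so the output is deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [video for video, _ in ranked[:k]]


# Clicks recorded only on the recommendation site itself, nowhere else.
clicks = {
    "alice": ["yt:cats", "vimeo:woodwork"],
    "bob": ["yt:cats", "vimeo:woodwork", "peertube:lathe"],
    "carol": ["vimeo:woodwork", "peertube:lathe"],
    "dave": ["yt:cats"],
}
co = build_cooccurrence(clicks)
```

Since the ids carry the host as a prefix, videos from different platforms end up in the same ranking, and the only data collected is which recommendations were clicked on the site.
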

openvideodata.org wants to bootstrap open recommendations by scrobbling videos