Hacker News
by jasode 2101 days ago
>Why don't we just make a platform that crawls the internet for videos and build a recommendation algorithm? Then no matter where you host your videos, you'll be easily discovered.

Because there are fundamental tradeoffs in the _information_ and _meta_information_ a system has, and those factor into recommendations.

People upload about 500 hours of video to Youtube every minute.[1]

Yes, that means that centralization to that degree has negatives such as the guitar channel being deleted but it also has positives such as better (not foolproof -- but relatively better) detection of spam and mistitled content. Because Youtube servers actually have the real bytes of actual videos, they can scrub the content with machine learning algorithms augmented with some human oversight.

If you only have a platform that crawls for video urls and doesn't actually ingest the petabytes of video like Youtube, the "recommendations-only" service will be susceptible to gaming, such as people putting "Emma Watson and Scarlett Johansson nudes" as the title when the actual video is an ad for somebody trying to sell a used car. Or for the first few days a video url serves "Tips to deal with COVID", which is real, but the video hoster later swaps the video behind that same url for something else.

Also, a centralized service like Youtube can measure actual user behavior such as "watch time" (because Youtube's servers know they are sending bytes to the client) to see if the video is actually engaging viewers and this factors into future recommendations. If web surfers are closing out a 10 minute video after 20 seconds, the videos are appropriately penalized because users are "voting" that the content is bad. I.e. the viewers "voted" without even having to press the "thumbs down" button.
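
A minimal sketch of the implicit "vote" described above, assuming the host logs one record per playback session with the video's length and the seconds actually watched. The `PlaybackSession` type, its field names, and the mean-watch-fraction score are illustrative assumptions, not Youtube's actual formula.

```python
from dataclasses import dataclass


@dataclass
class PlaybackSession:
    video_id: str
    video_seconds: float    # total length of the video
    watched_seconds: float  # how long this viewer actually watched


def mean_watch_fraction(sessions):
    """Average fraction of each video that viewers watched, keyed by video id."""
    totals = {}
    for s in sessions:
        fraction = min(s.watched_seconds / s.video_seconds, 1.0)
        total, count = totals.get(s.video_id, (0.0, 0))
        totals[s.video_id] = (total + fraction, count + 1)
    return {vid: total / count for vid, (total, count) in totals.items()}


# Two 10 minute videos: viewers abandon the first one within a minute.
sessions = [
    PlaybackSession("faucet-a", 600, 20),
    PlaybackSession("faucet-a", 600, 40),
    PlaybackSession("faucet-b", 600, 540),
    PlaybackSession("faucet-b", 600, 600),
]
scores = mean_watch_fraction(sessions)
ranked = sorted(scores, key=scores.get, reverse=True)  # best-watched first
```

Nobody pressed a button here; the ordering comes entirely from how long each session lasted, which only the party serving the bytes can observe.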

A new platform to recommend videos pointing to decentralized video hosting has advantages but also has unavoidable disadvantages that lower the quality of the recommendations.

EDIT to add my previous comments about the financial incentives (ads) that help video content creators which many techies overlook: https://news.ycombinator.com/item?id=21506992

[1] https://www.google.com/search?q=upload+about+~500+hours+of+v...


You bring up some interesting points, but you are off the mark on what the true challenge is.

We [0] actually indexed "all" content on YouTube and dozens of other platforms. Not just metadata, but the videos.

We could easily run a search engine like this. In fact, we have explored it in the past. We also came up with a PageRank-like algorithm that allows us to bootstrap the experience and eventually move to usage-based sorting.
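
The comment gives no details of that algorithm, so the following is only a sketch of the classic PageRank power iteration, under the assumption that an edge means "this page links to or embeds that video". The graph, the node names, and the `pagerank` helper are hypothetical.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict {node: [nodes it links to]}."""
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node starts each round with the "random jump" share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling node (no outgoing links): spread its rank evenly.
                share = damping * rank[node] / n
                for other in nodes:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical crawl: two pages point at video-a, one of them also at video-b.
links = {
    "blog-post": ["video-a"],
    "forum-thread": ["video-a", "video-b"],
    "video-a": [],
    "video-b": [],
}
rank = pagerank(links)
```

With this toy graph, `video-a` ends up ranked above `video-b` purely from inbound links, which is the kind of signal that could order results before any usage data exists.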

The true challenge is in customer acquisition. Google has such a strong hold on the browser market [1] that there is just no way to convince people to "go to pex" and search for videos. You can see it with DDG, whose penetration [2] is still a fraction of Google's after more than a decade. Even Microsoft, which plunged billions of dollars into Bing, is not able to acquire more than just a few percent of the market [3].

Finally, there are literally no investors to back this. After the fall of Blekko [4], nobody will touch this market. We spoke to dozens of investors. Most have a default position to "never compete with Google".

Until this changes, there is no way to bring it to the market.

[0] https://pex.com

[1] https://www.theverge.com/2020/7/1/21310591/apple-google-sear...

[2] https://duckduckgo.com/traffic

[3] https://gs.statcounter.com/search-engine-market-share

[4] https://en.wikipedia.org/wiki/Blekko

>Not just metadata, but the videos.

But calculating the fingerprints of the videos rather than storing on disk and serving all the exabytes of video means you're still at a fundamental _information_ disadvantage.

You already know the following but I'll spell it out for readers who may not see the distinction: web surfers don't watch fingerprints... they watch the actual video bytes.

Youtube employees and researchers found out that user behavior such as actual watch time of the videos was a _stronger_ voting signal of quality than the user hitting the subscribe button. To replicate that measurement, how would Pex get similar usage data without actually storing the exabytes of videos or convincing millions of web surfers to install a browser plugin to spy on their youtube usage?

>The true challenge is in customer acquisition. Google has such a strong hold on the browser market [1], that there is just no way to convince people to "go to pex" and search for videos.

I'm not convinced of your claim that the Chrome browser is the competitive moat that prevents Pex from rising up. E.g. Tik Tok got very popular without Chrome browser help. As another example, I found out that a gardening expert[1] gets most of her views from Facebook-hosted videos instead of Youtube. She gets more than 3x the views on Facebook. I don't have a Facebook account so I watch her on Youtube but it turns out I'm actually in the minority of her audience.

[1] https://www.youtube.com/c/gardenanswer/videos

>But calculating the fingerprints of the videos rather than storing on disk and serving all the exabytes of video means you're still at a fundamental _information_ disadvantage.

>You already know the following but I'll spell it out for readers who may not see the distinction: web surfers don't watch fingerprints... they watch the actual video bytes.

This is like saying "nobody goes to Google to look at links, they want the actual pages". These are search engines. People are more than happy to use them to find what they are looking for.

>Youtube employees and researchers found out that user behavior such as actual watch time of the videos was a _stronger_ voting signal of quality than the user hitting the subscribe button. To replicate that measurement, how would Pex get similar usage data without actually storing the exabytes of videos or convincing millions of web surfers to install a browser plugin to spy on their youtube usage?

Exactly the same way Google does it in its own search. A user clicks on a link for a keyword, which creates a feedback loop that you feed into the system.
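
A minimal sketch of such a click loop, assuming the search engine logs which results it showed for a query and which one was clicked. The `ClickFeedbackRanker` class, the smoothing priors, and the URLs are hypothetical. Note that it records only the click itself, not what happens during playback on the destination site.

```python
from collections import defaultdict


class ClickFeedbackRanker:
    """Re-rank results for a query by smoothed click-through rate."""

    def __init__(self, prior_clicks=1.0, prior_impressions=10.0):
        # The priors keep results with little data from jumping to the top.
        self.prior_clicks = prior_clicks
        self.prior_impressions = prior_impressions
        self.impressions = defaultdict(int)
        self.clicks = defaultdict(int)

    def record(self, query, url, clicked):
        """Log that `url` was shown for `query`, and whether it was clicked."""
        self.impressions[(query, url)] += 1
        if clicked:
            self.clicks[(query, url)] += 1

    def score(self, query, url):
        key = (query, url)
        clicks = self.clicks.get(key, 0) + self.prior_clicks
        shown = self.impressions.get(key, 0) + self.prior_impressions
        return clicks / shown

    def rank(self, query, urls):
        return sorted(urls, key=lambda u: self.score(query, u), reverse=True)


ranker = ClickFeedbackRanker()
good = "https://example.com/faucet-repair.mp4"
weak = "https://example.org/faucet-ad.mp4"
for clicked in [True, True, False, True]:
    ranker.record("fix faucet", good, clicked)
for clicked in [False, False, False, True]:
    ranker.record("fix faucet", weak, clicked)
order = ranker.rank("fix faucet", [weak, good])
```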

We also have a lot of information. For instance we know what video is deployed where on the web, which pieces (down to 1s) are being taken out and how they are utilized and also general performance of the content on each platform. This way we are able to show things like "here is the best part of this video" or "here is the first occurrence of this video" or "here is the longest version of the video".
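
As a rough illustration of the "first occurrence" and "longest version" lookups, here is a sketch that assumes each crawled copy is stored with a shared fingerprint id, the date it was first seen, and its duration. The `Copy` record and both helper functions are hypothetical and say nothing about how Pex actually stores its index.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Copy:
    fingerprint: str       # hypothetical id shared by copies of the same video
    url: str
    first_seen: date
    duration_seconds: int


def first_occurrence(copies, fingerprint):
    """Earliest-seen copy of the video, or None if there is no match."""
    matches = [c for c in copies if c.fingerprint == fingerprint]
    return min(matches, key=lambda c: c.first_seen, default=None)


def longest_version(copies, fingerprint):
    """Longest copy of the video, or None if there is no match."""
    matches = [c for c in copies if c.fingerprint == fingerprint]
    return max(matches, key=lambda c: c.duration_seconds, default=None)


copies = [
    Copy("fp-1", "https://example.com/a", date(2020, 3, 1), 95),
    Copy("fp-1", "https://example.org/b", date(2020, 1, 15), 60),
    Copy("fp-1", "https://example.net/c", date(2020, 2, 10), 180),
    Copy("fp-2", "https://example.com/d", date(2019, 12, 1), 30),
]
earliest = first_occurrence(copies, "fp-1")
longest = longest_version(copies, "fp-1")
```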

>I'm not convinced of your claim that the Chrome browser is the competitive moat that prevents Pex from rising up

I can tell you didn't read the link I posted. I never claimed this was because of Chrome. They are the default search engine on EVERY browser. Firefox, Safari, Chrome. The tyranny of defaults is quite substantial.

>TikTok got very popular without Chrome browser help.

This feeds into my second argument. No VC will fund it. TikTok spent billions on ads to promote their app. My point was exactly that. It's not the technology that is the issue, it's the marketing. We would just never have stood a chance.

>As another example, I found out that a gardening expert[1] gets most of her views from Facebook-hosted videos instead of Youtube. She gets more than 3x the views on Facebook. I don't have a Facebook account so I watch her on Youtube but it turns out I'm actually in the minority of her audience.

YouTube is big, but it is a small part of the UGC world, which is quite diverse. That's why a search engine would make sense.

>This is like saying "nobody goes to Google to look at links, they want the actual pages". These are search engines. People are more than happy to use them to find what they are looking for.

That's not my point. Yes, of course websurfers click away from a search engine to the original source url to read a blog etc. My perspective was not the websurfer's but the statistical aggregation of websurfers' playback behavior on Youtube as a factor in recommending videos to other websurfers.

For non-video search engines like Google/Bing/CommonCrawl, the web spiders download the actual HTML text and also execute some of the pages' javascript to add to their own index, which is similar to users uploading the actual bytes of video data. Google can then apply extra analysis to its copies of others' documents. Pex fingerprints of videos are not the same. Unlike html text pages, whose url links Pagerank can cheaply extract and exploit, videos do not have a built-in "link" structure to other videos and don't form a graph for analysis. Therefore, signals like measured user behavior drive a lot of Youtube's recommendation algorithm. They have a lot of this data ... because they host the majority of videos... which happens because hosting videos is expensive.

A pure search engine without hosting the actual video bytes and only links to others doesn't have the same virtuous loop of data feedback. It's fundamentally missing information that Youtube has. If I search Pex for "how to fix a faucet" and I click on a search result url that takes me to "http://ugc.com/fixfaucet.mp4" and it turns out it's a bad video and I abort the playback after 10 seconds, how would Pex know of my dissatisfaction based on realtime behavior happening on another website you don't control? In contrast, Youtube knows about my dissatisfaction without me having to click the "thumbs down" icon.

> For instance we know what video is deployed where on the web, which pieces (down to 1s) are being taken out and how they are utilized and also general performance of the content on each platform. This way we are able to show things like "here is the best part of this video"

If you don't have visibility into the actual play/pause/stop buttons of Youtube's video player and the browser's close-tab button, how do you get the same user behavior data Youtube has?

>They are the default search engine on EVERY browser. Firefox, Safari, Chrome. The tyranny of defaults is quite substantial.

And yet Facebook's walled garden of video hosting has many content creators with higher audiences than Youtube regardless of Safari/etc browser defaults for Google. Tik Tok found success as well. I disagree that it was billions in marketing.

Back in 2005, Google was also the default search engine for AOL and Yahoo and yet a little upstart like Youtube (without billions) was beating Google's own Video service! The irony! Susan Wojcicki was the early employee of Google who convinced Larry Page that they were losing and to acquire Youtube instead.

>TikTok spent billions on ads to promote their app. [...] It's not the technology that is the issue, it's the marketing. We would just never have stood a chance.

I disagree with a CEO who boils down his business disadvantages to "marketing" expenditures. Let's consider Google circa 2002. How did the underfunded Larry & Sergey _pay_ for the search default deals with AOL and Yahoo when they themselves didn't have billions to spend on marketing? Remember, they only had $25 million in venture capital. The way they partnered with the then much bigger AOL/Yahoo was to offer them a revenue sharing deal for the Adwords revenue. They had the better technology, which AOL/Yahoo wanted, and Google used a clever "arbitrage" to fund the deals.

As the CEO, you can use a similar playbook. You need to come up with a compelling technical product (maybe more than content fingerprints, etc) and then work out a clever financial arrangement (subscriptions, or licensing, or partnering with Bing/Apple/Amazon to beat Google, etc) -- that makes your lack of billions irrelevant.

>YouTube is big, but it is a small part of the UGC world, which is quite diverse. That's why a search engine would make sense.

Ok, that's a fine business thesis. (Seriously.) A universal search engine that's superior to Youtube because your algorithm would transcend the entire UGC landscape should easily prove to websurfers that they get much better recommendations.

If millions of others agree with you, that proves you have a compelling product. You don't need billions. Neither the early startup Google nor early Youtube had that type of money. Every company that seems formidable and unbeatable today because they have billions in the bank was previously a small company that didn't have billions. Likewise, you can be one of those companies that finds a way to billions instead of falling back on lack of marketing as the roadblock.

>Pex fingerprints of videos are not the same.

We also download the content to our servers. We analyze it, hence the fingerprint, and then delete the content. So yes, it's not apples to apples, but it works very similarly.
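
A minimal sketch of that download, analyze, delete flow. It assumes a SHA-256 digest as a stand-in for the fingerprint, which only matches byte-identical copies; a real content fingerprint would need a perceptual analysis that survives re-encoding and clipping, and nothing here reflects Pex's actual method. The `fingerprint_and_discard` helper and the fake file contents are hypothetical.

```python
import hashlib
import os
import tempfile


def fingerprint_and_discard(path, chunk_size=1 << 20):
    """Hash a downloaded file in chunks, then delete it from disk."""
    digest = hashlib.sha256()
    try:
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                digest.update(chunk)
    finally:
        # Only the fingerprint is kept; the video bytes are removed.
        os.remove(path)
    return digest.hexdigest()


# Stand-in for a download: write some fake bytes to a temporary file.
with tempfile.NamedTemporaryFile(delete=False, suffix=".mp4") as tmp:
    tmp.write(b"fake video bytes")
    path = tmp.name

fingerprint = fingerprint_and_discard(path)
```

After the call only the short fingerprint remains, so storage stays small, but nothing is left to serve to a viewer or to re-analyze later.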

>They have lot of this data ... because they host the majority of videos... which happens because hosting videos is expensive.

Fair, but not that necessary. Usage data is great, but YouTube only has data on YouTube. It would be great to have it, but search is about letting a person find what they are looking for. And for that, exact usage data is not necessary.

>A pure search engine without hosting the actual video bytes and only links to others doesn't have the same virtuous loop of data feedback

Yes, but again, that's the same thing as when Google points you to a page. Yes they hold the HTML page, but you are not reading it there. They don't know if you read what's on the page, what you did there, etc. Same as we don't know what you did with the video once you got there.

>I disagree that it was billions in marketing

https://www.wsj.com/articles/tiktoks-videos-are-goofy-its-st...

>Back in 2005, Google was also the default search engine for AOL and Yahoo and yet a little upstart like Youtube (without billions) was beating Google's own Video service! The irony! Susan Wojcicki was the early employee of Google who convinced Larry Page that they were losing and to acquire Youtube instead.

Nuances matter. Google bought YouTube not because it was beating their video service but because they were perceived by most users as a search engine and it posed a huge competitive challenge to their core business. Why did Facebook buy Instagram?

>I disagree with a CEO who boils down his business disadvantages to "marketing" expenditures.

Fair. You are free to go and run with it. You don't have to agree with me. I gave you my reasoning why I am not doing it.

>If millions of others agree with you, that proves you have a compelling product. You don't need billions. Neither the early startup Google nor early Youtube had that type of money. Every company that seems formidable and unbeatable today because they have billions in the bank was previously a small company that didn't have billions. Likewise, you can be one of those companies that finds a way to billions instead of falling back on lack of marketing as the roadblock.

I don't think you understood. We have already built quite a large and growing company. I have no interest in entering the market, as I found a better business for myself and the company. I shared with you my analysis of why I decided against entering the market. Again, you don't have to agree with it.

Why not a Chrome extension that would overlay recommendations on top of a regular youtube page? Then you just have to convince users to install it once.

There is no business there. Plus it's not just YouTube. The UGC world is significantly bigger than YouTube.

I don't feel these unavoidable disadvantages are as black and white as you describe. To circumvent the fake title/bait & switch practices, you could allow users to flag videos as misleading, vote them as spam, etc. Put the number of flags next to the video, add the ability to not even see flagged videos.

Further, scripts could be used to report video play statistics to the scrape site. The drawback is that site owners would have to add the script to their own sites, but adding analytics scripts is common practice.

>To circumvent the fake title/bait & switch practices, you could allow users to flag videos as misleading, vote them as spam, etc.

And now you've re-created the "Amazon 1-star ratings left by competitors instead of real buyers" and/or the "false DMCA takedown claims of public domain or original music".

E.g. an honest person creates a "How to sew a mask at home to protect from COVID" which is the real content but a bunch of coronavirus deniers falsely flag that video as "child pornography".

Game theory is hard and everybody has to think through all the future chess moves that adversaries will use.

Damn, you're totally right. You'd have to have a subscription fee and a team of mods to fix that.

> To circumvent the fake title/bait & switch practices, you could allow users to flag videos as misleading, vote them as spam, etc. Put the number of flags next to the video, add the ability to not even see flagged videos.

And who would flag the bad users?

This just slightly increases the discomfort of the bad guys, but does not stop them at all. Instead they will flag competitors' videos and such.

Nobody has solved "decentralized trust" yet, and maybe nobody ever will.

What if instead we took the recommendations from the main sites (YouTube, Vimeo, Dailymotion), and if you wanted your personal video hosting site to be recommended, you would fill in a form to add your sites to the collection of sites the recommendation system crawls through?

This might be helpful to avoid bad quality recommendations?