Hacker News
by itsankur 492 days ago
Yeah I mean the salad example is great because in that case it feels ridiculous to have to look at ads.

Maybe we don’t need to protect websites that rely on ad-supported content? Maybe it makes a better internet?

Based on what you’re saying here it sounds like good content will inevitably go behind paywalls in this case, and the number of paid subs people need to have will only continue to explode. Aka no free internet. Maybe that was bound to happen one way or another.

And all LLM providers will pay their data providers directly by API or something?

The original internet (DARPA) was meant for communicating research and science. Maybe we return to that being the primary thing that will continue to be free?

1 comments

And he and his cofounder created a service that scrapes websites and uses AI for something or other…

https://www.goharvest.ai/

We built Harvest to reduce the pain of gathering web data by clicking through websites to copy data into Excel sheets, databases, and CRMs, something millions of people do every day.

We recognize it as an unrewarding, tedious, and time-consuming task that humans had to do by hand until browser agents recently became capable enough.

As we built and learnt more about the industry, we started to understand the underlying problems. For 99% of websites, web scraping isn’t the problem; the lack of compensation is.

We think there’s actually a better way to do this. If there’s enough demand, we can facilitate a revenue share between agent scrapers and websites. Scrapers will pay less than what they pay for proxies, and websites get a new revenue stream.

These are our thoughts at least so far. We aren’t ashamed of what we’ve built by any means in the way your comment implies lol. We want to see if we can benefit both parties in a win-win marketplace.

So what you are doing is scraping the data without asking permission and using AI so people won’t have to go to the original site.

How is what you’re doing any better than what you are complaining about?

1) Public websites don’t require any more permission than taking photos of a public storefront. We abide by privacy laws and make sure we don’t overload website servers.

2) We aren’t complaining. We’re curious how others view this space because it’s a contentious topic. We recognize that we might be able to address the larger issue of lack of compensation for websites being scraped by facilitating a win-win marketplace (the only losers are proxy providers).

So you really don’t care that “the rise of AI has put the free web at risk”; you care that it is putting your company at risk, even though you are doing the same thing and making the same argument as the companies training the models?

Are you paying any content providers now?

Why didn’t you just admit that up front, or at least disclose that you have a business interest in being able to scrape others’ content for free?