by plasticeagle 557 days ago
I mean, that's fair to some extent.

But on the other hand, you could have just bought any cookbook that covers the basics instead of taking all this web-scraped content without attribution or compensation. Look, I totally get it, and I'm certainly guilty of this too, but let's not pretend we're not basically stealing other people's content here. Much of the time the people running those recipe websites are just trying to cover their hosting costs and make a squeak of money on the side.

A friend of mine tried to set up a website that would host open-source recipes. He called it The Open Sauce, but in the end there just wasn't enough input from recipe creators.

Also, by the way, the top Google hit for bechamel is this: https://www.allrecipes.com/recipe/139987/basic-bechamel-sauc.... Few ads, and the recipe is at the top of the page. No life story in sight.


Apparently the recipe for bechamel sauce dates back to at least 1733. I think it's pretty safely in the public domain at this point. Those poor “content creators” are also just copying the recipe from someone else, just like ChatGPT is. I'm sure I even own multiple cookbooks that cover the recipe; it's just easier and faster to ask ChatGPT than to go hunting through my bookshelf.

I feel a little sorry for the good-quality cooking websites out there. I'm just so burned by the bad ones that I'd rather skip the Google search. ChatGPT is also a flat-out better resource because I can ask it follow-up questions: “How much should I make for 6 people?” / “What is a roux, anyway?” / “It's been a few minutes and my milk isn't thickening. Am I doing anything wrong?” It's an incredible cooking aid at my skill level.

There are interesting parallels between LLMs and downloading pirated movies/shows.

In the first case, it's a trillion-dollar business based on scraping the entire internet and sharing out a lossy, compressed version of the content with no attribution or financial contribution to the original creator. In the second case, it's a shady, technically illegal practice of ripping DVDs or online video streams and sharing a lossy, compressed version without attribution or financial contribution to the creator.

Maybe Napster just needed VC backing to make it seem legit.

> no attribution or financial contributions to the original creator

This is an interesting idea, but I don't think it makes much sense to apply that logic to classic kitchen recipes. Who, exactly, is the original creator here?

The common recipes I'm asking ChatGPT about (crepes, homemade pasta, bechamel sauce) are hundreds of years old. We could extend your metaphor to say that the bechamel sauce recipe has been "pirated" by generations of cookbooks for hundreds of years. ChatGPT is just continuing the well-established tradition of recipe piracy, bringing these amazing recipes to the next generation of chefs.

After all, allrecipes.com didn't invent bechamel sauce either. Do they make financial contributions to the original creator of the recipe? I think not.

I think the underlying question there, and one I don't have a solid answer for, is whether ChatGPT is considered to be scraping the underlying recipe or the webpage itself and all the content that goes along with it. The recipe may well be centuries old, but the page, text, images, etc. are all created and owned by the site creator.

Edit: for a better example, Brothers Grimm stories aren't protected, but if someone makes a movie based on those stories, the movie is absolutely protected.

I think the real question is this: is ChatGPT "just copying" the content in its training set? What constitutes plagiarism, exactly?

If ChatGPT is reproducing content verbatim from its training set, then I think the claim that it's violating copyright holds a lot of water. (And I think there was an NYT lawsuit claiming exactly that, and I wish them well.)

But if ChatGPT learns from 100 recipes for bechamel sauce and synthesizes them into its own, totally original description, then I don't see how what it's doing is any different from what the authors of those recipe books and websites are doing. If anything, it's probably synthesizing a lot more sources than any recipe author. If the only common factor between ChatGPT's output and any specific source is the (public domain) recipe itself, then that seems ethically in the clear to me.

I can't see a justification for criminalising what ChatGPT is doing with recipes without casting so wide a net that recipe authors would be open to prosecution in the same way.

Scraping a website isn't illegal. When humans do it, we call it browsing the web.

At a minimum, it's a big legal gray area. Writing a book review isn't illegal and requires no financial engagement with the publisher, but I can't actually find whether SparkNotes or CliffsNotes have to pay royalties. Those would be a pretty good parallel in my mind: they are doing more than a quick summary or review and are effectively compressing the content.

It feels wrong to me, but that says nothing about the laws we currently have or how a judge would rule on it. Personally, if I were on a jury I'd be inclined to side with the NY Times in their case against OpenAI, with the huge caveat that I only know the basics of their case and am not bound to only what's officially evidence.

Yeah, I feel the same re: the NY Times. But that's because (IIRC) the model was reproducing large parts of their articles word for word.

But so long as ChatGPT doesn't reproduce any of its sources word for word, I don't think it's a problem. Especially since cookbooks have been doing the same thing for centuries.

At least, I think that's where I would draw the line. But I agree, we're in very new territory. Who knows what a judge will think.

> Maybe Napster just needed VC backing to make it seem legit.

That's more or less what took Uber from criminal enterprise to mainstream.