Hacker News
by capableweb 1258 days ago
What about a Humans.txt (different than the existing one https://humanstxt.org/ ) that can dictate what humans can do with the information on my website? I might want to publish my thoughts, but I don't want people to be able to use that information, so I could have a humans.txt that forbids people from learning or remembering anything they read.

Like the footer in some emails about "Reading this email if you aren't the intended receiver is forbidden" but for websites, so I can protect my writing without having to suffer the consequence of not being able to publish it.

5 comments

Ridiculing a reasonable position that is worthy of discussion and debate isn't a good way to bring people to your side.
Maybe it's borderline ridiculing (though I don't think so) but I do think I'm offering a different perspective on what's happening with AI; it's "learning", not "owning" or "taking" anything, just like a human brain (obviously not scientifically just like a human brain, exaggerated for easier understanding).

And I'm also not trying to bring people to any side. I'm not personally for or against AI learning on publicly available material; I'm just providing a different perspective by veiling it in a slightly contrived example that takes the same position but "against" humans rather than AI.

Even if you think the model is learning in the exact same way a human does, you can't pretend it uses the information in the same way. A human, if they really wanted to, could study your style and imitate your blog posts maybe once a day, with varying levels of quality. An AI can do that a million times a day, perfectly, forever, basically for free. It's simply not a good comparison.

Nobody is concerned with AI learning from their content, they are concerned with how the AI will use their content, and there is no useful comparison between AI usage and human usage.

It’s a machine learning algorithm. It’s not learning like humans do, it’s not producing output like humans do.

It’s not human… It’s not even GAI. It has no inherent rights, deserves no inherent rights.

Can we all please stop anthropomorphising a computer algorithm?

No, it's not learning like a human, it's fitting parameters to a function. AI will never conduct a science experiment to determine the veracity of the data it's indexing, or have an emotional reaction to it.
I bet it can do some experiments if given a (thoughtful) request and resources/access. (I had ChatGPT control my Linux box through me: it gave commands and its reasoning for them, and I gave it the output... Another time I had it write an answer and give search terms to verify the claims; I gave it the output, and it fixed the answer with actual citations.) It can imitate reasoning and emotions better than some people I know, at least.
The problem is copyright. Current AI landscapes are basically "copyright for me but not for you". I would be fine if we just abolish copyright. I wish for a world with AttributionRight instead of copyright.
I don't think you're being charitable enough. GP's point is actually thought-provoking (at least to me): doing it for humans doesn't make sense. Does it make sense to do it for AI? The more advanced the AI, the less sense it makes.

I think it's related to the "substantive elaboration" test with regard to copyright (sorry, I can't remember the proper term). If I just copy a scene from your movie, you can probably sue me. If I take some elements from it, but do my own thing so that it reads more like a parody/comment/meta-comment, then it's OK.

Is AI just parroting back stuff or is it creating new things? A few years ago, I would have said it's just parroting. With the current models I think we are midway (it will depend on the prompt too!). In a couple of years my bet is that we'll be clearly in "creating new things" territory.

We're not discussing self aware AIs like we see in science fiction; it's just a bunch of cool machine learning algorithms. I think that the word learning here is a bit of a red herring.

It's cool that the algorithms are generic enough and can be specialised with training data, but it's a very, very far cry from anything resembling basic awareness, much less the level of awareness mammals have.

This is your opinion. Many experts (the majority right now?) do see the beginning of general intelligence and I would urge you to at least entertain the possibility that those experts are right.

(Also, general intelligence is not usually synonymous with self awareness in the literature. You might want to argue that, but it's, I think, a minority opinion)

That seems more like what you’re doing. The comment you’re replying to is making a reasonable point. Maybe try engaging with it?
I don't think that GP's stated opinion is any less reasonable than the article's.
Is it really that unreasonable to believe that an author:

a) is fine with, and maybe even motivated by, the fact that a fellow human will read or watch their work to learn, improve their craft, grow as a person, etc.

b) is offended by the fact that a faceless multi-billion company will appropriate their work to create a for-profit product that will make their shareholders a little bit richer?

It is entirely conceivable that an author would have those views, I don't see why you should expect it to be the default view of an author.
"reasonable position"

It's not.

Similarities already exist in any kind of media. I've seen books stating that I couldn't lend them. Copyright is cancer: it hides knowledge, it hides everything. But this closed dataset/implementation form of AI strips copyright yet hides itself anyway, so it is no better. In fact it is worse than copyright, because at least copyright law was available to everyone; AI tools are just for the ones with $$$ and the technical know-how to train an AI on the necessary data (that also needs to be gathered and prepared). We were already hostages to the decisions of big tech in our daily lives; with the advent of AI, we will be much more so.

Stallman was right, Free Software for a Free society.

You can do this but it has no power.

You can forbid and dictate terms but you are unable to enforce it, just like those email footers are a waste of space and have no enforcement power whatsoever.

robots.txt only works because the search engines agree to follow it. Likewise, the DNT browser setting is ignored by most despite it being a loud and clear signal of user intent.
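
To make the "only works because they agree to follow it" point concrete, here is a minimal sketch of what honoring robots.txt looks like from the crawler's side, using Python's standard `urllib.robotparser`. The robots.txt contents, the `ExampleBot` user agent, the example.com URLs, and the `is_allowed` helper are all made up for illustration. Note that the check is something the crawler chooses to run; nothing on the server side enforces the answer.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example site (an assumption, not a real file).
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules permit user_agent to fetch url.

    This is purely advisory: a crawler that never calls this function
    can still request the URL, and the file itself cannot stop it.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("https://example.com/blog/post",
                "https://example.com/private/notes"):
        verdict = "fetch" if is_allowed(ROBOTS_TXT, "ExampleBot", url) else "skip"
        print(verdict, url)
```

A well-behaved crawler skips the second URL. A badly behaved one just never asks, which is the whole problem with any humans.txt-style file too.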

How about calling it license.txt?
I had the same thought - we already have this concept with licenses and laws. When I buy a movie it comes with a bunch of terms limiting what I can do with it, if I publish something I can limit how other people are allowed to use it - including if they're even allowed to view it
> I might want to publish my thoughts, but I don't want people to be able to use that information, so I could have a humans.txt that forbids people from learning or remembering anything they read.

I lean anti-copyright, so I am fairly sympathetic to your point of view. But that sympathy comes from a general distrust of copyright and access restrictions in general, not out of a concern over the philosophical differences between AI and human creativity. So I will point out that quite literally every single argument around "well a human could do it, so why can't a computer" can also be used to argue against having a robots.txt standard.

Logging into HN, I had to fill out a captcha. Which... why, should we now have standards about when during the day humans are allowed to log in? Should I be able to have a text file that tells humans to make sure they only browse the site a few hours a day? Should I be able to have publicly available HTTP pages that follow a public standard that tell humans that they're not allowed to read my information? There's nothing philosophically special about a human vs a robot logging into HN. The robot is just doing the same thing that I do; it just does it faster, and why should that make a difference? /s

But scale/practical effects matter, and if I were to jump onto Hacker News and argue that human beings have a right to automate their website access (something I have actually argued on HN before[0]), I would instantly be jumped on by like 10 different people telling me that I was naive and sure in theory humans have the right to automate but in practice malicious/poorly-coded bots are destroying servers and abusing public access. I'd be told how I was letting ideals blind me to reality, and how the modern web just couldn't exist if we didn't have the power to ban bots.

And honestly, those are not unreasonable arguments.

What is different about AI? In the tech industry we have (for better or worse) generally accepted that it's not that unreasonable for websites to dictate that their content and features are for humans only. Maybe it was a mistake for us to accept that, but I don't see why AI should get a pass when the majority of this community is generally hostile to every other kind of software automation when it's performed against unwilling targets. I don't see how the people who are upset about their content being scraped for a training model designed to put them out of business are any more unreasonable than the people who are upset about their content being scraped by Selenium, or who are mad that allowing search indexing means their website gets hammered with extra requests that just cost them money and don't see ads.

The practical effects are very similar, and saying that technically philosophically the automated requests are doing the same things that humans do means very little in either discussion.

----

[0]: https://anewdigitalmanifesto.com/#right-to-delegate