by cattown 1150 days ago
I believe that laundering licensed or copyrighted content for reuse that fails to recognize the original authors or usage restrictions is likely to be one of the biggest commercial applications of generative machine learning algorithms.

I also believe this is where a lot of the hype about "rogue AIs" and singularity type bullshit comes from. The makers of these models and products will talk about those non-problems to cover for the fact that they're vacuuming up the work of individuals then monetizing it for the profit of big industry players.

8 comments

I don't think this theory holds up. Singularity concerns long predate LLMs and are mostly expressed by people who want OpenAI to stop what they're doing right now. Sam Altman has publicly disagreed with AI doomers. If you're willing to believe that OpenAI is pretending not to be concerned but is quietly hyping the concerns up, I have to wonder what standard of evidence is letting you simultaneously write off the concerns as bullshit.
For me personally it's that everyone who is expressing these concerns has clearly done less critical thinking about the subject than your average extremely high teenager. When you ask them about details they get defensive and retreat to even stranger ground like "Well a human is nothing more than an autocomplete" (clearly not true).
I don't believe that rogue AIs are a threat for the next few years, but the claim that the likes of Geoffrey Hinton have done less thinking about the subject "than your average extremely high teenager" is absurd.
The fear I have isn't an AI doing things by itself, but being good enough so that if Joe Evil gets his hands on the AI, he can single-handedly (with AI help) break into secure databases, or something.

You know how a lot of us on HN talk about how security is just a latent concern for companies, but luckily there aren't enough hackers to take advantage of the massive number of bugs in every bit of code ever written? Well, a future powerful coding AI running on second-hand Ethereum mining rigs in some extremist's basement in Chicago can probably do a lot more damage than a handful of state-sponsored hackers in Russia and North Korea!

Surely some guy in his basement will have access to far worse models than the people he is trying to attack. If the AI can be used for offense it can be used for defence, especially since when used for defence you can give the AI access to code/design docs which make finding exploits much easier.
I will personally pay you $100 if this even gets close to happening in the next 1000 years.
Ok you contact me because I’ll forget in a thousand years.
Hinton would never agree with the stuff I read on a daily basis here on Hacker News, don't even try to suggest that he's one of these weirdos I'm talking about that's huffing on the idea that ChatGPT is going to replace programmers, LLMs are sentient, and that AI is going to take over the world.
I think cattown might be referring to statements such as this: https://www.theguardian.com/technology/2023/mar/17/openai-sa...

Not sure if I'd say there's a conspiracy per se, but I do think generative AI players are going to be careful about the optics of the technology and how it works. Anecdotally, from speaking to non-technical family members, there's very little understanding of how the technology actually works, and there doesn't seem to be a great deal of effort to emphasize the importance of training data, or the intellectual property considerations, in these companies' marketing materials.

> what standard of evidence is letting you simultaneously write off the concerns as bullshit.

Negative marketing is good marketing. Look at all of us debating theft at this scale and, in doing so, promoting the brand of this non-product.

Ok, so Sam Altman disagreed with AI doomers, great, but the point is still generally valid, for a couple of reasons:

1. What about Elon Musk and hundreds of other AI investors? It's in their interest to overhype AI, while temporarily slowing down competition by spreading singularity fears.

2. OpenAI released the GPT4 report, in which they claim better performance for their model than it has in reality [1].

[1] https://twitter.com/cHHillee/status/1635790330854526981

> The makers of these models and products will talk about those non-problems to cover for the fact that they're vacuuming up the work of individuals then monetizing it for the profit of big industry players.

Also why they claim these are "black boxes" and that they "don't understand how they work". They are prepping the markets for the grand theft that's unfolding.

I think you underestimate just how careful “real” businesses are when it comes to violating (copyright) law. Any legal advisor at any corp will strongly advise against using code that’s generated like this until there is clear legal precedent that it’s OK to do so.
Does that involve a ban of stackoverflow use as well?

https://stackoverflow.com/help/licensing

I don't think I've heard anyone warn people not to copy code snippets from stackoverflow due to licensing issues, although "real" businesses should be rightfully concerned.

It's already a common practice to put a StackOverflow link as a comment when you copy code from them. It provides valuable context to future readers.

That's probably enough for attribution, but I suppose one could copy the author name as well.

I think you underestimate how easy it is for developers to disregard what the Corp lawyer said about AI code gen tools.

Manager: "we asked, legal says you can't use copilot", dev: "okay, so from now on, I'll not discuss how I use copilot and will remember to disable it when someone sees me working, gotcha".

I'm not saying everyone will do this, I'm saying some people will know that the corp doesn't always have a way to verify how the code was written, and they will think that a lawsuit cannot really happen to them.

> Manager: "we asked, legal says you can't use copilot", dev: "okay, so from now on, I'll not discuss how I use copilot and will remember to disable it when someone sees me working, gotcha".

Manager: "Everyone else is running through their feature list faster than you. What gives? Remember, you're not allowed to use Copilot."

IC: "I'm not using Copilot."

Manager: "Remember, you're not allowed to use Copilot."

Doesn't Microsoft already use Copilot internally?
Of course if only used on internal software that isn’t distributed, then copying GPL code is fine. Until a developer inadvertently distributes it or copies code from one place to another…
Yep they do, but I did not see anyone generating chunks of gpl'd .NET code yet.
Microsoft puts out a lot of non-.NET code, including internally.
True, and that will cause a departure between companies large enough to worry, and all the startups that don’t.
AI will just make non-permissive open source licenses more pointless than they already are. The GPL and similar licenses have been on a slow death march for over a decade. AI isn't doing anything that Human Intelligence isn't already doing. Every single developer has looked at non-permissive open source code for inspiration.
The reason people can use code for inspiration is because of GPL and similar, do you see the problem with the logic you provided?

If all software started being non-permissive and closed source, there would be no training data and no new innovation, and even if there was, it would probably suck like it did before GPL and similar licensing was mainstream.

Yup. gg, gpl
"those non-problems"

Why is that a non-problem? It's a really important concern that we need to take more seriously.

I pasted this from another comment I wrote but:

The concerns about AI taking over the world are valid and important; even if they sound silly at first, there is some very solid reasoning behind it.

See https://youtu.be/tcdVC4e6EV4 for a really interesting video on why a theoretical superintelligent AI would be dangerous, and when you factor in that these models could self-improve and approach that level of intelligence it gets worrying…

I don't think the reasoning is solid at all. I mean yes, a theoretical superintelligent AI would be very dangerous, but I see exactly no reason to think that current models could get there.
Yeah feels a bit like we invent planes and worry about wormholes and time travel.
I don’t think we’re as far off as you think
People had no reason to believe that today's models would exist.

We are on this part of the ai takeoff graph. https://waitbutwhy.com/2015/01/artificial-intelligence-revol...

> People had no reason to believe that today's models would exist.

People had no reason to believe one day we would finally understand what causes the thunder. We finally did, and it is not made by Zeus.

That's not exactly true. There was plenty of reason to believe that. The only question was what the timeline would be.
Personally, I wasn't expecting anything as good as GPT-4 so soon. So I no longer have any real confidence in how far away 'real AI' is, whatever that means.

I would not be shocked to find out that AGI (using Altman's definition) is more than 50 years away, but I also would not be shocked if it came in 5.

It's really hard to know how scared to be, I think that rationally I should be pretty terrified but I'm not.

Well hardware and parameter count are scaling exponentially, so it seems very feasible that it could happen very soon. Of course it's possible that we'll hit a wall somewhere but it seems that just scaling current models up could be enough to get to the point where they can self-improve or gain more compute for themselves
We've been out of exponential territory for a few years now (https://en.wikipedia.org/wiki/Moore%27s_law). Yes, we are still bounding forward at a crazy pace, but I think the pace is slowing down somewhat
Hardware isn't scaling exponentially anymore (Moore's law is dead). Parameter count isn't really scaling exponentially anymore either. GPT3 had 175b parameters 3 years ago. There are some attempts at training 1 trillion parameter models, but they are not better than GPT3.
While I agree we probably aren't getting exponentially increasing parameter counts (GPT4 is by all accounts 1T parameters and, of course, it is significantly better than GPT3), we are still seeing lots of improvements - 3.5 is much better than 3, based "just" on InstructGPT/RLHF training. Models are getting better as well - LLaMA 30B beats/matches GPT-3 on raw eval benchmarks at 1/6 the parameter count.

We're also seeing lots of optimizations with new models (RoPE/RoPER embedding, Swish/GeLU activation, Flash Attention, etc) but I think some of the most interesting gains we'll be seeing soon are from inference-optimized training (-70% parameters for +100% compute) [1] combined with sparsity pruning (-50% size w/ almost no loss in accuracy) [2] and quantization [3], which will lead to significantly smaller models performing well.

[1] https://www.harmdevries.com/post/model-size-vs-compute-overh...

[2] https://arxiv.org/abs/2301.00774

[3] https://openreview.net/forum?id=tcbBPnfwxS
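
To put rough numbers on how those three savings could stack, here is a back-of-the-envelope sketch. The fp16 baseline, the 4-bit width, and the idea that the savings simply multiply are all assumptions for illustration (the linked posts don't claim this), and `footprint_gb` is just a hypothetical helper name.

```python
# Back-of-the-envelope estimate of how the three savings might stack.
# Every input below is an illustrative assumption, not a measured result.

def footprint_gb(params: float, bits_per_weight: float) -> float:
    """Memory needed to store the weights, in gigabytes (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9

BASELINE_PARAMS = 175e9  # a GPT-3-sized dense model
BASELINE_BITS = 16       # assumed fp16 weights

PARAM_KEEP = 0.30        # "-70% parameters" from inference-optimized training
PRUNE_KEEP = 0.50        # "-50% size" from pruning (ignores sparse-index overhead)
QUANT_BITS = 4           # assumed 4-bit quantization; no bit width is given above

baseline = footprint_gb(BASELINE_PARAMS, BASELINE_BITS)
compact = footprint_gb(BASELINE_PARAMS * PARAM_KEEP * PRUNE_KEEP, QUANT_BITS)

print(f"baseline: {baseline:.1f} GB")
print(f"compact:  {compact:.1f} GB ({baseline / compact:.1f}x smaller)")
```

Under those assumptions the weights go from roughly 350 GB to roughly 13 GB. Real sparse formats carry index overhead and the accuracy costs don't necessarily compose this cleanly, so treat it as an upper bound on the savings rather than a prediction.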

What I doubt is that the current approach can lead to AGI at all, regardless of scale. But I'm just speculating along with everyone else. We will see.
as Moore's law is dead it's hard to see more exponential scaling

they're also not going to find another 2, 4, 8, 16 ... internets' worth of content to parasitise

It’s still exponential, but a little slower. (edit: wait, is that still exponential if it slows down?) Anyway we only need to get to human level (or maybe a bit less) and we’re not that far off (maybe 10 or 20 years at current rates of progress?)
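
On the "is it still exponential if it slows down?" question, a quick sketch may help. The symbols $f_0$, $r$ and $r_0$ are just illustrative, and the decaying rate in the second case is one arbitrary example of "slowing down", not a claim about actual hardware trends.

```latex
% Growth with an instantaneous rate r(t):
\[
  f(t) = f_0 \exp\!\left( \int_0^t r(s)\, ds \right)
\]
% Case 1: the rate is lower than before but constant, r(s) = r > 0.
% This is still exponential, just with a longer doubling time.
\[
  f(t) = f_0 e^{r t}, \qquad T_{\text{double}} = \frac{\ln 2}{r}
\]
% Case 2: the rate itself keeps shrinking, e.g. r(s) = r_0 / (1 + s).
% Growth is then only polynomial, i.e. no longer exponential.
\[
  f(t) = f_0 \exp\!\big( r_0 \ln(1 + t) \big) = f_0 (1 + t)^{r_0}
\]
```

So a one-off drop to a slower but steady doubling time is still exponential; a doubling time that keeps stretching is not.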

Not all types of AI need external training data, you can train on how effectively a goal is achieved

> maybe 10 or 20 years at current rates of progress?

how can the rate be maintained?

exponential chip scaling is over, and they've parasited, sorry, trained on the entirety of accessible human knowledge

the rate may drop to zero

the exponent may even go negative once LLMs start ingesting their own hallucinations

I watched the video.

> has preferences over world states

I think that part is a leap. I don't think it's a given that a super intelligent AI will "want" things.

> presumably a machine could be much more selfish

This feels like we're projecting aspects of humanity that evolution specifically selected for in our species onto something that is coming about through a completely different process.

> It's a mistake to think about it as a person.

I agree, but I feel like that's what these concerns about AI are doing, because that's what people do.

> (The whole stamp collector thing)

It also seems to me there is a huge gap between a super intelligent AI and the ability to have a perfect model of reality along with the ability to evaluate within that model the effect of every possible sequence of packets sent out to the internet.

> I think that part is a leap. I don't think it's a given that a super intelligent AI will "want" things.

But if it has no goal then it can’t act rationally or intelligently. Something like an LLM might not appear to “want” anything, but it “wants” to predict the next token correctly which is still a goal (though since it’s only related to its internal state it might be a little safer)

There’s another good video about why this would be the case here if you’re interested: https://youtu.be/8AvIErXFoH8

> This feels like we're projecting aspects of humanity that evolution specifically selected for in our species onto something that is coming about through a completely different process.

That’s because evolution is a process that optimises for a goal. The only reason altruism is a thing is because it actually indirectly benefits the goal, which is for our genes to survive and be passed on, and fellow humans tend to share our genes, especially relatives (who we tend to be kinder to). AI training is also a process that optimises for a goal, but unless having humans around helps that goal it wouldn’t display any human empathy. In this case “selfishness” is just efficiency which a training process definitely selects for

> I agree, but I feel like that's what these concerns about AI are doing, because that's what people do.

I feel like they’re doing a pretty good job at modelling AI as a theoretical agent, which does share some similarities with humans because humans are agents, but the main mistake people make is assuming their goals will be similar to humans because human values are somehow a universal truth

> It also seems to me there is a huge gap between a super intelligent AI and the ability to have a perfect model of reality along with the ability to evaluate within that model the effect of every possible sequence of packets sent out to the internet.

That’s very true, it’s an unrealistic thought experiment, but it’s a good introduction to the concept that something significantly more intelligent than us can be dangerous and pursue a goal with no regard to what we actually wanted

> but it’s a good introduction to the concept that something significantly more intelligent than us can be dangerous and pursue a goal with no regard to what we actually wanted

I think things significantly less intelligent can do this too. See any computer program that went wrong. I don't think that is a novel idea.

Perhaps it is a lack of imagination on my part, but I can't help but think, in this stamp collector example, someone would just be like "wait why are these machines going crazy printing stamps" and just like turn them off.

I feel like any argument on the dangers of superintelligent AI rests on the belief it can also use that intelligence to manipulate humans to complete any task and/or hack into any computer system.

I don't agree evolution optimises for a goal at all. IMO optimising for a goal means you first define a goal, then you work towards it.

Evolution has no goal, it's simply a process determined by chemical reactions. Any goals we attribute to it, e.g. "for our genes to survive and be passed on" are emergent phenomena, a rationalisation after the fact that that is indeed what's been observed.

It's plausible that AI "goals" emerge evolutionarily as well, but for that to happen we first need to create not AGI but Artificial Life, which is a huge leap from today, and I certainly don't understand how that's inevitable.

Then by that definition AI training has no goal, it's simply a process defined by calculations. But whether you want to call it a goal or not, the fact remains that they look very, very much like goals. "If it looks like a duck, swims like a duck, and quacks like a duck, then it probably is a duck."

> It's plausible that AI "goals" emerge evolutionarily as well

AI training is vaguely similar to evolution, except more efficient and directed

> Then by that definition AI training has no goal, it's simply a process defined by calculations.

No, the very definition of training is that there is a goal to train for. Those calculations were created by humans with goals. For LLMs, the goal is token prediction.

Evolution has no training.

> then monetizing it for the profit of big industry players

Looks like LLMs are universally useful for individual people and companies, monetisation of LLMs is only incipient, and free models are starting to pop up. So you don't need to use paid APIs except for more difficult tasks.

It was already more than possible to just copy stuff, and I don't believe a court is going to accept a very convoluted way of copying stuff as anything other than copying.

The same thing that prevents regular copying also prevents intentional use of AI tools to copy: the willingness of the owner to sue.

It seems to me, from a copyright perspective, all commercial use of generative AI depends on whether the output is transformative fair use (vs a derivative work). While the courts will have their say, ultimately whether new rules are carved out or not is going to be again (as all copyright law is) based on commercial interests - I have the feeling that the potential productivity upside across all industries (and in terms of national interests) is going to be big enough that it'll work itself out largely in favor of generative AI.

That being said, IMO, that's completely separate from the safety issues (that exist now and won't go away even if somehow, all commercial use is banned):

Urbina, Fabio, Filippa Lentzos, Cédric Invernizzi, and Sean Ekins. “Dual Use of Artificial-Intelligence-Powered Drug Discovery.” Nature Machine Intelligence 4, no. 3 (March 2022): 189–91. https://doi.org/10.1038/s42256-022-00465-9.

Bilika, Domna, Nikoletta Michopoulou, Efthimios Alepis, and Constantinos Patsakis. “Hello Me, Meet the Real Me: Audio Deepfake Attacks on Voice Assistants.” arXiv, February 20, 2023. http://arxiv.org/abs/2302.10328.

Mirsky, Yisroel, Ambra Demontis, Jaidip Kotak, Ram Shankar, Deng Gelei, Liu Yang, Xiangyu Zhang, Wenke Lee, Yuval Elovici, and Battista Biggio. “The Threat of Offensive AI to Organizations.” arXiv, June 29, 2021. http://arxiv.org/abs/2106.15764.

I don't think most people have thought through all the ways perfect text, image, voice, and soon video generation/replication will upend society, or all the ways that the LLMs will be abused...

As for AGI xrisk. I've done some reading, and since we don't know the limits of the current AI paradigm, and we don't know how to actually align an AGI, I think now is a perfectly cromulent time to be thinking about it. Based on my reading, I think the people ringing alarm bells are right to be worried. I don't think anyone giving this serious thought is being mendacious.

Bowman, Samuel R. "Eight Things to Know about Large Language Models." arXiv preprint arXiv:2304.00612 (2023). https://arxiv.org/abs/2304.00612.

Ngo, Richard, Lawrence Chan, and Sören Mindermann. “The Alignment Problem from a Deep Learning Perspective.” arXiv, February 22, 2023. http://arxiv.org/abs/2209.00626.

Carlsmith, Joseph. “Is Power-Seeking AI an Existential Risk?” arXiv, June 16, 2022. http://arxiv.org/abs/2206.13353.

I think Ian Hogarth's recent FT article https://archive.is/NdrNo is the best summary of where we are and why we might be in trouble, for those who don't care for arXiv papers.