Using a neural network for things that have clear-cut rules is wrong. When you know the exact rules, implement them as such, instead of brute-forcing a guesstimate. This is also why I'm sceptical of the use of GPT-3 for all sorts of purposes where accuracy is important. Think of the code generation case. Bugs may be very subtle and may go unnoticed.
I once (2010) had an oral exam in neural networks during which I had to design a system to solve a particular task. My solution used two neural networks connected by a simple logic circuit. The professor, who was actually a neuroscientist with little understanding of the technical side, asked why not a third neural network. This was a neural network course, after all.
Thinking this was a trick question I excitedly explained how stupid it would be to build and train a network to approximate a function which could easily be precisely described with a tiny circuit or code statement.
The professor was not amused. He said that's what he would have done. After a few similar incidents he concluded the exam giving me an 8/10 saying my answers were perfectly correct but he didn't like my attitude.
Yeah, there’s a better way of saying that and it sounds like that was an expensive lesson in communication.
“Something something, it’s better to spend training time on the parts of the network where we don’t know the function beforehand than to train a subnetwork to do a function that we do know exactly at the outset.” And still, you need to be ready to be asked how to backpropagate through your hard-coded function.
In a test you have to prove your knowledge by transmitting symbols through language. If you don't "sugar coat" it, how do you expect that the right symbols will be interpreted by the receiver? It is part of the test to use the appropriate language to ensure the best understanding of what you are saying. Nonviolent communication tries to do exactly that and is essential to this end.
You can argue that since the professor understood what was being said, the language shouldn't matter, but it does. If again you don't use the correct language you risk offending the listener so much he can't get past that. After all you are dealing with humans, not machines, and in either case you are responsible for clear communication.
And the contrary view is that the professor has a pronounced responsibility to see past unfortunate framing and phrasing of intricate subject matter details. Both are worthwhile goals, I think.
There's actually more to the story than what I intended to share to begin with.
I usually get along well with teachers and have enough tact to make do when I don't. But this person was such an extreme case. He was always grumpy, would regularly take several minutes out of lectures to call out and belittle any student who was late or he didn't think was paying attention. The only times he wasn't sour was when he was describing unethical and torturous experiments on animals. Then he was overly excited and giddy.
The exam was, as was common at that university, essentially an essay which was then reviewed by the examiner who would ask for any clarifications or push with further questions. This exam asked us to describe a system solving a fairly standard AI task using neural networks. My answer was a few pages split into a few sections: a brief overview of the whole system, followed by detailed descriptions of each part. Wherever I skipped details in the overview I'd written "(see p. X)". It took a bit of planning to get those numbers in there when writing that by hand with limited time.
When I finally gave the answer the professor skimmed through the overview and then proceeded to berate me for having neglected to write about some important detail. I politely pointed out that it was in the following section and that I'd included a page reference for things which I referred to before defining. He grumbled and kept skimming. After a while he complained about another thing he thought I really should have explained which was missing. I told him on which page he could find it and politely reminded him that he was reading the overview and that the details were in the next part.
This went on for a while and I got a bit more annoyed and a bit less polite each time, because it was getting ridiculous. After the tenth or so time I flat out told him that if he'd read past the overview, like I kept telling him, he'd find all the details there and if he didn't want to I'd be willing to describe any part of it orally but really, it is all there if he would just read it instead of complaining.
That's when he asked the question about a third network and I, still very frustrated, was relieved to get an actual question, and I was certain it was a trick question trying to get me to say I didn't think of it and then he'd call me an idiot and explain why it was unnecessary. I was really surprised it wasn't a trick and frankly a bit delighted at inadvertently insulting him.
When I told my wife (well, girlfriend at the time) she was a bit shocked that I'd been rude to a professor, but I didn't need those points and while it was very out of character for me I never regretted it.
For code, I could see it being super useful for a beefed up auto-complete. There are many times I find myself searching for things like "how do I do X in Y language" to copy a snippet that I'm sure has been written 10000x times before. I can review the code and verify its correctness by writing tests.
Either OpenAI or Microsoft demoed something similar to this some time in the last 12-18 months.
The fact that it’s not released (and also the fact that GPT-3 etc are still not publicly available) makes me suspect that these models are far too unstable for actual production use. It’s also why I’m getting a bit tired of these overhyped cherry-picked samples with seemingly nothing solid to ever back it up.
> I’m getting a bit tired of these overhyped cherry-picked samples with seemingly nothing solid to ever back it up.
Most of the time a "fantastic GPT-3 result" is shown, you have to dig a bit and then you'll find out how it was primed[0] and how many different texts they had it generate. Then the one(s) carrying out the experiment go on and pick the most shocking writings. If you read all of the outputs (there are a few articles around that show you 5 or 6 different outputs) you'll see how much they vary between runs. I understand that 5 or 6 is actually small; to get shocking results they usually go into the dozens of tries.
[0] usually the priming phrases are given, but depending on how much of a snake-oil-salesman the person writing/giving a talk is, they may even hide this part
They are too expensive to run - hundreds of GB of GPU memory - so they can't be deployed for the public at large yet, kind of like the SGI workstations from 20 years ago. You can do that and more for cheap today, but not then.
I think we can get models about 1/100th the size for general use. That's also the main reason Google is developing TPUs.
> They are too expensive to run - hundreds of GB of GPU memory - so they can't be deployed for the public at large yet, kind of like the SGI workstations from 20 years ago. You can do that and more for cheap today, but not then.
I don't buy this. OpenAI literally released pricing for GPT-3, so either they grossly miscalculated their cost base (unlikely) or there's some scaling/instability/resourcing issue preventing them from doing so (much more likely).
I think it's telling that they spent the last 6 months on yet another flashy demo (DALL-E) rather than actually productionizing GPT-3. It just feels like constant smoke and mirrors.
You’re (sadly) assuming everyone would verify its correctness. Proper programming would mean one would write tests, but not everyone does. I’m guilty of it too.
When banks got ATMs, they thought it'd drastically cut headcount - instead it went _up_: freeing up time from doing the basics meant you had more time to focus on more profitable activities
I hope to live long enough to be mostly writing tests for a gloriously hacky code generator that gets it right 80% of the time
GPT-3 is a generator, that's just half the equation. It needs a discriminator (critic) to check out its outputs and in the case of program synthesis and realistic physics, a simulator. That's how humans do creative stuff - generate silly ideas then check them out.
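A rough sketch of that generate-then-check loop, in Python. Everything here is hypothetical: `generate_candidates` stands in for a model like GPT-3 (it just yields a few hand-written guesses, some of them wrong), and `passes_tests` plays the critic by running a handful of input/output checks.

```python
from typing import Callable, Iterable, Optional

Candidate = Callable[[int], int]

# Hypothetical stand-in for a generative model: it proposes candidate
# implementations of "double a number", and some of them are wrong.
def generate_candidates() -> Iterable[Candidate]:
    yield lambda x: x + 2   # wrong for most inputs
    yield lambda x: x * x   # wrong for most inputs
    yield lambda x: x * 2   # correct

# The critic: input/output pairs every accepted candidate must satisfy.
TEST_CASES = [(0, 0), (1, 2), (2, 4), (5, 10)]

def passes_tests(candidate: Candidate) -> bool:
    return all(candidate(x) == expected for x, expected in TEST_CASES)

def first_accepted(candidates: Iterable[Candidate]) -> Optional[Candidate]:
    # Keep generating until the critic accepts something, or give up.
    for candidate in candidates:
        if passes_tests(candidate):
            return candidate
    return None

accepted = first_accepted(generate_candidates())
print(accepted(21))  # 42
```

The critic is only as good as its test cases: a candidate that passes these four checks is accepted whether or not it is right everywhere else.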
This is not like we are asking GPT what 2 times 3 is. It is not even a bug. The author is just using quantized multiplication, which takes a subset of the floating point space and maps every value to the nearest point. So 2 can be approximated as, say, 2.5 if there are few points. And that is not deterministic. It is also a known thing that something like that could occur. It is just that neural networks still seem to learn.
I see where you're coming from, but given the (usual) flexibility and generality of neural nets, it's often very tempting to assume that you can just keep adding data, and I think these are good reminders that being able to do complicated things doesn't mean they can do simple things efficiently (which is a conversation that has come up several times at work)
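To make the "2 becomes 2.5" case concrete, here is a small Python sketch of snapping a value to the nearest point on an evenly spaced grid. The `quantize` helper and its parameters are made up for illustration; it is not the scheme from the article.

```python
def quantize(value: float, lo: float, hi: float, levels: int) -> float:
    """Snap value to the nearest of `levels` evenly spaced points in [lo, hi]."""
    step = (hi - lo) / (levels - 1)
    clamped = min(max(value, lo), hi)      # values outside the range are clipped
    index = round((clamped - lo) / step)   # index of the nearest grid point
    return lo + index * step

# With only 3 points on [0, 5] the grid is 0.0, 2.5, 5.0, so 2 lands on 2.5.
coarse = quantize(2.0, 0.0, 5.0, 3)
print(coarse)  # 2.5

# With 256 points (8 bits) the worst-case error is half a step, about 0.0098.
fine = quantize(2.0, 0.0, 5.0, 256)
print(abs(fine - 2.0) <= (5.0 / 255) / 2)  # True
```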
How is it that human brains can follow deterministic logic but neural networks can't? What's the missing piece? Is it just that people are a lot more complex than algorithms like GPT-3?
That's essentially like asking "how is it that human feet can walk up mountain slopes, but car wheels can't?"
That is, there is no relation between human brains and artificial neural networks, other than them serving similar purposes in particular environments.
I respectfully disagree - I believe this is a philosophical viewpoint that shouldn't be presented as a straightforward truth.
Unless you are a dualist, I would say that it's reasonable to view that it is in principle possible to produce an artificial network accurately emulating the function and behavior of a human brain.
If you are a dualist, then there is no further discussion to be had as we are very unlikely to ever be able to prove anything like the existence of a soul.
Apologies if you were speaking to some more subtle nuance that I was unable to pick up.
TL;DR: I'm saying that there is no structural resemblance between the human brain and artificial neural networks specifically (even though there likely is a structural resemblance between the human mind/brain and a computer in the general sense). You can believe in AGI and not believe it will be achieved with ANNs.
> I would say that it's reasonable to view that it is in principle possible to produce an artificial network accurately emulating the function and behavior of a human brain.
I think that claim is far too strong. As a non-dualist, I do believe that it is possible to create an artificial "brain" that has the same cognition as a human brain. However, simply rejecting dualism does not tell you anything about what the artificial brain has to be.
You can further say that a non-dualist who accepts the Church-Turing thesis must accept that there must exist a Turing machine which has the same cognition as the human brain. Since the PCs we use are Turing machines, it follows that we should be able to program one to behave like a human brain, in theory at least (disregarding hardware requirements, of course).
Still, that does not mean that a brain Turing machine has to look anything like a neural network trained through gradient descent & back propagation. This was my point: artificial neural networks and the methods we use to train them have no resemblance to the human brain, and there is no reason to believe that they are the way to create an artificial general intelligence. So, there is no reason to be surprised that a neural network, especially one as small as any of the ones we have realized so far, doesn't exhibit complex properties of the human brain.
Artificial neural networks are just a statistical model that was once inspired by a very, very simplistic idea of what biological neural networks are. As we have discovered more about biological neural networks, we've abandoned any notion of comparing ANNs with biological neural networks in terms of actual structure.
This is all not to say that it's impossible for a complex enough ANN to actually be an AGI. It's just not going to be that surprising if it won't be, if an AGI program will look significantly different, and will be trained in completely different ways.
This is a very interesting question that I've been thinking of as well. Why can't neural network based AI learn deterministic logic? I think because it doesn't have internal representations inside its "head" so to speak. It does not know that it can manipulate such structures as we can.
Code generation only needs to generate code with n bugs where n is less than the number of bugs a human developer generates for it to have usefulness, and maybe some other factor of severity where they are generally less severe than human developers. I think it'll make neat autopilot functionality for developers but not replace the need to have someone look over and understand the code.
This is a very simplistic view of what code is and the role it plays in a system.
There are many implementations that can fulfill a set of requirements. Not all of them are created equal. The ways in which they behave as the system changes can be wildly different. Well-written code will be able to handle those changes gracefully. Poorly-written code may end up proving brittle and bug-prone. Generated code will be completely unpredictable.
Imagine you're trying to build a street network for a city. Some designs are much more predictable than others. If you've played Factorio, the distinction between a spaghetti base and one that has some design is abundantly clear. Even if they fulfill the same requirements now, the ability to improve upon them and reason about how they will behave after changes is vastly different.
I don't know what you're arguing against but it sure isn't what I wrote.
"Code generation only needs to generate code with n bugs where n is less than the number of bugs a human developer generates for it to have usefulness, and maybe some other factor of severity where they are generally less severe than human developers."
Point to the part you're arguing against, because I think you extrapolated way beyond what "have usefulness" means.
I don't want to die when I crash my own car, and I already debug my own apps at 12am. If your argument is that things need to be perfect then my god you must never leave your home! I'd trust a machine to drive more accurately than most people I see on the highway.
Humans aren't special, in fact more often than not we're sloppy, subject to fatigue, and a whole bunch of other negative things.
That considered, I had a pretty strict qualifier in my above post which means the machine must perform better than the average human in the respective task and therefore I'd be more likely to die driving my own car than a machine meeting my prerequisites.
This is naive. The point is that code is a well defined system with clear rules that can be expressed through logic and mathematics. GPT is suited to approximate systems where the rules are not well defined. Until AI can actually learn the principles of logic, it may not be useful for code generation on a meaningful scale, other than things just like simple auto-completions.
Not only that, AI would also have to learn the principles of system design, performance, security, readability, maintainability. That's what makes "good" software. It's a far stretch to say that AI could achieve anything of the sort based on current abilities.
It's clearly not if you read the rest of that sentence, that's my metric for it to be useful. My metric for it being good is much deeper. Which is why I questioned what you responded with.
We are already at the point of useful and context aware code generation anyway which is why I've found everyone on this thread questioning it to be kind of funny, Microsoft was demoing complex generations a year ago. So we're well on our way.
I disagree that that is enough to be useful. To give a deliberately extreme example: if it produces code which has half the number of bugs as a human, but it only outputs Malbolge source code, nobody else will be able to fix those bugs which remain.
As someone who builds neural networks routinely, this sort of non-reproducibility sounds troubling to me. We expect small differences for floating point arithmetic between platforms, but integer math is typically exact.
This is all the more concerning for 8-bit quantized arithmetic, where off-by-one means a relative error of about half a percent. If individual layers in a quantized neural net have off-by-one errors with a consistent bias, I can imagine these errors accumulating into significant losses in model quality in deep networks. There isn't a huge margin for error in quantized neural nets.
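A back-of-the-envelope version of that arithmetic in Python. The accumulation model is a deliberately crude assumption (every layer off by one step in the same direction, errors simply adding up); real networks rescale between layers, so treat it as a worst-case illustration rather than a prediction.

```python
FULL_SCALE = 255  # largest value representable in unsigned 8 bits

# Size of a single off-by-one step relative to full scale.
step_error = 1 / FULL_SCALE
print(f"{step_error:.4%}")  # 0.3922%

# Assumed worst case: one biased off-by-one error per layer, summed linearly.
def accumulated_error(num_layers: int) -> float:
    return num_layers * step_error

for depth in (1, 10, 50):
    print(depth, f"{accumulated_error(depth):.2%}")
# 1 0.39%
# 10 3.92%
# 50 19.61%
```

Measured against a mid-range value such as 128 rather than full scale, the same one-step error is roughly twice as large in relative terms.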
One concern about the article: it uses the word "non-deterministic" in a slightly misleading way. I assume any specific hardware is still expected to produce consistent results when run twice on the same input. So it's more non-reproducible than non-deterministic. Compensating for inconsistent arithmetic on different devices sounds much more feasible than compensating for stochastic arithmetic.
Thanks for your comments. Regarding determinism, potentially a fair point. Here are a few comments:
(1) A driver which randomly produces different output when running the network would be valid according to these restrictions.
(2) It is conceivable that a driver would produce non-deterministic output with the same hardware. One commonly known example is that TensorFlow will run multiple different convolution kernels and then choose the fastest one. In that case, you can run the same network on the same hardware and get slightly different results. It's not that hard to imagine that a mobile driver could do something similar.
(3) It's not true that specific hardware will produce consistent results on the same input. You can run a model today, the driver gets updated, and tomorrow you get different output. This happens frequently.
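One reason two correct kernels can disagree: floating-point addition is not associative, so summing the same terms in a different order can change the last bits of the result. The snippet below is a generic illustration in plain Python, not a reproduction of what TensorFlow or any mobile driver actually does.

```python
values = [0.1, 0.2, 0.3]

# Two "kernels" that compute the same sum, grouped differently.
left_to_right = (values[0] + values[1]) + values[2]
right_to_left = values[0] + (values[1] + values[2])

print(left_to_right)                   # 0.6000000000000001
print(right_to_left)                   # 0.6
print(left_to_right == right_to_left)  # False
```

If the runtime picks whichever kernel happens to be fastest on a given run, the output can differ between runs even though neither kernel has a bug.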
All good points! "Non-deterministic" behavior within the same program/process is still a bridge I would not want to cross. This could result in subtle glitches, e.g., when a user hits "refresh" with the same inputs, and could make reproducing bugs impossible.
I am a strong believer in always using a seed for random number generation for exactly these sorts of reasons. (Side note: deterministic RNGs are one of my favorite features of JAX.)
Regarding bias: This is exactly true, especially with the author's method, as the learned quantization ranges are fixed and accumulating biases would lead to the entire batch being clipped to 0 or 255, depending on the direction of the biases. Luckily the bias parameters are kept in int32, so the overall bias produced by them will be much smaller than 2 pct. The arithmetic errors of the int8 matmuls are summed within the matmul, and are therefore an unbiased estimate of the true entry in the result matrix.
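For anyone who hasn't seen the pattern, a minimal sketch using Python's standard `random` module (not JAX, whose key-based API works differently). The `sample_noise` helper and the seed values are made up for illustration.

```python
import random
from typing import List

def sample_noise(seed: int, n: int = 3) -> List[float]:
    # A private generator, so the result does not depend on global RNG state.
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]

run_a = sample_noise(seed=1234)
run_b = sample_noise(seed=1234)
run_c = sample_noise(seed=5678)

print(run_a == run_b)  # True: same seed, same sequence
print(run_a == run_c)  # False: different seed, different sequence
```

Passing the seed in explicitly means a "refresh" with the same inputs replays the same numbers, which is what makes a bug report reproducible.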
It’s an interesting observation, and a shocking title, but the applicable lesson seems to be “don’t use an aggressively quantized network if your application is sensitive to quantization errors”
My pencil can sometimes be in China according to quantum mechanics, but the probability is extremely low. I think the fact that neural networks are almost right is not really concerning at all. As long as your network can produce a result within an acceptable error boundary, who cares? That is literally how nature works.
What is called AI, or "Artificial Intelligence" should in reality be called "Artificial Intuition".
It is similar to the subconscious mind, which is able to get close to a solution very fast but does not give you the solution itself. You need the logical conscious mind (similar to the CPU) to refine the solution.
The logical conscious mind is so slow that it will never get the solution on its own, but being so close to the solution it can.
AI 1.0 was about solving all problems just using rational methods alone, like Lisp programming. AI 2.0 is solving all problems by neural networks and training alone without understanding or testing if a solution is right or why it is right.
Real artificial intelligence should be about integrating both approaches. E.g., you use intuition to train a network in the English language, but then you use it to develop the English grammar from it. You extract the structure from the data.
Why would you use a neural net to approximate 2 x 3 when there is a clear definition of the result? Or, as a fun side effect, neural nets are prone to off-by-one errors too :)
It's a neural network. It gives approximate results. Here's a newbie question that asks basically the same question, with some interesting answers.
> codesternews: Any deeplearning expert here. Why Neural network can't compute a linear function Celsius to Fahrenheit 100% accurately. Is it data or is it something can be optimised.
print(model.predict([100.0]))
# it results 211.874 which is not 100% accurate (100×1.8+32=212)
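A self-contained way to see the same effect without any ML library: fit a single linear "neuron" to Celsius/Fahrenheit pairs with plain gradient descent and compare it to the exact formula. The sample points, learning rate, and step count below are arbitrary choices for illustration, not the setup from the quoted question.

```python
def to_fahrenheit(celsius: float) -> float:
    # The exact rule, known in advance.
    return celsius * 9 / 5 + 32

# Training data generated from the exact rule (illustrative sample points).
xs = [-40.0, -10.0, 0.0, 8.0, 15.0, 22.0, 38.0]
ys = [to_fahrenheit(x) for x in xs]

# One linear "neuron" y = w * x + b, trained by gradient descent
# on mean squared error, starting from w = 0 and b = 0.
w, b = 0.0, 0.0
learning_rate = 0.001
for _ in range(2000):
    errors = [w * x + b - y for x, y in zip(xs, ys)]
    grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / len(xs)
    grad_b = 2 * sum(errors) / len(xs)
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

predicted = w * 100.0 + b
exact = to_fahrenheit(100.0)
print(predicted)  # close to 212, but not exactly 212
print(exact)      # 212.0
```

Training stops wherever the optimizer happens to be after a finite number of steps, so the learned `w` and `b` only approach 1.8 and 32. More steps shrink the gap, while the one-line formula has no gap at all.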
>Some intelligence is simply less intelligent than others.
I completely agree with you there, you're preaching to the choir.
To compare apples & oranges I could say how would you feel if you were surrounded on a dangerous freeway with nothing but noticeably below-average drivers including the vehicle you were in.
Naturally I expect many passengers have become familiar with that particular traffic situation a time or two.
IOW not just below average but below ordinary expectations, and as mentioned dangerously so.
Natural intelligence, or lack of enough in the case of many who are performing noticeably below average, can only take you so far and it has always been a limitation.
OTOH would you feel more comfortable with all automated drivers instead having noticeably below-average performance due to their less intelligent below-average automaton behavior?
What if you noticed something your driver did not?
What could you do to alert a driver that truly needs a little advice from the back seat for instance, whether for navigation, safety, or far more elusively a sense of danger or even courtesy, in either case?
Would your observations as a passenger have any possibility of ever being helpful in either situation?
Would the relative artificiality of the intelligence or lack of it involved be a factor?
What if it was not just below-average drivers but some of the traditionally worst who are barely acceptable and realistically for them it's only under ideal conditions?
Seems to me risks increase exponentially the further from ideal, and the deviation between natural and artificial types of risks could result in a valley having its own kind of uncanniness.
Personally speaking as the strongest advocate toward ML & automation most people have met over the last 50 years.