by cjhanks 2198 days ago
We have also been watching these machine learning models for 6 months:

- increase the volatility in virtually every financial market they touched

- be exploited by adversarial learning networks to amplify funded propaganda as news

- use poorly contrived sentiment analysis to generate incomprehensibly meaningless news headlines

These non-linear "function approximators" have absolutely unpredictable and insane non-linear behavior where learned information was non-existent or sparse.

God help us all if one of these artificial intelligence devices is driving the road and sees a red stop sign that is a square, rather than a hexagon.

12 comments

"AI" is a very vague term. What you described aren't entirely "machine learning", but a combination of existing linguistic techniques and machine (deep) learning.

People are confused all the time about what AI can do and what AI is. It also doesn't help that so many inexperienced data scientists make promises they can't keep.

In your example, I'd argue that a human is not necessarily a better driver than a machine. An attentive and careful driver is certainly better than a machine right now, but there are many who drive carelessly. While a person is unlikely to mistake a square stop sign for something else, there are so many drivers who would simply ignore the sign, and traffic lights in general. They'd also drive dangerously because of road rage and inattentiveness. And the majority of traffic accidents are caused by these drivers. A machine is unlikely to do any of these.

That said, until we figure out how to run all these deep learning models without a crazily expensive and power-consuming GPU, it is unlikely AI would be used as general purpose programs.

Whether humans or AI are "better" drivers is completely beside the point. The point is that we can characterize human drivers. We know where they succeed and where they fail, both in a statistical sense and in an individual sense based on their age, attention, vision, chemical impairment, etc. But we cannot characterize ML networks. We take it on faith that they work and then we find (because somebody dies) that they run right into an overturned truck or a pedestrian or under a truck crossing the road.

Until we can characterize the behavior of these systems, they must not be put in control of life-critical processes like driving.

Just playing devil's advocate, but: when you take a taxi, what do you know about the driver? You can vaguely see if he's sober and that's all. You EXPECT him to have a driver's license, to have good eyesight, etc., but you KNOW nothing about it. If he has a heart attack while he's driving on the highway, could it have been predicted (by you or the company)? No.

So I don't see why this distinction between AI and humans is made: both are black boxes. Perhaps humans have fewer "edge cases", but as long as the error level of AI is the same as or lower than that of humans, I don't care if the car crashed because the human driver looked at a sexy woman on an ad on a billboard or because a variable was poorly set in the car's code.

I also agree on this. I think that in terms of liability, a human whom one can sue when they make a mistake is more valuable than a machine.

That's why in life critical applications companies who are capable of taking the risk are scarce, because when accidents happen, the company has to take responsibility. It cannot be resolved by just firing employees.

You can fix software, but you can only punish a human driver, hoping they will fix themselves. Also, both can be forced to train, but you can reproducibly test only the software; there is no guarantee that your retrained human driver will not succumb to the same road rage in the near future.
Nonsense.

You can't fix a model to handle unknowns, and you can't test that.

We've seen with Tesla's autopilot software that things like obsolete road markers and overturned trucks are meaningless to software.

Of course you can!!

Even with something as poorly defined as a neural network, you can retrain it, modify its architecture or its postprocessing, and verify reproducibly on test cases that it behaves better.

Also, to address your criticisms, you can add test cases, just like in any software (and they actually do exactly that for hardware too).

Where are stop signs hexagons?

Am I being a pedantic numpty or am I illustrating a point about the many ways errors creep in, regardless of the natural- or artificial-ness of the intelligence?

You're being pedantic. Human beings are tremendously better at driving than machines despite sometimes saying hexagonal rather than octagonal. Humans and current AIs both make mistakes but humans manage a kind of robustness, an ability to deal gracefully with unexpected situations, that current AIs don't seem to be progressing towards.
What happened to sensor fusion? There's no reason self-driving AI has to be as unreliable as toy or research AI. People made these same FUD arguments about computers in cars decades ago. Home computers were unreliable so cars will surely crash if their brakes or throttles are controlled by computers too.
> Human beings are tremendously better at driving than machines

Human drivers: 1 death per 88 million miles traveled (in the US) [1]

Tesla Autopilot: 5 deaths per 3 billion miles [2]

[1] https://www.iihs.org/topics/fatality-statistics/detail/state...

[2] https://electrek.co/2020/04/22/tesla-autopilot-data-3-billio... and https://en.wikipedia.org/wiki/List_of_self-driving_car_fatal...
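
As a quick arithmetic check, the two figures quoted above can be put on the same scale. This is only a sketch: the numbers are taken at face value from the comment, the variable names are made up for illustration, and the calculation says nothing about whether the two sets of miles were driven under comparable conditions.

```python
# Figures exactly as quoted in the comment above; not independently verified.
HUMAN_MILES_PER_DEATH = 88_000_000   # 1 death per 88 million miles (US)
AUTOPILOT_DEATHS = 5
AUTOPILOT_MILES = 3_000_000_000      # 5 deaths per 3 billion miles

BILLION = 1_000_000_000


def deaths_per_billion_miles(deaths, miles):
    """Normalize a raw (deaths, miles) pair to deaths per billion miles."""
    return deaths * BILLION / miles


human_rate = deaths_per_billion_miles(1, HUMAN_MILES_PER_DEATH)
autopilot_rate = deaths_per_billion_miles(AUTOPILOT_DEATHS, AUTOPILOT_MILES)
autopilot_miles_per_death = AUTOPILOT_MILES / AUTOPILOT_DEATHS

print(f"human:     {human_rate:.2f} deaths per billion miles")
print(f"autopilot: {autopilot_rate:.2f} deaths per billion miles")
print(f"autopilot: 1 death per {autopilot_miles_per_death / 1e6:.0f} million miles")
```

On the quoted numbers alone, that works out to 1 death per 600 million miles for Autopilot.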

Tesla autopilot is just a very fancy form of cruise control. Without corrections by human drivers it will happily run into stationary obstacles.
Autopilot isn't used in the same conditions.
Yeah. Totally makes sense to compare a human driver driving an average 7 year old 30k$ car with not so good safety ratings driven by average person in snowy, rainy pothole ridden roads to a newish 70k$ luxury car with good airbags/crumple zones driven in mostly sunny California roads by mostly young drivers with a driver assistance system.

And then for you to argue that the driver assistance system is actually better than a human driver if given the car alone!

Wow

What humans are good at that machines can't do yet is bullshit justifications for screwing up.
North America. Where are stop signs not hexagonal?
Oh, I checked, they're octagonal xD
When obscured by overgrowth.
Serious (and likely ignorant) question - what does linearity have to do with anything here? linear over what and why does non-linearity make something 'unpredictable'?
Linear models have more bias, so they represent current data less well and are more predictive of future, unseen data (think of a straight line through a point cloud).

Non-linear models have more variance so they represent current data better and are less predictive of future, unseen data (think of a line snaking around a point cloud).

An added complication is that deep neural net models are, in practice, vectors (or, well, tensors) of numbers so they are difficult to interpret. This and their extreme variance make it hard to know how they will behave in the future.

The bias/variance trade-off is not really related to extrapolation. Think of a point cloud following a quadratic shape. A linear model will extrapolate terribly.
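
A minimal sketch of that quadratic example, assuming noise-free samples of y = x^2 on x = 0..5 and an ordinary least-squares line (the data, the helper names and the test point x = 20 are all illustrative choices, not from the thread):

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Training data: the quadratic y = x^2 sampled on x = 0..5, no noise.
train_x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
train_y = [x ** 2 for x in train_x]

slope, intercept = fit_line(train_x, train_y)


def predict(x):
    return slope * x + intercept


# Worst error inside the training range versus the error far outside it.
max_train_error = max(abs(predict(x) - y) for x, y in zip(train_x, train_y))
extrapolation_error = abs(predict(20.0) - 20.0 ** 2)

print(f"fitted line: y = {slope:.2f} * x + {intercept:.2f}")
print(f"max error on training points: {max_train_error:.2f}")
print(f"error at x = 20:              {extrapolation_error:.2f}")
```

The line is a tolerable summary inside the sampled range and badly wrong outside it, which is the point being made: a rigid model does not extrapolate well when its shape does not match the data.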
Well, "more predictive" doesn't mean it's a perfect fit. Every model has error. A line through a point cloud curving upwards will still represent some of the points in the cloud. So it will have high error, but it's still a representation of the data.

And yes, the bias-variance tradeoff is about generalisation (i.e. the ability to extrapolate to unseen data). But this is more related to the fact that in the real world, problem spaces don't have nice, friendly, regular shapes nor do their shapes stay put after we've trained a model.

My understanding is that generally, the error when extrapolating to areas not covered by the training data distribution would be considered to be part of the "bias" part of the bias-variance tradeoff.

The way I see it, the variance is the part of the error that you can reduce by collecting more data from your distribution and increasing model complexity if needed.

The bias part is what will not get better no matter how much you sample your distribution, and extrapolation problems fall into that category.

>> The way I see it, the variance is the part of the error that you can reduce by collecting more data from your distribution and increasing model complexity if needed.

Ah, apologies, I see what you mean. That is true, but this "error" is in-sample error, so increasing your model's variance will increase its ability to interpolate but not extrapolate to out-of-sample data, as I explain in my longer comment.

"In-sample" means all the data you've collected to train and test with. It includes training/validation/test splits. At the end of k-fold cross-validation, your model has "seen" all the data in your sample and the model that performs best is the model that best represents that data.

But, because the data was sampled from a distribution that is most likely not the true distribution of the data (since that distribution is unknown), the sampling error (i.e. the differences between the true and sample distributions) will be reflected in the model. A high-variance model will suffer more from this than a high-bias one.

Sorry I didn't understand immediately what you meant. The longer comment above is correct but probably doesn't help answer your question directly.

Bias and variance are characteristics of the model, not components of its error as I think you're saying. In the most simple sense, bias and variance refer to the shape of the function represented by the model (let's say "the shape of the model" for simplicity). A model with a more "rigid" shape (approaching a straight line) has more bias and one with a more "relaxed" shape (further from a straight line) has more variance.

The extent to which a model can extrapolate to out-of-sample data depends on how well the shape of the model follows the true distribution of the data. This is true regardless of the bias and variance of the model. It just happens that most of the time, in interesting, real-world problems, the true distribution of the data is more or less different than the sampling distribution of the training data- i.e. there's always some amount of "sampling error".

Sampling error can't be reduced by collecting more training data- you just have more data with the same sampling error. Increasing model complexity increases variance, so if you start with high sampling error, you will get a high error on out-of-sample data because your model matches the "off" distribution of the training data too closely. What training with more data and with a more complex model can do is increase the ability of the trained model to interpolate, i.e. to accurately represent (new) data points that are in the same region of "instance space" as the training data points.

A high-bias model can extrapolate well if the sampling error is not too high and the shape of the true distribution is not too irregular. However, a high-bias model will also not interpolate as well as a high-variance model. Its rigid structure will "miss" many data points. Like you say, this will not change if you train with more data. Anyway, that's the tradeoff.

Now, the reason why deep neural nets, which are extremely high-variance models, are trained with large amounts of data, is that they can interpolate very well but can't extrapolate very well. If a model doesn't extrapolate very well but its training sample is a large enough chunk of instance space, it can still be very useful, because it's still representing a large number of instances.

How to put it? Maybe your high-variance model has seen examples of white dogs and black dogs in training, but no green dogs. Your model will not be able to generalise to green dogs, but if green dogs are rare, it will still be able to represent most dogs, so it's still useful.

Of course, looking at the output of a trained model (its behaviour) doesn't tell you anything about what it was trained on. So a model that has very high accuracy on a large number of tasks will look impressive, even if it can't generalise at all.

I'm not good at math, but I'm confused by the association of AI with non-linear stuff, setting aside the association of non-linear with "bad". I thought ML involved linear algebra or something (says xkcd!) which would presumably be...linear?
The inner activation function (AF) of neurons is inherently nonlinear; it has to be in order to solve any problem that is not linearly separable (which is basically all of the interesting problems). Often the AF nonlinearity shows up as a thresholding operation following a linear weighted sum, but that's not the only mechanism.

And yet neurons are not "pure" binary thresholders the way logic gates are because you can't take the derivative of a binary function, and you can only do backpropagation on differentiable functions. The compromise neurons make is a "smoothed threshold" or sigmoidal curve which is differentiable but still very nonlinear.
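
A small sketch of the difference, using a hard step and the logistic sigmoid as the "smoothed threshold" (function names are illustrative). Strictly speaking the step does have a derivative away from the jump, but it is zero there, so it carries no gradient signal for backpropagation:

```python
import math


def step(x):
    """Hard threshold: 0 below zero, 1 at or above zero."""
    return 1.0 if x >= 0 else 0.0


def sigmoid(x):
    """Logistic function, a smooth approximation of the step."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_derivative(x):
    """Closed-form derivative of the logistic function: s(x) * (1 - s(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)


def numerical_derivative(f, x, h=1e-6):
    """Central finite difference, used here only for illustration."""
    return (f(x + h) - f(x - h)) / (2 * h)


for x in (-2.0, 0.5, 2.0):
    print(
        f"x={x:+.1f}  step'={numerical_derivative(step, x):.1f}  "
        f"sigmoid'={sigmoid_derivative(x):.4f}"
    )
```

The step's slope is flat at every point sampled here, while the sigmoid's slope is non-zero everywhere, which is what lets gradient-based training adjust the weights.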

I'm not sure where the "linear" in "linear algebra" comes from. You hear about linear algebra in relation with machine learning a lot because training a neural net (with the backpropagation algorithm and friends) requires some matrix arithmetic. Inputs to neural nets are vectors or matrices, their weights are (arrayed in) vectors or matrices, their outputs are - well, usually scalars but can also be vectors or matrices.

Also, the use of linear/ nonlinear in machine learning is a bit misleading. A "line" is not necessarily a "straight line", but usually when we say "linear" we mean "straight" and so when we want to say "not straight" we use "nonlinear".

In any case, when we say "line" in machine learning we mean a function, the function of a line. So a "nonlinear" function is a function that curves and turns, e.g. a sigmoid, whereas a "linear" function is straight as a rod.

Why a line? Classifiers, er, classify by drawing a line through space. "Space" means a Cartesian space where our training examples are represented as points (hence, "data points"). Data points are located in Cartesian space according to coordinates that represent their attributes, or features (these coordinates are the "feature vectors" that are input to neural nets). We classify data points by drawing a line between those that belong to one class and those that belong to other classes. More to the point, when we train a classifier, we find the parameters of a function of a line that separates the points of separate classes and when we want to classify a new point, we look at where it falls with relation to that line.

So that's where all that stuff about lines and "linear" and "nonlinear" models comes from. A "linear model" or "linear classifier" can only draw straight lines. A "nonlinear model" can go twirling around madly.

Finally, "non-linear" doesn't mean "bad". There are tradeoffs- in particular, the "bias variance tradeoff" that I hint at in my earlier comment. A linear model is more limited in what it can represent, but a nonlinear model is less likely to represent data that it hasn't seen in training.

- "linear" in "linear algebra" comes from "system of linear equations"

- NNs can absolutely represent non-linear functions, and they are based on solving systems of linear equations.

- The non-linear function here has nothing to do with the linearity of the system of linear equations used to construct it.

- The two main sources of non-linearity are (a) the inputs (e.g., an image, or a series of images varying in a non-linear fashion), and (b) the activation functions.
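
To illustrate point (b), here is a rough sketch with two tiny weight matrices (the values of `W1` and `W2` are arbitrary, and bias terms are left out). Without an activation function, two stacked layers collapse into a single matrix and the network stays linear; inserting a ReLU between them breaks additivity:

```python
def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def matmul(a, b):
    """Multiply matrix a by matrix b (both lists of rows)."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def relu(v):
    """Element-wise rectified linear unit."""
    return [max(0.0, x) for x in v]


def add(u, v):
    return [a + b for a, b in zip(u, v)]


# Arbitrary illustrative weights for a 2 -> 2 -> 2 network, no biases.
W1 = [[1.0, -2.0], [0.5, 1.0]]
W2 = [[2.0, 1.0], [-1.0, 3.0]]


def linear_net(x):
    """Two layers with no activation in between."""
    return matvec(W2, matvec(W1, x))


def relu_net(x):
    """The same two layers with a ReLU between them."""
    return matvec(W2, relu(matvec(W1, x)))


collapsed = matmul(W2, W1)  # a single matrix equivalent to linear_net

x = [1.0, 2.0]
y = [-3.0, 0.5]
x_plus_y = add(x, y)

# The purely linear stack equals one matrix and is additive.
print(linear_net(x) == matvec(collapsed, x))
print(linear_net(x_plus_y) == add(linear_net(x), linear_net(y)))
# With the ReLU in place, f(x + y) no longer equals f(x) + f(y).
print(relu_net(x_plus_y) == add(relu_net(x), relu_net(y)))
```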

The underlying derivatives are linear (like all derivatives) but neural networks' ability to approximate arbitrary non linear functions is one of their biggest strengths.
Yes, so I'm left wondering, when making the association of the math to the badness, how do you decide if the linearity or the non-linearity is the salient part?
Mathematically, you can think of "linear" AI problems as "easy to solve", and non-linear as "difficult". That's part of what the parent means.

Some function being linear means it's easier to guess. If a real world phenomenon is tied to a linear function, then it's easy for AI to guess/approximate.

If you have ever opened up Excel or a similar program, one of the more useful options is to generate a regression line-fit on your data points.

One option is to specify a polynomial function; you can specify how many coefficients you want. One of the measurements is the mean squared error between the line-fit and the points.

You can add as many polynomial coefficients as you want, and you will be able to decrease the mean squared error. But the more polynomial terms you add, the more two things will be true:

1. The line-fit will be far more likely to go through the points.

2. At points on the line where there was no data, the line will approximate the underlying physical reality less well.

That same mathematical property is what is relevant here. There is nothing inherently evil about non-linearity, when the non-linear math model properly maps to the physical reality. But when you overfit a line, many of the functional solutions may be completely wrong.

I'm confused. I agree that overfitting can lead to very bad models.

But what I don't understand is that I thought "linear" in ML contexts was normally used in the sense of 'linear transformations', which is a sense in which the 'line-fit' from Excel isn't linear -- it's affine.
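
A quick sketch of the distinction, with made-up coefficients: a linear map in the linear-algebra sense satisfies f(x + y) = f(x) + f(y) and f(c * x) = c * f(x), while a fitted line with a non-zero intercept satisfies neither.

```python
def linear(x, a=3.0):
    """A linear map in the linear-algebra sense: no intercept."""
    return a * x


def affine(x, a=3.0, b=2.0):
    """A straight-line fit with a non-zero intercept."""
    return a * x + b


def is_additive(f, x, y):
    return f(x + y) == f(x) + f(y)


def is_homogeneous(f, x, c):
    return f(c * x) == c * f(x)


for name, f in (("linear", linear), ("affine", affine)):
    print(name, is_additive(f, 1.0, 2.0), is_homogeneous(f, 1.0, 4.0))
```

Both graphs are straight lines, which is why the everyday use of "linear" covers both.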

Is a linear model with thousands/millions of weights/parameters (like deep learning models) really substantially simpler to understand? Can it do anything useful?

[1]: https://en.wikipedia.org/wiki/Linear_map

I suppose from the perspective of someone implementing these models, yeah - it is linear, but it is not bijective. In a system with only one layer, that manifests as an alias (assuming the output dimensions are smaller). In a system with multiple layers of either `N->M` or `M->N`, those aliases tend to manifest as apparent "non-linearities".

So, I guess looking from the bottom up the system may look non-continuous and linear. But if you look from the top down, it would look continuous and non-linear.

Really, I am not sure which one is "true".

I assume they are using non-linear to mean non-continuous, which implies that there can be large, hard-to-understand changes in behavior when the input is changed only a small amount.
Polynomials with large degrees are continuous. It's just that they can still change by a large amount (i.e. having a large derivative) when the input is changed by a small amount.

I invite you to construct the Lagrange polynomial (i.e. interpolating polynomial) for points on a nice, simple curve with some noise. It will, by definition, pass through every given point, and yet it will likely behave very badly outside the range of those points.
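
For anyone who wants to try it, here is a minimal sketch. The "nice, simple curve" is assumed to be the line y = 2x + 1 sampled at x = 0..9, and the noise is a fixed alternating offset of 0.3 so the example is deterministic; that pattern is close to a worst case, and random noise would usually give a smaller (but still large) error.

```python
def lagrange_interpolate(xs, ys, x):
    """Evaluate the Lagrange interpolating polynomial through (xs, ys) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        basis = 1.0
        for j, xj in enumerate(xs):
            if j != i:
                basis *= (x - xj) / (xi - xj)
        total += yi * basis
    return total


def true_curve(x):
    """Assumed underlying curve for this illustration."""
    return 2.0 * x + 1.0


# Ten samples with a small, fixed alternating offset standing in for noise.
xs = [float(i) for i in range(10)]
ys = [true_curve(x) + (0.3 if i % 2 == 0 else -0.3) for i, x in enumerate(xs)]

# Error at the sample points: the polynomial passes through every one.
fit_errors = [abs(lagrange_interpolate(xs, ys, x) - y) for x, y in zip(xs, ys)]

# Error a short distance outside the sampled range.
outside = 12.0
predicted = lagrange_interpolate(xs, ys, outside)
extrapolation_error = abs(predicted - true_curve(outside))

print(f"max error at sample points: {max(fit_errors):.3g}")
print(f"true value at x = {outside}: {true_curve(outside):.1f}")
print(f"polynomial at x = {outside}: {predicted:.1f}")
```

A degree-9 polynomial fits all ten points exactly, yet just past the last sample it is off by orders of magnitude more than the noise that was added.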

There is nothing wrong with using a non-linear model, though; x^2 or x^3 regressions make sense on many datasets.

Non-continuous is also not the perfect terminology, but I argue that it is more precise than non-linear: the chief idea being that the model "changes unpredictably."

Sure you can argue things however you want, if you also decide to ignore hundreds of years of mathematical terminology.
How have machine learning algorithms negatively affected financial markets in the last 6 months? Markets have been volatile because information about the real world has been volatile. I don't think markets in an earlier era would have handled a global pandemic any more robustly than they did in 2020.
But it has also generated prodigious amounts of erotic fiction so that balances out some of those points, right?
I honestly don’t know if this is a joke or if there is a bunch of erotic fiction I’ve been missing
Deepfakes (faceswap but for porn). Decensored hentai. And of course the question came up again: how ethical are generated pictures depicting illegal content?
I was referring to AI Dungeon, actually, and yes it was a joke. But 100% true.
Intelligence is the amalgamation of many smaller problems working together and building on top of each other.

* Facial recognition/detection

* Facial synthesis (deepfakes)

* Speech synthesis, including mimicry

* Speech recognition

* Natural language processing

* Gait/walking algorithms

* Motion planning

* etc.

Complexity arises from simple units working together in parallel. We're working on the smaller, specialized problems that will, in the next generation, be put together to build more complex and complete systems.

I'm no fan of the 'black box' nature of neural networks but it's clear they're getting results. As they become more accessible to the lay person, we'll see a profusion of use cases that are both anticipated and surprising.

I'm always flabbergasted by the doom prediction. The path we're on seems apparent.

I agree with the notion that artificial intelligence is a graph of smaller problems, as is human perception.

The problem is a question of informational density. Biological systems are computationally very dense. Far more dense than the 4nm transistor fabrication available today, and with a far larger volume.

Consequently, the computational capability of most AI systems is far lower than its biological equivalent. And as you find in most information finite discretization problems - the lower density information system will alias against the higher information system.

So, that means you will have a hierarchy/pipeline of computational stages - each aliasing reality. Eventually, you will find that your parameterization of each perceptual stage has a strange property. The size of each subsequent layer is important... but the relative computational space of each subsequent stage is even more important. Because mismatched stages result in nothing but numerical interference and noise.

And I think that is where we are today. The IQ of a krill shrimp.

Isn't "the amalgamation of many smaller problems working together and building on top of each other" a fair description of the theoretical Unix system?

Aren't your criteria for "intelligence" human-centric, implying that there is no other form of "intelligence"?

Aren't your criteria of the "black box" type, given that AFAIK no human can really completely explain how he recognizes faces/does NLP/walks/...?

> Isn't "the amalgamation of many smaller problems working together and building on top of each other" a fair description of the theoretical Unix system?

Yes. Note the success of Unix and the ability to scale, do work and provide an environment to be productive in.

> Aren't your criteria for "intelligence" human-centric, implying that there is no other form of "intelligence"?

Your use of 'human-centric' is odd. I would have thought the traditional 'human-centric' theory of the mind is something monolithic and indivisible. Suggesting that it's many small processes communicating with each other is basically taken straight out of nature, from ants, schools of fish, birds flocking, etc.

Whether there are other forms of intelligence or not, it's clear that incremental progress in individual processes that can then be composed together is a productive way to traverse the energy landscape. This is why (imo) we see so many symbiotic relationships from cells on up to higher level animals.

> Aren't your criteria of the "black box" type, given that AFAIK no human can really completely explain how he recognizes faces/does NLP/walks/...?

I'm not quite sure what your point is here. If you're critiquing me about neural networks being black boxes and not giving us real insight into the underlying system, that's fair and the reason why I said I didn't like the black box aspect of neural networks. I will say that if there is a black box model that can be easily manipulated, this will probably lead to deeper models much quicker.

If you're saying that human cognition is not describable by any human and, I guess, implying that it's indescribable, I would point out that one doesn't follow from the other. Not having a good model right now doesn't imply we won't understand it at some future date and, in my opinion, this is precisely what's happening. Having no human be able to describe the underlying computation (of face recognition, nlp, walking etc.) doesn't mean it's indescribable, it means it's not describable by anyone right now.

At one point we didn't know how birds flew. We still might not know, to your satisfaction, but we have a basic understanding of how to make things fly, both in practical and theoretical terms. Planes fly and we understand how even though they don't flap their wings. I have no doubt we'll figure out how to do complex human-level computation even if we don't have a deep model of the specifics of human thought.

>> Aren't your criteria for "intelligence" human-centric, implying that there is no other form of "intelligence"?

> Your use of 'human-centric' is odd. I would have thought the traditional 'human-centric' theory of the mind is something monolithic and indivisible.

Sorry, my answer wasn't clear. It was not tied to the "small processes communicating..." approach but to your very list of "problems" (facial recognition/detection, facial synthesis (deepfakes), speech synthesis, speech recognition..) which seems to me expressed in a way tied to human activities, while at least part of the "intelligence" underlying many of them may also exist in other forms of life (other mammals, birds, fishes...).

> Suggesting that it's many small processes communicating with each other is basically taken straight out of nature, from ants, schools of fish, birds flocking, etc.

Exactly. My point is that analyzing the ways "the smaller, specialized problems" are tackled by non-human living beings seems pertinent as self-analysis (as humans analyzing human intelligence) is difficult, and as various species may apply various solutions, some more easy to grok. Focusing on "problems" too specific to the human being may be a sort of "framing" detrimental to the quest. Moreover the famous Dijkstra quote ("The question of whether a computer can think is no more interesting than the question of whether a submarine can swim.") may be pertinent.

> incremental progress in individual processes that can then be composed together is a productive way to traverse the energy landscape. This is why (imo) we see so many symbiotic relationships from cells on up to higher level animals.

I agree. My point is about _how_ we consider the system(s) (our "point of view"): framing it to human characteristics, globally or locally (dualism)... It seems to me that the very organization of a system may be neglected when we consider it a stack of "small processes communicating...". Pirsig's "Metaphysics of Quality" may be pertinent.

>> Aren't your criteria of the "black box" type, given that AFAIK no human can really completely explain how he recognizes faces/does NLP/walks/...?

> I'm not quite sure what your point is here. If you're critiquing me about neural networks being black boxes and not giving us real insight into the underlying system, that's fair and the reason why I said I didn't like the black box aspect of neural networks.

This was my point and I agree with you.

> if there is a black box model that can be easily manipulated, this will probably lead to deeper models much quicker.

I'm less optimistic, as it is only 'probable', and AFAIK won't give us more real insight into the underlying system.

> Having no human be able to describe the underlying computation (of face recognition, nlp, walking etc.) doesn't mean it's indescribable, it means it's not describable by anyone right now.

> I have no doubt we'll figure out how to do complex human-level computation even if we don't have a deep model of the specifics of human thought.

I agree, we will enhance ways to "approximate" (tricks leading us to a solution to each "local" problem) up to the point of being able to solve real-world problems. However it may reach some hard limit (as far as I understand this is the point of the article), and using a powerful tool/method insufficiently understood may be dangerous.

You're right: a human driver will stop even at a hexagonal stop sign, even though most are octagonal. Much safer behavior!8-))
> complicated when you get to the long tail of it

Well, there's the problem right there.

Watching this presentation did not make me any more confident about going in a self-driving car.

Based on the way it was presented, I got the feeling that they are just essentially manually identifying cases and addressing them as they see them. Is that solution really helping to make the system more robust when encountering an unexpected situation?

I'd argue that the main driver of volatility over the last few months was the Coronavirus, and not AI...
I’m curious if you have a source that ML has increased financial markets volatility.
So Skynet will be insane?
Most definitely insane and probably well dressed.
This is so strange. If you use facebook, google, netflix, apple, microsoft, amazon or a whole host of other services you are interfacing with AI all the time. To think there’s no value there is asinine. Comes up a lot on HN. Seems like people set in their ways who don’t want to progress forward.
oh yeah, magnificent AI at Google search. Picked up my ebook-reader again, wanted to know about the state of linux there. so do a search: "<model of ebookreader> linux ssh" (since a good shell is the point where you can start developing). Turns out, the first 3 pages want to sell me the same thing I already own, with one outlier selling nutritional supplements. Oh well done AI!
It’s doing exactly as it’s trained. Nudge the useds to buy more trinkets.

Now just imagine how good it could be if it was being trained to actually give good search results instead of selling.

Unfortunately, Google Search (and Amazon and MS search and all three companies' assistants) doesn't work that well when it comes to selling you what you want to buy either:

https://github.com/elsamuko/Shirt-without-Stripes

But it sure can identify stripes.

idk, the thing Google has been doing lately where they suddenly render a block of ads under your mouse as you're clicking on a result seems like the kind of thing an AI would do to increase ad clicks
There's a nearly unimaginable amount of money and computing power going to minimize the net value of the service + ads. If you get significant value, it's going to be optimized out in a few minutes. The ads on Youtube recently ramped up to where even something short is unwatchable for me, and it doesn't work at all with an adblocker.
Isn't copy and pasting within a thread generally discouraged?