| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by george3d6 2315 days ago

That is untrue,

Here's a code example (actually took me ~20 minutes to get it "right" so I'll admit it's not the most trivial problem)... it includes seeds so that you can replicate locally (it should hit 100% accuracy all the time on the 1200 examples testing set reliably by about epoch 700):

'''

import torch import random from sklearn.metrics import accuracy_score

random.seed(61) torch.manual_seed(61)

X = [[random.random() for x in range(2)] for x in range(2000)] X_train = torch.FloatTensor(X[0:800]).cuda() X_test = torch.FloatTensor(X[800:]).cuda() X = torch.FloatTensor(X)

Y = [] for x in X: y = [0] * len(x) y[torch.argmax(x)] = 1 Y.append(y)

Y_train = torch.FloatTensor(Y[0:800]).cuda() Y_test = torch.FloatTensor(Y[800:]).cuda()

shape = [2,2] layers = [] for ind in range(len(shape) - 1): layers.append(torch.nn.Linear(shape[ind],shape[ind+1],bias=False))

net = torch.nn.Sequential(layers).cuda()

optim = torch.optim.SGD(net.parameters(), lr=1) criterion = torch.torch.nn.CrossEntropyLoss()

dataset = torch.utils.data.TensorDataset(X_train, Y_train) dataloader = torch.utils.data.DataLoader(dataset, shuffle=True, batch_size=10)

for i in range(pow(10,6)): for X,Y in dataloader: Yp = net(X) loss = criterion(Yp, Y.max(1).indices) loss.backward() optim.step() optim.zero_grad()

    if i > 500:
        optim = torch.optim.SGD(net.parameters(), lr=0.002)

    if i % 20 == 0:
        Yp = net(X_test)
        print('Training loss: ', loss.item())
        print(f'\nAccuracy score for epoch {i}:')
        print(accuracy_score(Yp.max(1).indices.tolist(),Y_test.max(1).indices.tolist()))

'''
This is as basic as you can get, predict the max out of 2 numbers, only uses a total of 4 node:
2 inputs (the 2 number) -> 2 outputs (the index of the maximum numbers). just 2 weight being optimize, no biases no nothing, as simple an implementation as you can get in terms of size.
There's also way to do it (apparently) where instead of treating it as "find the max index" you treat it as "output the maximum number": https://www.quora.com/Can-deep-neural-networks-learn-the-min...
But the approach I have will generalize to e.g. "Find the max of 5 or 100 or 1000 numbers" (although I assume it might take some time)
And overall you have no guarantee, that's why I qualified the statement and didn't say "Literally any imaginable problem that a human can solve without context".

To some extent it also matter how you encoder your number, you can train a 10000000000 parameter FCNN with RELU activations until the end of time to learn a simple mutliplication, and it won't be able to do so if you don't log encode your numbers or use some encoding or activation that means `` can be transposed in the `+` operations being done inside the nodes to combine the outputs... because that's outside of the scope of mathematics that given netwrok can do.

But, unless you are specifically trying to come up with an edge case and are instead looking at real world problems and trying to design the network in such a way as to best handle them (and this doesn't have to be all manual, you can use various NAS techniques), the rule will hold most of the time I believe.

1 comments

manthideaal 2315 days ago

You are right. Also the Universal Approximation theorem (1) for neural networks guarantees that neural networks can approximate continuous function on compact subsets of R^n, in this case max(x,y).

argmax([x,y]) = (sign(x[0]-x[1])+1)/2

Going beyond continuous functions, can deep learning be used for primality test?

(1) https://en.wikipedia.org/wiki/Universal_approximation_theore...

george3d6 2315 days ago

https://escholarship.org/content/qt5sg7n4ww/qt5sg7n4ww.pdf

> A long-standing difficulty for connectionism has been to implement compositionality, the idea of building a knowledge representation out of components such that the meaning arises from the meanings of the individual components and how they are combined. Here we show how a neural-learning algorithm, knowledge-based cascade-correlation (KBCC), creates a compositional representation of the prime-number concept and uses this representation to decide whether its input n is a prime number or not. KBCC conformed to a basic prime-number testing algorithm by recruiting source networks representing division by prime numbers in order from smallest to largest prime divisor up to √n. KBCC learned how to test prime numbers faster and generalized better to untrained numbers than did similar knowledge-free neural learners. The results demonstrate that neural networks can learn to perform in a compositional manner and underscore the importance of basing learning on existing knowledge.

But again, I think things such as prime number tests are the exact kind of edge cases where one needs too many heuristics built into the model for it to be practical to use.

But I think something like a prime test is not included under the definition I gave anyway, because the idea of "prime" actually implies a lot of context.

You can take a baby and he will be able to classify images, you can take a human that speaks a language with no concept of numbers and he will be able to play or sing music and distinguish patterns in it.

You can't talk about "prime" without a mathematical apparatus that takes years for humans to understand. However, since we learn it as such an early age, it ends up in the background.

Granted, that could be said about almost any cognitive ability (the fact that there's a lot of "subconscious context" required to use it).... so I don't know.