That is untrue. Here's a code example (it actually took me ~20 minutes to get it "right", so I'll admit it's not the most trivial problem). It includes seeds so that you can replicate it locally; it should reliably hit 100% accuracy on the 1200-example test set by about epoch 700:

```python
import random

import torch
from sklearn.metrics import accuracy_score

random.seed(61)
torch.manual_seed(61)

# The original called .cuda() unconditionally; fall back to the CPU if no GPU is present
device = 'cuda' if torch.cuda.is_available() else 'cpu'

# 2000 random pairs in [0, 1): the first 800 for training, the remaining 1200 for testing
X = [[random.random() for _ in range(2)] for _ in range(2000)]
X_train = torch.FloatTensor(X[0:800]).to(device)
X_test = torch.FloatTensor(X[800:]).to(device)
X = torch.FloatTensor(X)

# One-hot targets marking the position of the larger number
Y = []
for x in X:
    y = [0] * len(x)
    y[torch.argmax(x).item()] = 1
    Y.append(y)
Y_train = torch.FloatTensor(Y[0:800]).to(device)
Y_test = torch.FloatTensor(Y[800:]).to(device)

# A single bias-free linear layer: 2 inputs -> 2 outputs, i.e. 4 weights in total
shape = [2, 2]
layers = []
for ind in range(len(shape) - 1):
    layers.append(torch.nn.Linear(shape[ind], shape[ind + 1], bias=False))
net = torch.nn.Sequential(*layers).to(device)  # Sequential takes modules as separate arguments

optim = torch.optim.SGD(net.parameters(), lr=1)
criterion = torch.nn.CrossEntropyLoss()

dataset = torch.utils.data.TensorDataset(X_train, Y_train)
dataloader = torch.utils.data.DataLoader(dataset, shuffle=True, batch_size=10)

for i in range(pow(10, 6)):
    for xb, yb in dataloader:
        Yp = net(xb)
        # CrossEntropyLoss wants class indices, so turn the one-hot targets into indices
        loss = criterion(Yp, yb.max(1).indices)
        loss.backward()
        optim.step()
        optim.zero_grad()
    # After epoch 500, switch to a much smaller learning rate
    # (re-created every epoch, which is harmless because plain SGD keeps no state)
    if i > 500:
        optim = torch.optim.SGD(net.parameters(), lr=0.002)
    if i % 20 == 0:
        with torch.no_grad():
            Yp = net(X_test)
        print('Training loss: ', loss.item())
        print(f'\nAccuracy score for epoch {i}:')
        print(accuracy_score(Y_test.max(1).indices.tolist(), Yp.max(1).indices.tolist()))
```

This is as basic as you can get: predict the max out of 2 numbers. It only uses a total of 4 nodes: 2 inputs (the 2 numbers) -> 2 outputs (the index of the maximum number). Just 4 weights are being optimized, no biases, no nothing; it's as simple an implementation as you can get in terms of size.

There's also (apparently) a way to do it where, instead of treating it as "find the max index", you treat it as "output the maximum number": https://www.quora.com/Can-deep-neural-networks-learn-the-min... But the approach I have will generalize to e.g. "find the max of 5 or 100 or 1000 numbers" (although I assume it might take some time).

And overall you have no guarantee; that's why I qualified the statement and didn't say "literally any imaginable problem that a human can solve without context". To some extent it also matters how you encode your numbers: you can train a 10000000000-parameter FCNN with ReLU activations until the end of time to learn a simple multiplication, and it won't be able to do so if you don't log-encode your numbers or use some encoding or activation that lets `*` be transposed into the `+` operations being done inside the nodes to combine the outputs... because that's outside the scope of the mathematics that a given network can do.

But unless you are specifically trying to come up with an edge case, and are instead looking at real-world problems and trying to design the network to best handle them (and this doesn't have to be all manual; you can use various NAS techniques), I believe the rule will hold most of the time.
argmax([x[0], x[1]]) = (sign(x[1] - x[0]) + 1)/2
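A quick numerical check of the two-number closed form, as a sketch in plain Python (the helper names `sign`, `argmax2_closed_form` and `argmax2` are my own, hypothetical). The order inside sign() matters: it has to be sign(x[1] - x[0]) to give the index of the maximum, while swapping the operands gives the index of the minimum. Note also that x[1] - x[0] is just a bias-free linear node with weights [-1, 1], so the formula amounts to one neuron with a step-like activation. Ties give 0.5, where argmax is not uniquely defined, so the check skips them.

```python
import random


def sign(v):
    """Return 1, -1 or 0 depending on the sign of v."""
    return (v > 0) - (v < 0)


def argmax2_closed_form(x):
    """Index of the larger of two numbers via (sign(x[1] - x[0]) + 1) / 2."""
    return (sign(x[1] - x[0]) + 1) / 2


def argmax2(x):
    """Reference implementation: position of the larger element."""
    return 0 if x[0] >= x[1] else 1


random.seed(0)
pairs = [[random.random(), random.random()] for _ in range(10_000)]
# Compare both implementations on every non-tied pair
mismatches = sum(argmax2_closed_form(p) != argmax2(p) for p in pairs if p[0] != p[1])
print("mismatches:", mismatches)  # → mismatches: 0
```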
Going beyond continuous functions, can deep learning be used for primality testing?
(1) https://en.wikipedia.org/wiki/Universal_approximation_theore...