by dartos 900 days ago
Transformers are a kind of neural network architecture.

It’s mainly fancy math. With tools like PyTorch or TensorFlow, you use Python to describe a graph of computations, which gets compiled down into optimized instructions.
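
To make "a graph of computations" concrete, here is a minimal sketch in plain Python. It is a toy and not how PyTorch or TensorFlow are implemented: the `Node` class and its fields are made up for illustration, and real frameworks add tensors, compilation, and GPU kernels on top of the same basic idea.

```python
# Toy illustration only: NOT PyTorch or TensorFlow, just the same idea in miniature.
class Node:
    """One value in the computation graph, remembering how it was produced."""

    def __init__(self, value, parents=(), local_grads=()):
        self.value = value
        self.parents = parents          # nodes this one was computed from
        self.local_grads = local_grads  # d(self)/d(parent), one per parent
        self.grad = 0.0

    def __add__(self, other):
        return Node(self.value + other.value, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        return Node(self.value * other.value, (self, other),
                    (other.value, self.value))

    def backward(self):
        # Collect nodes so that parents always come before their children.
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node.parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        # Walk from the output back to the inputs, applying the chain rule.
        for node in reversed(order):
            for parent, local in zip(node.parents, node.local_grads):
                parent.grad += local * node.grad


x = Node(3.0)
w = Node(2.0)
b = Node(1.0)
y = w * x + b   # writing the expression builds the graph (w * x) + b
y.backward()    # fills in dy/dw, dy/dx, dy/db
print(y.value, w.grad, x.grad, b.grad)
```

Writing `w * x + b` does the arithmetic and also records how the result was produced, which is what lets `backward()` compute the gradients afterwards.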

There are some examples of people making transformers and other NN architectures in about 100 lines of code. I’d google for those to see what these things look like in code.
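
For a rough idea of the scale involved, here is a sketch of single-head scaled dot-product self-attention, the core operation in a transformer, using only the standard library. It is not a full transformer (no multiple heads, masking, layer norm, or feed-forward layers), and the helper names and toy inputs are made up for illustration.

```python
import math


def softmax(xs):
    """Turn a list of scores into positive weights that sum to 1."""
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over token vectors x."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(k[0])
    k_t = [list(col) for col in zip(*k)]   # transpose of k
    scores = matmul(q, k_t)                # how much each token matches each other token
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v), weights     # weighted mix of the value vectors


# Hypothetical toy input: 3 tokens, each a 2-dimensional vector.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]       # stand-in for learned projection weights
out, weights = self_attention(x, identity, identity, identity)
print(out)
```

In a real model the three projection matrices are learned during training rather than fixed to the identity.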

The training loop, data, and resulting weights are where the magic is.
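
As a minimal sketch of what a training loop looks like, here is gradient descent fitting a straight line in plain Python. The data, learning rate, and step count are made-up toy values; real training uses far larger models and datasets, but the loop has the same shape.

```python
# Hypothetical toy data: points on the line y = 2x + 1.
data = [(x, 2.0 * x + 1.0) for x in [0.0, 1.0, 2.0, 3.0, 4.0]]

w, b = 0.0, 0.0        # the "weights" that training will adjust
learning_rate = 0.05   # assumed value, small enough to be stable on this data

for step in range(2000):
    grad_w = grad_b = 0.0
    for x, y in data:
        error = (w * x + b) - y                  # prediction minus target
        grad_w += 2 * error * x / len(data)      # d(mean squared error)/dw
        grad_b += 2 * error / len(data)          # d(mean squared error)/db
    # Nudge each weight a little in the direction that reduces the error.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(w, b)  # should end up close to 2 and 1
```

The code only says "reduce the error a little, many times"; what the weights end up encoding comes entirely from the data.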

The code is disappointingly simple.


  > The code is disappointingly simple.
I absolutely adore this sentence. It made me laugh to imagine coders or other folks looking at the code and thinking "That's it?!? But that's simple!"

It does feel a little like the basic reactions that make up DNA, though: start with simple units that work together to form something much more complex.

(apologies for poor metaphors, I'm still trying to grasp some of the concepts involved with this)

Yes, neural networks, and even the math required to build them, are generally very simple Calc 1 stuff. It’s coming up with these models that takes powerful intuition.
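
To illustrate the "Calc 1" point, here is a small sketch that differentiates a single sigmoid neuron with the chain rule and checks the result against a finite-difference estimate. The function names and input values are hypothetical, chosen only for this example.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def neuron(w, x, b):
    """One neuron: a weighted input plus bias, passed through a sigmoid."""
    return sigmoid(w * x + b)


def d_neuron_dw(w, x, b):
    # Chain rule: d/dw sigmoid(z) = sigmoid(z) * (1 - sigmoid(z)) * dz/dw,
    # where z = w * x + b and therefore dz/dw = x.
    s = sigmoid(w * x + b)
    return s * (1.0 - s) * x


w, x, b = 0.5, 2.0, -1.0   # hypothetical values, chosen so that z = 0
analytic = d_neuron_dw(w, x, b)

# Numerical check: slope of the function between two nearby points.
eps = 1e-6
numeric = (neuron(w + eps, x, b) - neuron(w - eps, x, b)) / (2 * eps)

print(analytic, numeric)
```

Backpropagation applies this same chain-rule step over and over across the layers of a network.
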
I spent a solid month very confused after reading up on how to implement some basic neural networks.

I was so sure I had missed the complicated bit that I didn’t even try to implement it.

But no, all the complexity is in the mathematical implications.