HN Mirror

Hacker News


	by zaptrem 815 days ago
	Transformer LLMs are mostly a stack of MLPs (linear layers with a nonlinearity in between) where you sometimes multiply and softmax the outputs of those layers together in a funny way (attention). In other words, they're arguably more "vanilla deep net" than most architectures (e.g., conv nets). (There are also positional/token embeddings and normalization layers, but those are usually a small minority of the parameters, especially in large models.)
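To make the "just MLPs plus attention" point concrete, here is a minimal NumPy sketch of one pre-norm transformer block. It assumes a single attention head, ReLU in the MLP, no bias terms and a causal mask, all simplifications; `init_block`, `block` and `param_share` are hypothetical names. Every big weight is a plain matrix, and the only "funny" part is `q @ k.T`, the softmax and the multiply by `v`. `param_share` counts where the parameters go for two illustrative configs, assuming the output head is tied to the token embedding and positions use a learned table.

```python
import numpy as np

rng = np.random.default_rng(0)

def layer_norm(x, gamma, beta, eps=1e-5):
    # Normalize each token vector, then apply a learned per-feature scale and shift.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return gamma * (x - mu) / np.sqrt(var + eps) + beta

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def init_block(d_model, d_ff, scale=0.02):
    # Hypothetical helper: every large weight is a plain linear layer (a matrix).
    def mat(m, n):
        return rng.normal(0.0, scale, (m, n))
    return {
        "Wq": mat(d_model, d_model), "Wk": mat(d_model, d_model),
        "Wv": mat(d_model, d_model), "Wo": mat(d_model, d_model),
        "W1": mat(d_model, d_ff), "W2": mat(d_ff, d_model),
        # Normalization parameters: just two vectors per norm layer.
        "g1": np.ones(d_model), "b1": np.zeros(d_model),
        "g2": np.ones(d_model), "b2": np.zeros(d_model),
    }

def block(x, p):
    # x: (tokens, d_model). Attention = three linear projections, then the
    # "funny" part: multiply queries by keys, softmax, multiply by values.
    h = layer_norm(x, p["g1"], p["b1"])
    q, k, v = h @ p["Wq"], h @ p["Wk"], h @ p["Wv"]
    scores = q @ k.T / np.sqrt(q.shape[-1])
    t = x.shape[0]
    future = np.triu(np.ones((t, t), dtype=bool), k=1)  # causal mask
    scores = np.where(future, -np.inf, scores)
    x = x + softmax(scores) @ v @ p["Wo"]
    # MLP = linear -> ReLU -> linear, applied to each token independently.
    h = layer_norm(x, p["g2"], p["b2"])
    return x + np.maximum(h @ p["W1"], 0.0) @ p["W2"]

def param_share(vocab, d_model, d_ff, n_layers, ctx):
    # Fractions of parameters in linear layers, norms, and embeddings,
    # assuming no biases, a tied output head and a learned positional table.
    linear = n_layers * (4 * d_model * d_model + 2 * d_model * d_ff)
    norm = n_layers * 4 * d_model + 2 * d_model  # two norms per block + final norm
    embed = vocab * d_model + ctx * d_model
    total = linear + norm + embed
    return linear / total, norm / total, embed / total

x = rng.normal(size=(5, 64))
print(block(x, init_block(64, 256)).shape)  # (5, 64)
for cfg in [(50257, 768, 3072, 12, 1024), (50257, 4096, 16384, 32, 2048)]:
    lin, norm, emb = param_share(*cfg)
    print(cfg, f"linear {lin:.1%}, norm {norm:.1%}, embeddings {emb:.1%}")
```

Under these assumptions, a GPT-2-small-sized config (vocab 50257, d_model 768, 12 layers) puts roughly 30% of its parameters in embeddings, while the hypothetical 32-layer, d_model 4096 config is around 3%. So the "small minority" claim holds mainly for big models. Norm parameters are negligible in both.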

2 comments

grungegun 815 days ago

So the transformer architecture doesn't give quantization any special advantage? It seems very strange that quantization works so well, since in most of my experiments the internal weights of MLPs look random.
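One way to see why random-looking weights aren't an obstacle: rounding to a few bits just adds small, noise-like errors relative to each row's range, whatever the weights look like. This is a hypothetical experiment, not how any particular LLM quantizer works. It applies simple symmetric absmax quantization per output row to Gaussian "random" weights (`quantize_absmax` and `relative_output_error` are made-up names) and measures how far a layer's output drifts.

```python
import numpy as np

def quantize_absmax(w, bits):
    # Symmetric per-row quantization: scale each row so its largest |value|
    # maps to the top integer level, then round to integers.
    # Assumes no row is all zeros (true for the Gaussian weights used here).
    qmax = 2 ** (bits - 1) - 1  # e.g. 127 for 8 bits, 7 for 4 bits
    scale = np.abs(w).max(axis=1, keepdims=True) / qmax
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

def relative_output_error(bits, d_in=4096, d_out=1024, n=64, seed=0):
    # Random-looking weights and inputs; compare the layer output before/after.
    rng = np.random.default_rng(seed)
    w = rng.normal(0.0, 0.02, (d_out, d_in)).astype(np.float32)
    x = rng.normal(size=(n, d_in)).astype(np.float32)
    q, scale = quantize_absmax(w, bits)
    y_ref = x @ w.T
    y_q = x @ dequantize(q, scale).T
    return float(np.linalg.norm(y_q - y_ref) / np.linalg.norm(y_ref))

for bits in (8, 6, 4, 2):
    print(bits, "bits: relative output error", round(relative_output_error(bits), 4))
```

On weights like these, 8 bits should give an output error on the order of 1%, and each bit removed roughly doubles it until the coarsest settings break down. Real low-bit LLM quantizers typically add more machinery (e.g., smaller quantization groups, outlier handling) that this sketch leaves out.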


amelius 815 days ago

OK, but what does a perceptron look like at 1 bit? Would it just be a simple logic gate, like an OR gate?
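A minimal sketch of one reading of the question, assuming 0/1 inputs, weights restricted to +1/-1 (one bit each) and an integer threshold. `binary_perceptron` and `truth_table` are hypothetical helpers. Under these assumptions a single 1-bit neuron is a threshold gate: depending on the threshold it behaves like OR, AND or majority, and a -1 weight negates an input. It cannot compute XOR, which the brute-force check confirms. Real binarized networks usually keep higher-precision activations or scale factors, so this is only the simplest case.

```python
from itertools import product

def binary_perceptron(weights, threshold):
    # Weights are +1/-1 only; inputs are 0/1.
    # The neuron fires (outputs 1) when the weighted sum reaches the threshold.
    assert all(w in (-1, 1) for w in weights)
    def fire(inputs):
        return int(sum(w * x for w, x in zip(weights, inputs)) >= threshold)
    return fire

def truth_table(fn, n):
    # Map every 0/1 input tuple of length n to the neuron's output.
    return {bits: fn(bits) for bits in product((0, 1), repeat=n)}

OR = binary_perceptron([1, 1], threshold=1)
AND = binary_perceptron([1, 1], threshold=2)
MAJ = binary_perceptron([1, 1, 1], threshold=2)           # 2-of-3 majority
A_AND_NOT_B = binary_perceptron([1, -1], threshold=1)     # -1 weight negates b

def xor_is_representable():
    # Brute force every +/-1 weight pair and every threshold that could matter.
    xor = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
    for w in product((-1, 1), repeat=2):
        for t in range(-3, 4):
            if truth_table(binary_perceptron(list(w), t), 2) == xor:
                return True
    return False

print(truth_table(OR, 2))       # {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 1}
print(xor_is_representable())   # False
```

Because the weighted sum only ever gets compared against one threshold, a single such neuron can only separate inputs with one "cut". Anything like XOR needs more than one neuron.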


zaptrem 815 days ago

Not my area of expertise, but I'd assume it becomes a decision tree or something.

Edit: lol https://news.ycombinator.com/item?id=39868508
