Hacker News
by lumost 919 days ago
Outside of hardware/implementation optimizations and the choice of position embedding, has the SOTA transformer architecture evolved that much?

The Llama-2 code appears to be about the same as GPT-2's.

1 comment

You can look at https://github.com/ggerganov/llama.cpp/blob/master/llama.cpp... for examples of the different layers in a number of different models, and further down in the code for their implementations. TL;DR: yes, they are very similar. I can see lots of value in something that can just run these models. Even if you only supported Llama 2, there are tons of options available.
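To make "very similar" a bit more concrete, here is a rough pure-Python sketch of two places where the blocks do differ: GPT-2 normalizes with LayerNorm and uses a GELU MLP, while Llama 2 uses RMSNorm and a gated SiLU (SwiGLU) MLP. This is not taken from llama.cpp or either model's code. All function names are made up for illustration, weights are plain nested lists, and the learned gain and bias terms are left out to keep it short.

```python
import math

def layer_norm(x, eps=1e-5):
    """GPT-2 style: subtract the mean, then divide by the standard deviation."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def rms_norm(x, eps=1e-5):
    """Llama 2 style: divide by the root mean square, no mean subtraction."""
    mean_sq = sum(v * v for v in x) / len(x)
    return [v / math.sqrt(mean_sq + eps) for v in x]

def gelu(v):
    """tanh approximation of GELU."""
    return 0.5 * v * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (v + 0.044715 * v ** 3)))

def silu(v):
    """SiLU (swish): v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))

def matvec(w, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def gpt2_mlp(x, w_up, w_down):
    """GPT-2 style MLP: up-project, GELU, down-project (biases omitted)."""
    return matvec(w_down, [gelu(h) for h in matvec(w_up, x)])

def llama_mlp(x, w_gate, w_up, w_down):
    """Llama 2 style MLP: SiLU-gated up-projection, then down-project."""
    gate = [silu(g) for g in matvec(w_gate, x)]
    up = matvec(w_up, x)
    return matvec(w_down, [g * u for g, u in zip(gate, up)])
```

The overall shape (norm, attention, residual, norm, MLP, residual) is the same in both; the swaps above plus the position embedding are most of what changes at the block level.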