by anentropic 1138 days ago
The naming is confusing... these models aim to equal or beat LLaMa by reproducing the training data and methodology that was used for LLaMa

But the actual model architecture is slightly different, based on Pythia

I guess what is needed is a pythia.cpp https://github.com/ggerganov/llama.cpp/issues/742
