by momenti 1506 days ago
The model has only about 1B parameters, which is relatively small.

The language models that have produced very impressive results have far more than 50B parameters, e.g. GPT-3 (175B), Aleph Alpha Luminous (200B), and Google PaLM (540B). GPT-3 can understand and answer basic trivia questions and impressively mimic various writing styles, but it fails at basic arithmetic. PaLM is much better at basic arithmetic and can explain jokes. DALL-E 2 (specialized for image generation) has 3.5B parameters for the image generation alone, and it uses a 15B language model (a version of GPT-3) to read in text.