It seems actual domain specific usefulness (say specific programming language, translation, etc) starts at 3B models.