There’s a lot of controversy about “7B is good enough and small enough for consumer hardware, so it’s good enough, full stop”… but although it’s true that, for a fixed compute budget, these small models can get impressive results with good training data, it’s also true that smaller models (7B) appear to have an upper performance bound that larger, well-trained models beat easily. It’s just way more expensive to train larger models. They specifically note they are training a smaller 3B model in the future. So… it seems reasonable to assume that this is a proof of concept, and that no, the Berkeley AI lab will not be footing the bill for training a larger model. This is probably more about exploring “can we make a cheap, good-enough model?” than “here is your GPT4 replacement”.
30B is within reach with compression techniques that seem to lose very little of the network’s overall information. Many argue that machine learning IS fundamentally a compression technique, but the topology of the trained network turns out to be more important, assuming an appropriate activation function after this transformation.
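For a rough sense of why 30B might count as "within reach", here is a back-of-the-envelope sketch. It assumes the compression in question is weight quantization (the comment doesn't say which technique), and it counts only the weights themselves: no activations, no KV cache, no per-block quantization overhead. The function name `weight_memory_gib` is made up for this illustration.

```python
def weight_memory_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory for the model weights alone, in GiB.

    Assumption: every parameter is stored at the same bit width,
    with no extra metadata (scales, zero points) counted.
    """
    bytes_total = n_params * bits_per_weight / 8
    return bytes_total / 2**30


if __name__ == "__main__":
    # Compare a 7B and a 30B model at a few common bit widths.
    for name, n_params in [("7B", 7e9), ("30B", 30e9)]:
        for bits in (16, 8, 4):
            gib = weight_memory_gib(n_params, bits)
            print(f"{name} @ {bits}-bit: {gib:.1f} GiB")
```

Under those assumptions, 30B at 4 bits comes to roughly 14 GiB of weights, about the same as 7B at 16 bits (roughly 13 GiB). Real-world numbers will be somewhat higher.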
No… definitely not your GPT4 replacement. However, this is the kind of PoC I keep following… every… 18 hours or so? Amazing.