by LoganDark
1072 days ago
apple probably has the attention to detail to train the absolute shit out of their models. they will not need 8x220M parameters to do what GPT4 does, if they ever get to that point. see LLaMA2 7b and 13b being (subjectively) far better than LLaMA1 even with the same number of parameters, just by having been trained more. apple is known to care a lot about stuff like this. like, a lot. they are pedantic as heck