|
|
|
|
|
by famouswaffles
644 days ago
|
|
It's not about how simple B64 is or isn't. In fact i chose a simple problem we've already solved algorithmically on purpose. It's that all you've just said, reasonable as it may sound is entirely speculation. Maybe "no idea" was a bit much for this example but any idea certainly didn't come from seeing the matrices themselves fly. |
|
It doesn't explain higher level generalizations like being a transpiler between different programming languages that didn't have any side-by-side examples in the training data. Or giving an answer in the voice of some celebrity. Or being able to find entire rhyming word sequences across languages. These are probably more like the kind of unexplainable generalizations that you were referring to.
I think it may be better to frame it in terms of accuracy vs precision. Many people can explain accurately what an LLM is doing under all those matrix multiplies, both during training and inference. But, precisely why an input leads to the resulting output is not explainable. Being able to do that would involve "seeing" the shape of the hypersurface of the entire language model, which as sibling commenters have mentioned is quite difficult even when aided by probing tools.