But my hunches/experience is that `proper prompting + nature of code being logical` really showcases the power of whatever statistical distribution and alignment occurs during generation. Further, there must be major upstream efforts of high-quality training data curation + creation, an advanced training tricks like using a LLM's strong ability in one task area to support the training of an area it is weak in.
My understanding is the transformer layer in the LLM is basically doing something akin to message passing, it’s like a mini computer. In predicting the next word, it has to understand a lot about a lot of different kinds of topics
My understanding is kinda fuzzy because I haven’t coded it up myself, but this was the takeaway I got from this explanation (starts at 36:21)
But my hunches/experience is that `proper prompting + nature of code being logical` really showcases the power of whatever statistical distribution and alignment occurs during generation. Further, there must be major upstream efforts of high-quality training data curation + creation, an advanced training tricks like using a LLM's strong ability in one task area to support the training of an area it is weak in.