by A1kmm
101 days ago
Yes, Transformer models are non-deterministic, but it is absolutely not true that they can't generalise (the equivalent of interpolation and extrapolation in linear regression, just with far more parameters and far more training). Let's try a simple experiment. First, I'll generate a random UUID:

  > uuidgen
  44cac250-2a76-41d2-bbed-f0513f2cbece

It is extremely unlikely that this particular UUID appears anywhere in the training set. Now I'll use OpenCode with "Qwen3 Coder 480B A35B Instruct" and this prompt:

  Generate a single Python file that prints out the following UUID: "44cac250-2a76-41d2-bbed-f0513f2cbece". Just generate one file.

It generates a Python file containing:

  print("44cac250-2a76-41d2-bbed-f0513f2cbece")

This is a very simple task for a 480B model, but it solves a problem that is not in the training data, because the model is generalising over similar but different problems that are in the training data. Almost every programming task is, at some level of abstraction and with varying complexity, an instance of a more general type of problem, and the training set will contain multiple examples of different solutions to that general type of problem. So you can get a very long way with Transformer model generalisation.
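For anyone who wants to repeat the experiment with fresh UUIDs, here is a minimal sketch of how it could be scripted in Python. It assumes the model's reply has already been extracted as the source text of the generated file; the actual call to OpenCode/Qwen3 Coder is not shown, and the helper names `make_prompt` and `check_generated_file` are hypothetical. `uuid.uuid4()` is used as a rough stand-in for `uuidgen`.

```python
import subprocess
import sys
import tempfile
import uuid
from pathlib import Path


def make_prompt(target: str) -> str:
    """Build the same prompt used in the experiment, for a given UUID (hypothetical helper)."""
    return (
        "Generate a single Python file that prints out the following UUID: "
        f'"{target}". Just generate one file.'
    )


def check_generated_file(source: str, target: str) -> bool:
    """Run model-generated Python source and check it prints exactly the target UUID."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "generated.py"
        path.write_text(source)
        # Execute the generated file with the current interpreter.
        result = subprocess.run(
            [sys.executable, str(path)],
            capture_output=True,
            text=True,
            timeout=10,
        )
    return result.returncode == 0 and result.stdout.strip() == target


if __name__ == "__main__":
    # A fresh random UUID, roughly what `uuidgen` produces.
    target = str(uuid.uuid4())
    print(make_prompt(target))
    # Stand-in for the model's answer; in a real run this text would come from the LLM.
    fake_answer = f'print("{target}")'
    print(check_generated_file(fake_answer, target))
```

Because every run draws a new UUID, passing the check repeatedly is hard to explain by memorisation of that exact string.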