HN Mirror


by A1kmm 101 days ago

Yes, Transformer models are non-deterministic, but it is absolutely not true that they can't generalise. Generalisation here is the equivalent of interpolation and extrapolation in linear regression, just with a lot more parameters and training.

For example, let's try a simple experiment. I'll generate a random UUID:

    > uuidgen
    44cac250-2a76-41d2-bbed-f0513f2cbece

It is extremely unlikely that this exact UUID is in the training set: it is a version 4 UUID (the 4 that starts the third group), so 122 of its 128 bits are random.
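To put a rough number on "extremely unlikely", here is a minimal sketch using Python's standard `uuid` module instead of the `uuidgen` command. The helper names and the figure of 10**15 previously seen UUIDs are made up for illustration; they are not measurements of any real training set.

```python
import uuid

# A version 4 UUID has 122 random bits; the other 6 bits encode the
# version and variant fields.
RANDOM_BITS = 122


def fresh_uuid() -> uuid.UUID:
    """Generate a random (version 4) UUID, much like running uuidgen."""
    return uuid.uuid4()


def chance_already_seen(n_seen: int) -> float:
    """Upper bound on the chance that one fresh UUID matches any of
    n_seen distinct, previously generated version 4 UUIDs."""
    return n_seen / 2**RANDOM_BITS


if __name__ == "__main__":
    u = fresh_uuid()
    print(u, "version", u.version)
    # Hypothetical assumption: a corpus containing 10**15 distinct UUIDs.
    print(chance_already_seen(10**15))
```

Even under that deliberately generous assumption the bound comes out far below one in a billion billion, so treating the UUID as unseen is safe.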

Now I'll use OpenCode with "Qwen3 Coder 480B A35B Instruct" and this prompt: 'Generate a single Python file that prints out the following UUID: "44cac250-2a76-41d2-bbed-f0513f2cbece". Just generate one file.'

It generates a Python file containing 'print("44cac250-2a76-41d2-bbed-f0513f2cbece")'. This is a very simple task for a 480B model, but it solves a problem that is not in the training data, because the model generalises over similar but different problems that are in the training data.
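If you want to repeat the experiment with a fresh UUID each time, something like the following sketch would check the result automatically. It does not call any model: `candidate` is a stand-in for whatever file your tool of choice writes, and `prints_exactly` is a hypothetical helper name, not part of OpenCode or Qwen.

```python
import subprocess
import sys
import tempfile
import uuid
from pathlib import Path


def prints_exactly(source: str, expected: str) -> bool:
    """Run `source` as a Python script in a subprocess and report whether
    it exits cleanly and prints exactly `expected` (ignoring surrounding
    whitespace)."""
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "generated.py"
        script.write_text(source)
        result = subprocess.run(
            [sys.executable, str(script)],
            capture_output=True,
            text=True,
            timeout=10,
        )
    return result.returncode == 0 and result.stdout.strip() == expected


if __name__ == "__main__":
    target = str(uuid.uuid4())
    prompt = (
        "Generate a single Python file that prints out the following UUID: "
        f'"{target}". Just generate one file.'
    )
    # Stand-in for the model's reply; in a real run this would be the
    # contents of the file the model generated for `prompt`.
    candidate = f'print("{target}")'
    print(prints_exactly(candidate, target))
```

Running model-generated code is only reasonable here because the task is trivial and the output is easy to inspect first; for anything larger you would want a sandbox.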

Almost every programming task is, at some level of abstraction and at varying levels of complexity, an instance of a more general type of problem, and the training set will contain multiple examples of different solutions to that general type. So you can get a very long way with Transformer model generalisations.