
>a model aware of the (programming) language constructs explicitly

You could never build that explicitly into the model's training. The best you could do is parse the model's output into an AST and discard suggestions with invalid syntax, and train on enough negative examples (invalid syntax) to reduce false positives.
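A minimal sketch of the "parse and discard" filter, assuming the target language is Python (so the stdlib `ast` module can serve as the parser) and that the code before the cursor plus each suggestion forms a complete, parseable snippet. The function names are hypothetical; real completion tools would need a parser that tolerates incomplete code.

```python
import ast


def is_valid_python(source: str) -> bool:
    """Return True if `source` parses as Python, False on a syntax error."""
    try:
        ast.parse(source)
    except (SyntaxError, ValueError):  # ValueError: null bytes on older Pythons
        return False
    return True


def filter_suggestions(prefix: str, suggestions: list[str]) -> list[str]:
    """Keep only suggestions that still parse when appended to the existing code."""
    return [s for s in suggestions if is_valid_python(prefix + s)]


prefix = "def add(a, b):\n"
candidates = [
    "    return a + b\n",  # valid
    "    return a +\n",    # incomplete expression
    "return a + b\n",      # missing indentation (IndentationError)
]
print(filter_suggestions(prefix, candidates))  # ['    return a + b\n']
```

The filter only removes invalid output after the fact; the model itself still produces the invalid candidates.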

What you propose wouldn't work with a language model; it doesn't fit how backprop works. The model will learn the grammar (syntax) from data, but it will always output some percentage of false positives (invalid syntax).

You can't hardcode the syntax into the model. Another approach is to encode token types after tokenization, which gives the model more information about the syntax and meaning of each token.
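A rough sketch of tagging tokens with types, assuming Python source and using the stdlib `tokenize` lexer to assign each token a coarse type. The type vocabulary, the one-hot `TYPE_EMB` table, and the idea of adding a type vector to each token vector (similar to BERT's segment embeddings) are illustrative assumptions, not a specific published method. With a subword tokenizer, each subword would presumably inherit the type of the lexer token it came from.

```python
import io
import keyword
import tokenize

# Hypothetical coarse type vocabulary; a real system would pick its own.
TYPE_IDS = {"KEYWORD": 0, "NAME": 1, "OP": 2, "NUMBER": 3, "STRING": 4, "OTHER": 5}
DIM = len(TYPE_IDS)
# Toy one-hot type-embedding table; in a model these rows would be learned.
TYPE_EMB = [[float(i == j) for j in range(DIM)] for i in range(len(TYPE_IDS))]

# Layout tokens carry no text worth tagging, so they are skipped.
SKIP = {tokenize.NEWLINE, tokenize.NL, tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER}


def token_types(source: str) -> list[tuple[str, str]]:
    """Lex Python source and tag each token with a coarse syntactic type."""
    tagged = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in SKIP:
            continue
        if tok.type == tokenize.NAME:
            kind = "KEYWORD" if keyword.iskeyword(tok.string) else "NAME"
        elif tok.type == tokenize.OP:
            kind = "OP"
        elif tok.type == tokenize.NUMBER:
            kind = "NUMBER"
        elif tok.type == tokenize.STRING:
            kind = "STRING"
        else:
            kind = "OTHER"  # comments, f-string parts on 3.12+, etc.
        tagged.append((tok.string, kind))
    return tagged


def add_type_embeddings(token_vectors, tagged):
    """Add each token's type vector to its token vector, position by position."""
    return [
        [a + b for a, b in zip(vec, TYPE_EMB[TYPE_IDS[kind]])]
        for vec, (_, kind) in zip(token_vectors, tagged)
    ]


tagged = token_types("def f(x):\n    return x + 1\n")
print([TYPE_IDS[kind] for _, kind in tagged])  # [0, 1, 2, 1, 2, 2, 0, 1, 2, 3]
```

The type IDs give the model an explicit syntactic signal as input, but it still has to learn how to use it; nothing here prevents invalid output.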