HN Mirror

	by UltraSane 386 days ago
	I think that, long term, LLMs should generate abstract syntax trees directly. But this is hard right now because all the training data is code in text form.

3 comments

saurik 386 days ago

The training data is source text that can be compiled, though, so it can just as easily be converted into abstract syntax trees.
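
To illustrate the point, here is a minimal sketch using Python's standard `ast` module (Python 3.9+ for `ast.unparse`). The sample function `add` is a made-up example, and this only shows that parseable source text converts mechanically to a tree and back; it says nothing about how any real training pipeline works.

```python
import ast

# Hypothetical snippet standing in for one training example in text form.
source = "def add(a, b):\n    return a + b\n"

# Any text that parses can be turned into an AST mechanically.
tree = ast.parse(source)

# Collect the node types found in the tree (breadth-first walk).
node_types = [type(node).__name__ for node in ast.walk(tree)]

# The tree can be turned back into source text; formatting and comments
# are not preserved, but the structure is.
regenerated = ast.unparse(tree)

if __name__ == "__main__":
    print(ast.dump(tree, indent=2))
    print(regenerated)
```

Parsing `regenerated` again yields a tree with the same structure, which is why text and tree forms of the same corpus are interchangeable up to formatting.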


UltraSane 385 days ago

But is anyone actually doing this?


undfined 386 days ago

There's a fair amount of experimental work trying different parsing and resolution procedures, so that the training data reflects an AST and/or the model predicts AST nodes as an in-filling capability.


catfacts 386 days ago

Do you know if any such experimental work is using a special tokenizer? For example, in Lisp, a special token for the left or right parenthesis?
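
A rough sketch of what such a tokenizer could look like, purely as an assumption for illustration: the token names `<LPAREN>` and `<RPAREN>` and the helper functions are hypothetical, and the sketch ignores string literals, comments, and quote syntax that a real Lisp reader would handle.

```python
import re

# Hypothetical special tokens; the names are assumptions, not from any real model.
LPAREN, RPAREN = "<LPAREN>", "<RPAREN>"


def tokenize(src: str) -> list[str]:
    """Split Lisp-like text into atoms, giving each parenthesis its own token."""
    tokens = []
    # Match a single parenthesis, or a run of non-space, non-parenthesis characters.
    for piece in re.findall(r"[()]|[^\s()]+", src):
        if piece == "(":
            tokens.append(LPAREN)
        elif piece == ")":
            tokens.append(RPAREN)
        else:
            tokens.append(piece)
    return tokens


def is_balanced(tokens: list[str]) -> bool:
    """Check nesting using only the dedicated parenthesis tokens."""
    depth = 0
    for tok in tokens:
        if tok == LPAREN:
            depth += 1
        elif tok == RPAREN:
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


if __name__ == "__main__":
    print(tokenize("(+ 1 (* 2 3))"))
```

With parentheses as dedicated tokens, nesting depth can be tracked by counting two token types, rather than depending on how a general-purpose subword tokenizer happens to merge `(` and `)` with neighboring characters.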


kenjackson 385 days ago

It's possible that LLMs build ASTs internally when working with code. I have no first-hand data on this, but it would not surprise me at all.


astrange 385 days ago

LLMs don't have memory, so they can't build anything. Insofar as they produce correct results, they have implicit structures corresponding to ASTs built into their networks at training time.


kenjackson 385 days ago

"LLMs don't have memory"

That's interesting. Is there research into adding memory, or has it been shown that memory provides no practical value beyond the context the model itself outputs?
