


	by Starlord2048 466 days ago
	[flagged]

3 comments

andai 466 days ago

Fascinating. I was thinking how the factory should be communicated to the model, and represented "internally". Images aren't the right solution (very high bandwidth for no real benefit). An ASCII grid of the game's tiles (more likely, a small chunk of it) is orders of magnitude better, but you still don't need to simulate every tile in a conveyor. It's just a line, right? So the whole thing is actually a graph!

That compresses nicely into text, I imagine.
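To make the graph idea concrete, here is a minimal sketch, not anything from an actual project: machines become nodes, each belt becomes one edge that records only its length and what it carries, and the whole thing is dumped as one line of text per edge. The `Edge` class, the `serialize` function, the entity names, and the text format are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    """One belt connection between two machines (hypothetical schema)."""
    src: str
    dst: str
    length: int  # number of belt tiles, kept as a single number
    item: str    # what the belt is expected to carry


def serialize(edges):
    """Render the factory graph as one compact text line per belt."""
    return "\n".join(
        f"{e.src} -> {e.dst} [belt x{e.length}, {e.item}]" for e in edges
    )


# Illustrative factory: names and lengths are invented.
factory = [
    Edge("iron_mine", "furnace_1", 12, "iron-ore"),
    Edge("furnace_1", "assembler_1", 5, "iron-plate"),
]
factory_text = serialize(factory)
print(factory_text)
```

A 12-tile belt costs one short line here instead of 12 grid cells. The trade-off is that per-tile state is gone, so anything that depends on what is physically sitting on a given tile would need extra annotation.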

I'd like to hear more details about your symbolic approach!


HideousKojima 466 days ago

>An ASCII grid of the game's tiles (more likely, a small chunk of it) is orders of magnitude better, but you still don't need to simulate every tile in a conveyor. It's just a line, right? So the whole thing is actually a graph!

Until you accidentally feed a different material into your belt and need to clean it up


nostrademons 466 days ago

Probably the memory model of the game itself is the best representation. The devs have already spent a significant number of development cycles optimizing this down to a minimal compressed form - belt runs, for example, are one entity regardless of how long they are. The LLM is then effectively modeling the degrees of freedom of the game simulation and picking code paths within them.
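As a rough illustration of the "one entity per belt run" idea, the sketch below collapses an ordered list of per-tile belt directions into (direction, length) runs. This is an assumption-laden toy, not how the game actually stores belts internally; `collapse_belt` and the direction letters are hypothetical.

```python
from itertools import groupby


def collapse_belt(directions):
    """Collapse per-tile belt directions into (direction, length) runs.

    `directions` is the belt walked tile by tile from start to end,
    e.g. ["E", "E", "S"] for two tiles east and then one tile south.
    """
    return [(d, len(list(group))) for d, group in groupby(directions)]


# Toy belt: 4 tiles east, 2 tiles south, 3 tiles east (invented layout).
tiles = ["E"] * 4 + ["S"] * 2 + ["E"] * 3
runs = collapse_belt(tiles)
print(runs)
```

Nine tiles become three runs, and a straight belt of any length stays a single entry, which is the property that makes this kind of representation cheap to put into a prompt.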


noddybear 466 days ago

This is really interesting, do you have a repo or anything describing the approach? I would be particularly interested in trying your approach in FLE to see how it affects layout design. How are you performing the spatial reasoning?


mlsu 466 days ago

Yes!

The way I think of it is this. Yes, the LLM is a "general reasoner." However, it's locked in a box, where the only way in and out is through the tokenizer.

So there's this huge breadth of concepts and meanings that cannot be fully described by words (things like spatial reasoning, smells, visual relationships, cause/effect physical relationships, etc.). The list of things that can't be described by words is long. The model would be capable of generalizing on those; it would optimize to capture them. But it can't, because the only thing that can fit through the front door is tokens.

It's a huge and fundamental limitation. I think Yann LeCun has been talking about this for years now and I'm inclined to agree with him. This limitation is somewhat obscured by the fact that we humans can relate to all of these untokenizable things -- using tokens! So I can describe what the smell of coffee is in words and you can immediately reconstruct that based on my description, even though the actual smell of coffee is not encoded in the tokens of what I'm saying at all.
