Hacker News
by jsheard 629 days ago
Yeah but when all you have is an LLM hammer, everything looks like a text2text nail.

LLMs are fundamentally one-dimensional, which works fine when you're generating next tokens for text, because that's a 1D problem.

I do wonder how much progress we could make on a problem like this with a 3D transformer architecture.

I’m not sure I follow this. Isn’t an LLM’s dimensionality measured by how many parameters the model supports, i.e. tens of billions in some cases? If I understand it correctly, the model is already evaluating things in lots of dimensions and reducing them down to one, as you say, in the case of text, and to two in image generation, so three should be pretty straightforward.
I think they're referring to the dimensionality of the input / output space, not the intermediate internal representation.
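To make that distinction concrete, here is a toy sketch. Everything in it (the sizes, the `embedding` table, the token ids) is made up for illustration and not taken from any real model. The input is a sequence addressed by a single position index, each position maps to a wider internal vector, and the parameter count is a third, unrelated number.

```python
# Toy illustration only: sizes and names are hypothetical, not from a real model.
VOCAB_SIZE = 5    # number of distinct token ids
EMBED_WIDTH = 4   # width of the internal vector stored per token

# Embedding table: one vector of EMBED_WIDTH numbers for each token id.
embedding = [[0.1 * (t + k) for k in range(EMBED_WIDTH)] for t in range(VOCAB_SIZE)]

# Input space: a 1D sequence. One index (the position) locates each element.
tokens = [3, 1, 4, 1, 0, 2]

# Internal representation: every position now carries an EMBED_WIDTH-dim vector,
# but the sequence itself is still laid out along a single axis.
hidden = [embedding[t] for t in tokens]

# Parameter count is yet another quantity: how many numbers the table stores.
param_count = VOCAB_SIZE * EMBED_WIDTH

print("sequence length (1D input axis):", len(tokens))
print("internal vector width:", len(hidden[0]))
print("parameters in this toy table:", param_count)
```

Growing `EMBED_WIDTH` or `param_count` changes the internal representation, but `tokens` stays a flat sequence either way.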
The neat thing is, you can rasterize 1D space into 2D, 3D and so on. Trick as old as analog TV signal processing.
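A minimal sketch of that raster-scan trick, assuming a fixed, known grid width (and height for 3D). The function names are hypothetical and only show the index arithmetic; they say nothing about whether a model would learn useful structure from such a layout.

```python
def to_2d(i, width):
    """Map a flat index to (row, col) in a row-major grid of the given width."""
    return divmod(i, width)


def to_3d(i, width, height):
    """Map a flat index to (z, y, x) in a row-major width x height x depth volume."""
    z, rest = divmod(i, width * height)
    y, x = divmod(rest, width)
    return z, y, x


def from_3d(z, y, x, width, height):
    """Inverse of to_3d: flatten (z, y, x) back to a single index."""
    return (z * height + y) * width + x


def rasterize(seq, width):
    """Cut a flat sequence into rows of `width` items, like scanlines on a screen."""
    if len(seq) % width != 0:
        raise ValueError("sequence length must be a multiple of width")
    return [seq[r:r + width] for r in range(0, len(seq), width)]


if __name__ == "__main__":
    print(rasterize(list(range(6)), 3))
    print(to_3d(23, width=4, height=3))
```

The mapping is lossless in both directions, but neighbours along `y` or `z` end up far apart in the flat sequence, which is the part a purely 1D model has to work around.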
If I am understanding you right... I don't think this gets you anywhere useful.

Even if you could do what you're suggesting with an LLM (I have my doubts), the result would be a mesh or 3D pixel grid or something, yes?

This is terrible for interoperability and it's the opposite of what mainstream CAD packages do.