by echelon 504 days ago
> ai that performs at intern level vs phd vs domain leading expert level.

These are different things:

- Regurgitating advanced text that has been shifted into a shape matching your query

- Understanding intimately the $100M screw to turn


How do you know that these are different things? They could be, I genuinely don’t know, but I’m not sure where people are getting these kinds of confident assertions about what modern architectures could never do. Would you have predicted in 2020 that photorealistic text to image generation was within the scope of current theory?
> Would you have predicted in 2020 that photorealistic text to image generation was within the scope of current theory?

Yes, and I've been working in this area with excitement since about that time.

The physics of optics is well understood. We've been writing ray tracers forever and coming up with clever hacks like Blinn–Phong, PBR, etc. for ages. SIGGRAPH has always felt like tangible magic. We have had the map in our hands and now we're coming up with new ways to traverse a familiar landscape.

Reasoning is an undiscovered country. There are lots of exciting claims being made, but nothing concrete.

I expect lots of advancements in signal processing, spatial computing, and beyond because those things are obvious and intuitive.

Mathematically, a language model is a probability distribution over the next token given the previous context. It's literally picking a probable response, which often matches the correct response, but the two are not 1 to 1.
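
To make that concrete, here is a minimal sketch of P(next token | context) as a softmax over scores. The vocabulary, the context "2 + 2 =", and the logit values are all made up for illustration; a real model computes the logits from the context with a neural network.

```python
import math

def softmax(logits):
    """Turn raw scores into a probability distribution that sums to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical toy vocabulary and scores for the context "2 + 2 ="
# (assumed values, not taken from any real model)
vocab = ["4", "5", "four", "22"]
logits = [3.0, 0.5, 2.0, 1.0]

probs = softmax(logits)
next_token_distribution = dict(zip(vocab, probs))

# Greedy decoding: take the single most probable token
greedy_choice = max(next_token_distribution, key=next_token_distribution.get)
```

Here the most probable token happens to be the correct one, but the model still puts nonzero probability on "5" and "22". Most probable and correct coincide only when the training data makes them coincide.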
welcome to earth where 98% of forum/political interactions are confident assertions from nowhere used to dismiss people. enjoy your stay ;).
obviously there's also a multimodality gap to overcome before intimately understanding the $100M screw to turn, but i suspect most reasoning that matters has already been translated into and embedded in words. i wouldn’t underestimate the amount of useful knowledge captured by embedding advanced texts into an LLM. the challenge is contextually hierarchizing it (a matter of reasoning) and decoding it back into reality (words are dimensionally squished encodings of reality).