Hacker News
by Agentus 499 days ago
wait until it crosses a few thresholds we’re fast approaching: ai that performs at intern level vs phd vs domain-leading expert level. ai that has the full context of what you’re doing immediately accessible. ai that can agentically and autonomously navigate the computer environment without stumbling blocks.

i think the difference between intern-level ai and domain-leading expert ai is a few algorithmic adjustments away, using a type of reasoning reinforcement framework (like GRPO) that deals with competing signals in a better way. instead of averaging them, it reasons about which signal should take precedence in context. it’s the difference between, let’s say, taking a vote among the general populace on how to build a nuclear power plant, and finding an expert and figuring out where that expert’s decisions should take precedence and where other experts’ decisions should override them.

square that away and the embarrassing feeling about the promise of ai should wash away.


> ai that performs at intern level vs phd vs domain leading expert level.

These are different things:

- Regurgitating advanced text that has been shifted into a shape matching your query

- Understanding intimately the $100M screw to turn

How do you know that these are different things? They could be, I genuinely don’t know, but I’m not sure where people are getting these kinds of confident assertions about what modern architectures could never do. Would you have predicted in 2020 that photorealistic text-to-image generation was within the scope of current theory?

> Would you have predicted in 2020 that photorealistic text-to-image generation was within the scope of current theory?

Yes, and I've been working in this area with excitement since about that time.

The physics of optics are well understood. We've been writing ray tracers for forever and coming up with clever hacks like Blinn–Phong, PBR, etc. for ages. SIGGRAPH has always felt like tangible magic. We have had the map in our hands and now we're coming up with new ways to traverse a familiar landscape.

Reasoning is an undiscovered country. There are lots of exciting claims being made, but nothing concrete.

I expect lots of advancements in signal processing, spatial computing, and beyond because those things are obvious and intuitive.

Mathematically, a language model is a probability distribution over the next token given the preceding context. It's literally choosing the most probable response, which may often match the correct response but is not one-to-one with it.
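
To make that definition concrete, here is a minimal sketch, not any real model's code. It assumes a toy three-word vocabulary and made-up scores (`VOCAB` and `LOGITS` are hypothetical, as are the function names). It turns the scores into a distribution P(next token | context) with a softmax, then picks a token either greedily or by sampling.

```python
import math
import random

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def next_token_distribution(logits, vocab):
    """Map each candidate token to P(token | context)."""
    return dict(zip(vocab, softmax(logits)))

def greedy(dist):
    """Pick the single most probable token."""
    return max(dist, key=dist.get)

def sample(dist, rng):
    """Draw a token in proportion to its probability."""
    tokens = list(dist)
    weights = [dist[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]

# Hypothetical scores for the context "The capital of France is"
VOCAB = ["Paris", "London", "Rome"]
LOGITS = [2.0, 1.0, 0.1]
DIST = next_token_distribution(LOGITS, VOCAB)

if __name__ == "__main__":
    print(DIST)
    print("greedy:", greedy(DIST))
    print("sampled:", sample(DIST, random.Random(0)))
```

Here the greedy choice happens to be the correct answer, but only because the made-up scores favor it. Nothing in the procedure checks correctness, which is the gap between "most probable" and "correct" described above.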

welcome to earth, where 98% of forum/political interactions are confident assertions from nowhere used to dismiss people. enjoy your stay ;).

obviously there’s also a multimodality gap to overcome before intimately understanding the $100M screw to turn, but i suspect most reasoning that matters has already been translated into and embedded in words. i wouldn’t underestimate the amount of useful knowledge that exists in embedding advanced texts into an LLM. the challenge is contextually organizing it into a hierarchy (a matter of reasoning) and decoding it back into reality (words are dimensionally squished encodings of reality).