Hacker News
by kubiton 950 days ago
There are research papers left and right.

Stuff like certain architectures, or using LLMs for prompting multimodal LLMs.

Then we have stuff like InstructGPT, ML models for robots, and lots and lots of research from Nvidia on virtual simulation and transfer to the real world. Digital twins are also a relevant part of AGI.

Object detection is also much better and has nothing to do with LLMs. Segment Anything from FB, for example.

Whisper and SD are also not LLMs.

There are a ton of puzzle pieces slowly falling into place left and right.

2 comments

They may not be "large" in the same sense that GPT-4 is "large", but apart from the simulator stuff, every single one of the models you mentioned is transformer-based. Every one of them basically includes encoders to project different modes of information (images and audio) into a "language-like" space so that it can be compared with and mapped to and from text. I think it's fair to say that language models, if not LLMs, unlocked a surprising amount of power.
"There are a ton of puzzle pieces slowly falling into place left and right."

Yet, we do not seem to have a very good understanding of how many pieces there are in the puzzle.

True.

But I feel well entertained watching them fall. I like using them and experimenting with them.

But it also shows the road ahead quite clearly. For example, where is the money coming from? From millions of people paying for GitHub Copilot, for instance.

How is it sold? Via web UI, API, and cloud providers.

Digital twins will also play a huge role in this as a bridge between AGI and the real world.