by rkorlimarla
53 days ago
This is super interesting. Multimodal LLMs do not fundamentally distinguish between pictures/screenshots and text: both are turned into vectors, and the model processes them together as part of the same thinking process. It feels like magic, and it is a real breakthrough. My guess is that this was not by design but a nice side effect of the core attention design. A lot of papers have been written on it; you will find them a very interesting read.
|
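To make the "all are vectorized" point concrete, here is a rough toy sketch, not how any particular model does it. Everything in it is made up for illustration: the names `embed_text` and `embed_image`, the width `D_MODEL`, and the random weights. The idea is only that text tokens and image patches both end up as vectors of the same width, so attention can treat them as one sequence.

```python
import random

D_MODEL = 8  # shared embedding width (illustrative assumption)


def embed_text(token_ids, table):
    """Look up one D_MODEL-wide vector per text token id."""
    return [table[t] for t in token_ids]


def embed_image(pixels, patch_size, proj):
    """Cut a flat pixel list into patches and linearly project each to D_MODEL."""
    patches = [pixels[i:i + patch_size] for i in range(0, len(pixels), patch_size)]
    return [
        [sum(p * w for p, w in zip(patch, row)) for row in proj]
        for patch in patches
    ]


rng = random.Random(0)
vocab_size, patch_size = 10, 4

# Hypothetical random weights; a real model would learn these.
table = [[rng.gauss(0, 1) for _ in range(D_MODEL)] for _ in range(vocab_size)]
proj = [[rng.gauss(0, 1) for _ in range(patch_size)] for _ in range(D_MODEL)]

text_vecs = embed_text([1, 5, 7], table)                      # 3 text tokens
image_vecs = embed_image([0.1 * i for i in range(16)], patch_size, proj)  # 4 patches

# One mixed sequence: downstream attention sees only vectors, not "image" vs "text".
sequence = image_vecs + text_vecs
```

After the two embedding steps there is nothing left in `sequence` that marks an entry as coming from a picture or from text, which is the sense in which the model does not distinguish them.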