|
|
|
|
|
by WarmWash
2 hours ago
|
|
LLMs can't really "see", so I challenge you to draw a pelican on a bike without any visual feedback, just code. Because that is how they are doing it. Vision tokens for transformers aren't really well solved yet, which is why they can smash a phd math problem and trip over a "count the cats on the chair" problem. |
|