by Jerrrrrrry 624 days ago

  it should draw/write ASCII like an expert.

There aren't many conversations in the training data that build up ASCII art incrementally - you are essentially asking a goldfish to climb a tree.

  It should have a lot of RGB image training data with associated captions => So it should understand images very well.

You seem to have conflated the architectures. ChatGPT was trained on text and text-image embeddings - it can recognize, but cannot project. That's the DALL-E portion - it leverages a similar transformer architecture, but they are neither the same model nor the same architecture.

However, ask a Generative Adversarial Network for ASCII and you'll get what you expect. Absent the intra-word character cohesion that an LLM's tokenization provides, it will give realistic, if sometimes "uncanny", images - ones that "make sense" sequentially, or in the short term, but not in the longer or larger context.
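A rough way to see the "sequential vs. larger context" gap, as a minimal sketch: once a 2D ASCII picture is flattened into the 1D string a text model reads, horizontal neighbours stay adjacent but vertical neighbours end up a whole row apart. The helper names (`flatten`, `flat_index`) and the tiny box drawing are made up for illustration, and this works on raw characters rather than on any real tokenizer.

```python
def flatten(rows):
    """Join a 2D character grid into the 1D string a text model would read."""
    return "\n".join(rows)


def flat_index(row, col, width):
    """Position of grid cell (row, col) in the flattened string.

    Each row contributes `width` characters plus one newline.
    """
    return row * (width + 1) + col


# Hypothetical 3x6 ASCII box, used only as sample input.
art = [
    "+----+",
    "|    |",
    "+----+",
]
width = len(art[0])
text = flatten(art)

# Horizontal neighbours are adjacent in the sequence...
horizontal_gap = flat_index(0, 1, width) - flat_index(0, 0, width)
# ...but vertical neighbours are separated by an entire row plus a newline.
vertical_gap = flat_index(1, 0, width) - flat_index(0, 0, width)

print(horizontal_gap, vertical_gap)
```

The vertical gap grows with the width of the picture, so keeping columns aligned means tracking a dependency that gets longer the larger the drawing is.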

The language portion of your brain, which works faster than you do - else you would be at a loss for words constantly - is not nearly as equipped to deal with spatial problems as your posterior parietal cortex is.

Ultimately we are converging towards a Mixture-of-Experts model that we will one day realize is just... us, but better.