by Jerrrrrrry
624 days ago
> it should draw/write ASCII like an expert.

The training data doesn't contain many conversations that build up ASCII art incrementally - you are essentially asking a goldfish to climb a tree.

> It should have a lot of RGB image training data with associated captions => So it should understand images very well.

You seem to have conflated the architectures. ChatGPT was trained on text and on text-image embeddings - it can recognize, but cannot project. That's the DALL-E portion - it leverages a similar transformer arch, but they are not the same model nor the same architecture.

However, ask a Generative Adversarial Network for ASCII, and you'll get what you expect. Absent the intra-word character cohesion that an LLM's tokenization provides, it will give realistic, if sometimes "uncanny", images - ones that "make sense" sequentially, or in the short term, but not in the longer, or larger, context.

The language portion of your brain, which works faster than you do (else you would be at a loss for words constantly), is not nearly as well equipped to deal with spatial problems as your posterior parietal cortex is. Ultimately we are converging towards a Mixture-of-Experts model that we will one day realize is just... us, but better.
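To make the "sequential vs. spatial" point concrete, here is a rough sketch of what a model reading ASCII art as a flat stream sees. It treats every character as one position, which is a simplifying assumption (real tokenizers merge characters into variable-length tokens), and `flat_offsets` is a made-up helper name, not anything from a real library.

```python
def flat_offsets(art: str, row: int, col: int) -> tuple[int, int]:
    """Return the flat indices of cell (row, col) and of the cell directly below it."""
    lines = art.split("\n")
    # Each earlier line contributes its length plus one newline character.
    start = sum(len(line) + 1 for line in lines[:row])
    below_start = start + len(lines[row]) + 1
    return start + col, below_start + col


# A tiny box: three rows of six characters each.
art = "+----+\n|    |\n+----+"

here, below = flat_offsets(art, 0, 0)
# The corner '+' and the '|' right under it touch on screen,
# but sit a full line width (plus the newline) apart in the stream.
print(art[here], art[below], below - here)  # prints: + | 7
```

Horizontally adjacent characters are neighbours in the stream, while vertically adjacent ones are a whole line apart, and that gap grows with the width of the drawing. That is one way to picture why output can look fine locally but drift over the larger layout.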