by valine
941 days ago
We already have multimodal models that take both images and text as input. The bulk of the training for these models was in text, not images. This shouldn’t be surprising. Text is a great way of abstractly and efficiently representing reality. Of course those patterns are useful for making sense of other modalities. Beyond modeling the world, text is also a great way to model human thought and reasoning. People like to explain their thought process in writing. LLMs already pick up on and mimic chain of thought well. Contained within large datasets is crystallized thought, and efficient descriptions of reality that have proven useful for processing modalities beyond text. To me that seems like a great foundation for AGI.

It's only one part. Predicting text is relatively straightforward because it doesn't require predicting arbitrary sequences like 'a S23mz s.zawsds'. Statistically, humans use only a limited number of word combinations, so with hundreds of billions of parameters, significant compression is possible. Mathematics is different: it requires actual reasoning, an area where LLMs often struggle because they lack the capability for genuine reasoning.