| I find it difficult to understand certain math and science papers/articles due to ambiguous use of language. For example "all previous tokens can be passed to the current token." That seems like a poorly constructed sentence. A token is not a function and it's not an algorithm either... How can you pass tokens to a token? This type of ambiguous language in academic papers makes it hard to read... Maybe the phrase 'every token has an association with every other previously encountered token' would be better? Or every token is used to compute the token vector for each token... I don't know, all I can do is guess the meaning of the word 'passed'. They want us to infer and fill in the gaps with our own assumptions. It assumes that we are primed to think in a certain highly constrained way... For some reason a lot of academia around AI is littered with such imprecise language. They choose to use niche concepts and repurposed wording that their own small community invented rather using words and ideas that are more widely understood but which would convey the same information. Rational people who aren't directly involved in those fields who generally resist jumping to conclusions will struggle to understand what is meant because a lot of those words and ideas have different interpretations in their own fields. I studied machine learning at university and wrote ANNs from scratch and trained them and even I find the language and concepts around LLMs too ambiguous. I'd rather just ask ChatGPT. One thing that bothers me is that the community has moved away from relating concepts to neurons, interconnections, input layers, hidden layers and output layers. Instead, they jump straight into vectors and matrices... Pretending as though there is only one way to map those calculations to neurons and weights. But in fact, this abstraction has many possible interpretations. You could have fully connected layers or partially connected layers... Maybe you need a transformer only in front of the input layer or between every layer... So many possibilities. The entire article means little if considered in isolation outside of the context of current configurations of various popular frameworks and tools. |