|
|
|
|
|
by minimaxir
399 days ago
|
|
> The BERT-like and other transformer embeddings far outperform word vectors because they can take into account the context of the word. In addition to being able to utilize attention mechanisms, modern embedding models use a form of tokenization such as BPE which a) includes punctuation which is incredibly important for extracting semantic meaning and b) includes case, without as much memory requirements as a cased model. The original BERT used an uncased, SentencePiece tokenizer which is out of date nowadays. |
|
Little did I know that people were going to have a lot of tolerance for "short circuiting" of LLMs, that is getting the right answer by the wrong path, so I'd say now that my methodology of "predictive evaluation" that would put an upper bound on what a system could do was pessimistic. Still I don't like giving credit for "right answer by wrong means" since you can't count on it.