by miohtama 217 days ago
Shouldn't all caps be normalised to the same tokens as lowercase? As far as I know, Llama has no separate tokens for uppercase and lowercase text, or at least it didn't in the past.
1 comment

The tokenizer for the older Llama 2 model does have capital letters in its vocabulary, so uppercase text is not folded into lowercase tokens: https://huggingface.co/meta-llama/Llama-2-7b-hf
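If you want to check this for yourself, here is a rough sketch of how one might scan a vocabulary for case-sensitive tokens. The `case_stats` helper and the toy vocabulary are made up for illustration; they are not taken from the Llama 2 tokenizer files.

```python
def case_stats(vocab):
    """Return (tokens containing an uppercase letter,
    groups of tokens that differ only by letter case)."""
    upper = [t for t in vocab if any(c.isupper() for c in t)]

    # Group tokens by their lowercased form; a group with more than one
    # member means the vocabulary keeps separate entries per casing.
    by_lower = {}
    for t in vocab:
        by_lower.setdefault(t.lower(), []).append(t)
    case_pairs = {k: sorted(v) for k, v in by_lower.items() if len(v) > 1}
    return upper, case_pairs


# Hypothetical toy vocabulary (token -> id), NOT the real Llama 2 vocab.
toy_vocab = {"▁the": 0, "▁The": 1, "▁THE": 2, "ing": 3, "A": 4, "a": 5}

upper, case_pairs = case_stats(toy_vocab)
print(len(upper), "tokens contain uppercase letters")
print(case_pairs)
```

To run it against the real thing, the token-to-id mapping can be obtained with `get_vocab()` on a tokenizer loaded through the Hugging Face `transformers` library. Note that the linked Llama 2 repository is gated, so that step assumes you have been granted access.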