Hacker News
by haensi 657 days ago
Very interesting to see an LLM released with both its weights and its code base. They also discuss tokenizer fertility in the HF model card [1].

[1]: https://huggingface.co/Aleph-Alpha/Pharia-1-LLM-7B-control

1 comment

"Tokenizer fertility is a metric used to evaluate tokenizer performance and measures a tokenizer’s ability to represent text, calculated by dividing the number of tokens in a text (after tokenizing) by the number of words in that same text"