Y
Hacker News
new
|
ask
|
show
|
jobs
by
hrimfaxi
86 days ago
Why does this G character appear to prefix most of the output? ("Ġlike")
2 comments
frwickst
86 days ago
It is a tokenizer artifact most likely (
https://github.com/huggingface/transformers/issues/4786
). So the output is not properly decoded in this case, it should just be a space.
link
kgeist
86 days ago
The original tokens have Ġ instead of space. I had this issue too when writing an inference engine for Qwen. You have to "normalize" those special characters.
link