by maytc
607 days ago
The difference in the dates example seems right to me.
20 October 2024 and 2024-20-10 are not the same. Dates are written differently in different locales (yyyy-MM-dd, for example), so 2024-20-10 does not unambiguously read as 20 October. It could also be a catalog or reference number. So it seems right that their embedding similarity is not perfect. It's not a tokenizer problem: the two texts meant different things to the LLM.
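
A quick sketch of the ambiguity, using Python's standard `datetime.strptime`. The helper name `try_parse` is made up for illustration, and the long-form parse assumes the default English month names:

```python
from datetime import date, datetime
from typing import Optional


def try_parse(text: str, fmt: str) -> Optional[date]:
    """Return the parsed date, or None if text does not fit fmt."""
    try:
        return datetime.strptime(text, fmt).date()
    except ValueError:
        return None


# Read as yyyy-MM-dd, "20" would have to be a month, so parsing fails.
as_year_month_day = try_parse("2024-20-10", "%Y-%m-%d")

# Read as yyyy-dd-MM, the same string is a valid date.
as_year_day_month = try_parse("2024-20-10", "%Y-%d-%m")

# The long form has only one sensible reading.
long_form = try_parse("20 October 2024", "%d %B %Y")
```

Only one of the two numeric readings matches the long form, so a model that treats the strings as related but not identical is behaving reasonably.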