by jinushaun 1161 days ago
The French example is strange and shows that the language model has an English bias.

  - “I want a pizza” = 4 tokens
  - “Je voudrais une pizza” = 7 tokens
Why is “want” only 1 token in English, but “voudrais” is 4 tokens? Following the French example, would “wants” and “wanted” map to 1 or 2 tokens?

I think it’s because the article itself is a bit wrong: ‘voudrais’ in French is more analogous to ‘I would like’ in English than to ‘want’. Specifically, the ‘v-’ indicates that this means ‘to want’, ‘-oud-’ means that it is in the conditional or future, and ‘-ais’ indicates that it’s first-person conditional. That said, it makes sense that ‘voudrais’ takes more tokens than ‘want’, because it encodes more information.
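
For what it's worth, the split can also be looked at from the tokenizer side. Below is a minimal sketch, assuming a greedy longest-match subword tokenizer over a tiny vocabulary that I made up so the counts line up with the 4 and 7 quoted above. The vocabulary, the pieces that ‘voudrais’ breaks into, and the `toy_tokenize` helper are all hypothetical; the real tokenizer learns its vocabulary from data and may split the word differently.

```python
# Hypothetical vocabulary, invented for illustration only. A real tokenizer's
# vocabulary is learned from its training data and is far larger.
TOY_VOCAB = {"I", "want", "a", "pizza", "Je", "une", "v", "oud", "ra", "is"}


def toy_tokenize(text, vocab=TOY_VOCAB):
    """Split each whitespace-separated word by greedy longest match."""
    tokens = []
    for word in text.split():
        start = 0
        while start < len(word):
            # Try the longest candidate first, shrinking until one is in the vocab.
            for end in range(len(word), start, -1):
                if word[start:end] in vocab:
                    break
            else:
                # No vocabulary entry matched: emit the single character.
                end = start + 1
            tokens.append(word[start:end])
            start = end
    return tokens


print(toy_tokenize("I want a pizza"))
print(toy_tokenize("Je voudrais une pizza"))
print(toy_tokenize("wants"))
```

In this toy setup ‘want’ is a single vocabulary entry, while ‘voudrais’ has no whole-word entry and falls apart into smaller pieces. On the ‘wants’ question: here it comes out as ‘want’ plus a leftover ‘s’, but that only reflects the made-up vocabulary, not what the real tokenizer does.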