by lopuhin
1496 days ago
Not 'char', because it uses BPE (byte pair encoding), so after tokenization you might get ["Transform", "ers"] instead of ["T", "r", "a", ...]. This is relevant to why it struggles to reverse words: the model sees multi-character tokens, not individual letters. Not 'largest', because there are larger models, such as the Pathways Language Model (PaLM) with 540 billion parameters.
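To make the tokenization point concrete, here is a toy sketch of how BPE merge rules turn characters into subword tokens. The merge list is made up for illustration (it is not the model's real vocabulary), and `bpe_tokenize` and `MERGES` are hypothetical names; real tokenizers learn tens of thousands of merges from data and usually operate on bytes.

```python
# Hypothetical, hand-written merge rules in priority order.
# A real BPE vocabulary is learned from a corpus; this list is an assumption.
MERGES = [
    ("e", "r"), ("er", "s"),
    ("T", "r"), ("a", "n"), ("Tr", "an"),
    ("o", "r"), ("or", "m"), ("f", "orm"),
    ("Tran", "s"), ("Trans", "form"),
]


def bpe_tokenize(word, merges):
    """Split a word into characters, then apply each merge rule in order."""
    tokens = list(word)
    for left, right in merges:
        merged = []
        i = 0
        while i < len(tokens):
            # Fuse an adjacent (left, right) pair into a single token.
            if i + 1 < len(tokens) and tokens[i] == left and tokens[i + 1] == right:
                merged.append(left + right)
                i += 2
            else:
                merged.append(tokens[i])
                i += 1
        tokens = merged
    return tokens


if __name__ == "__main__":
    word = "Transformers"
    print(bpe_tokenize(word, MERGES))        # tokens for the word itself
    print(bpe_tokenize(word[::-1], MERGES))  # tokens for the reversed word
```

With these toy merges the word comes out as two tokens, while its reversal matches no merge rule and falls back to twelve single characters. Reversing the token sequence is therefore not the same as reversing the letters, which is roughly why a model that only sees tokens finds the task hard.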