Hacker News
by lopuhin 1500 days ago
Technically it's not the largest, not char, not rnn... but it's close :)
1 comment

Why not 'char', and not 'largest'?
Not 'char' because it uses BPE (byte pair encoding), so after tokenization you might get ["Transform", "ers"] instead of ["T", "r", "a", ...]. That is also relevant to why it struggles to reverse words: the model sees multi-character tokens, not individual letters. Not 'largest' because there are larger models, such as the Pathways Language Model (PaLM) with 540 billion parameters.
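
To make the tokenization point concrete, here is a minimal sketch of greedy BPE merging in Python. The merge table `TOY_MERGES` and the function `bpe_tokenize` are made up for illustration. Real tokenizers work on bytes and use a learned table with tens of thousands of merges, so the actual split of "Transformers" may differ.

```python
# Toy BPE: start from characters, then repeatedly apply the
# highest-priority merge that matches an adjacent pair.
# TOY_MERGES is hypothetical, NOT the real vocabulary of any model.
TOY_MERGES = [
    ("e", "r"), ("er", "s"), ("T", "r"), ("a", "n"), ("Tr", "an"),
    ("o", "r"), ("f", "or"), ("for", "m"), ("Tran", "s"), ("Trans", "form"),
]


def bpe_tokenize(word, merges):
    """Split `word` into characters and merge pairs in priority order."""
    ranks = {pair: rank for rank, pair in enumerate(merges)}
    tokens = list(word)
    while len(tokens) > 1:
        # Collect every adjacent pair that has a known merge rule.
        candidates = [
            (ranks[pair], i)
            for i, pair in enumerate(zip(tokens, tokens[1:]))
            if pair in ranks
        ]
        if not candidates:
            break
        # Apply the lowest-rank (earliest listed) merge, leftmost on ties.
        _, i = min(candidates)
        tokens[i:i + 2] = [tokens[i] + tokens[i + 1]]
    return tokens


tokens = bpe_tokenize("Transformers", TOY_MERGES)
print(tokens)                        # what a BPE model sees
print(list("Transformers"))          # what a char-level model would see
print("".join(reversed(tokens)))     # reversing tokens is not reversing letters
print("Transformers"[::-1])          # the character-level reversal
```

With this toy table the word comes out as two tokens, and reversing the token order gives a different string from reversing the letters, which hints at why word reversal is awkward for a model that never sees the characters directly.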