by IshanMi 1126 days ago
I think the 8 trillion parameter figure is accurate: Tangora is an N-gram model with a vocab size of 20,000 words and N = 3.

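Parameters for an N-gram model = V^(N-1) * (V-1): each of the V^(N-1) possible contexts needs V-1 free probabilities, since the last one is fixed by the requirement that they sum to 1. Plugging in V = 20,000 words and N = 3 for Tangora, you'd get 7.9996E12.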
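
If you want to check the arithmetic yourself, here is a quick sketch in Python. The function name is my own, and it assumes a full, unpruned N-gram table, which is what the formula above counts.

```python
def ngram_free_parameters(vocab_size: int, n: int) -> int:
    """Free parameters in a full N-gram table (hypothetical helper name)."""
    contexts = vocab_size ** (n - 1)   # every possible (N-1)-word history
    free_per_context = vocab_size - 1  # probabilities sum to 1, so one is determined
    return contexts * free_per_context


tangora = ngram_free_parameters(20_000, 3)
print(tangora)           # 7999600000000
print(f"{tangora:.4E}")  # 7.9996E+12
```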

Most of the parameters are likely zero or close to it, because many 3-grams are possible but unlikely to occur. (However, the aggregate probability of all those rare 3-grams is substantial, so they have to be included.)
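
To get a feel for how sparse the table is, here is a toy sketch. The corpus is a made-up sentence of mine, not Tangora's training data; it only shows that the number of possible 3-grams dwarfs the number actually observed.

```python
from collections import Counter

# Made-up toy corpus, purely for illustration.
corpus = "the cat sat on the mat and the cat ate the rat".split()

vocab = set(corpus)
# Count each consecutive triple of words.
trigrams = Counter(zip(corpus, corpus[1:], corpus[2:]))

possible = len(vocab) ** 3  # every ordered triple over the vocabulary
observed = len(trigrams)    # distinct triples that actually appear

print(possible, observed)                         # 512 10
print(f"{1 - observed / possible:.1%} never seen")  # 98.0% never seen
```

Even with an 8-word vocabulary, about 98% of the table is never observed. The gap only grows as V gets larger.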