by snyhlxde 769 days ago
A bit more intuition on how training works: in natural language, some phrases and collocations, for example "remind ... of ...", "make a decision", "learn a skill", tend to be used together. We can ask LLMs to learn such collocations and other frequently appearing n-grams. After learning, the model can use parallel decoding to predict, in one forward pass, many tokens that frequently appear together.
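
To make "frequently appearing n-grams" concrete, here is a toy sketch that just counts them in a tiny made-up corpus. It is not the training procedure, only an illustration of the kind of regularity the model can pick up; the function name `frequent_ngrams`, the example sentences, and the `min_count` threshold are all assumptions for this example.

```python
from collections import Counter

def frequent_ngrams(sentences, n=2, min_count=2):
    """Return n-grams (as tuples of tokens) seen at least min_count times."""
    counts = Counter()
    for sentence in sentences:
        # Naive whitespace tokenization; a real LLM uses subword tokens.
        tokens = sentence.lower().split()
        # Slide a window of length n over the token list.
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return {gram: c for gram, c in counts.items() if c >= min_count}

# Hypothetical toy corpus, made up for illustration.
corpus = [
    "we need to make a decision today",
    "she had to make a decision quickly",
    "it takes time to learn a skill",
    "children learn a skill by practice",
]

common = frequent_ngrams(corpus, n=3, min_count=2)
for gram, count in sorted(common.items()):
    print(" ".join(gram), count)
```

Spans like "make a decision" show up repeatedly, so once the first token is known the rest is highly predictable. That predictability is what lets several tokens be guessed together instead of one at a time.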