by Guvante 943 days ago
Except that if you narrow to a tiny training set you are back to problems that can be solved almost as quickly with full text search...

Narrow to a tiny training set? What are you talking about now? That has nothing to do with deep learning.

GPT-3, the model GPT-3.5 is derived from, was trained on roughly 300 billion tokens. Its neural network has 96 layers and 175 billion parameters. Each of those 96 stacked layers has an attention mechanism that, for every new token generated, computes an attention score against every token in the context window. GPT-4 is reportedly much bigger than that, though OpenAI hasn't published its size. The scale and complexity of these models is hard to grasp. We're talking about LLMs, not SLMs.
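To put rough numbers on the 96-layer, 175-billion-parameter figures, here is a back-of-the-envelope sketch. It assumes the hyperparameters published for GPT-3 175B (model width 12288, 96 heads, 50257-token vocabulary, 2048-token context) and the common approximation of about 12 x d_model^2 weights per layer. Whether GPT-3.5 matches these exactly is an assumption, and the function names are made up for illustration.

```python
# Back-of-the-envelope numbers for a GPT-3-sized decoder-only transformer.
# Hyperparameters are the ones published for GPT-3 175B; applying them to
# GPT-3.5 is an assumption.
N_LAYERS = 96
D_MODEL = 12288
N_HEADS = 96
VOCAB_SIZE = 50257
CONTEXT_LEN = 2048


def approx_param_count(n_layers, d_model, vocab_size, context_len):
    """Rough parameter count, ignoring biases and layer norms."""
    attention = 4 * d_model * d_model  # Q, K, V and output projections
    mlp = 8 * d_model * d_model        # two linear maps with a 4x hidden width
    embeddings = (vocab_size + context_len) * d_model  # token + position tables
    return n_layers * (attention + mlp) + embeddings


def attention_scores_per_new_token(n_layers, n_heads, tokens_in_context):
    """Query-key dot products needed to generate one token.

    Every head in every layer scores the new token against each token
    currently in the context window.
    """
    return n_layers * n_heads * tokens_in_context


params = approx_param_count(N_LAYERS, D_MODEL, VOCAB_SIZE, CONTEXT_LEN)
scores = attention_scores_per_new_token(N_LAYERS, N_HEADS, CONTEXT_LEN)
print(f"approx parameters: {params / 1e9:.1f}B")
print(f"attention scores per new token at full context: {scores:,}")
```

The estimate lands just under 175 billion, and at a full 2048-token context each generated token needs on the order of tens of millions of attention scores, before counting the far larger cost of the matrix multiplications themselves.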

I misread context window as training set and thought you were switching to SLMs. My mistake.