Hacker News
by smcin, 1231 days ago
> For clarity, this is ONLY the forward pass of the model. There's no training code, batching, kv cache for efficiency, GPU support, etc ...
Neat, but please add one-line comments/docstrings where these missing bits would go.