Hacker News
by nikki93 287 days ago
A relevant paper: https://arxiv.org/abs/2306.11644 ("Textbooks Are All You Need"). The Phi models, and many others, are based on this idea.