by swiftcoder 59 days ago
Does this sort of thing scale? Would a 30B or higher model see similar performance/memory gains under this scheme?