by divamgupta 310 days ago
Mostly model size and input size. Some models that use attention scale as O(N^2) in the input length N.
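
To make the O(N^2) point concrete, here's a rough back-of-the-envelope sketch for standard (non-sparse) self-attention. The function name and the default `head_dim` and `num_heads` values are made up for illustration, not taken from any particular model, and it only counts the two big matrix products in one layer:

```python
def attention_multiply_adds(seq_len: int, head_dim: int = 64, num_heads: int = 8) -> int:
    """Rough multiply-add count for the attention matmuls in one layer.

    Hypothetical helper for illustration; ignores softmax, projections, and MLPs.
    """
    # Q @ K^T: (N x d) times (d x N) costs N * N * d multiply-adds per head.
    scores = seq_len * seq_len * head_dim
    # scores @ V: (N x N) times (N x d) costs another N * N * d per head.
    weighted_sum = seq_len * seq_len * head_dim
    return num_heads * (scores + weighted_sum)


if __name__ == "__main__":
    for n in (256, 512, 1024):
        print(n, attention_multiply_adds(n))
```

Doubling N quadruples the count, which is why input size matters so much for these models, on top of the model size itself.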