by knuppar 380 days ago
One could argue TF-IDF is a special case of an attention layer... just not quadratic at inference or training time, and kinda just a quotient. Yeah, maybe we should go back.
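
To make the "just a quotient" reading concrete, here is a rough sketch, not a claim about how anyone actually implements this. The function name `tfidf_weights` and the toy corpus are made up for illustration, the smoothed idf is only one common variant, and normalizing the weights to sum to 1 is my own addition so they look like attention weights. Each term's weight comes from corpus-wide counts, with no pairwise token-by-token scores.

```python
import math
from collections import Counter


def tfidf_weights(doc, corpus):
    """Return per-term TF-IDF weights for `doc`, normalized to sum to 1.

    Hypothetical helper for illustration; assumes `doc` is non-empty.
    """
    n_docs = len(corpus)
    # Document frequency: how many documents contain each term.
    df = Counter(term for d in corpus for term in set(d))
    # Term frequency: raw counts within this document.
    tf = Counter(doc)
    # Smoothed idf (an assumed variant): log((1 + N) / (1 + df)) + 1.
    raw = {
        term: (count / len(doc)) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
        for term, count in tf.items()
    }
    # Normalize so the weights sum to 1, like a row of attention weights.
    total = sum(raw.values())
    return {term: weight / total for term, weight in raw.items()}


corpus = [
    ["the", "cat", "sat"],
    ["the", "dog", "ran"],
    ["the", "cat", "ran"],
]
weights = tfidf_weights(corpus[0], corpus)
```

Rarer terms get more weight ("sat" over "cat" over "the"), and the cost is one pass over the corpus plus one pass over the document, rather than a score for every pair of tokens.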