Hacker News
by fcanesin 11 days ago
Yes, DFlash is currently a SOTA speculative decoding method. Xiaomi just used it in their MiMo model to reach >1000 tokens/s.