Hacker News
by rfoo 708 days ago
... which is currently the most cost-efficient and environmentally friendly way to do LLM inference [0].

[0] Small footprint time: before B100 ships; for actually large language models; for prefill only; may cause cancer in California.