For inference workloads, it makes a lot more sense to optimize for prefill (time to first token, TTFT) before maxing out memory bandwidth.
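To put rough numbers on that trade-off, here is a back-of-the-envelope sketch. It assumes a dense transformer, the common rule of thumb of about 2 FLOPs per parameter per token, and batch-size-1 decoding where every weight is read once per generated token. The function names and all hardware figures are hypothetical, chosen only for illustration, and the model ignores KV-cache traffic, attention cost, and real-world utilization.

```python
def prefill_seconds(n_params, prompt_tokens, peak_flops):
    """Estimate TTFT assuming prefill is compute-bound.

    Uses the ~2 FLOPs per parameter per token rule of thumb for a
    dense forward pass (an assumption, not a measured figure).
    """
    return 2.0 * n_params * prompt_tokens / peak_flops


def decode_seconds_per_token(n_params, bytes_per_param, mem_bandwidth):
    """Estimate per-token decode time assuming it is bandwidth-bound.

    Assumes batch size 1, so all weights are streamed from memory
    once for every generated token.
    """
    return n_params * bytes_per_param / mem_bandwidth


# Hypothetical setup: 7B-parameter model, 16-bit weights,
# 100 TFLOP/s of usable compute, 1 TB/s of memory bandwidth.
n_params = 7e9
ttft = prefill_seconds(n_params, prompt_tokens=4096, peak_flops=100e12)
tpot = decode_seconds_per_token(n_params, bytes_per_param=2, mem_bandwidth=1e12)

print(f"estimated TTFT for a 4096-token prompt: {ttft:.3f} s")
print(f"estimated decode time per token:        {tpot * 1000:.1f} ms")
```

Under these made-up numbers the prompt takes about as long as a few dozen decoded tokens, so which side to optimize first depends heavily on prompt length, output length, and batch size. Plug in your own figures before drawing conclusions.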