| > Though, I've been saying for a while that the local AI inflection point is the death knell for these frontier labs. "Death knell" is a touch hyperbolic. Hardware that can only run quantized models that take up GBs in VRAM falls short of even an A100 (by almost an order of magnitude[0]), which in turn falls short of what an 8xH100 cluster can do (by another order of magnitude[0]). I'm an avid believer in local LLMs, but I cannot deceive myself: data center accelerators will win on power dissipation numbers alone[1], even with generous allowances for higher efficiency on Apple chips, and assuming the Apple efficiency advantage persists on the same TSMC process node. 0. Based on my unscientific fine-tuning experiments across local and rented GPUs. YMMV for inference. 1. Unless Apple surprises everyone and brings back the Xserve with an M7. Otherwise, laptop and desktop form factors simply can't dump heat fast enough to compete head-to-head, and will be designed for lower input wattage. |
The frontier models are faster and better at coding, but not so much that I’ll pay $200/month for them.