| Author here. I've updated the article based on your feedback. Thank you. Key corrections: (1) Ollama GPU usage: I was wrong. Ollama IS using the GPU (verified at 96% utilization), so my "CPU-optimized backend" claim was incorrect. (2) FP16 vs BF16: enum caught the critical gap. I trained with BF16 and tested inference with FP16 (broken), but never tested BF16 inference. "GPU inference fundamentally broken" was an overclaim; it should read "FP16 has issues, BF16 untested (likely works)." (3) llama.cpp: veber-alex's link to the official benchmark proves it works. My issues were likely version-specific, not representative. (4) ARM64+CUDA maturity: bradfa was right about the Jetson history. ARM64+CUDA is mature; the new combination is Blackwell+ARM64, not ARM64+CUDA itself. The HN community caught my incomplete testing, overclaimed conclusions, and factual errors. Ship early, iterate publicly, accept criticism gracefully. Thanks especially to enum, veber-alex, bradfa, furyofantares, stuckinhell, jasonjmcghee, eadwu, and renaudr. The article is significantly better now. |