That's great. I have been eyeing a Strix Halo and was wondering how well smaller models are doing. This is great news from the perspective of running local agents.
Is the performance decent? I'm looking at using it with 30B coding models and a local agent framework like goose, to see if we as developers can do this locally instead of risking leaking code to the big models.
The chip in general is fast; it builds LLVM in ~12 minutes. Whisper on it runs at least in real time, but I only ran the stream binary before sending the box away to SC25. I expect it will need some work to exploit the zero-copy access the APU permits, so it probably will be fast but isn't just yet, at least on my toolchain.