EDIT: Looks like Granite 4 hybrid architecture support was added to llama.cpp back in May: https://github.com/ggml-org/llama.cpp/pull/13550
Yes and no. They've written their own "engine" using GGML libraries directly, but fall back to llama.cpp for models the new engine doesn't yet support.
Yes and no. They've written their own "engine" using GGML libraries directly, but fall back to llama.cpp for models the new engine doesn't yet support.