I tried running a model larger than ram size and it loads some layers into the gpu but offloads to the cpu also. It's faster than cpu alone for me, but not by a lot.