Llama 7B is quite dumb. With the 13B you'd get significantly better results, and you can train a QLoRA on a single 3090 (I think it's possible with even less VRAM, but I'm not sure).
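Here's a rough back-of-the-envelope check of why a 13B QLoRA can fit on a 24 GB 3090. It's only a sketch, and the numbers are assumptions rather than measurements: 4-bit base weights, a Llama-13B-like shape (~13e9 params, hidden size 5120, 40 layers), LoRA rank 16 on the four attention projections, and fp32 Adam state for the adapters only. It ignores activations, quantization constants, and CUDA/framework overhead, so real usage will be higher.

```python
# Back-of-the-envelope VRAM estimate for QLoRA on a 24 GB RTX 3090.
# All model-shape numbers below are assumptions (Llama-13B-like), not measurements.

GB = 1e9  # decimal gigabytes, for simplicity


def base_weights_gb(n_params: float, bits: float = 4.0) -> float:
    """Frozen base weights stored at `bits` bits per parameter (4 for QLoRA)."""
    return n_params * bits / 8 / GB


def lora_param_count(hidden: int, n_layers: int, r: int, n_proj: int = 4) -> int:
    """Adapter params for n_proj square (hidden x hidden) projections per layer.
    Each LoRA pair adds A (r x hidden) + B (hidden x r) = 2 * r * hidden params."""
    return n_layers * n_proj * 2 * r * hidden


def lora_train_state_gb(n_lora: int, bytes_per_param: int = 16) -> float:
    """Assumes fp32 weights + fp32 grads + two fp32 Adam moments = 16 bytes/param."""
    return n_lora * bytes_per_param / GB


# Assumed Llama-13B-like shape: ~13e9 params, hidden size 5120, 40 layers.
weights = base_weights_gb(13e9)  # 6.5 GB for the 4-bit base model
adapters = lora_train_state_gb(lora_param_count(5120, 40, r=16))
total = weights + adapters

print(f"weights {weights:.2f} GB + LoRA state {adapters:.2f} GB = {total:.2f} GB")
print(f"left for activations/overhead on 24 GB: {24 - total:.2f} GB")
```

The point is that the frozen 4-bit weights dominate and the trainable adapter state is small, so most of the 3090's memory is left for activations, which you control via batch size and sequence length.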
Ooof. I'd expect this to cost around 5 bucks on RunPod using a single 3090.
I use axolotl for training. I didn't check your notebook, but axolotl likely ships with defaults that are better optimized for speed and VRAM than what you're doing.