I was under the impression that it was mostly a matter of GPU VRAM, and that once the model is loaded it could produce output quickly? I'm probably oversimplifying things...
The latest gpt-3.5-turbo model generates output very quickly and cheaply (in part due to some recently discovered optimization techniques; older versions cost 10x more). While the hardware required to run GPT-4 is currently unknown, it generates considerably more slowly on average, and its much higher price points to higher hardware costs.