If you're broke, get a refurbished/used Nvidia P40. Same amount of vram as a 3090, between 4 and 10 times cheaper depending on how cheap you can find a 3090.
Granted it's slower of course, but best bang for your buck on vram, so you can run larger models than on a smaller bit faster card might be able to. (Not an expert.)
Edit: if using in desktop tower, you'll need to cool it somehow. I'm using a 3D printed fan thingy, but some people have figured out how to use a 1080 ti APU cooler with it too.
If you're able to purchase a separate GPU, the most popular option is to get an NVIDIA RTX3090 or RTX4090.
Apple Mac M2 or M3's are becoming a viable option because of MLX https://github.com/ml-explore/mlx . If you are getting an M series Mac for LLMs, I'd recommend getting something with 24GB or more of RAM.
You don’t need MLX for this. Ollama, which is based on llama.cpp is GPU accelerated on a Mac. In particular it has better performance on quantized models. MLX can be used for eg fine tuning etc. It’s a bit faster than PyTorch for that.
Granted it's slower of course, but best bang for your buck on vram, so you can run larger models than on a smaller bit faster card might be able to. (Not an expert.)
Edit: if using in desktop tower, you'll need to cool it somehow. I'm using a 3D printed fan thingy, but some people have figured out how to use a 1080 ti APU cooler with it too.