I haven't noticed 4-bit quantization affecting the quality of LLaMA-7B. It still produces very coherent outputs. The trick is having a good example in your prompt so the model has a clear idea of what's expected of it.
Quality and quantity: I've had the best luck cramming a bunch of examples into the input, just like with GPT-J, where you're only working with 6B parameters. Make sure the format stays consistent across examples, and ideally present the text in the shape you'd encounter it if you found it on a webpage somewhere.
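Here's a rough sketch of that kind of prompt in Python, assuming a made-up sentiment task. The function name `build_few_shot_prompt`, the header text, and the `Review:` / `Sentiment:` layout are all just illustrative choices, not anything specific to LLaMA or GPT-J. The point is only that every example uses the exact same shape and the prompt ends where the model should start writing.

```python
def build_few_shot_prompt(examples, query, header="Customer reviews and their sentiment"):
    """Build a few-shot prompt where every example uses the same layout.

    examples: list of (review_text, sentiment_label) pairs (hypothetical task).
    query:    the new review the model should label.
    """
    # Each solved example is formatted identically, like entries on a web page.
    blocks = [f"Review: {text}\nSentiment: {label}" for text, label in examples]
    # The final block repeats the layout but leaves the answer open for the model.
    blocks.append(f"Review: {query}\nSentiment:")
    return header + "\n\n" + "\n\n".join(blocks)


examples = [
    ("The battery died after two days.", "negative"),
    ("Setup took five minutes and it just works.", "positive"),
    ("Does what it says, nothing more.", "neutral"),
]

prompt = build_few_shot_prompt(examples, "Screen is gorgeous but the speakers are tinny.")
print(prompt)
```

You'd feed `prompt` to whatever inference setup you're running and cut the generation at the first blank line, since the model will likely keep inventing more reviews if you let it.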