The Llama 2 paper's benchmarks do mention that Llama 2 is much worse at any kind of code generation than the OpenAI models; they were optimizing for conversational / natural-language use cases.
Interesting, what other openly licensed models are better at codegen? Or perhaps there is a version of Llama 2 already fine-tuned for codegen? There is StarCoder, but I have not had great results with that one either in my brief experiments.