It would be great if you could share an example of the inconsistent output problem -- we ran into it as well. In our experience, GPT-4 was much better than GPT-3.5 in output quality.