I’ve also recently started in this space: I’m building an agent for a client, one that can communicate in multiple languages.
On evaluating your agent: we are still documenting the details, but this should give you some idea of an approach: https://news.ycombinator.com/item?id=47232903
Also, you might find this useful - https://open.substack.com/pub/bytebytego/p/how-roblox-uses-a...