Super cool! If all the components of this metric are models, thoughtful versioning and documenting it will be important. What are your thoughts on this ?
This is actually something we struggled with, and why we released everything this way. There is something to say about why BLEU and ROUGE have stood the test of time, they are extremely simple to use and static. Hopefully we can bridge this gap.