From https://huggingface.co/databricks/dolly-v2-12b#benchmark-met..., it seems like dolly-v2-12b's benchmark results are actually slightly worse than dolly-v1-6b's.
A commercially viable instruction-tuned LLM is still a huge deal.