by stegrot 1775 days ago
Here are some draft guidelines for validation that have been translated a lot: https://discourse.mozilla.org/t/discussion-of-new-guidelines...

But you are right, the process has some flaws. Maybe we can automatically check the dataset for some common errors once an STT system is ready for a language?
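
To make that idea concrete, here is a rough sketch of one possible check: compare the sentence a speaker was asked to read with what an STT system heard, and flag clips where the two differ a lot. Everything here is an assumption for illustration. The function names, the 0.5 threshold, and the choice of word error rate as the metric are all hypothetical, and the STT transcript is simply passed in as a string.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # Levenshtein distance over words, keeping one row at a time.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, 1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,                          # word missing from the transcript
                cur[j - 1] + 1,                       # extra word in the transcript
                prev[j - 1] + (ref_word != hyp_word)  # substituted word
            ))
        prev = cur
    # max(..., 1) avoids dividing by zero for an empty reference.
    return prev[-1] / max(len(ref), 1)


def flag_for_review(prompt: str, transcript: str, threshold: float = 0.5) -> bool:
    """Hypothetical rule: send the clip back to humans if the WER is high."""
    return word_error_rate(prompt, transcript) > threshold
```

A flagged clip would not be rejected outright, only queued for another human look, since the STT system can be wrong too.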

The only other option I can think of is a validation process that includes more people per sentence. Right now, only two people validate a sentence, and if they disagree a third person decides. We could at least double-check sentences with one "no" vote one more time.
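
As a minimal sketch of the rule described above (two votes, with a third as tie-breaker) plus the suggested extra check, assuming votes are stored as a simple list of booleans. The function names and return values are made up for illustration and are not taken from the Common Voice code base.

```python
from typing import List


def decide(votes: List[bool]) -> str:
    """Current rule as described: two matching votes settle a clip."""
    yes = sum(votes)
    no = len(votes) - yes
    if yes >= 2:
        return "valid"
    if no >= 2:
        return "invalid"
    return "pending"  # fewer than two matching votes so far


def needs_recheck(votes: List[bool]) -> bool:
    """Suggested rule: an accepted clip with a "no" vote gets one more review."""
    return decide(votes) == "valid" and not all(votes)
```

With this, `[True, False, True]` is accepted by `decide` but still comes back as needing a recheck, while `[True, True]` passes without one.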


The community guidelines are good, but they’re hidden away on the forum. I asked them for years to make those the official guidelines and link them prominently on the CV site, but they never did.

However, Hillary, the new community manager, seems good and she’s making a lot of positive changes so hopefully this will be addressed soon.

Long term, the best approach may be some kind of user onboarding before people can record or validate.

Hey,

Thank you for the compliment and feedback.

Following community feedback, voice validation criteria are now available on the Common Voice platform (released as part of the recent dataset).

This is one of many steps we are taking to improve Common Voice for contributors and everyone using the dataset.