by rasz 1775 days ago
So in other words, we provide data to Mozilla for free, and Mozilla turns around and sells it to Nvidia for millions to fund... not open source, they killed that, so, umm, to fund the CEO's salary?

You seem to imply that Nvidia are paying for data that is freely available.

Anyone can use the Common Voice data within the terms of the license, and NVIDIA contributing towards the continued gathering of data (which will continue to be made publicly available) won't change that.

It's a huge shame that Mozilla didn't continue the DeepSpeech project but Coqui is taking on the mantle there and there are plenty of others working on open source solutions too, all whilst the existence of CV will make a big difference to research, in the academic, commercial and open source spheres.

Coqui is phenomenally good and well done, so this new data should lower the barrier to entry for the represented languages.
> and sells it

If that were true, it would be a profoundly bad purchase for NVidia, since the data is already freely licensed and available for anyone to use at no cost.

This is like saying that Epic "bought" Blender when they gave it a development grant, or that Google contributing patches to upstream Linux means they own it now. Mozilla didn't give NVidia any kind of special license, when NVidia contributes data to Common Voice they're doing so under Common Voice's license, not their own.

We want to encourage more companies to treat software and training data as a public commons that is collectively maintained, this is a good thing.

It's the kind of "bad" Nvidia purchase like when they pay game publishers to incorporate PhysX/CUDA/HairWorks/GameWorks, resulting in:

https://techreport.com/news/14707/ubisoft-comments-on-assass...

https://techreport.com/review/21404/crysis-2-tessellation-to...

https://arstechnica.com/gaming/2015/05/amd-says-nvidias-game...

Here it appears they purchased this https://venturebeat.com/2021/04/12/mozilla-winds-down-deepsp...

This is silly. Common Voice is not adding NVidia-specific features; what would that even look like for a database? There is no comparison to be made between donating resources to an openly licensed database and encouraging developers to optimize their games for proprietary APIs.

And the assumption that shutting down DeepSpeech was specifically for NVidia's benefit seems like a fairly large leap to me, given that DeepSpeech is already mature, is still being developed under Coqui.ai, and is surrounded by a wide diversity of other deep learning projects that also aren't controlled by NVidia.

Decreasing barriers to entry for those models and providing raw data is probably the right thing for Mozilla to be focusing on right now. Any team can build a language model; only organizations like Mozilla can coordinate mass data collection for those models.