
by albertzeyer 1279 days ago

I tried LibriSpeech, a very common speech recognition dataset, with both HF and TFDS.

TFDS performed extremely poorly.

First, it failed because the official hosting server allows only 5 simultaneous connections, and TFDS ignored that limit and opened up to 50 simultaneous downloads, which broke. I wonder whether anyone actually tested this.
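The failure described here is a client that opens more parallel connections than the server permits. A minimal sketch of the usual fix, assuming a plain Python downloader (this is not TFDS's actual download code), is to cap concurrency with a fixed-size thread pool. `fake_fetch`, `download_all`, `ConcurrencyTracker`, and the `example.invalid` URLs are hypothetical placeholders; the simulated fetch only sleeps, so the sketch runs without network access.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

# Assumption: the host accepts at most 5 concurrent connections per client.
MAX_CONNECTIONS = 5


class ConcurrencyTracker:
    """Records the peak number of fetches running at the same time."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.active = 0
        self.peak = 0

    def __enter__(self):
        with self._lock:
            self.active += 1
            self.peak = max(self.peak, self.active)
        return self

    def __exit__(self, *exc):
        with self._lock:
            self.active -= 1
        return False


def fake_fetch(url: str, tracker: ConcurrencyTracker) -> str:
    """Hypothetical stand-in for an HTTP download; sleeps to simulate latency."""
    with tracker:
        time.sleep(0.01)
        return url


def download_all(urls, tracker, max_connections=MAX_CONNECTIONS):
    # The pool has max_connections worker threads, so no more than that many
    # fetches are in flight at once, however many URLs are queued.
    with ThreadPoolExecutor(max_workers=max_connections) as pool:
        return list(pool.map(lambda u: fake_fetch(u, tracker), urls))


urls = [f"https://example.invalid/part-{i}.tar.gz" for i in range(50)]
tracker = ConcurrencyTracker()
results = download_all(urls, tracker)
print(f"{len(results)} files, peak concurrency {tracker.peak}")
```

Queuing all 50 files but bounding the worker count keeps throughput close to the server's limit without tripping it; `pool.map` also returns results in input order.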

Then the preparation step needs a machine with 30GB, so it may fail on your own computer. This is where I stopped: https://github.com/tensorflow/datasets/issues/3887. It might be fixed now, but it took them 8 months to respond to my issue.

On HF, it just worked. There was a minor issue with how the dataset was split, but it has since been fixed, and their response was fast and very helpful.