Hacker News
by nyamhap 3332 days ago
My bottleneck is still the speed of reading data from JSON. I wonder whether I should wait for those features to be built out here, or go down the path of writing a custom data reader in C++.
3 comments

If your data has a little extra structure that isn't shared by JSON in general, you could probably get serious performance gains by rolling your own reader.
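To illustrate what "extra structure" can buy you, here is a minimal sketch assuming a hypothetical fixed schema where every line is {"label": <int>, "features": [<num>, ...]} with keys always in that order and no string values. Under that assumption the parser can skip general JSON handling and just slice out the label and the one list. It is written in Python only to show the idea; the names parse_record_fast and parse_record_json are made up for this example, and any real speedup would come from doing the same thing in compiled code (e.g. the C++ reader mentioned above). In CPython this may well not beat the C-accelerated json module, so measure before adopting it.

```python
import json
from typing import List, Tuple


def parse_record_fast(line: str) -> Tuple[int, List[float]]:
    """Parse one record by relying on a fixed, known layout instead of full JSON.

    ASSUMPTION (hypothetical schema): every line looks exactly like
        {"label": <int>, "features": [<num>, <num>, ...]}
    with the keys in this order, no nesting beyond the one list, and no
    string values that could contain ':', ',' or brackets.
    """
    colon = line.index(":")             # colon right after "label"
    comma = line.index(",", colon)      # end of the label value
    label = int(line[colon + 1:comma])  # int() tolerates surrounding spaces
    lb = line.index("[", comma)         # start of the feature list
    rb = line.rindex("]")               # end of the feature list
    body = line[lb + 1:rb]
    # float() also tolerates surrounding spaces; an empty list yields []
    features = [float(tok) for tok in body.split(",")] if body.strip() else []
    return label, features


def parse_record_json(line: str) -> Tuple[int, List[float]]:
    """General-purpose reference parser, useful for checking the fast path."""
    obj = json.loads(line)
    return int(obj["label"]), [float(x) for x in obj["features"]]
```

Keeping the general parser around makes it easy to spot-check the shortcut against real data, since the shortcut silently assumes the layout never changes.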
Using multiprocessing to read and process the JSON files (or any type of data), then feeding the output into a shuffle_batch* op, works great for me.
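A rough sketch of that pattern using only the standard library: worker processes parse JSON lines in parallel, and a shuffle buffer then emits random mini-batches. The shuffle_batches helper is a hypothetical stand-in that mimics the buffering idea behind TensorFlow's shuffle_batch (keep at least min_after_dequeue records around and draw from them at random); it is not TensorFlow's implementation, and in a real pipeline you would hand the parsed records to the TF op instead. All function names here are made up for the example.

```python
import json
import random
from multiprocessing import Pool
from typing import Iterable, Iterator, List


def parse_line(line: str) -> dict:
    """Parse one JSON record; runs inside a worker process."""
    return json.loads(line)


def shuffle_batches(records: Iterable[dict], batch_size: int,
                    min_after_dequeue: int, seed: int = 0) -> Iterator[List[dict]]:
    """Hypothetical stand-in for a shuffle_batch-style queue.

    Keeps more than `min_after_dequeue` records buffered and emits randomly
    chosen ones in batches of `batch_size`; the last batch may be smaller.
    """
    rng = random.Random(seed)
    buffer: List[dict] = []
    batch: List[dict] = []
    for rec in records:
        buffer.append(rec)
        if len(buffer) > min_after_dequeue:
            # Swap a random element to the end and pop it (O(1) removal).
            i = rng.randrange(len(buffer))
            buffer[i], buffer[-1] = buffer[-1], buffer[i]
            batch.append(buffer.pop())
            if len(batch) == batch_size:
                yield batch
                batch = []
    # Input exhausted: drain what is left in the buffer, in random order.
    rng.shuffle(buffer)
    for rec in buffer:
        batch.append(rec)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch


def parallel_json_batches(lines: List[str], batch_size: int = 32,
                          min_after_dequeue: int = 100,
                          workers: int = 4) -> Iterator[List[dict]]:
    """Parse `lines` across `workers` processes, then shuffle into batches."""
    with Pool(workers) as pool:
        # imap_unordered streams results back as soon as each chunk is parsed.
        parsed = pool.imap_unordered(parse_line, lines, chunksize=256)
        yield from shuffle_batches(parsed, batch_size, min_after_dequeue)
```

As with any multiprocessing code, call parallel_json_batches from under an `if __name__ == "__main__":` guard so that platforms which spawn fresh worker processes don't re-run the pipeline on import. Using imap_unordered rather than map lets batching start before the whole file is parsed, and since the records get shuffled anyway, losing the input order costs nothing.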
You could use Tensorflow-on-Spark to read your JSON into an RDD in Spark. The Tf-RDD-Reader would then be in memory and could feed your training.