Hacker News
by riku_iki 2427 days ago
That's why I mentioned scale in my first comment. For sub-TB data sizes, on a machine with a 16-core CPU and NVMe RAID (you can get one for less than $1k nowadays), PG will be just fine.

Also, as I mentioned, in a typical ML pipeline you can generate the n-grams in your model's input function (the Dataset API in TF), so you don't need to store them anywhere.
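
To make that concrete, here is a rough sketch of the idea in plain Python rather than actual TF code. The names `ngrams`, `input_fn` and `rows` are made up for illustration, and the sample rows just stand in for whatever your query returns. The point is only that the n-grams are derived lazily from the raw text as the model reads it, so nothing but the raw text is ever stored.

```python
from typing import Iterable, Iterator, List, Tuple


def ngrams(tokens: List[str], n: int) -> List[str]:
    """Return all contiguous n-grams of `tokens`, each joined by a space."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def input_fn(rows: Iterable[Tuple[str, int]],
             n: int = 2) -> Iterator[Tuple[List[str], int]]:
    """Lazily yield (ngram_features, label) pairs from raw (text, label) rows.

    N-grams are computed on the fly for each row, never materialized in
    the database.
    """
    for text, label in rows:
        tokens = text.lower().split()  # naive whitespace tokenizer (assumption)
        yield ngrams(tokens, n), label


# Hypothetical rows, standing in for the result of a database query.
rows = [("the quick brown fox", 1), ("hello world", 0)]
examples = list(input_fn(rows))
```

In TF you would do the same thing inside the input pipeline, for example by wrapping a generator like this with `tf.data.Dataset.from_generator`, or by calling `tf.strings.ngrams` in a `Dataset.map` step.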