by ricardobeat 1988 days ago
The linked CLIP paper says it was trained on 256 GPUs for 2 weeks. There's no mention of the size of the resulting trained model.