by cjbprime 793 days ago
(For a model of GPT-4's size, it could also be 8 nodes with several GPUs each, with each node hosting a single expert.)
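
To make that layout concrete, here's a rough Python sketch of "one expert per node". It is purely illustrative and not a claim about how GPT-4 is actually deployed: `GPUS_PER_NODE = 8` is my own assumption (the comment only says "several"), `top_k=2` is an assumed routing choice, and `expert_placement` and `nodes_for_token` are hypothetical helper names.

```python
# Illustrative only: one expert per node, each expert sharded over that node's GPUs.
NUM_EXPERTS = 8      # matches the "8 nodes" figure in the comment
GPUS_PER_NODE = 8    # assumption; the comment only says "several"


def expert_placement(num_experts=NUM_EXPERTS, gpus_per_node=GPUS_PER_NODE):
    """Map each expert id to the (node, gpu) pairs holding its shards.

    Expert i lives entirely on node i, so the node id equals the expert id.
    """
    return {
        expert: [(expert, gpu) for gpu in range(gpus_per_node)]
        for expert in range(num_experts)
    }


def nodes_for_token(router_scores, top_k=2):
    """Return the node ids a token is sent to: its top_k highest-scoring experts."""
    ranked = sorted(
        range(len(router_scores)),
        key=lambda expert: router_scores[expert],
        reverse=True,
    )
    return ranked[:top_k]


placement = expert_placement()
# One token's (made-up) router scores over the 8 experts.
scores = [0.10, 0.50, 0.05, 0.20, 0.05, 0.03, 0.04, 0.03]
targets = nodes_for_token(scores)
```

With this layout, routing a token to an expert is the same thing as routing it to a node, so the cross-node traffic is just the token activations going out and coming back.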