Hacker News new | ask | show | jobs
by sp332 812 days ago
Even better is the result on page 7 that perplexity drops faster by wall-clock time. Even if you're getting fewer iterations per hour of rented GPU time, you're still coming out ahead in model performance.