Hacker News
by dimitry12 2976 days ago
An important hidden cost here is coding a model which can take advantage of mixed-precision training. It is not trivial: you have to empirically discover scaling factors for loss functions, at the very least.
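
To illustrate why the scaling factor matters, here is a minimal sketch that emulates FP16 rounding with Python's `struct` module. The helper names (`to_fp16`, `fp16_grad`) and the sample gradient values are made up for illustration; they are not taken from any framework, and real training code would lean on the framework's own mixed-precision tooling.

```python
import math
import struct


def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision value.

    Values too large for FP16 are mapped to +/-inf (struct raises
    OverflowError for them, so we translate that here).
    """
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        return math.copysign(math.inf, x)


def fp16_grad(true_grad: float, loss_scale: float = 1.0) -> float:
    """Hypothetical helper: what survives of a gradient stored in FP16.

    Scaling the loss scales every gradient by the same factor, so we
    scale, round to FP16, then unscale in full precision.
    """
    scaled = to_fp16(true_grad * loss_scale)
    return scaled / loss_scale


tiny = 1e-8  # illustrative gradient below FP16's smallest subnormal
print(fp16_grad(tiny))                     # underflows without scaling
print(fp16_grad(tiny, loss_scale=1024.0))  # survives with scaling
print(fp16_grad(100.0, loss_scale=1024.0)) # same scale overflows a large gradient
```

A factor that is too small lets small gradients flush to zero, while one that is too large pushes big gradients past the FP16 maximum, which is why the value tends to be found by trial for each model and loss.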

It's great that there is now a wider choice of (pre-trained?) models formulated for mixed-precision training.

When I was comparing Titan V (~V100) and 1080ti 5 months ago, I was only able to get a 90% increase in forward-pass speed on Titan V (same batch size), even with mixed precision. And that was for an attention-heavy model, where I expected Titan V to show its best. Admittedly, I was able to use almost double the batch size on Titan V when doing mixed precision. And Titan V draws half the power of 1080ti too :)

In the end, my conclusion was: I am not a researcher, I am a practitioner - I want to do transfer learning or just use existing pre-trained models, without tweaking them. For that, tensor cores give no benefit.

2 comments

Author here.

Yes, thanks for mentioning that! That's what the article is alluding to at the end. There's also something like a "cost-to-model" and that's influenced by how easy it is to make efficient use of the performance and how much tweaking it needs. It's also influenced by the framework you use... However, that's difficult to compare and almost impossible to measure.

How did you get your hands on a Titan V 5 months ago? I still can't find it anywhere in retail in the EU...

It was in stock on and off, and I was able to order it directly from Nvidia US.

After 59 days of playing with it, I sent it back (I initiated the return on the 30th day, after I had already figured out that it doesn't live up to the hype, and then had another 30 days to actually send it back).

With $3,000 I can buy 4 1080ti's, while only two are necessary to beat Titan V (in Titan V's best game). I only bought one though. NowInStock.net helped with buying 1080ti directly from Nvidia.
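
The arithmetic behind that comparison can be sketched roughly as follows. This assumes the Titan V costs the full $3,000, that the 90% forward-pass speedup mentioned earlier carries over to the whole workload, and that two 1080ti cards scale perfectly linearly; none of these is guaranteed in practice, and the function name is made up for illustration.

```python
BUDGET_USD = 3000.0      # from the comment: buys one Titan V or four 1080ti
NUM_1080TI = 4
TITAN_V_SPEEDUP = 1.9    # "90% increase" relative to a single 1080ti


def speed_per_1000_usd(relative_speed: float, cost_usd: float) -> float:
    """Relative throughput (1080ti = 1.0) bought per $1,000 spent."""
    return relative_speed / cost_usd * 1000.0


price_1080ti = BUDGET_USD / NUM_1080TI   # implied price per card
two_1080ti_speed = 2 * 1.0               # assumption: perfect linear scaling

titan_v_value = speed_per_1000_usd(TITAN_V_SPEEDUP, BUDGET_USD)
gtx_value = speed_per_1000_usd(1.0, price_1080ti)

print(f"implied 1080ti price: ${price_1080ti:.0f}")
print(f"two 1080ti vs Titan V: {two_1080ti_speed:.1f}x vs {TITAN_V_SPEEDUP:.1f}x")
print(f"1080ti value advantage: {gtx_value / titan_v_value:.2f}x per dollar")
```

Under those assumptions, two cards edge out the Titan V at half the price, which is the point being made; multi-GPU overhead would narrow that margin.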