I have recently been considering buying a 1080Ti; it also appears to have the highest cores per $. Is the 1080Ti then really the most efficient card for general-purpose HPC work (numerics) on Linux? The 1080 is a step down but also competitively priced. Curious about your thoughts, as I couldn't find a guide on this on the web.
edit: The Tesla K20 is also in competition in my view (despite the much higher cost) due to its focus on higher double-precision performance.
We do a lot of work on video encoding.
We have had a K80, Titan X (Maxwell), Titan X (Pascal), 1080, 1080Ti, and others (including render farms based on GTX980s).
General thoughts: Don't expect to get _any_ information out of Nvidia unless you are running everything on their hardware compatibility lists (i.e. including the server case).
Do not mix & match consumer-rated gear with 'professional' gear. (i.e. If you put the K80 in a system with a GTX1080, then the Nvidia drivers restrict the number of available processing cores to 2 per device)
Airflow: The Teslas run HOT even with a blower attached, and/or installed in the recommended case.
NVENC: the Pascal-based cards are far faster AND better than the Kepler-based cards.
For anybody else doing video encoding work: Grab an Nvidia Jetson TK1 dev kit. This little card is a MONSTER and can handle everything we throw at it without breaking a sweat.
> Do not mix & match consumer-rated gear with 'professional' gear. (i.e. If you put the K80 in a system with a GTX1080, then the Nvidia drivers restrict the number of available processing cores to 2 per device)
Huh? Not sure what exactly you mean by "number of processing cores"?
I use two development boxes on a regular basis with Teslas side-by-side with GeForce cards and they all work just fine.
I was unintentionally vague. I should have said 'output'.
At the bottom of this post I have copy/pasted output from my original tests.
That was not CUDA; the task I was working on specifically (and only) used the NVENC encoder (via ffmpeg). I don't know if the situation has changed, but these were my observations.
All of my tests were done in 2015, so the situation might be different now.
The K80 could output up to 4 "streams" (aka outputs or threads) at once. A 780Ti can only do 2.
According to nvidia-smi, the K80 "appears" to be 2 GPUs on one card. You can actually designate which GPU you want to process ffmpeg streams on.
As soon as you had both devices installed in the same PC, the Nvidia drivers limited the K80 so that it too would only output up to 2 streams per GPU.
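For anyone wanting to try the per-GPU selection described above, here is a minimal sketch that builds an ffmpeg command line pinned to one GPU. It assumes a current ffmpeg build with the `h264_nvenc` encoder and its `-gpu` option; the 2015 build used in these tests may have named things differently. The helper name `nvenc_command` and the file names are made up for illustration. GPU indices can be listed with `nvidia-smi -L`.

```python
from typing import List


def nvenc_command(src: str, dst: str, gpu_index: int,
                  encoder: str = "h264_nvenc") -> List[str]:
    """Build an ffmpeg argument list that encodes src to dst on one GPU.

    Assumption: the ffmpeg build exposes the NVENC encoder under the
    given name and accepts its private -gpu option (0 = first GPU).
    """
    if gpu_index < 0:
        raise ValueError("gpu_index must be 0 or greater")
    return [
        "ffmpeg",
        "-i", src,              # input file
        "-c:v", encoder,        # hardware encoder to use
        "-gpu", str(gpu_index), # which NVENC-capable GPU runs the encode
        dst,                    # output file
    ]


# Hypothetical example: encode on the second GPU of a K80.
cmd = nvenc_command("input.mp4", "output.mp4", 1)
print(" ".join(cmd))
```

The list can be handed to `subprocess.run(cmd, check=True)` to actually launch the encode; starting several such processes at once is one way to see where the per-GPU stream limit kicks in.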
IIRC, there was even a status message that got displayed when installing the Nvidia binary blob (paraphrasing from memory from 3 years ago):
Warning consumer card detected. Limiting available GPU's
Here is a copy/paste dump of my findings at that time.
(The formatting is screwy with the nvidia-smi output.)
If you're measuring cores per $ or GFLOP per $ (single or double precision), the ordering (decreasing) is the same: 1080 Ti, 1080, 1070. The Tesla, however, wins on double-precision GFLOP per $. At least according to the current lowest Newegg prices. It is all quite close, though.
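A rough sketch of how such a per-dollar ranking can be computed. The CUDA core counts are the published ones for these cards, but the prices are placeholders picked for illustration, not the Newegg prices referred to above; plug in current prices before drawing any conclusion. The names `Card` and `rank_by_cores_per_dollar` are made up for this example.

```python
from typing import List, NamedTuple


class Card(NamedTuple):
    name: str
    cuda_cores: int
    price_usd: float  # placeholder value, NOT a real quoted price


def rank_by_cores_per_dollar(cards: List[Card]) -> List[Card]:
    """Return the cards sorted by CUDA cores per dollar, best first."""
    return sorted(cards, key=lambda c: c.cuda_cores / c.price_usd,
                  reverse=True)


cards = [
    Card("GTX 1070", 1920, 400.0),
    Card("GTX 1080", 2560, 500.0),
    Card("GTX 1080 Ti", 3584, 680.0),
]

for card in rank_by_cores_per_dollar(cards):
    print(f"{card.name}: {card.cuda_cores / card.price_usd:.2f} cores per $")
```

The same function works for GFLOP per $ if the core count is swapped for a measured or published GFLOP figure.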
I may be totally wrong here, but from what I remember the Titan line handles up to double-precision floats whereas the GTX line handles only single? Games don't really need double precision, so it's overkill for GTX. Can anyone confirm?
On the other hand, video cards meant for games generally can't take sustained load. By that I mean you can't run them at 90-100% load for days on end in e.g. a render farm; they invariably melt. You're paying for better build quality and for having enough money for a render farm.