by sxp 3360 days ago
For comparison, the 1080Ti is ~11.3TFLOPS + 11GB RAM @ $700 vs the Titan at ~12.1TFLOPS + 12GB RAM @ $1200. That is ~7% more compute and ~9% more RAM for ~70% more money.

https://en.wikipedia.org/wiki/GeForce_10_series#GeForce_10_....
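
As a quick check of that comparison, here is a minimal sketch that recomputes the ratios from the figures quoted in the comment. The dictionary layout and the `percent_more` helper are illustrative names, not anything from the linked page, and the inputs are the approximate numbers given above rather than measured values.

```python
# Approximate figures as quoted in the comment (not measured values).
cards = {
    "1080 Ti": {"tflops": 11.3, "ram_gb": 11, "price_usd": 700},
    "Titan": {"tflops": 12.1, "ram_gb": 12, "price_usd": 1200},
}


def percent_more(new, old):
    """Return how much larger `new` is than `old`, in percent."""
    return (new / old - 1.0) * 100.0


ti, titan = cards["1080 Ti"], cards["Titan"]
extra_compute = percent_more(titan["tflops"], ti["tflops"])
extra_ram = percent_more(titan["ram_gb"], ti["ram_gb"])
extra_price = percent_more(titan["price_usd"], ti["price_usd"])

print(f"Titan vs 1080 Ti compute: +{extra_compute:.1f}%")
print(f"Titan vs 1080 Ti RAM:     +{extra_ram:.1f}%")
print(f"Titan vs 1080 Ti price:   +{extra_price:.1f}%")

# Single-precision GFLOPS per dollar for each card.
for name, spec in cards.items():
    gflops_per_usd = spec["tflops"] * 1000 / spec["price_usd"]
    print(f"{name}: {gflops_per_usd:.1f} GFLOPS per $")
```

With these inputs the compute gap comes out near 7% and the memory gap near 9%, against a price gap of a little over 70%.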

5 comments

I have been evaluating buying a 1080Ti recently; it appears to also have the highest cores per $. Is the 1080Ti really then the most efficient card for general purpose HPC work on Linux (numerics)? The 1080 is a step down but also competitively priced. Curious about your thoughts, as I couldn't find a guide on this stuff on the web.

edit: The Tesla K20 is also in competition in my view (despite the much higher cost) due to its focus on higher double-precision performance.

We do a lot of work on video encoding. We have had a K80, Titan X (Maxwell), Titan X (Pascal), 1080, 1080Ti, and others (including render farms based on GTX 980s).

General thoughts: Don't expect to get _any_ information out of Nvidia unless you are running everything on their hardware compatibility lists (i.e. server-case). Do not mix & match consumer-rated gear with 'professional' gear. (i.e. If you put the K80 in a system with a GTX1080, then the Nvidia drivers restrict the number of available processing cores to 2 per device.)

Air-flow: The Teslas run HOT even with a blower attached and/or when installed in the recommended case.

NVENC: the Pascal-based cards' performance is far faster AND better than that of the Kepler-based cards.

For anybody else doing video encoding work: Grab an Nvidia TK1/Jetson dev kit. This little card is a MONSTER and can handle everything we throw at it without breaking a sweat.

> Do not mix & match consumer-rated gear with 'professional' gear. (i.e. If you put the K80 in a system with a GTX1080, then the Nvidia drivers restrict the number of available processing cores to 2 per device)

Huh? Not sure what exactly you mean by "number of processing cores"?

I use two development boxes on a regular basis with Teslas side-by-side with GeForce cards and they all work just fine.

The NVENC SDK limits the number of separate H264 video streams you can encode simultaneously to 2 if you have _any_ GeForce hardware in your system.
I was unintentionally vague. I should have said 'output'. At the bottom of this post I have copy/pasted output from my original tests.

That was not CUDA; the task I was working on specifically (and only) used the NVENC encoder (via ffmpeg). I don't know if the situation has changed, but these were my observations.

All of my tests were done in 2015, so the situation might be different now.

The K80 could output up to 4 "streams" (aka outputs or threads) at once. A 780Ti can only do 2. According to nvidia-smi the K80 "appears" to be 2 GPUs on one card. You can actually designate which GPU you want to process ffmpeg streams on.

As soon as you had both devices installed in the same PC, the Nvidia drivers disabled the output of the K80 so that it too would only output up to 2 streams per GPU.

IIRC, there was even a status message that got displayed when installing the Nvidia binary blob:

paraphrasing from memory from 3 years ago

Warning consumer card detected. Limiting available GPU's

Here is a copy/paste dump of my findings at that time. (The formatting is screwy with the nvidia-smi output.)

=====================================================

Four threads running this:

ffmpeg -i 1784457.mp4 -c:v nvenc -c:a aac -strict experimental -gpu 0 -b:v 21700k -b:a 128k -y delete_me<#>.mp4

gives us ~5-6fps

and uses 3105MiB / 11519MiB of GPU RAM (755MiB for each thread)

------------------------------------------------------------

One thread running this:

ffmpeg -i 1784457.mp4 -c:v nvenc -c:a aac -strict experimental -gpu 0 -b:v 21700k -b:a 128k -y delete_me1.mp4

gives us ~16-18fps

and uses 755MiB of GPU RAM

------------------------------------------------------------

Four threads spread out using both 'GPUs':

ffmpeg -i 1784457.mp4 -c:v nvenc -c:a aac -strict experimental -gpu 0 -b:v 21700k -b:a 128k -y delete_me1.mp4

ffmpeg -i 1784457.mp4 -c:v nvenc -c:a aac -strict experimental -gpu 0 -b:v 21700k -b:a 128k -y delete_me2.mp4

ffmpeg -i 1784457.mp4 -c:v nvenc -c:a aac -strict experimental -gpu 1 -b:v 21700k -b:a 128k -y delete_me3.mp4

ffmpeg -i 1784457.mp4 -c:v nvenc -c:a aac -strict experimental -gpu 1 -b:v 21700k -b:a 128k -y delete_me4.mp4

gives us ~11fps
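
The four-way split above can be generated rather than typed by hand. The following is a minimal sketch, assuming the same ffmpeg flags as in the 2015 commands above (`-c:v nvenc`, `-gpu N`); those flags are copied verbatim from this post and may not match current ffmpeg builds. The function name `build_nvenc_commands` and its parameters are made up for illustration. It only builds the argument lists and does not launch anything.

```python
import shlex


def build_nvenc_commands(src, n_streams, n_gpus,
                         video_bitrate="21700k", audio_bitrate="128k"):
    """Build one ffmpeg argv per stream, assigning GPUs in contiguous blocks."""
    commands = []
    for i in range(n_streams):
        # With 4 streams and 2 GPUs: streams 0,1 -> GPU 0; streams 2,3 -> GPU 1.
        gpu = i * n_gpus // n_streams
        commands.append([
            "ffmpeg", "-i", src,
            "-c:v", "nvenc", "-c:a", "aac", "-strict", "experimental",
            "-gpu", str(gpu),
            "-b:v", video_bitrate, "-b:a", audio_bitrate,
            "-y", f"delete_me{i + 1}.mp4",
        ])
    return commands


# Print the four commands for the two-GPU layout used in the test above.
for argv in build_nvenc_commands("1784457.mp4", n_streams=4, n_gpus=2):
    print(shlex.join(argv))
```

Each argument list could be handed to `subprocess.Popen` to start the encodes in parallel; that step is left out here so the sketch runs without ffmpeg or a GPU present.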

nvidia-smi results:

Fri May 1 12:38:21 2015
+------------------------------------------------------+
| NVIDIA-SMI 346.46     Driver Version: 346.46         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 0000:03:00.0     Off |                    0 |
| N/A   69C    P0    67W / 149W |   1581MiB / 11519MiB |      4%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K80           Off  | 0000:04:00.0     Off |                    0 |
| N/A   58C    P0    78W / 149W |   1581MiB / 11519MiB |      6%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      2472    C   ffmpeg                                         755MiB |
|    0      2477    C   ffmpeg                                         755MiB |
|    1      2480    C   ffmpeg                                         755MiB |
|    1      2483    C   ffmpeg                                         755MiB |
+-----------------------------------------------------------------------------+
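
To cross-check the process table against the per-GPU memory figures, here is a small sketch that sums the ffmpeg rows per GPU. The rows and the 1581MiB totals are taken from the dump above; the regular expression, the function name `memory_per_gpu`, and the reading of the leftover as driver or context overhead are assumptions.

```python
import re
from collections import defaultdict

# Process rows from the nvidia-smi dump: GPU index, PID, type, name, memory.
PROCESS_ROWS = """\
|    0      2472    C   ffmpeg      755MiB |
|    0      2477    C   ffmpeg      755MiB |
|    1      2480    C   ffmpeg      755MiB |
|    1      2483    C   ffmpeg      755MiB |
"""

# Per-GPU "Memory-Usage" values reported in the upper table of the dump.
REPORTED_MIB = {0: 1581, 1: 1581}

ROW = re.compile(r"^(\d+)\s+(\d+)\s+(\S+)\s+(\S+)\s+(\d+)MiB$")


def memory_per_gpu(text):
    """Sum per-process memory (MiB) for each GPU index found in the rows."""
    totals = defaultdict(int)
    for line in text.splitlines():
        match = ROW.match(line.strip(" |"))  # drop table borders and padding
        if match:
            totals[int(match.group(1))] += int(match.group(5))
    return dict(totals)


totals = memory_per_gpu(PROCESS_ROWS)
for gpu, used in sorted(totals.items()):
    leftover = REPORTED_MIB[gpu] - used
    print(f"GPU {gpu}: {used} MiB in ffmpeg processes, {leftover} MiB not attributed")
```

Each GPU shows two 755MiB processes, so a few tens of MiB of the reported 1581MiB are not attributed to any listed process; that remainder is presumably driver overhead, though the dump itself does not say.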

For clarification - is that a Jetson TK1 or TX1?
Or TX2 which is Pascal-based and came out last month?
I have seen that the TX2 was recently released. I let our R&D department know about it. (I don't know if they ordered it or not)
Nvidia Jetson TX1
As far as performance per $ goes I actually think the GTX 1070 is the sweet spot currently.
If you're measuring cores per $ or GFLOP (single precision or double precision) per $ the ordering (decreasing) is the same: 1080 Ti, 1080, 1070. Tesla wins on GFLOP (double precision) per $. At least according to current lowest Newegg prices. It is all quite close though.
they don't call it the bleeding edge for nothing.
I may be totally wrong here, but from what I remember the Titan line handles up to double precision floats whereas the gtx line handles only single? Games don't really need the double precision so it's overkill for gtx. Can anyone confirm?
Yeah, I kinda hoped the next Titan will have decent FP16 performance...
On the other hand, video cards meant for games generally can't take sustained load. By that I mean you can't run them at 90-100% load for days on end in e.g. a render farm, they invariably melt. You're paying for better build quality and for having enough money for a render farm.
[citation needed]