| HN Mirror



	by sxp 3360 days ago
For comparison, the 1080Ti is ~11.3TFLOPS + 11GB RAM @ $700 vs the Titan at ~12.1TFLOPS + 12GB RAM @ $1200. ~7% more performance for ~70% more money. https://en.wikipedia.org/wiki/GeForce_10_series#GeForce_10_....
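
A quick way to check the arithmetic, as a minimal sketch that uses only the figures quoted above (the dictionary layout and the helper name `pct_more` are just illustrative). Working the numbers through gives roughly 7% more TFLOPS for roughly 71% more money; the RAM difference is about 9%.

```python
# Figures as quoted in the comment above; nothing here is measured.
cards = {
    "1080 Ti": {"tflops": 11.3, "ram_gb": 11, "price_usd": 700},
    "Titan": {"tflops": 12.1, "ram_gb": 12, "price_usd": 1200},
}


def pct_more(a, b):
    """Return how much larger a is than b, in percent."""
    return (a / b - 1) * 100


titan, ti = cards["Titan"], cards["1080 Ti"]
perf_gain = pct_more(titan["tflops"], ti["tflops"])
ram_gain = pct_more(titan["ram_gb"], ti["ram_gb"])
price_gain = pct_more(titan["price_usd"], ti["price_usd"])

# Single-precision GFLOPS per dollar for each card.
gflops_per_dollar = {
    name: c["tflops"] * 1000 / c["price_usd"] for name, c in cards.items()
}

if __name__ == "__main__":
    print(f"performance: +{perf_gain:.1f}%")
    print(f"RAM:         +{ram_gain:.1f}%")
    print(f"price:       +{price_gain:.1f}%")
    for name, value in gflops_per_dollar.items():
        print(f"{name}: {value:.1f} GFLOPS per $")
```

On these numbers the 1080 Ti comes out well ahead on GFLOPS per dollar, which is the point of the comparison.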

5 comments

tanderson92 3360 days ago

I have been evaluating buying a 1080Ti recently; it appears to also have the highest cores per $. Is the 1080Ti really then the most efficient card for general purpose HPC work on Linux (numerics)? The 1080 is a step down but also competitively priced. Curious about your thoughts, as I couldn't find a guide on this stuff on the web.

edit: The Tesla K20 is also in competition in my view (despite the much higher cost) due to its focus on higher double-precision performance.

VA3FXP 3360 days ago

We do a lot of work on video encoding. We have had a K80, Titan X(Maxwell), Titan X(Pascal), 1080, 1080Ti, and others (including render-farms based on GTX980's).

General thoughts: Don't expect to get _any_ information out of Nvidia unless you are running everything on their hardware compatibility lists (i.e. server-case). Do not mix & match consumer-rated gear with 'professional' gear. (i.e. If you put the K80 in a system with a GTX1080, then the Nvidia drivers restrict the number of available processing cores to 2 per device)

Air-flow: The Teslas run HOT even with a blower attached, and/or installed in the recommended case.

NVENC: the Pascal-based cards are much faster than the Kepler-based cards AND produce better output.

For anybody else doing video encoding work: Grab an Nvidia TK1/Jetson dev-kit. This little card is a MONSTER and can handle everything we throw at it without breaking a sweat.

slizard 3359 days ago

> Do not mix & match consumer-rated gear with 'professional' gear. (i.e. If you put the K80 in a system with a GTX1080, then the Nvidia drivers restrict the number of available processing cores to 2 per device)

Huh? Not sure what exactly you mean by "number of processing cores"?

I use two development boxes on a regular basis with Teslas side-by-side with GeForce cards and they all work just fine.

jamesfmilne 3359 days ago

The NVENC SDK limits the number of separate H.264 video streams you can encode simultaneously to 2 if you have _any_ GeForce hardware in your system.

VA3FXP 3359 days ago

I was unintentionally vague. I should have said 'output'. At the bottom of this post I have copy/pasted output from my original tests.

That was not CUDA; the task I was working on specifically (and only) used the NVENC encoder (via ffmpeg). I don't know if the situation has changed, but these were my observations.

All of my tests were done in 2015, so the situation might be different now.

The K80 could output up to 4 "streams" (aka outputs or threads) at once. A 780Ti can only do 2. According to nvidia-smi the K80 "appears" to be 2 GPUs on one card. You can actually designate which GPU you want to process ffmpeg streams on.

As soon as you had both devices installed in the same PC, the Nvidia drivers limited the K80 so that it too would only output up to 2 streams per GPU.

IIRC, there was even a status message that got displayed when installing the Nvidia binary blob:

paraphrasing from memory from 3 years ago

Warning consumer card detected. Limiting available GPU's

Here is a copy/paste dump of my findings at that time. (The formatting is screwy with the nvidia-smi output.)

=====================================================

Four threads running this:

ffmpeg -i 1784457.mp4 -c:v nvenc -c:a aac -strict experimental -gpu 0 -b:v 21700k -b:a 128k -y delete_me<#>.mp4

gives us ~5-6fps

and uses 3105MiB / 11519MiB of GPU RAM (755MiB for each thread)

------------------------------------------------------------

One thread running this:

ffmpeg -i 1784457.mp4 -c:v nvenc -c:a aac -strict experimental -gpu 0 -b:v 21700k -b:a 128k -y delete_me1.mp4

gives us ~16-18fps

and uses 755MiB of GPU RAM

------------------------------------------------------------

Four threads spread out using both 'GPUs':

ffmpeg -i 1784457.mp4 -c:v nvenc -c:a aac -strict experimental -gpu 0 -b:v 21700k -b:a 128k -y delete_me1.mp4

ffmpeg -i 1784457.mp4 -c:v nvenc -c:a aac -strict experimental -gpu 0 -b:v 21700k -b:a 128k -y delete_me2.mp4

ffmpeg -i 1784457.mp4 -c:v nvenc -c:a aac -strict experimental -gpu 1 -b:v 21700k -b:a 128k -y delete_me3.mp4

ffmpeg -i 1784457.mp4 -c:v nvenc -c:a aac -strict experimental -gpu 1 -b:v 21700k -b:a 128k -y delete_me4.mp4

gives us ~11fps

nvidia-smi results:

Fri May  1 12:38:21 2015
+------------------------------------------------------+
| NVIDIA-SMI 346.46     Driver Version: 346.46         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 0000:03:00.0     Off |                    0 |
| N/A   69C    P0    67W / 149W |   1581MiB / 11519MiB |      4%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K80           Off  | 0000:04:00.0     Off |                    0 |
| N/A   58C    P0    78W / 149W |   1581MiB / 11519MiB |      6%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      2472    C   ffmpeg                                         755MiB |
|    0      2477    C   ffmpeg                                         755MiB |
|    1      2480    C   ffmpeg                                         755MiB |
|    1      2483    C   ffmpeg                                         755MiB |
+-----------------------------------------------------------------------------+
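
For anyone wanting to repeat the "four threads spread out using both GPUs" run, here is a minimal sketch that generates the same ffmpeg command lines and fills each GPU up to a cap before moving to the next. The function name `build_nvenc_commands` and the `max_per_gpu` parameter are made up for illustration; the ffmpeg options are exactly the ones in the commands above, from a 2015-era build. Newer ffmpeg builds name the encoder `h264_nvenc`, so treat the flags as an assumption for any current setup.

```python
import shlex


def build_nvenc_commands(src, streams, gpus, max_per_gpu=2):
    """Return one ffmpeg argv list per stream, filling each GPU in turn.

    max_per_gpu is a hypothetical knob mirroring the 2-streams-per-GPU
    limit described in the post; it is not something ffmpeg enforces.
    """
    if streams > gpus * max_per_gpu:
        raise ValueError("more streams than the per-GPU cap allows")
    commands = []
    for i in range(streams):
        gpu = i // max_per_gpu  # streams 0,1 -> GPU 0; streams 2,3 -> GPU 1
        commands.append([
            "ffmpeg", "-i", src,
            "-c:v", "nvenc", "-c:a", "aac", "-strict", "experimental",
            "-gpu", str(gpu),
            "-b:v", "21700k", "-b:a", "128k",
            "-y", f"delete_me{i + 1}.mp4",
        ])
    return commands


if __name__ == "__main__":
    # Only prints the commands; launching them (e.g. with subprocess.Popen)
    # needs an ffmpeg build with NVENC support and the input file present.
    for argv in build_nvenc_commands("1784457.mp4", streams=4, gpus=2):
        print(shlex.join(argv))
```

Each list can be passed to `subprocess.Popen` as-is to start the encodes in parallel, then watched with nvidia-smi as in the dump above.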

kkielhofner 3360 days ago

For clarification - is that a Jetson TK1 or TX1?

Nexxxeh 3359 days ago

Or TX2 which is Pascal-based and came out last month?

VA3FXP 3359 days ago

I have seen that the TX2 was recently released. I let our R&D department know about it. (I don't know if they ordered it or not)

VA3FXP 3359 days ago

Nvidia Jetson TX1

jjuhl 3360 days ago

As far as performance per $ goes, I actually think the GTX 1070 is the sweet spot currently.

tanderson92 3360 days ago

If you're measuring cores per $ or GFLOP (single precision or double precision) per $, the ordering (decreasing) is the same: 1080 Ti, 1080, 1070. Tesla wins on GFLOP (double precision) per $. At least according to current lowest Newegg prices. It is all quite close though.

wnevets 3360 days ago

they don't call it the bleeding edge for nothing.

mhermher 3359 days ago

I may be totally wrong here, but from what I remember the Titan line handles up to double precision floats whereas the gtx line handles only single? Games don't really need the double precision so it's overkill for gtx. Can anyone confirm?

p1esk 3360 days ago

Yeah, I kinda hoped the next Titan would have decent FP16 performance...

Asooka 3360 days ago

On the other hand, video cards meant for games generally can't take sustained load. By that I mean you can't run them at 90-100% load for days on end in e.g. a render farm, they invariably melt. You're paying for better build quality and for having enough money for a render farm.

acchow 3360 days ago

[citation needed]