by dr_zoidberg 3753 days ago

I agree that the memory model is the most interesting thing about this card. I sort of undersold it in the "better design" part of my last bullet.

People and manufacturers tend to look at clock rates, fill rates (for GPUs), FLOPS, and "crunching power" in general, completely forgetting the memory side. For example, most CPU code today ends up bound by cache size, so performance tuning focuses on being nice to the cache rather than on using the optimal instructions (see, for example, Abrash's Pixomatic articles [0-2], which are about high-performance assembly programming in "modern environments").

With GPUs and "classic" HPC (I don't know about the new systems with the "compute fabric interconnects"), memory usually becomes the bottleneck (except for embarrassingly parallel problems, of course). In fact, I'm pretty sure it was Cray who said that a supercomputer is a way to turn a CPU-bound problem into an I/O-bound problem.
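
To put rough numbers on the "memory becomes the bottleneck" point, here is a minimal roofline-style sketch. The peak compute and bandwidth figures are made-up placeholders, not the specs of any real card, and the function names are just for illustration. The idea is that attainable throughput is capped by whichever is smaller: peak FLOP/s, or memory bandwidth times the kernel's arithmetic intensity (FLOPs per byte moved).

```python
# Roofline-style estimate: is a kernel limited by compute or by memory?
# All hardware numbers below are hypothetical placeholders.

def attainable_flops(peak_flops, bandwidth_bytes, intensity):
    """Upper bound on FLOP/s for a kernel with the given
    arithmetic intensity (FLOPs per byte of memory traffic)."""
    return min(peak_flops, bandwidth_bytes * intensity)


def is_memory_bound(peak_flops, bandwidth_bytes, intensity):
    """True when memory bandwidth, not compute, sets the ceiling."""
    return bandwidth_bytes * intensity < peak_flops


PEAK_FLOPS = 8.0e12   # hypothetical: 8 TFLOP/s single precision
BANDWIDTH = 512.0e9   # hypothetical: 512 GB/s memory bandwidth

# SAXPY (y = a*x + y) on float32: 2 FLOPs per element,
# 12 bytes moved per element (read x, read y, write y).
saxpy_intensity = 2 / 12

if __name__ == "__main__":
    bound = attainable_flops(PEAK_FLOPS, BANDWIDTH, saxpy_intensity)
    print("SAXPY memory bound:", is_memory_bound(PEAK_FLOPS, BANDWIDTH, saxpy_intensity))
    print("SAXPY ceiling as fraction of peak:", bound / PEAK_FLOPS)
```

With these placeholder numbers, a streaming kernel like SAXPY sits far below the compute peak no matter how fast the ALUs are, which is why a faster memory system can matter more than extra FLOPS.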

[0] http://www.drdobbs.com/architecture-and-design/optimizing-pi...

[1] http://www.drdobbs.com/optimizing-pixomatic-for-modern-x86-p...

[2] http://www.drdobbs.com/optimizing-pixomatic-for-modern-x86-p...