by ghshephard
3707 days ago
What I'm trying to figure out is whether a teraflop is directly comparable. That is, on the TOP500 list, the first 4.9 teraflop computer appeared in 2000, but does that mean Pascal could provide performance similar to that supercomputer on the LINPACK benchmark? In describing the benchmark, they say:

"In an attempt to obtain uniformity across all computers in performance reporting, the algorithm used in solving the system of equations in the benchmark procedure must conform to LU factorization with partial pivoting. In particular, the operation count for the algorithm must be 2/3 n^3 + O(n^2) double precision floating point operations. This excludes the use of a fast matrix multiply algorithm like "Strassen's Method" or algorithms which compute a solution in a precision lower than full precision (64 bit floating point arithmetic) and refine the solution using an iterative approach."

So, to summarize: if in 2000 the fastest supercomputer on the planet ran at about 4.9 TFLOPS, does that mean, apples to apples on LINPACK (and only LINPACK), that Pascal today would outperform that supercomputer?
|
The benchmark is essentially bottlenecked on FP64 matrix multiplies. If that's what you need to do, then sure, it's indicative.
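To make the comparison concrete, here is a rough Python sketch of what the quoted operation count implies. It keeps only the leading 2/3 n^3 term and drops the O(n^2) part. The function names and the problem size n = 100,000 are made up for illustration; the 4.9 TFLOPS figure is the one from the question, treated here as a sustained FP64 rate, which is an assumption.

```python
def linpack_flops(n: int) -> float:
    """Leading-order FP64 operation count for LU with partial pivoting.

    Only the 2/3 * n^3 term is kept; the O(n^2) term is ignored.
    """
    return (2.0 / 3.0) * n**3


def solve_seconds(n: int, sustained_flops: float) -> float:
    """Time to factor an n x n system at a given sustained FP64 rate (FLOP/s)."""
    return linpack_flops(n) / sustained_flops


if __name__ == "__main__":
    n = 100_000        # hypothetical problem size, not from the thread
    rate = 4.9e12      # 4.9 TFLOPS, assumed to be a sustained FP64 rate
    print(f"{linpack_flops(n):.3e} FLOP, {solve_seconds(n, rate):.1f} s")
```

Any two machines that sustain the same FP64 rate on this workload finish in the same time by this arithmetic. Whether a given chip actually sustains its peak rate on LINPACK is a separate question that this sketch does not answer.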
Some machine learning workloads are also bottlenecked on matrix multiply, but don't need FP64 precision. They can use FP16, which fits a bigger model in a given memory size, makes better use of memory bandwidth, and, given the right hardware support, reaches extremely high rates, as on Pascal.
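The memory-size point is easy to check with a few lines of Python. This is only a sketch: `max_square_dim` is a hypothetical helper, and the 16 GiB capacity is an assumed figure for illustration, not something stated in this thread.

```python
import math


def max_square_dim(mem_bytes: int, bytes_per_element: int) -> int:
    """Largest n such that one dense n x n matrix fits in mem_bytes."""
    return math.isqrt(mem_bytes // bytes_per_element)


if __name__ == "__main__":
    mem = 16 * 2**30                   # assumed 16 GiB of device memory
    n_fp64 = max_square_dim(mem, 8)    # FP64: 8 bytes per element
    n_fp16 = max_square_dim(mem, 2)    # FP16: 2 bytes per element
    print(n_fp64, n_fp16)
```

Going from 8-byte to 2-byte elements lets the same memory hold 4x as many values, so the side of the largest square matrix roughly doubles. The same factor applies to bandwidth: each byte moved carries 4x as many FP16 values as FP64 values.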
Personally, I find the memory system on Pascal more interesting than the raw FLOPS rate. Also, the use of NVLink to link multiple GPUs.