That's a core that multiplies two 4x4 FP16 matrices and adds the product to a 4x4 FP32 accumulator in one go (D = A x B + C).
That's where the V100 gets its boost: up to 120 TFLOPS of peak mixed-precision throughput.
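
To see roughly where a number like 120 TFLOPS comes from, here is a back-of-the-envelope sketch. It assumes the core in question is a Volta Tensor Core, and the core count (640) and boost clock (1455 MHz) are assumptions taken from NVIDIA's launch-time V100 specs, not from the text above. The function name `peak_tflops` is just illustrative.

```python
# Rough peak-throughput estimate for 4x4x4 matrix multiply-accumulate cores.
# Hardware figures are ASSUMPTIONS (launch-time V100 specs), not from the text.

M = N = K = 4  # tile sizes in D = A x B + C (A is MxK, B is KxN)

# One 4x4x4 product needs M*N*K fused multiply-adds (FMAs).
fma_per_core_per_clock = M * N * K                      # 64
# Each FMA counts as two floating-point ops (one multiply, one add).
flops_per_core_per_clock = 2 * fma_per_core_per_clock   # 128

TENSOR_CORES = 640          # assumed: 80 SMs x 8 cores per SM
BOOST_CLOCK_HZ = 1.455e9    # assumed boost clock in Hz


def peak_tflops(cores, clock_hz, flops_per_clock):
    """Theoretical peak in TFLOPS: cores * clock * FLOPs per core per clock."""
    return cores * clock_hz * flops_per_clock / 1e12


peak = peak_tflops(TENSOR_CORES, BOOST_CLOCK_HZ, flops_per_core_per_clock)
print(f"{peak:.1f} TFLOPS")  # prints "119.2 TFLOPS"
```

Under those assumptions the estimate lands just under 120 TFLOPS. This is a theoretical ceiling with every core issuing a full FMA tile every clock; real workloads get less.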