	by peterhj 2640 days ago
	If you have an assembler or a C compiler, you can implement matrix multiplication (GEMM), which usually does most of the heavy lifting in your neural net. You correctly pointed out that implementing GEMM efficiently may not be simple. But if you have a simple architecture without a complex memory hierarchy, then using whatever SIMD facilities are available plus some standard tricks will get you in the ballpark of peak FLOP/s. Or just download a fast BLAS from your hardware vendor...
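
	To make that concrete, here is a rough sketch of what a from-scratch GEMM might look like in plain C. It is an illustration under assumptions, not a tuned kernel: the name `gemm_ikj`, the row-major layout, and the single-precision type are all my own choices. The only "standard trick" shown is loop reordering (i, p, j), so that the innermost loop walks both B and C with unit stride, which is the pattern a compiler's auto-vectorizer can usually map onto SIMD.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: C = A * B with all matrices stored row-major.
 *   A is m x k, B is k x n, C is m x n.
 * The i-p-j loop order keeps the inner loop contiguous in memory.
 * `restrict` promises the compiler that the buffers do not overlap.
 */
void gemm_ikj(size_t m, size_t n, size_t k,
              const float *restrict A,
              const float *restrict B,
              float *restrict C)
{
    assert(A != NULL && B != NULL && C != NULL);

    for (size_t i = 0; i < m; i++) {
        float *c_row = &C[i * n];

        /* Clear the output row before accumulating into it. */
        for (size_t j = 0; j < n; j++)
            c_row[j] = 0.0f;

        for (size_t p = 0; p < k; p++) {
            const float a = A[i * k + p];   /* scalar reused across the row */
            const float *b_row = &B[p * n];

            /* Unit-stride multiply-add: a candidate for SIMD. */
            for (size_t j = 0; j < n; j++)
                c_row[j] += a * b_row[j];
        }
    }
}
```

	Whether the inner loop actually gets vectorized depends on your compiler and flags, so check the generated assembly. On a machine that does have caches, the next usual steps are blocking (tiling) and register-level unrolling, and at that point the vendor BLAS is probably the better use of your time.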