| HN Mirror

	by praeclarum 1126 days ago
	Yeah this is the trick. You need to maximize the use of workgroup parallelism and also lay things out in memory for those kernels to access efficiently. It’s a bit of a balancing act and I’ll be working on benchmarks to test out different strategies.
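
	To make the memory-layout half of that concrete, here is a rough CPU-side sketch, not anything from the actual project. It models one workgroup reading a buffer in a single step and counts how many memory segments get touched. The names (`segmentsTouched`, `workgroupSize`, `rowWidth`, `segmentSize`) and the numbers are made up for illustration, and real GPUs differ in how they group accesses, so treat it as intuition only.

```typescript
// Hypothetical model: count the distinct memory segments touched when every
// invocation in a workgroup reads one element during the same step.
function segmentsTouched(addresses: number[], segmentSize: number): number {
  const segments = new Set<number>();
  for (const address of addresses) {
    segments.add(Math.floor(address / segmentSize));
  }
  return segments.size;
}

// Assumed values, chosen only to make the difference visible.
const workgroupSize = 64; // invocations per workgroup
const rowWidth = 1024;    // elements per row of a row-major matrix
const segmentSize = 32;   // elements fetched together in one memory transaction

// Layout A: invocation i reads element i, so neighbours read neighbouring addresses.
const contiguous = Array.from({ length: workgroupSize }, (_, i) => i);

// Layout B: invocation i reads column 0 of row i, so neighbours are rowWidth apart.
const strided = Array.from({ length: workgroupSize }, (_, i) => i * rowWidth);

const contiguousSegments = segmentsTouched(contiguous, segmentSize);
const stridedSegments = segmentsTouched(strided, segmentSize);

console.log({ contiguousSegments, stridedSegments });
```

	Under these assumptions the contiguous layout touches 2 segments and the strided one touches 64, which is the kind of gap the benchmarks would need to measure on real hardware.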