by jacquesm 6060 days ago

It's going to be really hard to graft that on there, given that a lot of the computational horsepower is directly related to the bandwidth to the 'local' memory store. That would mean that the local memory store somehow has to be turned into a cache that stays coherent across many hundreds of processing units.

I'm not sure that's impossible; it just seems very hard.

If NVIDIA manages to crack that nut, then the only thing you'll still need to keep in mind is how big your cache footprint is (as on every other CPU with a cache) in order to maximize throughput.

andrewcooke 6060 days ago

my original comment (about unified address space) was poorly thought out (it's not clear how much fermi will help, and how much is down to opencl being "cross-platform"). but the idea isn't that you no longer need to care about the memory hierarchy; only that pointers can be expected to work correctly. currently (particularly in opencl) there are various restrictions on pointers that make some code more complex than it needs to be. for example, you can only allocate 1/4 of the memory in a single chunk, and pointers are local to chunks, so patching together chunks of memory to get one large array is messy.
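
To make the "patching together chunks" point concrete, here is a minimal host-side sketch in Python, not OpenCL code. It assumes a hypothetical `ChunkedArray` class and a made-up `max_chunk_len` limit standing in for the per-allocation cap (which OpenCL reports as `CL_DEVICE_MAX_MEM_ALLOC_SIZE`). Every access has to translate a global index into a (chunk, offset) pair, which is the bookkeeping a unified address space would make unnecessary.

```python
class ChunkedArray:
    """One logical array stitched together from size-limited chunks.

    Hypothetical illustration: plain Python lists stand in for separate
    device buffers, and max_chunk_len stands in for the allocation cap.
    """

    def __init__(self, length, max_chunk_len):
        if length < 0 or max_chunk_len <= 0:
            raise ValueError("length must be >= 0 and max_chunk_len > 0")
        self.length = length
        self.max_chunk_len = max_chunk_len
        self.chunks = []
        remaining = length
        while remaining > 0:
            # No single allocation may exceed the cap.
            n = min(remaining, max_chunk_len)
            self.chunks.append([0.0] * n)
            remaining -= n

    def _locate(self, i):
        """Translate a global index into (chunk number, offset in chunk)."""
        if not 0 <= i < self.length:
            raise IndexError(i)
        return divmod(i, self.max_chunk_len)

    def __getitem__(self, i):
        chunk, offset = self._locate(i)
        return self.chunks[chunk][offset]

    def __setitem__(self, i, value):
        chunk, offset = self._locate(i)
        self.chunks[chunk][offset] = value


# Usage: a 10-element logical array with at most 4 elements per chunk.
arr = ChunkedArray(10, 4)
arr[9] = 1.5           # lands in the third chunk, second slot
print(arr._locate(9))  # (2, 1)
```

With a single flat address space the `_locate` step disappears and `arr[i]` is just pointer arithmetic; with chunk-local pointers, every kernel that touches the array has to carry the chunk table and redo this translation.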

jacquesm 6060 days ago

Ok, I see what you're getting at now.

That would definitely be a good thing.

I've spent about 2 months in total now (spread out over the last year) understanding how this whole GPGPU thing fits in with the rest of computing. It is much like a specialty tool: harder to master, more work to get right once you have mastered it, and subject to change on shorter notice than most other solutions (because of the close tie to the hardware). But if you need it, you need it bad, and the pay-off is tremendous.