Hacker News
by smaddox 3622 days ago
Yeah, something like this is very much needed, but it's not the hard part. The software is the hard part. The software is the reason we have the multiple levels of cache we have now. Without solving the software challenges, there can be no challenger for the existing architectures.

It's interesting to note that convolutional neural nets (CNNs) are one solution to the software challenge. It's an imperfect solution, in the sense that CNNs are not as general purpose (at the same efficiency) and have strict data requirements for training, but it is a solution, and the big N are investing heavily to the point of designing ASICs.

Eventually, though, we need to solve the software problem. That will require rethinking programming languages.

2 comments

Having written programs for this iteration of the REX Neo architecture, I can say the architecture is not so dramatically different that programming languages will have to be rewritten. I'm not the smartest programmer in the world and I was able to figure out the assembly language fairly easily.

Some concepts, like how to manage concurrent data processing and thread communication, need to be handled carefully, but that's more at the level of the 'standard library' than the compiler. There is a clear pathway to getting C working on the architecture, and a reasonable direction (that will need some fleshing out) toward performance-enhancing optimization at the level of something like LLVM IR.

I wouldn't expect the assembly language level to be too far off from the common paradigms. I'd expect the software challenges to be in managing large amounts of memory, if the application programmer must manage shuffling data between the local scratchpad, specific locations in foreign scratchpads that must be (manually?) DMA'd around, and DRAMs.

Our whole goal, as talked about in the software section of our website (and the ACM paper linked in it), is to have the scratchpads be entirely automated by our toolchain. While we want to allow especially adventurous programmers full freedom with the scratchpads, programs written in C/C++ (and in other languages we support in the future) will handle memory allocation, from the programmer's perspective, identically to how they do on existing architectures.
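
For a sense of what the toolchain would be taking off the programmer's hands, here is roughly what hand-managed scratchpad staging looks like in plain C. This is only a sketch and not REX code: `memcpy` stands in for a DMA transfer, and `TILE_ELEMS` and `scale_via_scratchpad` are made-up names and sizes for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TILE_ELEMS 64u  /* hypothetical scratchpad capacity, in elements */

/* Scale n floats living in "DRAM" by k, staging one tile at a time
   through a small local buffer that plays the role of the scratchpad. */
static void scale_via_scratchpad(float *dram, size_t n, float k) {
    float scratch[TILE_ELEMS];
    for (size_t base = 0; base < n; base += TILE_ELEMS) {
        /* The last tile may be shorter than TILE_ELEMS. */
        size_t len = (n - base < TILE_ELEMS) ? (n - base) : TILE_ELEMS;
        memcpy(scratch, dram + base, len * sizeof *scratch);  /* "DMA in" */
        for (size_t i = 0; i < len; i++) {
            scratch[i] *= k;                                  /* compute locally */
        }
        memcpy(dram + base, scratch, len * sizeof *scratch);  /* "DMA out" */
    }
}
```

Every loop that touches a large array would need this kind of tiling and copy bookkeeping, which is the part an automated toolchain would be expected to generate.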

One other thing to point out is that addressing a core's local scratchpad is handled exactly the same as addressing the "foreign" scratchpads of other cores on the same chip and/or any other attached chip. All memory operations go through the exact same load/store instructions as part of a global flat address map that is the same for all cores in a system (one or multiple chips interconnected).
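
To illustrate the idea of one flat address map, here is a toy simulation in C. The layout (core index times scratchpad size, plus offset), the sizes, and the names `global_addr`, `load32`, and `store32` are all assumptions made up for this sketch; the real address map may look nothing like this.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sizes, chosen only for illustration. */
#define NUM_CORES        4u
#define SCRATCHPAD_BYTES 1024u

/* Simulated scratchpads, one per core. */
static uint8_t scratchpads[NUM_CORES][SCRATCHPAD_BYTES];

/* Build a global address from a core index and a byte offset
   (assumed layout: scratchpads laid end to end). */
static uint32_t global_addr(uint32_t core, uint32_t offset) {
    return core * SCRATCHPAD_BYTES + offset;
}

/* One store routine for every destination, local or foreign. */
static void store32(uint32_t addr, uint32_t value) {
    uint32_t core = addr / SCRATCHPAD_BYTES;
    uint32_t off  = addr % SCRATCHPAD_BYTES;
    assert(core < NUM_CORES && off + sizeof value <= SCRATCHPAD_BYTES);
    memcpy(&scratchpads[core][off], &value, sizeof value);
}

/* One load routine for every source, local or foreign. */
static uint32_t load32(uint32_t addr) {
    uint32_t core = addr / SCRATCHPAD_BYTES;
    uint32_t off  = addr % SCRATCHPAD_BYTES;
    uint32_t value;
    assert(core < NUM_CORES && off + sizeof value <= SCRATCHPAD_BYTES);
    memcpy(&value, &scratchpads[core][off], sizeof value);
    return value;
}
```

The caller never says "this is a remote access"; whether `store32(global_addr(3, 16), x)` is local or foreign depends only on which core issues it, and the code path is the same either way.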

True. I shoulda added "so to speak", since this is a still more extreme approach and might simply break any compiler/language combination we have, as you say.

While we have been exploring some ideas on how to have better programming approaches to address the unique features of our architecture, we have from the beginning thought that we would be required to have some level of portability for existing applications. As of right now, we support standard C/C++ that runs through our Clang+LLVM backend, with the ability to support any language that has an LLVM frontend.

Personally, I find the actor model to be the easiest existing way to take advantage of things like our network on chip and the hard timing guarantees on memory movement. That being said, right now our focus is on C and C++ along with our API and custom library ports.
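
For anyone unfamiliar with the actor model, a bare-bones version in C looks something like the sketch below: each actor owns private state and a mailbox, and the only way to affect it is to send a message. This is a single-threaded toy with made-up names (`Actor`, `actor_send`, `actor_run`), not the REX API; on real hardware one would presumably map an actor to a core and its mailbox to scratchpad memory, but that mapping is my assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAILBOX_CAP 8u  /* hypothetical mailbox depth */

typedef struct { int payload; } Message;

typedef struct {
    Message slots[MAILBOX_CAP];  /* ring buffer of pending messages */
    size_t head, count;
    long sum;                    /* private state, touched only by the handler */
} Actor;

/* Enqueue a message; returns false if the mailbox is full. */
static bool actor_send(Actor *a, Message m) {
    if (a->count == MAILBOX_CAP) return false;
    a->slots[(a->head + a->count) % MAILBOX_CAP] = m;
    a->count++;
    return true;
}

/* Drain the mailbox, handling one message at a time in arrival order.
   Returns the number of messages handled. */
static size_t actor_run(Actor *a) {
    size_t handled = 0;
    while (a->count > 0) {
        Message m = a->slots[a->head];
        a->head = (a->head + 1) % MAILBOX_CAP;
        a->count--;
        a->sum += m.payload;  /* the "behavior": accumulate payloads */
        handled++;
    }
    return handled;
}
```

Because no state is shared, the only data movement is the message itself, which is why the model fits a network on chip so naturally.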