

by iskander 5524 days ago

>Olofsson has a new idea - or, specifically, a variation on an old one...it was common for a central processor to have a 'math co-processor' chip alongside it - a secondary processor which was designed specifically to carry out floating point arithmetic at speeds significantly faster than the main processor

This is exactly how people are using GPUs right now. How is this architecture better than a Fermi GPU?

>"A guy straight out of college who's done a course in C programming can take a program and run it on our machine. There's no new constructs to run - you can take a program with legacy code and run it straight out of the box on our machine, and you can't do that on GPU."

If they're using only static compilation, this is very unlikely to be true. A few thousand Ph.D. theses have been sunk into parallelizing imperative programs. Despite the accumulation of sophisticated compiler techniques, it doesn't really work without extensive annotations and cooperation from the programmer. The programmer often ruins potential parallelism by accidentally creating dependencies between loop iterations. Even when analyzing ideal code, the program text doesn't contain sufficient information about the data size to create a good partition.

However, there's some small chance this isn't empty hype and they've actually made some cool breakthrough in runtime parallelization of imperative code. In that case, though, why would they be hyping vaporous hardware rather than just applying their fancy JIT compiler to existing multicore systems?
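
To make the point about dependencies between loop iterations concrete, here is a minimal Python sketch. It is not from the article; the names `scale`, `running_sum`, and `run_in_chunks` are made up for illustration. It splits a loop's input into chunks the way a naive partitioner might and shows that this only preserves the result when the iterations are independent.

```python
def scale(xs):
    # Each output depends only on its own input: iterations are independent.
    return [2 * x for x in xs]


def running_sum(xs):
    # Each output depends on the previous iteration's total:
    # a loop-carried dependency.
    out = []
    total = 0
    for x in xs:
        total += x
        out.append(total)
    return out


def run_in_chunks(fn, xs, n_chunks):
    """Apply fn to each chunk separately, as if each chunk ran on its own core."""
    size = max(1, -(-len(xs) // n_chunks))  # ceiling division
    chunks = [xs[i:i + size] for i in range(0, len(xs), size)]
    result = []
    for chunk in chunks:
        result.extend(fn(chunk))
    return result


data = list(range(1, 9))

# Independent loop: chunking does not change the answer.
scale_matches = run_in_chunks(scale, data, 4) == scale(data)

# Dependent loop: each chunk restarts its total at zero, so the answer is wrong.
running_sum_matches = run_in_chunks(running_sum, data, 4) == running_sum(data)
```

A compiler working from the program text alone has to prove the first situation before it can split the loop, and it also has to pick `n_chunks` without knowing how large `data` will be at run time.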


sedachv 5524 days ago

"A few thousand Ph.D. theses have been sunk into parallelizing imperative programs."

This waste of talent continues to piss me off to no end. Why would people willingly spend time on this problem?


soundsop 5523 days ago

Because the reward of a breakthrough is extremely high.


DanWaterworth 5524 days ago

I too doubt that they have anything new with regard to parallelization of imperative programs. If the only way to utilize all cores is either by writing functionally or by manual parallelization, then they don't really have a significant advantage over FPGA coprocessors. I do agree with them, however, when they say that this approach is better than creating lots of general-purpose cores.


DanWaterworth 5524 days ago

What no one has mentioned is that 4000 stacks is a lot of memory.


Scaevolus 5524 days ago

Not if they're done as split/segmented stacks (http://gcc.gnu.org/wiki/SplitStacks). Basically, you have a collection of 4KB stack pages for each thread instead of one large up-front allocation, and you grow it as needed. It costs a few instructions per function entry/exit, but the overall cost is negligible and it lets you run thousands of coroutines without issues.
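
As a rough illustration of the idea, here is a toy Python model of a stack that grows one 4KB segment at a time. It is only a sketch and not how GCC implements split stacks; the `SegmentedStack` class, the 256-byte frame size, and the 8 MiB fixed stack used for comparison are all assumptions made for the example.

```python
SEGMENT_SIZE = 4096  # 4KB segments, matching the figure in the comment


class SegmentedStack:
    """Toy stack that reserves memory one fixed-size segment at a time."""

    def __init__(self):
        self.segments = [bytearray(SEGMENT_SIZE)]  # start with one segment
        self.top = 0  # bytes used in the newest segment

    def push_frame(self, frame_size):
        """Reserve frame_size bytes, adding a segment when the current one is full."""
        if frame_size > SEGMENT_SIZE:
            raise ValueError("frame larger than a segment (not handled here)")
        if self.top + frame_size > SEGMENT_SIZE:
            # The check on function entry: grow only when needed.
            self.segments.append(bytearray(SEGMENT_SIZE))
            self.top = 0
        self.top += frame_size

    def reserved_bytes(self):
        return len(self.segments) * SEGMENT_SIZE


# 4000 threads that each make one shallow call.
stacks = [SegmentedStack() for _ in range(4000)]
for s in stacks:
    s.push_frame(256)

# One of them recurses 100 levels deeper and grows on demand.
deep = stacks[0]
for _ in range(100):
    deep.push_frame(256)

total = sum(s.reserved_bytes() for s in stacks)
upfront = 4000 * 8 * 1024 * 1024  # assumed 8 MiB fixed stack per thread
```

In this model only the deeply recursing thread pays for extra segments, so `total` stays in the tens of megabytes while the assumed up-front scheme would reserve tens of gigabytes of address space.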


DanWaterworth 5524 days ago

If you allocate the stacks contiguously using mmap, then memory is only used as it is accessed. That's not the problem. The problem is that 4000 concurrent non-trivial threads are a resource hog no matter how the stacks are allocated.
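
A minimal sketch of the mmap approach, using Python's standard `mmap` module for an anonymous mapping. The stack size, the stack count, and the `stack_base` helper are assumptions for illustration. Whether untouched pages actually stay out of physical memory depends on the operating system's demand paging, and this sketch does not measure it.

```python
import mmap

STACK_SIZE = 1024 * 1024  # assumed 1 MiB reserved per stack
N_STACKS = 64             # kept small so the sketch runs on any machine

# One contiguous anonymous mapping that holds every stack back to back.
region = mmap.mmap(-1, STACK_SIZE * N_STACKS)


def stack_base(i):
    """Offset of the first byte of stack i inside the region (hypothetical helper)."""
    return i * STACK_SIZE


# Touch only the first byte of each stack. On a demand-paged OS, only the
# pages that are written this way need to be backed by physical memory.
for i in range(N_STACKS):
    region[stack_base(i)] = 1

touched = sum(region[stack_base(i)] for i in range(N_STACKS))
untouched_byte = region[stack_base(1) + 1]  # never written, reads as zero
```

This only addresses how the stack memory is reserved; it says nothing about the scheduling and cache pressure of running thousands of non-trivial threads, which is the cost the comment is pointing at.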
