| HN Mirror



	by jcbeard 3517 days ago
	How would something like this differ from Sandia National Labs' Qthreads (http://www.cs.sandia.gov/qthreads/)? It seems to be a tried-and-true solution in C that also works with C++11 (I committed a test case for C++11 myself). It is also an optional underpinning for some relatively big-name frameworks like Kokkos, Chapel, RaftLib, etc.

1 comments

saman_b 3517 days ago

Thanks for mentioning this, it is indeed very related and very interesting. I am not sure why neither I nor the people around me were aware of Qthreads; it has very good support for various architectures and provides many interesting features. It has many similarities with what I have in mind for a concurrency library, and even some of its research goals seem to be very close to mine. In particular, the notion of affinity and locality is what I am focusing on in uThreads. I am going through the papers and the source code at the moment to see what the similarities and differences are.

uThreads is still a work in progress, and I have specific plans for it in the future that might differ from what Qthreads is trying to accomplish. For now my focus is more on providing auto-tuning of Clusters based on the workload. I will also try to explore the pros and cons of uThread migration based on Cluster placement (NUMA and cache locality), and from the 2008 paper it seems that this is what you are trying to study as well. I am open to collaboration if there is an ongoing study around this topic.


jcbeard 3517 days ago

Ooh, OK. You might also check out OpenSHMEM (http://openshmem.org/site/) and OpenUCX (http://www.openucx.org). I'd been planning on integrating both in RaftLib, just haven't had enough cycles to get it done yet. These combined would make it much easier to maintain a relatively portable yet performant back end.

My thesis research was all about locality, memory placement, and throughput for big data systems. Current work is similar but I'm not neck deep in the hardware dev world. There are current research efforts on my part outside of work; most center around the raftlib.io platform.

Before I forget, you might also want to check out graph partitioning frameworks like METIS and SCOTCH. Both are used in some MPI frameworks for more optimal partitioning. To get topology data you might want to look at the hwloc framework; it's cross-platform and provides input for things like NUMA/cache/PCIe topology for optimization. I haven't had a chance to integrate this hook into RaftLib, but it's just a few lines away once I find the time.

If you're wondering, I started out writing my own fiber library for RaftLib. I had ports for both IBM Power and Intel, but it gets a bit tiring maintaining and optimizing for every new architecture. Qthreads and the like have been used in HPC circles for quite a while, so switching made sense. There was no way I was beating them for dev time, so I might as well join them.

Based on your auto-tuning discussion...RaftLib aims to do something similar, but for big-data stream processing workloads. Here's my 2016 IJHPCA paper: http://hpc.sagepub.com/content/early/2016/10/18/109434201667...

It looks like it's behind a paywall, so if you don't have access I'll update my website with the "author archive" copy sometime today. It will be at http://jonathanbeard.io/media once I update it. Bottom line: if our interests intersect, I'm definitely open to collaboration :).


saman_b 3516 days ago

Thanks, that's a lot of information to absorb. Let me go through all of it and get back to you. I'll shoot you an email once I've processed it all :)
