Hacker News
by heisig 1074 days ago
Petalisp author here. I apologize that the README is somewhat lacking, but it wasn't me who posted this on HN.

As you may have seen in the commit history, a lot of exciting things have happened over the last few months. However, there are also a few stupid performance bugs left, so I am delaying the release of any performance numbers until those have been fixed. Otherwise, I fear that people will simply misinterpret the results.

Nevertheless, I can already state that the single-core performance of Petalisp programs is exactly like that of a C program - simply because Petalisp compiles its programs to C when possible. (Although I also have that long-term agenda of using sb-simd to reach that performance in pure CL some day.)

In addition, Petalisp is quite good at automatically parallelizing programs, and we already have most of the infrastructure for distributed and heterogeneous computing in place.

I will write a more detailed post for the HN crowd once I have reliable performance numbers and once I have finished writing the documentation.

Feel free to ask me further questions.


Thank you for responding. I wrongly assumed that you had posted this yourself; if I had taken the time to see that someone else posted it, I wouldn't have made such a remark.

There's a trend where projects are marketed as 'blazingly fast', yet when benchmarked fall several orders of magnitude behind near-optimal solutions. It's a knee-jerk reaction of mine to call out performance claims that are made without benchmarks.

Looking forward to seeing more about this project! I learned to program with CL way back when, and it makes me happy to see people working on big projects in it. Cheers!

This would be an exciting read. I'm certainly looking forward to it. Is there a README for the required infrastructure setup in the works?
I am not sure what you mean by "required infrastructure setup". Installing Petalisp is a single call to (ql:quickload :petalisp) - assuming you have Quicklisp installed. If you also have a C compiler available, and an executable named cc pointing to that compiler, Petalisp will use that to speed up your codes further.

About the required hardware - anything that runs SBCL or CCL can also run Petalisp.

Sorry, I should have been more specific. I meant the setup for a distributed computing system.
Thanks for clarifying. I will definitely write down the specifics of setting up distributed computing once it works. However, support for distributed computing will still take some time. The current step is to iron out all the remaining issues of parallelizing within one CPU socket.
I'm very curious why you've chosen to go down the route of special casing on parallelizing within one socket, which I take to mean some form of shared memory paradigm?

To the best of my knowledge, shared memory approaches have been mostly abandoned in the HPC community. It seems none of the codes that went hybrid MPI+OpenMP for example, ever saw substantial performance benefit over pure MPI. At least not enough to justify the increased code complexity. If you search for "hybrid MPI/OpenMP" on Google Scholar you'll see most results are 10-20 years old.

Part of the reason for this is that on modern CPU cores, with the amount of cache available, you typically want to keep at least something like 200 000 degrees of freedom per core. That's, e.g., a 36^3 grid for u, v, w, p if you're doing fluid mechanics. Then the amount each core has to communicate is just 8% of its data. Furthermore, you can easily do other work, like computing auxiliary variables, while you are waiting on communication.
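The per-core numbers above can be checked with a few lines of arithmetic. This is a sketch under stated assumptions, not taken from the thread: each core owns a cubic n×n×n block, there are four unknowns (u, v, w, p) per grid point, and the halo is one grid point thick. The helper names are hypothetical. The quoted 8% matches counting three faces per block (for example, counting each face shared between two neighbours only once); counting all six faces a block has to receive gives roughly 17%.

```python
# Back-of-the-envelope check of the per-core sizes discussed above.
# Assumptions (not from the thread): cubic n x n x n block per core,
# 4 unknowns (u, v, w, p) per grid point, halo one point thick,
# edge/corner overlaps ignored.

def dofs_per_core(n: int, fields: int = 4) -> int:
    """Degrees of freedom stored in one core's n x n x n block."""
    return fields * n ** 3

def halo_fraction(n: int, faces: int) -> float:
    """Share of the block's points lying on `faces` one-point-thick faces."""
    return faces * n ** 2 / n ** 3

n = 36
print(dofs_per_core(n))               # 186624, i.e. roughly 200 000
print(round(halo_fraction(n, 3), 3))  # 0.083, about 8% (three faces)
print(round(halo_fraction(n, 6), 3))  # 0.167, about 17% (all six faces)
```

The fraction scales as faces/n, so larger blocks per core push the communication share down further.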

I will also say that it feels a bit weird to call something "peta-" and "HPC" if using more than one socket is relatively far off into the future. For the randomly-wandering PhD students out there, it would be nice to tell them this up front in the Readme :)

> I'm very curious why you've chosen to go down the route of special casing on parallelizing within one socket

This is not special-casing. Most of that code will also be used for the distributed parallelization. I agree with your remarks on hybrid MPI+OpenMP, and, in fact, Petalisp doesn't use shared memory anywhere, but always generates ghost layers and explicit communication.
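For readers unfamiliar with "ghost layers and explicit communication", here is a toy single-process sketch of the general halo-exchange technique. It is a hypothetical illustration, not Petalisp's implementation: the function names are made up, and the "communication" is a plain copy standing in for a message between processes. Each block only ever reads its own cells plus ghost cells filled by an explicit exchange, and the result matches the same stencil applied to the whole array.

```python
# Hypothetical illustration of ghost layers + explicit communication.
# Not Petalisp code: a 1-D array is split into blocks, each padded with
# one ghost cell per side; neighbours exchange boundary values by copy,
# then each block applies a 3-point stencil using only its local data.

def split_with_ghosts(data, nblocks):
    """Split data into equal blocks, each padded with two ghost cells (0.0)."""
    size = len(data) // nblocks
    assert size * nblocks == len(data), "length must divide evenly"
    return [[0.0] + data[i * size:(i + 1) * size] + [0.0]
            for i in range(nblocks)]

def exchange_ghosts(blocks):
    """Explicit 'communication': copy boundary values into neighbours'
    ghost cells. Outer ghost cells keep 0.0 as a boundary condition."""
    for i in range(len(blocks) - 1):
        left, right = blocks[i], blocks[i + 1]
        left[-1] = right[1]   # right neighbour's first interior cell
        right[0] = left[-2]   # left neighbour's last interior cell

def stencil(block):
    """Average each interior cell with its two neighbours."""
    return [(block[j - 1] + block[j] + block[j + 1]) / 3.0
            for j in range(1, len(block) - 1)]

def distributed_stencil(data, nblocks):
    """Apply the stencil block by block after one ghost exchange."""
    blocks = split_with_ghosts(data, nblocks)
    exchange_ghosts(blocks)
    return [x for b in blocks for x in stencil(b)]

def serial_stencil(data):
    """Reference: the same stencil on the whole array, zero-padded."""
    return stencil([0.0] + list(data) + [0.0])

data = [float(i * i) for i in range(12)]
print(distributed_stencil(data, 4) == serial_stencil(data))  # True
```

In a real distributed setting each block would live in a separate process and the two copies in `exchange_ghosts` would become a send/receive pair, but the structure stays the same.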

> I will also say that it feels a bit weird to call something "peta-" and "HPC" if using more than one socket is relatively far off into the future. For the randomly-wandering PhD students out there, it would be nice to tell them this up front in the Readme :)

I can do that. But let me explain the rationale for naming Petalisp this way: I wanted to create a programming language that is novel, and that has the potential to scale to petaflop systems. And I wanted to create a robust implementation of that language. I think I have achieved the former part, but the latter simply takes time.

Final remark: Good HPC practice is always to get single-core performance right first, then multicore performance, and only then to scale up to multiple nodes. Anything else is an enormous waste of electricity.