by wvoq 5477 days ago
Hi. Most of the advice in the comments already posted is sound, but none of it seems to address HPC. Working on a cluster brings an entirely different bundle of conceptual and practical hurdles (e.g. parallelism, working remotely, industrial-strength shell scripting &c.). Even though I had been programming since I was a kid, I found my crash course in HPC quite challenging: confronted with a new programming model in a new low-level language, I appreciated for the first time what it must be like not to know how to program at all.

So: code as much as possible. As soon as you can possibly stand it, look into mpi4py and start parallelizing your code. Chances are you won't be working with Python in HPC contexts, but learning parallel programming at the same time as C or Fortran would be needlessly difficult, so pick up the parallel concepts in Python first. I would also try to get time on a cluster as soon as you're in a position to use it respectably. Most universities with HPC facilities have an online application for an account, and some sysadmin might take pity on you :) Otherwise, maybe Amazon has some kind of deal?
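To give a feel for what "parallelizing your code" with mpi4py looks like, here is a minimal sketch. The problem (a sum of squares) and the names `local_range` and `parallel_sum_of_squares` are illustrative only, not taken from any course or library. Every process runs the same script, asks MPI for its rank, works on its own slice of the range, and a single `reduce` combines the partial sums on rank 0. It assumes mpi4py and an MPI implementation are installed; without them it falls back to one serial process so the logic can still be tried.

```python
"""Parallel sum of squares with mpi4py (a toy example, not production code)."""
try:
    from mpi4py import MPI
except ImportError:  # assumption: no MPI available, so behave like a single serial process
    MPI = None


def local_range(n, rank, size):
    """Split range(n) into `size` contiguous, nearly equal blocks; return (start, stop) for `rank`."""
    base, extra = divmod(n, size)
    # The first `extra` ranks each take one extra element.
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def parallel_sum_of_squares(n):
    """Each rank sums i*i over its own block; rank 0 gets the total, other ranks get None."""
    if MPI is not None:
        comm = MPI.COMM_WORLD
        rank, size = comm.Get_rank(), comm.Get_size()
    else:
        comm, rank, size = None, 0, 1

    start, stop = local_range(n, rank, size)
    local = sum(i * i for i in range(start, stop))

    if comm is None:
        return local
    # Combine the partial sums on rank 0; reduce returns None on every other rank.
    return comm.reduce(local, op=MPI.SUM, root=0)


if __name__ == "__main__":
    total = parallel_sum_of_squares(1_000_000)
    if total is not None:  # only rank 0 holds the result
        print(total)
```

Launch it with something like `mpiexec -n 4 python sum_squares.py` (the file name is arbitrary, and the launcher's name and flags can differ between MPI installations). The block decomposition in `local_range` is the same idea you would later write by hand in C or Fortran, which is the point of learning it first where the language stays out of the way.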

In the interim, become _very_ comfortable with bash, general command-line fu, and a serious text editor. Good luck!


Do you think that learning about parallelism would help a beginner learn how to program, or understand code better? I consider myself an intermediate programmer, with about five years of experience, but have never explored HPC. I doubt that doing so as a beginner would have helped me more.

I'm not sure, really. Saeed Dehnadi's article "The camel has two humps" describes three great filters in programming pedagogy:

* assignment/sequence

* recursion

* concurrency

Most students never master the first, most of those who do never master the second, and so on. In that spirit, I would recommend that the OP firmly grasp the first two with both hands before reaching for the third. Of the languages the OP could study that treat concurrency or parallelism as a kind of conceptual primitive (e.g. Scala or Erlang), none are likely to appear as working languages in an HPC milieu. Almost always, parallelism comes from libraries bolted on as an afterthought to the languages traditionally popular for scientific computing (e.g. MPI for C and Fortran).
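As a rough illustration of the three filters, here is one small Python function per concept. These toy examples (`assignment_demo`, `factorial`, `concurrent_count`) are hypothetical and are not the test items used in the article.

```python
import threading


# Filter 1: assignment and sequence. Statements run in order, and `a = b`
# copies b's current value into a; it does not tie the two variables together.
def assignment_demo():
    a = 10
    b = 20
    a = b   # a is now 20
    b = 30  # changing b afterwards does not change a
    return a, b


# Filter 2: recursion. A function defined in terms of a smaller instance of itself.
def factorial(n):
    if n <= 1:  # base case stops the recursion
        return 1
    return n * factorial(n - 1)


# Filter 3: concurrency. Several threads update shared state; the lock makes
# each read-modify-write of `counter` exclusive with respect to the other threads.
def concurrent_count(num_threads=4, increments=10_000):
    counter = 0
    lock = threading.Lock()

    def worker():
        nonlocal counter
        for _ in range(increments):
            with lock:
                counter += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait for every worker before reading the result
    return counter
```

The lock is what makes the final count reliable: `counter += 1` is a read, an add, and a write, and without the lock concurrent threads can interleave those steps and lose updates (how often that shows up depends on the interpreter and on timing).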

I think this is a regrettable pattern, but it is one that will shape the OP's daily work if they wish to begin (and remain in) a program in computational chemistry.

All I meant to suggest is that when the OP begins to learn parallel computing, they should be thinking about parallel algorithms, not segfaults or pointer purgatory or the finer points of scp or vi. The sooner the latter can be mentally sublimated, the better the OP will feel about HPC.