by gvggf 2555 days ago
This will sound harsh, but it isn’t meant to be.

Scientists code better than you do science.

This is simply a consequence of a weeding-out mechanism for those who have no coding skills. The only people who get away with no coding skills are important professors with grad students to do the coding.

This isn’t to say that our skills are great, but a generic programmer’s (i.e. a CS major’s) science abilities are approximately zero (come on, no thermodynamics in an “engineering” undergrad?)

So what can you do?

Since you mentioned science and not engineering, I’d ignore the AI advice. Science needs models based on mechanistic understanding of the underlying phenomena. A model that merely predicts is useful for engineers, not scientists.

“materials simulator (eg how can we get a material having a given set of properties)“

This is already done, but of limited usefulness. First, materials simulators are far from perfect. Then there is the problem of actually synthesizing the materials. These simulations are more typically done to weed out bad candidates.

“No immediate financial return”

Wrong attitude. Only an attitude of “no financial return” helps science. That’s not to say you won’t make money off of it, but that can never be the goal, since (true) science advances freely (again, see the Gaussian jerk vs. Einstein or Landau: who contributed more?)

Instead, focus on making the programming tools scientists use better, easier to use, and GPL. GPL is important because an MIT license by itself allows a scientist to use others’ work while blocking others from using theirs (see Gaussian).

For example, making Python (or Julia?) better would be one of the most important contributions you could make. The matplotlib guy (John Hunter) was deeply mourned in science.

The two cents of a physical sciences researcher who once flirted with the Valley.


I actually strongly disagree with the message in the lead-in. All scientists can “code” better than OP can do science, yes, but most woefully lack any software engineering experience. Most postdoc research code would earn a failing grade from an undergraduate software engineering professor, or get themselves fired from a real world programming job. Think spaghetti code, lack of testing or continuous integration, non compliance with standards, etc.

I would say a cheap win for a coder would be to attack some domain where only rough research code exists and make it more reliable, scalable, better documented, interoperable, etc. A complete rewrite is probably required in many cases, but you have the working old version to compare against.
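
To make "compare against the working old version" concrete: a regression test can run the legacy code and the rewrite on many random inputs and check that they agree within a tolerance, and that same test can then run in continuous integration. A minimal sketch, assuming NumPy; `legacy_moving_average` and `rewritten_moving_average` are hypothetical stand-ins for whatever old and new routines you actually have.

```python
import numpy as np


def legacy_moving_average(values, window):
    # Hypothetical stand-in for the old research code: explicit nested loops.
    out = []
    for i in range(len(values) - window + 1):
        total = 0.0
        for j in range(window):
            total += values[i + j]
        out.append(total / window)
    return np.array(out)


def rewritten_moving_average(values, window):
    # Hypothetical stand-in for the cleaned-up rewrite: vectorized via a cumulative sum.
    values = np.asarray(values, dtype=float)
    cumsum = np.concatenate(([0.0], np.cumsum(values)))
    return (cumsum[window:] - cumsum[:-window]) / window


def check_against_legacy(n_cases=100, seed=0):
    # Compare the rewrite with the legacy version on many random inputs.
    rng = np.random.default_rng(seed)  # fixed seed keeps failures reproducible
    for _ in range(n_cases):
        n = int(rng.integers(5, 200))
        window = int(rng.integers(1, n + 1))  # window in [1, n]
        data = rng.normal(size=n)
        np.testing.assert_allclose(
            rewritten_moving_average(data, window),
            legacy_moving_average(data, window),
            rtol=1e-9,
            atol=1e-9,
        )
    return True
```

The tolerances here are assumptions; they should reflect how much floating-point drift the new algorithm can legitimately introduce compared with the old one.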

> or get themselves fired from a real world programming job. Think spaghetti code, lack of testing or continuous integration, non compliance with standards, etc.

Perhaps, but doing all those things would get them fired from the job they are currently in.

> matplotlib

And now a problem that would be great to solve: vector images where I can easily change things when the fonts are the wrong size, the lines are too thin, I've changed the paper format, whatever. I can't express how often I've had to redo plots simply because they don't look right in a paper.

I also cannot express how beautiful LaTeX is, or what an absolute horror it is to create TikZ images. They are beautiful, but it is definitely an art that one can never master. I want to do it with code; I don't want dumb GUIs that only work on certain machines and never work as expected.

If a nicer version of TikZ with a lot of power under the hood could be made, it would help a lot of people. That's why matplotlib is so great: doing basic things is extremely straightforward, but if you want to do extremely complex things, you also have that power. (Even something as simple as matplotlib for LaTeX, which results in vector images, would be incredibly helpful.)

Check out matplotlib's PGF backend: https://matplotlib.org/users/pgf.html

This is pretty nice and I will likely be using it from now on. Thanks.

But I do want something a little more native to LaTeX. The major issue is that sometimes font sizes, axes, titles, even line thicknesses don't look right in a paper. The problem comes when you have a large plot and have to replot it to fix these things. But vector images will help.
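
On the "doesn't look right in a paper" problem, one common approach (not the PGF backend linked above, which needs a LaTeX installation) is to create the figure at its final printed size and save it as PDF, so nothing gets rescaled and the fonts keep the point size you chose. A minimal matplotlib sketch; the 246 pt column width and 8 pt font size are assumptions, and `paper_style`, `save_paper_figure` and `pdf_bytes` are hypothetical helper names.

```python
import io

import matplotlib

matplotlib.use("Agg")  # headless backend; savefig can still write vector PDF/SVG
import matplotlib.pyplot as plt
import numpy as np

# Assumptions: a 246 pt column (check with \the\columnwidth) and 8 pt text.
COLUMN_WIDTH_PT = 246.0
PT_PER_INCH = 72.27  # TeX points per inch
FONT_SIZE_PT = 8.0


def paper_style(font_size_pt=FONT_SIZE_PT):
    """rcParams so text and lines are drawn at their final printed size."""
    return {
        "font.size": font_size_pt,
        "axes.labelsize": font_size_pt,
        "axes.titlesize": font_size_pt,
        "legend.fontsize": font_size_pt - 1,
        "xtick.labelsize": font_size_pt - 1,
        "ytick.labelsize": font_size_pt - 1,
        "lines.linewidth": 1.0,
    }


def save_paper_figure(target, fmt="pdf", width_fraction=1.0, aspect=0.62):
    """Plot example data at final size and save it; returns (width, height) in inches."""
    width_in = COLUMN_WIDTH_PT * width_fraction / PT_PER_INCH
    with plt.rc_context(paper_style()):
        fig, ax = plt.subplots(figsize=(width_in, width_in * aspect))
        x = np.linspace(0.0, 2.0 * np.pi, 200)
        ax.plot(x, np.sin(x), label="sin(x)")
        ax.plot(x, np.cos(x), label="cos(x)")
        ax.set_xlabel("x")
        ax.set_ylabel("y")
        ax.legend()
        fig.tight_layout()
        fig.savefig(target, format=fmt)  # no bbox_inches="tight", so the size stays exact
        size = tuple(float(v) for v in fig.get_size_inches())
        plt.close(fig)
    return size


def pdf_bytes():
    """Render the example figure to an in-memory PDF."""
    buffer = io.BytesIO()
    save_paper_figure(buffer)
    return buffer.getvalue()
```

In LaTeX, include the result at its natural size, e.g. `\includegraphics{figure.pdf}` without a `width=` option, so 8 pt text stays 8 pt. Switching paper formats then means changing `COLUMN_WIDTH_PT` and rerunning the script rather than redoing plots by hand.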

> Science needs models based on mechanistic understanding of the underlying phenomena. A model that merely predicts is useful for engineers, not scientists

I'm not sure I agree. I'm aware of quite a bit of supercomputing time that is spent doing lattice QCD calculations (which apparently some scientists find useful), and though I'm no quantum physicist, I'm pretty sure there is not much of a "mechanistic understanding" in QCD. I think your claim also doesn't apply to a lot of social science - psychology has a lot of functional models, but I don't think there are many mechanisms described.

I'll also state that modern science that doesn't require any engineering is pretty rare nowadays, so if a predictive model helps engineers, who can then help scientists, the model has been helpful to scientists.

Ohm's law existed long before there was a mechanistic description behind it, and though it is mostly used for "engineering," I feel confident that a lot of scientists in the 19th century found it useful.

From https://www.olcf.ornl.gov/leadership-science/physics/:

"New Frontiers for Material Modeling via Machine Learning Techniques" - 40,000 hours allocated on Summit

"Large scale deep neural network optimization for neutrino physics" - 58,000,000 hours allocated on Summit.

Supercomputers typically do not allocate 58 million hours to things which are not useful.

I work with the DOE and was at ORNL before Summit was released (I got to play on Summit-dev). When making these models there is A LOT of exploration happening. There's a whole class of visualization techniques called "in situ" that visualize data as it comes off the press (the memory is then dumped because there isn't enough storage space and we can't write to disk fast enough anyway). I'll tell you that there will be a lot of restarting of those simulations, because the scientists need to explore the data as it is being generated. Going in the wrong direction? Made a small mistake that causes cells to explode? Realize you're not looking at the right region of interest? You restart the sim (thank god for restart files, right?). Exploration is one of the most important things in research, and it is getting more and more difficult. I believe this is what the GP is after. Having these understandings helps you explore the data better. Creating these tools is hard work and takes a lot of collaboration too.
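
The restart-file workflow boils down to periodically saving the full simulation state and resuming from the latest save. A minimal sketch, assuming the whole state fits in one NumPy array; `toy_update` is a hypothetical stand-in for a real timestep, and large HPC codes typically use parallel I/O formats such as HDF5 rather than `.npz` files.

```python
import os
import tempfile

import numpy as np


def toy_update(state):
    """Hypothetical stand-in for one simulation timestep."""
    return 0.5 * state + 1.0


def save_checkpoint(path, step, state):
    """Write the step counter and state to `path` (.npz) via a temp file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".npz")
    os.close(fd)
    np.savez(tmp_path, step=step, state=state)
    os.replace(tmp_path, path)  # swap in the complete file in one step


def load_checkpoint(path):
    """Return (step, state) from a restart file."""
    with np.load(path) as data:
        return int(data["step"]), data["state"].copy()


def run(n_steps, path, checkpoint_every=10, restart=True):
    """Advance the toy simulation to n_steps, resuming from `path` if it exists."""
    if restart and os.path.exists(path):
        step, state = load_checkpoint(path)
    else:
        step, state = 0, np.zeros(8)
    while step < n_steps:
        state = toy_update(state)
        step += 1
        if step % checkpoint_every == 0:
            save_checkpoint(path, step, state)
    return step, state
```

Writing to a temporary file in the same directory and then calling `os.replace` means a crash during the write should leave the previous checkpoint intact instead of a half-written restart file.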

I guess "mechanistic understanding" was meant to contrast with machine learning, not quantum mechanics. To elaborate, machine learning means fitting a given model to a bunch of data. In science (e.g. lattice QCD) one often tries to theoretically (or computationally) explore regimes where data is not yet available. As a (former?) theoretical physicist, I am more than happy to admit that this is not immediately useful, though it will hopefully become useful in the long run.

Thanks, it makes sense.

> focus on making the programming tools scientists use better, easier to use and GPL

That sounds like the most natural path to take going forward. Besides looking at existing GPL software and how it can be improved, would you have a recommendation on where to find scientists/researchers open to discussing needs that could be solved by software? I'll release the software under the GPL, but I need to know I'm building something useful.

As a computational physicist (meaning I do science, but most of my time is spent programming), I agree with everything the parent said, but perhaps I can add some more specifics. The matplotlib example is a very good one in the sense that it's a piece of vital infrastructure almost everyone has used at some point. It works well enough for performance-insensitive (meaning non-realtime and small-ish datasets) 2D visualization, has a lot of features and is easy to use. Other niches are less fortunate - for instance, for general-purpose performant 3D visualization there's the bloated monstrosity that is VTK and little else, so I mostly just write OpenGL code by hand. That's annoying, but I haven't found anything that isn't outright terrible.

Other scientists, depending on their interests, will readily give you similar examples of obvious general purpose libraries that are lacking or non-existent, but there's a simple reason for this - it's hard, unrewarding work that's very hard to commercialize. Most of the large scale projects that exist have grown out of academic grants and often struggle for funding or are abandoned entirely. If you are really considering this as a career move that's eventually supposed to put food on the table, you need to have a pretty good idea of how your project is realistically going to earn money, because the correlation between funding and general usefulness is very weak in this space. Since academic funding isn't on the table for you, the common alternative involves things like biotech startups and venture capital.

>> Other scientists, depending on their interests, will readily give you similar examples of obvious general purpose libraries that are lacking or non-existent

I'd love to hear some of these, if folks on the thread can share more. Added 3D visualization to my list...

Symbolic computation libraries for Python are lacking - there's SymPy, but it can throw a "NotImplementedError" at you if you try processing some hairier formulas... It would be great if SymPy were improved.

Have you looked into SageMath? (Technically it's "Python-based" rather than "just" Python.) I'm also curious whether you're specifically looking for something in Python, or if doing symbolic computation/computer algebra in other languages would work for you, barring license costs.

I needed a Python function that computes a (symbolic) derivative of a certain known formula. I could've just computed the derivative in, say, Mathematica and then written the Python function manually based on the derived formula, but the derivative had hundreds of terms and there was no way I'd transpile it manually without introducing errors.

In the end, I used a hacky Mathematica script that converts the resulting Mathematica formula into Python code (which I then pasted into my program). But if SymPy were better, I could do all of this in just Python.

BTW, according to Wikipedia, SageMath is just using SymPy for calculus.
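
For the derivative-to-Python-function workflow described above, SymPy does provide the pieces: `sympy.diff` for the symbolic derivative, `sympy.lambdify` to turn an expression into a callable, and `pycode` to emit Python source text. A minimal sketch; the formula is a hypothetical placeholder, and whether SymPy copes with a specific formula of hundreds of terms is exactly the open question raised above.

```python
import sympy as sp
from sympy.printing.pycode import pycode

x, a, b = sp.symbols("x a b")

# Hypothetical placeholder formula; the commenter's real formula isn't given.
f = sp.exp(-a * x**2) * sp.sin(b * x) / (1 + x**2)

# Exact symbolic derivative (left unsimplified).
df_dx = sp.diff(f, x)

# Callable numeric functions generated directly from the expressions,
# so no hand transcription is needed.
f_num = sp.lambdify((x, a, b), f, modules="math")
df_dx_num = sp.lambdify((x, a, b), df_dx, modules="math")

# Python source text for the derivative, if you'd rather paste it into a module.
derivative_source = pycode(df_dx)
```

Passing `modules="numpy"` to `lambdify` instead would give a version that works on arrays. Comparing the generated derivative against a central finite difference at a few points is a cheap sanity check to keep next to any generated code.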

I agree with this; scientific computing per se is best left to scientists and cannot be done effectively without the proper training. But there is a huge need in science for well-designed software. I'm talking about bread-and-butter CSE topics like basic UI design and documentation. This is where OP should concentrate their efforts, in my opinion. A lot of research-quality code is shockingly buggy and difficult to run. To give an example, if you are a biologist trying to run a shiny new machine learning method on your data, you are SOL in most cases unless the original authors went out of their way to enable that. For this reason a few PIs (really rich ones with big bio labs and f.u. grant money) employ full-time software developers, but this is not possible for most people.

The trend in science is away from GUIs, if that is what you are thinking. Improved (although far from perfect) programming skills in the sciences and the move towards reproducible research have heralded a move away from GUIs. Documentation is another matter, although that is becoming increasingly automated as well.
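
Along the lines of basic UI design and automated documentation without a GUI: even a small command-line interface built with Python's standard `argparse` module gives users a self-documenting `--help` page and makes every setting explicit, which helps reproducibility. A minimal sketch; the tool name `run-method` and its options are hypothetical.

```python
import argparse


def build_parser():
    """Command-line interface for a hypothetical analysis tool."""
    parser = argparse.ArgumentParser(
        prog="run-method",
        description="Apply the (hypothetical) method to a CSV file of measurements.",
    )
    parser.add_argument("input", help="path to the input CSV file")
    parser.add_argument(
        "-o", "--output", default="results.csv",
        help="where to write results (default: %(default)s)",
    )
    parser.add_argument(
        "--threshold", type=float, default=0.05,
        help="significance cutoff (default: %(default)s)",
    )
    parser.add_argument(
        "--seed", type=int, default=0,
        help="random seed, so runs are reproducible (default: %(default)s)",
    )
    return parser


def main(argv=None):
    """Parse arguments and return the resolved settings."""
    args = build_parser().parse_args(argv)
    # Placeholder: a real tool would load args.input, run the method, and
    # write args.output. Here we only return the settings as a dict.
    return vars(args)
```

Calling `main(["data.csv", "--threshold", "0.01"])` parses those arguments directly, which keeps the interface testable; for installed use you would register `main` as a console entry point.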

> science abilities are approximately zero

I’d say it really depends on your program and what you mean by science. I minored in bioengineering and also double majored in math. At least one of my CS final projects has a citation (which I recently discovered after looking at Google Scholar).

My point is, what counts as “science” skills may not match your expectations, but I’d argue many people have that skill set.

> Only an attitude of “no financial return” helps science.

I also take issue with this. Arguably, all financial investments are a way of directing research. All research needs funds. How do we get most of the drugs we have today? Typically some of the research is done publicly, but the last “mile,” so to speak, is done by private companies.

> generic programmers

I would argue that most people would agree that a "generic programmer" does not have a degree in math or a minor in bioengineering. There are a lot of programmers who never studied any STEM outside of a CS curriculum, which usually has ~no science and rarely requires advanced math (i.e. usually nothing beyond linear algebra).