	by mjan22640 986 days ago
	Math has always been a must for a scientist; today, computer science is a must too. Study programmes should reflect that.

2 comments

eesmith 986 days ago

"Math" is such a wide topic that you certainly must qualify your statement.

The standard entomologist curriculum does not require calculus, while a physics curriculum does. Both produce scientists. (For example, https://cals.cornell.edu/education/degrees-programs/entomolo... under "Major Requirements" says "One semester of college statistics or biometry", and the listed physics requirement doesn't require calculus.)

On the other hand, an entomologist interested in population ecology may need to know differential equations.

Your use of "study program" suggests your experience is at the undergrad level, and not at the grad school level, which is how most scientists I know got their training.

At the undergrad level the study programs do reflect what's needed for a solid education. If a student is interested in computational biology, that program will emphasize taking more CS courses than the program for a student interested in marine biology.

But at the grad level, the "study program" is much less formalized. You might take graduate level classes the first couple of years, but then you are expected to pick up the missing bits on your own.

Once you have your PhD and are a working scientist, you rarely have the luxury of following any study program.

And if you've been a scientist for 20 years, any CS training you had likely did not cover SIMD, and emphasized practices which are no longer relevant. (For example, the link points out "That advice [about HDDs] is mostly outdated today [with SSDs]".)

Those latter categories are who the linked-to piece is for, not undergrads in a well-defined study program.


AtlasBarfed 986 days ago

The article basically implies that some non-professional coder will be writing assembly and doing the work of an optimizing compiler. I think the point of the parent is that if you are at this point already, and you're in an academic setting, you might as well read a full computer architecture textbook front to back.

I would be curious to know what percentage of all the "scientific coders" understood the entire article. I'd be similarly curious how much your typical "bootcamp" developer would understand of it. I know everything presented, so it basically comes off as "lecture notes" for someone who already knows it. For someone who doesn't understand SIMD, CPU fundamentals, assembly, and compilers, I'd imagine their eyes would glaze over right when the assembly code appeared.

And while SSDs are MUCH FASTER than HDDs, the basics of interacting with storage are the same; it's just that, from the CPU's perspective, rather than waiting a million years for data to arrive, it comes in 10,000s of years.

Latency numbers all programmers should be aware of:

https://gist.github.com/jboner/2841832
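
To put rough numbers on the HDD-versus-SSD comparison, here is a small Python sketch that rescales a few latencies so that 1 ns reads as 1 s. The figures are approximate values as I recall them from the gist above (circa 2012) and should be treated as assumptions, not measurements; `human_scale` is a hypothetical helper name. The "million years" wording is figurative: on this scale the gap is closer to days versus months.

```python
# Approximate latencies in nanoseconds (assumed, order-of-magnitude only).
LATENCY_NS = {
    "L1 cache reference": 0.5,
    "Main memory reference": 100,
    "SSD random 4K read": 150_000,
    "HDD seek": 10_000_000,
}

SECONDS_PER_DAY = 86_400


def human_scale(ns: float) -> str:
    """Pretend 1 ns lasts 1 s and format the stretched duration."""
    seconds = ns  # 1 ns -> 1 s
    if seconds < 60:
        return f"{seconds:g} s"
    if seconds < SECONDS_PER_DAY:
        return f"{seconds / 60:.1f} min"
    return f"{seconds / SECONDS_PER_DAY:.1f} days"


for name, ns in LATENCY_NS.items():
    print(f"{name:<24}{human_scale(ns)}")

# How many SSD reads fit into one HDD seek, under the assumed figures.
ratio = LATENCY_NS["HDD seek"] / LATENCY_NS["SSD random 4K read"]
print(f"HDD seek / SSD read: about {ratio:.0f}x")
```

Either way, storage stays several orders of magnitude slower than cache or main memory, which is the sense in which the basics are unchanged.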


eesmith 986 days ago

I am a professional coder and my eyes glaze over with assembly. Still, I don't think there's that much assembly. What I saw was to show how the code is implemented, with very little about "doing assembly" outside of a simple example.

I can't judge background - I don't have a sense of who uses Julia, and I've been programming for too long, without exposure to the target audience.

Since you mentioned "academic setting", I'll point out there are also scientists-who-program in industrial settings. However, none of the ones I know about use Julia.

My belief is that most scientists-who-program aren't going to read textbooks from other fields. They are under pressure to produce NOW, and don't think it's worth the time to acquire an entirely new mindset. Instead, I think this sort of knowledge transfer happens in fits and starts, as someone figures out an optimization and passes it along, with domain-specific context that makes it easier for others in the field to understand.

Which means, like you, I don't think this notebook will be all that useful, though in my case that's because I think it's too generic.

> what percentage of them understood the entire article

I don't think that's a telling metric. Only some scientific coders are interested in writing fast code (vs. fast-enough code), and only some of those use Julia.


th0ma5 986 days ago

I agree with this sentiment: the majority of CS people are telling statisticians that a lot of Julia remains a kind of snake oil or otherwise mystical thinking, which is very unfortunate. Even in the first page of the documentation "No need to vectorize code for performance; devectorized code is fast" is some kind of category error redefinition of how programming languages work in my opinion.


newswasboring 986 days ago

> Even in the first page of the documentation "No need to vectorize code for performance; devectorized code is fast" is some kind of category error redefinition of how programming languages work in my opinion.

Can you elaborate a bit? I don't really get what you are trying to say.


th0ma5 986 days ago

If the code can easily be vectorized, then either there is the potential to vectorize it incorrectly or there is some hidden automation happening. If they're just saying their non-vectorized operations are just as quick, then how quick could true vectorization be? Also, this is how Octave, NumPy, Matlab, R, etc. work: they make vectorized math operations happen on whole matrices using statements that look like simple non-vector operations. Further, usually when people are having these kinds of issues, it's because they started with a non-parallelizable concept of their problem and are trying to redo it... And no amount of magic is going to fix a bad concept of the problem space.
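
As a rough illustration of "statements that look like simple non-vector operations", here is a toy Python sketch using operator overloading. `Vec` is a hypothetical class written for this example; it is not how NumPy, Octave, Matlab, or R are implemented (those dispatch to compiled routines), but it shows where the hidden loop lives.

```python
from numbers import Number


class Vec:
    """Hypothetical toy array type, for illustration only."""

    def __init__(self, values):
        self.values = [float(v) for v in values]

    def _pairs(self, other):
        # A scalar operand is repeated to match the vector length.
        if isinstance(other, Number):
            return zip(self.values, [other] * len(self.values))
        if len(other.values) != len(self.values):
            raise ValueError("length mismatch")
        return zip(self.values, other.values)

    def __add__(self, other):
        # The element-wise loop is hidden inside the operator.
        return Vec(a + b for a, b in self._pairs(other))

    def __mul__(self, other):
        return Vec(a * b for a, b in self._pairs(other))

    def __repr__(self):
        return f"Vec({self.values})"


x = Vec([1, 2, 3])
y = x * 2 + x  # reads like scalar arithmetic; each operator loops over x
print(y)
```

Note that `x * 2 + x` runs two separate passes and allocates a temporary `Vec` for `x * 2`, which is one cost of this style that a single explicit loop would avoid.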


adgjlsfhk1 986 days ago

I think the problem here is that there are two very different meanings of "vectorized" at play. The first (and what the Julia docs are talking about here) is the pattern of writing "vector-operations" (i.e. rather than writing a loop, writing an expression that works across an entire array). The second meaning is using SIMD instructions (e.g. AVX2). What the Julia docs are trying to say is that, unlike in languages like Python/R/Matlab, where loops have a high overhead due to an interpreter, in Julia loops are fast because the language is compiled. There are lots of algorithms that are easy to express in an iterative fashion but pretty much impossible to vectorize efficiently (dataflow analysis, differential equations, etc.).

The docs aren't trying to talk about SIMD instructions at all here (although Julia/LLVM are pretty good at producing SIMD instructions from loops where possible).
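
The first meaning of "vectorized" can be sketched in plain Python, with no claim about how Julia or any array library implements it. `scale_all` and `logistic_euler` are hypothetical names, and logistic growth stepped with explicit Euler is just one assumed example of a differential equation. The first function is an element-wise operation that fits the whole-array style; the second is a recurrence where each value depends on the previous one, so it is naturally a loop.

```python
def scale_all(xs, a):
    """Whole-array style: every element is independent of the others."""
    return [a * x for x in xs]


def logistic_euler(x0, r, k, dt, steps):
    """Explicit Euler for dx/dt = r * x * (1 - x / k).

    Each step needs the result of the previous step, so there is no
    single element-wise expression over an input array to replace the loop.
    """
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        xs.append(x + dt * r * x * (1 - x / k))
    return xs


print(scale_all([1, 2, 3], 2))
trajectory = logistic_euler(x0=1.0, r=0.5, k=10.0, dt=0.1, steps=500)
print(f"final population: {trajectory[-1]:.3f}")
```

In an interpreted language the second function pays per-iteration overhead; in a compiled one the same loop is cheap, which is the point the docs quote is making.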


th0ma5 986 days ago

But no one in science uses Python loops, for example; they use NumPy / JAX / Polars etc., so that is an unfair and disingenuous comparison.


adgjlsfhk1 986 days ago

That's exactly the point. People are changing their coding style to work around the fact that loops in the language are slow.
