Hacker News
by enriquto 1702 days ago
I for one welcome the "dryness" of mathematical writing. It feels clean, like reading a story without distracting ads.

A beautiful advice that I received as a student was to write mathematics as a series of definitions, propositions and proofs. No text is allowed to exist outside of these three. In practice it is difficult to enforce, but it is helpful to keep this as an aim.

4 comments

> A beautiful advice …

Oh my goodness! So there are actual people who prefer that? I had to chew my way through a fair share of books like that and I can’t stand them. Clearly the person who thought of these definitions, propositions and proofs had some reason to think of them. Sometimes they were trying to solve a problem, sometimes they were combining ideas, sometimes they were looking for structures with certain aesthetic properties. There was always a reason why they thought about this and not something else. Sometimes there are multiple possible such reasons; that is fine. In that case the author can select whichever they fancy the most. Every time I had the misfortune to read a book like what you describe it felt like I was eating powdered milk without reconstituting it. There was a clear chain of thought between ideas and they chose to just hide it.

I understand that math requires work. One needs to get a paper and a pencil and work out examples, check proofs, play with definitions. But why wouldn’t the author write down what made them care about the next item?

> it is helpful to keep this as an aim.

Why?

Trying to study a very abstract concept without any examples is almost useless. There is, perhaps, some philosophical sense in which we could only be writing down mathematics for god to read, but humans searching the proposition space necessarily move heuristically. Formal generalization is fine (and even great) for working with objects, but pretty terrible for recognizing them. For example (and also meta-example), most people do not need to spend a lot of time convincing themselves that the pigeonhole principle is true, at least in the finite case, but show most people a problem that it solves, and they will be unable to recognize that it applies.

As a non-mathematician this seems to be lacking a little something. Would you not want your paper to provide some indication of what you're attempting to communicate and why? Or is that information (as the joke goes) possible, and therefore trivial, to infer?

A great deal of maths is that you get to use theorems for purposes that they were not intended for. Presenting the theorem in a "pure" form thus allows to approach it without pre-conceptions.

I love the analogy with cooking recipes in another comment. Math papers are like recipe books, deliberately devoid of their social context. The same recipe may mean different things to different cooks, even contradictory! Having a clean, neutral description of the recipe allows both cooks to safely refer to the exact same recipe, without endorsing contexts that they may find odious.

I agree that knowing the context in which a recipe was created, and the contexts where it has been used, is very useful. But it would be extremely annoying to have this explanation interleaved with the recipe description itself. This information is best kept separate. Then, at the beginning of the recipe you can have a list of links to cooks who have written different things about it.

> A great deal of maths is that you get to use theorems for purposes that they were not intended for. Presenting the theorem in a "pure" form thus allows to approach it without pre-conceptions.

Plenty of physics papers end up being useful in ways the authors never conceived. I don't think that authors writing down their best guess of the importance of their results prevents others from using the results or techniques in alternate ways. In fact, I frequently see this happening.

But the authors writing down their best guess of the importance of their results helps others judge the minimum importance of the results, and decide whether they want to read the paper in the first place.

> But the authors writing down their best guess of the importance of their results helps others judge the minimum importance of the results, and decide whether they want to read the paper in the first place.

In many cases it will be unlikely that the authors even know the importance of the results. Forcing them to come up with an explanation would be useless at best, and an unbearable burden at worst (e.g., when a student has proved a theorem proposed by his advisor).

There are mathematical reviews, where mathematicians with a higher-level view on the field point to particular papers and explain their importance. Also, many journals have an editorial column that explains the papers in each issue.

While I was teaching C++ I would sometimes look at programs from a colleague and try to rewrite them in a more teachable form using C++14.

The abstraction capabilities in modern C++ made it more fun and allowed me to condense programs in totally unexpected ways. More abstract code was simpler and sometimes easier to understand.

But after a while (usually by my version 4 or 5), if I abstracted it too much (templatized it, made it work for Unicode or char types, etc.), the complexity would shoot up again. It became incomprehensible even to me, even though I had written it. I knew it worked because it gave the same result, but it was no longer anchored/rooted in anything real.

There is a zone of usefulness that is between the completely abstract and the concrete. Physics lives in this zone…it uses mathematics but they don’t feel the need to generalize to N dimensions or to assume the constants of nature are variable—unless they have to.

This is why all good descriptions of mathematics start by showing a concrete problem they want to solve. Gauss invented the FFT algorithm to simplify his calculations of the orbits of the asteroids Pallas and Juno. He had the numbers in front of him and tried to reduce his computational workload by exploiting a repeated pattern in the computations. Teaching the FFT as a fait accompli and showing asteroid orbits as a sample application is ass backwards.

Reminds me of my general principle of the best solutions being "square" in the sense that the effort you expend on them is quite similar on any axis. By the time you're past abstraction's diminishing returns for the given problem, you're spending way more effort on that than anything else (which is understandable - it's fun!) and your efforts aren't "square" any more.

(A toy example of the 'square' thing: imagine you have to make 10,000 widgets for 1 dollar each. Making them all by hand for $1 each without optimizing the process would be inefficient. So would spending $9,999 to build a machine which could make widgets for $0.0001 each. But spending $100 to optimize production so you can make the widgets for $0.01 each is a massive win: $200 total cost vs. $10,000 for either alternative.)
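The arithmetic in that toy example is easy to check. Here is a minimal sketch, assuming total cost is just up-front spend plus per-unit cost times quantity; the function name `total_cost` is made up for illustration, and the figures are the ones from the comment.

```python
from fractions import Fraction

def total_cost(setup, unit_cost, quantity=10_000):
    """Up-front spend plus per-unit cost times the number of widgets."""
    return Fraction(setup) + Fraction(unit_cost) * quantity

# The three options from the comment (exact fractions avoid float rounding).
by_hand = total_cost(0, 1)
big_machine = total_cost(9_999, Fraction(1, 10_000))
modest_fix = total_cost(100, Fraction(1, 100))

for label, cost in [("by hand", by_hand),
                    ("big machine", big_machine),
                    ("modest fix", modest_fix)]:
    print(f"{label}: ${float(cost):,.2f}")
```

Both extremes come out at $10,000 and the middle option at $200, which is the "square" point: effort split evenly between setup and production.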

That is an interesting thought. What’s it an example of? Convex optimization?

As a mathematician, let me assure you that the point of view in that comment is not popular amongst mathematicians.

You are absolutely right, context is king.

I'm sorry, but this is horrible advice. It sacrifices communication for the sake of enforcing an arbitrary aesthetic.

Until mid-20th century, mathematics had never been communicated in this austere manner.

When you are a working mathematician, you never start with a definition. You start with a context in which your exploration begins.

It might be a question which someone else asked that interests you (and there is a story as to why). It might be that you don't even have a question, but the objects of your study are not studied enough, so you hope you stumble into one. You can easily tell why this is an interesting thing to look at.

You do some calculations, take a look at a few examples, see if you can make a stab at the chaos in front of your eyes and find a pattern, which we formally call a conjecture.

Then you see if this pattern holds in other cases, and why. This shapes the backbone of the proof.

Once you have a basic idea of a proof, you can formulate a theorem. In the formulation, you list all the conditions to which your proof applies. The pattern might be much more general, but your proof might work e.g. "only in cases when the order of the group is invertible in the base field", or some other.

After all the work is effectively done, you decide that a concept that appears repeatedly in your reasoning deserves to have a name. Something convenient to call it by, so you don't have to repeat yourself. So you make a definition.

You then decide to share your joy with this world, and write a paper.

You listen to the "beautiful advice", and throw out anything that makes your paper interesting.

Context goes out of the window, along with any hope for the reader to have any idea why your paper is worth looking at. At best, you'll advertise this in talks, or explain over beers. Side note: you will have to drink a lot of beers to make it in math.

Then, you follow the advice again, and lay things out in the order exactly opposed to the one you were thinking in: definition - theorem - proof - conjectures - examples - context.

Wait, you already scrapped context, and the examples you started with aren't illustrating the results you ended up proving.

So you tidy up your paper, come up with more specific examples, and remove anything that wasn't on the direct path to your result.

Having climbed to a place where you can see better, you pull the ladder up. Good luck to anyone outside the group of five people who are actively working in this niche!

And finally, you write an abstract to your paper, where you mention the things you defined. The abstract doesn't make any sense by itself; one needs to be in-the-know to get half of it, and read your paper to understand it.

In practice, it acts as a "No Trespassing" sign for the outsiders (i.e. anyone not in direct contact with the five people you have beers with at the Annual Niche Field Conference).

Satisfied, you lean back and post it to arXiv.

It's been a beautiful day, you think. This practice is difficult to enforce, but you kept it as an aim, and got pretty close to perfection (as exemplified by a Bourbaki text, or anything by Serge Lang, but I repeat myself).

Somewhere not too far away, a student in the class you're teaching cries.

----------------

I said "you", but as someone who's written a couple of math papers, that's really me too. We are all taught in a horrendously backwards (literally!) manner.

This perversion of the beautiful art isn't a new observation. I can't write better about it than Vladimir Arnold[1] (a titan whose name is, I hope, familiar to you).

It's worth a read to anyone who has ever studied mathematics:

[1] https://www.uni-muenster.de/Physik.TP/~munsteg/arnold.html

I do not disagree at all with you. Arnold is my mathematical hero and his advice and insight are invaluable. I understand that math is done starting with proofs, and ending with definitions and axioms.

Yet, I have witnessed many young mathematics students who could not write a concise, self-contained proof, nor understand its value. I certainly was one of those, and this advice helped me. For these people, it is helpful to learn how to organize your thoughts in an over the top, nearly bourbakist, formal way. Also, the correctness of proofs is much easier to check this way, and any incorrect or illogical stuff sticks out immediately. Then, once you have written your stuff in that dry style, you can add some glimpses of discourse that become much more valuable than if you had started with some informal hand-waving. This is pretty much the writing style of Arnold: his proofs are breathtakingly concise and elegant, and there is an insightful discourse around them. The proofs without the discourse stand on their own, but the discourse alone would be worthless.

I like your analogy of climbing the cliff and pulling the ladder. But there is another cliff that goes even higher and you needed the ladder for that one! Of course you need to help others to build their own ladders.

> Somewhere not too far away, a student in the class you're teaching cries.

Maybe, maybe not. In any case, I agree that you cannot teach math in a purely bourbakist style. I prefer a "visual" style like that inspired by the books of Arnold, Strang, Needham, and I am the sole teacher in my lab that seriously uses the word "amplitwist" to refer to the complex derivative :)

My response is too long for HN; breaking up into two parts :)

Part 1:

==============

I feel like there is a survivorship bias in your assessment: the students who would benefit from learning to construct a rigid argument without holes are already the ones who can hand-wave their way to the result, i.e. they already have the motivation and intuition to get there.

On the other hand, I feel like over 9 out of 10 people who could and would enjoy advanced mathematics get turned off by unnecessary formalisms some time in high school (Lockhart's Lament describes vividly how Euclidean geometry there is massacred - and that's both the first, and often the last time they see proofs!).

To add insult to injury, rigid reasoning is not introduced to students unless they are math majors, and even then, it's when they take a Real Analysis course. The way we teach intro Calculus and Linear Algebra classes should be classified as a Geneva Convention violation and a crime against humanity: all the intuition you can get from a Bourbakism with none of the rigor.

It doesn't need to be this way. Even rigor can be fun. Just like everything else, rigor is a part of mathematics that we do for a reason. Once you approach the very concept of rigor the same way you approach, say, derivatives, you will see that there is no need to impose it on people.

How many times have you seen a "proof" that 0 = 1, usually derived from a coy division by zero, or abusing square roots, etc? People repost those on Facebook as memes. They are fun!

But also, they are the motivating example for rigor. After all, that's the entire reason we need it in the first place: to avoid arriving at incorrect conclusions.

Without having a vast assortment of examples of arriving at incorrect conclusions, rigor is both unmotivated and unnecessary. Newton and Leibniz didn't need rigor when they invented Calculus, after all; hand-wavy infinitesimals did just fine. Why should the students bother?

There is no value in rigor in and of itself. All that the effort to put mathematics on a rigorous footing gave us are things like the Banach-Tarski paradox (which is, objectively, absurd and only shows that the extent to which math models physics is limited!) and Gödel's Incompleteness Theorem (which shows that even attempting to reach Perfect Rigor is futile).

You don't need to introduce Peano's axioms to talk about number theory, and neither does any number theorist, really. And we wouldn't want any student to crank out a Principia while working on their topology homework.

So, treating rigor as a branch of math (which it is!), it needs to be introduced and taught just like any other branch - starting with context, stories, pitfalls, and seeing all the motivation for why we do things the way we do.

It starts with basic critical thinking, logic and philosophy classes, where people learn the difference between "All liberals support free healthcare" and "All supporters of free healthcare are liberals" (....well, I wish).

Going further, it's seeing the "proofs" that 0=1, or that all cats are grey (by induction). The latter "proof" is still the only thing that motivates me to check the "obviously true" things, like the induction step being applicable to the base case.
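For anyone who hasn't seen where that induction breaks, here is a small sketch. It assumes the usual form of the argument: to go from n to n + 1 cats, drop the last cat, then drop the first cat, and let the cats common to both groups carry the colour across. The helper `overlap` is a name invented here.

```python
def overlap(n):
    """Cats shared by the two n-sized groups used in the step from n to n + 1."""
    cats = list(range(n + 1))        # n + 1 cats, labelled 0..n
    without_last = set(cats[:-1])    # the first n cats
    without_first = set(cats[1:])    # the last n cats
    return without_last & without_first

for n in range(1, 5):
    print(f"step {n} -> {n + 1}: shared cats = {sorted(overlap(n))}")
```

The step from 1 to 2 has no shared cat, so nothing links the two groups; that is the one step the whole "proof" needs and does not have.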

In high school, I had a great little book called "Lapses in Mathematical Reasoning" by Bradis and co-authors. It was a perfectly accessible assortment of gotchas.

Zeno's paradoxes are a motivation for some of the rigor of Calculus (convergent sequences and infinite sums are the answer to the paradox).
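A quick way to see the "infinite sums" answer, sketched under the assumption that each move covers half of the remaining distance on a track of length 1 (the function name `zeno_partial_sum` is made up here):

```python
from fractions import Fraction

def zeno_partial_sum(steps):
    """Distance covered after `steps` moves of 1/2, 1/4, 1/8, ..."""
    return sum(Fraction(1, 2**k) for k in range(1, steps + 1))

for n in (1, 2, 5, 10, 20):
    covered = zeno_partial_sum(n)
    print(f"after {n} moves: covered {covered}, remaining {1 - covered}")
```

The remaining distance after n moves is exactly 1/2^n, which shrinks below any bound you care to name; that is all "the sum converges to 1" means.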

And when we look at rigor like this - like a thing that needs to be motivated, not an a-priori good - we see a rather disturbing pattern that rigor has been introduced at the expense of clear reasoning.

Take Calculus. Teaching it with limits, epsilon-deltas, etc. without giving a motivation for why this complex machinery is needed is purely a waste, and a thing that made many people despise math (it turned me off from analysis for a very long time, personally).

The problems that this rigor addresses aren't even taught to the vast majority of people who take Calculus! Everything that the intro course covers can be taught with infinitesimals just fine without introducing the epsilon-delta rigor.

And, in fact, epsilon-delta rigor can be entirely dispensed with (because the infinitesimals can be put on a rigorous basis, with non-standard analysis). Epsilon-delta was not an achievement. It was a defeat. It was the greatest minds of the time not being able to figure out how to add rigor to the concepts that Leibniz and Newton introduced, and so they simply powered through and worked around the concept of infinitesimal to make some hairy math work.

With rigor, just like with anything else, we have to ask: what's the return on investment there? Is it a good bang for the buck? Why is hand-waving bad?

Having learned a subject, we know where hand-wavy reasoning can lead. We know that not all cats are grey, or that a continuous function doesn't need to be differentiable anywhere.

But there is no value in rigorous reasoning in Calculus if we are not running into monstrosities like the Weierstrass function. And, when we start out, we don't - because Nature is quite nice, math-wise. At least on a day-to-day scale.

Adopting Arnold's mindset, the amount of rigor in a mathematical argument is somewhat like the amount of precision in a physical model.

No sane person would start teaching physics with Einstein's relativity. But in math, not only do we do that, we never teach Newton's Laws - and in introductory classes, we don't even explain the formulas!

Imagine forcing high-schoolers to crunch Einstein's tensors where all they needed was F = mg, without ever explaining what curvature even is or why it's needed ("it'll come in handy, trust us").

This is what we do with Calculus - or in any area where rigor is used without justification.

>Then, once you have written your stuff in that dry style, you can add some glimpses of discourse that become much more valuable than if you had started with some informal hand-waving

You always start with some informal hand-waving. Not including it in your paper is, put simply, lying by omission.

And great mathematicians didn't shy away from prose, especially when introducing significant concepts. When I was trying to understand quaternions, I found all the texts I looked at stupefying - until I found Hamilton's book where he introduced them.

Not only did I get more from the first chapter than I was aware there was to know, but I also learned things like where the word vector comes from when we use it to mean "a magnitude and a direction". Learning it was infinitely more valuable to me than seeing the axioms of a vector space (which, of course, you never need to remember - just write down a handful or so rules that translations in a plane satisfy, and the chances that something that fits ain't a vector space will be zero unless you go out of the way to make up a contrived example just for that purpose).

In fact, and that's Arnold's point, you lose no rigor by ditching formal reasoning when you can be concrete.

I believe that it is detrimental to the human brain to go through the exercise of "proving" that a collection of invertible operators, along with all their compositions, form a group.

And yet, this is a common exercise! People spend time on this! Just watch: [1]. The video goes for seventeen minutes! For diagonal matrices with non-zero entries!

I feel that having this "rigor" is worse than saying that these matrices form a group because of course they do.

On the other hand, no time is ever spent explaining why the formal definition of "set with an operation" is introduced. That's because it's needless, of course; it seems that the sole purpose of this definition is to create exercises.

=====

[1] https://www.youtube.com/watch?v=q_JqHQPbmUk

[2] https://math.stackexchange.com/questions/919040/proving-a-gr...

[3] https://math.stackexchange.com/questions/1108349/prove-that-...

Part 2:

=============

From the comment on that video:

>It is more fun to proof that the set of a 2 by 2 matrices with everywhere the same value x with x not equal to 0 is a group. The determinant of such matrices is 0 but it is still a group.

This is only surprising if you don't understand 2x2 matrices as operators on a plane, in which case the exercise is a cruel perversion (why on Earth would anyone want to consider such matrices, or check that they form a group, with identity that's not the identity matrix?! And how would one come up with this to begin with?!).

"Even though the determinant is zero" is a symptom of a conceptual gap. Of course the determinant being zero has nothing to do with these matrices forming a group! You can embed GL(2) into GL(3) by filling the rest of the entries with zero, and of course this will still be a group: because it acts on the XY plane in just the same way as before, and matrix multiplication still gives their composition because we defined it to work this way.

And of course matrices of the form [x x; x x] form a group. A better question would be, why wouldn't they?
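The zero-padding remark can be checked directly. This is a rough sketch with plain nested lists; `matmul` and `embed` are helper names invented here, and the two sample matrices are arbitrary invertible 2x2 matrices.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    inner = len(b)
    return [[sum(a[i][k] * b[k][j] for k in range(inner))
             for j in range(len(b[0]))] for i in range(len(a))]

def embed(m2):
    """Pad a 2x2 matrix with zeros into a 3x3 matrix acting on the XY plane."""
    (a, b), (c, d) = m2
    return [[a, b, 0], [c, d, 0], [0, 0, 0]]

A = [[1, 2], [3, 4]]   # determinant -2, so invertible
B = [[0, 1], [1, 1]]   # determinant -1, so invertible

# Padding commutes with multiplication, so the padded matrices compose
# exactly like the originals.
print(embed(matmul(A, B)) == matmul(embed(A), embed(B)))

# The padded identity plays the role of the identity, although it is not
# the 3x3 identity matrix and every padded matrix has determinant 0.
E = embed([[1, 0], [0, 1]])
print(matmul(E, embed(A)) == embed(A) == matmul(embed(A), E))
```

Both checks print True: the group structure lives in how the matrices act on the plane, and the 3x3 determinant has no say in it.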

Take the following kindergarten-accessible definition of a group: actions that you can undo, repeat, and combine.

Let's hand-wave the above exercise with this definition. What does [t t; t t] do? Let's take t=1. [1 1; 1 1] takes a point [a, b] in the plane and sends it to the point [a+b, a+b].

Doesn't seem like you can undo that, because you don't know whether [3, 3] came from [1, 2] or [2, 1]. Bummer.

Well no surprise, the operation [1 1; 1 1] smashes the whole plane into a single line spanned by [1; 1], aka y = x (the image is spanned by column vectors). We might as well ignore anything off this line, 'cause we can't tell points away from the line apart after applying [1 1; 1 1].

What does [1 1; 1 1] do on its image, the line y = x? It sends [a; a] to [2a; 2a] on the same line. So [1 1; 1 1] acts like multiplying by 2. That's certainly something you can undo.

The same works for other matrices; [x x; x x] acts like multiplying by 2x on that line. You can undo that as long as x is not 0.

And you can repeat/combine these operations because who's gonna stop you?

There is nothing left to prove.
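If you would rather poke at it than take the hand-waving on faith, the steps above can be sketched in a few lines. The function `apply` is a made-up helper that applies [x x; x x] to a point.

```python
def apply(x, point):
    """Apply the matrix [x x; x x] to a point (a, b) in the plane."""
    a, b = point
    return (x * (a + b), x * (a + b))

# Off the line y = x, information is lost: two different points collide.
print(apply(1, (1, 2)), apply(1, (2, 1)))

# On the line y = x, [1 1; 1 1] just doubles the point...
print(apply(1, (5, 5)))

# ...and x = 1/4 scales by 2x = 1/2, which undoes the doubling.
print(apply(0.25, apply(1, (5, 5))))
```

So on the one line where anything survives, each of these matrices is an honest, undoable scaling.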

This "hand-wavy" argument, of course, is something that gives much more understanding than a "formal" proof from the "definition". That proof is "fun" because it is surprising - and it is surprising because it doesn't make sense.

And I would argue that it's much easier to make a mistake there - and conclude that it's not a group because such-and-such axiom doesn't hold.

The hand-wavy argument, though, ultimately comes from (or would lead to) an understanding that matrices act on their eigenspaces - and little more needs to be said (given that all of those matrices share an eigenspace).

Furthermore, this gives an example of a group representation for the group of nonzero real numbers with the operation defined by x * y := 2xy.
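Here is a small sketch of that correspondence, assuming the representation sends x to the matrix [x x; x x]. The names `M`, `matmul2` and `star` are invented for the example, and the identity 1/2 and inverse 1/(4x) are worked out from the rule 2xy rather than stated in the thread.

```python
from fractions import Fraction

def M(x):
    """The 2x2 matrix with every entry equal to x."""
    return [[x, x], [x, x]]

def matmul2(a, b):
    """Multiply two 2x2 matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def star(x, y):
    """The twisted product on nonzero numbers: x * y := 2xy."""
    return 2 * x * y

x, y = Fraction(3), Fraction(-5, 7)
identity = Fraction(1, 2)      # solves 2 * x * e = x
inverse_of_x = 1 / (4 * x)     # solves 2 * x * y = 1/2

print(matmul2(M(x), M(y)) == M(star(x, y)))   # matrix product matches star
print(star(x, identity) == x)                 # 1/2 acts as the identity
print(star(x, inverse_of_x) == identity)      # 1/(4x) undoes x
```

All three print True for these sample values. It is a spot check on two numbers, not a proof, but it shows where the odd-looking 2xy comes from: it is ordinary matrix multiplication in disguise.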

Of course, such a definition is utterly confusing; why would anyone come up with such a thing, other than to torture people? Why would one want to redefine the product of real numbers to be something else?!

Seeing people work it out on Math StackExchange [3] is painful.

The people giving the answers are confident that a * b := 2(a+b) both is and isn't a group! This alone should tell you that at some point, rigor becomes a hindrance. This is that point.

I say, the correct answer is that if someone gives you a group without a thing that it acts on, ask for your money back.

Of course that thing isn't a group, but what breaks if we just say it is? Since nobody is giving me a refund on this, let's see how it would act on itself. An element a would act by sending b to (2a) + 2b, so we have translation and scaling. Can we undo this? Sure, we just need to shift back by (-2a) and scale down by 1/2. But scaling down isn't an option here, so tough luck.

It's not the only problem, of course; but the student is left utterly confused (again, see comments in [3]!) by this exercise, whose point seems to be that "arbitrarily messing with definitions sometimes works and sometimes doesn't".
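For the record, two of the other problems are easy to exhibit with concrete numbers. A minimal sketch, with `op` standing in for the rule a * b := 2(a+b) and sample values picked arbitrarily:

```python
def op(a, b):
    """The proposed operation a * b := 2(a + b)."""
    return 2 * (a + b)

a, b, c = 1, 2, 3

# Associativity fails: the two bracketings disagree.
left = op(op(a, b), c)     # 2 * (2 * (1 + 2) + 3)
right = op(a, op(b, c))    # 2 * (1 + 2 * (2 + 3))
print(left, right)

# An identity e would need 2 * (a + e) = a, i.e. e = -a / 2,
# which depends on a, so no single e works for every element.
e_for_1 = -1 / 2
print(op(1, e_for_1), op(2, e_for_1))
```

The bracketings give 18 and 22, and the "identity" that fixes 1 sends 2 to 3.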

But it feels like this should be a group. Let's fix it. The rule "a * x -> 2(a+x)" is kosher; we can take it to be the action of a on the real line.

What does the composition look like?

Well, a * (b * x) = a * (2b + 2x) = 2a + 4b + 4x

That "4x" there tells us that the group generated by these actions is larger than just the generating set. Nothing in our generating set can multiply by 4 (again, that would be a way to see that the rule doesn't define a group). The exploration can then go on further to examining which subgroup of the affine transformations of the real line this generates. It's interesting!
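The composition can be sketched by storing each action as an affine map x -> scale * x + shift. The pair representation and the names `act` and `compose` are choices made for this example, not anything from the thread.

```python
def act(a):
    """The map x -> 2 * (a + x), stored as (scale, shift)."""
    return (2, 2 * a)

def compose(f, g):
    """Return f after g for affine maps given as (scale, shift) pairs."""
    f_scale, f_shift = f
    g_scale, g_shift = g
    return (f_scale * g_scale, f_scale * g_shift + f_shift)

a, b = 3, 5
composed = compose(act(a), act(b))
print(composed)                     # scale 4, shift 2a + 4b

# Every generator has scale 2, so a map with scale 4 is not one of them.
print(composed[0] == act(0)[0])
```

With a = 3 and b = 5 the composite is (4, 26), matching 2a + 4b + 4x. Repeating `compose` and adding inverses (scale 1/2) is one way to start exploring which affine maps the generated group contains.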

I trust you that rigor being forced on you could have improved your mathematical reasoning. But in that case, you are exceptional - or there was more to it than "do it this way just because". The most common case, in my experience, is represented in [1][2][3] (particularly in the comments): it makes people confused, wrong, and lost.

I'd rather they had never seen a definition of a group than go through that kind of brain damage.

>I agree that you cannot teach math in a purely bourbakist style. I prefer a "visual" style like that inspired by the books of Arnold, Strang, Needham, and I am the sole teacher in my lab that seriously uses the word "amplitwist" to refer to the complex derivative :)

In that case, they might be crying tears of joy or grief over all the years they were taught otherwise :)

[1] https://www.youtube.com/watch?v=q_JqHQPbmUk

[2] https://math.stackexchange.com/questions/919040/proving-a-gr...

[3] https://math.stackexchange.com/questions/1108349/prove-that-...

> Until mid-20th century, mathematics had never been communicated in this austere manner.

Euclid.

The Elements may have been meant to go through a teacher who'd help you connect to it, who knows? But the text itself is totally definition, theorem, proof, repeat, and that's all that survived to reach us.

Great Write-up!

I linked to the Arnold essay too before I saw your post :-)

Thanks!

I ended up writing a mini-essay as a follow-up comment.

Also, would love to chat more!