It took me 10 years to understand entropy (cantorsparadise.com)
398 points by dil8 1522 days ago
44 comments

I don't understand entropy and this article did not change it. The issue I take is with the definition of "the most likely state".

Think of a series of random bits that can each be either 0 or 1 with equal probability. How likely is it that they are all 0 or all 1? Not very likely: there is exactly one configuration of each. How likely is it that they have a specific configuration of 0s and 1s? Equally likely. All states are equally likely. If you randomly flip bits you go from one state to another, but each one is equally likely to occur. There is no special meaning to a specific configuration if you don't give it one.

If you look at the average of all bits, you start grouping together all states with the same number of 1s. In terms of the average, there is only one configuration that is all 1s, but most configurations have roughly 50% 1s. If you now start flipping bits, you will meander through all possible bit-states, but the average will most likely stay close to 50% 1s most of the time.
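A short Python sketch of the grouping described above, assuming n = 20 fair bits and treating "40% to 60% ones" as close to 50% (both choices are arbitrary, for illustration only). Each single configuration is equally likely, but once configurations are grouped by their number of 1s, the groups near the middle contain almost all of them.

```python
from math import comb

def count_by_ones(n):
    """Number of n-bit configurations with exactly k ones, for k = 0..n."""
    return [comb(n, k) for k in range(n + 1)]

n = 20
counts = count_by_ones(n)
total = 2 ** n
# Every individual configuration has probability 1 / 2**n, but grouping by the
# number of ones makes the near-50% groups dominate.
near_half = sum(counts[k] for k in range(8, 13))  # 8..12 ones, i.e. 40%..60%
print(f"all ones: {counts[n]} of {total} configurations")
print(f"40-60% ones: {near_half / total:.1%} of configurations")
```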

In physics we usually look at averages such as the average velocity expressed as temperature. Therefore it makes sense to group together all states using the average and then the states with very low or very high averages are few.

But if you look deeper than that averaging it stops making sense to me. It's a completely different world. I don't know what Entropy is supposed to mean on the level of individual states/configurations. I don't understand what kind of macroscopic "averaging" function we may use to group up those states. There could be more than one possibility, from which it would follow that there is more than one definition of macro-Entropy. Ideally there should be one general definition of how we have to look at those microstates, and from that would follow our general definition of Entropy. Sadly I didn't study Physics and this topic still continues to confuse me. The usual explanations fail to enlighten me.

> But if you look deeper than that averaging it stops making sense to me. It's a completely different world.

I think you're less confused than you think you are!

As I posted elsewhere, it helps to think of entropy as a quantity that actually depends on how much you know about the system in question.

Typically when you calculate the entropy of a system at temperature X, that means all you know is that you stuck a thermometer in it and measured X. You don't know anything more than the average temperature. It could be in any state consistent with that temperature.

If you know more about the system, it has less entropy. If you know it down to the exact microstate, it has zero entropy.

This is how I have come to understand entropy. The words disorder and order are a proxy for information content.

> If you know more about the system, it has less entropy.

One question though. When you say "it", does it include you as well as the system, or just the system? To me "it" includes both, because it is "you" whose state has changed by acquiring more information. It could be in the form of neuronal rearrangement or bits being stored in some digital media, etc. New information content has thus been created.

There's an interesting side effect if one thinks deeply enough here. The system will keep changing its state, so the information one has goes out of date, leading to more disorder (i.e., information loss) and increased entropy. One can keep the information updated, but it takes energy. And I read somewhere that the energy thus used will lead to an increase in the overall entropy of the universe, and thus the 2nd law.

>This is how I have come to understand entropy. The words disorder and order are a proxy for information content

Does information content mean this? ... "How many bits of random-number generator would I need to make the number of micro-states in the macro-state?"

That is my mental model, yes. More bits are needed to capture more detailed (or micro-states as you called it elsewhere in this thread, or finer-grained) information.

Let's say there's a stone, and we want to know its details. If all we want to know is whether it weighs more than 100 kg or not, then one bit will do: 1 means > 100 kg and 0 means ≤ 100 kg. If we also want to know its colour (as one of the 7 WIBGYOR colours), then we need 4 bits: 3 bits to encode 7 colours and 1 bit to encode yes/no for the weight. And so on; as we gain more and more information, we need more bits to store it.
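The stone arithmetic can be checked with a small sketch: distinguishing N equally possible answers takes ceil(log2 N) whole bits. The helper name `bits_needed` is hypothetical, introduced only for this illustration.

```python
from math import ceil, log2

def bits_needed(n_options):
    """Minimum whole bits to distinguish n_options equally possible answers."""
    return ceil(log2(n_options)) if n_options > 1 else 0

weight_bits = bits_needed(2)   # heavier than 100 kg or not
colour_bits = bits_needed(7)   # one of the 7 WIBGYOR colours
print(weight_bits + colour_bits)  # 4
```

Note that 3 bits can encode 8 options, so one code for colour goes unused; that slack is why the exact information content of the colour is log2(7) ≈ 2.81 bits rather than 3.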

This is just for the storage, though; in order to gain the information we need to expend energy. More information requires more energy, leading to more disorder, as expending energy releases heat; hence the 2nd law of thermodynamics as well as the arrow of time. IMO our perception of time is purely based on memory, which is the information content of events stored in neurons.

Quite a bit hand-wavy. But this is a mental model I've developed over the years of thinking and reading (and listening to lectures) about entropy, information, arrow of time, and energy and how they are interconnected.

The article does say that some crystalline structures can have more entropy (information) than their fluid state. How could that be? Any ideas on what that fluid state might be? The information content in a crystal is really low.
Entropy differences are objective quantities that can be measured; there is no subjectivity about them. It is not about which parameters you know, it is about which parameters you hold fixed.
Fascinating discussion. I see some parallel here to the Bayesian vs Frequentist view of probability.

They are perhaps both valid points of view depending on the situation.

If you take a frequentist view of an unbiased coin, then the probability that it will land heads on the next flip is objectively, by definition, 50%. So the resulting calculation of entropy (-log2(50%) = 1 bit) is also objectively defined. But if your 50% probability represents a subjective belief, the resulting entropy calculation should also be considered subjective, I would think.
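For reference, the 1-bit figure for a fair coin follows from the Shannon formula H = -Σ p log2 p. A minimal sketch (the function name is just for illustration); whether the input probabilities are read as frequencies or as beliefs does not change the arithmetic, only its interpretation.

```python
from math import log2

def shannon_entropy(probs):
    """Shannon entropy in bits: H = -sum(p * log2(p)), skipping p = 0 terms."""
    return -sum(p * log2(p) for p in probs if p > 0)

print(shannon_entropy([0.5, 0.5]))   # 1.0 for a fair coin
print(shannon_entropy([0.9, 0.1]))   # about 0.469 for a biased coin
```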

Leonard Susskind disagrees with you. See his lectures on statistical mechanics, he is very clear that entropy is a matter of knowledge about the system. It has to be.
Susskind says that entropy is determined by selecting a macro-state. He doesn't claim that the entropy of a macro-state depends on whether we know which macro-state the real system is really in.

If we happen to know, then, sure. For example we could pick a weird-ass observable state, and when we saw it we would know the entropy of the system was low. But the entropy of each macro-state just depends on how many micro-states we define it to contain. It doesn't depend on our knowledge of the system state.

The concept of entropy wasn't invented so that we could calculate entropies of macrostates, it was so that we could calculate entropies of real systems and understand their behaviour. Macrostates are an accounting tool that helps us do this. You seem to be treating the calculation of macrostate entropy as an end-goal in itself, but also allowing yourself to somehow freely choose any macrostate you want. When it comes to applying thermodynamics in practice, you'll have to calculate the entropy of a real, or at least hypothetical, system.

The point of macrostates is that you ought to know which macrostate a given system is in. That's the thing that you know. You don't know which microstate it's in, but you do know which macrostate it's in.

For example, if I say "a cubic metre volume of air at room temperature and pressure", I've described a physical system. I've also described a macrostate.

If you're calculating the entropy of macrostates that are not consistent with a description of a system -- if you've defined your macrostates such that you don't know which macrostate a given system is in -- then in order to calculate that system's entropy you have to sum up over all such possible macrostates anyway, so you haven't saved yourself any work or earned any insights along the way.

So yes, you can calculate the entropy of a macrostate without knowing what macrostate a real system is in, but it kind of sounds like you're arguing that log(x) is not a function of variable x, because log(3) is a constant and log(4) is a constant, and you can divide up any x into constants of your choice.

I've been trying to reconcile these perspectives, and I think it really is both. And they are both physically relevant.

Consider the subjective entropy perspective. If you know the exact microstate of a system, then you can in theory play the part of Maxwell's demon. You could have a little gate that you open only for fast particles, and using your knowledge of the microstate, you can predict exactly when they will arrive.

But consider the objective perspective. If you take this very same system and put it in thermal contact with another system, then an objective entropy perspective is the relevant one. Those systems will equilibrize and your subjective knowledge is irrelevant to that process.

I haven't fully wrapped my head around it yet, but I do think that acknowledging both is a step in the right direction at least.

> If you take this very same system and put it in thermal contact with another system, then an objective entropy perspective is the relevant one. Those systems will equilibrize and your subjective knowledge is irrelevant to that process.

The subjective view handles this scenario just fine, though, and makes more accurate predictions than the objective view.

For example, there are systems where some aspects of the original microstate survive thermal contact with another system. We use such systems to store data! I bet your hard drive is in thermal contact with its environment right now! It's very hard to reconcile this with an objective take on entropy.

And there are some systems that will rapidly be scrambled. The subjective perspective has no problem admitting that your knowledge of a system can become inaccurate and useless. Even without thermal contact, you'd need to perform a tremendous amount of (perhaps reversible) computation in order to make a functioning Maxwell's demon with your initial microstate conditions, because the microstate will evolve in time in a complicated way. The subjective view is still totally consistent with entropy of a system increasing over time!

I wrote two replies to this, both of which I deleted. Then I had a good long thunk, and here's what I came up with.

The temperature of an object can be determined through 1/T = dS/dE. What is this S? How can it exist if you know the system perfectly? And here is where the great insight comes: the thermometer! When you apply a thermometer to a system, you perturb it! The system may have started in one particular microstate, but the very nature of thermal contact involves random influence. Those tiny random influences from the thermometer allow the object (the hard drive in our case) to enter a bunch of microstates with certain probabilities. And that's what S measures.
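To make 1/T = dS/dE concrete, here is a sketch using a hypothetical toy model that is not from the comment: N independent two-level units, n of them excited, each excitation costing energy eps, with units chosen so that k_B = 1 and eps = 1. The entropy of the macrostate "n excited" is S = ln C(N, n), and a finite difference in n approximates dS/dE.

```python
from math import comb, log

def entropy(N, n):
    """S / k_B = ln W for N two-level units with n excited (W = C(N, n))."""
    return log(comb(N, n))

def temperature(N, n, eps=1.0):
    """Finite-difference estimate of T from 1/T = dS/dE, with E = n * eps, k_B = 1."""
    dS = entropy(N, n + 1) - entropy(N, n)
    return eps / dS

for n in (10, 100, 400):
    print(n, temperature(1000, n))
```

In this toy model the temperature rises as more energy is added and turns negative once more than half the units are excited, a known quirk of systems whose energy is bounded above.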

So our subjective knowledge does not actually matter. (Classically speaking) the system is in a particular microstate, whether we know it or not, and it still manages to have a temperature. That is due to the states it could hypothetically enter (but hasn't yet)!

If we think back to the hard drive and its contents: very gently touching a hard drive with a thermometer will not scramble its contents. So we may say that microstates corresponding to files other than the ones you put there are actually not accessible, and they don't contribute to the entropy we used for the temperature.

No, it is subjective. We just only have such blunt instruments for practically measuring states, relative to the gargantuan amount of entropy in most real systems, that the subjective nature of entropy is easy to miss. But in a world where the frontiers of thermodynamics have moved from steam engines to lasers, computers, DNA, and black holes, the difference is increasingly obvious and important.

With steam engines, we got away with treating a volume of gas as having not only a few parameters that we knew and cared about, like mass, temperature and pressure, but we could further deceive ourselves into thinking that those were the only parameters that existed to describe the system. The only parameters that were knowable. But Boltzmann knew better.

Look at Boltzmann's formula, S = k_B log W.

For any single particular system you describe to me, W will be 1, and so S will be 0. So it's only if you describe an ensemble of systems -- that is, if you describe a system vaguely, such that I am left to imagine the details -- that we have nonzero entropy. If you ask me to calculate the entropy of that "system", that macrostate, that ensemble, then sure, I'll end up with nonzero entropy. But if I ask you to keep transmitting more data about the scenario, then with each further description, you'll be narrowing the state space and thereby decreasing the entropy.
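The narrowing argument can be sketched numerically, assuming a hypothetical system of 10 two-state parts where each further yes/no fact you are told halves the number W of microstates still consistent with the description. The formula is Boltzmann's S = k_B ln W; the constant is the exact SI value.

```python
from math import log

K_B = 1.380649e-23  # Boltzmann constant in J/K (exact SI value)

def boltzmann_entropy(W):
    """S = k_B ln W for W equally likely microstates consistent with a description."""
    return K_B * log(W)

# Hypothetical system of 10 two-state parts: each extra yes/no fact you are told
# halves the number of microstates still consistent with the description.
W = 2 ** 10
for facts_known in range(11):
    print(facts_known, W, boltzmann_entropy(W))
    W //= 2
```

Once all 10 facts are known, W = 1 and S = 0, matching the point that a fully specified single system has zero entropy in this accounting.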

Look, since the entropy of a macrostate is nonzero, but the entropy of any single microstate which is consistent with that macrostate is zero, it's clear that entropy is not an intrinsic property of any real system. It's a property of how many other possible non-existent systems could be swapped out for the one in front of you, without you noticing the change.

If I swap out the air in your room for an equal volume of air at equal temperature and pressure, you probably won't notice.

If I swap out the hard drive in your laptop for an equal volume of hard drive at equal temperature and pressure, you probably will!

Maybe better to say that the universe does not appear to pick out a single coarse-graining or fine-graining procedure for practically any system.

For instance, following your Boltzmannian example, I think one would notice swapping 1 µm³ of the r/w head and 1 µm³ of the recording surface of a new, freshly powered on HDD more than one would notice substituting the entire HDD for a new one of the same model and turning that on. And here I am already using units of length (cf. "equal volume"), and we know neither units nor lengths are generally picked out by the universe.

Very few people know this, but:

Information entropy and statistical mechanical entropy are two different things.

They share the same equation and the same name but they are two unrelated concepts. You have conflated the two. The person you are responding to is referring to statistical entropy.

Basically in this entire thread nobody, including you, is fully grasping the situation.

They are not at all unrelated. It is not easy to grasp, so I understand the confusion. https://en.m.wikipedia.org/wiki/Landauer%27s_principle

A fun rabbit hole would start with the classic paper by Jaynes.

There are many more recent examples relating entropy to the bit-erasure costs of computation. Some names to look up if interested include Charlie Bennett, Dave Wolpert, James Crutchfield, and Susanne Still, for starters.

Edit -- a collection of ideas related to this problem and mixing in "complexity" can be found in the SFI proceedings volume titled "Complexity, entropy, and the physics of information".

I respectfully disagree. Perhaps you'd like to present more than a mere assertion to make your case. I did.

If it helps, here's a paper that explains my stance in more detail. https://bayes.wustl.edu/etj/articles/theory.1.pdf

If you think there is no relation between the different things called entropy apart from the name maybe you're not fully grasping the situation either.
U, the internal energy, is objective.

The free energy F = U - TS is the maximum amount of work you can extract from the system. This depends on how much you know about the system. S does indeed depend on what you know about the system.

See the Gibbs Paradox for more information.

If two people disagree on the maximum amount of work that could be extracted from a given system (with both of them basing their figure on their own evaluation of S), are there any cases where it would be impossible to empirically demonstrate that at least the proponent of the lower figure was wrong?

If only changes in S (and F) have measurable consequences, would that not merely mean that assigning an absolute value is an arbitrary choice? That would not be the same as it being subjective (there could still be an objective conversion between one basis and another, as there is for kinetic energy in different inertial reference frames).

In the Gibbs Paradox, there is no subjectivity in whether the gases being mixed are the same or different, and no subjectivity in what the change of entropy is in either case. The paradox is that it does not feel right that identity makes an objective difference between the two cases, but the empirically-demonstrable distinction between fermions and bosons shows that this intuition does not hold in general. I believe von Neumann came up with a QM resolution of the paradox.

> are there any cases where it would be impossible to empirically demonstrate that at least the proponent of the lower figure was wrong?

Isn't it more interesting to examine a situation where it would be possible to empirically demonstrate that the proponent of the lower figure was wrong?

I think he is confusing the usage of entropy in physics and computer science. In computer science, entropy is computed from (conditional) probabilities and depends on what we know about a system.
As it does in physics!

"which parameters [thermodynamic variables] you know" ~ "which parameters [thermodynamic variables] you hold fixed"

(or know in average, like the energy for a system in a heat bath where the temperature is fixed)

https://bayes.wustl.edu/etj/articles/theory.1.pdf

http://nicf.net/articles/thermodynamics-statistical-mechanic...

You can observe the movement of molecules beyond macro properties like temperature.
No, this is completely and utterly wrong. Entropy is not a function of knowledge.

Two people having different levels of knowledge of a system does not mean the system has two different entropy values. Even if I knew the exact position of all atoms in a cup of water, the temperature of that water does not change due to that knowledge.

Entropy does depend on your chosen configuration of macrostates and microstates. Temperature is an arbitrary choice of macrostate.

> Even if I knew the exact position of all atoms in a cup of water, the temperature of that water does not change due to that knowledge.

It actually does! You would disagree with the other person about the temperature of that water. But I agree that this is admittedly not obvious at first.

No, it does not. The thermometer does not change based on my knowledge or opinion.
A thermometer doesn't measure temperature any better than a meterstick measures length. And we all know what Einstein had to say about the relativity of metersticks.

To paraphrase from the paper I linked in another reply to you, a thermometer is just a heat bath equipped with a pointer which reads its average energy, whose scale is calibrated to give the temperature T, defined by 1/T = dS/d<E>.

You can read the thermometer if you like, but if you know the exact microstate of the water to begin with, the thermometer reading will tell you much less than you already knew about the water. And precise knowledge of the water's microstate will (theoretically) allow you to extract much more work from that water than you would be able to with only the thermometer reading.

> Even if I knew the exact position of all atoms in a cup of water, the temperature of that water does not change due to that knowledge.

If you knew the exact position of all atoms in a cup of water you wouldn't assign any temperature to it. Not a thermodynamic temperature at least.

The number of microstates does not change, even if you KNOW the cup of water is in a specific microstate.

The Boltzmann equation is based on the total number of accessible microstates.

"accessible" means something only given a set of constraints.

Like the temperature, if you keep the temperature of the water fixed. And the number of molecules, if instead of a cup you have a closed container to prevent it from evaporating. Then what you have is water at some temperature that you control. And you could have the water at a different temperature with exactly the same microstate.

Or imagine gas at some fixed temperature within a cylinder with one movable wall. If you knew the location of every molecule of the gas it wouldn't make sense to talk about its pressure - you could compress it (reducing the number of accessible microstates) without doing any work.

Edit: In summary, thermodynamics loses its meaning if you know the microstate and can act on that knowledge.

If the cup of water is in a specific microstate at time t=0, and evolves over time according to deterministic equations of motion, how will it "access" other microstates that aren't along that specific trajectory in phase-space?
I think that works ok. But I think it's an unnecessarily tricky explanation. Entropy per macro-state decreases as we look at finer-grained macro-states. It feels simpler to associate the entropy of each macro-state with that macro-state, rather than assuming we know which macro-state the system is in, and then attributing the lower entropy to our knowledge of the macro-state.

I think it can probably be expressed either way. I just think the "knowledge" part is tricky and can be left out.

Can you elaborate on the difference between a "fine-grained" macrostate and a macrostate that is not fine-grained?

I think you will find it hard to separate the concept of a macrostate from the state of knowledge (or ignorance) of an individual subjective observer.

Sure. A fine-grained macro-state contains fewer micro-states. A coarse-grained macro-state contains more micro-states.

Say I flip 8 coins and I don't look at the results. A fine-grained macro state is TTTT TTTT. A coarser-grained macro state is TTTT xxxx. The coarser one has 4 bits more entropy than the finer one. It works the same way in statistical mechanics. Call them spins.
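The 4-bit difference can be checked by brute force. This sketch writes patterns without the space (so "TTTTxxxx" rather than "TTTT xxxx"), uses 'x' for an unconstrained coin as in the comment, and counts how many of the 2^8 outcomes each pattern contains; the helper names are hypothetical.

```python
from itertools import product
from math import log2

def matching_microstates(pattern):
    """All coin outcomes consistent with a pattern; 'x' marks an unconstrained coin."""
    return [
        s for s in map("".join, product("HT", repeat=len(pattern)))
        if all(p == "x" or p == c for p, c in zip(pattern, s))
    ]

def entropy_bits(pattern):
    """Entropy of the macrostate in bits: log2 of its number of microstates."""
    return log2(len(matching_microstates(pattern)))

print(entropy_bits("TTTTTTTT"))  # 0.0
print(entropy_bits("TTTTxxxx"))  # 4.0
```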

We're just talking about some ensemble of micro states, and then we divide the ensemble up into macro-states. To do statistical mechanics at all, I think I have to define some macro-states according to which micro-states they contain. That doesn't mean I necessarily have any information about which macro-state the system is actually in.

> A fine-grained macro state is TTTT TTTT.

This macrostate seems fine enough to be a microstate, but sure. For the macrostates with at least one 'x' in it, that 'x' seems to be a placeholder for the concept of subjective ignorance.

> That doesn't mean I necessarily have any information about which macro-state the system is actually in.

But the entire purpose of the exercise of assigning microstates to macrostates is so that you can match up a description of some system to a microstate ensemble and calculate its entropy! Otherwise there's no point to arbitrarily labelling various groups of microstates.

To follow your example more practically, let's say you have an 8-spin system, whose net spin is zero. (You know because you've measured its overall magnetic moment or something). I've just described a system that is in one of the following possible microstates:

TTTT HHHH, TTTH HHHT, ..., HHHH TTTT

Now you can go ahead and define the macrostates as fine-grained as you want, where TTTT HHHH and HHHH TTTT are in different macrostates, but to calculate the entropy of this system, you're going to have to sum up all of those macrostates anyway to get the one that's consistent with the described system.
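A quick enumeration of the 8-spin, zero-net-spin example, using H/T for the two spin directions as in the comment. It counts the microstates consistent with the description, then splits them into finer macrostates (here, hypothetically, by the first four spins) and shows that summing those finer macrostates recovers the same total.

```python
from collections import Counter
from itertools import product
from math import log2

# Every 8-spin microstate with zero net spin (equal numbers of H and T).
zero_net = [s for s in product("HT", repeat=8) if s.count("H") == 4]
print(len(zero_net))        # 70
print(log2(len(zero_net)))  # about 6.13 bits

# Finer macrostates: also record the first four spins. Summing their sizes
# recovers the 70 microstates of the described system.
finer = Counter(s[:4] for s in zero_net)
assert sum(finer.values()) == len(zero_net)
```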

How is a macrostate TTTTxxxx different from having the information about the TTTT part and not about the xxxx part?

Talking bout the entropy of the macrostate TTTTxxxx makes sense only conditional on the TTTT information.

Oh my god, this explanation is gold. Thank you. I'm going to save it and refer to it in the future.
Yeah the author is conflating low entropy with a low number of microstates, which is consistent with the thermodynamic assumption that maximal entropy means a uniform distribution of microstates, but is confusing.

The purest mathematical justification for why low entropy means a low number of microstates probably comes from the fact that (classical) physical systems are dynamical systems that preserve the measure induced by the standard metric of the phase space. The measure-theoretic definition of entropy then implies that the entropy of a partition of the phase space (i.e. a set of macrostates) is indeed the average of the logarithm of the number of microstates.

So indeed if W is the number of microstates in the current macrostate then entropy = log(W) (on average).

And using the typical set you can show that the probability that the average of log(W) over n samples is within 'epsilon' of the 'exact' entropy goes to 1 as n goes to infinity. This is the mathematical justification for the second law of thermodynamics.
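The typical-set argument can be illustrated with a much simpler stand-in than a Hamiltonian system: i.i.d. biased coin flips (p = 0.2 is an arbitrary assumption). The asymptotic equipartition property says -(1/n) log2 P(x_1..x_n) concentrates around the entropy H as n grows; this sketch only demonstrates that convergence, not the measure-preservation step.

```python
import random
from math import log2

def empirical_rate(n, p, rng):
    """-(1/n) log2 P(x_1..x_n) for n i.i.d. Bernoulli(p) draws."""
    total = 0.0
    for _ in range(n):
        x = rng.random() < p
        total += -log2(p if x else 1 - p)
    return total / n

p = 0.2
H = -(p * log2(p) + (1 - p) * log2(1 - p))  # about 0.722 bits per flip
rng = random.Random(0)
for n in (10, 100, 10_000):
    print(n, round(empirical_rate(n, p, rng), 4), "vs H =", round(H, 4))
```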

The trick is that all of this is true no matter how you partition phase-space. Though that does mean that what is and isn't a high entropy state depends on your perspective.

>The trick is that all of this is true no matter how you partition phase-space. Though that does mean that what is and isn't a high entropy state depends on your perspective.

That seems to be correct.

>Yeah the author is conflating low entropy with a low number of microstates

Since entropy is found by counting micro-states (for example your third paragraph), that should be ok. What am I missing?

The equivalence is an important theorem (important enough to be engraved on Boltzmann's gravestone), and if you're switching back and forth in an explanation of what entropy is then you're skipping over some important details that answer what it means for something to be a 'low entropy state'.
Here's a concrete example of entropy with just two macro-states, 'broken' and 'unbroken'. If it becomes unclear or unconvincing, can you point out where that happens? It's intended to be ELI5, clear enough to discuss coherently.

Question: Why is it when I drop a vase it smashes into a million pieces; however when I then drop the million pieces it does not form a vase?

Answer: Stop! Don't drop any more expensive vases. Start with these simpler systems that do repair themselves sometimes when you drop them.

Take a coin and align it so the 'heads' side faces up. 'Heads' means 'unbroken'. (The reason for that will become clearer as we do more experiments.) Now drop the coin on the floor. How often is the 'heads' side still facing up? Now if you drop it again, how often does it 'repair' itself so that the 'heads' side is up? (Really do this.)

Try the experiment with 2 coins. Align them both heads-up, drop them, then see if your pattern is 'broken'. ('Broken' means not-all-heads-up.) Drop the 'broken' coins again. How often do they 'repair' themselves? ('Repaired' means all heads-up.) (Don't think about it! Don't solve for it! Do it!)

Try again with 5 coins. How often does a 5-coin system 'break' when you drop it? How often does a broken 5-coin system 'repair itself' when you drop it again?

How about 10 coins? How often does a broken pattern of 10 mixed heads/tails repair itself to all heads when you drop it again? Sometimes it does, but you'll have to be very lucky or patient to see it happen.

I think from here you can probably see (part of) the answer to your question about the vase. The word people use for this kind of thing is 'entropy'. With enough coins, the 'broken' state is much more probable than the 'repaired' state. The log of the number of arrangements that count as a given state (for coins, how many heads/tails patterns it includes) is called that state's 'entropy'.

https://www.quora.com/Why-is-it-when-I-drop-a-vase-it-smashe...
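If you would rather not pick coins off the floor, the experiment can be simulated, assuming each dropped coin independently lands heads-up with probability 1/2. The 'repair' rate for n coins should then come out near 2^-n, so a 10-coin repair happens roughly once in a thousand drops.

```python
import random

def repair_rate(n_coins, trials, rng):
    """Fraction of drops in which n fair coins all land heads-up ('repaired')."""
    repaired = 0
    for _ in range(trials):
        if all(rng.random() < 0.5 for _ in range(n_coins)):
            repaired += 1
    return repaired / trials

rng = random.Random(42)
for n in (1, 2, 5, 10):
    print(n, repair_rate(n, 100_000, rng), "expected", 0.5 ** n)
```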

Have you looked at Huffman coding? I'd recommend the (free) book by David MacKay; it is secretly the "hacker's guide to thermodynamics".

http://www.inference.org.uk/mackay/itila/book.html
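A compact Huffman-coding sketch ties the recommendation back to entropy: for a binary Huffman code the average code length L satisfies H ≤ L < H + 1, with equality when every probability is a power of 1/2. The four-symbol distribution below is a made-up example chosen to be dyadic.

```python
import heapq
from math import log2

def huffman_code_lengths(probs):
    """Return {symbol: code length} for a binary Huffman code."""
    # Heap entries: (probability, tie_breaker, {symbol: depth so far}).
    heap = [(p, i, {s: 0}) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        p1, _, a = heapq.heappop(heap)  # two least probable subtrees
        p2, _, b = heapq.heappop(heap)
        merged = {s: d + 1 for s, d in {**a, **b}.items()}  # one level deeper
        heapq.heappush(heap, (p1 + p2, counter, merged))
        counter += 1
    return heap[0][2]

probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
lengths = huffman_code_lengths(probs)
avg_len = sum(probs[s] * lengths[s] for s in probs)
entropy = -sum(p * log2(p) for p in probs.values())
print(lengths, avg_len, entropy)  # average length 1.75 bits = entropy 1.75 bits
```

The integer tie-breaker keeps the heap from ever comparing two dictionaries when probabilities are equal.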

This book is phenomenal and I highly second this recommendation.
> I don't know what Entropy is supposed to mean on the level of individual states/configurations. I don't understand what kind of macroscopic "averaging" function we may use to group up those states

I find it helpful to think of entropy as a property not of the system, or of any individual state (micro- or macro-), but of the "compression" process that summarizes microstates with a coarser-grained macro-description.

Given a choice of compression, classical physics says a system will tend to spend most of its time in the most likely compressed state. Different choices of compression can lead to different macro-descriptions, with different "entropies" and different dynamics among their macroscopic variables.

In this light it's not meaningful to think of the entropy of individual states. You could think about the "identity" compression, but you would end up with a description that was exactly as complicated as the full micro-state time-evolution dynamics; you wouldn't end up with any smaller set of variables that could describe the equilibrium of the whole system (really, this would not admit an "equilibrium" at all).

First, entropy is a macroscopic property, it makes no sense to talk about the entropy of a single particle. Second, entropy is not a fundamental property, it depends on what the observer cares about. Take the common example of a gas in the corner of a box, in that case we care about the density distribution in the box, a macroscopic property. To make this more concrete, one way to quantify the density distribution could be to tessellate the box with cubes of some size, find the ones with the lowest and highest density, and use the difference between those two densities as a measure of density variation.

Once we have decided that all we care about is the density variation in the box, we can go through all possible microscopic states and group them together by their density variation. Finally a low entropy macroscopic state is simply a macroscopic state - a certain density variation - for which there are only a few microscopic states that have the corresponding density variation. On the other hand a high entropy macroscopic state is a macroscopic state for which there are many microscopic states that have the corresponding density variation. You can also call the microscopic states low or high entropy but only with reference to the macroscopic property you use to group them, in themselves microscopic states are not low or high entropy.
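The grouping procedure can be sketched on a deliberately tiny, hypothetical box: 4 distinguishable particles in 4 cells (the comment does not fix the box or cube size), with "density variation" taken as the maximum minus the minimum particle count per cell. Every microstate is enumerated and binned by that macro property.

```python
from collections import Counter
from itertools import product

def density_variation(cells_of_particles, n_cells):
    """Macro property: max minus min particle count over the cells."""
    occupancy = [0] * n_cells
    for cell in cells_of_particles:
        occupancy[cell] += 1
    return max(occupancy) - min(occupancy)

N_PARTICLES, N_CELLS = 4, 4
# Each microstate assigns each (distinguishable) particle to a cell.
groups = Counter(
    density_variation(state, N_CELLS)
    for state in product(range(N_CELLS), repeat=N_PARTICLES)
)
for variation, n_micro in sorted(groups.items()):
    print(f"variation {variation}: {n_micro} microstates")
```

"Everything in one cell" (variation 4) holds only 4 of the 256 microstates, while the moderately uneven variation-2 macrostate holds 180, which is the sense in which the former is low entropy and the latter high.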

If you observe a low entropy macroscopic state, then you know a lot about the microscopic state, after all there are only very few. If you observe a high entropy macroscopic state, then you know a lot less about the microscopic state, there are much more possibilities even though they are microscopically indistinguishable. And if there are no limiting constraints on how the microscopic states can evolve, if the evolution is essentially a random walk through all possible microscopic states, then the entropy of the system will increase with high probability as it is much more probable to randomly walk into one of the many microscopic states associated with a high entropy macroscopic state than to walk into one of the few microscopic states associated with a low entropy macroscopic state.
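The random-walk argument is essentially the classic Ehrenfest urn model, sketched here as a swapped-in illustration: N particles split between two halves of a box, and at each step one randomly chosen particle hops to the other half. The macrostate is the count on the left, whose entropy (with k_B = 1) is ln C(N, k). Starting with everything on the left, the walk drifts toward the even split simply because that macrostate contains the most microstates.

```python
import random
from math import comb, log

def ehrenfest(n_particles, steps, rng):
    """Random walk: each step moves one randomly chosen particle to the other half.

    Macrostate = number of particles in the left half; its entropy (k_B = 1) is
    ln C(N, k), the log of how many microstates share that count.
    """
    left = n_particles  # start in the low-entropy macrostate: everything on the left
    history = []
    for _ in range(steps):
        if rng.random() < left / n_particles:
            left -= 1  # the chosen particle was on the left, so it moves right
        else:
            left += 1
        history.append(left)
    return history

N = 100
history = ehrenfest(N, 2000, random.Random(0))
for t in (0, 10, 100, 1999):
    k = history[t]
    print(t, k, round(log(comb(N, k)), 2))
print("max possible:", round(log(comb(N, N // 2)), 2))
```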

>there are much more possibilities even though they are microscopically indistinguishable.

You meant "macroscopically" ?

True, but too late to edit.
This is a very sensible confusion. The forms of macroscopic averaging functions which are useful and valid cannot be made up arbitrarily, but are determined by the microscopic physical laws of the system. There is a reason that the law of increase of entropy is the second law of classical thermodynamics, with conservation of energy being the first law.

To state it explicitly: energy is a globally conserved quantity, which can be freely exchanged among the interacting microscopic parts of systems. So we can bring a test system (called a thermometer) into interaction with our system under study, (indirectly) observe the average energy per degree of freedom of the thermometer, and call that observation the temperature of the system under study. Similarly, it is a known physical phenomenon that a gas confined to a container will exert a steady average outward force per normal unit area on the walls of the container; we have ways to measure this force, and we call it pressure.

And so on, and on: every useful macroscopic averaging function is a relatively stable, measurable quantity which is determined by the physics of the systems under study. If we discovered some new measurement technique tomorrow which enabled us to measure the "quintessence" of physical systems, and this measurement was stable and reproducible, and could be meaningfully aggregated from the microscopic parts of the system and measured on the macroscopic scale, our definition of entropy would change, to account for "quintessence".
It is also because “the system” is intrinsic to the notion: as you say, any configuration of bits is equally likely, but this only takes into account the “system of the bits”. The moment your system is “the bits and their mean value”, everything changes, as there are systems with a single possible configuration.

That is what happens when he starts the first example: “a system of particles INSIDE A VOLUME”. The volume is what makes the entropy larger or smaller. The particles in a different volume (or just by themselves) have a different entropy.

Not a physicist either, and I don't claim to understand entropy that well, but maybe it would help to consider that entropy may not be a universal variable of systems in the universe.

I think you should rather consider it as a mathematical construct that applies to some systems where the microscopic quantities are well defined, and where the 'averaging' that we can observe is also well defined. So if you look at thermodynamics, entropy is well defined, but you may be totally right that what we call "microscopic states" in a gas can be broken down further into elementary particles, which may or may not behave in quantum ways, and so on, and counting the microstates in terms of the elementary particles is a whole different game.

But it doesn't really matter. What matters is that at the scale we're at, and with the microscopic/macroscopic relation that's been defined, entropy works. The calculations that assign numbers to entropy suggest that entropy cannot decrease. They call it a universal principle of thermodynamics because there is nothing (to my understanding) that explains it microscopically.

And it works for a variety of situations in physics, such that it seems to be a universal property of nature. But it's mostly mathematical. It seems to say that "given a system we know everything about, there is no way to go to a system that has some unknown things to us".

Anyways. I mostly wrote this to see if I could articulate it to myself, hopefully it helps you as well.

Thanks for writing that. From the perspective you’ve articulated, I sometimes wonder whether the idea of the heat death of the universe is a matter of perspective: it only applies to the matter and properties of the universe that we consider significant. Are we living within the heat deaths of past forms of the universe, in which physical interactions we have overlooked dominated?
>It seems to say that "given a system we know everything about, there is no way to go to a system that has some unknown things to us".

That's backwards: information is the negative of entropy. The 2nd law says that entropy never decreases, so information never increases (it can only be preserved or lost).

>I don't know what Entropy is supposed to mean on the level of individual states/configurations.

The entropy is a property of a probability distribution, not of a state. Entropy is defined as H = -sum(p_i log(p_i)). A 'state' implicitly defines a probability distribution: uniform probability over all the microstates compatible with the state description.[0] In the case of a microstate, the entropy of the probability distribution over microstates consistent with that state is zero: there's only one compatible state, so p_i = 0 for all other states (and 0 log 0 is taken to be 0), while log(p_i) = log(1) = 0 for the compatible state. In the case of a macrostate, the entropy of the probability distribution over microstates consistent with the macrostate works out to -sum((1/N) log(1/N)) = log(N), where N is the number of consistent microstates. That's the Boltzmann entropy.
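To make the definition concrete, here is a minimal Python sketch (the function names `entropy` and `boltzmann_entropy` are illustrative, not from any library). It uses base-2 logarithms, so results are in bits, and it skips zero-probability terms, matching the convention 0 log 0 = 0.

```python
import math

def entropy(probs, base=2.0):
    """Shannon entropy H = -sum(p_i log p_i); zero-probability terms are skipped."""
    return sum(-p * math.log(p, base) for p in probs if p > 0)

def boltzmann_entropy(num_microstates, base=2.0):
    """Entropy of a uniform distribution over N compatible microstates: log N."""
    return math.log(num_microstates, base)

# A fully specified microstate: one state with probability 1, the rest 0.
microstate = [1.0, 0.0, 0.0, 0.0]

# A macrostate compatible with N = 8 microstates, each equally likely.
N = 8
macrostate = [1.0 / N] * N

print(entropy(microstate))                         # 0.0
print(entropy(macrostate), boltzmann_entropy(N))   # both approximately 3 bits
```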

Sometimes people will write about the entropy of a 'state' in such a way that it sounds like they're talking about the entropy of a microstate -- but what they're probably talking about is "the entropy of the macrostate that this microstate belongs to." It's sloppy to talk like that, because "the" corresponding macrostate isn't unique. There are many sets of macrostates that could contain a microstate, depending on what properties of the microstates one considers 'macro.'

(Ex: 10100101 is a member of both "symmetric bit strings of length 8" and "bit strings of length 8 that average to 1/2". The entropy of "symmetric bit strings of length 8" is 4 bits, whereas the entropy of "bit strings of length 8 that average to 1/2" is ~6.1 bits. And of course, the entropy of "the bit string of length 8 that is exactly 10100101" is zero.)
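The counts in the example can be checked by brute force. This sketch, assuming "symmetric" means palindromic, enumerates all 256 strings of length 8 and takes log2 of the size of each macrostate.

```python
import math
from itertools import product

# All 2**8 = 256 bit strings of length 8, as text such as "10100101".
strings = ["".join(bits) for bits in product("01", repeat=8)]

# Two different ways of lumping the same microstates into macrostates.
palindromes = [s for s in strings if s == s[::-1]]     # "symmetric" strings
half_ones = [s for s in strings if s.count("1") == 4]  # strings averaging 1/2

x = "10100101"
assert x in palindromes and x in half_ones  # x belongs to both macrostates

# Entropy of a macrostate: log2 of the number of microstates it contains.
print(len(palindromes), math.log2(len(palindromes)))  # 16 4.0
print(len(half_ones), math.log2(len(half_ones)))      # 70, about 6.13 bits
```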

For information theoretic entropy (I don’t know anything about thermodynamics):

The first thing you describe, that if you draw many symbols from your source distribution at random then you see a distribution of symbols that is close to the probability distribution of the source, is the idea behind what is called the asymptotic equipartition property (AEP).
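As a rough illustration of the AEP for a biased binary source (P(1) = 0.2, the sample size and the seed are all arbitrary choices), the sketch below checks that -(1/n) log2 P(sequence) for a long random sequence lands close to the source entropy H, i.e. a typical sequence has probability of roughly 2^(-nH).

```python
import math
import random

def binary_entropy(p):
    """Entropy in bits of a source emitting 1 with probability p."""
    return sum(-q * math.log2(q) for q in (p, 1 - p) if q > 0)

def per_symbol_surprisal(seq, p):
    """-(1/n) log2 P(seq) for i.i.d. bits with P(1) = p."""
    ones = sum(seq)
    zeros = len(seq) - ones
    log_prob = ones * math.log2(p) + zeros * math.log2(1 - p)
    return -log_prob / len(seq)

rng = random.Random(0)  # fixed seed so the run is reproducible
p = 0.2
n = 100_000
seq = [1 if rng.random() < p else 0 for _ in range(n)]

print(sum(seq) / n)                  # empirical frequency of 1s, close to 0.2
print(per_symbol_surprisal(seq, p))  # close to H(p)
print(binary_entropy(p))             # ≈ 0.722 bits
```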

I think your confusion comes from two things. First, conflating bits and symbols and second assuming symbols are equiprobable.

Take English text, where our symbols could be the letters of the alphabet. These are not equiprobable: if you select a letter from a book at random, you get a distribution different from the uniform 1/26. If you took as your symbols the individual bits that encode those characters in ASCII, you would get something closer to equiprobable symbols. Another choice of symbols would be the words in the book.
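A small sketch of the letter case: it computes the entropy of the empirical letter distribution in a sample string and compares it with log2(26), the value for 26 equiprobable letters. The sample text is an arbitrary pangram chosen for illustration; estimating real English letter frequencies would need a much larger corpus.

```python
import math
from collections import Counter

def letter_entropy(text):
    """Entropy (bits per letter) of the empirical distribution of a-z in text."""
    letters = [c for c in text.lower() if "a" <= c <= "z"]
    counts = Counter(letters)
    total = len(letters)
    return sum(-(k / total) * math.log2(k / total) for k in counts.values())

sample = "the quick brown fox jumps over the lazy dog"  # illustrative sample only
print(letter_entropy(sample))   # below log2(26): the letters are not equiprobable
print(math.log2(26))            # ≈ 4.70 bits, the uniform-letter maximum
```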

I suppose the problem is that entropy is a proxy for the number of states of an abstract configuration space which has the same observed quantities as the concrete object whose entropy you want to measure. So, for example, if you know that your object has mass M and temperature T, then to measure its entropy you take all the possible states of an abstract object with mass M and temperature T, and the logarithm of that number of states is the definition of the entropy of an object that has mass M and temperature T. So the more you know about the concrete object, the fewer possible states the abstract model has, and so the entropy is not a property of the concrete object; rather, it is a property of an abstract model with the same fixed global properties.
>There could be more than one possibility

This seems to connect with the idea behind Chaitin's incompleteness theorem. Making specific statements about the reducible complexity of something is not always possible.

>Think of a series of random bits that can be either 0 or 1 with equal probability. How likely is it that they are all 0 or all 1? Not very likely. There is exactly one configuration. How likely is it that they have a specific configuration of 0 and 1? Equally likely.

Well, there are only 2 states with all 1 or all 0.

But there are 2^N - 2 states of mixed 1s and 0s (2^N states in total).

Even if you treat the sets of bits as opaque items, and pick one from a bucket, I'd expect getting one of the 2^N - 2 configurations to be a far more likely outcome than one of the 2 remaining.

In fact, we could bet on it...
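The counting argument, and the original question about averages, can be made concrete with a hypothetical N = 20. Every specific string is equally likely, but the macrostate "k ones" contains C(N, k) strings, so macrostates near 50% ones are overwhelmingly more probable than the all-0 or all-1 ones.

```python
import math

N = 20                      # string length used purely for illustration
total = 2 ** N              # every specific string has probability 1 / total

# Macrostate "k ones" contains C(N, k) microstates (specific strings).
counts = {k: math.comb(N, k) for k in range(N + 1)}

all_same = counts[0] + counts[N]     # the all-0 and all-1 strings
mixed = total - all_same             # every other string
near_half = sum(counts[k] for k in range(8, 13))  # 8 to 12 ones, i.e. 40%-60%

print(all_same, mixed)               # 2 1048574
print(counts[N // 2])                # 184756 strings with exactly ten 1s
print(near_half / total)             # about 0.74
```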

But there's only one state that is 10010001111110101000.
Sure, but that's irrelevant.

There are billions that are similar, and only one that's all 0.

"similar" is in the eye of the observer! This fact should be especially clear in the context of bit strings. If 10010001111110101000 is my login password, don't be surprised if other permutations fail to grant you access, even if you have the correct number of 1's and 0's.
>"similar" is in the eye of the observer!

Nope, also similar in algorithmic complexity theory (e.g. counting compressibility).
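One rough, computable stand-in for algorithmic complexity is compressed length (Kolmogorov complexity itself is uncomputable). This sketch uses Python's standard `zlib` module on an arbitrary length of 10,000 bytes: a highly regular string compresses to a tiny fraction of its size, while random bytes essentially do not compress.

```python
import os
import zlib

n = 10_000
regular = bytes(n)               # n zero bytes: one very regular string
irregular = os.urandom(n)        # n random bytes: almost surely irregular

len_regular = len(zlib.compress(regular))
len_irregular = len(zlib.compress(irregular))

print(len_regular)     # a few dozen bytes
print(len_irregular)   # roughly n bytes: random data does not compress
```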

In computer science, high entropy means high informational value. So any information that is not the expected value has a high information content and therefore a high entropy.

In thermodynamics it is somewhat the opposite, as high entropy means a low state of energy and therefore fewer internal processes within a system, or none at all.

Another thing that contributes to the confusion which you have noted but not fully realized is that there are two different concepts that use the same equation and the same word: "entropy".

Information entropy and statistical entropy are two different things.

A specific string of bits has zero entropy. It may be a sample from a distribution which has some entropy.

Unless you are selecting a random single bit from that string, in which case the entropy of that selection process is -p1 log p1 - p0 log p0.
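The selection-process entropy above is easy to compute. This sketch (the function name is hypothetical) treats "pick one position of the string uniformly at random" as the random variable and uses the thread's own example string.

```python
import math

def entropy_of_random_bit(bits):
    """Entropy (bits) of the value at a uniformly random position of `bits`."""
    p1 = bits.count("1") / len(bits)
    p0 = 1 - p1
    return sum(-p * math.log2(p) for p in (p0, p1) if p > 0)

print(entropy_of_random_bit("10010001111110101000"))  # 1.0: ten 1s out of 20
print(entropy_of_random_bit("00000000000000000000"))  # 0.0: the outcome is certain
```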

High entropy = less predictable system. Low entropy = highly predictable system.
To stay within your bits analogy, I imagine an increase in entropy would be the equivalent of each bit becoming base-3, base-4, and so on, hence increasing the number of possible states (and reducing your ability to predict them).
One aspect of entropy that I always find counterintuitive is that unlike mass, charge, etc. it is not a physical quantity. In fact, from the point of view of an experimenter with perfect information about a physical system, the entropy of the system is exactly conserved over time (as made precise by Liouville's Theorem). The Second Law survives in this setting only in the most trivial sense that a constant function does not decrease.

It's only when you start making crude measurements---lumping positions into pixels, clouds of particles each with their own kinetic energy into a single scalar called "temperature," etc---that you start to see a nontrivial entropy and Second Law. Different ways of lumping microstates into macrostates will give you different (and inconsistent) notions of entropy.

The way to make sense of entropy is to treat it as a subjective quantity. A subjective quantity is a function where the observer's state of knowledge is one of the input arguments.

The article describes it as a measure of hidden information in a system, which is a good description. But that's not a property of the system itself, it's a property of the observer, from whom the information is hidden.

So different observers with different information about a system will have different opinions about its entropy.

My password, for example, to me has zero entropy. I know its microstate. But it's quite secure from someone trying to guess it, and they will think it's quite high in entropy.

If all you know about a system is that it's a kilogram of air at room temperature, it will seem quite high in entropy to you, as many possible microstates are consistent with that description. But if you have godlike knowledge of the exact configuration of every particle in the container, it will seem very low in entropy to you, and that's more than just an accounting difference. Indeed you can use that information to operate a Maxwell's demon and turn the system into a heat engine, splitting the cold and hot molecules into separate spaces and extracting work as though the system really had low entropy to start with. Because it did. To you.

Most of the confusion about entropy comes from what Jaynes calls the mind projection fallacy: the tendency to treat our uncertainty about a system as a property of the system, rather than a property of ourselves.

> The article describes it as a measure of hidden information in a system, which is a good description. But that's not a property of the system itself, it's a property of the observer, from whom the information is hidden.

Was hoping to see someone point this bit out. I wish references to entropy included this piece of information more frequently. When I was first trying to understand the concept, I kept thinking of it as something objective, but as you say, it’s a property of the observer.

Speaking of observers always rubs me the wrong way... I don't want to touch on the observer problem, but just to mention something that should be obvious: there's ALWAYS hidden information in any system where time exists. Any "observer" can only know what the world looks like within its light cone. Because quantum mechanics shows that determinism is not possible, it's not possible for any "observer" to know the exact future state of the world outside what was observable within its light cone up until that moment. There's also the problem that you can only store a limited amount of information even given perfect theoretical storage... hence, again, some information must be forgotten by whatever the "observer" is... talking about a "perfect observer" that knows all there is to know makes absolutely no sense.
Since entropy seems to be a measure of our ignorance, maybe there is no point in discussing a perfect observer (of something that is boundless).

Edit:

> talking about a "perfect observer" that knows all there is to know makes absolutely no sense.

If there was such a thing as a perfect observer, let’s say it is you, then you would still choose to measure some things but not others. Unless by “perfect observer” you are referring to something that knows the states of all things simultaneously at all times, in which case that (to me) doesn’t sound like an observer at all; it would just be someone/something that knows. So, like you said, “there is always hidden information”, but that information is hidden from someone who is doing the observing, and so it depends on them. Otherwise, from what or whom would the information be hidden? What would information even be without an observer?

Whatever is the driving factor behind the laws of physics seems to have “perfect information”. Not implying anything religious.
Entropy in thermodynamics is a statistical effect which acts like a "force" because of the immense number of particles and sub-states in play. A perfect simulation of gas particles bouncing in a two-chamber system will result in the "pressure" equalising because that is overwhelmingly the most likely state to end up in.

To be honest, I hadn't heard of Liouville's Theorem before, but it doesn't seem to imply what you're saying; in fact, it is used to prove the fluctuation theorem, which quantifies the probability of entropy spontaneously decreasing (as thermodynamic entropy is a statistical effect).

Liouville's Theorem does indeed seem to imply that entropy doesn't change:

https://physics.stackexchange.com/questions/202522/how-is-li...

> One aspect of entropy that I always find counterintuitive is that unlike mass, charge, etc. it is not a physical quantity. In fact, from the point of view of an experimenter with perfect information about a physical system, the entropy of the system is exactly conserved over time

True of energy as well. It can't be directly measured except as a relation between two states.

> One aspect of entropy that I always find counterintuitive is that unlike mass, charge, etc. it is not a physical quantity.

Those physical quantities might be intuitive, but as the physicist Brian Greene once wrote, no one really knows what mass is. We only know that mass bends space-time curve, hence gravity.

> no one really knows what mass is. We only know that mass bends space-time curve, hence gravity.

Mass is much better understood by its role in inertia. Basically mass is the amount of energy you need to exchange with a thing to change its current speed. This observation works from Newtonian mechanics to QM and GR as well.

Now, why do things have mass? The famous E=mc² explains this for most things: they have mass because something inside them has potential or kinetic energy. This works all the way down to the atomic level: the mass of a proton, for example, is almost entirely explained by the binding energy of the quarks and gluons held together in a small volume; the total mass of the quarks themselves is only a small fraction of that. Now, the mass of the elementary particles is somewhat more complicated, but the Standard Model does have explanations for those: couplings to the Higgs field (after electroweak symmetry breaking) for the fermions, and the Higgs mechanism for the massive W and Z bosons.

The next mystery is: why is inertial mass equal to gravitational mass? GR has essentially explained this, by showing that acceleration is equivalent to gravitational attraction depending on your frame of reference.

So overall, I'm not sure what Brian Greene means by that - mass is at least as well understood as other basic properties of particles (charge, spin, color charge).

This lecture by Leonard Susskind explains most of these things about mass in a way I found easy to follow:

https://www.youtube.com/watch?v=JqNg819PiZY

If I can pick your brain, there’s a related concept — entropy production. How does that relate with these ideas?
Not the GP, but entropy production is any process that increases your ignorance about a system (and usually, if you're doing it intentionally, everybody else's ignorance as well).

To produce entropy you have to grow the number of possible microstates that are consistent with available knowledge of the macrostate.

Usually you accomplish that by converting stored energy into heat somehow. A charged capacitor has low entropy for the energy it holds; discharge it through a resistor, and you produce a bunch of entropy because that energy can now be distributed in a lot more ways among a lot more degrees of freedom, and nobody can possibly keep track of them.
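For a sense of scale, here is a back-of-the-envelope sketch with made-up component values. It assumes all the stored energy CV²/2 ends up as heat Q in surroundings held at a fixed temperature T, so the entropy produced is Q/T; dividing by k_B ln 2 expresses the same entropy in bits.

```python
import math

K_B = 1.380649e-23   # Boltzmann constant in J/K (exact SI value)

# Hypothetical example values, chosen only for illustration.
C = 1e-3      # capacitance in farads (1 mF)
V = 10.0      # initial voltage in volts
T = 300.0     # temperature of the surroundings in kelvin

energy = 0.5 * C * V ** 2        # stored energy in joules, about 0.05 J
delta_s = energy / T             # entropy delivered to the surroundings, J/K
delta_bits = delta_s / (K_B * math.log(2))  # the same entropy expressed in bits

print(energy, delta_s, delta_bits)  # about 0.05 J, 1.7e-4 J/K, 1.7e19 bits
```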

It's a property of information so if you assume perfect information, of course it becomes trivial.
I would ask: why don't you have the same problem with energy?
Not really. Even from a point of view of an experimenter with perfect information, the entropy of the system declines over time as fewer and fewer bits are needed to describe the system.

For example, start with a glass of warm water with an ice cube in it. Over time, the ice will melt and the range of different temperatures of the molecules declines. Consequently, you need fewer and fewer bits to describe the complete state of the system. It takes fewer bits to encode the velocities of a million molecules that all move at a similar speed than to encode the velocities of a million molecules that move at very different speeds.

The more similar the state of the molecules becomes, the shorter a text becomes that has to describe the complete state of a system. Therefore, entropy is decreasing even from the point of view of an observer with perfect information.

Because ice is solid, you can argue it takes less information: the particles aren't moving at all, or are moving together in unison, so it will take fewer bits to encode.

Furthermore, velocity is a matter of direction as much as of speed, so if you take into consideration that a solid object may vibrate its particles in the same direction while a liquid can have its particles moving in infinitely many directions, you're talking about way more information you have to encode.

What is perfect information?

I understood perfect information to include infinite precision knowledge of non-quantized values like position and momentum. To store that information we'd almost always need infinite bits to express the state of even one particle.

(...and cough ignoring uncertainty...)

Using a finite number of bits was described in the comment above as "lumping positions into pixels".

Surely a glass of warm water plus an ice cube is a lower-entropy state than the melted ice cube mixed into the water.
This reminds me of a great article I saw on Hacker News that really helped explain the concept of entropy to me. Linked here:

https://news.ycombinator.com/item?id=24140808

Gotta love these interactive websites.
I thought entropy (in the Shannon sense) was a property of discrete and finite probability distributions. It's essentially a measure of how random a sample from such a probability distribution is. Notably, continuous probability distributions don't have meaningful entropy (or in some sense, their entropy is always infinite). It's worth considering the similarities and differences between entropy and standard deviation.

I thought the 2nd law of thermodynamics was saying that with incomplete knowledge, the probability distribution of possible states becomes more and more spread out as time goes on. It's almost a limit to how you can make predictions or simulations of physics when the initial state of the system is not fully known. Equivalently, it's a banal statement about chaos in the sense of chaos theory.

The only thing I don't get is how physicists get around the discrete and finite restriction. Maybe the state of the system is not what has entropy. Rather, one can define an arbitrary function f from the system to a finite set S, and then talk about the entropy of f(System at time t), because this is indeed a discrete and finite probability distribution which you can take the entropy of.

Hmmm. Maybe I understand entropy.
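The "function f to a finite set S" idea can be tried numerically. This sketch assumes a standard normal variable, bins on [-10, 10], and arbitrary bin widths Δ; it shows that the entropy of the binned variable keeps growing as the bins shrink, roughly like h + log2(1/Δ), where h is the differential entropy. That is one sense in which a continuous distribution's entropy is "infinite".

```python
import math

def normal_cdf(x):
    """CDF of the standard normal distribution N(0, 1)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def binned_entropy(delta, lo=-10.0, hi=10.0):
    """Shannon entropy (bits) of N(0, 1) lumped into bins of width delta.

    The binning plays the role of a map f from the continuous state to a
    finite set of labels; probability mass outside [lo, hi] is negligible.
    """
    n_bins = int(round((hi - lo) / delta))
    h = 0.0
    for i in range(n_bins):
        a = lo + i * delta
        p = normal_cdf(a + delta) - normal_cdf(a)
        if p > 0:  # skip empty (or rounding-negative) tail bins
            h -= p * math.log2(p)
    return h

# Differential entropy of N(0, 1) in bits: 0.5 * log2(2 * pi * e), about 2.05.
h_diff = 0.5 * math.log2(2 * math.pi * math.e)

for delta in (0.1, 0.01, 0.001):
    # The binned entropy keeps growing, tracking h_diff + log2(1 / delta).
    print(delta, binned_entropy(delta), h_diff + math.log2(1 / delta))
```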

> The only thing I don't get is how physicists get around the discrete and finite restriction.

Actually, they don't! When you start doing the math about states in a quantum sense (i.e. statistical mechanics), the basic premise is that the available range of states _is_ discrete. Particles are quantized - so they can only possess certain allowable discrete energy levels. The broader laws of thermodynamics fall out of that and appear to be continuous as you scale up to the macro world across a huge number of microstates.

I think this comment is significantly more insightful than the article.

As for the thing you don't get: quantum mechanics means that the state space is actually discrete, which means there is no need to pass to a continuous distribution. And finiteness is not really a concern either: first of all, it is not strictly necessary for the (Gibbs) entropy to be defined, and secondly, the state space is actually often finite once e.g. the total energy in the system is fixed.

> I thought entropy (in the Shannon sense) was a property of discrete and finite probability distributions. It's essentially a measure of how random a sample from such a probability distribution is. Notably, continuous probability distributions don't have meaningful entropy (or in some sense, their entropy is always infinite).

True, but for continuous distributions you can use the KL divergence against a uniform distribution :)

One of the properties of entropy H(X) of a random variable X is that if f is a bijective function then H(f(X)) = H(X).

For relative entropy (or "KL divergence" as some people call it), we have that H(X||Y) = H(f(X)||f(Y)). But if you fix Y to have a continuous uniform distribution, then you lose this critical property because f(Y) may no longer have a continuous uniform distribution.
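Gaussians give a quick closed-form check of the invariance claim. In this sketch (the means and variances are arbitrary example values), applying the bijection f(x) = 2x to both distributions leaves the relative entropy unchanged, while the differential entropy of a single distribution shifts by log 2.

```python
import math

def gaussian_diff_entropy(sigma):
    """Differential entropy (nats) of N(mu, sigma^2); it does not depend on mu."""
    return 0.5 * math.log(2 * math.pi * math.e * sigma ** 2)

def gaussian_kl(mu1, s1, mu2, s2):
    """Relative entropy D(N(mu1, s1^2) || N(mu2, s2^2)) in nats."""
    return math.log(s2 / s1) + (s1 ** 2 + (mu1 - mu2) ** 2) / (2 * s2 ** 2) - 0.5

# X ~ N(0, 1) and Y ~ N(1, 1). The bijection f(x) = 2x maps them to
# f(X) ~ N(0, 4) and f(Y) ~ N(2, 4), i.e. standard deviation 2.
kl_before = gaussian_kl(0.0, 1.0, 1.0, 1.0)
kl_after = gaussian_kl(0.0, 2.0, 2.0, 2.0)

h_before = gaussian_diff_entropy(1.0)
h_after = gaussian_diff_entropy(2.0)

print(kl_before, kl_after)   # 0.5 0.5: relative entropy is unchanged
print(h_after - h_before)    # about 0.693 = log(2): differential entropy shifts
```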

Apparently this "critical property" is not so important to all the people who use relative entropy as a generalization to a continuous distribution defined on a space with an underlying measure.

Why would they care about arbitrary transformations mapping points in the space to other points in the space?

What I think it means, is that if you take two different parametrizations of the same physical phenomenon, then you get two different entropy values.

E.g., suppose you have a bunch of particles with fixed mass. You could look at the distribution of speeds and get one entropy, then at the distribution of kinetic energy (basically speed squared) and get another. A uniform distribution of speeds means a non-uniform distribution of squared speeds, so the two entropies would disagree.
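The speed versus squared-speed example can be worked through with a hypothetical speed distribution, uniform on (0, 1). The sketch estimates the differential entropy E[-log f(S)] on a midpoint-rule grid: the same particles give 0 nats when described by speed and about log 2 - 1 ≈ -0.307 nats when described by squared speed.

```python
import math

# Hypothetical setup: speeds V ~ Uniform(0, 1), and W = V**2 (proportional to
# kinetic energy at fixed mass). The densities are f_V(v) = 1 on (0, 1) and,
# by the change-of-variables rule, f_W(w) = 1 / (2 * sqrt(w)) on (0, 1).
def f_V(v):
    return 1.0

def f_W(w):
    return 1.0 / (2.0 * math.sqrt(w))

def diff_entropy(density, points):
    """Average of -log f over the given points (nats).

    With points spread evenly in probability (below: midpoints of a grid
    for V, and their squares for W), this is a midpoint-rule estimate of
    the differential entropy h = E[-log f(S)].
    """
    return sum(-math.log(density(s)) for s in points) / len(points)

n = 100_000
v_points = [(i + 0.5) / n for i in range(n)]   # grid over the speeds
w_points = [v * v for v in v_points]           # the same particles, as V**2

h_V = diff_entropy(f_V, v_points)   # 0.0
h_W = diff_entropy(f_W, w_points)   # close to log(2) - 1, about -0.307

print(h_V, h_W)
```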

This sounds like it could pose issues.

Physical entropy is defined from the probability distribution over states. Velocities or squared-velocities are not states, they are derived quantities. Points in a phase space would describe states. Physical states are discrete anyway when you consider quantum physics :-)

As for the entropy of probability distributions in general, I think relative entropy is invariant under reparametrizations because both the probability of interest and the reference probability transform in the same way [1]. But I don't remember exactly what it means. [And I am not sure if that makes ogogmad wrong; I may not have understood his comment well.]

([Edit: forget this aside. You probably were talking about speeds as positive magnitudes.] By the way using an example analogue to yours discrete entropy wouldn't be invariant either: if you have a distribution {-1,1} and square it it collapses to a zero-entropy singleton {1}.)

[1] https://en.wikipedia.org/wiki/Kullback–Leibler_divergence#Pr...

Yeah, you also have to transform the "reference" function, and then the entropy stays the same. I prefer to think of it as the "density of states" -- it's necessary to make the argument of the logarithm dimensionless, after all.
> I thought the 2nd law of thermodynamics was saying that with incomplete knowledge, the probability distribution of possible states becomes more and more spread out as time goes on. It's almost a limit to how you can make predictions or simulations of physics when the initial state of the system is not fully known. Equivalently, it's a banal statement about chaos in the sense of chaos theory.

I'm not sure I understand what you mean by "as time goes on". Classical thermodynamic entropy is defined for a system in equilibrium and it doesn't change with time. It changes when you do things to the system.

I don't think statistical mechanics entropy is limited in this way. I think the (incorrect? oversimplified?) definition given in the article is only valid under the conditions you've given. But I'm not sure.
Then it maybe depends on what you meant by "the 2nd law of thermodynamics".
In Shannon’s 1948 paper, part V deals with continuous sources. The key is to realise that you cannot measure a continuous signal exactly, and so you can define a rate of information relative to the fidelity of your measurement. (I only skimmed that part years ago, and never studied it carefully. But it makes perfect sense.)
If you mean differential entropy (which Shannon supposedly suggested as a generalisation to continuous random variables), this is not a good generalisation of entropy to continuous random variables. It lacks all the interesting properties of entropy.

The "proper" generalisation of entropy to continuous random variables is something called relative entropy, or in some books it's called KL divergence. But this is now a property of how two probability distributions relate to each other, rather than a property of a single probability distribution alone.

I'm not an expert in probability theory or physics, but this is what I've learnt from a brief study of these areas.

Relative entropy? KL? Ah, found it – Kullback–Leibler divergence, it’s called. Thanks, I’ll put that on my list of stuff to learn about.
>how physicists get around the discrete and finite restriction

By turning a sum into an integral. The probability 'density' is p(x), and the 'density of states' is n(x), so the entropy is then minus the integral of p(x) log(p(x)/n(x)) over dx.

Right, it requires a sort of alphabet of discrete specific states. Discrete locations in space, discrete numbers of things and discrete kinds of things.
yeah. i think of minimum entropy as a dirac delta distribution and maximum entropy as a flat uniform random distribution.

i never really understood the physical definition, but always handwaved it away with "things dissipate over time into an undetectable signal, or a flat distribution"

You basically nailed it!
Nice writeup! BTW, statistical thermodynamics has a name for that set of possible microstates, perhaps the most pretentious-sounding name in all of physics: the "canonical ensemble".
Nah, Ultraviolet Catastrophe is worse, then there's wavefunction collapse. And there's gotta be something in particle physics that puts these terms to shame, given that the particle names are drawn from literature and whimsy.

I looked up the dictionary definition of pretentious, and it doesn't really apply, but the hERG channel is a critical ion channel, and various drugs block it and cause Really Bad Things. hERG stands for "the human Ether-a-go-go-Related Gene": a pretty bloody stupid name, but whimsy is not restricted to particle physics.

Canonical ensemble, along with microcanonical and grand canonical ensembles, are all over statistical mechanics. And I suspect there's not one bit of whimsy in their naming. There was not any humour in my stat mech course aside from me trying to make sense of it.

There's actually a sort of general principle in medical science that people should avoid whimsical names for things, since in all likelihood someone with a life-altering or fatal condition or their family shouldn't be told that it's due to a mutation in the Sonic Hedgehog protein [1]

[1] https://en.wikipedia.org/wiki/Sonic_hedgehog

"Ultraviolet Catastrophe"'s problem is that it sounds way cooler than it is. I mean, that's the title of a cyberpunk novel; more hyperbolic (and disappointing) than pretentious. Eigenthings would be second on my list of pretentiousness (oddly, not gedankenthings; I'm inconsistent I guess). Standard Model names strike me whimsical to the point of being undignified.
I'm still mad about top and bottom when truth and beauty were right there!
If I'm not mistaken, the "official" names are just "t" and "b", so top and bottom, just like truth and beauty, are more like mnemonics. So referring to them as truth and beauty would be just as correct.
Thought one was their names and the other their attributes?
Ultraviolet catastrophe is cool! It’s why we needed quantum physics, otherwise there would be catastrophic infinite energy emerging from random high frequency oscillations!
Well, that's not quite what it is, first of all, and second of all I don't like the practice of calling something a catastrophe when it's just predicted by a theory but didn't happen. I mean, bad theories always predict something horrible should be happening. What's next, do flat earthers get to have an "ocean water catastrophe" because their theory implies all the water drains away? Or a "superluminal catastrophe" because their theory requires infinite, unending linear acceleration of the earth (and moon and sun)? I mean, it really is a catastrophe, I guess... for the theorist.
My favorite is "gravothermal catastrophe" with "violent relaxation" being a close runner-up.
"well, I never!"

—the Grand canonical ensemble

> the most pretentious sounding name in all of physics

Also, "the free will theorem" and "the god particle".

the "god particle" was actually the "goddamn particle" https://www.businessinsider.com/why-the-higgs-is-called-the-...
As the great von Neumann once quipped: “Why don't you call it entropy? In the first place, a mathematical development very much like yours already exists in Boltzmann's statistical mechanics, and in the second place, no one understands entropy very well, so in any discussion you will be in a position of advantage.”
The typical measure of entropy (Shannon or Gibbs, and let's spare details for later and after you've read up on the theory of large deviations) is

- sum (p log(p))

which is not that different than the formula for the mean

sum (p 1/n)

the critical difference is the normalization constant is based on the probability of the state rather than assuming a uniform probability over all states.

So, in effect, the entropy is a measure of the mean. It is a measure adapted to the case where "mean" is ill-defined because the number of modes and/or the variation around those modes is not handled well by simpler metrics.

If there was anyone who taught you this then they should be fired.

More constructively, principal among the many things wrong with your comment is the formula for the mean; sum_i p_i = 1, so sum_i p_i / n = 1 / n. The mean would instead be sum_i p_i x_i.

Perhaps I'm misunderstanding or missing something, but I'm afraid this seems completely wrongheaded to me. (My apologies for being so blunt, but right now your comment appears to be the most-upvoted, and I therefore think it needs some pushback.)

[EDITED to add: I was looking at an old version of the page; by the time I wrote this the parent was no longer the top comment. I'll leave the bluntness in, especially as at least one other person was even blunter.]

You refer to "the mean" and I think you mean the mean of the probabilities. Now, when you've got a probability distribution, by far the usual thing for "the mean" to mean is the sum of Pr(x) x -- the mean of the values. Taking the mean of the probabilities is a really strange thing to do.

One reason why it's a really strange thing to do is that this thing you call n is really kinda meaningless. There's no difference between these two probability distributions: (a) 1, 2, 3, or 4, with probabilities 0.1, 0.2, 0.3, 0.4 respectively; (b) 1, 2, 3, 4, or 5, with probabilities 0.1, 0.2, 0.3, 0.4, 0 respectively. But (a) has n=4 and (b) has n=5. Maybe you want n to be the number of nonzero probabilities? But now consider (a) along with the following probability distribution parameterized by a (small, positive) number h: 1, 2, 3, 4, or 4+h, with probabilities 0.1, 0.2, 0.3, 0.4-h, h. Every version of this distribution with h>0 has n=5, but when h is very small it's practically indistinguishable from (a) with n=4.

Further, since the sum of probabilities is always 1, what you write as sum (p 1/n) is just the same as the number 1/n. You can call it "the mean" if you want to, but I don't see what this adds over calling it what it is: the reciprocal of the number of possibilities.

There is something to what you say: the entropy is kinda related to the number of possibilities; if the probabilities are all equal, the entropy is log(#possibilities); if the probabilities are equal-ish then it's modestly smaller than that. But note e.g. that this relationship is exactly the inverse of what you say, in that "the mean" decreases with the number of possibilities, and the entropy increases with the number of possibilities.
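To make the points about n concrete, here is a small Python sketch (the helper names are hypothetical, chosen for this illustration) that computes the Shannon entropy in nats next to the "sum (p 1/n)" quantity for the distributions above. Padding (a) with an impossible outcome changes n but not the entropy, a tiny h barely moves the entropy even though n jumps to 5, and a uniform distribution gives exactly log(#possibilities).

```python
import math

def shannon_entropy(probs):
    """H = -sum p log p in nats; zero-probability outcomes contribute nothing (0 log 0 = 0)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def sum_p_over_n(probs):
    """The 'sum (p 1/n)' quantity: since the p's sum to 1, this is always 1/n."""
    n = len(probs)
    return sum(p / n for p in probs)

dist_a = [0.1, 0.2, 0.3, 0.4]          # n = 4
dist_b = [0.1, 0.2, 0.3, 0.4, 0.0]     # same distribution plus an impossible outcome: n = 5
h = 1e-9
dist_h = [0.1, 0.2, 0.3, 0.4 - h, h]   # n = 5, practically indistinguishable from dist_a
uniform = [1 / 6] * 6

print(sum_p_over_n(dist_a), sum_p_over_n(dist_b))        # about 0.25 vs 0.2: depends on n
print(shannon_entropy(dist_a), shannon_entropy(dist_b))  # identical
print(shannon_entropy(dist_h) - shannon_entropy(dist_a)) # tiny
print(shannon_entropy(uniform), math.log(6))             # equal up to rounding
```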

The entropy is not "a measure of the mean". It kinda-sorta is related to "the number of possibilities", which is the reciprocal of "the mean". It is not at all the case, as your last paragraph suggests, that for most purposes we should be using "the mean" but we need to use the entropy when "the number of modes ... is not handled well by simpler metrics", whatever that means; for most purposes we should be using the entropy, and in the special case where all the probabilities are equal we can get away with just counting possibilities.

(In some important situations it turns out that what you have is some number of possibilities with roughly equal probabilities, and a whole lot more whose probabilities rapidly decrease to almost zero, and then you can get away with counting the number of reasonably-probable possibilities and taking its log. E.g., various situations in communications theory can fruitfully be thought of this way. But the entropy is still the more fundamental quantity, and "the mean" is still a needless obfuscation of "the (effective) number of possibilities".)
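One common way to read the entropy as an effective number of possibilities is exp(H), often called the perplexity: it is the number of equally likely outcomes that would have the same entropy. The sketch below uses a made-up distribution (four roughly equal outcomes plus a tail of 100 nearly impossible ones) to show that exp(H) lands close to the count of reasonably probable outcomes rather than the total count.

```python
import math

def shannon_entropy(probs):
    """Entropy in nats; 0 log 0 is taken as 0."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def effective_number(probs):
    """exp(H): how many equally likely outcomes would give the same entropy."""
    return math.exp(shannon_entropy(probs))

# Illustrative numbers only: four roughly equally likely outcomes,
# plus a long tail of 100 outcomes sharing 1% of the probability.
tail_mass = 0.01
probs = [(1 - tail_mass) / 4] * 4 + [tail_mass / 100] * 100

print(len(probs))               # 104 outcomes in total
print(effective_number(probs))  # roughly 4.4: close to the 4 "reasonably probable" ones
```

For an exactly uniform distribution over n outcomes, exp(H) is exactly n, which is the "just count possibilities" special case.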

It can be related to compression. If some phrase has a probability p_i of occurring, then the optimal length for its code is -log(p_i). The entropy sum(-p_i log(p_i)) = mean(-log(p_i)) is then the average code length you will use.
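A concrete instance of this is Huffman coding, a standard prefix-code construction (used here as an illustration, not something the comment specifies). Its average code length in bits is always within 1 bit of the entropy, and when every probability is a power of 1/2 the code lengths equal -log2(p_i) exactly. A minimal sketch that only computes code lengths:

```python
import heapq
import math
from itertools import count

def entropy_bits(probs):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def huffman_lengths(probs):
    """Huffman codeword length for each symbol, indexed like probs."""
    if len(probs) == 1:
        return [1]
    tiebreak = count()  # keeps heap comparisons away from the symbol lists
    heap = [(p, next(tiebreak), [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    while len(heap) > 1:
        p1, _, syms1 = heapq.heappop(heap)
        p2, _, syms2 = heapq.heappop(heap)
        for i in syms1 + syms2:
            lengths[i] += 1  # each merge adds one bit to every symbol below it
        heapq.heappush(heap, (p1 + p2, next(tiebreak), syms1 + syms2))
    return lengths

def average_length(probs):
    return sum(p * l for p, l in zip(probs, huffman_lengths(probs)))

dyadic = [0.5, 0.25, 0.125, 0.125]
print(huffman_lengths(dyadic))                        # [1, 2, 3, 3], i.e. -log2(p_i)
print(average_length(dyadic), entropy_bits(dyadic))   # both 1.75 bits
```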
The author mentions Boltzmann brains and that a human body could theoretically spontaneously form out of particles given a long enough time span. Of course, nothing like this can ever happen. It’s the fallacy of thinking infinite time means infinite possibilities.
> that a human body could theoretically spontaneously form out of particles given a long enough time span.

To be fair, isn't this precisely what happened?

If you leave out "spontaneously"!
Specifically if you leave it in! It's a matter of viewpoint or scope.

"Spontaneous" can be defined a few different ways: https://www.merriam-webster.com/dictionary/spontaneous

From the link "2: arising from a momentary impulse" you get the implied meaning from the original comment if you assume the standard human experience of a "moment", i.e. a few seconds or less.

However you could argue that human evolution is but a moment in the scope of the universe.

And then with definition "5: developing or occurring without apparent external influence, force, cause, or treatment"

The only way it wouldn't be spontaneous would be if an external actor (Deity of your choice?) directed human evolution somehow. To say this is debatable is an understatement.

And so we have fun in word play and hopefully appreciation of different viewpoints.

:-)

edit: rearranged for better flow

"Oh, that was easy," says Man, and for an encore goes on to prove that black is white and gets himself killed on the next zebra crossing."
It's not a fallacy, it's a paradox. One that indicates that our theories of quantum fluctuations in an infinite universe are incomplete.

This podcast episode has a good discussion on the subject: https://universetoday.fireside.fm/745

The initial conversation is an entirely false premise, that an infinitely large universe would have "anything that can happen, would happen." An infinitely large universe could be empty, and fit the bill, and there would be nothing like "there are infinite copies of myself that are slightly different than now." It's bad philosophy, not science.
Perhaps it's actually correct, but our intuitions about ridiculously long periods of time aren't good. Note that heat death is in ~10¹⁰⁰ years, whereas this Boltzmann body would take ~10^(10⁶⁹) years. That second time period is literally incomprehensible. So we think, of course a fully formed human body wouldn't appear; in practice, that's not how it actually works; the fact that it appears possible is at best a mathematical artifact, not reality. But we're talking about a timescale that's not just longer than the age of the universe, or than the total lifespan of the universe, not just orders of magnitude longer than those times, but on a completely different scale. Given that, I think we need to toss out those intuitions.

As to the author's last question of whether such a thing even makes sense at all given those time scales, I don't see why not. After all, once the universe reaches heat death, as far as we know nothing from the outside is going to come along and garbage collect it, so why couldn't it last for an arbitrary/infinite number of years? And compared to that, ~10^(10⁶⁹) years, or ~10^(10^(10⁵⁶)) years, or whatever, is nothing.

I think this is more about the arrow than the entropy.

Considering the ‘arrow of time’ as a function on each microstate lends itself better to “can” questions.

I think both the author and you are assuming your definition for the arrow is correct.

It is still unknown if the arrow allows any arbitrary state to become another, or if there is a strict genealogy for the change of the state that can be seeded from an existing state that ensures it will evolve into a specific other under some time scale, or whether either of those permit Boltzmann brains, or even if there is some control the state has over itself (free will?).

Why is it not bound to happen eventually?
For the same reason an infinite number of zeroes will never contain a one. Particles still have to follow the laws of how matter behaves, no matter the timescale. Nothing spontaneously forms like the author suggests, regardless of entropy. That’s not how matter works.
In short: the authors make a good summary of these ideas:

- Entropy in thermodynamic equilibrium is well understood. The early theory (before statistical mechanics was developed) fits well with our modern understanding.

- The analogies made about entropy are not always good and indeed, if you try to match the physics with "entropy is disorder" it does not always work.

- In non-equilibrium situations it is, as the author points out, more complex.

Regarding the last item, even Stephen Hawking postulated some strange ideas about the universe having to rewind past some point in time, so that the big crunch would be the mirror of the big bang.

Here's another head-spinning application of the concept of entropy, in quantum information theory:

https://www.cambridge.org/core/books/abs/quantum-information...

> "The first fundamental measure that we introduce is the von Neumann entropy. It is the quantum analog of the Shannon entropy, but it captures both classical and quantum uncertainty in a quantum state. The von Neumann entropy gives meaning to a notion of the information qubit. This notion is different from that of the physical qubit, which is the description of a quantum state in an electron or a photon. The information qubit is the fundamental quantum informational unit of measure, determining how much quantum information is in a quantum system."
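The von Neumann entropy is S(ρ) = -Tr(ρ log ρ), which equals the Shannon entropy of the density matrix's eigenvalues. A minimal sketch, deliberately restricted to a single qubit so it can use the closed-form eigenvalues of a 2x2 Hermitian matrix instead of a linear-algebra library (the function name is our own). It shows one difference from a naive Shannon count: the superposition |+⟩ gives a fair coin when measured in the 0/1 basis, yet its von Neumann entropy is 0 because it is a pure state, while the maximally mixed state I/2 carries 1 bit.

```python
import math

def qubit_von_neumann_entropy(rho):
    """S(rho) = -Tr(rho log2 rho) for a single-qubit density matrix given as [[a, b], [c, d]].

    Only valid for 2x2 matrices: eigenvalues come from the trace and determinant.
    """
    (a, b), (c, d) = rho
    tr = (a + d).real
    det = (a * d - b * c).real
    disc = math.sqrt(max(tr * tr - 4 * det, 0.0))  # clamp tiny negative round-off
    eigenvalues = [(tr + disc) / 2, (tr - disc) / 2]
    # 0 log 0 is taken as 0, so (near-)zero eigenvalues are skipped.
    return -sum(x * math.log2(x) for x in eigenvalues if x > 1e-12)

ket0 = [[1, 0], [0, 0]]              # |0><0|, pure
plus = [[0.5, 0.5], [0.5, 0.5]]      # |+><+|, pure superposition
mixed = [[0.5, 0], [0, 0.5]]         # I/2, maximally mixed

for name, rho in [("|0>", ket0), ("|+>", plus), ("I/2", mixed)]:
    print(name, qubit_von_neumann_entropy(rho))  # 0, 0, and 1 bit respectively
```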

Incidentally chem.libretexts.org, a collection of open-source chemistry textbooks, has a good overview of the physical-chemical applications. The site is kind of a mess but you'd want chapter 18.3:

https://chem.libretexts.org/Bookshelves/General_Chemistry/Ma...

Actually a quite nice article. After also spending years as a professional physicist not understanding entropy, I finally decided that I was not necessarily the problem, and spent the last 5 years or so trying to understand it better by reworking the foundations with my research group. (One of the papers the author cites is part of a series from our group developing "observational entropy" in order to do so.)

A lot of what makes this topic confusing is just that there are two basic definitions of entropy, Gibbs (-\sum_i p_i \log p_i) and "Boltzmann" (\log \Omega), and they're really rather different. There's usually some confusing handwaving about how to relate them, but the fact is that in a closed system one of them (generally) rises and the other doesn't, and one of them depends on a coarse-graining into macrostates and the other doesn't.
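A toy illustration of the two differences named above (this is not the observational entropy the comment goes on to describe, just the two textbook definitions). The system is three coins, a choice made purely for illustration. The Boltzmann entropy log Ω of a microstate depends on which coarse-graining into macrostates you pick, while the Gibbs entropy depends only on the probability distribution, so a reversible (bijective) map on microstates, which merely relabels probabilities, cannot change it.

```python
import math
from itertools import product

# Toy system: three coins, so 8 microstates such as "HHT".
MICROSTATES = ["".join(bits) for bits in product("HT", repeat=3)]

def gibbs_entropy(dist):
    """-sum_i p_i log p_i over a distribution on microstates (nats)."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)

def boltzmann_entropy(microstate, coarse_grain):
    """log Omega, where Omega counts microstates sharing this microstate's macrostate."""
    macro = coarse_grain(microstate)
    omega = sum(1 for m in MICROSTATES if coarse_grain(m) == macro)
    return math.log(omega)

def number_of_heads(m):  # one possible coarse-graining
    return m.count("H")

def first_coin(m):       # a different coarse-graining of the same microstates
    return m[0]

def flip_all(m):         # a reversible (bijective) "dynamics" on microstates
    return m.translate(str.maketrans("HT", "TH"))

# Boltzmann entropy of the same microstate under two coarse-grainings:
print(boltzmann_entropy("HHH", number_of_heads))  # log 1 = 0
print(boltzmann_entropy("HHH", first_coin))       # log 4

# Gibbs entropy is unchanged by a bijective map on microstates:
dist = {"HHH": 0.5, "HHT": 0.5}
evolved = {flip_all(m): p for m, p in dist.items()}
print(gibbs_entropy(dist), gibbs_entropy(evolved))  # both log 2
```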

The better way to relate them, I've come to believe, is to consider them both as limits of a more general entropy (the one we developed — first in fact written down in some form by von Neumann but for some reason not pursued much over the years.) There's a brief version here: https://link.springer.com/article/10.1007/s10701-021-00498-x.

This entropy has Gibbs and Boltzmann entropy as limits, is good in and out of equilibrium, is defined in quantum theory and with a very nice classical-quantum correspondence, and has been shown to reproduce thermodynamic entropy in both our papers and the elegant one by Strasberg and Winter: https://journals.aps.org/prxquantum/abstract/10.1103/PRXQuan...

After all this work I finally feel that entropy makes sense to me, which it never quite did before — so I hope this is helpful to others.

p.s. If you're not convinced a new definition of entropy is called for, ask a set of working physicists what it would mean to say "the entropy of the universe is increasing." Since von Neumann entropy is conserved in a closed system (which the universe is if anything is), and there really is no definition of a quantum Boltzmann entropy (until observational entropy), the answers you'll get will be either mush or properly furrowed brows.

The universe is an open system.
The universe is not a closed system
How do you define "the universe"?
Statistical mechanics is one way of representing entropy but you don’t need it. The second law of thermodynamics can be expressed in other much more general terms. Also it requires that the system be isolated not “thermally isolated”. There’s other types of interactions such as gravitational and electromagnetic.
I mean, come on. You know and I know that the statistical mechanics definition gets you 99% of the way there in terms of intuition. Obviously if I spin a rotor in my thermally insulated box with a magnet on the outside I can add energy to order things with, I don't think anyone is confused on that point.
Well, claiming to have understood entropy is no joke. He should be flawless then. Just like with enlightenment, he who claims to understand entropy really does not, and he who does will not say so.
Can you explain what you mean by "not needing stat mech" and thermo entropy being "more general"?
I had an art teacher who was very philosophical. One day he described to the class what entropy was. I took a lot of physics and even astrophysics. Little did I know he had a better conceptual understanding and explanation than any I've heard since. Too bad I don't remember exactly what he said.
Not to poke holes in your nostalgia, but how do you know he had a great explanation if you don’t remember it after further study?
"Don't try to understand it, feel it" - from Tenet, but it does sort of apply to entropy as a way of looking at problems.

That being said, I heard a sports science student try to recall their working definition of entry, and it was some mess of locks and keys floating around randomly hitting each other?

> definition of entry, and it was some mess of locks and keys floating around randomly hitting each other?

Locks and keys does indeed sound like "entry". I thought we were discussing "entropy".

In art, a high-entropy painting is one that would be hard to tell apart from similar paintings, one example being paintings created by simply splattering paints all over the canvas.
That's the problem with randomness.

The required Dilbert reference: https://dilbert.com/strip/2001-10-25

> "Entropy is not Disorder. One of the most popular beliefs about entropy is that it represents disorder."

This is what confused me the most about entropy in high school, the "order / disorder" lingo. Isn't "order" a metaphysical concept, something a conscious entity thinks about a system? How would nature know the difference? It took me some years to understand that that lingo is indeed misleading. (Still definitely not an expert of course.)

It's order in the sense of different things being in different places. Not the best term, since if I had to give an informational definition of order I'd probably set it up backwards, but not terrible. I like "separate" to communicate low entropy and "mixed" for high entropy, but that's just what particular examples of low and high entropy look like.
> Isn't "order" a metaphysical concept

Yes. The most scientific way of talking about "order" in that sense is the Kolmogorov complexity, which is still extremely poorly-defined. The best way to put it is that "ordered" states are ones that have low Kolmogorov complexity.

What is a way to articulate it then?
OP is about as good as any explanation I've seen.
"Compressibility" (in the software sense) would perhaps carry the most meaning.

Entropy is ultimately all about the ability to extract information (i.e. work) out of a system.
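Kolmogorov complexity itself is uncomputable, so a common practical stand-in is the output length of a real compressor, which gives only an upper bound. The sketch below uses Python's standard zlib as that proxy (an assumption of this example, not a definition of order): an all-zero buffer and a periodic pattern shrink to a tiny fraction of their size, while random bytes do not shrink at all.

```python
import os
import zlib

def compressed_size(data: bytes) -> int:
    """Length of zlib's output: a crude, computable upper-bound proxy for Kolmogorov complexity."""
    return len(zlib.compress(data, 9))

n = 100_000
ordered = bytes(n)              # all zero bytes: very "ordered"
pattern = b"HT" * (n // 2)      # periodic: also highly compressible
noise = os.urandom(n)           # random bytes: essentially incompressible

for name, data in [("ordered", ordered), ("pattern", pattern), ("noise", noise)]:
    print(name, len(data), compressed_size(data))
```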

"Compressibility" is really not that great an illustration for physical entropy. Information-theoretic entropy is not quite the same thing as physical entropy, but close enough to confuse you if you're not paying attention.
I'm going to need an example of the differences, because insofar as statistical entropy is entropy, it's ultimately describing an information function (and the lean in quantum mechanics these days is that information describes physical properties as well, hence holography and the black hole information paradox).

The heat-deathed universe for example would be the ultimate compressible information: 1 measurable state, across all space, for the rest of infinite time.

Indeed, it was actually James Gleick's book The Information that helped me understand the concept better.
>Contrary to popular opinion, uniformly distributed matter is unstable when interactions are dominated by gravity (Jeans instability) and is actually the least likely state, thus with very low entropy. Most probable states, with high-entropy, are those where matter is all lumped together in massive objects.

That means over time the system becomes more ordered and starts organizing itself into spheres.

I once brought this question up on Physics Stack Exchange, and basically the answers were either some form of rolling their eyes at me or dismissing me outright. The people who did answer the question stated that as particles organize themselves into spheres, some other part of the universe gets hotter as a result, and that the seeming self-organization I see going on with the solar system was just an isolated system.

This answer still seemed far-fetched to me. It still looks as if some overall self-organization is going on if the universe gets hotter on one side and matter gets organized into solar systems on another side.

It took me 3 years to somewhat understand what entropy is. If you have loaded dice that always roll 6s, then the dice rolling ALL 6s is the highest-entropy state. Rolling random numbers would then be a low-entropy state.

Entropy is simply a phenomenon of probability. As time moves forward, particles enter high-probability configurations, like rolling dice. As you roll dice more and more... rolling random numbers has a higher probability than rolling all 6s...

It just so happens that disordered arrangements have higher probabilities in most systems. But if you look at a system of loaded dice or the solar system... in those cases ordered configurations have higher probabilities. That's really all it is. The entire phenomenon of entropy comes down to probability, and the root of probability is the law of large numbers.

Entropy: "to describe energy loss in irreversible processes". We have no clue about what is or is not reversible. Complex systems exhibit self-organizing behavior for no reason (that we understand), and we continue to identify more conditions under which this occurs. How does a Nobel Prize get handed out for identifying/quantifying "self-organization" http://pespmc1.vub.ac.be/COMPNATS.html without bringing everything we think we know about entropy under scrutiny? Self-organization does not consume energy any more than entropic decay emits it. Irreversibility is a poor assumption.
> Self-organization does not consume energy any more than entropic decay emits it.

This statement is incredibly wrong - this is exactly what both these processes do. We calculate chemistry reaction kinetics by including entropy terms, and optimize reactions by manipulating the entropy on one side of the equation (a classic is getting a liquid phase to precipitate out as you produce it).

I mean the reason coal can be turned into electricity is because there's a big increase in entropy going from "solid carbon in a specific location" to "CO2 diffused everywhere".

Complex systems are net increases in entropy. The water is flowing downhill, but it takes a really weird organism-shaped path to get there. Self organization is supposedly interesting because we don't know why such a path manifests. Thus far, nothing has given the second law a second's (ha ha) pause. It's not impossible, but considering most of our foundational physics is time-symmetric it makes sense to call entropy irreversible. Even if it could be reversed (and don't hold your breath on that one), it's still the cause of the arrow of time.
Welp, I suppose the author has another 10 years to work on their article about how it took them 20 years to actually understand entropy.
Entropy isn’t measuring a loss of energy, but the loss of the ability for a closed system to do useful work.

Order is often used to describe what’s going on, but it’s not the kind of order we normally think of. Sufficient cold water in a warm room is just as capable of performing work as warm water in a cold room.
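The cold-water point can be made quantitative with the Carnot bound, 1 - T_cold/T_hot, on the fraction of heat drawn from the hot side that any engine can turn into work. Only the temperature difference matters, not which body is "the interesting one". A small sketch with illustrative temperatures (the 20 K offsets are our own choice):

```python
def carnot_efficiency(t_hot_k: float, t_cold_k: float) -> float:
    """Upper bound on work / heat drawn from the hot reservoir; temperatures in kelvin."""
    if not 0 < t_cold_k < t_hot_k:
        raise ValueError("need 0 < T_cold < T_hot (in kelvin)")
    return 1 - t_cold_k / t_hot_k

room = 298.15               # about 25 °C
cold_water = room - 20      # cold water in a warm room: the room is the hot side
warm_water = room + 20      # warm water in a cold(er) room: the water is the hot side

print(carnot_efficiency(room, cold_water))   # about 0.067
print(carnot_efficiency(warm_water, room))   # about 0.063
```

Both bounds are nonzero; with no temperature difference at all, no work can be extracted, which the function reports by refusing equal temperatures.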

Gas molecules in a box - entropy seems quite straighforward there. An even distribution is the most likely state and has the highest entropy.
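The "even distribution is the most likely state" claim can be checked by counting, under the simplifying assumption of an ideal gas where each of N particles is independently in the left or right half with equal probability (gravity ignored). The all-on-one-side macrostate is a single arrangement out of 2^N, while near-even splits account for most arrangements:

```python
import math

N = 100  # number of particles, each independently in the left or right half

def fraction_of_microstates(k_min: int, k_max: int, n: int = N) -> float:
    """Fraction of the 2**n left/right arrangements with k_min..k_max particles on the left."""
    ways = sum(math.comb(n, k) for k in range(k_min, k_max + 1))
    return ways / 2**n

print(fraction_of_microstates(N, N))    # all on the left: exactly one arrangement, 2**-100
print(fraction_of_microstates(45, 55))  # within 5 of an even split: roughly 73%
```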

In space, at large scales, gravity starts dominating, so stars and planets are actually a higher-likelihood state than an even distribution.

Isn't this just about statistical independence? In a small amount of gas (almost by definition of a gas), the particles don't have much effect on each other, so one can assume statistical independence.

While in space with gravity overwhelming other effects, the particles have very much effect on each other. Hence the statistics about their state are affected by these dependencies. So the previous intuition about entropy can't hold.

I enjoyed the article but have a very minor nitpick. I didn't understand why the author added this sentence.

"However, the timescales involved in these calculations are so unreasonably large and abstract that one could wonder if they make any sense at all."

Apart from the fact that we could wonder about anything and everything, I think the author does not state what evidence we have to suspect that large enough timescales would change the laws of physics.

It could be the case, of course, and it would be great to talk about such evidence if it exists, but without further justification I feel that this sentence is an unjustified opinion in what is otherwise a very nice article that helps better understand entropy.

Of course you don't need to really understand entropy for it to be useful. It's definitely an interesting concept, but when I was crunching equations for Thermodynamics, one of the weeder classes for ME, it became clear you need it for things to balance out. Once you've cranked through a dozen or so problems you get a feel for what it is, even if the physics and the spiritual side of it remain murky.

Now, 35 years later, when I marvel at my new engine or what have you, I still vaguely remember my entropy-problems days and appreciate that someone worked this stuff out.

I view entropy as a probability distribution of some set of configurations of something. Entropy is low if there’s only one configuration and high if uniformly distributed.

There’s also some observer/interaction effect, which is like introducing a conditional probability, which would cause crystallization in an otherwise homogeneous system. Essentially a catalyst.

I also find it fascinating that when it is super cold outside and you throw a pan of boiling water out the window it turns to snow instantly vs a cup of room temperature water which does not. It probably fits in terms of activation energy as well.

I recommend Information Theory for Intelligent People: http://tuvalu.santafe.edu/~simon/it.pdf
Let me add: This PDF is only 13 pages and presents a few different ways of viewing entropy. It was easy for me to follow as an undergrad.
That's pretty good as far as I'm concerned. Took me a couple years to really grasp electrical impedance. Breakthrough for me was a concise book written in 1976 by Rufus P. Turner.

Subtle things take a while to get.

The Science of Can and Can't[1] is interesting in how it looks to address a number of fundamentals via counterfactuals including the 2nd Law of Thermodynamics.

Edit: See [2] for background about Constructor Theory.

[1] https://www.chiaramarletto.com/books/the-science-of-can-and-... [2] https://www.youtube.com/watch?v=8DH2xwIYuT0

Of possible related interest: https://arxiv.org/abs/chao-dyn/9603009

I think Bricmont is a clear thinker/presenter on these matters and this article actually showed up in a "for humanities people" anthology. [1]

[1] https://www.amazon.com/Flight-Science-Reason-Academy-Science...

I had the pain, and pleasure, of taking and then (assistant) teaching thermodynamics at MIT.

One of the tidbits that always stuck with me was that astronomers have estimated the observable universe’s total entropy:

When you compare that value to the maximum possible entropy, i.e. the heat death of the universe, and then to the ridiculously low entropy state of the beginning of the universe, we are currently halfway along in that ‘timeline’.

It always brought to mind a grandfather clock; the clock stops when the weight hits the floor, and we are halfway there…

My understanding of entropy: it is a measure of how big a system (matter + energy from a space region) is, and how much its components have interacted with each other: Entropy ~ log(number of possible system states). As the universe unfolds, systems originally isolated are starting to interact and to form bigger systems, hence the number of possible states increases, and entropy increases too.
Entropy implies that these states are indistinguishable from each other.
Of all of physics, entropy is the most depressing part.
Of all places, it was a conversation about the Socratic Forms in a Political Theory course I took in college that really brought the weight of the concept home to me. It went something like "Unlike the realm Socratic forms exist in, everything in our universe is subject to entropy; it is in everything's nature to degrade or decay over time."

Maybe there's more to that? I'm all ears.

The forms are a low energy state that materially arise from an entropic process. Degradation and decay can lead to more organization and beauty.
Small nit in case the author sees this: the image labelled "Entropy of each configuration of system with two dices where the observed macrostate is their sum" is either incorrect or mislabeled.

For example, 2 and 12 each have 1 microstate, and ln 1 = 0, so the entropy of 2 and 12 is 0, but the image says 0.028 (which is the probability of 2 or 12, not the entropy).
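The correct values are easy to tabulate. Taking the 36 ordered outcomes as microstates and the sum as the macrostate, the Boltzmann entropy is ln Ω (with k_B = 1), which is 0 for sums of 2 and 12 and ln 6 ≈ 1.792 for 7, while 0.028 is the probability 1/36:

```python
import math
from collections import Counter
from itertools import product

# Microstates: all 36 ordered outcomes (die1, die2); macrostate: their sum.
omega = Counter(a + b for a, b in product(range(1, 7), repeat=2))

for total in range(2, 13):
    probability = omega[total] / 36
    entropy = math.log(omega[total])  # Boltzmann entropy ln(Omega), with k_B = 1
    print(f"sum={total:2d}  Omega={omega[total]}  P={probability:.3f}  S={entropy:.3f}")
```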

> Boltzmann imagined that our universe could have reached thermodynamical equilibrium and its maximal entropy state a long time ago, but that a spontaneous entropy decrease to the level of our early universe occurred after an extremely long period of time, just for statistical reasons.

I’m interested in reading more about this. Any pointers?

I read with interest most well written articles explaining entropy. I often leave the article mildly satisfied that I understood it. Until the next day when I again have to figure out the difference between "high" and "low" entropy in a particular model, and invariably I mix up the two.
As a computer scientist it isn't helpful that the entropy in thermodynamics and that in computer science (informational content - I don't know a good English term) collide a bit.
but is it not hubris to think that we really know much about the origin and outcome of the universe? is it wise to make decisions based on this modicum of knowledge that we currently have regarding thermodynamics and the universe?

I suspect that the scientists of a trillion years from now will know a lot more than we do... so I don't really have that much confidence in current pronouncements regarding the beginning and possible end of the universe.

and yes I do have a degree in science and courses in physics & thermodynamics

As the old quote runs...

"[W]hen people thought the earth was flat, they were wrong. When people thought the earth was spherical, they were wrong. But if you think that thinking the earth is spherical is just as wrong as thinking the earth is flat, then your view is wronger than both of them put together." -- Asimov

It isn't wise to say you're ignorant, it's wise to know how ignorant you are. If I see a coin come up HHH and have to bet on the next 2 flips, you can be damn sure I'll bet HH. You can bet HT, TH, or whatever else at equal probability, but I suspect I'll come out the winner more frequently than you.
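One way to make the coin bet precise, under an assumption the comment doesn't state, is a uniform Beta(1, 1) prior on the coin's probability of heads (Laplace's rule of succession). After HHH, the posterior predictive probability of the next two flips being HH is 2/3, versus 2/15 for HT or TH, so betting HH wins more often. A minimal sketch with exact fractions:

```python
from fractions import Fraction

def predictive_two_flips(heads_seen: int, tails_seen: int):
    """Posterior-predictive probabilities of the next two flips, assuming a uniform Beta(1, 1) prior on P(heads)."""
    a = 1 + heads_seen  # Beta posterior parameters
    b = 1 + tails_seen
    norm = Fraction((a + b) * (a + b + 1))
    return {
        "HH": a * (a + 1) / norm,
        "HT": a * b / norm,
        "TH": a * b / norm,
        "TT": b * (b + 1) / norm,
    }

print(predictive_two_flips(3, 0))  # after seeing HHH: HH is 2/3, HT and TH 2/15 each, TT 1/15
```

A different prior would give different numbers; the qualitative conclusion only needs some prior probability that the coin is biased.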

Does it hurt to try?
it hurts to make decisions based on our current knowledge that is a child's knowledge
How exactly? And how exactly would we ever get past a ‘child’s knowledge’ without learning and making mistakes?
The models we have work and are relatively parsimonious. It would be silly to assume they are final but equally silly to shy away from using them for obvious reasons.

The modernization of physics also means we have outlines for what theories should look like, so even if our current theories are wrong we can still use the principles of (say) symmetry and information to constrain future work.

Shannon called the function "entropy" and used it as a measure of "uncertainty," interchanging the two words in his writings without discrimination
scientific method: it started with thermodynamic entropy, but scientists found that this truth is much more deeply ingrained in our universe; then we got a mathematically generalized version, which is now used to explain the "arrow of time" that our time-reversible physics equations would not be able to explain alone.
The problem is that entropy is a subjective notion.

It's a measure of our lack of knowledge about the state of a system.

Alternatively it took ten years for Aurelien Pelissier's misunderstandings of entropy to decay.
Anyone who think they understand entropy is living in a state of sin.
For those missing the reference, the original quote is also great:

"Anyone who attempts to generate random numbers by deterministic means is, of course, living in a state of sin." John von Neumann

Google freewall? Guess I won't read this article...
Seeing "Read the rest of this story with a free account."

Nope.

What do you mean?
It is required that I log-in to the article using my google or facebook account to read it.

Garbage. Your article isn't worth reading if I need to jump through hoops to get there.

Here's an archived copy:

https://archive.today/vRD7i

I think the word entropy is science’s largest mistake.
Shannon explained the name 'entropy' in (McIrvine and Tribus 1971):

My greatest concern was what to call it. I thought of calling it 'information,' but the word was overly used, so I decided to call it 'uncertainty.' When I discussed it with John von Neumann, he had a better idea. Von Neumann told me, 'You should call it entropy, for two reasons. In the first place your uncertainty function has been used in statistical mechanics under that name, so it already has a name. In the second place, and more important, no one really knows what entropy really is, so in a debate you will always have the advantage.'

Very cool! I didn’t know that.
English was not my college professor's native language. It took me a while to realize entropy was not the same as enthalpy. Very confusing.
Yeah, and while computational modeling has a good handle on enthalpy in drug design, it's all handwaving and statistics for entropy. And yet it's entropy that breaks predictions and drives medchem.

Solve how to properly model the free energy of interaction between a ligand and a protein (or two proteins) with proper solvent treatment and (a) you'll be famous, not Kardashian famous, but famous and (b) a whole lot of people will buy or download your software.

Wait until you find out about enstrophy.
entropy does not increase. The universe has organized itself into people, brains, cities, iPhones
not to brag but it took only 2 months to forget anything about it