by threatripper 1520 days ago
I don't understand entropy and this article did not change that. The issue I take is with the definition of "the most likely state".

Think of a series of random bits that can be either 0 or 1 with equal probability. How likely is it that they are all 0 or all 1? Not very likely. There is exactly one configuration of each. How likely is it that they have a specific configuration of 0s and 1s? Equally likely. All states are equally likely. If you randomly flip bits you go from one state to another state but each one is equally likely to occur. There is no special meaning to a specific configuration if you don't give it one.

If you look at the average of all bits you start grouping together all states with an equal number of 1s. If you talk about the average there is only one configuration that is all 1s but most configurations have roughly 50% 1s. If you now start flipping bits you will meander through all possible bit-states but the average will most likely be close to 50% 1s most of the time.
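
The grouping described above can be checked by direct counting. This is only a sketch: the function names and the 10% tolerance are illustrative choices, not anything taken from the article.

```python
from math import comb

def macrostate_counts(n_bits):
    """Number of bit strings (microstates) for each possible count of 1s."""
    return [comb(n_bits, k) for k in range(n_bits + 1)]

def fraction_near_half(n_bits, tolerance=0.1):
    """Fraction of all bit strings whose share of 1s is within `tolerance` of 50%."""
    counts = macrostate_counts(n_bits)
    near = sum(c for k, c in enumerate(counts)
               if abs(k / n_bits - 0.5) <= tolerance)
    return near / 2 ** n_bits

# Every individual string is equally likely, but the "about half 1s" group
# swallows almost all of them as the number of bits grows.
for n in (10, 100, 1000):
    print(n, fraction_near_half(n))
```

The all-1s group always contains exactly one string, while the groups near 50% contain nearly everything, which is why a randomly meandering system is almost always found there.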

In physics we usually look at averages such as the average kinetic energy, expressed as temperature. Therefore it makes sense to group together all states using the average and then the states with very low or very high averages are few.

But if you look deeper than that averaging it stops making sense to me. It's a completely different world. I don't know what Entropy is supposed to mean on the level of individual states/configurations. I don't understand what kind of macroscopic "averaging" function we may use to group up those states. There could be more than one possibility - from that would follow that there is more than one definition of macro-Entropy. Ideally there should be one general definition of how we have to look at those microstates and from that follows our general definition of Entropy. Sadly I didn't study Physics and this topic still continues to confuse me. The usual explanations fail to enlighten me.

19 comments

> But if you look deeper than that averaging it stops making sense to me. It's a completely different world.

I think you're less confused than you think you are!

As I posted elsewhere, it helps to think of entropy as a quantity that actually depends on how much you know about the system in question.

Typically when you calculate the entropy of a system at temperature X, that means all you know is that you stuck a thermometer in it and measured X. You don't know anything more than the average temperature. It could be in any state consistent with that temperature.

If you know more about the system, it has less entropy. If you know it down to the exact microstate, it has zero entropy.

This is how I have come to understand entropy. The words disorder and order are a proxy for information content.

> If you know more about the system, it has less entropy.

One question though. When you say "it", does it include you as well as the system, or just the system? To me "it" includes both, because it is "you" whose state has changed by acquiring more information. It could be in the form of neuronal rearrangement or bits being stored in some digital media etc. New information content has thus been created.

There's an interesting side effect if one thinks deep enough here. The system will keep changing its state, so the information one has goes out of date, thus leading to more disorder (i.e., information loss) and increased entropy. One can keep the information updated but it takes energy. And I read somewhere that the energy thus used will lead to an increase in the overall entropy of the universe, and thus the 2nd law.

>This is how I have come to understand entropy. The words disorder and order are a proxy for information content

Does information content mean this? ... "How many bits of random-number generator would I need to make the number of micro-states in the macro-state?"

That is my mental model, yes. More bits are needed to capture more detailed (or micro-states as you called it elsewhere in this thread, or finer-grained) information.

Let's say there's a stone and we want to know its details. If all we want to know is whether it weighs more than 100 kg or not, then one bit will do: 1 means > 100 kg and 0 means ≤ 100 kg. If we want to know its colour (as one of the 7 VIBGYOR colours) as well, then we need 4 bits: 3 bits to encode 7 colours and 1 bit to encode yes/no for the weight. And so on. As we gain more and more information we need more bits to store it.
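
The bit counting for the stone can be written out as a small sketch. The helper name `bits_needed` is made up for illustration, and it assumes whole bits per attribute, as in the example above.

```python
def bits_needed(n_options):
    """Smallest number of whole bits that can label n_options distinct outcomes."""
    return (n_options - 1).bit_length()

weight_bits = bits_needed(2)   # heavier than 100 kg, or not
colour_bits = bits_needed(7)   # one of 7 colours
print(weight_bits, colour_bits, weight_bits + colour_bits)  # prints: 1 3 4

# Encoding weight and colour jointly (2 * 7 = 14 combinations) also takes 4 bits.
print(bits_needed(2 * 7))  # prints: 4
```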

This is just for the storage though; in order to gain the information we need to expend energy. More information requires more energy, leading to more disorder as expending energy releases heat, and thus the 2nd law of thermodynamics as well as the arrow of time. IMO our perception of time is purely based on memory, which is the information content of events stored in neurons.

Quite hand-wavy, I admit. But this is a mental model I've developed over the years of thinking and reading (and listening to lectures) about entropy, information, arrow of time, and energy and how they are interconnected.

The article does say that some crystalline structures can have more entropy (information) than their fluid state. How could that be? Any ideas on what that fluid state might be? The information content in a crystal is really low.
Unfortunately the author doesn't explain it beyond sharing a reference to this paper[1] which is way beyond my competence.

[1] https://www.nature.com/articles/nature08641

Entropy (or rather, entropy differences) is an objective quantity which can be measured; there is no subjectivity about it. It is not about which parameters you know, it is about which parameters you hold fixed.
Fascinating discussion. I see some parallel here to the Bayesian vs Frequentist view of probability.

They are perhaps both valid points of view depending on the situation.

If you take a frequentist view of an unbiased coin, then the probability that it will land heads on the next flip is objectively, by definition, 50%. So the resulting calculation of entropy (-log2 50% = 1 bit) is also objectively defined. But if your 50% probability represents a subjective belief, the resulting entropy calculation should also be considered subjective, I would think.
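
For reference, the coin calculation can be sketched with the standard Shannon formula H = -sum(p * log2(p)). The 90/10 coin is a made-up example of a different belief about the same coin.

```python
from math import log2

def shannon_entropy_bits(probabilities):
    """Shannon entropy in bits of a discrete distribution (zero terms skipped)."""
    return -sum(p * log2(p) for p in probabilities if p > 0)

print(shannon_entropy_bits([0.5, 0.5]))  # fair coin, prints: 1.0
print(shannon_entropy_bits([0.9, 0.1]))  # a 90/10 coin comes out below 1 bit
```

Whichever reading of probability one prefers, the entropy is a function of the probabilities that get fed in, so the objective-versus-subjective question is inherited from them.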

Leonard Susskind disagrees with you. See his lectures on statistical mechanics, he is very clear that entropy is a matter of knowledge about the system. It has to be.
Susskind says that entropy is determined by selecting a macro-state. He doesn't claim that the entropy of a macro-state depends on whether we know which macro-state the real system is really in.

If we happen to know, then, sure. For example we could pick a weird-ass observable state, and when we saw it we would know the entropy of the system was low. But the entropy of each macro-state just depends on how many micro-states we define it to contain. It doesn't depend on our knowledge of the system state.

The concept of entropy wasn't invented so that we could calculate entropies of macrostates, it was so that we could calculate entropies of real systems and understand their behaviour. Macrostates are an accounting tool that helps us do this. You seem to be treating the calculation of macrostate entropy as an end-goal in itself, but also allowing yourself to somehow freely choose any macrostate you want. When it comes to applying thermodynamics in practice, you'll have to calculate the entropy of a real, or at least hypothetical, system.

The point of macrostates is that you ought to know which macrostate a given system is in. That's the thing that you know. You don't know which microstate it's in, but you do know which macrostate it's in.

For example, if I say "a cubic metre volume of air at room temperature and pressure", I've described a physical system. I've also described a macrostate.

If you're calculating the entropy of macrostates that are not consistent with a description of a system -- if you've defined your macrostates such that you don't know which macrostate a given system is in -- then in order to calculate that system's entropy you have to sum up over all such possible macrostates anyway, so you haven't saved yourself any work or earned any insights along the way.

So yes, you can calculate the entropy of a macrostate without knowing what macrostate a real system is in, but it kind of sounds like you're arguing that log(x) is not a function of variable x, because log(3) is a constant and log(4) is a constant, and you can divide up any x into constants of your choice.

We seem to be stuck in a loop of explaining basic first-year statistical mechanics back and forth to each other repeatedly. I'm not sure why.

I'm making a pedagogical point. The OP addresses how difficult entropy is to understand. I'm responding to that. We don't need to talk about "knowledge" when you define entropy, or in an initial explanation of entropy. We could, but we could decide not to.

The log(x) example is a good one. First-time students who are learning about logarithms don't need to be told that a logarithm depends on 'knowledge' or on 'information.' It's ok to just tell them how logarithm is defined.

Sure, there is information. I'm saying it's confusing and unnecessary to introduce more big ideas like information, when the topic is "entropy is difficult to understand" or "logarithms are difficult to understand."

I think you might enjoy Cosma Shalizi's paper "What is a macrostate?" https://arxiv.org/abs/cond-mat/0303625
I've been trying to reconcile these perspectives, and I think it really is both. And they are both physically relevant.

Consider the subjective entropy perspective. If you know the exact microstate of a system, then you can in theory play the part of Maxwell's demon. You could have a little gate that you open only for fast particles, and using your knowledge of the microstate, you can predict exactly when they will arrive.

But consider the objective perspective. If you take this very same system and put it in thermal contact with another system, then an objective entropy perspective is the relevant one. Those systems will equilibrize and your subjective knowledge is irrelevant to that process.

I haven't fully wrapped my head around it yet, but I do think that acknowledging both is a step in the right direction at least.

> If you take this very same system and put it in thermal contact with another system, then an objective entropy perspective is the relevant one. Those systems will equilibrize and your subjective knowledge is irrelevant to that process.

The subjective view handles this scenario just fine, though, and makes more accurate predictions than the objective view.

For example, there are systems where some aspects of the original microstate survive thermal contact with another system. We use such systems to store data! I bet your hard drive is in thermal contact with its environment right now! It's very hard to reconcile this with an objective take on entropy.

And there are some systems that will rapidly be scrambled. The subjective perspective has no problem admitting that your knowledge of a system can become inaccurate and useless. Even without thermal contact, you'd need to perform a tremendous amount of (perhaps reversible) computation in order to make a functioning Maxwell's demon with your initial microstate conditions, because the microstate will evolve in time in a complicated way. The subjective view is still totally consistent with entropy of a system increasing over time!

I wrote two replies to this and deleted both. Then I had a good long thunk, and here's what I came up with.

The temperature of an object can be determined through 1/T = dS / dE. What is this S? How can it exist if you know the system perfectly? And here is where the great insight comes. The thermometer! When you apply a thermometer to a system, you perturb it! The system may have started in one particular microstate, but the very nature of thermal contact involves random influence. Those random tiny influences from the thermometer allow the object (the hard drive in our case) to enter a bunch of microstates with certain probabilities. And that's what S measures.

So our subjective knowledge actually does not matter. (Classically speaking) the system is in a particular microstate, whether we know it or not, and it still manages to have a temperature. That is due to the states it could hypothetically enter (but hasn't yet)!

If we think back to the hard drive and its contents: very gently touching a hard drive with a thermometer will not scramble its contents. So we may say that microstates corresponding to different files than the ones you put there are actually not accessible. And they don't contribute to the entropy we used for the temperature.

No, it is subjective. We just only have such blunt instruments for practically measuring states, relative to the gargantuan amount of entropy in most real systems, that the subjective nature of entropy is easy to miss. But in a world where the frontiers of thermodynamics have moved from steam engines to lasers, computers, DNA, and black holes, the difference is increasingly obvious and important.

With steam engines, we got away with treating a volume of gas as having not only a few parameters that we knew and cared about, like mass, temperature and pressure, but we could further deceive ourselves into thinking that those were the only parameters that existed to describe the system. The only parameters that were knowable. But Boltzmann knew better.

Look at Boltzmann's formula, S = kB log W.

For any single particular system you describe to me, W will be 1, and so S will be 0. So it's only if you describe an ensemble of systems -- that is, if you describe a system vaguely, such that I am left to imagine the details -- that we have nonzero entropy. If you ask me to calculate the entropy of that "system", that macrostate, that ensemble, then sure, I'll end up with nonzero entropy. But if I ask you to keep transmitting more data about the scenario, then with each further description, you'll be narrowing the state space and thereby decreasing the entropy.
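
A rough numerical sketch of that narrowing, using S = kB ln W. The system of 20 two-state particles is hypothetical, and "each known bit halves W" is an idealisation.

```python
from math import log

K_B = 1.380649e-23  # Boltzmann constant in J/K (exact SI value)

def boltzmann_entropy(n_microstates):
    """S = k_B * ln(W) for W equally likely microstates."""
    return K_B * log(n_microstates)

# Hypothetical system of 20 two-state particles; each extra bit of
# description halves the number of microstates still consistent with it.
n_particles = 20
for known_bits in (0, 5, 10, 20):
    w = 2 ** (n_particles - known_bits)
    print(known_bits, w, boltzmann_entropy(w))
```

With all 20 bits specified, W is 1 and the entropy is exactly zero, which is the single-system case described above.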

Look, since the entropy of a macrostate is nonzero, but the entropy of any single microstate which is consistent with that macrostate is zero, it's clear that entropy is not an intrinsic property of any real system. It's a property of how many other possible non-existent systems could be swapped out for the one in front of you, without you noticing the change.

If I swap out the air in your room for an equal volume of air at equal temperature and pressure, you probably won't notice.

If I swap out the hard drive in your laptop for an equal volume of hard drive at equal temperature and pressure, you probably will!

Maybe better to say that the universe does not appear to pick out a single coarse-graining or fine-graining procedure for practically any system.

For instance, following your Boltzmannian example, I think one would notice swapping 1 µm³ of the r/w head and 1 µm³ of the recording surface of a new, freshly powered on HDD more than one would notice substituting the entire HDD for a new one of the same model and turning that on. And here I am already using units of length (cf. "equal volume"), and we know neither units nor lengths are generally picked out by the universe.

Very few people know this but.

Information entropy and statistical mechanical entropy are two different things.

They share the same equation and the same name but they are two unrelated concepts. You have conflated the two. The person you are responding to is referring to statistical entropy.

Basically in this entire thread nobody, including you, is fully grasping the situation.

They are not at all unrelated. It is not easy to grasp, so I understand the confusion. https://en.m.wikipedia.org/wiki/Landauer%27s_principle

A fun rabbit hole would start with the classic paper by Jaynes.

There are many more recent examples relating bit-erasure costs to computation. Some names to look up if interested include Charlie Bennett, David Wolpert, James Crutchfield, and Susanne Still, for starters.

Edit -- a collection of ideas related to this problem and mixing in "complexity" can be found in SFI proceedings called "Complexity, entropy, and the physics of information"

I respectfully disagree. Perhaps you'd like to present more than a mere assertion to make your case. I did.

If it helps, here's a paper that explains my stance in more detail. https://bayes.wustl.edu/etj/articles/theory.1.pdf

If you think there is no relation between the different things called entropy apart from the name maybe you're not fully grasping the situation either.
U, the internal energy is objective.

The free energy F = U - TS is the maximum amount of work you can extract from the system. This depends on how much you know about the system. S does indeed depend on what you know about the system.

See the Gibbs Paradox for more information.

If two people disagree on the maximum amount of work that could be extracted from a given system (with both of them basing their figure on their own evaluation of S), are there any cases where it would be impossible to empirically demonstrate that at least the proponent of the lower figure was wrong?

If only changes in S (and F) have measurable consequences, would that not merely mean that assigning an absolute value is an arbitrary choice, which would not mean the same as it being subjective (there could still be an objective conversion between one basis and another, as there is for kinetic energy in different inertial reference frames)?

In the Gibbs Paradox, there is no subjectivity in whether the gases being mixed are the same or different, and no subjectivity in what the change of entropy is in either case. The paradox is that it does not feel right that identity makes an objective difference between the two cases, but the empirically-demonstrable distinction between fermions and bosons shows that this intuition does not hold in general. I believe Von Neumann came up with a QM resolution of the paradox.

> are there any cases where it would be impossible to empirically demonstrate that at least the proponent of the lower figure was wrong?

Isn't it more interesting to examine a situation where it would be possible to empirically demonstrate that the proponent of the lower figure was wrong?

In that case we could objectively say that one value of S does not yield F for that system (given that F is defined as a maximum), but this would not resolve the general question of subjectivity.
I think he is confusing the usage of entropy in physics and computer science. In computer science entropy is conditional probability and depends on what we know about a system.
As it does in physics!

"which parameters [thermodynamic variables] you know" ~ "which parameters [thermodynamic variables] you hold fixed"

(or know in average, like the energy for a system in a heat bath where the temperature is fixed)

https://bayes.wustl.edu/etj/articles/theory.1.pdf

http://nicf.net/articles/thermodynamics-statistical-mechanic...

You can observe the movement of molecules beyond macro properties like temperature.
Sure, at least in principle. And if you knew what every molecule was doing the entropy would vanish.
No this is completely and utterly wrong. Entropy is not a function of knowledge.

Two people having different levels of knowledge of a system does not mean the system has two different entropy values. Even if I knew the exact position of all atoms in a cup of water, the temperature of that water does not change due to that knowledge.

Entropy does rely on your chosen configuration of macrostates and microstates. Temperature is an arbitrary choice of macrostate.

> Even if I knew the exact position of all atoms in a cup of water, the temperature of that water does not change due to that knowledge.

It actually does! You would disagree with the other person about the temperature of that water. But I agree that this is admittedly not obvious at first.

No it does not. The thermometer does not change based off of my knowledge or opinion.
A thermometer doesn't measure temperature any better than a meterstick measures length. And we all know what Einstein had to say about the relativity of metersticks.

To paraphrase from the paper I linked in another reply to you, a thermometer is just a heat bath equipped with a pointer which reads its average energy, whose scale is calibrated to give the temperature T, defined by 1/T = dS/d<E>.

You can read the thermometer if you like, but if you know the exact microstate of the water to begin with, the thermometer reading will tell you much less than you already knew about the water. And precise knowledge of the water's microstate will (theoretically) allow you to extract much more work from that water than you would be able to with only the thermometer reading.

But entropy does not change with this knowledge.
> Even if I knew the exact position of all atoms in a cup of water, the temperature of that water does not change due to that knowledge.

If you knew the exact position of all atoms in a cup of water you wouldn't assign any temperature to it. Not a thermodynamic temperature at least.

The number of microstates does not change, even if you KNOW the cup of water is in a specific microstate.

The Boltzmann equation is based on total accessible microstates.

"accessible" means something only given a set of constraints.

Like the temperature, if you keep the temperature of the water fixed. And the number of molecules, if instead of a cup you have a closed container to prevent it from evaporating. Then what you have is water at some temperature that you control. And you could have the water at a different temperature with exactly the same microstate.

Or imagine gas at some fixed temperature within a cylinder with one movable wall. If you knew the location of every molecule of the gas it wouldn't make sense to talk about its pressure - you could compress it (reducing the number of accessible microstates) without doing any work.

Edit: In summary, thermodynamics loses its meaning if you know the microstate and can act on that knowledge.

>it wouldn't make sense to talk about its pressure -

If I have a pressure gauge that reads the same thing regardless of my knowledge how is pressure meaningless? The tool that reads pressure gives me an accurate pressure number regardless of what I know or don't know. This number is correct.

Your argument is basically saying that the pressure gauge becomes wrong once you have more knowledge of the system. No it doesn't. The pressure gauge is still giving you a number defined as "pressure."

The gas in that cylinder is at a specific microstate within the macrostate defined as pressure.

If the cup of water is in a specific microstate at time t=0, and evolves over time according to deterministic equations of motion, how will it "access" other microstates that aren't along that specific trajectory in phase-space?
It can't. But you're not typically defining ONLY microstates along that trajectory as accessible. You are defining all accessible configurations according to your defined macrostate.

Knowledge of future microstates does not change what was already defined as a macrostate. The definition and the rules you used to construct a macrostate are independent of knowledge of the system.

If you gain knowledge of the system and you would like to change your macrostate, then be my guest. You can certainly do that, but "entropy" as we know it does not actually change with more knowledge unless you change the parameters according to your gained knowledge.

Think of it this way. The thermometer ALWAYS reads the same thing EVEN if you have 100% knowledge of the current microstate. You can build a new thermometer using some other mechanism to get a different reading and to take advantage of your new found knowledge... but you'd be changing the definition of your macrostate.

I think that works ok. But I think it's an unnecessarily tricky explanation. Entropy per macro-state decreases as we look at finer-grained macro-states. It feels simpler to associate the entropy of each macro-state with that macro-state, rather than assuming we know which macro-state the system is in, and then attributing the lower entropy to our knowledge of the macro-state.

I think it can probably be expressed either way. I just think the "knowledge" part is tricky and can be left out.

Can you elaborate on the difference between a "fine-grained" macrostate and a macrostate that is not fine-grained?

I think you will find it hard to separate the concept of a macrostate from the state of knowledge (or ignorance) of an individual subjective observer.

Sure. A fine-grained macro-state contains fewer micro-states. A coarse-grained macro-state contains more micro-states.

Say I flip 8 coins and I don't look at the results. A fine-grained macro state is TTTT TTTT. A coarser-grained macro state is TTTT xxxx. The one has 4 bits more entropy than the other. It works the same way in statistical mechanics. Call them spins.
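
The 4-bit difference can be checked by brute-force enumeration. This is a small sketch; the pattern syntax with 'x' as "unspecified" simply mirrors the notation used above, and the function names are made up.

```python
from itertools import product
from math import log2

def microstates(pattern):
    """All concrete H/T strings matching a pattern where 'x' means 'unspecified'."""
    pattern = pattern.replace(" ", "")
    choices = [("H", "T") if c == "x" else (c,) for c in pattern]
    return ["".join(s) for s in product(*choices)]

def entropy_bits(pattern):
    """log2 of the number of microstates in the macro-state."""
    return log2(len(microstates(pattern)))

print(entropy_bits("TTTT TTTT"))  # prints: 0.0
print(entropy_bits("TTTT xxxx"))  # prints: 4.0
```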

We're just talking about some ensemble of micro states, and then we divide the ensemble up into macro-states. To do statistical mechanics at all, I think I have to define some macro-states according to which micro-states they contain. That doesn't mean I necessarily have any information about which macro-state the system is actually in.

> A fine-grained macro state is TTTT TTTT.

This macrostate seems fine enough to be a microstate, but sure. For the macrostates with at least one 'x' in it, that 'x' seems to be a placeholder for the concept of subjective ignorance.

> That doesn't mean I necessarily have any information about which macro-state the system is actually in.

But the entire purpose of the exercise of assigning microstates to macrostates is so that you can match up a description of some system to a microstate ensemble and calculate its entropy! Otherwise there's no point to arbitrarily labelling various groups of microstates.

To follow your example more practically, let's say you have an 8-spin system, whose net spin is zero. (You know because you've measured its overall magnetic moment or something). I've just described a system that is in one of the following possible microstates:

TTTT HHHH, TTTH HHHT, ..., HHHH TTTT

Now you can go ahead and define the macrostates as fine-grained as you want, where TTTT HHHH and HHHH TTTT are in different macrostates, but to calculate the entropy of this system, you're going to have to sum up all of those macrostates anyway to get the one that's consistent with the described system.
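
As a sketch of that bookkeeping: enumerate the net-zero microstates, then split them into finer macrostates and add the pieces back up. Splitting by the number of H among the first four spins is an arbitrary choice made only for illustration.

```python
from itertools import product
from math import comb, log2

def net_zero_microstates(n_spins=8):
    """All H/T strings of length n_spins with equally many H and T."""
    return ["".join(s) for s in product("HT", repeat=n_spins)
            if s.count("H") == n_spins // 2]

def split_by_first_half(states):
    """A finer partition: group states by the number of H in their first half."""
    groups = {}
    for s in states:
        heads = s[: len(s) // 2].count("H")
        groups.setdefault(heads, []).append(s)
    return groups

states = net_zero_microstates()
groups = split_by_first_half(states)
print(len(states), comb(8, 4), log2(len(states)))     # 70 microstates, about 6.13 bits
print({k: len(v) for k, v in sorted(groups.items())})  # sizes of the finer macrostates
print(sum(len(v) for v in groups.values()))            # adds back up to 70
```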

Good review of common ground. At this point hopefully the active folks in the discussion can see that we're all describing statistical mechanics exactly the same way.

What I'm saying is pedagogical. We need to define our macro-states. We don't need to go on and talk about our definitions being information or 'knowledge.' We could just use the definitions and calculate. We can talk about 'knowledge' but we don't need to.

The exception is when we actually have some information about what macro-state some system is really in. Obviously we then have to build that information into our model, and the entropy changes. What I'm saying is this: it's not necessary to mix that into our definition of entropy. That definition is not going to help folks who don't understand entropy, and it's unnecessary.

How is a macrostate TTTTxxxx different from having the information about the TTTT part and not about the xxxx part?

Talking about the entropy of the macrostate TTTTxxxx makes sense only conditional on the TTTT information.

It's different because I can define a macro-state without any information about which macro-state any system is actually in. As I think you're also saying, the only information I need is information about how I've defined my own macro-states.

If we just define the macro-states, we're good to go. We don't need to talk about 'knowledge'. We can talk about 'knowledge', it's fine, but that lets in unnecessary woo.

Oh my god, this explanation is gold. Thank you. I'm going to save it and refer to it in the future.
Yeah the author is conflating low entropy with a low number of microstates, which is consistent with the thermodynamic assumption that maximal entropy means a uniform distribution of microstates, but is confusing.

The purest mathematical justification for why low entropy means a low number of microstates probably comes from the fact that (classical) physical systems are dynamical systems that preserve the measure induced by the standard metric of the phase space. The measure theoretic definition of entropy then implies the entropy of a partition of the phase space (i.e. a set of macrostates) is indeed the average of the logarithm of the number of microstates.

So indeed if W is the number of microstates in the current macrostate then entropy = log(W) (on average).

And using the typical set you can show that the probability that the average of log(W) over n samples is within 'epsilon' of the 'exact' entropy goes to 1 as n goes to infinity. This is the mathematical justification for the second law of thermodynamics.

The trick is that all of this is true no matter how you partition phase-space. Though that does mean that what is and isn't a high entropy state depends on your perspective.

>The trick is that all of this is true no matter how you partition phase-space. Though that does mean that what is and isn't a high entropy state depends on your perspective.

That seems to be correct.

>Yeah the author is conflating low entropy with a low number of microstates

Since entropy is found by counting micro-states (for example your third paragraph), that should be ok. What am I missing?

The equivalence is an important theorem (important enough to be engraved on Boltzmann's gravestone), and if you're switching back and forth in an explanation of what entropy is then you're skipping over some important details that answer what it means for something to be a 'low entropy state'.
Here's a concrete example of entropy with just two macro-states, 'broken' and 'unbroken'. If it becomes unclear or unconvincing, can you point out where that happens? It's intended to be ELI5, clear enough to discuss coherently.

Question: Why is it when I drop a vase it smashes into a million pieces; however when I then drop the million pieces it does not form a vase?

Answer: Stop! Don't drop any more expensive vases. Start with these simpler systems that do repair themselves sometimes when you drop them.

Take a coin and align it so the 'heads' side faces up. 'Heads' means 'unbroken'. (The reason for that will become clearer as we do more experiments.) Now drop the coin on the floor. How often is the 'heads' side still facing up? Now if you drop it again, how often does it 'repair' itself so that the 'heads' side is up? (Really do this.)

Try the experiment with 2 coins. Align them all heads-up, drop them, then see if your pattern is 'broken'. ('Broken' means not-all-heads-up.) Drop the 'broken' coins again. How often do they 'repair' themselves? ('Repaired' means all heads-up.) ( Don't think about it! Don't solve for it! Do it! )

Try again with 5 coins. How often does a 5-coin system 'break' when you drop it? How often does a broken 5-coin system 'repair itself' when you drop it again?

How about 10 coins? How often does a broken pattern of 10 mixed heads/tails repair itself to all heads when you drop it again? Sometimes it does, but you'll have to be very lucky or patient to see it happen.

I think from here you can probably see (part of) the answer to your question about the vase. The word people use for this kind of thing is 'entropy'. With enough coins, the 'broken' state is much more probable than the 'repaired' state. The log of the number of ways a state can occur (which, up to a constant, is the log of its probability) is called 'entropy.'
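
If you would rather not drop coins all afternoon, here is a rough simulation sketch. It assumes fair, independent coins; the function names and the fixed seed are illustrative choices.

```python
import random

def repair_probability(n_coins):
    """Exact chance that one drop of n fair coins lands all heads-up."""
    return 0.5 ** n_coins

def simulate_repairs(n_coins, n_drops, seed=0):
    """Fraction of simulated drops that come up all heads."""
    rng = random.Random(seed)
    repaired = 0
    for _ in range(n_drops):
        if all(rng.random() < 0.5 for _ in range(n_coins)):
            repaired += 1
    return repaired / n_drops

for n in (1, 2, 5, 10):
    print(n, repair_probability(n), simulate_repairs(n, 100_000))
```

For 10 coins the exact figure is 1/1024, so 'repairs' do happen, just rarely. A vase has vastly more pieces than 10 coins.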

https://www.quora.com/Why-is-it-when-I-drop-a-vase-it-smashe...

Have you looked at Huffman coding? I'd recommend the (free) book by David MacKay; it is secretly the "hacker's guide to thermodynamics".

http://www.inference.org.uk/mackay/itila/book.html

This book is phenomenal and I highly second this recommendation.
> I don't know what Entropy is supposed to mean on the level of individual states/configurations. I don't understand what kind of macroscopic "averaging" function we may use to group up those states

I find it helpful to think of entropy as a property not of the system, or of any individual state (micro- or macro-), but of the "compression" process that summarizes microstates with a coarser-grained macro-description.

Given a choice of compression, classical physics says a system will tend to spend most of its time in the most likely compressed state. Different choices of compression can lead to different macro descriptions, with different "entropies" and different dynamics among their macroscopic variables.

In this light it's not meaningful to think of the entropy of individual states. You could think about the "identity" compression, but you would end up with a description that was exactly as complicated as the full micro-state time-evolution dynamics; you wouldn't end up with any smaller set of variables that could describe the equilibrium of the whole system (really this would not admit an "equilibrium" at all)

First, entropy is a macroscopic property, it makes no sense to talk about the entropy of a single particle. Second, entropy is not a fundamental property, it depends on what the observer cares about. Take the common example of a gas in the corner of a box, in that case we care about the density distribution in the box, a macroscopic property. To make this more concrete, one way to quantify the density distribution could be to tessellate the box with cubes of some size, find the ones with the lowest and highest density, and use the difference between those two densities as a measure of density variation.

Once we have decided that all we care about is the density variation in the box, we can go through all possible microscopic states and group them together by their density variation. Finally a low entropy macroscopic state is simply a macroscopic state - a certain density variation - for which there are only a few microscopic states that have the corresponding density variation. On the other hand a high entropy macroscopic state is a macroscopic state for which there are many microscopic states that have the corresponding density variation. You can also call the microscopic states low or high entropy but only with reference to the macroscopic property you use to group them, in themselves microscopic states are not low or high entropy.

If you observe a low entropy macroscopic state, then you know a lot about the microscopic state, after all there are only very few. If you observe a high entropy macroscopic state, then you know a lot less about the microscopic state, there are much more possibilities even though they are microscopically indistinguishable. And if there are no limiting constraints on how the microscopic states can evolve, if the evolution is essentially a random walk through all possible microscopic states, then the entropy of the system will increase with high probability as it is much more probable to randomly walk into one of the many microscopic states associated with a high entropy macroscopic state than to walk into one of the few microscopic states associated with a low entropy macroscopic state.
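
A toy version of that random walk, as a sketch under stated assumptions: the microstate is a list of bits, the macroscopic property is the fraction of ones, and each step flips one randomly chosen bit. The bit-flip dynamics, the function name and the sizes are illustrative choices, not something the comment specifies.

```python
import random

def ones_fraction_walk(n_bits=1000, steps=20_000, seed=0):
    """Start from all zeros, flip one random bit per step, record the fraction of ones."""
    rng = random.Random(seed)
    bits = [0] * n_bits
    ones = 0
    history = []
    for _ in range(steps):
        i = rng.randrange(n_bits)
        ones += 1 - 2 * bits[i]  # +1 when flipping 0 -> 1, -1 when flipping 1 -> 0
        bits[i] ^= 1
        history.append(ones / n_bits)
    return history

history = ones_fraction_walk()
# The walk starts in the rare all-zeros macrostate and drifts toward the
# macrostate with the most microstates (about half ones).
print(history[0], history[-1])
```

Every individual bit string is visited with equal weight in the long run; it is only the grouping by fraction of ones that makes the drift toward one half visible.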

>there are much more possibilities even though they are microscopically indistinguishable.

You meant "macroscopically" ?

True, but too late to edit.
This is a very sensible confusion. The forms of macroscopic averaging functions which are useful and valid cannot be made up arbitrarily, but are determined by the microscopic physical laws of the system. There is a reason that the law of increase of entropy is the second law of classical thermodynamics, with conservation of energy being the first law. To state it explicitly: energy is a globally conserved quantity, which can be freely exchanged among the interacting microscopic parts of systems. So we can bring a test system (called a thermometer) into interaction with our system under study, (indirectly) observe the average energy per degree of freedom of the thermometer, and call that observation the temperature of the system under study. Similarly, it is a known physical phenomenon that a gas confined to a container will exert a steady average outward force per normal unit area on the walls of the container; we have ways to measure this force, and we call it pressure. And so on, and on: every useful macroscopic averaging function is a relatively stable, measurable quantity which is determined by the physics of the systems under study. If we discovered some new measurement technique tomorrow which enabled us to measure the "quintessence" of physical systems, and this measurement was stable and reproducible, and could be meaningfully aggregated from the microscopic parts of the system and measured on the macroscopic scale, our definition of entropy would change, to account for "quintessence".
It is also because “the system” is intrinsic to the notion: as you say, any configuration of bits is equally likely; this only takes into account the “system of the bits”. The moment your system is “the bits and their mean value” everything changes, as there are systems with a single possible configuration.

That is what happens when he starts the first example: “a system of particles INSIDE A VOLUME”. The volume is what makes the entropy larger or smaller. The particles in a different volume (or just by themselves) have a different entropy.

Not a physicist either, and I don't claim to understand entropy that well, but maybe it would help to consider that entropy may not be a universal variable of systems in the universe.

I think you should rather consider it as a mathematical construct that applies to some systems where the microscopic quantities are well defined, and where the 'averaging' that we can observe is also well defined. So if you look at thermodynamics, entropy is well defined, but you may be totally right that what we call "microscopic states" in a gas can be broken down further into elementary particles, which may or may not behave quantum mechanically, and what not, and counting the micro-states at the level of elementary particles is a whole different game.

But it doesn't really matter. What matters is that at the scale we're at and with the microscopic/macroscopic relation that's defined, entropy works. The calculations that give some numbers to entropy show that it looks like entropy cannot decrease. They call it a universal principle of thermodynamics, because there is nothing (to my understanding), that explains it microscopically.

And it works for a variety of situations in physics, such that it seems that it's a universal property of nature. But it's mostly mathematical. It seems to say that "given a system we know everything about, there is no way to go to a system that has some unknown things to us".

Anyways. I mostly wrote this to see if I could articulate it to myself, hopefully it helps you as well.

Thanks for writing that. From the perspective you’ve articulated, I sometimes wonder whether the idea of the heat death of the universe is a matter of perspective: it only applies to the matter and properties of the universe that we consider significant. Are we living within the heat deaths of past forms of the universe in which physical interactions we have overlooked dominated?
>It seems to say that "given a system we know everything about, there is no way to go to a system that has some unknown things to us".

That's backwards: information is the negative of entropy. The 2nd law says that entropy never decreases, so information never increases (it can only be preserved or lost).

>I don't know what Entropy is supposed to mean on the level of individual states/configurations.

The entropy is a property of a probability distribution, not of a state. Entropy is defined as H = -sum(p_i log(p_i)). A 'state' implicitly defines a probability distribution: uniform probability over all the microstates compatible with the state description.[0] In the case of a microstate, the entropy of the probability distribution over microstates consistent with that state is zero - there's only one compatible state, so p_i = 0 for all other states, and log(p_i) = 0 for the compatible state. In the case of a macrostate, the entropy of the probability distribution over microstates consistent with the macrostate works out to -sum((1/N) log(1/N)) = log(N), where N is the number of consistent microstates. That's the Boltzmann entropy.
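
To make the formula concrete, here is a small sketch that evaluates H = -sum(p_i log(p_i)) for the two cases described above. It uses base-2 logs so the result is in bits; the function name and the choice N = 70 are illustrative assumptions.

```python
import math

def shannon_entropy(probs):
    """H = -sum(p_i * log2(p_i)) in bits; terms with p_i == 0 are skipped (they contribute 0)."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

# Microstate: probability 1 on the single compatible state, 0 on every other state.
print(shannon_entropy([1.0, 0.0, 0.0, 0.0]))

# Macrostate: uniform over N compatible microstates, which should match log2(N).
N = 70
print(shannon_entropy([1 / N] * N), math.log2(N))
```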

Sometimes people will write about the entropy of a 'state' in such a way that it sounds like they're talking about the entropy of a microstate -- but what they're probably talking about is "the entropy of the macrostate that this microstate belongs to." It's sloppy to talk like that, because "the" corresponding macrostate isn't unique. There are many sets of macrostates that could contain a microstate, depending on what properties of the microstates one considers 'macro.'

(Ex: 10100101 is a member of both "symmetric bit strings of length 8" and "bit strings of length 8 that average to 1/2". The entropy of "symmetric bit strings of length 8" is 4 bits, whereas the entropy of "bit strings of length 8 that average to 1/2" is ~6.1 bits. And of course, the entropy of "the bit string of length 8 that is exactly 10100101" is zero.)
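
The numbers in that example can be checked by brute force. The sketch below enumerates all 256 bit strings of length 8 and counts the members of each macrostate; the variable names are illustrative.

```python
import math
from itertools import product

# All 2**8 = 256 microstates.
strings = ["".join(bits) for bits in product("01", repeat=8)]

# Two different ways of grouping microstates into a macrostate.
symmetric = [s for s in strings if s == s[::-1]]       # reads the same reversed
half_ones = [s for s in strings if s.count("1") == 4]  # averages to 1/2

example = "10100101"
print(example in symmetric, example in half_ones)

# Entropy of a uniform distribution over N microstates is log2(N) bits.
print(len(symmetric), math.log2(len(symmetric)))
print(len(half_ones), math.log2(len(half_ones)))
```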

For information theoretic entropy (I don’t know anything about thermodynamics):

The first thing you describe, that if you draw many symbols from your source distribution at random then you see a distribution of symbols that is equal to the probability distribution of the source, is called the asymptotic equipartition property.

I think your confusion comes from two things. First, conflating bits and symbols and second assuming symbols are equiprobable.

Take English text where our symbols could be letters of the alphabet. These are not equiprobable: if you select a letter from a book at random, you get a different distribution than 1/26 for each letter. If you took as your symbols the individual bits that would encode those characters in ASCII, you would get something closer to equiprobable symbols. Another choice for symbols would be the words in the book.
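
Here is a rough way to see how the choice of symbols changes the number you get. This is only a sketch: the sample sentence is made up for illustration (a real measurement would need a much larger text), and `empirical_entropy` is a hypothetical helper that uses observed frequencies as probabilities.

```python
import math
from collections import Counter

def empirical_entropy(symbols):
    """Entropy in bits per symbol, using observed frequencies as probabilities."""
    counts = Counter(symbols)
    total = sum(counts.values())
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())

# Illustrative sample only; far too short to stand in for English in general.
text = "the quick brown fox jumps over the lazy dog and then sleeps in the sun"

letters = [ch for ch in text if ch.isalpha()]
bits = "".join(format(b, "08b") for b in text.encode("ascii"))
words = text.split()

print(empirical_entropy(letters), math.log2(26))  # letters vs. the equiprobable value
print(empirical_entropy(bits))                    # at most 1 bit per binary symbol
print(empirical_entropy(words))                   # words as symbols
```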

I suppose the problem is that entropy is a proxy for the number of states of an abstract configuration space that shares the observed quantities of the concrete object whose entropy you want to measure. So, for example, if you know that your object has mass M and temperature T, then to measure its entropy you take all the possible states of an abstract object with mass M and temperature T, and the logarithm of that number of states is the definition of the entropy of an object that has mass M and temperature T. So the more you know about the concrete object, the fewer possible states the abstract model has, and so the entropy is not a property of the concrete object but rather a property of an abstract model with the same fixed global properties.
>There could be more than one possibility

This seems to connect with the idea behind Chaitin's incompleteness theorem. Making specific statements about the reducible complexity of something is not always possible.

>Think of a series of random bits that can be either 0 or 1 with equal probability. How likely is it that they are all 0 or all 1? Not very likely. There is exactly one configuration. How likely is it that they have a specific configuration of 0 and 1? Equally likely.

Well, there are only 2 states with all 1 or all 0.

But there are 2^N - 2 states of mixed 1 and 0.

Even if you treat the sets of bits as opaque items, and pick one from a bucket, I'd expect getting one of the 2^N - 2 configurations to be a far more likely outcome than one of the 2 remaining.

In fact, we could bet on it...
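
The odds of that bet, as a quick sketch. It assumes the bit string is drawn uniformly from all 2^N strings; the function name is illustrative.

```python
def all_same_probability(n_bits):
    """Chance that a uniformly random n-bit string is all 0s or all 1s."""
    return 2 / 2**n_bits

for n in (2, 8, 20):
    mixed = 2**n - 2  # every string except all-0 and all-1
    print(n, mixed, all_same_probability(n))
```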

But there's only one state that is 10010001111110101000.
Sure, but that's irrelevant.

There are billions that are similar, and only one that's all 0.

"similar" is in the eye of the observer! This fact should be especially clear in the context of bit strings. If 10010001111110101000 is my login password, don't be surprised if other permutations fail to grant you access, even if you have the correct number of 1's and 0's.
>"similar" is in the eye of the observer!

Nope, also similar in algorithmic complexity theory (e.g. counting compressibility).

That's a very good, and deep, point, and I agree that there is something important there. However, algorithmic complexity is still only defined relative to an arbitrary reference computer. If my reference computer happens to, in hardware, XOR its inputs and outputs with the bitstring 10010001111110101000, then (in terms of bits that end up represented on the drive), that will be the one that has the lowest algorithmic complexity, although the algorithm might think that it's outputting all 0's.
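
A rough illustration of "counting compressibility", with the caveat from the comment above built in: compressed size is always measured relative to one particular compressor. Here zlib plays the role of the reference computer, which is an assumption of this sketch; a different compressor could rank inputs differently. The inputs are 4096 bytes rather than 20 bits, because zlib's fixed overhead swamps very short strings.

```python
import os
import zlib

def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed form: a crude stand-in for description length."""
    return len(zlib.compress(data, 9))

zeros = bytes(4096)       # 4096 zero bytes
noise = os.urandom(4096)  # 4096 bytes from the OS random source

# Relative to zlib, the all-zero input has a far shorter description than the noise.
print(compressed_size(zeros), compressed_size(noise))
```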
In computer science a high entropy means a high informational value. So any information that is not the expected value has a high content of information and therefore a high entropy.

In thermodynamics the case is roughly the opposite, as a high entropy means a low-energy state and therefore fewer internal processes within a system, or none at all.

Another thing that contributes to the confusion which you have noted but not fully realized is that there are two different concepts that use the same equation and the same word: "entropy".

Information entropy and statistical entropy are two different things.

A specific string of bits has zero entropy. It may be a sample from a distribution which has some entropy.

Unless you are selecting a random single bit from that string, in which case the entropy of that selection process is -p1 log(p1) - p0 log(p0).
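
A small sketch of that selection process, assuming the position is chosen uniformly at random and logs are taken in base 2; the function name is illustrative, and the first input is the bit string used as an example elsewhere in this thread.

```python
import math

def bit_selection_entropy(bitstring):
    """Entropy in bits of picking one position uniformly at random from bitstring."""
    p1 = bitstring.count("1") / len(bitstring)
    p0 = 1 - p1
    # -p1 log2(p1) - p0 log2(p0), skipping zero-probability terms.
    return sum(-p * math.log2(p) for p in (p0, p1) if p > 0)

print(bit_selection_entropy("10010001111110101000"))
print(bit_selection_entropy("00000000000000000000"))
```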

High entropy = less predictable system. Low entropy = highly predictable system.
To stay within your bits analogy, I imagine an increase in entropy would be the equivalent of each bit becoming base-3, base-4, and so on, hence increasing the number of possible states (and reducing your ability to predict them).