Since we are to a great extent ignorant of the system under consideration, our description is inherently probabilistic. Our first task is to find an expression for the probability that a system is in a particular microstate. Once we have these numbers, we are in a position to say what the system does on average. To see how we might go about assigning these probabilities, consider the following algorithm.

Imagine you have before you a wall of monitors labelled A to Z. On each screen is a picture of a system in one of its microstates. The probability that a screen shows a particular state is equal to the probability that the system is in that state. The nature of the system in question is not at this moment important, but let’s suppose without any loss of generality that there are five states available to it.

You walk along the wall and keep a running total of how many screens display the first state, how many show the second state, and so on. After checking every screen, you’re left with a set of occupation numbers $\{n_i\}$, where $n_i$ is the number of screens displaying the state $i$.

Now we ask the question: how many ways could you attain this set of occupation numbers? The diagram below might help you to picture this combinatorial problem.

We can permute the 26 monitors any way we like. For example, if the states of monitors T and R were swapped, we would attain exactly the same occupation numbers. This gives a total of 26! ways of rearranging the 26 monitors. But we must be careful not to over-count, since we could not tell the difference if, say, screens T and G were swapped, or any pair of monitors in the same state. This means we have over-counted by a factor of 4! through permuting the monitors of state 1, 6! through permuting those of state 2, and so on. So the number of ways of achieving this particular set of occupation numbers is

$$W = \frac{26!}{n_1! \, n_2! \, n_3! \, n_4! \, n_5!}$$

with $n_1 = 4$, $n_2 = 6$, and so on.

This is a relatively large number: compare it with the case where all 26 screens display the same state. Then we might have $n_1 = 26$ and $n_2 = n_3 = n_4 = n_5 = 0$. For this configuration, $W = 26!/26! = 1$; the configuration is unique. So seeing every single screen display the same state is *extremely* unlikely, as we expect. This simple example shows that some sets of occupation numbers are more probable than others (though our intuition could have told us this anyway).

More generally, given $N$ screens, each of which can display any one of $M$ microstates, the number of ways of achieving the set of occupation numbers $\{n_i\}$ is

$$W = \frac{N!}{n_1! \, n_2! \cdots n_M!} = \frac{N!}{\prod_{i=1}^{M} n_i!}$$
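To make this formula concrete, here is a minimal Python sketch (the helper name `multiplicity` and the evenly spread occupation numbers in the second example are illustrative, not taken from the diagram above):

```python
from math import factorial

def multiplicity(occupation):
    """Number of ways W of arranging the screens to give these occupation numbers."""
    w = factorial(sum(occupation))
    for n in occupation:
        w //= factorial(n)
    return w

# All 26 screens showing the same state: a unique configuration.
print(multiplicity([26, 0, 0, 0, 0]))   # 1

# A more evenly spread set of occupation numbers (illustrative values).
print(multiplicity([4, 6, 7, 5, 4]))
```

The second, spread-out configuration can be realised in over $10^{15}$ ways, against exactly one for the concentrated configuration.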

Why is this of significance?

There exists a set of occupation numbers which maximises $W$; this set can be observed in the greatest number of different ways, and is therefore most likely to be seen on repeating the algorithm with the screens over and over again. The occupation numbers are determined exclusively by the probabilities through the equation $n_i = p_i N$. Then $W$ is equivalent to the number of ways it is possible to observe a particular set of probabilities $\{p_i\}$. *It is therefore only reasonable to assign to the set of probabilities $\{p_i\}$ values which maximise $W$.* This is because that particular set of probabilities occurs with the highest frequency and is therefore *most likely* to be seen. Strictly speaking, we are taking the limit in which $N \to \infty$, since only then do the probabilities defined by $p_i = n_i/N$ become exact.
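The claim that some set of occupation numbers maximises $W$ can be checked by brute force for a small case — a sketch assuming, for illustration, $N = 12$ screens and $M = 3$ states:

```python
from itertools import product
from math import factorial

def multiplicity(occupation):
    """W = N! / (n_1! n_2! ... n_M!)."""
    w = factorial(sum(occupation))
    for n in occupation:
        w //= factorial(n)
    return w

N, M = 12, 3   # illustrative: 12 screens, 3 available states

# All occupation-number sets (n_1, ..., n_M) with n_1 + ... + n_M = N.
candidates = [occ for occ in product(range(N + 1), repeat=M) if sum(occ) == N]

best = max(candidates, key=multiplicity)
print(best, multiplicity(best))   # (4, 4, 4) 34650 -- the evenly spread set wins
```

With no further constraints on the system, the evenly spread set $(4, 4, 4)$ maximises $W$, i.e. equal probabilities $p_i = 1/3$; physical constraints on the system will later change which set wins.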

Maximising $W$ in its present form is difficult. We’re going to carry out a little manipulation, one part of which you may have to take on trust for the time being. First, we take the natural logarithm of both sides:

$$\ln W = \ln N! - \sum_i \ln n_i!$$

This follows directly from the properties of logarithms. Next, we use *Stirling’s approximation*, which says that

$$\ln N! \approx N \ln N - N$$

in the limit that $N$ is a very large number. So

$$\ln W \approx N \ln N - N - \sum_i n_i \ln n_i + \sum_i n_i$$

The second and fourth terms cancel, since the sum over all occupation numbers must be equal to the total number of screens, $\sum_i n_i = N$. We now divide through by $N$ and substitute $n_i = p_i N$:

$$\frac{1}{N} \ln W \approx \ln N - \sum_i p_i \ln (p_i N) = \ln N - \ln N \sum_i p_i - \sum_i p_i \ln p_i$$

The first and second terms cancel, since $\sum_i p_i = 1$, leaving

$$\frac{1}{N} \ln W \approx -\sum_i p_i \ln p_i$$

It is this quantity that we call the Gibbs entropy:

$$S = -k \sum_i p_i \ln p_i$$

where $k$ is a positive constant.

How does this help? It is important to note that the natural logarithm increases monotonically with its argument – when $W$ increases, so too does its logarithm, though not at the same rate. Since $S = (k/N) \ln W$, and $k/N$ is a strictly positive scale factor, we can guarantee that as $W$ increases, $S$ increases. So to maximise $S$ is to maximise $W$. Furthermore, $S$ is a function written explicitly in terms of the probabilities $p_i$, making it easy to find the values of $p_i$ for which $S$ is maximised. To reiterate: *the fairest assignment of the $p_i$ is that which maximises $S$*.
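Both Stirling’s approximation and the limiting behaviour of $\frac{1}{N} \ln W$ can be verified numerically – a quick sketch (the probabilities chosen here are purely illustrative):

```python
from math import lgamma, log

def ln_factorial(n):
    """Exact ln(n!) via the log-gamma function: ln(n!) = lgamma(n + 1)."""
    return lgamma(n + 1)

def stirling(n):
    """Stirling's approximation: ln(n!) ~ n ln(n) - n."""
    return n * log(n) - n

# The relative error of Stirling's approximation shrinks as n grows.
for n in (10, 1000, 10**6):
    print(n, abs(ln_factorial(n) - stirling(n)) / ln_factorial(n))

# For fixed probabilities p_i, (1/N) ln W approaches -sum_i p_i ln p_i.
p = [0.5, 0.3, 0.2]   # illustrative probabilities
gibbs = -sum(pi * log(pi) for pi in p)

for N in (10**2, 10**4, 10**6):
    occupation = [pi * N for pi in p]   # occupation numbers n_i = p_i N
    ln_W = ln_factorial(N) - sum(ln_factorial(n) for n in occupation)
    print(N, ln_W / N, gibbs)
```

By $N = 10^6$ the two quantities agree to several decimal places, consistent with the claim that the correspondence becomes exact only as $N \to \infty$.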

As yet we have said absolutely nothing about the physics of the system we are describing – it is therefore natural to ask: when will the microphysical structure of the system enter our equations? The only answer can be: when we maximise the Gibbs entropy. We’ll learn how this happens in a following post. But before that, it’s necessary to ensure we’re familiar with a particular mathematical technique.

That the Gibbs entropy of a system tends to increase to a maximum is an empirically validated postulate, often called the *second law of thermodynamics*. The artificial game above gives an idea of why the Gibbs entropy has the form it does, but really there is no correct way of ‘deriving’ $S$.

The Gibbs entropy (or the *Shannon* entropy) has its foundations in information theory. The Shannon entropy is a carefully constructed function of a set of probabilities that satisfies a number of constraints. These constraints are chosen such that entropy measures the *uncertainty* associated with a probability distribution. The association of entropy with uncertainty, and how this uncertainty manifests itself in real systems, is explored in a later post.
