# Statistical Mechanics – Defining the Gibbs Entropy

Since we are to a great extent ignorant of the system under consideration, our description is inherently probabilistic. Our first task is to find an expression for the probability that a system is in a particular microstate. Once we have these numbers, we are in a position to say what the system does on average. To see how we might go about assigning these probabilities, consider the following algorithm.

Imagine you have before you a wall of monitors labelled A to Z. On each screen is a picture of a system in one of its microstates. The probability that a screen shows a particular state $\alpha$ is equal to the probability $p_\alpha$ that the system is in that state. The nature of the system in question is not at this moment important, but let’s suppose without any loss of generality that there are five states available to it.

You walk along the wall and keep a running total of how many screens display the first state, how many show the second state, and so on. After checking every screen, you’re left with a set of occupation numbers $\{N_\alpha\}$, where $N_\alpha$ is the number of screens displaying the state $\alpha$.

Now we ask: in how many ways could this set of occupation numbers arise? The diagram below might help you to picture this combinatorial problem.

We can permute the 26 monitors any way we like. For example, if the states shown on monitors T and R were swapped, we would obtain exactly the same occupation numbers. This gives a total of $26!$ ways of rearranging the 26 monitors. But we must be careful not to over-count, since we could not tell the difference if, say, screens T and G were swapped, or indeed any pair of monitors showing the same state. We have therefore over-counted by a factor of $4!$ through permuting the monitors showing state 1, $6!$ through permuting those showing state 2, and so on. So the number of ways of achieving this particular set of occupation numbers is

$\displaystyle W=\frac{26!}{4!\ 6!\ 3!\ 8!\ 5!} \approx 8\times 10^{14}$

This is a large number. Compare it with the case where all 26 screens display the same state, for which we might have $\{N_\alpha\}=\{0,0,26,0,0\}$. For this configuration $W=1$: the configuration is unique. So seeing every single screen display the same state is extremely unlikely, as we would expect. This simple example shows that some sets of occupation numbers can be realised in vastly more ways than others, and so are more likely to be seen (though our intuition could have told us this anyway).
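The two counts are easy to check by machine. The following is a minimal Python sketch; the helper name `count_ways` is our own, and the occupation numbers 4, 6, 3, 8, 5 are the ones used in the formula for $W$. It confirms that the spread-out configuration can be realised in roughly $8\times10^{14}$ ways, against a single way for the all-same configuration.

```python
from math import factorial, prod

def count_ways(occupations):
    """W = N! / (N_1! N_2! ... N_Omega!), computed with exact integer arithmetic."""
    return factorial(sum(occupations)) // prod(factorial(n) for n in occupations)

spread_out = [4, 6, 3, 8, 5]   # occupation numbers of states 1 to 5 in the diagram
all_same = [0, 0, 26, 0, 0]    # every screen shows state 3

print(count_ways(spread_out))           # 803937466332000
print(f"{count_ways(spread_out):.1e}")  # 8.0e+14
print(count_ways(all_same))             # 1
```

Python integers have arbitrary precision, so $26!$ is handled exactly and the integer division introduces no rounding.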

More generally, given $N$ screens, each of which can display any one of $\Omega$ microstates, the number of ways of achieving the set of occupation numbers $\{N_1,N_2,...,N_\Omega\}$ is

$\displaystyle W=\frac{N!}{N_1!\ N_2!\ ...\ N_\Omega!}$
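To see that this formula really counts arrangements of screens, we can enumerate every possible wall for a system small enough to brute-force. The sketch below assumes a hypothetical wall of 6 screens and 3 states (so $3^6 = 729$ walls), tallies the occupation numbers each wall produces, and compares every tally with $W$. The helper name `multiplicity` is illustrative only.

```python
from collections import Counter
from itertools import product
from math import factorial, prod

def multiplicity(occupations):
    """W = N! / (N_1! N_2! ... N_Omega!) for one set of occupation numbers."""
    n = sum(occupations)
    return factorial(n) // prod(factorial(k) for k in occupations)

N_SCREENS, N_STATES = 6, 3   # small enough to list every wall: 3**6 = 729

# Tally how many walls (one state per screen) produce each set of occupation numbers
tally = Counter()
for wall in product(range(N_STATES), repeat=N_SCREENS):
    occupations = tuple(wall.count(s) for s in range(N_STATES))
    tally[occupations] += 1

# Every tally agrees with the formula, and together they account for all 3**6 walls
assert all(count == multiplicity(occ) for occ, count in tally.items())
assert sum(tally.values()) == N_STATES ** N_SCREENS
print(tally[(2, 2, 2)], tally[(6, 0, 0)])  # 90 1
```

The second check is the multinomial theorem in disguise: summing $W$ over every set of occupation numbers gives $\Omega^N$, the total number of walls.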

Why is this of significance?

There exists a set of occupation numbers $\{N_\alpha\}$ which maximises $W$; this set can be realised in the greatest number of different ways, and is therefore the one most likely to be seen if we repeat the algorithm with the screens over and over again. The occupation numbers are determined by the probabilities $\{p_\alpha\}$ through the equation $N_\alpha=p_\alpha N$, so $W$ is equivalently the number of ways of observing a particular set of probabilities $\{p_\alpha\}$. It is therefore only reasonable to assign to $\{p_\alpha\}$ the values which maximise $W$, since that set of probabilities occurs with the highest frequency. Strictly speaking, we are taking the limit $N\to\infty$, since only then do the probabilities defined by $p_\alpha=N_\alpha/N$ become exact.
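For a wall this small we can simply search every set of occupation numbers for the one that maximises $W$. Here is a brute-force Python sketch for the 26-screen, five-state example; the helper names `multiplicity` and `occupation_sets` are our own, and the enumeration uses the standard stars-and-bars construction.

```python
from itertools import combinations
from math import factorial, prod

def multiplicity(occupations):
    """W = N! / (N_1! ... N_Omega!) for one set of occupation numbers."""
    n = sum(occupations)
    return factorial(n) // prod(factorial(k) for k in occupations)

def occupation_sets(n, n_states):
    """Yield every tuple of n_states non-negative integers that sums to n (stars and bars)."""
    slots = n + n_states - 1
    for bars in combinations(range(slots), n_states - 1):
        edges = (-1, *bars, slots)
        # the number of "stars" between consecutive bars is one occupation number
        yield tuple(edges[i + 1] - edges[i] - 1 for i in range(n_states))

# Exhaustive search: 26 screens, 5 states gives C(30, 4) = 27405 candidate sets
best = max(occupation_sets(26, 5), key=multiplicity)
print(best, multiplicity(best))
# implied probabilities p_alpha = N_alpha / N
print([round(k / 26, 3) for k in best])
```

The winner is the most even split available, four states shown on five screens and one on six. The number of candidates, $\binom{N+\Omega-1}{\Omega-1}$, grows rapidly with $N$ and $\Omega$, so exhaustive search quickly stops being practical.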

Maximising $W$ in its present form is difficult. We’re going to carry out a little manipulation, one part of which you may have to take on trust for the time being. First, we take the natural logarithm of both sides:

$\displaystyle \ln W=\ln N! - \sum_{\alpha=1}^\Omega\ln N_\alpha!$

This follows directly from the properties of logarithms. Next, we use Stirling’s approximation, which says that

$\ln N!\approx N\ln N - N$

in the limit that $N$ is a very large number. So

$\displaystyle\ln W\approx N\ln N - N -\sum_{\alpha=1}^\Omega N_\alpha\ln N_\alpha+\sum_{\alpha=1}^\Omega N_\alpha$

The second and fourth terms cancel, since the sum over all occupation numbers must be equal to the total number of screens $N$. We now divide through by $N$:

$\displaystyle \frac{\ln W}{N}=\ln N-\sum_{\alpha=1}^\Omega\frac{N_\alpha}{N}\ln N_\alpha$

$\displaystyle \frac{\ln W}{N}=\ln N-\sum_{\alpha=1}^\Omega \frac{N_\alpha}{N} \ln \Big( N\frac{N_\alpha}{N}\Big)$

$\displaystyle \frac{\ln W}{N}=\ln N-\frac{1}{N}\ln N \sum_{\alpha=1}^\Omega N_\alpha-\sum_{\alpha=1}^\Omega \frac{N_\alpha}{N}\ln\Big(\frac{N_\alpha}{N}\Big)$

Since $\sum_\alpha N_\alpha = N$, the first and second terms cancel, leaving

$\displaystyle \frac{\ln W}{N} = - \sum_{\alpha=1}^\Omega \frac{N_\alpha}{N}\ln \Big(\frac{N_\alpha}{N}\Big)$

$\displaystyle \frac{\ln W}{N}=-\sum_{\alpha=1}^\Omega p_\alpha\ln p_\alpha$

It is this quantity that we call the Gibbs entropy:

$\displaystyle \frac{\ln W}{N} = S_G$

$\displaystyle S_G=-\sum_{\alpha} p_\alpha\ln p_\alpha$
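The derivation leans on Stirling's approximation, so it is worth checking numerically that $\ln W/N$ really approaches $S_G$ as $N$ grows. This Python sketch (the helper names are our own) keeps the probabilities of the 26-screen example fixed, scales every occupation number by $k$, and computes $\ln W$ exactly with `math.lgamma`, using $\ln n! = \ln\Gamma(n+1)$.

```python
from math import lgamma, log

def ln_factorial(n):
    """Exact ln(n!) via the log-gamma function, since Gamma(n + 1) = n!."""
    return lgamma(n + 1)

def stirling(n):
    """Stirling's approximation ln(n!) ~ n ln n - n."""
    return n * log(n) - n

def gibbs_entropy(probs):
    """S_G = -sum p ln p, with the convention 0 ln 0 = 0."""
    return -sum(p * log(p) for p in probs if p > 0)

base = [4, 6, 3, 8, 5]                      # occupation numbers of the 26-screen example
s_g = gibbs_entropy([n / 26 for n in base])

for k in (1, 10, 100, 1000, 10000):
    occupations = [k * n for n in base]     # same probabilities, N = 26k screens
    N = sum(occupations)
    ln_w = ln_factorial(N) - sum(ln_factorial(n) for n in occupations)
    rel_err = (ln_factorial(N) - stirling(N)) / ln_factorial(N)
    print(f"N={N:>6}  ln W / N = {ln_w / N:.4f}  S_G = {s_g:.4f}  "
          f"Stirling rel. error = {rel_err:.1e}")
```

For $N=26$ the two quantities still differ noticeably, but the gap shrinks steadily as $N$ grows, in line with the $N\to\infty$ limit assumed above.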

How does this help? The natural logarithm increases monotonically with its argument: when $W$ increases, so too does $\ln W$, though not at the same rate. Since $N$ is a fixed, strictly positive scale factor, $S_G$ increases whenever $W$ does, so maximising $S_G$ is equivalent to maximising $W$. Furthermore, $S_G$ is written explicitly in terms of the probabilities $\{p_\alpha\}$, which makes it much easier to find the values of $\{p_\alpha\}$ for which $W$ is maximised. To reiterate: the fairest assignment of the $\{p_\alpha\}$ is the one which maximises $S_G$.
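As a sanity check on "maximise $S_G$", here is a crude random search in Python. It assumes, purely for illustration, that normalisation is the only condition the $\{p_\alpha\}$ must satisfy (no physics has entered yet), with five states as in the monitor game; the helper names are hypothetical. Under that assumption no sampled distribution beats the uniform one, whose entropy is $\ln\Omega$.

```python
import random
from math import log

def gibbs_entropy(probs):
    """S_G = -sum p ln p, with the convention 0 ln 0 = 0."""
    return -sum(p * log(p) for p in probs if p > 0)

def random_distribution(n_states, rng):
    """A random normalised set of probabilities (not uniformly sampled; just test cases)."""
    weights = [rng.random() for _ in range(n_states)]
    total = sum(weights)
    return [w / total for w in weights]

N_STATES = 5
rng = random.Random(0)   # fixed seed so the run is reproducible

uniform = [1 / N_STATES] * N_STATES
best_random = max(gibbs_entropy(random_distribution(N_STATES, rng)) for _ in range(10_000))

print(f"uniform:        S_G = {gibbs_entropy(uniform):.4f}  (ln 5 = {log(N_STATES):.4f})")
print(f"best of 10,000: S_G = {best_random:.4f}")
```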

As yet we have said absolutely nothing about the physics of the system we are describing, so it is natural to ask: when will the microphysical structure of the system enter our equations? The only answer can be: when we maximise the Gibbs entropy. We'll learn how this happens in a following post. But before that, it's necessary to make sure we're familiar with a particular mathematical technique.


That the Gibbs entropy of a system tends to increase to a maximum is an empirically validated postulate, often called the second law of thermodynamics. The artificial game above gives an idea of why the Gibbs entropy has the form it does, but there is no truly rigorous way of 'deriving' $S_G$.

The Gibbs entropy (or the Shannon entropy) has its foundations in information theory. The Shannon entropy is a carefully constructed function of a set of probabilities that satisfies a number of constraints. These constraints are chosen such that entropy measures the uncertainty associated with a probability distribution. The association of entropy with uncertainty, and how this uncertainty manifests itself in real systems, is explored in a later post.