Some rough notes about information theory, thermodynamics and quantum mechanics.
Fluctuation Relations:
Great review by Jarzynski.
Expectation values of the work distribution with irreversibility, valid in and out of equilibrium and in small systems (Jarzynski equality):
$$\langle e^{-\beta W} \rangle = e^{-\beta \Delta F}$$
From Jensen's inequality:
$$\langle W \rangle \geq \Delta F$$
and for the probability of a statistical violation of the 2nd law:
$$P(W \leq \Delta F - \zeta) \leq e^{-\beta \zeta}$$
i.e. "What's the probability the 2nd law will be violated by at least ?" Exponentially decaying tail in thermodynamically forbidden region. Interestingly, no upper bound other than , violation with
50% possible. See e.g. Single electron transistor with 65% probability of decreasing (important is only expectation value!).
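A minimal numerical sketch of these three statements (my own toy example: a Gaussian work distribution with variance tied to the dissipated work, $\sigma^2 = 2(\langle W\rangle - \Delta F)/\beta$, for which the Jarzynski equality holds exactly):

```python
import numpy as np

# Toy check of the Jarzynski equality <exp(-beta W)> = exp(-beta dF).
# For Gaussian work statistics the equality pins the variance to the
# dissipated work: var(W) = (2/beta) * (<W> - dF).
beta, dF, W_mean = 1.0, 1.0, 3.0                   # all in units of kT
var = (2.0 / beta) * (W_mean - dF)
W = np.random.default_rng(0).normal(W_mean, np.sqrt(var), 1_000_000)

print(np.exp(-beta * W).mean(), np.exp(-beta * dF))   # should match
print(W.mean(), ">=", dF)                              # Jensen: <W> >= dF
zeta = 1.0                                             # violation margin
print((W <= dF - zeta).mean(), "<=", np.exp(-beta * zeta))  # tail bound
```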
Incorporating information from measurement with feedback control by an external controller (e.g. Maxwell's demon):
$$\langle e^{-\beta (W - \Delta F) - I} \rangle = 1$$
$I = \ln \frac{p(m|x)}{p(m)}$ is the stochastic mutual information between the probability distributions of the feedback system before and after measurement / extracting information from the system.
so the ensemble average is the usual mutual information between system and memory:
$$\langle I \rangle = \sum_{x,m} p(x,m) \ln \frac{p(x,m)}{p(x)\,p(m)}$$
Hypothesis test for forward vs. reverse in time (direction defined by the driving protocol $\lambda(t)$), given the exchanged work. The likelihood is:
$$P(F \mid W) = \frac{1}{1 + e^{-\beta (W - \Delta F)}}$$
Easy to tell when the exchanged work is large; at equilibrium it is impossible to tell.
Symmetry between forward and backward work distributions (Crooks), which forces a crossing at $W = \Delta F$:
$$\frac{P_F(W)}{P_R(-W)} = e^{\beta (W - \Delta F)}$$
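The likelihood above follows from Bayes' rule with equal priors on the two directions plus the Crooks symmetry (a short check in the notation above):
$$P(F \mid W) = \frac{P_F(W)}{P_F(W) + P_R(-W)} = \frac{1}{1 + P_R(-W)/P_F(W)} = \frac{1}{1 + e^{-\beta (W - \Delta F)}}$$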
- Fluctuation–Dissipation theorem
Relates the imaginary (dissipative) part of the linear response of an observable $A$ to its symmetrized fluctuation spectrum at equilibrium:
$$S_A(\omega) = \hbar \coth\!\left(\frac{\hbar\omega}{2 k_B T}\right) \chi''_A(\omega)$$
Fluctuations and dissipation are connected by the factor $\coth(\hbar\omega / 2 k_B T)$.
From a detailed-balance point of view in a two-state system, this can be expressed as a ratio of excitation and relaxation rates:
$$\frac{\Gamma_\uparrow}{\Gamma_\downarrow} = \frac{S(-\omega)}{S(\omega)} = e^{-\hbar\omega / k_B T}$$
This is useful in showing the thermal properties of Rindler horizons (Unruh effect) and Hawking radiation.
- Fluctuation theorem
Quantifies the exponential asymmetry between positive and negative entropy production fluctuations over a time interval $\tau$:
$$\frac{P_\tau(+\Sigma)}{P_\tau(-\Sigma)} = e^{\Sigma / k_B}$$
- Generalized Fluctuation theorem
Effective temperature becomes
$$k_B T_{\mathrm{eff}}(\omega) = \frac{\omega\, S_A(\omega)}{2\, \chi''_A(\omega)}$$
Defines a frequency-dependent “effective temperature” from the ratio of fluctuation power to dissipation.
Here, the two-point correlators are:
- $S_A(\omega) = \int dt\, e^{i\omega t}\, \tfrac{1}{2} \langle \{ A(t), A(0) \} \rangle$: symmetrized noise spectrum of the observable $A$.
- $\chi''_A(\omega)$, from $\chi_A(t) = \tfrac{i}{\hbar}\, \theta(t)\, \langle [A(t), A(0)] \rangle$: imaginary (dissipative) part of the linear susceptibility of $A$.
- $[A, B] = AB - BA$: commutator.
- $\{A, B\} = AB + BA$: anticommutator.
- Fluctuation–Dispersion and Dissipation–Dispersion relations
The imaginary part of the susceptibility is proportional to the fluctuation spectrum due to the fluctuation–dissipation theorem:
$$\chi''_A(\omega) = \frac{1}{\hbar} \tanh\!\left(\frac{\hbar\omega}{2 k_B T}\right) S_A(\omega)$$
The full response function obeys the Kramers–Kronig relations:
$$\chi'(\omega) = \frac{1}{\pi}\, \mathcal{P}\!\int_{-\infty}^{\infty} \frac{\chi''(\omega')}{\omega' - \omega}\, d\omega', \qquad \chi''(\omega) = -\frac{1}{\pi}\, \mathcal{P}\!\int_{-\infty}^{\infty} \frac{\chi'(\omega')}{\omega' - \omega}\, d\omega'$$
Substituting the fluctuation–dissipation theorem into the first of these gives a fluctuation–dispersion relation:
$$\chi'(\omega) = \frac{1}{\pi\hbar}\, \mathcal{P}\!\int_{-\infty}^{\infty} \tanh\!\left(\frac{\hbar\omega'}{2 k_B T}\right) \frac{S(\omega')}{\omega' - \omega}\, d\omega'$$
Using Kramers–Kronig and the fluctuation-dissipation theorem together, all parts of the response can be reconstructed given one of them:
Equilibrium fluctuations $\leftrightarrow$ Dissipation $\leftrightarrow$ Dispersion
Similarly for a dissipation–dispersion relation:
$$\chi''(\omega) = -\frac{1}{\pi}\, \mathcal{P}\!\int_{-\infty}^{\infty} \frac{\chi'(\omega')}{\omega' - \omega}\, d\omega'$$
$\mathcal{P}$ is the Cauchy principal value of the integral, which converges for analytic functions (analyticity follows from causality):
$$\mathcal{P}\!\int_{-\infty}^{\infty} \frac{f(\omega')}{\omega' - \omega}\, d\omega' = \lim_{\epsilon \to 0^+} \left[ \int_{-\infty}^{\omega - \epsilon} + \int_{\omega + \epsilon}^{\infty} \right] \frac{f(\omega')}{\omega' - \omega}\, d\omega'$$
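A quick numerical check of the dispersion relation (my own example: a damped-oscillator susceptibility $\chi(\omega) = (\omega_0^2 - \omega^2 - i\gamma\omega)^{-1}$, which is causal, i.e. analytic in the upper half-plane):

```python
import numpy as np

# Kramers-Kronig check: reconstruct chi'(w) from chi''(w) by a
# principal-value integral, for a damped harmonic oscillator.
w0, gamma = 1.0, 0.2
w = np.linspace(-20, 20, 40001)
chi = 1.0 / (w0**2 - w**2 - 1j * gamma * w)

dw = w[1] - w[0]
chi1_kk = np.empty_like(w.real)
for i, wi in enumerate(w):
    integrand = chi.imag / (w - wi)
    integrand[i] = 0.0                  # drop the singular point (PV)
    chi1_kk[i] = integrand.sum() * dw / np.pi

i0 = np.argmin(np.abs(w - 0.5))         # compare at w = 0.5
print(chi.real[i0], chi1_kk[i0])        # agree to ~1e-3
```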
- Applications to mechanics, optics, acoustics, electronics
Fluctuation–dissipation (and its corollaries fluctuation–dispersion and dissipation–dispersion) holds in any system that is linear, time-translationally invariant, and causal within linear response theory (Kubo formalism), both in equilibrium and out of equilibrium with the generalized $T_{\mathrm{eff}}(\omega)$:
a. Overdamped Brownian particles (Smoluchowski dynamics)
The mobility $\mu(\omega)$ and velocity-autocorrelation spectrum $S_v(\omega)$ satisfy
$$S_v(\omega) = 2 k_B T\, \operatorname{Re} \mu(\omega)$$
and obey Kramers–Kronig.
b. Electromagnetic response of linear media
The complex permittivity
$$\epsilon(\omega) = \epsilon'(\omega) + i\epsilon''(\omega)$$
obeys Kramers–Kronig, and at thermal equilibrium $\epsilon''(\omega)$ is tied to the equilibrium current-fluctuation spectrum via the fluctuation–dissipation theorem.
Interestingly enough, we can get an estimate of the amplitude and phase noise a beam of light picks up when travelling through a material of known index of refraction or absorption spectrum.
The optical power is $P$. The medium absorbs with coefficient $\alpha$ over length $L$, so the excess intensity fluctuations from thermal polarization noise follow from the fluctuation–dissipation theorem:
$$S_P^{\mathrm{th}} \approx 2\hbar\omega_0 P\, \alpha L\, \bar n(\omega_0), \qquad \bar n(\omega_0) = \frac{1}{e^{\hbar\omega_0 / k_B T} - 1}$$
Shot noise is
$$S_P^{\mathrm{shot}} = 2\hbar\omega_0 P$$
Therefore
$$\frac{S_P^{\mathrm{th}}}{S_P^{\mathrm{shot}}} \approx \alpha L\, \bar n(\omega_0)$$
In terms of SNR this becomes:
$$\mathrm{SNR} \approx \frac{\mathrm{SNR}_{\mathrm{shot}}}{\sqrt{1 + \alpha L\, \bar n(\omega_0)}}$$
Thermal refractive-index fluctuations over length $L$ also introduce phase noise. With wavevector $k = n\omega_0/c$, the fluctuation–dissipation theorem yields
$$S_\phi^{\mathrm{th}}(\Omega) = (kL)^2\, S_{\delta n}(\Omega)$$
with $S_{\delta n}$ set by the dissipative part of the index response. Shot-noise-limited phase diffusion for a coherent beam of flux $\Phi = P/\hbar\omega_0$ is
$$S_\phi^{\mathrm{shot}} = \frac{1}{2\Phi}$$
Thus
$$\frac{S_\phi^{\mathrm{th}}}{S_\phi^{\mathrm{shot}}} = 2\Phi\, (kL)^2\, S_{\delta n}(\Omega)$$
Both of these effects are usually minuscule fractions of the shot noise floor, but interesting to know about.
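For scale (my own numbers, standard constants): the thermal photon occupancy $\bar n$ at optical frequencies and room temperature is astronomically small, which is why the thermal terms above sit so far below shot noise:

```python
import numpy as np

# Thermal photon occupancy n_bar at a telecom wavelength, room temperature.
hbar, kB, c = 1.054571817e-34, 1.380649e-23, 299792458.0   # SI constants
lam, T = 1550e-9, 300.0
w0 = 2 * np.pi * c / lam
n_bar = 1.0 / np.expm1(hbar * w0 / (kB * T))
print(n_bar)        # ~3e-14: thermal noise ~13-14 orders below shot noise
```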
c. Acoustic (sound) waves in fluids or solids
The complex bulk modulus $K(\omega) = K'(\omega) + iK''(\omega)$ (or equivalently the sound-attenuation coefficient) has real and imaginary parts related by Kramers–Kronig, and the attenuation spectrum is set by equilibrium pressure fluctuations.
d. Electronic transport in conductors
The complex conductivity
$$\sigma(\omega) = \sigma'(\omega) + i\sigma''(\omega)$$
obeys Kramers–Kronig, and $\sigma'(\omega)$ is given by the current noise via the Johnson–Nyquist relation
$$S_I(\omega) = 4 k_B T\, \operatorname{Re} G(\omega)$$
for a lumped conductor of conductance $G$.
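A quick worked number (standard textbook value, my own arithmetic): the open-circuit voltage noise of a resistor $R$ at temperature $T$ is
$$\sqrt{4 k_B T R} = \sqrt{4 \times 1.38\times 10^{-23} \times 300 \times 10^{3}}\ \mathrm{V}/\sqrt{\mathrm{Hz}} \approx 4.1\ \mathrm{nV}/\sqrt{\mathrm{Hz}} \qquad (R = 1\,\mathrm{k\Omega},\ T = 300\,\mathrm{K})$$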
Usually, the world is split into system and environment, where $H = H_S + H_E + H_{\mathrm{int}}$ with $H_{\mathrm{int}}$ negligible.
However, at small scales the interactions between system and environment become non-negligible, and it becomes necessary to include the solvation (mean-force) potential $\phi_{\mathrm{mf}}$. In this case, the Jarzynski equality becomes:
$$\langle e^{-\beta W} \rangle = e^{-\beta \Delta F^*}$$
Here, $\Delta F^*$ is the free-energy change of the Hamiltonian of mean force $H^*_S = H_S + \phi_{\mathrm{mf}}$.
- Bochkov–Kuzovlev equality
This shows that the exponential average of the "exclusive" work done by an external force (with no change in potential) equals unity:
$$\langle e^{-\beta W_0} \rangle = 1$$
- Evans–Searles fluctuation theorem
It states that over a finite time $t$, the probability of observing entropy production $\Sigma_t = A$ vs. $\Sigma_t = -A$ is exponentially biased by $e^{A/k_B}$:
$$\frac{P(\Sigma_t = A)}{P(\Sigma_t = -A)} = e^{A/k_B}$$
- Gallavotti–Cohen steady-state fluctuation theorem
In a nonequilibrium steady state, the long-time scaled log-ratio of entropy fluctuations equals the production itself.
- Kurchan fluctuation theorem
An extension of transient FTs to thermostatted steady states, showing the same exponential symmetry for entropy production.
- Lebowitz–Spohn fluctuation theorem
The ratio of the probability of a trajectory to its time-reversal is the exponential of its total entropy production.
- Hummer–Szabo relation
Enables reconstruction of equilibrium free-energy profiles from ensembles of non-equilibrium work measurements.
- Seifert integral fluctuation theorem
States that for any stochastic process, the exponential average of the total entropy production equals unity.
Complexity vs. Entropy
Take some system with a phase transition like a 2d Ising model. As temperature increases, entropy monotonically increases. However, in a way both the $T=0$ and $T=\infty$ limits are equally simple: the system is homogeneous in both cases and every cell is like every other cell.
However in between, there is nontrivial structure and more "information" contained in the lattice, which is maximized at the critical point. It seems like this quantity might in general be proportional to something like $dS/dT$ (cf. the heat-capacity idea below).
Some keywords to pick this up later:
- Statistical Complexity (Crutchfield, Shalizi, et al.)
- Excess Entropy (This is the deviation in entropy from an ideal gas, i.e. a fully decoupled system. Any correlations should show up here!)
- There's some notion of susceptibility here, i.e. the higher the heat capacity the more energy is needed to displace the system from its current state. At the critical point the system seems to be able to "withstand" the most energy to produce a given change in T and structure? Not sure how to express this more rigorously. Something like (integral over original and perturbed density of states) is maximized there? Maybe some kind of deviation metric is useful here, say Ruppeiner geometry? Is this the same as computational (ir)reducibility?
Coordinate remapping
Take a Smoluchowski equation with some potential $V(x)$ and stationary distribution $p_s(x) \propto e^{-\beta V(x)}$, and apply a nonlinear map $y = f(x)$ such that $\tilde p(y) = p_s(x(y)) \left| \frac{dx}{dy} \right|$ is Gaussian. After re-expressing the entropy in the new set of coordinates
$$S = -\int \tilde p \ln \tilde p\; dy + \int \tilde p(y) \ln \left| \frac{dx}{dy} \right| dy$$
then defining
$$\tilde V(y) = V(x(y)) + k_B T \ln |f'(x(y))|$$
the new potential now takes the form
$$\tilde V(y) = \frac{\kappa}{2}\, y^2 + \mathrm{const},$$
a potential consistent with a Gaussian distribution.
The Jacobian term is absorbed into the coordinates, and entropy and energy have been repartitioned. This transformation preserves the free energy and the partition function (which, loosely, counts the number of accessible states). Explicitly,
$$Z = \int e^{-\beta V(x)}\, dx = \int e^{-\beta \tilde V(y)}\, dy, \qquad F = -k_B T \ln Z.$$
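A numerical sketch of the remapping (my construction: double-well $V(x) = (x^2 - 1)^2$ at $\beta = 1$; the Gaussianizing map is the CDF of $p_s$ composed with the inverse normal CDF):

```python
import numpy as np
from scipy.stats import norm, kstest

# Remap a double-well stationary density p(x) ~ exp(-V(x)) into a Gaussian
# via the CDF map y = Phi^{-1}(F(x)); F = CDF of p, Phi = normal CDF.
V = lambda x: (x**2 - 1.0) ** 2
x = np.linspace(-4, 4, 100_001)
dx = x[1] - x[0]
p = np.exp(-V(x))
p /= p.sum() * dx                                # normalize on the grid
F = np.cumsum(p) * dx                            # cumulative distribution

rng = np.random.default_rng(1)
xs = rng.choice(x, size=200_000, p=p * dx)       # samples from p(x)
xs = xs + rng.uniform(-dx / 2, dx / 2, xs.size)  # de-discretize
ys = norm.ppf(np.clip(np.interp(xs, x, F), 1e-12, 1 - 1e-12))

print(kstest(ys, "norm"))                        # consistent with N(0, 1)
```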
Functional optimization
Smoluchowski dynamics are a gradient flow of the free-energy functional
$$F[p] = \int \left( V(x)\, p + k_B T\, p \ln p \right) dx$$
so
$$\partial_t p = \nabla \cdot \left( \mu\, p\, \nabla \frac{\delta F}{\delta p} \right)$$
and its unique stationary solution is the minimizer of $F[p]$ under $\int p\, dx = 1$.
In the case of the Schrödinger equation, there is no dissipation (evolution is unitary). However, applying a Wick rotation into imaginary time $\tau = it$, the Schrödinger equation turns into a diffusion equation:
$$\partial_\tau \psi = \frac{\hbar}{2m} \nabla^2 \psi - \frac{V}{\hbar}\, \psi$$
which is the gradient flow of the energy functional
$$E[\psi] = \int \left( \frac{\hbar^2}{2m} |\nabla \psi|^2 + V |\psi|^2 \right) dx$$
under the normalization constraint $\|\psi\| = 1$.
Through the Wick rotation the higher eigenmodes of $\hat H$ decay in the same way as those in diffusion (see the sketch below):
- Smoluchowski: $p(x,t) = \sum_n c_n\, e^{-\lambda_n t}\, \varphi_n(x)$.
- Imaginary-time Schrödinger: $\psi(x,\tau) = \sum_n c_n\, e^{-E_n \tau/\hbar}\, \phi_n(x)$. In each case only the lowest eigenmode = stationary distribution survives as $t \to \infty$, driving the system to the minimum of the corresponding functional.
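A minimal split-step sketch of the imaginary-time filtering (my own, with $\hbar = m = 1$ and harmonic $V(x) = x^2/2$; the propagator $e^{-\hat H \tau}$ plus renormalization kills every mode above the ground state):

```python
import numpy as np

# Imaginary-time (Wick-rotated) Schrodinger evolution by split-step FFT.
# Higher eigenmodes decay as exp(-E_n tau); renormalizing each step leaves
# the ground state, here of V(x) = x^2/2 with E_0 = 0.5.
x = np.linspace(-10, 10, 512)
dx, dtau = x[1] - x[0], 1e-3
k = 2 * np.pi * np.fft.fftfreq(x.size, dx)
V = 0.5 * x**2

psi = np.exp(-(x - 2.0) ** 2)                     # arbitrary displaced start
for _ in range(20_000):                           # total tau = 20
    psi = psi * np.exp(-0.5 * V * dtau)           # half potential step
    psi = np.fft.ifft(np.exp(-0.5 * k**2 * dtau) * np.fft.fft(psi))
    psi = psi * np.exp(-0.5 * V * dtau)           # half potential step
    psi /= np.sqrt((np.abs(psi) ** 2).sum() * dx) # renormalize

psik = np.fft.fft(psi)
kinetic = (0.5 * k**2 * np.abs(psik) ** 2).sum() / (np.abs(psik) ** 2).sum()
potential = (V * np.abs(psi) ** 2).sum() * dx
print(kinetic + potential)                        # -> 0.5, ground-state energy
```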
Schrödinger kinetic energy as diffusion term
The kinetic energy operator in the Schrödinger equation penalizes curvature/gradients, driving a diffusion-like spread. This can be formalized by applying the Madelung transform with $\psi = \sqrt{\rho}\, e^{iS/\hbar}$. For real $\psi$ (as is the case in a stationary ground-state distribution, among others), the kinetic term becomes the Weizsäcker functional
$$T_W[\rho] = \frac{\hbar^2}{8m} \int \frac{|\nabla \rho|^2}{\rho}\, dx$$
which is precisely proportional to the Fisher information of $\rho$. Thus the ground state can be seen as minimizing
$$E[\rho] = \frac{\hbar^2}{8m}\, I[\rho] + \int V \rho\, dx$$
The Fisher information is:
$$I[\rho] = \int \frac{|\nabla \rho|^2}{\rho}\, dx = \int \rho\, |\nabla \ln \rho|^2\, dx$$
so the Weizsäcker kinetic energy for $\rho$ is
$$T_W[\rho] = \frac{\hbar^2}{8m}\, I[\rho]$$
so up to a constant factor the kinetic term is the Fisher information.
Some properties and intuition on Fisher information:
- Score function sensitivity: $u(x) = \partial_x \ln \rho(x)$ measures how sensitive the log-likelihood is to shifts; averaging its square under $\rho$ is the total information about location.
- Cramér–Rao bound: $\operatorname{Var}(\hat x) \geq 1/I[\rho]$.
- Geometric curvature: $I[\rho]$ penalizes steep features in $\rho$.
- Infinitesimal KL shift: $D_{\mathrm{KL}}\big(\rho(x)\,\|\,\rho(x+\epsilon)\big) \approx \frac{\epsilon^2}{2} I[\rho]$ (see the expansion below). Similar to the previous question on entropy susceptibility at the critical point?
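A one-line check of that last bullet (standard Taylor expansion): writing $\ln\rho(x+\epsilon) = \ln\rho + \epsilon\,\partial_x \ln\rho + \tfrac{\epsilon^2}{2}\,\partial_x^2 \ln\rho + \dots$, using $\int \rho\, \partial_x \ln \rho\, dx = \int \partial_x \rho\, dx = 0$ and one integration by parts,
$$D_{\mathrm{KL}}\big(\rho(x)\,\|\,\rho(x+\epsilon)\big) = -\int \rho \ln \frac{\rho(x+\epsilon)}{\rho(x)}\, dx = \frac{\epsilon^2}{2} \int \rho\, (\partial_x \ln \rho)^2\, dx + O(\epsilon^3)$$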
Coordinate transform in the Schrödinger equation
Analogous to the Smoluchowski equation, if we remap $x \to y = f(x)$ and $\psi \to \tilde\psi(y) = \psi(x(y))\, \left| \frac{dx}{dy} \right|^{1/2}$, the total energy expectation
$$\langle E \rangle = \int \tilde\psi^*\, \hat H_y\, \tilde\psi\, dy = \int \psi^*\, \hat H\, \psi\, dx$$
is invariant, but you generate an extra Jacobian term in the potential $\tilde V(y)$. The invariants here are energy (Smoluchowski: free energy) and normalization (Smoluchowski: partition function). This is interesting because it means there is nothing special about quantum mechanics in $(x, p)$ coordinates. As long as the canonical commutation relations hold, Stone–von Neumann guarantees that any linear canonical change leads to a unitarily equivalent representation. Specifically, a pair of self-adjoint operators $(\hat X, \hat P)$ on a separable Hilbert space is canonical if $[\hat X, \hat P] = i\hbar$ (or equivalently their exponentials satisfy the Weyl relations).
In that case Stone–von Neumann tells you there is, up to unitary equivalence, exactly one irreducible representation of those relations, and you get equivalent wave-functions in $x$-space or in $p$-space.
I initially thought any set of variables with non-zero commutator might do the trick as long as it forms a basis for the symplectic structure of phase space, but it turns out there are more stringent conditions in QM. The commutator must be a multiple of the identity, $[\hat X, \hat P] = i\hbar\, \mathbb{1}$, in order to generate the canonical Heisenberg algebra. I don't understand what that means, maybe look into it later.
Nonlinear canonical transformations are more complicated but analogous to the Smoluchowski case: a non-linear coordinate transformation introduces Jacobian terms that get put in one of two places. Assuming a complete basis set of wavefunctions, the coordinate transformation can be expressed as a unitary transform of the wavefunction, $\tilde\psi = \hat U \psi$, with $\hat U$ a unitary map between the new and old forms of the wavefunction in the new and old coordinates. Due to unitarity this does not change the normalization of the wavefunction. Now the question is whether to include this change of coordinates in the operators and keep the old form of the wavefunction, or keep the old form of the operators and include this change in the wavefunction. This difference seems to map nicely onto the QM Schrödinger vs. Heisenberg picture, as a more general, time-independent version of it (since the regular formulation talks about changes over time):
Schrödinger-picture coordinate change:
- Move the state into the new coordinates: $\tilde\psi = \hat U \psi$.
- Carry the Hamiltonian and all other operators along via $\tilde A = \hat U \hat A \hat U^\dagger$.
- Then expectation values are $\langle \tilde\psi | \tilde A | \tilde\psi \rangle = \langle \psi | \hat A | \psi \rangle$.
Heisenberg-picture coordinate change:
- Keep the wavefunction alone in the original Hilbert space: $\psi$ stays $\psi$.
- Transform every operator into the changed coordinates: $\hat A \to \hat U^\dagger \hat A \hat U$.
- States live in the old $x$-space, but the operators hold all the Jacobian/ordering corrections.
- Expectation values stay invariant as $\langle \psi | \hat U^\dagger \hat A \hat U | \psi \rangle = \langle \tilde\psi | \hat A | \tilde\psi \rangle$.
This may be a good analogy in general for these coordinate transforms?
As discussed in entropic gravity, one can find a set of coordinates in which the distribution $\rho(x)$ becomes uniform, or any other shape. E.g. using the cumulative-distribution map
$$y = F(x) = \int^x \rho(x')\, dx'$$
The external $V(x)$ is absorbed into a Jacobian quantum potential $Q(y)$, and you can choose to view all of the energy as coming from the Fisher (kinetic) term in the $y$-frame.
However, since the energy expectation values are preserved (up to a common shift) the higher order modes do not in general transform to match the higher order modes of the new system. For instance, transforming some system into one where the
Entropy as Information Flow
There seems to be a common theme that in systems with dissipation some kind of functional is optimized in the ground/stationary state. This is not true for unitary evolution, hence the Wick rotation for the Schrödinger equation is necessary.
I think a good way to think about this is that from the perspective of a linear Markovian process (e.g. a random walk) there is some kernel that reallocates state density at each time step. The question is what the fixed point of repeated convolution of this kernel with itself is.
For most kernels the fixed point of auto-convolution is a Gaussian (central limit theorem), and this is probably enough to qualify as dissipation.
If the kernel is a delta function or a permutation this is not true; the density must be split into at least two states (on average; 50% chance of staying fully, 50% of transferring fully might also be ok). More rigorously, what's needed is an irreducible (for long times all parts of the state space communicate), aperiodic, stochastic convolution kernel (or continuous-time master equation); such a kernel has a unique stationary distribution that maximizes entropy (or minimizes free energy), with the KL divergence as a Lyapunov functional. (A small sketch follows below.)
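A minimal sketch of that claim (my own toy example: a random strictly positive stochastic matrix, which is automatically irreducible and aperiodic):

```python
import numpy as np

# A random irreducible, aperiodic stochastic kernel drives any initial
# density to its unique stationary distribution; the KL divergence to it
# acts as a Lyapunov functional (monotone decay). A permutation kernel,
# by contrast, just shuffles the density forever.
rng = np.random.default_rng(0)
K = rng.random((8, 8))
K /= K.sum(axis=0)                          # column-stochastic, all entries > 0

evals, evecs = np.linalg.eig(K)             # stationary state: eigenvalue 1
pi = np.real(evecs[:, np.argmax(np.real(evals))])
pi /= pi.sum()

p = np.zeros(8)
p[0] = 1.0                                  # start fully localized
for step in range(6):
    kl = np.sum(np.where(p > 0, p * np.log(np.where(p > 0, p, 1.0) / pi), 0.0))
    print(step, kl)                         # decreases monotonically toward 0
    p = K @ p
```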
There's also a neat connection between entropy production rate and entropy. For a Smoluchowski equation:
$$\partial_t p = D\, \nabla \cdot \left( \nabla p + \beta\, p\, \nabla V \right)$$
the relative entropy
$$D_{\mathrm{KL}}(p \| p_{\mathrm{eq}}) = \int p \ln \frac{p}{p_{\mathrm{eq}}}\, dx, \qquad p_{\mathrm{eq}} \propto e^{-\beta V}$$
satisfies
$$\frac{d}{dt} D_{\mathrm{KL}}(p \| p_{\mathrm{eq}}) = -D\, I(p \| p_{\mathrm{eq}})$$
so $\dot D_{\mathrm{KL}} \leq 0$. Here
$$I(p \| p_{\mathrm{eq}}) = \int p \left| \nabla \ln \frac{p}{p_{\mathrm{eq}}} \right|^2 dx$$
is the (relative) Fisher information.
Thus the H-theorem is the statement that the KL divergence decays monotonically, with rate proportional to Fisher info.
The free-energy excess $\beta \left( F[p] - F[p_{\mathrm{eq}}] \right) = D_{\mathrm{KL}}(p \| p_{\mathrm{eq}})$ looks like the KL divergence of the current distribution from the equilibrium one, and its decay rate is proportional to the Fisher information.
What's also cool is that, given the current non-equilibrium distribution, one can calculate the minimum entropy production rate some other process must expend in order to maintain that distribution, based on the Harada–Sasa relation. Maintaining the distribution requires at least
$$\dot\sigma \geq k_B\, D\, I(p \| p_{\mathrm{eq}})$$
as the entropy production rate to offset the relaxation (per particle if $p$ is a probability; if $p$ is a particle-number density it scales with $N$ and $k_B$).
Ideal gas vs. Smoluchowski
While both diffuse there's an interesting difference between how these systems behave. Above I've written about repartitioning of energy and entropy in the Smoluchowski equation. By changing coordinates the distribution can be flattened, so the system is potential-free and the energy expectation value is 0. In an ideal gas in a box, the distribution is also homogeneous. However, the ideal gas still has a non-zero internal energy expectation value, more exactly $\frac{3}{2} k_B T$ per particle, or more generally $\frac{f}{2} k_B T$ with $f$ degrees of freedom if there are additional internal ones, due to equipartition. So why does one of them have a "residual" internal energy, while the other doesn't?
What is going on here is that in Smoluchowski we kill off any kinetic energy contributions. The Smoluchowski equation is an overdamped limit where momentum immediately dissipates and is position- and potential-independent. On the other hand, in a system that behaves like an ideal gas the mean free path is very long (ballistic limit), so the distribution spreads over both $x$ and $p$ in phase space.
The Smoluchowski equation is the overdamped limit of the more general Kramers equation, which supports full phase-space dynamics:
$$\partial_t p(x, v, t) = -v\, \partial_x p + \frac{1}{m}\, \partial_x V\, \partial_v p + \gamma\, \partial_v (v\, p) + \frac{\gamma k_B T}{m}\, \partial_v^2 p$$
where momentum exponentially decays with damping rate $\gamma$. Here both kinetic and potential energy appear, and the stationary distribution is a Maxwell–Boltzmann distribution over all of phase space with the potential:
$$p_{\mathrm{eq}}(x, v) \propto e^{-\beta \left( \frac{m v^2}{2} + V(x) \right)}$$
The corresponding free energy functional is
$$F[p] = \iint p(x, v) \left[ \frac{m v^2}{2} + V(x) + k_B T \ln p(x, v) \right] dx\, dv$$
and contains a Gibbs-Shannon entropy integrated over all of phase space.
How exactly the system behaves next depends on the ensemble.
In an isolated system with fixed total energy (microcanonical ensemble), the local kinetic energy can vary as a function of $x$. This is actually a cool example where temperature gradients can develop spontaneously (albeit transiently; in the end they even out) without breaking the second law, due to the entropy increase from occupying additional space. In this case the marginal distributions do not factorize in general. For large $N$ and far away from phase transitions (+ some additional qualifications) the final stationary distribution approaches:
$$p(x, v) \propto e^{-\left( \frac{m v^2}{2} + V(x) \right) / k_B T}$$
At that point the momentum distribution is independent of position and only depends on the temperature of the system.
A single, uniform heat bath at temperature $T$ instantaneously rethermalizes the kinetic degrees of freedom, so the temperature is uniform and every part of the system rapidly ($\propto e^{-\gamma t}$) decays to the same Maxwell–Boltzmann distribution. The distribution generally factorizes because of this, and is the same as the large-$N$ limit of the microcanonical ensemble but without the qualifications:
$$p(x, v) = p_x(x)\, p_v(v) \propto e^{-\beta V(x)}\, e^{-\beta m v^2 / 2}$$
The gradient flow for this diffusion in momentum space can be written as:
$$\partial_t p\,\big|_{\mathrm{diss}} = \partial_v \left( \frac{\gamma}{m}\, p\, \partial_v \frac{\delta F}{\delta p} \right) = \gamma\, \partial_v (v p) + \frac{\gamma k_B T}{m}\, \partial_v^2 p$$
This means $F[p]$ decays to its unique minimum, the Maxwell–Boltzmann equilibrium. The rate of dissipation of non-equilibrium modes (and the rate of entropy production) can again be expressed through the Fisher information. However, note that this is only the Fisher information of the momentum part of the distribution. There is no $D\, \partial_x^2 p$ term because there is no diffusion in $x$ in the Kramers equation.
This fact appears strange to me, since an ideal gas clearly spreads out to a uniform distribution in space.
In fact, momentum carries you in $x$ via the advection term $v\, \partial_x p$. In the overdamped Smoluchowski limit $\gamma \to \infty$:
$$\partial_t p = \partial_x \left( \mu\, p\, \partial_x V \right) + D\, \partial_x^2 p$$
where $D = \mu k_B T = k_B T / (m\gamma)$ (Stokes–Einstein relation, with $m\gamma = 6\pi\eta r$ for a sphere in a fluid). This expression has the explicit diffusion term $D\, \partial_x^2 p$.
In the underdamped case, there is deterministic transport (imagine a beam of gas propagating in a vacuum):
$$\partial_t p = -v\, \partial_x p$$
This term acts to transport probability density from high- to low-density regions and acts like an effective diffusion, but is not limited to it.
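A closing numerical sketch tying the two limits together (my own, free underdamped Langevin dynamics with $V = 0$ and $m = k_B T = 1$): at times short compared to $1/\gamma$ the spreading is ballistic, $\langle x^2 \rangle \approx t^2$; at long times it is diffusive, $\langle x^2 \rangle \approx 2Dt$ with $D = k_B T/(m\gamma)$.

```python
import numpy as np

# Free underdamped Langevin dynamics (m = kT = 1):
#   dv = -gamma v dt + sqrt(2 gamma) dW,   dx = v dt
# Crossover: ballistic <x^2> ~ t^2 for t << 1/gamma,
#            diffusive <x^2> ~ 2 D t for t >> 1/gamma, D = 1/gamma.
gamma, dt, n = 1.0, 1e-3, 10_000
rng = np.random.default_rng(0)
x = np.zeros(n)
v = rng.normal(size=n)                           # thermal initial velocities
for step in range(1, 20_001):
    v += -gamma * v * dt + np.sqrt(2 * gamma * dt) * rng.normal(size=n)
    x += v * dt
    if step in (100, 20_000):                    # t = 0.1 and t = 20
        t = step * dt
        print(f"t={t}: <x^2>={np.mean(x**2):.3f}  t^2={t**2}  2Dt={2*t/gamma}")
```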