Notes on Information Theory

May 9, 2025

Some rough notes about information theory, thermodynamics and quantum mechanics.

Fluctuation Relations:

Great review by Jarzynski

$$\langle e^{-\beta W} \rangle = e^{-\beta \Delta F}$$

Expectation value over the work distribution of an irreversible process; valid arbitrarily far from equilibrium and in small-$N$ systems.

From Jensen's inequality:

$$\langle W \rangle \geq \Delta F$$

and for the probability of a statistical violation of the 2nd law:

$$P(W \leq \Delta F - \xi) \leq e^{-\beta \xi}$$

i.e. "What's the probability the 2nd law will be violated by at least $\xi$?" Exponentially decaying tail in the thermodynamically forbidden region. Interestingly, there is no upper bound other than $P(W < \Delta F) < 1$, so violation with more than 50% probability is possible. See e.g. a single-electron transistor with 65% probability of decreasing $S$ (only the expectation value matters!).
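A quick Monte Carlo sanity check of the statements above, assuming (purely for illustration) a Gaussian work distribution, for which $\Delta F = \mu - \beta\sigma^2/2$ is known in closed form:

```python
import numpy as np

rng = np.random.default_rng(0)
beta = 1.0
mu, sigma = 2.0, 1.0                      # assumed Gaussian work distribution
W = rng.normal(mu, sigma, 1_000_000)

# Jarzynski: <exp(-beta W)> = exp(-beta dF); Gaussian case: dF = mu - beta sigma^2 / 2
dF_est = -np.log(np.mean(np.exp(-beta * W))) / beta
dF_exact = mu - beta * sigma**2 / 2

# Jensen: <W> >= dF
assert W.mean() >= dF_est

# Tail bound: P(W <= dF - xi) <= exp(-beta xi)
xi = 1.0
p_violate = np.mean(W <= dF_exact - xi)
assert p_violate <= np.exp(-beta * xi)
```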

Incorporating information gained from measurement, with feedback control by an external controller (e.g. Maxwell's demon):

$$\langle e^{-\beta W - I} \rangle = e^{-\beta \Delta F}$$

$I$ is the stochastic mutual information between the probability distributions of the feedback system before and after the measurement extracts information from the system.

$$I(x,m) = \ln\frac{P(m \mid x)}{P(m)},$$

so the ensemble average $\langle I \rangle$ is the usual mutual information between system and memory:

$$\langle I \rangle = \sum_{x,m} P(x,m)\,\ln\frac{P(m \mid x)}{P(m)}$$
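A small numerical example of $\langle I \rangle$ for an assumed joint distribution $P(x,m)$ of a binary system and a noisy memory:

```python
import numpy as np

# Illustrative joint distribution P(x, m) of system x and measurement record m
P_xm = np.array([[0.4, 0.1],     # x = 0: (m = 0, m = 1)
                 [0.1, 0.4]])    # x = 1
P_x = P_xm.sum(axis=1)
P_m = P_xm.sum(axis=0)

# Stochastic mutual information I(x, m) = ln P(m|x)/P(m), then its ensemble average
I_xm = np.log(P_xm / P_x[:, None] / P_m[None, :])
I_avg = float(np.sum(P_xm * I_xm))
print(I_avg)   # positive; zero only if x and m are independent
```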

Hypothesis test for forward vs. reverse evolution in time (direction defined by $\frac{dS}{dt}$), given the exchanged work. The likelihood is:

$$L(\mathrm{Forward} \mid W) = \frac{1}{1 + e^{-\beta (W - \Delta F)}}$$

Easy to tell the direction when the exchanged work is large; at equilibrium it is impossible to tell.

$$\frac{P_F(W)}{P_R(-W)} = e^{\beta (W - \Delta F)}$$

This symmetry between the forward and backward work distributions (the Crooks relation) forces a crossing at $W = \Delta F$.
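The crossing can be verified with Gaussian work distributions chosen to satisfy the symmetry (assumed $\beta = \sigma = 1$; the symmetry then forces $\mu_F + \mu_R = \beta\sigma^2$ and $\mu_F - \mu_R = 2\Delta F$):

```python
import numpy as np

beta, sigma, dF = 1.0, 1.0, 0.25
mu_F, mu_R = 0.75, 0.25                   # satisfy the two constraints above

def gauss(w, mu):
    return np.exp(-(w - mu)**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)

W = np.linspace(-3.0, 3.0, 601)
log_ratio = np.log(gauss(W, mu_F) / gauss(-W, mu_R))   # ln P_F(W) / P_R(-W)
assert np.allclose(log_ratio, beta * (W - dF))

# the forward and (reflected) reverse distributions cross exactly at W = dF
assert np.isclose(gauss(dF, mu_F), gauss(-dF, mu_R))
```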

  • Fluctuation–Dissipation theorem

$$\chi_A''(\omega) = \tanh\left(\tfrac{\beta\omega}{2}\right) S_A(\omega)$$

Relates the imaginary (dissipative) part of the linear response of $A$ to its symmetrized fluctuation spectrum at equilibrium. Fluctuations and dissipation are connected by the factor $\tanh(\beta\omega/2)$.

From a detailed-balance point of view in a two-state system, this can be expressed as:

$$\frac{\Gamma_{\downarrow} - \Gamma_{\uparrow}}{\Gamma_{\downarrow} + \Gamma_{\uparrow}} = \frac{1 - e^{-\beta\omega}}{1 + e^{-\beta\omega}} = \tanh\left(\tfrac{\beta\omega}{2}\right)
$$

This is useful in deriving the thermal properties of Rindler horizons (Unruh and Hawking radiation).
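The detailed-balance form can be checked in a few lines, assuming rates related by $\Gamma_\uparrow/\Gamma_\downarrow = e^{-\beta\omega}$:

```python
import numpy as np

beta, omega = 0.7, 1.3                     # arbitrary illustrative values
G_down = 1.0                               # downward (emission) rate, sets the scale
G_up = G_down * np.exp(-beta * omega)      # detailed balance

ratio = (G_down - G_up) / (G_down + G_up)
assert np.isclose(ratio, np.tanh(beta * omega / 2))
```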

  • Fluctuation theorem

$$\frac{P(\Sigma_t = A)}{P(\Sigma_t = -A)} = e^{A}$$

Quantifies the exponential asymmetry between positive and negative entropy-production fluctuations over a time interval $t$.

  • Generalized Fluctuation theorem

The effective temperature becomes frequency dependent, $T(\omega)$:

$$T_{\mathrm{eff}}(\omega) \equiv \frac{\omega\,S_A(\omega)}{2\,\chi_A''(\omega)}$$

Defines a frequency-dependent “effective temperature” from the ratio of fluctuation power to dissipation.

Here, the two-point correlators are:

$$S_A(\omega) = \frac{1}{2}\int_{-\infty}^{\infty} dt\; e^{i\omega t}\,\bigl\langle\{A(t),A(0)\}\bigr\rangle = \frac{1}{2}\int_{-\infty}^{\infty} dt\; e^{i\omega t}\,\bigl\langle A(t)A(0)+A(0)A(t)\bigr\rangle$$

Symmetrized noise spectrum of the observable $A$.

$$\chi_A''(\omega) = \frac{1}{2i}\int_{-\infty}^{\infty} dt\; e^{i\omega t}\,\bigl\langle[A(t),A(0)]\bigr\rangle = \frac{1}{2i}\int_{-\infty}^{\infty} dt\; e^{i\omega t}\,\bigl\langle A(t)A(0)-A(0)A(t)\bigr\rangle$$

Imaginary (dissipative) part of the linear susceptibility of $A$.

$$[X,Y] = XY - YX$$
Commutator.

$$\{X,Y\} = XY + YX$$
Anticommutator.

  • Fluctuation–Dispersion and Dissipation-Dispersion relations

The imaginary part of the susceptibility is proportional to the fluctuation spectrum due to the fluctuation–dissipation theorem:

χA(ω)=tanh ⁣(βω2)SA(ω)\chi_A''(\omega) = \tanh\!\bigl(\tfrac{\beta\omega}{2}\bigr)\,S_A(\omega)

The full response function obeys the Kramers–Kronig relations:

$$\chi_A(\omega) = \chi_A'(\omega) + i\,\chi_A''(\omega)$$

$$\chi_A'(\omega) = \frac{1}{\pi}\,\mathcal{P}\int_{-\infty}^{\infty} \frac{\chi_A''(\omega')}{\omega' - \omega}\,d\omega', \qquad \chi_A''(\omega) = -\frac{1}{\pi}\,\mathcal{P}\int_{-\infty}^{\infty} \frac{\chi_A'(\omega')}{\omega' - \omega}\,d\omega'.$$

Substituting the fluctuation–dissipation theorem into the first of these gives a fluctuation–dispersion relation:

$$\chi_A'(\omega) = \frac{1}{\pi}\,\mathcal{P}\int_{-\infty}^{\infty} \frac{\tanh\left(\tfrac{\beta\omega'}{2}\right) S_A(\omega')}{\omega' - \omega}\,d\omega'.$$

Using Kramers–Kronig and the fluctuation-dissipation theorem together, all parts of the response can be reconstructed given one of them:

$$\text{Equilibrium fluctuations } S_A(\omega) \;\longleftrightarrow\; \text{Dissipation } \chi'' \;\longleftrightarrow\; \text{Dispersion } \chi'$$

Similarly, the dissipation-dispersion relation reads:

$$\chi_A'(\omega) = \frac{1}{\pi}\,\mathcal{P}\int_{-\infty}^{\infty}\frac{\chi_A''(\omega')}{\omega'-\omega}\,d\omega' = \frac{1}{\pi}\,\mathcal{P}\int_{-\infty}^{\infty}\frac{\tanh\left(\frac{\beta\omega'}{2}\right) S_A(\omega')}{\omega'-\omega}\,d\omega'$$

$\mathcal{P}$ denotes the Cauchy principal value of the integral, which converges for analytic functions (analyticity in the upper half-plane follows from causality):

$$\mathcal{P}\int_{-\infty}^{\infty}\frac{f(\omega')}{\omega'-\omega}\,d\omega' = \lim_{\varepsilon\to0^+} \left[ \int_{-\infty}^{\omega-\varepsilon}\frac{f(\omega')}{\omega'-\omega}\,d\omega' + \int_{\omega+\varepsilon}^{\infty}\frac{f(\omega')}{\omega'-\omega}\,d\omega' \right].$$
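A numerical sanity check of Kramers–Kronig on the damped-oscillator susceptibility $\chi(\omega) = 1/(\omega_0^2 - \omega^2 - i\gamma\omega)$ (an assumed toy model), evaluated at $\omega = 0$ where the integrand is regular since $\chi''$ is odd:

```python
import numpy as np

w0, gamma = 1.0, 0.3
w = np.linspace(-200, 200, 2_000_001)     # uniform grid including w = 0
dw = w[1] - w[0]
chi = 1.0 / (w0**2 - w**2 - 1j * gamma * w)

# chi'(0) = (1/pi) P-int chi''(w')/w' dw'; the limit of chi''(w')/w' at 0 is gamma/w0^4
integrand = np.where(w != 0, chi.imag / np.where(w != 0, w, 1.0), gamma / w0**4)
kk = integrand.sum() * dw / np.pi
assert np.isclose(kk, 1 / w0**2, rtol=1e-3)   # matches chi'(0) = 1/w0^2
```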

  • Applications to mechanics, optics, acoustics, electronics

Fluctuation-dissipation (and its corollaries, fluctuation-dispersion and dissipation-dispersion) holds in any system that is linear, time-translationally invariant, and causal within linear response theory (Kubo formalism), both in equilibrium and, with the generalized $T(\omega)$, out of equilibrium:

a. Overdamped Brownian particles (Smoluchowski dynamics)

The mobility $\mu(\omega)$ and the velocity-autocorrelation spectrum satisfy

$$\mathrm{Im}\,\mu(\omega) = \tanh\left(\tfrac{\beta\omega}{2}\right) S_v(\omega), \qquad S_v(\omega) = \coth\left(\tfrac{\beta\omega}{2}\right) \mathrm{Im}\,\mu(\omega)$$

and obey Kramers–Kronig.

b. Electromagnetic response of linear media

The complex permittivity

$$\varepsilon(\omega) = \varepsilon'(\omega) + i\,\varepsilon''(\omega)$$

obeys Kramers–Kronig, and at thermal equilibrium

$$\mathrm{Im}\,\varepsilon(\omega) = \tanh\left(\tfrac{\beta\omega}{2}\right) S_P(\omega), \qquad S_P(\omega) = \coth\left(\tfrac{\beta\omega}{2}\right) \mathrm{Im}\,\varepsilon(\omega)$$

which ties the dissipative part to the equilibrium polarization-fluctuation spectrum $S_P$ via the fluctuation-dissipation theorem.

Interestingly enough, we can estimate the amplitude and phase noise a beam of light picks up when travelling through a material with known index of refraction or absorption spectrum.

The optical power is $P=\hbar\omega\,\Phi$. The medium absorbs with coefficient $\alpha(\omega)$ over length $L$, so the excess intensity fluctuations from thermal polarization noise follow from the fluctuation-dissipation theorem:

$$S_{I,\mathrm{add}}(\Omega) \approx 2\,\hbar\omega\,P \times \frac{k_B T}{\hbar\omega}\,\alpha(\omega)\,L = 2\,P\,k_B T\,\alpha(\omega)\,L.$$

Shot noise is

$$S_{I,\mathrm{shot}} = 2\,\hbar\omega\,P.$$

Therefore

$$\frac{S_{I,\mathrm{add}}}{S_{I,\mathrm{shot}}} = \frac{k_B T}{\hbar\omega}\,\alpha(\omega)\,L.$$

In terms of SNR this becomes:

$$\mathrm{SNR} = \frac{A}{S_{I,\mathrm{shot}} + S_{I,\mathrm{add}}} = \frac{A}{S_{I,\mathrm{shot}}\bigl(1 + \frac{S_{I,\mathrm{add}}}{S_{I,\mathrm{shot}}}\bigr)} = \frac{A}{S_{I,\mathrm{shot}}\bigl(1 + \frac{k_B T}{\hbar\omega}\,\alpha L\bigr)}.$$

Thermal refractive-index fluctuations over length $L$ also introduce phase noise. With wavevector $k_0 = \frac{\omega n}{c}$, the fluctuation-dissipation theorem yields

$$S_{\phi,\mathrm{add}}(\Omega) \approx (k_0 L)^2\,\frac{k_B T}{\hbar\omega}\,\alpha(\omega)\,L.$$

Shot-noise-limited phase diffusion for a coherent beam of flux $\Phi = \frac{P}{\hbar\omega}$ is

$$S_{\phi,\mathrm{shot}} = \frac{1}{2\Phi} = \frac{\hbar\omega}{2P}.$$

Thus

$$\frac{S_{\phi,\mathrm{add}}}{S_{\phi,\mathrm{shot}}} = 2\,\frac{P}{\hbar\omega}\,(k_0 L)^2\,\frac{k_B T}{\hbar\omega}\,\alpha(\omega)\,L.$$

Both of these effects are usually minuscule fractions of the shot noise floor, but interesting to know about.
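Plugging in illustrative numbers (assumed values: 1550 nm light, 1 km of 0.2 dB/km fiber, room temperature; not taken from any reference) shows how small the added intensity noise is relative to shot noise:

```python
import numpy as np

hbar, kB, c = 1.054571817e-34, 1.380649e-23, 2.998e8
T, lam, L = 300.0, 1550e-9, 1000.0        # assumed: 300 K, 1550 nm, 1 km
omega = 2 * np.pi * c / lam
alpha = 0.2 * np.log(10) / 10 / 1000      # 0.2 dB/km converted to 1/m

intensity_ratio = kB * T / (hbar * omega) * alpha * L    # S_I,add / S_I,shot
print(f"S_I,add / S_I,shot = {intensity_ratio:.1e}")
assert intensity_ratio < 0.01             # well below the shot-noise floor
```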

c. Acoustic (sound) waves in fluids or solids

The complex bulk modulus (or sound-attenuation coefficient) has real and imaginary parts related by Kramers–Kronig, and the attenuation spectrum is set by equilibrium pressure fluctuations:

$$\mathrm{Im}\,K(\omega) = \tanh\left(\tfrac{\beta\omega}{2}\right) S_p(\omega), \qquad S_p(\omega) = \coth\left(\tfrac{\beta\omega}{2}\right) \mathrm{Im}\,K(\omega)$$

d. Electronic transport in conductors

The complex conductivity $\sigma(\omega)$ obeys Kramers–Kronig, and $\mathrm{Re}\,\sigma(\omega)$ is given by the current noise via the Johnson–Nyquist relation:

$$\mathrm{Re}\,\sigma(\omega) = \tanh\left(\tfrac{\beta\omega}{2}\right) S_J(\omega), \qquad S_J(\omega) = \coth\left(\tfrac{\beta\omega}{2}\right) \mathrm{Re}\,\sigma(\omega)$$
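In the classical limit $\beta\hbar\omega \ll 1$, $\coth(\beta\hbar\omega/2) \to 2/(\beta\hbar\omega)$ and the relation reduces to white Johnson–Nyquist noise; e.g. for an assumed 50 Ω resistor at 300 K over a 1 MHz bandwidth:

```python
import numpy as np

kB = 1.380649e-23
T, R, bandwidth = 300.0, 50.0, 1e6        # illustrative values
v_rms = np.sqrt(4 * kB * T * R * bandwidth)   # Johnson-Nyquist voltage noise
print(f"{v_rms * 1e9:.0f} nV rms")        # just under a microvolt
```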

Usually, the world is split into system and environment, where $\Delta U_{\mathrm{system}} + \Delta U_{\mathrm{environment}} = 0$.

However, at small scales interactions between system and environment become non-negligible, and it becomes necessary to include the solvation (mean-force) potential $\phi$. In this case, the Jarzynski equality becomes:

$$\bigl\langle e^{-\beta W + \beta[\phi(x_t,\lambda_t)-\phi(x_0,\lambda_0)]}\bigr\rangle = e^{-\beta\,\Delta F^*}$$

Here, $\Delta F^*$ is the free-energy change of the Hamiltonian of mean force.

  • Bochkov–Kuzovlev equality

$$\bigl\langle e^{-\beta W_{\mathrm{ex}}}\bigr\rangle = 1$$

This shows that the exponential average of the “exclusive” work done by an external force (with no change in potential) equals unity.

  • Evans–Searles fluctuation theorem

$$\frac{P(\Sigma_t = A)}{P(\Sigma_t = -A)} = e^{A}$$

It states that over a finite time $t$, the probability of observing entropy production $+A$ vs. $-A$ is exponentially biased, by a factor $e^{A}$.

  • Gallavotti–Cohen steady-state fluctuation theorem

$$\lim_{t\to\infty}\frac{1}{t}\ln\frac{P(\Sigma_t = A)}{P(\Sigma_t = -A)} = A$$

In a nonequilibrium steady state, the long-time scaled log-ratio of entropy fluctuations equals the production itself.

  • Kurchan fluctuation theorem

$$\frac{P(\Sigma_t = A)}{P(\Sigma_t = -A)} = e^{A}$$

An extension of transient FTs to thermostatted steady states, showing the same exponential symmetry for entropy production.

  • Lebowitz–Spohn fluctuation theorem

$$\frac{P(\omega)}{P(\Theta \omega)} = e^{\Sigma(\omega)}$$

The ratio of the probability of a trajectory $\omega$ to that of its time-reversal $\Theta\omega$ is the exponential of its total entropy production.

  • Hummer–Szabo relation

$$\bigl\langle \delta(x - x_t)\,e^{-\beta W}\bigr\rangle = \frac{e^{-\beta F(x)}}{Z_0}$$

Enables reconstruction of equilibrium free-energy profiles $F(x)$ from ensembles of non-equilibrium work measurements.

  • Seifert integral fluctuation theorem

$$\bigl\langle e^{-\Delta s_{\mathrm{tot}}}\bigr\rangle = 1$$

States that for any stochastic process, the exponential average of the total entropy production equals unity.

Complexity vs. Entropy

Take some system with a phase transition, like the 2d Ising model. As temperature increases, entropy monotonically increases. However, in a way both the $T=0$ and $T=\infty$ limits are equally simple: the system is homogeneous in both cases and every cell is like every other cell.

In between, however, there is nontrivial structure and more "information" contained in the lattice, which is maximized at the critical point. It seems like this quantity might in general be proportional to $\frac{\partial S}{\partial T} = \frac{C(T)}{T}$.

Some keywords to pick this up later:

  • Statistical Complexity (Crutchfield, Shalizi, et al.)
  • Excess Entropy (This is the deviation in entropy from an ideal gas, i.e. a fully decoupled system. Any correlations should show up here!)
  • There's some notion of susceptibility here, i.e. the higher the heat capacity the more energy is needed to displace the system from its current state. At the critical point the system seems to be able to "withstand" the most energy for a given change in $T$ and structure? Not sure how to express this more rigorously. Something like $\int_{\text{phase space}} \rho(T)\,\rho(T + \epsilon)\,dV$ (overlap of the original and perturbed densities of states) is maximized there? Maybe some kind of deviation metric is useful here, say Ruppeiner geometry? Is this the same as computational (ir)reducibility?

Coordinate remapping

Take a Smoluchowski equation with some $V(x)$ and stationary distribution $P(x)$, and apply a nonlinear map $x\to y$ such that $P(y)\propto e^{-V_{\mathrm{eff}}(y)/D}$ is Gaussian. After re-expressing the entropy in the new set of coordinates

$$S_y = -\int P(y)\ln P(y)\,dy$$

then defining

$$V_{\mathrm{eff}}(y) = -D\ln P(y) + \mathrm{const}$$

the new potential now takes the form

$$V_{\mathrm{eff}}(y) \propto y^2$$

a potential consistent with a Gaussian distribution.

The Jacobian term $-D\ln\left|\frac{dx}{dy}\right|$ is absorbed into the coordinates $dy(dx)$, and entropy and energy have been repartitioned. This transformation preserves the free energy and the partition function (which roughly counts the number of accessible states). Explicitly,

$$P(y) = P_x(x(y))\left|\frac{dx}{dy}\right|.$$
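The change-of-variables formula can be checked by histogramming samples under a nonlinear map; here (an arbitrary example) $y = \tanh(x)$ with $x \sim \mathcal N(0,1)$, so $|dx/dy| = 1/(1-y^2)$:

```python
import numpy as np

rng = np.random.default_rng(2)
xs = rng.normal(size=500_000)
ys = np.tanh(xs)

hist, edges = np.histogram(ys, bins=50, range=(-0.99, 0.99), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])

x_of_y = np.arctanh(centers)
P_x = np.exp(-x_of_y**2 / 2) / np.sqrt(2 * np.pi)
P_y = P_x / (1 - centers**2)              # P(y) = P_x(x(y)) |dx/dy|
assert np.allclose(hist, P_y, rtol=0.1, atol=0.02)
```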

Functional optimization

Smoluchowski dynamics are a gradient flow of the free-energy functional

$$\mathcal F[P] = \int V(x)\,P(x)\,dx + k_B T\int P(x)\ln P(x)\,dx,$$

so $\dot{\mathcal F}\le 0$, and its unique stationary solution $P\propto e^{-V/(k_B T)}$ is the minimizer of $\mathcal F$ under $\int P = 1$.

In the case of the Schrödinger equation there is no dissipation (evolution is unitary). However, applying a Wick rotation to imaginary time turns the Schrödinger equation into a diffusion equation:

$$-\hbar\,\partial_\tau\psi = \widehat H\,\psi$$

which is the gradient flow of

$$\mathcal E[\psi] = \langle\psi|\widehat H|\psi\rangle = \int\psi^*\Bigl(-\tfrac{\hbar^2}{2m}\nabla^2+V(x)\Bigr)\psi\,dx.$$

Under the Wick rotation, the higher eigenmodes of $\psi$ decay in the same way as those in diffusion:

  • Smoluchowski: $P(x,t)=P_{\mathrm{eq}}+\sum c_i\phi_i(x)e^{-\lambda_i t}$.
  • Imaginary-time Schrödinger: $\psi(x,\tau)=\sum a_n\psi_n(x)e^{-E_n\tau/\hbar}$.

In each case only the lowest eigenmode (the stationary distribution) survives as $t,\tau\to\infty$, driving the system to the minimum of the corresponding functional.
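A minimal sketch of this decay: explicit-Euler imaginary-time propagation of the 1D harmonic oscillator ($\hbar = m = \omega = 1$, grid and step sizes chosen for stability), relaxing an arbitrary even start state to the Gaussian ground state with $E_0 = 1/2$:

```python
import numpy as np

x = np.linspace(-8, 8, 256)
dx = x[1] - x[0]
V = 0.5 * x**2
psi = np.exp(-np.abs(x))                  # arbitrary start with ground-state overlap
dtau = 1e-3

def H(phi):                               # H phi = -phi''/2 + V phi (periodic stencil)
    lap = (np.roll(phi, 1) - 2 * phi + np.roll(phi, -1)) / dx**2
    return -0.5 * lap + V * phi

for _ in range(20_000):
    psi -= dtau * H(psi)                  # d psi / d tau = -H psi
    psi /= np.sqrt(np.sum(psi**2) * dx)   # keep normalized

E0 = np.sum(psi * H(psi)) * dx
assert abs(E0 - 0.5) < 1e-2               # converged to the ground-state energy
```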

Schrödinger kinetic energy as diffusion term

The $-\nabla^2$ kinetic energy operator in the Schrödinger equation penalizes curvature/gradients, driving a diffusion-like spread. This can be formalized by applying the Madelung transform $\psi=\sqrt{\rho}\,e^{iS/\hbar}$ with $\rho = \Psi^* \Psi$. For real $\psi$ (as is the case for a stationary ground-state distribution, among others), the kinetic term becomes the Weizsäcker functional

$$T_W[\rho] = \frac{\hbar^2}{8m}\int\frac{|\nabla\rho|^2}{\rho}\,dx,$$

which is precisely proportional to the Fisher information of $\rho$. Thus the ground state can be seen as minimizing

$$\mathcal F[\rho] = \underbrace{\tfrac{\hbar^2}{8m}\int\frac{|\nabla\rho|^2}{\rho}}_{\text{“entropy”}} + \int V(x)\,\rho(x)\,dx.$$

The Fisher information is built from the score: $(\partial_x\ln\rho)^2=(\partial_x \rho)^{2}/\rho^{2}$, so

$$I[\rho] = \int\rho\,(\partial_x\ln\rho)^2\,dx = \int\frac{(\partial_x \rho)^2}{\rho}\,dx, \qquad \rho=\psi^2.$$

Therefore the Weizsäcker kinetic energy for $\psi=\sqrt{\rho}$ is

$$T_W[\rho] = \frac{\hbar^2}{2m}\int|\nabla\psi|^2\,dx = \frac{\hbar^2}{8m}\,I[\rho],$$

so up to a constant factor the kinetic term is the Fisher information.
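Numerically, for a Gaussian $\rho$ with variance $\sigma^2$ the Fisher information is $1/\sigma^2$, and $\int|\psi'|^2\,dx = I[\rho]/4$ for $\psi = \sqrt{\rho}$:

```python
import numpy as np

sigma = 1.7
x = np.linspace(-12, 12, 4001)
dx = x[1] - x[0]
rho = np.exp(-x**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)

# Fisher information I[rho] = int rho'(x)^2 / rho(x) dx = 1/sigma^2 for a Gaussian
I = np.sum(np.gradient(rho, dx)**2 / rho) * dx
assert np.isclose(I, 1 / sigma**2, rtol=1e-3)

# kinetic integrand for psi = sqrt(rho): int |psi'|^2 dx = I/4
psi = np.sqrt(rho)
grad_psi_sq = np.sum(np.gradient(psi, dx)**2) * dx
assert np.isclose(grad_psi_sq, I / 4, rtol=1e-3)
```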

Some properties and intuition on Fisher information:

  1. Score-function sensitivity: $\partial_x\ln\rho$ measures how sensitive the log-likelihood is to shifts; averaging its square under $\rho$ gives the total information about location.
  2. Cramér–Rao bound: $\mathrm{Var}(\hat\theta)\ge 1/I[\rho]$.
  3. Geometric curvature: penalizes steep features in $\rho$.
  4. Infinitesimal KL shift: $\mathrm{KL}[\rho\,\|\,\rho(\cdot+\varepsilon)]\approx\tfrac12 I[\rho]\,\varepsilon^2$. Similar to the earlier question about entropy susceptibility at the critical point?
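Point 4 can be checked numerically for a non-Gaussian density, e.g. $\rho \propto e^{-x^4}$ (chosen arbitrarily):

```python
import numpy as np

x = np.linspace(-3, 3, 6001)
dx = x[1] - x[0]
rho = np.exp(-x**4)
rho /= rho.sum() * dx                     # normalize on the grid

I = np.sum(np.gradient(rho, dx)**2 / rho) * dx    # Fisher information

eps = 0.01
rho_s = np.exp(-(x + eps)**4)             # shifted density
rho_s /= rho_s.sum() * dx
kl = np.sum(rho * np.log(rho / rho_s)) * dx
assert np.isclose(kl, 0.5 * I * eps**2, rtol=1e-2)   # KL ~ (1/2) I eps^2
```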

Coordinate transform in the Schrödinger equation

Analogous to the Smoluchowski equation: if we remap $x\to y=f(x)$ and $\psi\to\phi(y)=\sqrt{\frac{dx}{dy}}\,\psi(x(y))$, the total energy expectation

$$\langle H\rangle = \int\psi^*\bigl(-\tfrac{\hbar^2}{2m}\partial_x^2+V(x)\bigr)\psi\,dx = \int\phi^*\bigl(-\tfrac{\hbar^2}{2m}\partial_y^2+\widetilde V(y)\bigr)\phi\,dy$$

is invariant, but an extra Jacobian term $Q_J(y)$ is generated in the potential. The invariants here are energy (Smoluchowski: free energy) and normalization (Smoluchowski: partition function). This is interesting because it means there is nothing special about quantum mechanics in the $x$, $p$ coordinates. The Stone–von Neumann theorem guarantees that any linear canonical change $(x,p)\to(q,P)$ leads to a unitarily equivalent representation. Specifically, a pair of self-adjoint operators $(\hat Q,\hat P)$ on a separable Hilbert space is canonical if

$$[\hat Q,\hat P] = i\hbar\,\mathbf{1}$$

(or equivalently their exponentials satisfy the Weyl relations).

In that case Stone–von Neumann tells you there is, up to unitary equivalence, exactly one irreducible representation of those relations, and you get equivalent wave functions in $Q$-space or in $P$-space.
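A small illustration of that equivalence: computing $\langle p^2\rangle$ for the same state in the position representation (via derivatives) and in the momentum representation (via FFT) gives the same number ($\hbar = 1$; the state is an arbitrary example):

```python
import numpy as np

N = 1024
x = np.linspace(-20, 20, N, endpoint=False)
dx = x[1] - x[0]
psi = (1 + 0.3 * x) * np.exp(-x**2 / 2)   # arbitrary mixed state
psi /= np.sqrt(np.sum(np.abs(psi)**2) * dx)

# <p^2> in x-space: int |dpsi/dx|^2 dx
p2_x = np.sum(np.abs(np.gradient(psi, dx))**2) * dx

# <p^2> in p-space: int k^2 |phi(k)|^2 dk, with phi the Fourier transform of psi
k = 2 * np.pi * np.fft.fftfreq(N, d=dx)
phi = np.fft.fft(psi) * dx / np.sqrt(2 * np.pi)
p2_k = np.sum(k**2 * np.abs(phi)**2) * (k[1] - k[0])
assert np.isclose(p2_x, p2_k, rtol=1e-2)
```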

I initially thought any set of variables with non-zero commutator might do the trick as long as it forms a basis for the symplectic structure of phase space, but it turns out there are more stringent conditions in QM. The commutator must be a multiple of $i\hbar$ in order to generate the canonical Heisenberg algebra. I don't understand what that means, maybe look into it later.

Nonlinear canonical transformations are more complicated but analogous to the Smoluchowski case: a nonlinear coordinate transformation introduces Jacobian terms that end up in one of two places. Assuming a complete basis set of wavefunctions, the coordinate transformation can be expressed as a unitary transform of the wavefunction: $\Phi(Q)\,dQ = U\,\Psi(x(Q))\,dx$, with a unitary transformation $U$ between the new and old forms of the wavefunction in the new and old coordinates. Due to unitarity this does not change the normalization of the wavefunction. Now the question is whether to include this change of coordinates in the operators and keep the old form of the wavefunction, or to keep the old form of the operators and include the change in the wavefunction. This difference seems to map nicely onto the QM Schrödinger vs. Heisenberg picture, as a more general, time-independent version of it (since the regular formulation talks about changes over time):

Schrödinger‐picture coordinate change:

  • Move state into the new coordinates:

    $$\phi(y)=\bigl[U\,\psi\bigr](y) = \sqrt{\frac{dx}{dy}}\;\psi\bigl(x(y)\bigr),$$

  • Carry your Hamiltonian and all other operators along via

    $$H_y = U\,H_x\,U^{-1}, \qquad A_y = U\,A_x\,U^{-1}.$$

  • Then expectation values are $\langle\psi|A_x|\psi\rangle = \langle\phi|A_y|\phi\rangle$.

Heisenberg‐picture coordinate change

  • Keep the wavefunction $\psi(x)$ unchanged in the original Hilbert space,

  • transform every operator into changed coordinates:

    $$\widetilde H = U^{-1}\,H_x\,U, \qquad \widetilde A = U^{-1}\,A_x\,U.$$

  • States live in the old $x$-space, but the operators carry all the Jacobian/ordering corrections.

  • Expectation values stay invariant: $\langle\psi|A_x|\psi\rangle = \langle\psi|\widetilde A|\psi\rangle$.

This may be a good analogy in general for these coordinate transforms?

As discussed in entropic gravity, one can find a set of coordinates in which the distribution $\rho$ becomes uniform, or any other shape, e.g. using the cumulative-distribution map

$$y = F(x)=\int_{x_{\min}}^x\rho_0(x')\,dx', \qquad \phi(y)=\frac{\psi_0(x(y))}{\sqrt{\rho_0(x(y))}} = 1.$$

The external $V$ is absorbed into a Jacobian quantum potential $Q_J(y)$, and you can choose to view all of $E_0$ as coming from the Fisher (kinetic) term in the $y$-frame.
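A sketch of the flattening map: pushing samples of a density through its own CDF $F$ yields a uniform density in $y$ (a standard normal is chosen purely as an example):

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, 100_000)
F = np.vectorize(lambda t: 0.5 * (1 + erf(t / sqrt(2))))   # normal CDF
y = F(x)

# the mapped samples are uniform on [0, 1] up to sampling noise
hist, _ = np.histogram(y, bins=20, range=(0.0, 1.0), density=True)
assert np.allclose(hist, 1.0, atol=0.1)
```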

However, since the energy expectation values are preserved (up to a common shift) the higher order modes do not in general transform to match the higher order modes of the new system. For instance, transforming some system into one where the

Entropy as Information Flow

There seems to be a common theme that in systems with dissipation some kind of functional is optimized for in the ground/stationary state. This is not true for unitary evolution, hence the Wick-rotation for the Schrodinger equation is necessary.

I think a good way to think about this is that from the perspective of a linear Markovian process (e.g. a random walk) there is some kernel that reallocates state density at each time step. The question is what the fixed point of repeated convolution of this kernel with itself is.

For most kernels the fixed point of auto-convolution is a Gaussian, and this is probably enough to qualify as dissipation.

If the kernel is a delta function or a permutation this is not true; the density must be split across at least two states (on average; a 50% chance of staying fully and 50% of transferring fully might also be OK). More rigorously, what's needed is an irreducible (for long times all parts of the state space communicate), aperiodic, stochastic convolution kernel $K$ (or continuous-time master equation); such a kernel has a unique stationary $P_{\mathrm{eq}}$ that maximizes entropy (or minimizes free energy), with the KL divergence $D_{KL}[P\|P_{\mathrm{eq}}]$ as a Lyapunov functional.
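The convergence of repeated auto-convolution to a Gaussian (for a generic, aperiodic kernel) is just the central limit theorem; a four-point step kernel chosen arbitrarily:

```python
import numpy as np

kernel = np.array([0.5, 0.0, 0.2, 0.3])   # arbitrary aperiodic step distribution
dist = kernel.copy()
for _ in range(200):
    dist = np.convolve(dist, kernel)      # repeated auto-convolution

# compare to a Gaussian with matched mean and variance
n = np.arange(len(dist))
mu = (n * dist).sum()
var = ((n - mu)**2 * dist).sum()
gauss = np.exp(-(n - mu)**2 / (2 * var)) / np.sqrt(2 * np.pi * var)
assert np.abs(dist - gauss).sum() < 0.02  # small L1 distance to the Gaussian
```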

There's also a neat connection between entropy production rate and entropy. For a Smoluchowski equation:

$$\partial_t P = \partial_x\bigl(\partial_x V\, P + D\,\partial_x P\bigr), \qquad P_{\mathrm{eq}}\propto e^{-V/D},$$

the relative entropy

$$D_{KL}[P\|P_{\mathrm{eq}}]=\int P\ln\frac{P}{P_{\mathrm{eq}}}\,dx$$

satisfies

$$\frac{dD_{KL}}{dt} = -D\int P\,\Bigl(\partial_x \ln\frac{P}{P_{\mathrm{eq}}}\Bigr)^2 dx,$$

which for free diffusion ($V = 0$) reduces to $-D\,I[P]$, so $\dot D_{KL}\le 0$. Here

$$I[P]=\int\frac{(\partial_x P)^2}{P}\,dx$$

is the Fisher information.
Thus the H-theorem is the statement that the KL divergence between the current distribution and equilibrium decays monotonically, with a rate proportional to the Fisher information.
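A closed-form check in the simplest case (OU / Smoluchowski with $V = x^2/2$, $D = 1$, zero-mean Gaussian $P$ with variance $s$, so $P_{\mathrm{eq}} = \mathcal N(0,1)$): along the variance flow $\dot s = 2(1-s)$, $\dot D_{KL}$ equals minus the relative Fisher information $\int P\,(\partial_x\ln(P/P_{\mathrm{eq}}))^2\,dx$, which reduces to the plain $I[P]$ for free diffusion:

```python
import numpy as np

def D_KL(s):                  # KL( N(0,s) || N(0,1) ), closed form
    return 0.5 * (s - 1 - np.log(s))

def I_rel(s):                 # relative Fisher information for the same pair
    return (s - 1)**2 / s

s, dt = 4.0, 1e-5
s_next = s + 2 * (1 - s) * dt             # variance flow ds/dt = 2(1 - s)
dDdt = (D_KL(s_next) - D_KL(s)) / dt      # numerical derivative along the flow
assert np.isclose(dDdt, -I_rel(s), rtol=1e-3)
```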

What's also cool: given the current non-equilibrium distribution, one can calculate the minimum entropy production rate some other process must expend in order to maintain that distribution, based on the Harada–Sasa relation. Maintaining the distribution requires at least

$$\sigma_{\min} = k_B \, D \, I[P]$$

as the entropy production rate to offset the relaxation (per particle if $P$ is a probability; if $P$ is a particle density then $I[P]$ scales with $N$ and $\sigma_{\mathrm{tot}} = N \sigma_{\min}$).

Ideal gas vs. Smoluchowski

While both diffuse, there's an interesting difference in how these systems behave. Above I wrote about repartitioning of energy and entropy in the Smoluchowski equation. By changing coordinates the distribution can be flattened, so the system is potential-free and the energy expectation value is 0. In an ideal gas in a box, the distribution is also homogeneous. However, the ideal gas still has a non-zero internal energy expectation value, namely $\langle E \rangle = \frac{3}{2} k_B T$, or more generally $\langle E \rangle = \frac{3 + f}{2} k_B T$ if there are $f$ additional internal degrees of freedom, due to equipartition. So why does one of them have a "residual" internal energy, while the other doesn't?

What is going on here is that in Smoluchowski we kill off any kinetic energy contributions. The Smoluchowski equation is an overdamped limit where momentum immediately dissipates, independently of position or potential. On the other hand, in a system that behaves like an ideal gas the mean free path is very long (ballistic limit), so the distribution spreads over both $x$ and $p$ in phase space.

The Smoluchowski equation is the overdamped limit of the more general Kramers equation, which supports full phase-space dynamics:

$$\partial_t f(x,p,t) = -\frac{p}{m}\,\partial_x f(x,p,t) + \partial_x V\,\partial_p f(x,p,t) + \gamma\,\partial_p\bigl(p\, f(x,p,t) + m k_B T\,\partial_p f(x,p,t)\bigr),$$

where momentum decays exponentially with damping rate $\gamma$. Here both the kinetic energy $\tfrac{p^2}{2m}$ and the potential energy $V(x)$ appear, and the stationary distribution is a Maxwell–Boltzmann distribution over all of phase space, including the potential:

$$f_{\mathrm{eq}}(x,p)\propto e^{-\frac{p^2}{2 m k_B T} - \frac{V(x)}{k_B T}}$$

The corresponding free energy functional is

$$\mathcal F[f] = \int \Bigl(\frac{p^2}{2m}+V(x)\Bigr)\,f(x,p,t) + k_B T\, f(x,p,t)\,\ln f(x,p,t) \; dx\,dp$$

and contains a Gibbs-Shannon entropy integrated over all of phase space.

How exactly the system behaves next depends on the ensemble.

In an isolated system with fixed total energy (microcanonical ensemble), the local kinetic energy can vary as a function of $V(x)$. This is actually a cool example where temperature gradients can develop spontaneously (albeit transiently; in the end they even out) without breaking the second law, due to the entropy increase from occupying additional space. In this case the marginal distributions do not factorize in general. For large $N$, far away from phase transitions (plus some additional qualifications), the final stationary distribution approaches:

$$f_{\mathrm{eq}}(x,p) \propto e^{-\frac{H}{k_B T}} = e^{-\frac{p^2}{2 m k_B T} - \frac{V(x)}{k_B T}}.$$

At that point the momentum distribution is independent of position and only depends on the temperature of the system.

A single, uniform heat bath at temperature $T$ instantaneously rethermalizes the kinetic degrees of freedom, so the temperature is uniform and every part of the system rapidly ($t \sim 1/\gamma$) decays to the same Maxwell–Boltzmann distribution. The distribution generally factorizes because of this, and is the same as the large-$N$ limit of the microcanonical ensemble but without the qualifications:

$$f_{\mathrm{eq}}(x,p) \propto e^{-\frac{H}{k_B T}} = e^{-\frac{p^2}{2 m k_B T} - \frac{V(x)}{k_B T}}.$$
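A quick underdamped Langevin sketch ($m = k_B T = 1$, $V = x^2/2$, Euler–Maruyama with assumed parameters): after many damping times, both marginals match this Maxwell–Boltzmann form:

```python
import numpy as np

rng = np.random.default_rng(3)
gamma, dt, n = 0.5, 1e-2, 10_000          # assumed damping, step, ensemble size
x = np.zeros(n)
p = np.zeros(n)
for _ in range(5_000):                    # total time t = 50 >> 1/gamma
    x += p * dt
    p += (-x - gamma * p) * dt + np.sqrt(2 * gamma * dt) * rng.normal(size=n)

assert abs(p.var() - 1.0) < 0.05          # Maxwell-Boltzmann momentum marginal
assert abs(x.var() - 1.0) < 0.05          # Boltzmann position marginal
```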

The gradient flow for this diffusion in momentum space can be written as:

$$\frac{d\mathcal F}{dt} = -\gamma\,k_B T\int \frac{\bigl(\partial_p f(x,p,t)\bigr)^2}{f}\, dx\,dp = -\gamma\,k_B T\; I_p[f(x,p,t)] \le 0$$

This means $\mathcal F$ decays to its unique minimum, the Maxwell–Boltzmann equilibrium. The rate of dissipation of non-equilibrium modes (and the rate of entropy production) can again be expressed through the Fisher information. However, note that this is only the Fisher information of the momentum part of the distribution. There is no $\partial_x f$ term because there is no diffusion $\partial_x^2 f$ in the Kramers equation.
This fact appears strange at first, since an ideal gas clearly spreads out to a uniform distribution.

In fact, momentum carries you in $x$ via the advection term $-\tfrac{p}{m}\,\partial_x f$. In the overdamped Smoluchowski limit $\gamma\to\infty$:

$$\partial_t P = \partial_x\Bigl(\frac{1}{\gamma}\,\partial_x V\, P + \frac{k_B T}{m\gamma}\,\partial_x P\Bigr)$$

where $D = \frac{k_B T}{m \gamma}$ (Einstein relation).

This expression has the explicit $\partial_x^2 P$ diffusion term.

In the underdamped case, there is deterministic transport instead (imagine a beam of gas propagating in a vacuum):

$$-\tfrac{p}{m}\,\partial_x f(x,p,t) \quad\text{and}\quad \partial_x V\,\partial_p f(x,p,t)$$

These terms transport probability density ballistically from high to low; coarse-grained, this can look like an effective diffusion, but it is not limited to that.