Entropic Gravity Part 1: Addendum

March 20, 2026

In part 1 on this topic I showed this plot:

It's easy enough to reparametrize the radial coordinate in a way that flattens out the distribution. What is less obvious is what then happens to the angular distributions of the individual chain links. In the plot shown, the distribution of the total chain length arises from a sum over 10,000 randomly oriented unit links. For each link, the angle it makes with the previous one is sampled from a uniform distribution.
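
For concreteness, here is a minimal Python sketch of the kind of simulation behind that plot: $N=10{,}000$ unit links with uniformly sampled orientations, summed to get the endpoint. (The variable names and the number of chains are my own choices, not taken from the original code.)

```python
import numpy as np

rng = np.random.default_rng(0)
N = 10_000          # links per chain
n_chains = 1_000    # independent chains to histogram

# Sample every link orientation uniformly on [0, 2*pi).
theta = rng.uniform(0.0, 2.0 * np.pi, size=(n_chains, N))

# Chain endpoint: sum of the unit vectors (cos(theta_i), sin(theta_i)).
x = np.cos(theta).sum(axis=1)
y = np.sin(theta).sum(axis=1)

# End-to-end distance r = |X_N|, the quantity whose histogram was plotted.
r = np.hypot(x, y)

# Sanity check against the large-N Rayleigh form p(r) ~ (2r/N) exp(-r^2/N).
print("sample mean of r :", r.mean())
print("Rayleigh mean    :", np.sqrt(np.pi * N) / 2)
```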

On the other hand, reparametrizing the radial coordinate makes the total length uniformly distributed. As a consequence, the uniform angular distribution of the links must become distorted and non-uniform.

This fact, as we also saw in the previous post, goes to the heart of what randomness is. When there are several parameters, there are multiple choices for which parameter to make "random": in our case, it can be either the total length distribution or the angular distribution of the individual links. This is a great example of Bertrand's paradox.

So, let's derive the angular distributions of the chain links. 

Wormlike-chain uniformized

The endpoint of link $m$ is located at

$$X_m = \sum_{i=1}^{m} (\cos \theta_i,\sin \theta_i), \qquad m=1,\dots,N,$$

with $N=10{,}000$ in our case, and the radial distribution is

$$\rho_m = |X_m|.$$

The full chain endpoint is $X_N$, with distance from the origin, as plotted above, $r=\rho_N$.

Coordinate system A1

In coordinate system A1, the link angles are sampled uniformly,

$$\theta_i \sim \mathrm{Unif}[0,2\pi).$$

The resulting radial distribution is not exactly Rayleigh, because the endpoint cannot lie farther than $N$. The exact finite-$N$ law is the 2D Pearson random-walk law:

$$p_m^{(A1)}(\rho) = \rho \int_0^\infty J_0(\rho t)\,J_0(t)^m\,t\,dt, \qquad 0\le \rho \le m.$$

So for the full chain we have

$$p_N^{(A1)}(r) = r \int_0^\infty J_0(rt)\,J_0(t)^N\,t\,dt, \qquad 0\le r \le N.$$

For large $m$, this is extremely well approximated by the usual Rayleigh form

$$p_m^{(A1)}(\rho)\approx \frac{2\rho}{m}e^{-\rho^2/m},$$

but the exact expression above is the right one if we want the support to end at $\rho=m$, as we otherwise introduce avoidable artifacts.
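
As a quick numerical cross-check, here is a sketch that evaluates the Pearson integral with naive quadrature and compares it to the Rayleigh form (the cutoff `t_max` and the choice $m=16$ are illustrative; the integrand decays fast enough for moderate $m$ that this simple treatment suffices):

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import j0

def pearson_pdf(rho, m, t_max=50.0):
    """Exact 2D Pearson random-walk radial density for m unit steps."""
    integrand = lambda t: j0(rho * t) * j0(t) ** m * t
    # |J0(t)|**m falls off like t**(-m/2), so truncating the integral is safe
    # for moderate m; this is naive quadrature, not a production scheme.
    val, _ = quad(integrand, 0.0, t_max, limit=500)
    return rho * val

def rayleigh_pdf(rho, m):
    """Large-m Rayleigh approximation of the same density."""
    return (2.0 * rho / m) * np.exp(-(rho ** 2) / m)

m = 16
for rho in (1.0, 4.0, 8.0):
    print(rho, pearson_pdf(rho, m), rayleigh_pdf(rho, m))
```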

Because the walk is isotropic, the polar angle of $X_m$ is uniform. So the position distribution of the endpoint of link $m$ is

$$P_m^{(A1)}(\rho,\varphi)=\frac{1}{2\pi}p_m^{(A1)}(\rho).$$

The azimuth is flat, while the radius follows the Pearson law appropriate to $m$ steps.

Flattening the chain length distribution

To uniformize the endpoint radius of the full chain, the correct transformation is the cumulative distribution function

$$u = F_N(r) = \int_0^r p_N^{(A1)}(s)\,ds.$$

This maps the interval $[0,N]$ to $[0,1]$, and by construction makes $u$ uniformly distributed.

Using the Bessel identity $\frac{d}{dr}\big(rJ_1(rt)\big)=rt\,J_0(rt)$, the same CDF can be written as

$$F_N(r) = r\int_0^\infty J_1(rt)\,J_0(t)^N\,dt.$$

This is the finite-support replacement for the large-$N$ approximation

$$F_N(r)\approx 1-e^{-r^2/N}.$$
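
To see the flattening in action, here is a small sketch that maps the sampled endpoint radii through the CDF and checks that the result is close to uniform. For simplicity it uses the large-$N$ Rayleigh approximation of $F_N$ rather than the exact Bessel form, which is accurate enough at $N=10{,}000$:

```python
import numpy as np

rng = np.random.default_rng(1)
N, n_chains = 10_000, 1_000

# Sample chains exactly as before and record the endpoint radius r.
theta = rng.uniform(0.0, 2.0 * np.pi, size=(n_chains, N))
r = np.hypot(np.cos(theta).sum(axis=1), np.sin(theta).sum(axis=1))

# Apply the (approximate) CDF of the endpoint radius.
u = 1.0 - np.exp(-(r ** 2) / N)

# If the flattening works, u should be close to Unif[0, 1]:
hist, _ = np.histogram(u, bins=10, range=(0.0, 1.0), density=True)
print(np.round(hist, 2))   # every bin should be close to 1.0
```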

To define a single coordinate change for the whole chain, not just for the final endpoint, we apply the same radial map to every point in the plane:

$$Y = T(X) = \frac{f(\rho)}{\rho}\,X, \qquad \rho=|X|, \qquad f(\rho)=F_N(\rho).$$

This gives us coordinate system A2.

Coordinate system A2

Under this map, the radial coordinate of every point becomes

$$u = f(\rho)=F_N(\rho),$$

while the polar angle stays the same. So the endpoint of link $m$ is still isotropic in angle, but its radial law changes by the Jacobian:

$$p_m^{(A2)}(u) = \frac{p_m^{(A1)}(\rho(u))}{f'(\rho(u))} = \frac{p_m^{(A1)}(\rho(u))}{p_N^{(A1)}(\rho(u))},$$

where $\rho(u)=F_N^{-1}(u)$.

So the full position distribution of the endpoint of link $m$ in A2 is

$$P_m^{(A2)}(u,\varphi) = \frac{1}{2\pi}p_m^{(A2)}(u) = \frac{1}{2\pi}\frac{p_m^{(A1)}(\rho(u))}{p_N^{(A1)}(\rho(u))}.$$

For $m=N$, this becomes

$$p_N^{(A2)}(u)=1,$$

which was the whole point of the construction.

So the transformation flattens the final endpoint distribution exactly, but it does not flatten the position distribution of every intermediate link endpoint. Each link index $m$ keeps its own nontrivial radial profile.

Angle laws in A1

There are two natural angles one can talk about.

The first is the absolute orientation of a link tangent in the lab frame. In A1 this is just $\theta_i$, and by assumption

$$p^{(A1)}(\theta_i)=\frac{1}{2\pi}.$$

The second is the angle a link makes relative to the local radial direction. If the endpoint of link $m$ sits at polar angle $\varphi_m$, then the next link angle relative to the outward radial direction is

$$\alpha_m = \theta_{m+1}-\varphi_m.$$

Since $\theta_{m+1}$ is sampled independently and uniformly, this is also uniform:

$$p^{(A1)}(\alpha_m\mid \rho_m)=\frac{1}{2\pi}.$$

So in A1 the link tangent does not prefer radial or tangential directions. Locally, every direction is equally likely.

However, a radial coordinate change is not conformal unless it is just a linear rescaling. It stretches radial and tangential directions by different amounts.

Write a small displacement in polar form as

$$dX = d\rho\,\hat{\mathbf r} + \rho\,d\varphi\,\hat{\boldsymbol\varphi}.$$

Under the map $u=f(\rho)$, this becomes

$$dY = f'(\rho)\,d\rho\,\hat{\mathbf r} + f(\rho)\,d\varphi\,\hat{\boldsymbol\varphi}.$$

So radial and tangential directions scale by

$$\lambda_r(\rho)=f'(\rho)=p_N^{(A1)}(\rho), \qquad \lambda_t(\rho)=\frac{f(\rho)}{\rho} = \frac{F_N(\rho)}{\rho}.$$

Now suppose a link in A1 makes angle $\alpha$ relative to the local radial direction. Its transformed angle $\alpha'$ in A2 satisfies

$$\tan \alpha' = \frac{\lambda_t(\rho)}{\lambda_r(\rho)}\tan \alpha = \frac{F_N(\rho)}{\rho\,p_N^{(A1)}(\rho)}\tan \alpha.$$

Define

$$\Lambda(\rho) = \frac{\lambda_t(\rho)}{\lambda_r(\rho)} = \frac{F_N(\rho)}{\rho\,p_N^{(A1)}(\rho)}.$$

Then

$$\tan \alpha'=\Lambda(\rho)\tan \alpha.$$

Since $\alpha$ was uniform in A1, the transformed conditional angle law in A2 is

$$p^{(A2)}(\alpha'\mid \rho) = \frac{1}{2\pi}\,\frac{\Lambda(\rho)}{\Lambda(\rho)^2\cos^2\alpha' + \sin^2\alpha'}.$$

If $\Lambda(\rho)=1$, nothing happens and the angle law stays flat. But generically $\Lambda(\rho)\neq 1$, so the local angle distribution is no longer uniform. Depending on radius, the transformation will favor directions closer to radial or closer to tangential.
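
A quick Monte Carlo check of this conditional law, for an arbitrary fixed value of $\Lambda$ (the value 2.5 below is illustrative only; in the actual construction $\Lambda$ is determined by the radius through $F_N$):

```python
import numpy as np

rng = np.random.default_rng(2)
lam = 2.5   # illustrative fixed value of Lambda(rho)

# Sample alpha uniformly and push it through tan(alpha') = lam * tan(alpha),
# using atan2 to track the branch over the full circle.
alpha = rng.uniform(0.0, 2.0 * np.pi, size=1_000_000)
alpha_p = np.arctan2(lam * np.sin(alpha), np.cos(alpha)) % (2.0 * np.pi)

# Compare the histogram of alpha' with the predicted conditional density.
bins = np.linspace(0.0, 2.0 * np.pi, 65)
hist, edges = np.histogram(alpha_p, bins=bins, density=True)
mid = 0.5 * (edges[:-1] + edges[1:])
pred = lam / (2.0 * np.pi * (lam ** 2 * np.cos(mid) ** 2 + np.sin(mid) ** 2))
print("max abs deviation:", np.max(np.abs(hist - pred)))
```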

The unconditional angle law for link $m+1$ is then obtained by averaging over the position distribution of the endpoint of link $m$:

$$p_m^{(A2)}(\alpha') = \int_0^m p_m^{(A1)}(\rho)\, p^{(A2)}(\alpha'\mid \rho)\,d\rho.$$

Equivalently, in the transformed radial coordinate,

$$p_m^{(A2)}(\alpha') = \int_0^{F_N(m)} p_m^{(A2)}(u)\, p^{(A2)}(\alpha'\mid \rho(u))\,du.$$

So, taken together:

  • in A1, link positions follow the Pearson random-walk law, and link angles are locally uniform;
  • in A2, the final endpoint radius has been flattened, each intermediate link endpoint acquires a transformed radial profile, and the local link-angle law becomes position-dependent and non-uniform.

Running the same logic in the opposite direction

The previous section started from a chain whose link angles were sampled uniformly, and then changed coordinates so that the endpoint radius became uniform.

We can now reverse that logic.

Instead of starting from the angle law, let us start from the coordinate density that we ended up with in A2, and now simply declare that density to be the native one. That gives a second pair of coordinate systems, which we can call B1 and B2.

Coordinate system B1

In B1, the full-chain radial coordinate is taken to be uniform from the start:

$$u = |Y_N| \in [0,1], \qquad p_N^{(B1)}(u)=1.$$

For the intermediate link endpoints, we take exactly the radial profiles obtained above in A2. If $Y_m$ is the endpoint of link $m$, with polar coordinates $(u,\varphi)$, then

$$P_m^{(B1)}(u,\varphi) = \frac{1}{2\pi}p_m^{(B1)}(u) = \frac{1}{2\pi}\frac{p_m^{(A1)}(\rho(u))}{p_N^{(A1)}(\rho(u))},$$

where

$$u = F_N(\rho), \qquad \rho(u)=F_N^{-1}(u).$$

So the full-chain endpoint is flat in $u$, while the intermediate link endpoints keep the nontrivial transformed profiles inherited from A2.

Now consider the angle of link $m+1$ relative to the local radial direction at $Y_m$. Call it $\beta_m$. In A2 we found that the local angle law is

$$p^{(A2)}(\alpha'\mid \rho) = \frac{1}{2\pi}\,\frac{\Lambda(\rho)}{\Lambda(\rho)^2\cos^2\alpha' + \sin^2\alpha'},$$

with

$$\Lambda(\rho)=\frac{F_N(\rho)}{\rho\,p_N^{(A1)}(\rho)}.$$

In B1 we now take that very same density as the native angle law:

$$p^{(B1)}(\beta_m\mid u) = \frac{1}{2\pi}\,\frac{\Lambda(\rho(u))}{\Lambda(\rho(u))^2\cos^2\beta_m + \sin^2\beta_m}.$$

So in B1 the length distribution is uniform, and the local angle law is non-uniform.

At the level of coordinate densities, B1 is therefore identical to A2:

$$P_m^{(B1)}(u,\varphi)=P_m^{(A2)}(u,\varphi),$$

and

$$p^{(B1)}(\beta_m\mid u)=p^{(A2)}(\alpha'\mid u).$$

So if one only inspects the density in the displayed coordinates, B1 and A2 look the same.

Flattening the angle laws

Now perform the opposite reparametrization: instead of flattening the radial coordinate, flatten the local angle law.

The clean way to do this is to undo the distortion factor $\Lambda$. In A2 we had

$$\tan \alpha' = \Lambda(\rho)\tan\alpha.$$

So the inverse map is

$$\tan \gamma = \frac{1}{\Lambda(\rho)}\tan\beta,$$

with the branch chosen so that the full angle on $[0,2\pi)$ is tracked continuously.

Equivalently, one can write this as the conditional cumulative transform

$$\gamma = G(\beta;u) = 2\pi \int_0^\beta p^{(B1)}(\beta'\mid u)\,d\beta'.$$

By construction, this makes the transformed angle variable $\gamma$ uniform:

$$p^{(B2)}(\gamma\mid u)=\frac{1}{2\pi}.$$

This is the angular analogue of the radial flattening that took us from A1 to A2.
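
The equivalence of the two routes (the inverse tan-map and the conditional CDF transform) is easy to verify numerically. A sketch with a fixed illustrative $\Lambda$ (in the real construction $\Lambda$ varies with the radius $u$):

```python
import numpy as np

rng = np.random.default_rng(3)
lam = 2.5   # illustrative fixed Lambda

def angle_pdf(beta, lam):
    """Non-uniform local angle law p(beta | u) for a fixed Lambda."""
    return lam / (2.0 * np.pi * (lam ** 2 * np.cos(beta) ** 2 + np.sin(beta) ** 2))

# Draw beta from the non-uniform law (via the tan-map applied to uniform angles).
u0 = rng.uniform(0.0, 2.0 * np.pi, size=200_000)
beta = np.arctan2(lam * np.sin(u0), np.cos(u0)) % (2.0 * np.pi)

# Route 1: inverse tan-map, tan(gamma) = tan(beta) / lam, branch tracked by atan2.
gamma_tan = np.arctan2(np.sin(beta) / lam, np.cos(beta)) % (2.0 * np.pi)

# Route 2: conditional CDF transform G(beta) = 2*pi * integral_0^beta p(beta') dbeta'.
grid = np.linspace(0.0, 2.0 * np.pi, 20_001)
p = angle_pdf(grid, lam)
cdf = np.concatenate(([0.0], np.cumsum(0.5 * (p[1:] + p[:-1]) * np.diff(grid))))
gamma_cdf = np.interp(beta, grid, 2.0 * np.pi * cdf / cdf[-1])

# Both routes should agree, and the flattened angle should be uniform on [0, 2*pi).
print("max |gamma_tan - gamma_cdf|:", np.max(np.abs(gamma_tan - gamma_cdf)))
print(np.round(np.histogram(gamma_cdf, bins=8, range=(0, 2 * np.pi), density=True)[0] * 2 * np.pi, 2))
```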

Coordinate system B2

In B2, the local angle law has been flattened, so the links are described by uniform tangent angles once again:

$$p^{(B2)}(\gamma_m\mid u)=\frac{1}{2\pi}.$$

Once the link angles have been put back into uniform form, the chain endpoint distribution reverts to the original Pearson random-walk law. So for the endpoint of link mm,

$$P_m^{(B2)}(\rho,\varphi) = \frac{1}{2\pi}p_m^{(A1)}(\rho),$$

and in particular for the full chain,

$$p_N^{(B2)}(r) = p_N^{(A1)}(r) = r\int_0^\infty J_0(rt)\,J_0(t)^N\,t\,dt.$$

Likewise the local angle law is again flat:

$$p^{(B2)}(\gamma_m)=\frac{1}{2\pi}.$$

So at the level of density, B2 is identical to A1:

$$P_m^{(B2)}(\rho,\varphi)=P_m^{(A1)}(\rho,\varphi),$$

and

$$p^{(B2)}(\gamma_m)=p^{(A1)}(\theta_m)=\frac{1}{2\pi}.$$

In other words, we can run the whole construction in reverse:

  • A1 starts with uniform link angles and produces a non-uniform endpoint radius;
  • A2 flattens that radius and distorts the local angle law;
  • B1 takes that flattened-radius/non-uniform-angle density as the starting point;
  • B2 then flattens the angle law and recovers the original Pearson radial distribution.

So based on density alone,

$$\text{A2 and B1 look the same,} \qquad \text{A1 and B2 look the same.}$$

Energy and entropy of the matching pairs

If one computes entropy and effective energy directly from the local coordinate density, then the matching pairs above necessarily agree.

Take any displayed coordinate $q$ with normalized density $\rho_q(q)$. Define the differential entropy

$$S[q] = -k_B\int \rho_q(q)\log \rho_q(q)\,dq,$$

and the corresponding effective energy landscape

$$E_q(q) = -k_B T\log \rho_q(q)+C,$$

where $C$ is an arbitrary additive constant.

The corresponding mean effective energy is

$$\langle E\rangle_q = \int \rho_q(q)\,E_q(q)\,dq = -k_B T\int \rho_q(q)\log \rho_q(q)\,dq + C = T\,S[q]+C.$$

So both the entropy and the average effective energy are determined entirely by the local density $\rho_q$.
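
With $k_B = T = 1$ and $C = 0$, the identity $\langle E\rangle_q = S[q]$ is immediate to check on a grid. A sketch using the Rayleigh density as a stand-in for $\rho_q$:

```python
import numpy as np

N = 10_000
q = np.linspace(1e-6, 600.0, 200_000)
dq = q[1] - q[0]

rho_q = (2.0 * q / N) * np.exp(-(q ** 2) / N)   # stand-in density (Rayleigh)
rho_q /= rho_q.sum() * dq                       # normalize on the grid

S = -(rho_q * np.log(rho_q)).sum() * dq         # differential entropy (k_B = 1)
E = -np.log(rho_q)                              # effective energy landscape (T = 1, C = 0)
mean_E = (rho_q * E).sum() * dq                 # mean effective energy

print(S, mean_E)                                # the two numbers coincide
```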

But we have already shown that the relevant coordinate densities match pairwise:

$$\rho_{B1}=\rho_{A2}, \qquad \rho_{B2}=\rho_{A1}.$$

Therefore the corresponding entropies match:

$$S[B1]=S[A2], \qquad S[B2]=S[A1],$$

and, with the same additive convention for $C$, the mean effective energies match as well:

$$\langle E\rangle_{B1}=\langle E\rangle_{A2}, \qquad \langle E\rangle_{B2}=\langle E\rangle_{A1}.$$

The same statement holds link-by-link for the endpoint distributions of every intermediate link $m$, since those coordinate densities also match pairwise.

How to tell the two constructions apart

At this point, something subtle has happened.

We have constructed two different stories:

  • in A1, the links are sampled with uniform angles, and the non-uniform radial distribution follows from that;
  • in B1, the radial distribution is taken to be uniform from the start, and the non-uniform angle law follows from that.

After the corresponding coordinate changes, these can be made to look pairwise identical at the level of the displayed density:

$$A2 \leftrightarrow B1, \qquad A1 \leftrightarrow B2.$$

So if all we are shown is the equilibrium density in the chosen coordinates, there is no obvious label attached saying which one came first.

This is exactly the same structural issue that appears in Bertrand’s paradox.

In Bertrand’s paradox, the phrase "choose a random chord" is incomplete until one specifies what is being sampled uniformly. Different choices of what counts as the primitive random variable lead to different quantitative answers, even though each one sounds perfectly reasonable in words.

The same thing is happening here. The phrase "make the chain random" is also incomplete until one decides what is being sampled uniformly.

One can start from

$$dP_A(\omega)=\frac{d^N\theta}{(2\pi)^N},$$

which says that the primitive randomness lies in the link angles, or one can start from

$$dP_B(\omega)=\frac{g(r(\omega))}{p_A(r(\omega))}\,dP_A(\omega), \qquad g(r)=\frac{1}{N},$$

which says that the primitive randomness lies in the endpoint length.

These are quantitatively different ensembles. But by reparametrizing the coordinates, they can be made to share the same density.

So how do we tell them apart?

1: Correlations between variables

A single equilibrium density is too little information.

Suppose $a(\omega)$ is some second observable: a bend angle, the angle of a link relative to the end-to-end direction, a local curvature, or anything else extracted from the chain.

In ensemble A, the joint law is

$$p_A(r,a).$$

In ensemble B, the reweighting only depends on $r$, so

$$p_B(r,a) = \frac{g(r)}{p_A(r)}\,p_A(r,a).$$

This means the joint distributions are different whenever $a$ and $r$ are not independent.

So one straightforward way to distinguish the two constructions is to measure a joint density such as

$$p(r,\phi), \qquad p(r,\Delta\theta), \qquad p(r,\kappa),$$

rather than only the one-dimensional marginal of $r$.

That one extra variable is enough to break the ambiguity.
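
A sketch of why this works: under ensemble A, an observable such as the alignment of the last link with the end-to-end direction is correlated with $r$, so any reweighting in $r$ (in particular the one that defines ensemble B) necessarily changes the joint law $p(r,a)$, not just the marginal of $r$. The chain length $N=20$ and the specific observable below are my illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(4)
N, n_chains = 20, 100_000

theta = rng.uniform(0.0, 2.0 * np.pi, size=(n_chains, N))
x, y = np.cos(theta), np.sin(theta)
X, Y = x.sum(axis=1), y.sum(axis=1)
r = np.hypot(X, Y)

# Second observable a: alignment of the last link with the end-to-end vector.
a = (x[:, -1] * X + y[:, -1] * Y) / r

# a and r are correlated in ensemble A, so reweighting in r changes p(r, a).
print("corr(a, r)           :", np.corrcoef(a, r)[0, 1])
print("E[a | r < median(r)] :", a[r < np.median(r)].mean())
print("E[a | r >= median(r)]:", a[r >= np.median(r)].mean())
```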

2: Dynamical measurements

A still stronger way to distinguish the two is to stop looking only at equilibrium snapshots and instead record trajectories.

The reason is simple: a coordinate transformation changes how a given trajectory is described, but it does not create a new ensemble of trajectories. By contrast, changing the underlying sampling rule does.

In A, the path measure is generated by uniform angle sampling. In B, the path measure is reweighted by the final endpoint radius. So although one can make the equilibrium densities match in suitable coordinates, the trajectory statistics need not match.

That means one can compare quantities such as

$$P(q_{t+\Delta t}\mid q_t), \qquad \langle q(t+\tau)\,q(t)\rangle, \qquad \text{first-passage times},$$

or the response to an external perturbation. These depend on the full trajectory ensemble, not just the static equilibrium density, and so they generally distinguish the two constructions.

3: Measurements under different conditions

Another way to separate the two is to change a control parameter and ask whether the same underlying model continues to fit.

A passive coordinate transformation simply rewrites the same ensemble. It does not invent a new physical dependence on temperature or on the external controls.

By contrast, if one insists on imposing a given target density as fundamental, the effective weighting needed to maintain that target can itself acquire nontrivial dependence on the control parameter.

So if the same microscopic model is required to explain data across a family of conditions, the ambiguity can be broken.

What this has to do with Bertrand's paradox

Bertrand’s paradox is not just a trick about circles. It is a warning that the phrase "uniformly random" has no meaning until one specifies the measure.

That is exactly the moral here.

There is no contradiction between saying

  • "the links are uniformly random, therefore the endpoint radius is non-uniform,"

and saying

  • "the endpoint radius is uniformly random, therefore the link angles are non-uniform."

Both are mathematically legitimate once the primitive random variable has been specified.

What changes is the ensemble.

In that sense, A and B are different answers to the question: which variable is taken to be uniformly random before anything else is derived?

That is why the two constructions are quantitatively different, just as the different chord-generating procedures in Bertrand’s paradox are quantitatively different.

At the same time, within each construction, the paired coordinate systems are valid reparametrizations of one another:

$$A1 \leftrightarrow A2, \qquad B1 \leftrightarrow B2.$$

So there are really two separate layers:

  1. the choice of ensemble, which determines what is taken as primitive randomness;
  2. the choice of coordinates, which determines how that ensemble is displayed.

Bertrand’s paradox lives in the first layer. The flattening transformations live in the second.

Once those two layers are separated, the structure becomes clearer:

  • changing coordinates can make two different ensembles look the same at the level of a one-dimensional density;
  • but looking at joint observables, unconditional angle laws, trajectories, or parameter dependence reveals that the ensembles are not actually the same.

Markovianity as a guide to interpretation

There is one more principle that is useful here, because it helps distinguish a good description from one that is merely a clever reparametrization.

That principle is Markovianity.

A variable $q_t$ is Markovian if its future depends only on its present value, not on the rest of its history:

$$P(q_{t+\Delta t}\mid q_t,q_{t-\Delta t},q_{t-2\Delta t},\dots) = P(q_{t+\Delta t}\mid q_t).$$

In practice, this means that once the current state is known, the past adds no further predictive power.

Why does this matter here?

Because equilibrium densities alone are too permissive. A great many different microscopic constructions can be made to reproduce the same static distribution in some chosen coordinate. But the dynamics are much less forgiving. If the coordinate being used is a good one, then its evolution should be close to Markovian. If it is a bad one, then hidden variables will keep leaking through, and the apparent dynamics will remember the past.
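
One way to probe this numerically is to check whether adding one extra step of history improves the prediction of the next value; for a Markovian coordinate it should not. A sketch with an invented two-variable toy process in which only one coordinate is observed (the coefficients are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(5)
T = 200_000

# Toy linear process with a hidden coordinate h; only q is "observed".
q = np.zeros(T)
h = np.zeros(T)
for t in range(T - 1):
    h[t + 1] = 0.95 * h[t] + 0.1 * rng.normal()
    q[t + 1] = 0.80 * q[t] + 0.5 * h[t] + 0.1 * rng.normal()

def residual_std(features, target):
    """Std of the residual after a least-squares linear prediction."""
    A = np.column_stack(features + [np.ones_like(target)])
    coef, *_ = np.linalg.lstsq(A, target, rcond=None)
    return np.std(target - A @ coef)

# Predict q_{t+1} from q_t alone, and from (q_t, q_{t-1}).
target = q[2:]
one_step = residual_std([q[1:-1]], target)
two_step = residual_std([q[1:-1], q[:-2]], target)

# For a Markovian q the extra history would not help; here it clearly does,
# because the hidden coordinate h keeps leaking into the apparent dynamics.
print("residual using q_t only       :", one_step)
print("residual using q_t and q_{t-1}:", two_step)
```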

This makes Markovianity a useful guide for interpretation.

If one coordinate system gives a description in which the observed dynamics are approximately Markovian, while another requires long memory kernels, path history, or extra hidden coordinates to explain the same data, then the former is usually the more natural description.

In that sense, Markovianity does not tell us which variable is "truly random" in some metaphysical sense. But it does tell us which parametrization is closer to being dynamically complete.

This is especially relevant in the present setting. A coordinate transformation can flatten a density, or flatten an angle law, or make one ensemble look deceptively similar to another. But if that transformation pushes important information into hidden correlations, then the resulting coordinate will generally look less Markovian.

So although equilibrium density alone cannot distinguish the constructions above, the time evolution often can. A good coarse-grained coordinate is not just one with a neat-looking stationary distribution. It is one in which the dynamics close on themselves as much as possible.