Finding the Shortest Path Visually

The Shortest Path -- A Derivation by Picture

The calculus of variations is fundamental to modern physics in general, and to the Lagrangian formulation of mechanics in particular. Unfortunately, the derivation of the conditions under which a path minimizes a line integral evaluated along it can seem obscure or even opaque. My first encounter with it, in an old (first) edition of Goldstein, left me feeling bewildered; the "trick" which was used to switch the order of differentiation around (swapping a total derivative for a partial) didn't even seem valid, let alone easy to picture.

On this page I present two "visual" derivations of the minimality condition for a path. The first derivation is (in my own opinion!) delightfully easy to picture and understand; unfortunately it's not really valid (as I explain below). The second derivation is only slightly less visual, and has the advantage of being valid.

(For a more "traditional" approach, see the classic derivation. For an application of minimization over a path, see Lagrangian mechanics.)

Consider a function, f, defined on all paths through R^N. We wish to find a path such that the integral of f along the path will be minimized. We would like to find the conditions which the path must satisfy directly, just by examining a graph of the path, without using algebraic "tricks". Just so we can say we've been rigorous about it, we've also provided a conventional algebraic derivation of the necessary conditions on another page, but on this page we're going to stick with the proof-by-picture approach.

The definition of a path and statement of the problem, given below, are the same here as those given on the proof-by-parts page; if you've already seen them there, you might as well just skip down to the derivation.

Definition of a Path

We define a path in R^N from point X_a to point X_b as a smooth mapping from the unit interval on the real line [0,1] into R^N:

$$X : [0,1] \to \mathbb{R}^N, \qquad X(0) = X_a, \quad X(1) = X_b \tag{1}$$

Figure 1 -- Some possible paths in R²:
Three paths

Statement of the Problem

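We are given a function f which is defined along any path. Along any particular path, f is a function of the Xⁱ and of their derivatives with respect to the path parameter t, which we show with an overdot. So, f maps a particular path, and a point in R^N which lies on that path, into R. Thus, we have:

$$f = f\bigl(X^i(t), \dot X^i(t)\bigr), \qquad f : \mathbb{R}^N \times \mathbb{R}^N \to \mathbb{R} \tag{2}$$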

Note that f is a function of the “shape” of the path, and a function of the speed at which we travel along the path.

We wish to find a particular path which will minimize (or extremize) the integral of f over the path. That integral we will call F:

$$F = \int_0^1 f\bigl(X^i(t), \dot X^i(t)\bigr)\,dt \tag{3}$$

More specifically, we wish to find a path such that the integral of f along any nearby path is at least as large as the integral along our chosen path.

Figure 2 -- Some "nearby" paths in R²:
Some nearby paths
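To make the idea of a nearby path concrete, here is a small numerical sketch. The integrand f = ẋ² + ẏ² and the helper names (`path_integral`, `straight`, `bent`) are assumptions chosen for illustration; nothing above fixes a particular f. For this f, the straight path from (0,0) to (1,1) should give a smaller integral than any of its sinusoidally bent neighbors.

```python
import math

def path_integral(f, path, n=2000):
    """Approximate F = integral over t in [0, 1] of f(X, dX/dt) by the midpoint rule.

    `path` maps t to a tuple of coordinates; the velocity on each
    subinterval is estimated from the chord between its endpoints.
    """
    h = 1.0 / n
    total = 0.0
    for k in range(n):
        p0, p1 = path(k * h), path((k + 1) * h)
        mid = tuple((a + b) / 2 for a, b in zip(p0, p1))
        vel = tuple((b - a) / h for a, b in zip(p0, p1))
        total += f(mid, vel) * h
    return total

def f(x, v):
    # Assumed integrand: squared speed (independent of position).
    return sum(c * c for c in v)

def straight(t):
    # Straight path from (0, 0) to (1, 1).
    return (t, t)

def bent(eps):
    # A nearby path with the same endpoints, displaced by eps * sin(pi * t).
    def p(t):
        return (t, t + eps * math.sin(math.pi * t))
    return p

F_straight = path_integral(f, straight)
F_up = path_integral(f, bent(0.1))
F_down = path_integral(f, bent(-0.1))
print(F_straight, F_up, F_down)
```

For this choice of f the integral along the bent path works out analytically to 2 + ε²π²/2, so the excess over the straight path grows as the square of the deviation, whichever side we bend toward.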

Note, however, that something important isn't shown on figures 1 and 2: The function, f, is a function of the location along the path, and is also a function of how fast we are moving along the path. The derivatives of x and y with respect to t do not appear in those pictures, but they are important nonetheless. (We will need to use a picture in which we can see the derivative with respect to t to complete the derivation.)
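The dependence on speed is easy to see numerically. In the sketch below, the integrand f = Ẋ² and the two parametrizations are assumptions made for illustration: both paths sweep out the same set of points between 0 and 1, and differ only in how fast they move along the way.

```python
def F(path, n=4000):
    """Midpoint-rule approximation of the integral of f = (dX/dt)**2 over t in [0, 1]."""
    h = 1.0 / n
    return sum(((path((k + 1) * h) - path(k * h)) / h) ** 2 * h for k in range(n))

uniform = lambda t: t            # constant speed from 0 to 1
accelerating = lambda t: t * t   # same points visited, slow start and fast finish

F_uniform = F(uniform)
F_accelerating = F(accelerating)
print(F_uniform, F_accelerating)
```

Analytically the two integrals are 1 and 4/3, so two paths that look identical in a picture like figure 1 can still give different values of F.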

The condition a path must satisfy, to be a minimum (or maximum) of F, is the familiar one: The derivative of F (with respect to the path) must vanish. That means that any “infinitesimal” deviation from the path will result in no change -- or, in other words, the path must be a stationary point for F in the space of all possible paths. (If that were not true, then the path would not be minimal. If a particular infinitesimal deviation produced a better result, obviously the path would not be minimal. If a particular infinitesimal deviation produced a worse result, on the other hand, then an infinitesimal deviation the opposite way would produce a better result and again the original path was not minimal.)
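The stationarity condition can be illustrated with a finite-difference estimate. This is only a sketch under assumptions of my own choosing: the integrand is f = Ẋ², the deviation is ε·sin(πt) (which leaves the endpoints fixed), and the names `dF_deps`, `line` and `parabola` are hypothetical.

```python
import math

def F(path, n=4000):
    """Midpoint-rule approximation of the integral of f = (dX/dt)**2 over t in [0, 1]."""
    h = 1.0 / n
    return sum(((path((k + 1) * h) - path(k * h)) / h) ** 2 * h for k in range(n))

def dF_deps(base, bump, eps=1e-4):
    """Central-difference estimate of dF/d(eps) at eps = 0 for the family base + eps * bump."""
    plus = F(lambda t: base(t) + eps * bump(t))
    minus = F(lambda t: base(t) - eps * bump(t))
    return (plus - minus) / (2 * eps)

bump = lambda t: math.sin(math.pi * t)  # deviation that vanishes at both endpoints
line = lambda t: t                      # straight path from 0 to 1 (the minimum for this f)
parabola = lambda t: t * t              # same endpoints, but not a stationary path

slope_line = dF_deps(line, bump)
slope_parabola = dF_deps(parabola, bump)
print(slope_line, slope_parabola)
```

The first slope comes out at essentially zero, while the second is close to the analytic value −8/π: on the parabola, a small deviation in the right direction lowers F, so the parabola cannot be minimal.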

From here we could proceed with a conventional rigorous derivation using integration by parts, and we have done that on another page. But on this page, we will proceed to derive the minimization conditions visually, if not entirely rigorously.

The Problem Re-Drawn in 1 Spatial Dimension

First, to do this visually, we're going to need a picture which lets us see how "fast" we're moving on the path. So, we must show t, the path parameter, explicitly in the graph. Since we can't easily show more than 2 dimensions, we'll have to limit ourselves to a path through R¹. That's sufficient for deriving the formula for a single dimension, however, and generalizing the result to N dimensions is straightforward since the derivative is a linear function of the partial derivatives. Because it's linear, the derivative with respect to a small deviation in N dimensions is just the sum of the derivatives of the integral taken over each dimension separately.
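The linearity argument can be written out explicitly. In the sketch below, δXⁱ denotes a small deviation of the i-th coordinate and δF_i the part of the change in F that it produces; both symbols are introduced here for illustration.

```latex
\[
\delta F
  = \int_0^1 \sum_{i=1}^{N}
    \left( \frac{\partial f}{\partial X^i}\,\delta X^i
         + \frac{\partial f}{\partial \dot X^i}\,\delta \dot X^i \right) dt
  = \sum_{i=1}^{N} \delta F_i ,
\qquad
\delta F_i
  = \int_0^1
    \left( \frac{\partial f}{\partial X^i}\,\delta X^i
         + \frac{\partial f}{\partial \dot X^i}\,\delta \dot X^i \right) dt .
\]
```

Because each δXⁱ can be chosen independently of the others, δF can vanish for every deviation only if each δF_i vanishes separately, which is why the one-dimensional condition carries over coordinate by coordinate.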

So here is a 1-dimensional path (figure 3). It's just a function of one variable: the value of X at each point is just X(t), and its “speed” along the path is just dX/dt -- the slope of the curve at each point.

Figure 3 -- A 1-dimensional path from X=a to X=b, parametrized by t:

If the path is minimal, then moving away from it “a little bit” will not change the value of the integral (to first order).

The First Derivation -- A Clear Picture (that Doesn't Quite Work)

Moving from a path which is just a little to one side of our chosen path to a different path an equal distance to the other side of our “chosen path” will result in no change (to second order). We will now “zoom in” on a little piece of the path, which is so short we can treat it as being “straight” (figure 4). We show two other paths, P1 and P2. Path P2 “races ahead” briefly at twice the rate of the main path, and then holds a constant value until the main path catches up. Path P1 holds a constant value while path P2 moves ahead, then races at twice the rate of the main path to catch up with path P2. Since P1 and P2 are equal perturbations on either side of our “minimum” path, we would like the integral of f to be equal (to second order) along P1 and P2.

Figure 4 -- The path segments we'll compare:
The paths we'll compare

In this image, we've broken apart the effect of shifting the path along the X axis and changing the speed at which we move along the path. In words, the difference in the “flat parts” of paths P1 and P2 -- segments A and B in figure 4 -- is due solely to the rate of change in f as X changes. If f grows larger as X increases, then the integral along path P2 will have a larger value over the “flat part” of its course than path P1.

On the “racing” parts of paths P1 and P2 -- segments C and D in figure 4 -- the paths pass through the same X values, so the derivative of f with respect to X doesn't matter there. They have the same slopes, too -- but path P1 races ahead, along segment D, after path P2 finishes racing along segment C. So, if the effect on f of the velocity along the curve changes as time goes by, then the integral along segment D will be larger by the amount that the derivative of f with respect to the velocity changed.

Again just in words, in order for the net integral along P1 to equal the integral along P2, the effect due to the increase in X between segments A and B must exactly balance the effect of racing ahead later along segment D versus segment C. The former effect is the partial derivative of f with respect to X; the latter effect is the time derivative of the partial derivative of f with respect to the velocity.

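We'd like to extract the actual minimization condition from figure 4. Let's translate the origin to the point marked f₀ and replace f with its first-order Maclaurin expansion:

$$f(X, \dot X) \approx f_0 + \frac{\partial f}{\partial X}\,X + \frac{\partial f}{\partial \dot X}\,\dot X \tag{4}$$

Using formula (4), we can see from figure 4 that the difference in f between segment A and segment B (both flat, with A at X = −ΔX and B at X = +ΔX) must be:

$$f_B - f_A = 2\,\Delta X\,\frac{\partial f}{\partial X}$$

and, since each segment covers Δt, the difference in the integral of f between segment A and segment B must be:

$$\int_B f\,dt - \int_A f\,dt = 2\,\Delta X\,\Delta t\,\frac{\partial f}{\partial X}$$

Looking at segments C and D on the path, and applying (4), we can see that the average value of X on each segment is the same as X at f₀, and the slope of each segment is 2(ΔX/Δt). So, the average value of f on segment C must be:

$$\bar f_C = f_0 + \frac{2\,\Delta X}{\Delta t}\left.\frac{\partial f}{\partial \dot X}\right|_C$$

(N.B. -- this step in the derivation is easy to visualize but it's not really valid; we'll have more to say about that below.) Similarly, the average value of f on segment D must be:

$$\bar f_D = f_0 + \frac{2\,\Delta X}{\Delta t}\left.\frac{\partial f}{\partial \dot X}\right|_D$$

Their difference, then, since segment D is traversed a time Δt later than segment C, will be:

$$\bar f_D - \bar f_C = \frac{2\,\Delta X}{\Delta t}\left(\left.\frac{\partial f}{\partial \dot X}\right|_D - \left.\frac{\partial f}{\partial \dot X}\right|_C\right) = 2\,\Delta X\,\frac{d}{dt}\frac{\partial f}{\partial \dot X}$$

And the difference in the path integrals across segment D versus segment C must be:

$$\int_D f\,dt - \int_C f\,dt = 2\,\Delta X\,\Delta t\,\frac{d}{dt}\frac{\partial f}{\partial \dot X}$$

Finally, the difference in the integral over path P2 versus path P1 must be:

$$F(P_2) - F(P_1) = 2\,\Delta X\,\Delta t\left(\frac{\partial f}{\partial X} - \frac{d}{dt}\frac{\partial f}{\partial \dot X}\right)$$

and this difference can only be zero if

$$\frac{\partial f}{\partial X} - \frac{d}{dt}\frac{\partial f}{\partial \dot X} = 0$$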

which is, indeed, the minimization condition we wanted to find.
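The bookkeeping over segments A, B, C and D can be checked numerically in the one case where a first-order expansion is exact: an integrand that is linear in X and in the velocity. The sketch below assumes f = a·X + (c0 + c1·t)·Ẋ, for which ∂f/∂X = a and the time derivative of ∂f/∂Ẋ is c1; the coefficients, the helper `segment_integral`, and the segment geometry (flat segments at X = ∓ΔX, racing segments of slope 2ΔX/Δt, each lasting Δt) are assumptions read off from the description of figure 4. It says nothing about whether the expansion is legitimate for a general f.

```python
def segment_integral(f, t_start, x_start, slope, duration, n=1000):
    """Midpoint-rule integral of f(t, X, Xdot) along one straight segment."""
    h = duration / n
    total = 0.0
    for k in range(n):
        t = t_start + (k + 0.5) * h
        x = x_start + slope * (k + 0.5) * h
        total += f(t, x, slope) * h
    return total

# Assumed integrand, linear in X and Xdot:
# df/dX = a, and d/dt(df/dXdot) = c1.
a, c0, c1 = 3.0, 0.7, 1.25
f = lambda t, x, v: a * x + (c0 + c1 * t) * v

dt, dx = 0.01, 0.02
v = dx / dt  # slope of the main path

# P2: race along C (slope 2v), then hold along B at X = +dx.
F_P2 = segment_integral(f, -dt, -dx, 2 * v, dt) + segment_integral(f, 0.0, dx, 0.0, dt)
# P1: hold along A at X = -dx, then race along D (slope 2v).
F_P1 = segment_integral(f, -dt, -dx, 0.0, dt) + segment_integral(f, 0.0, -dx, 2 * v, dt)

measured = F_P2 - F_P1
predicted = 2 * dx * dt * (a - c1)  # balance of df/dX against d/dt(df/dXdot)
print(measured, predicted)
```

The two numbers agree to rounding error, and both vanish exactly when a = c1, which is the balance between the two effects described in words above.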

Oops -- That wasn't really valid.

Unfortunately, the derivation just given isn't valid, because the first-order expansion we used in (4) is only accurate in the limit as we approach the origin. It's only valid for small values of x and for small values of dx/dt. But dx/dt is actually on the order of Δx/Δt, which is in general not small: we can make Δx and Δt each as small as we like by restricting ourselves to a small region on the path, but their ratio is unaffected by shrinking the neighborhood.

The use of "flat spots" in the alternate paths, where x is momentarily fixed while t continues to advance, is very nice for visualizing the minimization conditions, because it completely splits the effect of position, $\partial f/\partial x$, from the effect of speed, $\partial f/\partial \dot x$. Unfortunately, the difference in speed between a "flat spot" and the main path is typically not infinitesimal! So, we're no longer looking at an infinitesimal variation in the path, and the derivation doesn't really work when done this way, even though we managed to pull out the right answer.
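A short numerical sketch shows why a flat spot is not a small variation. It assumes the integrand f = Ẋ² and two-segment paths in which each segment lasts Δt; the function names and the parameter `eta` (a fractional change in speed) are introduced here for illustration. Every path compared covers the same total distance as the main path.

```python
def F_two_segments(speed_first, speed_second, dt):
    """Integral of the assumed integrand f = Xdot**2 over two straight segments, each lasting dt."""
    return (speed_first ** 2 + speed_second ** 2) * dt

def relative_excess(speed_first, speed_second, v, dt):
    """(F(alternate) - F(main)) / F(main), where the main path has constant speed v."""
    F_main = F_two_segments(v, v, dt)
    return (F_two_segments(speed_first, speed_second, dt) - F_main) / F_main

v = 1.5

# Flat-spot path: race at 2v, then stand still. Shrinking dt does not help.
flat_spot = [relative_excess(2 * v, 0.0, v, dt) for dt in (1e-1, 1e-3, 1e-5)]

# Gentle path: speed v(1 + eta), then v(1 - eta). The excess shrinks as eta**2.
etas = (1e-1, 1e-2, 1e-3)
gentle = [relative_excess(v * (1 + eta), v * (1 - eta), v, 1e-3) for eta in etas]

print(flat_spot)
print(gentle)
```

The flat-spot path always costs twice as much as the main path, however short the segment, whereas the gentle path's excess can be made as small as we like. Only the second kind of comparison path probes the first-order behavior of F.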

A Second Derivation: A Bit Less Clear, but More Correct

As observed above, when we look for an alternate path that's "close to" our selected path, we can't have "flat spots" where the slope of the alternate path goes to 0. We must keep the slope similar to the slope of the main path. So, we can't really split the effect of speed (slope of the path) completely apart from the effect of traveling over different ground (value of x at each point). But we can nonetheless derive the desired result from a very simple diagram.

In figure 5, we've shown a tiny piece of the main path (path "M"), and we've shown one alternate path (path "P"). The alternate path goes a little faster than the main path on segment A, then proceeds more slowly along segment B until path "M" once again catches up with it. Thus, path "P" spends more time at larger values of X, which represents a cost if $\partial f/\partial X$ is positive. However, it does more of its fast traveling earlier, and can idle along a bit later; this tradeoff represents a benefit if $\frac{d}{dt}\frac{\partial f}{\partial \dot X}$ is positive (the price of gas is going up). Thus, if the overall tradeoff is to net to zero (which must be true for very nearby paths, if the main path is minimal), then $\partial f/\partial X$ and $\frac{d}{dt}\frac{\partial f}{\partial \dot X}$ must again balance. We will now derive that relationship more precisely.

Figure 5 -- The path segments to be compared, second time around:
Valid visual derivation
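The claim that the tradeoff nets to zero for very nearby paths can be checked numerically. The sketch below is built on assumptions of my own: the integrand is f = ½Ẋ² − ½X², for which X = sin t satisfies the minimization condition while X = t does not, and the alternate path differs from the main path by a small tent-shaped bump of height `delta` (faster than the main path on the way up, slower on the way down, meeting it again at the end). The names `action`, `tent` and `excess` are hypothetical.

```python
import math

def action(path, f, t_end=1.0, n=2000):
    """Midpoint-rule integral of f(X, Xdot) along `path` for t in [0, t_end]."""
    h = t_end / n
    total = 0.0
    for k in range(n):
        x0, x1 = path(k * h), path((k + 1) * h)
        total += f((x0 + x1) / 2, (x1 - x0) / h) * h
    return total

def f(x, v):
    # Assumed integrand; its stationary paths satisfy X'' = -X.
    return 0.5 * v * v - 0.5 * x * x

def tent(t, t0=0.5, w=0.1):
    """Piecewise-linear bump of unit height centered at t0 with half-width w."""
    return max(0.0, 1.0 - abs(t - t0) / w)

def excess(base, delta):
    """F(alternate path) - F(main path) for a tent-shaped deviation of height delta."""
    return action(lambda t: base(t) + delta * tent(t), f) - action(base, f)

stationary = math.sin          # satisfies X'' = -X
non_stationary = lambda t: t   # does not

# Doubling delta multiplies the excess by 4 if it is second order, by 2 if first order.
ratio_stationary = excess(stationary, 2e-3) / excess(stationary, 1e-3)
ratio_non_stationary = excess(non_stationary, 2e-5) / excess(non_stationary, 1e-5)
print(ratio_stationary, ratio_non_stationary)
```

On the stationary path the ratio comes out very close to 4, so the first-order tradeoff really does net to zero; on the other path it is close to 2, meaning a small bump changes F at first order and the path can be improved.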

Figure 5 is largely self-explanatory, but a few things (which can be read directly from the figure) should, perhaps, be pointed out, if only because the print on the figure may be too small to read on some screens!

Path M has slope Δx/Δt in the figure (assumed constant in this tiny region). The point f₀, at location (t₀,X₀), has the average value of f on path M (to first order), and the average t and X values on path M are t₀ and X₀. Path P takes time Δt to traverse segment A, which it does with slope

. During that time, path M traverses segment M1; its average X value on segment M1 is X_a, and the average X value on segment A is

. Path P takes time Δt to traverse segment B, which it does with slope

. During that time, the average X value of segment M2 is X_b, and the average X value on segment B is

. Now, let us proceed.

The average value of f on segment A of path P, to first order, is:

The average value of f on segment B of path P, to first order, is:

Summing and dividing by 2, we get the average value of f on path P:

We can make the following simple substitutions, all accurate to first order:

Simplifying and multiplying by the total time, we obtain the integral of f over path P (to first order):

But the integral over path M is (to first order):

So, if F(M) = F(P), and keeping in mind that "to first order" is redundant when comparing first derivatives, we must have:

$$\frac{\partial f}{\partial X} - \frac{d}{dt}\frac{\partial f}{\partial \dot X} = 0$$

which was to be shown.
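For readers who cannot make out the labels in figure 5, the argument can be written out under assumed notation. Here δ (a symbol introduced for this sketch, not taken from the figure) is the amount by which path P lies above path M at the midpoint time t₀, v = ΔX/Δt is the slope of path M, and the subscripts M1 and M2 mark the segment on which a quantity is evaluated. The figure's own labels may differ.

```latex
\[
\begin{aligned}
\text{slope on } A &= v + \frac{\delta}{\Delta t}, &\qquad
\text{average } X \text{ on } A &= X_a + \frac{\delta}{2},\\
\text{slope on } B &= v - \frac{\delta}{\Delta t}, &\qquad
\text{average } X \text{ on } B &= X_b + \frac{\delta}{2}.
\end{aligned}
\]
\[
\begin{aligned}
\bar f_A &\approx \bar f_{M1}
  + \frac{\partial f}{\partial X}\,\frac{\delta}{2}
  + \left.\frac{\partial f}{\partial \dot X}\right|_{M1}\frac{\delta}{\Delta t},\\
\bar f_B &\approx \bar f_{M2}
  + \frac{\partial f}{\partial X}\,\frac{\delta}{2}
  - \left.\frac{\partial f}{\partial \dot X}\right|_{M2}\frac{\delta}{\Delta t},\\
\bar f_P = \tfrac12\bigl(\bar f_A + \bar f_B\bigr)
  &\approx f_0 + \frac{\delta}{2}
    \left( \frac{\partial f}{\partial X}
         - \frac{d}{dt}\frac{\partial f}{\partial \dot X} \right),\\
F(P) = 2\,\Delta t\,\bar f_P
  &\approx 2\,\Delta t\,f_0 + \delta\,\Delta t
    \left( \frac{\partial f}{\partial X}
         - \frac{d}{dt}\frac{\partial f}{\partial \dot X} \right),
\qquad
F(M) \approx 2\,\Delta t\,f_0 .
\end{aligned}
\]
```

The third line uses two first-order substitutions: the average of f over segments M1 and M2 is f₀, and ∂f/∂Ẋ changes by Δt times its time derivative between the middle of M1 and the middle of M2. Setting F(P) = F(M) for nonzero δ then leaves exactly the condition stated above.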

Explicit Dependency of f on t

f may depend explicitly on t as well. Do we need to modify the above derivations in that case?

In the first derivation we compare the integral over segments which are separated by time. The differences would be affected if $\partial f/\partial t$ were nonzero. However, a brief inspection shows that, when we subtract the difference between segments C and D from the difference between segments A and B, the differences due to $\partial f/\partial t$ will cancel and the final result will be unaffected.

In the second derivation, we compared integrals only over paths with identical average times, so nonzero $\partial f/\partial t$ will not affect the calculations, or the result, at all.
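The cancellation claimed for the first derivation can be sketched as follows. The sketch assumes, as the description of figure 4 suggests, that segments A and C are centered at time −Δt/2 and segments B and D at +Δt/2 relative to f₀, so that an explicit dependence on t adds a term (∂f/∂t)·t to the first-order expansion. The bar notation for segment averages is introduced here for illustration.

```latex
\[
\begin{aligned}
\bar f_B - \bar f_A &\;\longrightarrow\;
  (\bar f_B - \bar f_A) + \Delta t\,\frac{\partial f}{\partial t},\\
\bar f_D - \bar f_C &\;\longrightarrow\;
  (\bar f_D - \bar f_C) + \Delta t\,\frac{\partial f}{\partial t},\\
F(P_2) - F(P_1) &= \Delta t\,\bigl[(\bar f_B - \bar f_A) - (\bar f_D - \bar f_C)\bigr]
  \quad\text{(unchanged: the two extra terms cancel).}
\end{aligned}
\]
```

Each pair of segments straddles the same interval of time, so each difference picks up the same extra term, and subtracting one difference from the other removes it.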

Page last modified 10/31/06. Reference to Lagrangian mechanics added on 11/13/06.