The Shortest Path -- A Derivation by Picture
The calculus of variations is fundamental to modern physics in general, and to the Lagrangian formulation of mechanics in particular. Unfortunately, the derivation of the conditions under which a path minimizes a line integral evaluated along it can seem obscure or even opaque. My first encounter with it, in an old (first) edition of Goldstein, left me feeling bewildered; the "trick" which was used to switch the order of differentiation around (swapping a total derivative for a partial) didn't even seem valid, let alone easy to picture.
On this page I present two "visual" derivations of the minimality condition for a path. The first derivation is (in my own opinion!) delightfully easy to picture and understand; unfortunately it's not really valid (as I explain below). The second derivation is only slightly less visual, and has the advantage of being valid.
(For a more "traditional" approach, see the
classic derivation. For an application of minimization over a path, see
Lagrangian mechanics.)
Consider a function, f, defined on all paths through R^N. We wish to find a path such that the integral of f along the path will be minimized. We would like to find the conditions which the path must satisfy directly, just by examining a graph of the path, without using algebraic "tricks". Just so we can say we've been rigorous about it, we've also provided a conventional algebraic derivation of the necessary conditions on another page, but on this page we're going to stick with the proof-by-picture approach.
The definition of a path and statement of the problem, given below, are the same here as those given on the proof-by-parts page; if you've already seen them there, you might as well just skip down to the derivation.
Definition of a Path
We define a path in R^N from point X_a to point X_b as a smooth mapping from the unit interval on the real line, [0,1], into R^N:

$$X : [0,1] \to \mathbb{R}^N, \qquad X(0) = X_a, \quad X(1) = X_b$$
Figure 1 -- Some possible paths in R^2:
Statement of the Problem
We are given a function f which is defined along any path. Along any particular path, f is a function of the X_i and of their derivatives, which we show with an overdot. So, f maps a particular path, and a point in R^N which lies on that path, into R. Thus, we have:

$$f = f(X_1, \dots, X_N, \dot{X}_1, \dots, \dot{X}_N), \qquad \dot{X}_i = \frac{dX_i}{dt}$$
Note that f is a function of the “shape” of the path, and a function of the speed at which we travel along the path. We wish to find a particular path which will minimize (or extremize) the integral of f over the path. That integral we will call F:

$$F = \int_0^1 f\bigl(X(t), \dot{X}(t)\bigr)\, dt$$
More specifically, we wish to find a path such that the integral of f along any nearby path is at least as large as the integral along our chosen path.
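
To make F concrete, here is a minimal numerical sketch (it is not part of the derivation). It assumes a path is handed over as a Python function of t on [0,1] and approximates the integral with the midpoint rule. The names path_integral, speed, straight and bowed are invented for this illustration, and the integrand is simply the speed, so that F is the length of the path.

```python
import math

def path_integral(f, path, n=1000):
    """Approximate F = integral over t in [0, 1] of f(X, dX/dt) dt.

    `path` maps t in [0, 1] to a tuple of coordinates. On each small step the
    velocity is estimated by a finite difference and f is sampled at the
    midpoint of the step (midpoint rule).
    """
    total = 0.0
    dt = 1.0 / n
    for k in range(n):
        p0 = path(k * dt)
        p1 = path((k + 1) * dt)
        mid = tuple((a + b) / 2 for a, b in zip(p0, p1))
        vel = tuple((b - a) / dt for a, b in zip(p0, p1))
        total += f(mid, vel) * dt
    return total

def speed(x, v):
    # Hypothetical integrand: depends only on the velocity, so F is arc length.
    return math.hypot(v[0], v[1])

def straight(t):
    # Straight line from (0, 0) to (3, 4).
    return (3.0 * t, 4.0 * t)

def bowed(t):
    # A "nearby" path with the same endpoints, bowed slightly in y.
    return (3.0 * t, 4.0 * t + 0.1 * math.sin(math.pi * t))

F_straight = path_integral(speed, straight)
F_bowed = path_integral(speed, bowed)
```

With these choices F_straight comes out as the distance from (0,0) to (3,4), which is 5, and the bowed path, which shares the same endpoints, gives a slightly larger value.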
Figure 2 -- Some "nearby" paths in R^2:
Note, however, that something important isn't shown on figures 1 and 2: The function, f, is a function of the location along the path, and is also a function of how fast we are moving along the path. The derivatives of x and y with respect to t do not appear in those pictures but they are important nonetheless. (We will need to use a picture in which we can see the derivative with respect to t to complete the derivation.)
The condition a path must satisfy, to be a minimum (or maximum) of F, is the familiar one: The derivative of F (with respect to the path) must vanish. That means that any “infinitesimal” deviation from the path will result in no first-order change in F -- or, in other words, the path must be a stationary point for F in the space of all possible paths. (If that were not true, then the path would not be minimal. If a particular infinitesimal deviation produced a better result, obviously the path would not be minimal. If a particular infinitesimal deviation produced a worse result, on the other hand, then an infinitesimal deviation the opposite way would produce a better result and again the original path was not minimal.)
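
The "stationary point" idea can be checked numerically. The sketch below is only an illustration under assumptions of my own choosing: one dimension, the integrand f = (dx/dt)^2, the straight path x = t (which is minimal for that integrand), and a bump ε·sin(πt) that vanishes at both endpoints. The names action and perturbed are hypothetical.

```python
import math

def action(path, n=2000):
    """Midpoint-rule estimate of F = integral of (dx/dt)**2 over t in [0, 1]."""
    dt = 1.0 / n
    total = 0.0
    for k in range(n):
        v = (path((k + 1) * dt) - path(k * dt)) / dt
        total += v * v * dt
    return total

def perturbed(eps):
    # Straight path x = t plus a bump that vanishes at both endpoints.
    return lambda t: t + eps * math.sin(math.pi * t)

F0 = action(perturbed(0.0))
# Change in F for deviations to either side of the path, at two sizes.
changes = {eps: action(perturbed(eps)) - F0 for eps in (0.1, -0.1, 0.01, -0.01)}
```

The change in F is positive for deviations to either side, and shrinking the deviation by a factor of 10 shrinks the change by a factor of about 100. That is what a vanishing first derivative looks like: only the second-order term survives.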
From here we could proceed with a conventional rigorous derivation using integration by parts, and we have done that here. But on this page, we will proceed to derive the minimization conditions visually, if not entirely rigorously.
The Problem Re-Drawn in 1 Spatial Dimension
First, to do this visually, we're going to need a picture which lets us see how "fast" we're moving on the path. So, we must show t, the path parameter, explicitly in the graph. Since we can't easily show more than 2 dimensions, we'll have to limit ourselves to a path through R^1. That's sufficient for deriving the formula for a single dimension, however, and generalizing the result to N dimensions is straightforward since the derivative is a linear function of the partial derivatives. Because it's linear, the derivative with respect to a small deviation in N dimensions is just the sum of the derivatives of the integral taken over each dimension separately.
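
Here is a small numerical sketch of that linearity claim, under assumptions chosen only for illustration: a two-dimensional path (t, 2t), the hypothetical integrand f = xdot^2 + ydot^2 + x*y, and the same bump sin(πt) added to either coordinate. The function name F and the step size eps are mine.

```python
import math

def F(eps_x, eps_y, n=1000):
    """Midpoint-rule estimate of the integral of f = xdot**2 + ydot**2 + x*y
    along the path (t + eps_x*sin(pi t), 2t + eps_y*sin(pi t)), t in [0, 1]."""
    def x(t):
        return t + eps_x * math.sin(math.pi * t)
    def y(t):
        return 2.0 * t + eps_y * math.sin(math.pi * t)
    dt = 1.0 / n
    total = 0.0
    for k in range(n):
        t0, t1 = k * dt, (k + 1) * dt
        xm, ym = (x(t0) + x(t1)) / 2, (y(t0) + y(t1)) / 2
        vx, vy = (x(t1) - x(t0)) / dt, (y(t1) - y(t0)) / dt
        total += (vx * vx + vy * vy + xm * ym) * dt
    return total

eps = 1e-3
base = F(0.0, 0.0)
d_x = F(eps, 0.0) - base      # deviate in x only
d_y = F(0.0, eps) - base      # deviate in y only
d_both = F(eps, eps) - base   # deviate in both at once
```

The change from deviating in both coordinates at once agrees with d_x + d_y up to a remainder of order eps squared, which is why it is enough to work out the condition one dimension at a time.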
So here is a 1-dimensional path (figure 3). It's just a function of one variable: the value of X at each point is just X(t), and its “speed” along the path is just dX/dt -- the slope of the curve at each point.
Figure 3 -- A 1-dimensional path from X=a to X=b, parametrized by t:
If the path is minimal, then moving
away from it “a little bit” will not change the value of the
integral (to first order).
The First Derivation -- A Clear Picture (that Doesn't Quite Work)
Moving from a path which is just a little to one side of our chosen path to a different path an equal distance to the other side of our “chosen path” will result in no change (to second order). We will now “zoom in” on a little piece of the path, which is so short we can treat it as being “straight” (figure 4). We show two other paths, P1 and P2. Path P2 “races ahead” briefly at twice the rate of the main path, and then holds a constant value until the main path catches up. Path P1 holds a constant value while path P2 moves ahead, then races at twice the rate of the main path to catch up with path P2. Since P1 and P2 are equal perturbations on either side of our “minimum” path, we would like the integral of f to be equal (to second order) along P1 and P2.
Figure 4 -- The path segments we'll compare:
In this image, we've broken apart the effect of shifting the path along the X axis and the effect of changing the speed at which we move along the path. In words, the difference in the “flat parts” of paths P1 and P2 -- segments A and B in figure 4 -- is due solely to the rate of change in f as X changes. If f grows larger as X increases, then the integral along path P2 will have a larger value over the “flat part” of its course than the integral along path P1 has over its own flat part.
On the “racing” parts of paths P1 and P2 -- segments C and D in figure 4 -- the paths pass through the same X values, so the derivative of f with respect to X doesn't matter there. They have the same slopes, too -- but path P1 races ahead, along segment D, after path P2 finishes racing along segment C. So, if the effect on f of the velocity along the curve changes as time goes by, then the integral along segment D will differ from the integral along segment C by an amount proportional to the change, over that time, in the derivative of f with respect to the velocity.
Again just in words, in order for the net integral along P1 to equal the integral along P2, the effect due to the increase in X between segments A and B must exactly balance the effect of racing ahead later along segment D versus segment C. The former effect is the partial derivative of f with respect to X; the latter effect is the time derivative of the partial derivative of f with respect to the velocity.
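We'd like to extract the actual minimization condition from figure 4. Let's translate the origin to the point marked f0 and replace f with its first-order Maclaurin expansion:

$$f(X, \dot{X}) \approx f_0 + \frac{\partial f}{\partial X}\,X + \frac{\partial f}{\partial \dot{X}}\,\dot{X} \qquad (4)$$

Using formula (4), we can see from figure 4 that the difference in f between segment A and segment B must be:

$$f_B - f_A = \frac{\partial f}{\partial X}\,(2\,\Delta X)$$

and, since each segment covers Δt, the difference in the integral of f between segment A and segment B must be:

$$F_B - F_A = 2\,\frac{\partial f}{\partial X}\,\Delta X\,\Delta t$$

Looking at segments C and D on the path, and applying (4), we can see that the average value of X on each segment is the same as X at f0, and the slope of each segment is 2(ΔX/Δt). So, the average value of f on segment C must be:

$$\bar{f}_C = f_0 + \left.\frac{\partial f}{\partial \dot{X}}\right|_{C} \frac{2\,\Delta X}{\Delta t}$$

(N.B. -- this step in the derivation is easy to visualize but it's not really valid; we'll have more to say about that below.)

Similarly, the average value of f on segment D must be:

$$\bar{f}_D = f_0 + \left.\frac{\partial f}{\partial \dot{X}}\right|_{D} \frac{2\,\Delta X}{\Delta t}$$

Their difference, then, will be (segment D is traversed a time Δt later than segment C):

$$\bar{f}_D - \bar{f}_C = \frac{d}{dt}\!\left(\frac{\partial f}{\partial \dot{X}}\right)\Delta t \cdot \frac{2\,\Delta X}{\Delta t} = 2\,\frac{d}{dt}\!\left(\frac{\partial f}{\partial \dot{X}}\right)\Delta X$$

And the difference in the path integrals across segment D versus segment C must be:

$$F_D - F_C = 2\,\frac{d}{dt}\!\left(\frac{\partial f}{\partial \dot{X}}\right)\Delta X\,\Delta t$$

Finally, the difference in the integral over path P2 (segments C and B) versus path P1 (segments A and D) must be:

$$F(P_2) - F(P_1) = (F_B - F_A) - (F_D - F_C) = 2\,\Delta X\,\Delta t \left[\frac{\partial f}{\partial X} - \frac{d}{dt}\!\left(\frac{\partial f}{\partial \dot{X}}\right)\right]$$

and this difference can only be zero if

$$\frac{\partial f}{\partial X} - \frac{d}{dt}\!\left(\frac{\partial f}{\partial \dot{X}}\right) = 0$$

which is, indeed, the minimization condition we wanted to find.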
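
The bookkeeping over segments A, B, C and D can be checked numerically in the one case where a first-order expansion is exact: an integrand that is linear in X and in the velocity. The sketch below assumes such an integrand, f = c·X + k·t·Xdot, for which ∂f/∂X = c and d/dt(∂f/∂Xdot) = k. It also assumes, from the description of figure 4, that the main path rises by ΔX in each time Δt, so that the two paths should differ by 2·ΔX·Δt·(c − k). The helper integrate, the constants, and the variable names are all made up for the illustration.

```python
def integrate(f, knots, n=200):
    """Integrate f(x, v, t) along a piecewise-linear path through `knots`,
    a list of (t, x) points, using the midpoint rule on each segment."""
    total = 0.0
    for (ta, xa), (tb, xb) in zip(knots, knots[1:]):
        v = (xb - xa) / (tb - ta)
        h = (tb - ta) / n
        for i in range(n):
            t = ta + (i + 0.5) * h
            total += f(xa + v * (t - ta), v, t) * h
    return total

c, k = 3.0, 0.5                      # hypothetical constants

def f(x, v, t):
    # Linear integrand: df/dX = c, and d/dt(df/dXdot) = d/dt(k*t) = k.
    return c * x + k * t * v

t0, X0, dX, dt = 1.0, 2.0, 0.1, 0.05
start, end = (t0 - dt, X0 - dX), (t0 + dt, X0 + dX)
P1 = [start, (t0, X0 - dX), end]     # flat segment A, then racing segment D
P2 = [start, (t0, X0 + dX), end]     # racing segment C, then flat segment B

difference = integrate(f, P2) - integrate(f, P1)
predicted = 2 * dX * dt * (c - k)
```

For this linear integrand the two numbers agree to rounding error, and the difference vanishes exactly when c = k, that is, when the two effects balance. Nothing here says the same bookkeeping is trustworthy for an integrand that is not linear in the velocity.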
Oops -- That wasn't really valid.
Unfortunately, the derivation just given isn't valid, because the first-order expansion we used in (4) is only accurate in the limit as we approach the origin. It's only valid for small values of x and for small values of dx/dt. But dx/dt is actually on the order of Δx/Δt, which is in general not small (we can make each of Δx and Δt as small as we like by restricting ourselves to a small region on the path, but their ratio is unaffected by shrinking the neighborhood).
The use of "flat spots" in the alternate paths, where x is momentarily fixed while t continues to advance, is very nice for visualizing the minimization conditions, because it completely splits the effect of $\partial f/\partial x$ from the effect of $\partial f/\partial \dot{x}$. Unfortunately, the difference in speed between a "flat spot" and the main path is typically not infinitesimal! So, we're no longer looking at an infinitesimal variation in the path, and the derivation doesn't really work when done this way, even though we managed to pull out the right answer.
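
A few numbers make the objection concrete. The sketch below assumes a straight main path of slope 1.5 (an arbitrary choice) and a path that sits still for a time dt and then races at twice the slope, and measures how far the two are apart in position and in velocity as the window shrinks. The function name gaps is hypothetical.

```python
def gaps(slope, dt, n=100):
    """Largest position and velocity gaps between a straight main path and a
    path that sits still for a time dt, then races at twice the slope."""
    def main(t):
        return slope * t
    def flat_then_race(t):
        return 0.0 if t <= dt else 2.0 * slope * (t - dt)
    h = 2.0 * dt / n
    position_gap = velocity_gap = 0.0
    for i in range(n):
        ta, tb = i * h, (i + 1) * h
        position_gap = max(position_gap, abs(flat_then_race(tb) - main(tb)))
        v_main = (main(tb) - main(ta)) / h
        v_alt = (flat_then_race(tb) - flat_then_race(ta)) / h
        velocity_gap = max(velocity_gap, abs(v_alt - v_main))
    return position_gap, velocity_gap

results = {dt: gaps(1.5, dt) for dt in (0.1, 0.01, 0.001)}
```

The position gap shrinks in proportion to dt, but the velocity gap stays equal to the slope of the main path no matter how small the window is made.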
A Second Derivation: A Bit Less Clear, but More Correct
As observed above, when we look for an alternate path that's "close to"
our selected path, we can't have "flat spots" where the slope of the
alternate path goes to 0. We must keep the slope
similar to the slope of the main path. So, we can't really
split the effect of speed (slope of the path) completely apart from the
effect of traveling over different ground (value of x at each point).
But we can nonetheless derive the desired result from a very
simple diagram.
In figure 5, we've shown a tiny piece of the main path (path "M"), and we've shown one alternate path (path "P"). The alternate path goes a little faster than the main path on segment A, then proceeds more slowly along segment B until path "M" once again catches up with it. Thus, path "P" spends more time at larger values of X, which represents a cost if $\partial f/\partial X$ is positive. However, it does more of its fast traveling earlier, and can idle along a bit later; this tradeoff represents a benefit if $\frac{d}{dt}\left(\partial f/\partial \dot{X}\right)$ is positive (the price of gas is going up). Thus, if the overall tradeoff is to net to zero (which must be true for very nearby paths, if the main path is minimal), then $\partial f/\partial X$ and $\frac{d}{dt}\left(\partial f/\partial \dot{X}\right)$ must again balance. We will now derive that relationship more precisely.
Figure 5 -- The path segments to be compared, second time around:
Figure 5 is largely self explanatory
but a few things (which can be read directly from the figure) should,
perhaps, be pointed out, if only because the print on the figure may be
too small to read on some screens!
Path M has slope Δx/Δt in the figure (assumed constant in this tiny region). The point f0, at location (t0, X0), has the average value of f on path M (to first order), and the average t and X values on path M are t0 and X0.

Path P takes time Δt to traverse segment A, which it does with a slope slightly greater than that of path M (the exact value is marked on the figure). During that time, path M traverses segment M1; its average X value on segment M1 is Xa, and the average X value on segment A is slightly larger than Xa.

Path P takes time Δt to traverse segment B, which it does with a slope slightly smaller than that of path M. During that time, path M traverses segment M2; its average X value on segment M2 is Xb, and the average X value on segment B is slightly larger than Xb.
Now, let us proceed. The average value of f on segment A of path P, to first order, is:

The average value of f on segment B of path P, to first order, is:

Summing and dividing by 2, we get the average value of f on path P:

We can make the following simple substitutions, all accurate to first order:

Simplifying and multiplying by the total time, we obtain the integral of f over path P (to first order):

But the integral over path M is (to first order):

$$F(M) = 2\,f_0\,\Delta t$$

So, if F(M) = F(P), and keeping in mind that "to first order" is redundant when comparing first derivatives, we must have:

$$\frac{\partial f}{\partial X} - \frac{d}{dt}\!\left(\frac{\partial f}{\partial \dot{X}}\right) = 0$$

which was to be shown.
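
The comparison of paths M and P can also be sketched numerically. Everything specific below is an assumption made for the illustration, not something taken from figure 5: the symbol delta for the height by which P sits above M where segments A and B meet, the integrand f = t·Xdot^2 + X, and the numerical values. For that integrand, on a straight path of slope s, ∂f/∂X = 1 and d/dt(∂f/∂Xdot) = d/dt(2·t·s) = 2s. With P drawn as a kinked line, the claim being tested is that F(P) − F(M) is approximately delta·Δt·(∂f/∂X − d/dt(∂f/∂Xdot)) when delta is small.

```python
def integrate(f, knots, n=200):
    """Integrate f(x, v, t) along a piecewise-linear path through `knots`,
    a list of (t, x) points, using the midpoint rule on each segment."""
    total = 0.0
    for (ta, xa), (tb, xb) in zip(knots, knots[1:]):
        v = (xb - xa) / (tb - ta)
        h = (tb - ta) / n
        for i in range(n):
            t = ta + (i + 0.5) * h
            total += f(xa + v * (t - ta), v, t) * h
    return total

def f(x, v, t):
    # Hypothetical integrand; note that it depends explicitly on t.
    return t * v * v + x

t0, X0, s, dt = 1.0, 2.0, 0.7, 0.2   # s is the slope of path M

def excess(delta):
    """F(P) - F(M), where P rises `delta` above M at the midpoint."""
    start, end = (t0 - dt, X0 - s * dt), (t0 + dt, X0 + s * dt)
    M = [start, (t0, X0), end]
    P = [start, (t0, X0 + delta), end]   # faster on A, slower on B
    return integrate(f, P) - integrate(f, M)

euler_lagrange = 1.0 - 2.0 * s       # df/dX - d/dt(df/dXdot) on path M
ratios = [excess(delta) / (delta * dt) for delta in (1e-3, 1e-4, 1e-5)]
```

As delta shrinks, the ratios approach euler_lagrange. Unlike the flat spots of the first derivation, the velocity on P differs from that on M only by delta/Δt, which really does become small. In this example path M is not stationary, so the first-order difference does not vanish; it would vanish for a path on which the bracketed expression is zero.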
Explicit Dependency of f on t
f may depend explicitly on t as well. Do we need to modify the above derivations in that case?

In the first derivation we compare the integral over segments which are separated by time. The differences would be affected if $\partial f/\partial t$ were nonzero. However, a brief inspection shows that, when we subtract the difference between segments C and D from the difference between segments A and B, the differences due to $\partial f/\partial t$ will cancel and the final result will be unaffected.

In the second derivation, we compared integrals only over paths with identical average times, so nonzero $\partial f/\partial t$ will not affect the calculations, or the result, at all.
Page last modified 10/31/06. Reference to Lagrangian mechanics added on 11/13/06.