As I've stated elsewhere, the primary purpose of these calculus pages is to motivate
some results which are all too often proven but not explained. On
this page, we provide (what we hope is) clear motivation for the
fundamental theorem of calculus -- but, at least initially, we will not
be providing a rigorous proof of it. (I may add one at a later date.)
After we present the two parts of the fundamental theorem, we'll say a bit more about how the dx
notation relates to this, and discuss the visualization of dx
as "a small change in x" and its use in understanding these formulas.
And finally, we'll say a bit about the issue of "lack of rigor"
when using "sloppy infinitesimals", and as an illustration of the
contrast between the "sloppy infinitesimal" motivation
of a theorem and a "rigorous proof", we'll present a proof of the chain rule.
The Derivative of an Integral: The Fundamental Theorem, Part I
Figure 1: A definite integral
The integral of a function is the area under its curve (figure 1). The derivative
of the integral, with respect to its upper bound, is the rate at which the area increases as we move the upper bound to the right.
That is -- rather obviously! -- just the value of f(x)
at the upper bound.
If we add one more little "piece" to the total area under the curve, and the width of that "piece" is dx
units, then the area of that piece must be f(x) · dx units.
If the area we added was f(x) · dx
units, when we moved dx
units to the right, then the rate
at which we're adding area must be -- once again -- f(x)
. In other words:
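In symbols (writing the integrand's dummy variable as t, so it doesn't clash with the upper bound):

```latex
\frac{d}{dx} \int_a^x f(t)\,dt = f(x)
```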
|Figure 2: Change in an Integral|
I just looked up the fundamental theorem in Thomas's ninth edition, a respectable calculus text, and the authors emphasize how surprising
this result is. I find that statement inexplicable; this is among the most
obvious facts in calculus. It's also very important, however. So we
will dwell on it a bit longer.
In figure 2, we have increased the upper bound by Δx2
, and the value of the integral has increased by the area shown in pink, which is f(x) · Δx2
. The derivative -- the rate
at which the integral increases, as the bound increases -- is that additional area, divided by the distance we moved the bound:
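That quotient is (treating f as constant over the panel, as in the figure):

```latex
\frac{f(x) \cdot \Delta x_2}{\Delta x_2} = f(x)
```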
I should hasten to add that I've left out a limit operation here; for a finite change in the bound, f
will typically vary
over the width of the added "panel". I wrote the equation and drew the picture as though f
were constant, which is only really legitimate in the limit of infinitesimal Δx2.
Taking the limit explicitly, and paying attention to boundary cases, adds a page or so of algebra to the
operation and doesn't add much clarity; I may add a formal proof later, but for the time being I'm going to stop here.
Again, what we've just shown is the rate of change
in the integral as we move the upper bound; to sum up:
|The derivative of the integral of a function is the function itself.|
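As a quick numerical illustration (not part of the original argument; the integrand t² and the step sizes are just convenient choices of mine), we can build the integral as a Riemann sum and check that its difference quotient recovers the integrand:

```python
# Build F(x), the integral of f from 0 to x, as a fine left Riemann sum,
# then check that the difference quotient of F recovers f itself.

def f(t):
    return t * t  # an arbitrary example integrand

def F(x, n=100_000):
    """Left-Riemann-sum approximation of the integral of f from 0 to x."""
    h = x / n
    return sum(f(i * h) * h for i in range(n))

x = 2.0
dx = 1e-3
derivative_of_integral = (F(x + dx) - F(x)) / dx
print(abs(derivative_of_integral - f(x)) < 1e-2)  # → True
```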
The Integral of a Derivative: The Fundamental Theorem, Part II
The derivative of a function, f
, with respect to x
, is the rate of change of f as x changes.
If we multiply the (average) rate
at which f
changes as x
changes, by the total change in x
, we will, of course, find the total amount by which f
changed. And that is all this theorem says. We'll illustrate this with a simple example. We'll use speed
in the example; speed is the change of position as time passes
; in other words, speed is the derivative of location with respect to time.
In our first, totally trivial example, if the average speed of a car is 30
MPH, and the car travels for an hour, it will cover 1 hour * 30 MPH = 30 miles.
If, however, the speed of the car varies, we need to be a little more clever. We might proceed as follows:
If it starts out going 10 MPH, we can assume it maintains that speed (with
little change) for a brief period -- say, 1 minute. So, multiply
10 MPH by 1/60 of an hour, and we find out how far it traveled in the first minute.
At the end of a minute, (we suppose) the car is traveling 12 MPH. So, to get the distance traveled during the second minute, we multiply 12 MPH by 1/60 of an hour.
And we proceed like this for the entire duration of the trip.
If the car's speed varies very rapidly, our result may not be very
accurate; in that case we need to "divide up the time" into smaller
intervals. We might check the car's speed every 10 seconds rather
than every minute -- or every second, or twice a second. As we
divide up the time into ever smaller intervals and sum up the product
of speed and time in each of those (tiny) intervals, our total sum will
certainly approach the correct answer, which is the total distance traveled.
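The procedure just described is easy to carry out numerically. Here's a sketch, with a made-up speed profile (10 MPH at the start of a one-hour trip, rising steadily to 30 MPH):

```python
# A hypothetical speed profile: 10 MPH at t = 0, rising to 30 MPH at t = 1 hour.
def speed(t):
    return 10 + 20 * t

# The exact distance (the integral of 10 + 20t from 0 to 1) is 20 miles.
# Sum speed * dt over ever-finer intervals; the total approaches 20.
for n in (60, 600, 3600):  # one-minute, six-second, and one-second intervals
    dt = 1.0 / n
    total = sum(speed(i * dt) * dt for i in range(n))
    print(n, total)
```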
The point in all this is that speed is the derivative
of position with respect to time. By summing the speed times time over many tiny intervals, we are integrating
the speed of the car, over time -- and the result of that integral is the total distance traveled
. In this particular example, what we're finding is this:
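With x(t) denoting the car's position at time t (my labels, chosen to match the discussion):

```latex
\int_{t_0}^{t_1} \frac{dx}{dt}\,dt = x(t_1) - x(t_0)
```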
In other words, we integrated the derivative
of the position, and obtained the change in the position. This
should not be surprising; indeed it should seem completely natural.
It is an illustration of the second part of the fundamental
theorem of calculus, which can be written as:
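In symbols:

```latex
\int_a^b f'(x)\,dx = f(b) - f(a)
```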
|The integral of the derivative of a function is the function itself.|
As with the first part of the fundamental theorem, the conclusion seems
clear; a detailed proof won't add a lot to the clarity. I may add
one at some future date but for now I'm going to stop here.
A Little More about "∫", and about "d", the "differential operator"
There are a number of interpretations of d
, the "differential operator". For calculus of a single real variable, the simplest interpretation is the best: dx
is a "tiny" (but imprecisely specified) change in x
. The "d" stands for difference
; it is the difference
between the new and old values of x
. This "visualization" of what is going on is not only simple, but powerful.
The ∫ symbol also has a very simple interpretation: It's a stretched-out "S", and it means Sum
. It indicates we should take the sum
of its arguments. Let's see where these notions lead.
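One relation they lead to (presumably the "last one" referred to next) is:

```latex
df = \frac{df}{dx}\,dx
```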
That last one might come as a bit of a surprise; it just seems too simple. But it makes sense: If x
changes, then f
will surely change by that same amount, times
the ratio of the amount f
changes by to the amount x
changes by! And, if we continue to think of d
as meaning "a small change in", then it's also obvious: The dx
term simply "cancels".
Now, let's consider the ∫
symbol in this context. It means, "Sum all the tiny changes over
a range". The simplest possible integral is this (where we
haven't specified the bounds):
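A reasonable reading of that simplest integral (C is the usual constant of integration):

```latex
\int dx = x + C
```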
Specifying the bounds, that's:
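That is, with bounds x0 and x1:

```latex
\int_{x_0}^{x_1} dx = x_1 - x_0
```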
If we add up all the changes in x
in getting from x0 to x1
, of course we'll just get the total change in x.
There is nothing special about the variable x
in the integral; regardless of what we call the tiny differences we're summing, the result will be the total
change in the integrand. Let's look at a few examples of how this can be applied.
Suppose we have the derivative of some function:
Let's integrate it:
But now, let's plug in equation (8):
This is, of course, just Part II of the fundamental theorem, which we
discussed earlier on this page. The point is that, if we treat dx
simply as a "small change in x
", the fundamental theorem simply "falls out".
Given an arbitrary integral, what is a "small change
" in that integral? What does it mean to apply the d
operator to the integral? In general, what we mean in that case
is that we are asking for a small change in the value of the integral
when we change the upper bound of integration
by a small amount.
But in that case, the "change" will just be the area of the last "panel" in the sum. So, we'll have
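In symbols (with a as the fixed lower bound):

```latex
d \int_a^x f(t)\,dt = f(x)\,dx
```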
This is a strange-looking expression. What can we do with it?
If we divide through by dx
, we just get back part I of the fundamental theorem:
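Dividing the panel's area, f(x) dx, by the width dx gives:

```latex
\frac{d}{dx} \int_a^x f(t)\,dt = f(x)
```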
In general, it's often convenient to use d
where one is actually interested in obtaining some derivative, and then divide by one of the d
terms and rearrange the equation to get the result one wants. For example:
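Here is a sketch of the manipulation the next paragraph describes, assuming (13) equates a function f of x with a function g of y:

```latex
f(x) = g(y)
\;\Longrightarrow\;
f'(x)\,dx = g'(y)\,dy
\;\Longrightarrow\;
\frac{dx}{dy} = \frac{g'(y)}{f'(x)}
```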
In (13), we started with two functions which are equal. We
differentiated both sides -- but we did it by taking "differentials".
A small change in f
is the derivative of f
with respect to x
, times a small change in x
; similarly, since g
is written as a function of y
, a small change in g
is its derivative with respect to y
, times dy
. And finally, we decided we wanted the derivative of x
with respect to y
-- so we just divided through by f'
, and voila, we have the derivative of x
with respect to y.
A Caveat: The Return of Clutter, and a Return to Rigor
No doubt some "purists" may disapprove of the preceding section.
And, in fact, it is possible to get in trouble using
differentials that way. There are two things to keep in mind.
- The differences are not independent. dy and dx cannot actually be pulled out of the equations and treated as independent numbers (though we can think of them that way if we're a little careful).
- There is a limit operation
which we're not writing down. Without the limit process, the
equations are not actually correct. (But we can nearly always
leave out the explicit limit ... as long as we know it's there, in the background.)
It's also worth keeping in mind that most simple visualizations only work for well behaved
-- i.e., smooth
-- functions. It has been said that all functions used in physics
are so smooth you could ski on their graphs. This is not
completely accurate but there's some truth in it -- the really weird
counter-example functions which crop up in mathematics don't typically
have any physical significance.
How can we put this on a more rigorous footing? First, we need to recognize that, the way we're using dx
, it isn't really
infinitesimal. We're thinking of it as being very small
. So, it might be better to write it that way. We can, for example, say:
with perfect assurance that it's correct. However, a lot of the
equations we're interested in are no longer true when written this
way. For example, in general we'll have:
This is where the limit operation comes in; it's present, implicitly, when we use the dx
notation. What we really intend, rather than equation (15), is:
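Namely:

```latex
\lim_{\Delta x \to 0} \frac{\Delta f}{\Delta x} = \frac{df}{dx}
```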
But we still have one more imprecision to deal with: We've written Δf and Δx
as though they're independent. They're not. We actually need to expand Δf
, so that the equation becomes:
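With Δf expanded to f(x + Δx) - f(x), the equation reads:

```latex
\lim_{\Delta x \to 0} \frac{f(x + \Delta x) - f(x)}{\Delta x} = \frac{df}{dx}
```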
At this point, we're back to the ordinary definition of a derivative.
The point is that the "missing" pieces actually are present; we're
just not writing them down.
Let's look at one more example, which is the chain rule. In dx
notation it's trivial:
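In the u, v notation used in the proof below, with u a function of v and v a function of x:

```latex
\frac{du}{dx} = \frac{du}{dv} \cdot \frac{dv}{dx}
```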
The meaning is crystal clear, and the "reason" it is true is obvious: The dv
terms cancel. But now let's expand it into "rigorous form". First
we rewrite it in terms of Δx and put in the limit operations:
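Presumably something like this (the bracketed term is the one manipulated in the proof below):

```latex
\lim_{\Delta x \to 0} \left[ \frac{\Delta u}{\Delta v} \cdot \frac{\Delta v}{\Delta x} \right] = \frac{du}{dx}
```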
This is still more or less recognizable, and the "reason" it's true is still
apparent. Unfortunately, though, there's something else wrong
here: The thing on the left isn't exactly
the product of two derivatives. The problem is that the two dv
terms in the original expression, one in the denominator and one in the numerator, are actually independent!
They aren't necessarily the same value -- and that's the source
of the claim that you can't just "cancel" them out. Written
properly, (19) turns into this:
But now we have completely lost the intuitive clarity of equation (18).
In fact, at first glance it's not even obvious how to prove (20)
is true -- or, indeed, if
it is true. Nearly all of the squirrelly manipulations which follow are directed at showing that we can
treat the two dv
's in (18) as having the same value, and so we really can
just "cancel them out" after all.
Throughout the following, we're going to assume that v
is not constant
at the point where we're trying to evaluate the derivative. (If it's constant there, then Δv
is zero and both (19) and (20) are in deep trouble.)
To proceed with the proof, let's start by defining
Next let's rewrite the term in brackets on the left side of equation (19) using definitions (21):
To reduce the clutter a bit, let's substitute:
After multiplying out and substituting (22b), the right hand side of (22) becomes:
Now, as we vary Δx, the terms u'(v(x)) and v'(x)
are fixed -- they don't change. But as Δx goes to zero, both g(Δv) and Δv
go to zero. So, all but the first term in (23) will vanish
as Δx goes to zero. But (23) was the term inside brackets in equation (19
). So, applying these results to the left side of (19
), we can conclude that,
We're finally back to something we can work with! The two Δv terms
inside the brackets in (24) certainly do
cancel, giving us:
We recognize the right hand side of (25) as the derivative of u(v(x)):
Equating the right hand sides of (24) and (26) we finally obtain the result we want:
And so we see, first, how easily we can come to the right conclusion using the dx
"sloppy infinitesimal" notation, and second, how hard it can be to actually prove
that the conclusion we came to is "really correct".
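As a closing sanity check (the functions sin and x² are arbitrary examples of mine, not from the discussion above), the chain rule can at least be verified numerically:

```python
import math

# d/dx u(v(x)) should equal u'(v(x)) * v'(x).
u, u_prime = math.sin, math.cos
v = lambda x: x * x
v_prime = lambda x: 2 * x

x = 0.7
h = 1e-6
lhs = (u(v(x + h)) - u(v(x))) / h   # difference quotient of the composite
rhs = u_prime(v(x)) * v_prime(x)    # the product promised by the chain rule
print(abs(lhs - rhs) < 1e-4)  # → True
```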
created on 11/03/2007