Elementary calculus, as I said elsewhere, is geometry in a
tuxedo. It's got a fair amount of algebra thrown in for
decoration, but for the most part it can be explained in pictures.
Too
often, the explanations given for the fundamental facts of calculus are
presented in terms of epsilonics (which is valuable for proofs but
rarely of value for
explaining things), and are presented
as though
they should be difficult to grasp or obscure. My introduction to
calculus was a case in point: We used a textbook that wasn't
very good, and my instructor was, to put it charitably, uninspired.
The understanding of the subject I got from that course was, to
say the least, weak.
One
of the more remarkable moments in the
class came when we had a guest lecture by the head of the math
department, who also was less than stellar at explaining the Zen of
calculus. At one point in the lecture he talked about the chain
rule. He wrote it out on the chalkboard, then observed,
"A lot of students, when they see this, want to just do this..."
and he crossed out the common terms, like this:

$$\frac{dy}{dx} \;=\; \frac{dy}{\cancel{du}} \cdot \frac{\cancel{du}}{dx}$$
Then he gave a kind of conspiratorial laugh, and said,
"Of course, you can't do that!" And I sat there, wondering
why you can't do that... Surely my ignorance must have been deep, for I didn't see the problem with doing that...
The aggravating thing here is that
you certainly CAN do that;
in fact, that's all there is to the chain rule! It's a product of
two ratios, and the common terms cancel. I must hasten to add
that "canceling terms" that way does not produce a rigorous proof of
the rule all by itself; for that you need to be careful of exactly how you take the
limits for small differences. But at the level of
understanding what the rule means, it's legitimate.
After
all, we can restate the rule in English: "If Joe runs twice as
fast as Sally, and Sally runs three times as fast as Tim, then Joe must
run six times as fast as Tim". There really isn't anything more to it
than that -- yet it's sometimes presented as though it's obscure, deep, and
hard to fathom.
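If you like, you can check the "canceling" story numerically. Here's a quick sketch in Python (the particular functions and the point are just my own picks for illustration), comparing dy/dx estimated directly against the product of dy/du and du/dx:

    import math

    h = 1e-6                      # a small stand-in for "dx"

    def u(x):                     # inner function, chosen arbitrarily
        return math.sin(x)

    def y(u_val):                 # outer function, chosen arbitrarily
        return u_val ** 2

    x = 0.7

    # dy/dx estimated directly as one ratio of small changes...
    dy_dx = (y(u(x + h)) - y(u(x))) / h

    # ...and as the product of the two separate ratios.
    du_dx = (u(x + h) - u(x)) / h
    dy_du = (y(u(x) + h) - y(u(x))) / h

    print(dy_dx, dy_du * du_dx)   # the two agree to several decimal places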
My wretched calculus class worked out well in the end, though. Through
a combination of great good luck and lack of preparation, I bombed the
calculus AP test, and consequently had to take the subject over again in
college. That second time around, we used Thomas (which is
actually a pretty good book for studying calculus restricted to 3
dimensions, in my opinion). Better still, my instructor in the
course, Gene Kleinberg, was one of the clearest lecturers I've ever
encountered. I can still recall sitting in class one day (about 30 years ago...) in
absolute amazement as he
explained why the Taylor series is the way
it is -- I had learned it "by rote" in my high school class, with no
idea that it actually could be presented in a way that
made sense. But we'll go into that more later on.
Some Definitions: What's That "dx" Thing?
We
need to start somewhere, and this is as good a place as any. I
lost sleep over what "dx" meant when I was first learning this stuff.
The instructor couldn't really say, beyond saying it was
something more than just "a small change", and the text we used went
off into hyperspace by attempting to introduce the concept of a 1-form
into an otherwise rather shallow elementary calculus course.
You
don't need to do any of that. There are actually multiple legitimate ways to define
"dx", but for understanding elementary calculus, the simplest is the
best: It's a tiny change in "x". We can call it an
"
infinitesimal" change if we like.
You can use "dx" in an equation, standing alone; you can divide "dy" by "dx" to get the
rate at which y is changing as x changes,
and in most cases it will work out with no problem. The technical
term for what we're doing when we think of it this way is using "
physicist's sloppy infinitesimals".
It's not rigorous -- not without some care and more work -- but
it conveys the meaning very well, and in general it behaves just fine.
I dare say it's the way Leibniz thought of it, too.
Now, I keep saying "It's not rigorous". What's that mean? It means we haven't proved it works in
all cases, and in fact, absent such a proof, one should usually suspect that something
doesn't
work in all cases. But the cases where using "sloppy
infinitesimals" doesn't work are likely to be pretty pathological, and
we don't need to worry about them if all we're trying to do is
understand the subject.
Just how far are we from "rigor" when we say "dx is an infinitesimal change in x"? Not very far, it turns out -- an equation with
dx and
dy can typically be replaced with an almost identical equation which uses "Δx" in place of "dx", and which is true
in the limit as
Δx is made very small. But for picturing it all, we can treat dx
as an infinitesimal, and forget, for a while, that there's a limit
operation lurking in the background.
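To make that concrete, here's a little numeric sketch in Python (the function and the point are just examples I picked): replace the infinitesimal dx with a finite Δx and watch the ratio Δy/Δx settle down toward the true derivative as Δx shrinks.

    import math

    def f(x):
        return math.sin(x)        # an arbitrary example function

    x = 1.0

    # The true derivative of sin at x = 1 is cos(1), about 0.5403.
    # As dx shrinks, the ratio dy/dx closes in on that value.
    for dx in (0.1, 0.01, 0.001, 0.0001):
        dy = f(x + dx) - f(x)
        print(dx, dy / dx)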
Integrals
The integral is, of course, the area under a curve ... or so it's usually represented.
The key here is that it's an
area, or more generally a volume of some sort.
The common notation is actually a recipe for taking an integral:

$$\int_a^b f(x)\,dx$$
That
symbol on the left is actually a large, stylized "S", and it stands for
"Sum". The "dx" is, as we already said, a tiny change in X.
So, the whole thing says you slice up the region you're
interested into tiny chunks, each of which is
dx units wide,
you find the area (or volume) of each one, and you add them up.
And that is the definition we will take as "an integral".
To restate it:
Definition 1:
The integral of a function, from point a to point b, is the area under the function's
curve between a and b, if you plot it ... and it's found by dividing up the region over
which we're integrating into tiny (infinitesimal) slices, finding the area (or volume) of each slice, and adding them up. The
area (or volume) of an infinitesimal slice will just be the value of
the function we're integrating, times the width of the slice.
This isn't the most general definition of an integral but it's a sensible way to think of it.
To
make it rigorous, one either needs to rigorously define infinitesimals,
or define it in terms of limits. Just to be complete we'll
mention a definition of the Riemann integral here, which is what the
notation corresponds to most closely:
Definition 1b:

$$\int_a^b f(x)\,dx \;=\; \lim_{n \to \infty} \sum_{i=1}^{n} f(x_i)\,\Delta x$$

where:

$$\Delta x = \frac{b-a}{n}$$

and:

$$x_i = a + i\,\Delta x$$

Again, this isn't the most general definition of an integral, but for our purposes here it will be just fine.
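Here's that recipe spelled out in a few lines of Python (the function and the interval are examples I've chosen): slice the region into n thin strips, take each strip's height times its width, and add them all up.

    def integrate(f, a, b, n=100000):
        # Approximate the integral of f from a to b by adding up thin slices.
        dx = (b - a) / n              # width of each slice
        total = 0.0
        for i in range(n):
            x = a + i * dx            # left edge of slice number i
            total += f(x) * dx        # slice area = height times width
        return total

    # Example: the area under y = x**2 between 0 and 1 should be 1/3.
    print(integrate(lambda x: x**2, 0.0, 1.0))   # prints roughly 0.33333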
Derivatives
The classic example of a derivative is speed: how much distance is covered per unit time.
Leibniz's notation is nice because it says exactly what the derivative is:
Definition 2: The derivative of f with respect to x is the ratio of the change in f to the change in x when x changes a little (infinitesimal) bit:

$$\frac{df}{dx} \;=\; \frac{f(x+dx) - f(x)}{dx}$$
Now, formally, we haven't defined "infinitesimal" so it's sloppy to say "
dx
is an infinitesimal". To be rigorous we either need to precisely
define infinitesimals (which takes more work than I'm willing to put in
on this, and takes us far afield from what I want to do with this
subject) or we need to present the definition in terms of limits.
Just to be complete we'll mention it in passing:
Definition 2b:

$$\frac{df}{dx} \;=\; \lim_{\Delta x \to 0} \frac{\Delta f}{\Delta x}$$

where:

$$\Delta f = f(x + \Delta x) - f(x)$$

and that's about all we'll have to say about limits here.
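Just to see the limit version in action, here's a quick worked example (the function is my own pick), computing the derivative of $x^2$ straight from Definition 2b:

$$\frac{d(x^2)}{dx} \;=\; \lim_{\Delta x \to 0}\frac{(x+\Delta x)^2 - x^2}{\Delta x} \;=\; \lim_{\Delta x \to 0}\frac{2x\,\Delta x + (\Delta x)^2}{\Delta x} \;=\; \lim_{\Delta x \to 0}\,(2x + \Delta x) \;=\; 2x$$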
Fundamental Properties
There
are a number of basic properties of integrals and derivatives which
should be obvious, simply from the definitions given above. I
won't be giving proofs for any of these.
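To give the flavor of what I mean -- these two are just my picks, and there are plenty of others -- the derivative and the integral are both linear:

$$\frac{d}{dx}\bigl(c\,f(x) + k\,g(x)\bigr) \;=\; c\,\frac{df}{dx} + k\,\frac{dg}{dx}$$

$$\int_a^b \bigl(f(x) + g(x)\bigr)\,dx \;=\; \int_a^b f(x)\,dx + \int_a^b g(x)\,dx$$

Both follow directly from the definitions: a small change in a sum is the sum of the small changes, and the slices of two stacked functions just add.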
And that about does it for the introduction. From here on it will be pictures (almost) all the way.
One More Thing: What About Robinson, and Non-Standard Analysis?
Back
in the 1960s, the use of infinitesimals was placed on a secure footing
by Abraham Robinson. It turns out it's possible to
prove that infinitesimals exist (well, they exist in the world of mathematics, anyway, if not
exactly
in the physical universe). Use of these "rigorous infinitesimals"
provides a powerful tool for understanding analysis. However, the
proofs which result tend to be highly algebraic, and that's not a
direction I want to go in this section of the website. My goal is
to explain many of the basic principles using simple geometry and
pictures, and the use of nonstandard analysis doesn't help with that.
To
show what I mean, I'll give an example. (Since this is chosen to
show that the hyperreal approach is not a gateway to instant clarity,
don't expect the meaning to be glaringly obvious!)
Here's a proof of the product rule for derivatives, using
infinitesimals. (We'll be going over this later, using pictures
-- it's another example of something which is obvious when pictured
properly, and the derivation I'm about to give is certainly
not what I mean by a "proper explanation"):

$$(fg)'(x) \;=\; *\!\left[\frac{f(x+\tilde h)\,g(x+\tilde h) - f(x)\,g(x)}{\tilde h}\right]$$

$$=\; *\!\left[\frac{\bigl(f(x) + (f'(x)+s)\,\tilde h\bigr)\bigl(g(x) + (g'(x)+t)\,\tilde h\bigr) - f(x)\,g(x)}{\tilde h}\right]$$

$$=\; *\!\left[\frac{f(x)\,(g'(x)+t)\,\tilde h + g(x)\,(f'(x)+s)\,\tilde h + (f'(x)+s)(g'(x)+t)\,\tilde h^2}{\tilde h}\right]$$

$$=\; *\bigl[\,f(x)\,g'(x) + g(x)\,f'(x) + u\,\bigr]$$

$$=\; f(x)\,g'(x) + g(x)\,f'(x)$$
A
little explanation is needed here, of course, since we've just
introduced an entirely new discipline, with new notation and new
semantics!
(I should also mention that the notation used here is
what I learned in college, and it may not match anything you find in
current use in this field. Hence, there is a
double need for an explanation of what I just did...)
Anything with a tilde over it is an
infinitesimal
(an actual, legitimate one this time). The "*" operator takes the
"real" (non-infinitesimal) part of a value: it returns the
real part, leaving off the infinitesimal fuzz. Two values are equal within the first-order real numbers
if their
real parts are equal -- in other words, if the "*" operator returns the same value for both.
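(For instance, if $a = 3 + \tilde h$, where $\tilde h$ is infinitesimal, then $*a = 3$; so $a$ and the plain real number 3 count as "equal" in this sense, even though they differ by an infinitesimal.)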
In hyperreal calculus, we take a derivative by finding the change in a function when we move an
infinitesimal
amount. We divide that change by the (infinitesimal) distance
moved, and then take the real part of the result. The proof above
shows that the (real part of the) derivative of the product of two
functions, given on the first line, is equal to the product of one with
the other's derivative plus the product of the second with the first's
derivative, which is the point of the exercise.
A couple of the lines in the middle may not be obvious, however.
On the second line, I introduced s and t, which are both (unknown) infinitesimals. I replaced $f(x+\tilde h)$ with $f(x) + (f'(x)+s)\,\tilde h$ -- that is, I replaced the value of f at a point infinitesimally far from x with its value at x, plus the derivative of f (plus an infinitesimal error term) times the (infinitesimal) distance moved. That's legitimate as long as f is well-behaved at x -- and if it's not, then it's not differentiable there either and the proof is a bust no matter how you do it. And I did the same for g(x), using t for the second unknown infinitesimal.
On
the fourth line, "u" suddenly appears, pulled out of a hat, as it were.
But it's nothing very obscure: I just collected all the
terms which I can see are
infinitesimal, and lumped them
together in a new variable, which I named "u". I did that because
I know that the "*" operator is going to throw away everything that's
infinitesimal. And I lumped every term which was multiplied by
any infinitesimal into "u" because I know that any infinitesimal times
any (finite) real number is also infinitesimal.
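(Written out with the notation above, the lump is $u = f(x)\,t + g(x)\,s + (f'(x)+s)(g'(x)+t)\,\tilde h$; every term carries an infinitesimal factor, so u itself is infinitesimal, and the "*" operator duly discards it.)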
In many ways
it's a very cute proof. It didn't require taking any explicit
limits; in fact all it used was simple fractions, straight out of high
school algebra -- yet it really is a rigorous proof (unless I botched
the algebra). I haven't done it justice here; it's possible to
write it out far more readably than my rather hasty scribble. But
-- and for me, this is a big "
but" -- it is
wholly non-pictorial. It's purely symbolic.
So,
since the whole point of this section is to get away from the (too
often opaque) use of symbols and do as much as possible with pictures,
we won't be pursuing the hyperreal path any farther -- and a little
farther along, we will see the product rule again, "done right".
Page first posted on 11/04/2007. Minor typos corrected on 2/27/2008.