Transcript of a keynote address presented at MathML and Math on the Web: MathML International Conference 2000
Most mathematical notation now in use is between one and
five hundred years old. I will review how it developed, with precursors in antiquity
and the Middle Ages, through its definition at the hands of Leibniz, Euler,
Peano and others, to its widespread use in the nineteenth and twentieth centuries.
I will discuss the extent to which mathematical notation is like ordinary human
language—albeit international in scope. I will show that some general principles
that have been discovered for ordinary human language and its history apply
to mathematical notation, while others do not.
Given its historical basis, it might have been that mathematical notation—like
natural language—would be extremely difficult for computers to understand.
But over the past five years we have developed in Mathematica capabilities for
understanding something very close to standard mathematical notation. I will
discuss some of the key ideas that made this possible, as well as some features
of mathematical notation that we discovered in doing it.
Large mathematical expressions—unlike pieces of ordinary text—are often generated
automatically as results of computations. I will discuss issues involved in
handling such expressions and making them easier for humans to understand.
Traditional mathematical notation represents mathematical objects but not mathematical
processes. I will discuss attempts to develop notation for algorithms, and experiences
with these in APL, Mathematica, theorem-proving programs and other systems.
Ordinary language involves strings of text; mathematical notation often also
involves two-dimensional structures. I will discuss how mathematical notation
might make use of more general structures, and whether human cognitive abilities
would be up to such things.
The scope of a particular human language is often claimed to limit the scope
of thinking done by those who use it. I will discuss the extent to which traditional
mathematical notation may have limited the scope of mathematics, and some of
what I have discovered about what generalizations of mathematics might be like.
When this conference was first being put together, people thought it would be good to have someone talk about general issues of mathematical notation. And there was an obvious candidate speaker—a certain Florian Cajori—author of a classic book entitled A History of Mathematical Notations. But upon investigation, it turned out that there was a logistical problem in inviting the esteemed Dr. Cajori—he has been dead for no less than seventy years.
So I guess I’m the substitute.
And I think there weren’t too many other possible choices. Because it turns out that so far as we could all tell, there’s almost nobody who’s alive today who’s really thought that much about basic issues regarding mathematical notation.
In the past, the times these things have ever been thought about, even a bit, have mostly coincided with various efforts to systematize mathematics. So Leibniz and other people were interested in these things in the mid-1600s. Babbage wrote one of his rather ponderous polemics on the subject in 1821. And at the end of the 1800s and the beginning of the 1900s, when abstract algebra and mathematical logic were really getting going, there was another burst of interest and activity. But after that, pretty much nothing.
But in a sense it’s not surprising that I’ve been interested in this stuff. Because with Mathematica, one of my big goals has been to take another big step in what one can think of as the systematization of mathematics. And I guess in general my goal with Mathematica was somehow to take the general power of computation and harness it for all kinds of technical and mathematical work. There are really two parts to that: how the computation inside works, and how people get to direct that computation to do what they want.
One of the big achievements of Mathematica, as probably most of you know, was to figure out a very general way to have the computations inside work and also be practical—based on just doing transformations on symbolic expressions, with the symbolic expressions representing data or programs or graphics or documents or formulas or whatever.
But just being able to do computations isn’t enough; one also has to have a way for people to tell Mathematica what computations they want done. And basically the way that people seem to communicate anything sophisticated that they want to communicate is by using some kind of language.
Normally, languages arise through some sort of gradual historical process of consensus. But computer languages have historically been different. The good ones tend to get invented pretty much all at once, normally by just one person.
So what’s really involved in doing that?
Well, at least the way I thought about it for Mathematica was this: I tried to think of all the possible computations that people might want to do, and then I tried to see what chunks of computational work came up over and over again. And then essentially I gave names to those chunks, and we implemented the chunks as the built-in functions of Mathematica.
In a sense, we were leveraging on English in a crucial way in doing this because the names we used for those chunks were based on ordinary English words. And that meant that just by knowing English, one could get at least somewhere in understanding something written in Mathematica.
But of course the Mathematica language isn’t English—it’s a tremendously stylized fragment of English, optimized for explaining computations to Mathematica.
One might think it would be nice if one could just talk full English to Mathematica. After all, we already know English, so we wouldn't have to learn anything new to talk to Mathematica.
But I think there are some good reasons why we’re better off thinking at a basic level in Mathematica than in English when we think about the kinds of computations that Mathematica does.
But quite independent of this we all know that having computers understand full natural language has turned out to be very hard.
OK, so what about mathematical notation?
Since most of the people who use Mathematica already know at least some mathematical notation, it seems like it would be really convenient if we could just have Mathematica understand ordinary familiar mathematical notation.
But one might have thought that that just wouldn’t work at all. Because one might have thought that one would run into something like what one runs into with ordinary human natural language.
But here’s the surprising fact—that certainly surprised me a lot. Unlike with ordinary human natural language, it is actually possible to take a very close approximation to familiar mathematical notation, and have a computer systematically understand it. That’s one of the big things that we did about five years ago in the third version of Mathematica. And at least a little of what we learned from doing that actually made its way into the specification of MathML.
What I want to talk about here today is some of the general principles that I’ve noticed in mathematical notation, and what those mean now and in the future.
This isn’t really a problem about mathematics. It’s really more a problem in linguistics. It’s not about what mathematical notation could conceivably be like; it’s about what mathematical notation as it’s actually used is actually like—as it’s emerged from history and presumably from the constraints of human cognition and so on.
And, in fact, I think mathematical notation is a pretty interesting example for the field of linguistics.
You see, what's mostly been studied in linguistics has actually been spoken languages. Even things like punctuation marks have barely been looked at. And so far as I know, no serious linguistic study of mathematical notation has been made at all in the past.
Normally in linguistics there are several big directions that one can take. One can see how languages evolved historically. One can see what happens when individual people learn to speak languages. And one can try to make empirical models of the structures that languages end up having.
Let’s talk first for a while about history.
So, where did all the mathematical notation that we use today come from?
Well, that’s all bound up with the history of mathematics itself, so we have to talk a bit about that. People often have this view that mathematics is somehow the way it is because that’s the only conceivable way it could be. That somehow it’s capturing what arbitrary abstract systems are like.
One of the things that’s become very clear to me from the big science project that I’ve been doing for the past nine years is that such a view of mathematics is really not correct. Mathematics, as it’s practiced, isn’t about arbitrary abstract systems. It’s about the particular abstract system that happens to have been historically studied in mathematics. And if one traces things back, there seem to be three basic traditions from which essentially all of mathematics as we know it emerged: arithmetic, geometry, and logic.
All these traditions are quite old. Arithmetic comes from Babylonian times, geometry perhaps from then but certainly from Egyptian times, and logic from Greek times.
And what we’ll see is that the development of mathematical notation—the language of mathematics—had a lot to do with the interplay of these traditions, particularly arithmetic and logic.
One thing to realize is that these three traditions probably came from rather different things, and that’s greatly affected the kinds of notation they use.
Arithmetic presumably came from commerce and from doing things like counting money, though it later got pulled into astrology and astronomy. Geometry presumably came from land surveying and things like that. And logic we pretty much know came from trying to codify arguments made in natural language.
It’s notable, by the way, that another very old kind of (formal) tradition that I’ll talk about a bit more later—grammar—never really got integrated with mathematics, at least not until extremely recently.
So let’s talk about notation that was used in the different early traditions for mathematics.
First, there’s arithmetic. And the most basic thing for arithmetic is numbers. So what notations have been used for numbers?
Well, the first representations for numbers that we recognize are probably notches in bones made perhaps 25,000 years ago. They typically worked in unary: to represent 7, you made 7 notches, and so on.
Well, of course we don’t absolutely know these were the first representations of numbers. I mean, we might not have found any artifacts from earlier representations. But also, if someone had invented a really funky representation for numbers, and put it in, for example, their cave painting, we might not know that it was a representation for numbers; it might just look like a piece of decoration to us.
So numbers can be represented in unary. And that idea seems to have gotten reinvented quite a few times, in many different parts of the world.
But when one looks at what happened beyond that, there’s quite a bit of diversity. It reminds one a little of the different kinds of constructions for sentences or verbs or whatever that got made up in different natural languages.
And actually one of the biggest issues with numbers, I suspect, had to do with a theme that’ll show up many more times: just how much correspondence should there really be between ordinary natural language and mathematical language?
Here’s the issue: it has to do with reusing digits, and with the idea of positional notation.
You see, in natural language one tends to have a word for “ten”, another word for “a hundred”, another for “a thousand”, “a million” and so on. But we know that mathematically we can represent ten as “one zero” (10), a hundred as “one zero zero” (100), a thousand as “one zero zero zero” (1000), and so on. We can reuse that 1 digit, and make it mean different things depending on what position it appears in the number.
Well, this is a tricky idea, and it took thousands of years for people generally to really grok it. And their failure to grok it had big effects on the notation that they used, both for numbers and other things.
As happens so often in human history, the right idea was actually had very early, but somehow didn’t win for a long time. You see, more than 5000 years ago, the Babylonians—and probably the Sumerians before them—had the idea of positional notation for numbers. They mostly used base 60—not base 10—which is actually presumably where our hours, minutes, seconds scheme comes from. But they had the idea of using the same digits to represent multiples of different powers of 60.
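In modern Mathematica terms (a small illustration of my own, not something from the original demo), the positional idea is just what IntegerDigits and FromDigits capture, in base 10 or base 60 alike:

IntegerDigits[1999]                        (* {1, 9, 9, 9}: the same digit 9 stands for 900, 90, or 9 depending on its position *)
IntegerDigits[2*60^2 + 30*60 + 45, 60]     (* {2, 30, 45}: the Babylonian base-60 scheme *)
FromDigits[{2, 30, 45}, 60]                (* 9045: back from base-60 digits to an ordinary integer *)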
Here’s an example of their notation.
You see from this picture why archeology is hard. This is a very small baked piece of clay. There are about half a million of these little Babylonian tablets that have been found. And about one in a thousand—about 400 of them altogether—have math on them. Which, by the way, is a somewhat higher fraction of texts having math in them than I think you'd find on the web today, although until MathML gets more common, it's a little hard to figure that out.
But anyway, the little markings on this tablet look somewhat like little tiny bird foot impressions. But about 50 years ago it finally got figured out that this cuneiform tablet from the time of Hammurabi—around 1750 BC—is actually a table of what we now call Pythagorean triples.
Well, this fine abstract Babylonian scheme for doing things was almost forgotten for nearly 3000 years. And instead, what mostly was used, I suspect, were more natural-language-based schemes, where there were different symbols for tens, hundreds, etc.
So, for example, in Egyptian the symbol for a thousand was a lotus flower icon, and a hundred thousand a bird, and so on. Each different power of ten had a different symbol for it.
Then there was another big idea, which by the way the Babylonians didn’t have, nor did the Egyptians. And that was to have actual characters for digits: not to make up a 7 digit with 7 of something, and so on.
The Greeks—perhaps following the Phoenicians—did have this idea, though. Actually, they had a slightly different idea. Their idea was to label the sequence of numbers by the sequence of letters in their alphabet. So alpha was 1, beta was 2, and so on.
So here's how a table of numbers looks in Greek notation.
In[1]:= Range[200]
Out[1]=
In[2]:= GreekNumeralForm[%]
Out[2]=
(I guess this is how the sysadmins at Plato’s Academy would have customized their version of Mathematica; their virtual Mathematica version –600 or whatever.)
There are all kinds of problems with this scheme for numbers. For example, there’s a serious versioning problem: even if you decide to drop letters from your alphabet, you have to leave them in your numbers, or else all your previously-written numbers get messed up.
So that means that there are various obsolete Greek letters left in their number system: like koppa for 90 and sampi for 900. But since—as a consequence of my fine English classical education—I included these in the character set for Mathematica, our Greek number form works just fine here.
A while after this Greek scheme for numbers, the Romans came up with their number form that we’re still familiar with today.
And while it’s not clear that their number-letters actually started as letters, they certainly ended up that way.
So let’s try Roman number form.
In[3]:= RomanNumeralForm/@Range[200]
Out[3]=
It’s also a rather inconvenient scheme, particularly for big numbers.
In[4]:= 2^30
Out[4]=
In[5]:= RomanNumeralForm[%]
Out[5]=
It has all sorts of other fun features though. For example, the length of the representation of a number increases fractally with the size of the number.
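As a small aside of my own (using the much later built-in RomanNumeral as a stand-in for the RomanNumeralForm above), one can see just how non-monotonically the length of the representation behaves:

StringLength[RomanNumeral[1888]]     (* "MDCCCLXXXVIII": 13 characters *)
StringLength[RomanNumeral[2000]]     (* "MM": just 2 characters *)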
And in general, for big numbers, schemes like this get in a lot of trouble. Like when Archimedes wrote his very nice Sand Reckoner paper about estimating the number of grains of sand that it would take to fill the universe—which he figured was about 10^51. (I think the right answer is about 10^90.) Well, he ended up pretty much just using words, not notation, to describe numbers that big.
But actually, there was a more serious conceptual problem with the letters-as-numbers idea: it made it very difficult to invent the concept of symbolic variables—of having some symbolic thing that stands for a number. Because any letter one might use for that symbolic thing could be confused with a piece of the number.
The general idea of having letters stand symbolically for things actually came quite early. In fact, Euclid used it in his geometry.
We don't have any versions of Euclid from his own time. But from a few hundred years later, there are at least some versions of Euclid. Here's one written in Greek.
And what you can see on these geometrical figures are points symbolically labeled with Greek letters. And in the description of theorems, there are lots of things where points and lines and angles are represented symbolically by letters. So the idea of having letters symbolically stand for things came as early as Euclid.
Actually, this may have started happening before Euclid. If I could read Babylonian I could probably tell you for sure. But here is a Babylonian tablet that relates to the square root of two that uses Babylonian letters to label things.
I guess baked clay is a more lasting medium than papyrus, so one actually knows more about what the original Babylonians wrote than about what people like Euclid wrote.
Generally, this failure to see that one could name numerical variables is sort of an interesting case of the language or notation one uses preventing a certain kind of thinking. That’s something that’s certainly discussed in ordinary linguistics. In its popular versions, it’s often called the Sapir–Whorf hypothesis.
And of course those of us who’ve spent some part of our lives designing computer languages definitely care about this phenomenon. I mean, I certainly know that if I think in Mathematica, there are concepts that are easy for me to understand in that language, and I’m quite sure they wouldn’t be if I wasn’t operating in that language structure.
But anyway, without variables things were definitely a little difficult. For example, how do you represent a polynomial?
Well, Diophantus—the guy who did Diophantine equations—had the problem of representing things like polynomials around 150 AD. He ended up with a scheme that used explicit letter-based names for squares, cubes, and things. Here’s how that worked.
In[6]:= Select[Table[Cyclotomic[i,x],{i,20}],Exponent[#,x]<6&]
Out[6]=
In[7]:= DiophantinePolynomialForm[%]
Out[7]=
At least to us now, Diophantus’ notation for polynomials looks extremely hard to understand. It’s an example of a notation that’s not a good notation. I guess the main reason—apart from the fact that it’s not very extensible—is that it somehow doesn’t make clear the mathematical correspondences between different polynomials and it doesn’t highlight the things that we think are important.
There are various other schemes people came up with for doing polynomials without variables, like a Chinese scheme that involved making a two dimensional array of coefficients.
The problem here, again, is extensibility. And one sees this over and over again with graphically-based notations: they are limited by the two dimensions that are available on a piece of paper or papyrus, or whatever.
OK, so what about letters for variables?
For those to be invented, I think something like our modern notation for numbers had to be invented. And that didn’t happen for a while. There are a few hints of Hindu-Arabic notation in the mid-first-millennium AD. But it didn’t get really set up until about 1000 AD. And it didn’t really come to the West until Fibonacci wrote his book about calculating around 1200 AD.
Fibonacci was, of course, also the guy who talked about Fibonacci numbers in connection with rabbits, though they’d actually come up more than a thousand years earlier in connection with studying forms of Indian poetry. And I always find it one of those curious and sobering episodes in the history of mathematics that Fibonacci numbers—which arose incredibly early in the history of western mathematics and which are somehow very obvious and basic—didn’t really start getting popular in mainstream math until maybe less than 20 years ago.
Anyway, it’s also interesting to notice that the idea of breaking digits up into groups of three to make big numbers more readable is already in Fibonacci’s book from 1202, I think, though he talked about using overparens on top of the numbers, not commas in the middle.
After Fibonacci, our modern representation for numbers gradually got more popular, and by the time books were being printed in the 1400s it was pretty universal, though there were still a few curious pieces of backtracking.
But still there weren’t really algebraic variables. Those didn’t get started pretty much until Vieta at the very end of the 1500s, and they weren’t common until way into the 1600s. So that means people like Copernicus didn’t have them. Nor for the most part did Kepler. So these guys pretty much used plain text, or sometimes things structured like Euclid.
By the way, even though math notation hadn’t gotten going very well by their time, the kind of symbolic notation used in alchemy, astrology, and music pretty much had been developed. So, for example, Kepler ended up using what looks like modern musical notation to explain his “music of the spheres” for ratios of planetary orbits in the early 1600s.
Starting with Vieta and friends, letters routinely got used for algebraic variables. Usually, by the way, he used vowels for unknowns, and consonants for knowns.
Here’s how Vieta wrote out a polynomial in the scheme he called “zetetics” and we would now call symbolic algebra:
In[8]:= Table[Cyclotomic[i,x]==0,{i,20}]
Out[8]=
In[9]:= VieteQuadraticForm[%,x]
Out[9]=
You see, he uses words for the operations, partly so the operations won’t be confused with the variables.
So how did people represent operations?
Well, the idea that operations are even something that has to be represented probably took a long time to arrive. The Babylonians didn't usually use operation symbols: for addition they mostly just juxtaposed things. And generally they tended to put things into tables so they didn't have to write out operations.
The Egyptians did have some notation for operations—they used a pair of legs walking forwards for plus, and walking backwards for minus—in a fine hieroglyphic tradition that perhaps we’ll even come back to a bit in future math notation.
But the modern + sign—which was probably a shorthand for the Latin “et” for “and”—doesn’t seem to have arisen until the end of the 1400s.
But here, from 1579, is something that looks almost modern, particularly being written in English, until you realize that those funny squiggles aren't x's—they're special non-letter characters that represent different powers of the variable.
In the early to mid-1600s there was a kind of revolution in math notation, and things very quickly started looking quite modern. Square root signs got invented: previously Rx—the symbol we use now for medical prescriptions—was what was usually used. And generally algebraic notation as we know it today got established.
One of the people who was most serious about this was a fellow called William Oughtred. One of the things he was noted for was inventing a version of the slide rule. Actually he's almost an unknown character. He wasn't a major research mathematician, but he did some nice pedagogical stuff, with people like Christopher Wren as his students. It's curious that I'd certainly never heard about him in school—especially since it so happens that he went to the same high school as me, though 400 years earlier. But the achievement of inventing a slide rule was not sufficiently great to have landed him a place in most histories of mathematics.
But anyway, he was serious about notation. He invented the cross for multiplication, and he argued that algebra should be done with notation, not with words, like Vieta had done. And actually he invented quite a bit of extra notation, like these kinds of squiggles, for predicates like IntegerQ.
Well, after Oughtred and friends, algebraic notation pretty quickly settled in. There were weird sidetracks—like the proposal to use waxing and waning moon symbols for the four operations of arithmetic: a fine example of poor, nonextensible design. But basically modern stuff was being used.
Here is an example.
This is a fragment of Newton’s manuscript for the Principia that shows Newton using basically modern looking algebraic notation. I think Newton was the guy, for example, who invented the idea that you can write negative powers of things instead of one over things and so on. The Principia has rather little notation in it, except for this algebraic stuff and Euclid-style presentation of other things. And actually Newton was not a great notation enthusiast. He barely even wanted to use the dot notation for his fluxions.
But Leibniz was a different story. Leibniz was an extremely serious notation buff. Actually, he thought that having the right notation was somehow the secret to a lot of issues of human affairs in general. He made his living as a kind of diplomat-analyst, shuttling between various countries, with all their different languages, and so on. And he had the idea that if one could only create a universal logical language, then everyone would be able to understand each other, and figure out anything.
There were other people who had thought about similar kinds of things, mostly from the point of view of ordinary human language and of logic. A typical one was a rather peculiar character called Ramon Llull, who lived around 1300, and who claimed to have developed various kinds of logic wheels—that were like circular slide rules—that would figure out answers to arbitrary problems about the world.
But anyway, what Leibniz really brought to things was an interest also in mathematics. What he wanted to do was somehow to merge the kind of notation that was emerging in mathematics into a precise version of human language, to create a mathematics-like way of describing and working out any problem—a way that would be independent of, and above, all the particular natural languages that people happened to be using.
Well, like many of his other projects, Leibniz never brought this to fruition. But along the way he did all sorts of math, and was very serious about developing notation for it. His most famous piece of notation was invented in 1675. For integrals, he had been using "omn.", presumably standing for omnium. But on Friday October 29, 1675 he wrote the following on a piece of paper.
What we see on this fragment of paper for the first time is an integral sign. He thought of it as an elongated S. But it’s obviously exactly our modern integral sign. So, from the very first integral sign to the integral sign we use today, there’s been very little change.
Then on Thursday November 11 of the same year, he wrote down the “d” for derivative. Actually, he said he didn’t think it was a terribly good notation, and he hoped he could think of a better one soon. But as we all know, that didn’t happen.
Well, Leibniz corresponded with all kinds of people about notation. He thought of himself as chairing what would now be called a standards committee for mathematical notation. He had the point of view that notation should somehow be minimal. So he said things like, “Why use a double set of two dots for proportion, when you could use just one?”
He tried a few ideas that haven’t worked out. For example, whereas he used letters to stand for variables, he used astronomical signs to stand for complete expressions: kind of an interesting idea, actually.
And he had a notation like this for functions.
In[10]:= f[x] + g[x] + f[x, y] + h[u, v] //LeibnizForm
Out[10]=
In[8]:= f[x] + g[x] + f[x, y] + h[u, v] + h[x] + f[a, b, c] //LeibnizForm
Out[8]=
Well, apart from these things, and with a few exceptions like the “square intersection” sign he used for equal, Leibniz pretty much settled on the notation that still gets used today.
Euler, in the 1700s, was then a big systematic user of notation. But he pretty much followed what Leibniz had set up. I think, though, that he was the first serious user of Greek as well as Roman letters for variables.
There are some other pieces of notation that came shortly after Leibniz. This next example is actually from a book under the Newton brand name published a few years after Newton died. It's a textbook of algebra, and it shows very traditional algebraic notation already being printed.
Here is a book by L’Hôpital, printed around the same time, showing pretty much standard algebraic notation.
And finally, here is an example from Euler, showing very much modern notation for integrals and things.
One of the things that Euler did that is quite famous is to popularize the letter 𝜋 for pi—a notation that had originally been suggested by a character called William Jones, who thought of it as a shorthand for the word perimeter.
So the notation of Leibniz and friends pretty much chugged along unchanged for quite a while. A few pieces of streamlining happened, like x x always being written as x². But not much got added.
But then one gets to the end of the 1800s, and there’s another burst of notational activity, mostly associated with the development of mathematical logic. There was some stuff done by physicists like Maxwell and Gibbs, particularly on vectors and vector analysis, as an outgrowth of the development of abstract algebra. But the most dramatic stuff got done by people starting with Frege around 1879 who were interested in mathematical logic.
What these people wanted to do was a little like part of what Leibniz had wanted to do. They wanted to develop notation that would represent not only formulas in mathematics, but also deductions and proofs in mathematics. Boole had shown around 1850 that one could represent basic propositional logic in mathematical terms. But what Frege and people wanted to do was to take that further and represent predicate logic and, they hoped, arbitrary mathematical arguments in mathematical terms and mathematical notation.
Frege decided that to represent what he wanted to represent, he should use a kind of graphical notation. So here’s a piece of his so-called “concept notation.”
Unfortunately, it’s very hard to understand. And actually if one looks at all of notational history, essentially every piece of graphical notation that anyone’s ever tried to invent for anything seems to have had the same problem of being hard to understand. But in any case, Frege’s notation definitely didn’t catch on.
Then along came Peano. He was a major notation enthusiast. He believed in using a more linear notation. Here’s a sample:
Actually, in the 1880s Peano ended up inventing things that are pretty close to the standard notations we use for most of the set-theoretical concepts.
But, a little like Leibniz, he wasn't satisfied with just inventing a universal notation for math. He wanted to have a universal language for everything. So he came up with what he called Interlingua, which was a language based on simplified Latin. And he ended up writing a kind of summary of mathematics—called Formulario Mathematico—which was based on his notation for formulas, and written in this derivative of Latin that he called Interlingua.
Interlingua, like Esperanto—which came along about the same time—didn’t catch on as an actual human language. But Peano’s notation eventually did. At first nobody much knew about it. But then Whitehead and Russell wrote their Principia Mathematica, and in it they used Peano’s notation.
I think Whitehead and Russell probably win the prize for the most notation-intensive non-machine-generated piece of work that’s ever been done. Here’s an example of a typical page from Principia Mathematica.
They had all sorts of funky notations in there. In fact, I’m told—in a typical tale often heard of authors being ahead of their publishers—that Russell ended up having to get fonts made specially for some of the notation they used.
And, of course, in those days we’re not talking about TrueType or Type 1 fonts; we’re talking about pieces of lead. And I’m told that Russell could actually be seen sometimes wheeling wheelbarrows full of lead type over to the Cambridge University Press so his books could be appropriately typeset.
Well, for all that effort, the results were fairly grotesque and incomprehensible. I think it’s fairly clear that Russell and Whitehead went too far with their notation.
And even though the field of mathematical logic scaled back a bit from Russell and Whitehead, it’s still the field that has the most complicated notation of any, and the least standardization.
But what about what’s normally thought of as more mainstream mathematics?
For a while, at the beginning of the 1900s, there was almost no effect from what had been done in mathematical logic. But then, when the Bourbaki movement in France started taking root in the 1940s or so, there was suddenly a change.
You see, Bourbaki emphasized a much more abstract, logic-oriented approach to mathematics. In particular, it emphasized using notation whenever one could, and somehow minimizing the amount of potentially imprecise text that had to be written.
Starting around the 1940s, there was a fairly sudden transition in papers in pure mathematics that one can see by looking at journals or ICM proceedings or something of that kind. The transition was from papers that were dominated by text, with only basic algebra and calculus notation, to ones that were full of extra notation.
Of course, not all places where math gets used followed this trend. It’s kind of like what’s often done in linguistics of ordinary natural languages. One can see when different fields that use mathematics peeled off from the main trunk of mathematical development by looking at what vintage of mathematical notation they use. So, for example, we can tell that physics pretty much peeled off around the end of the 1800s, because that’s the vintage of math notation that it uses.
There’s one thing that comes through in a lot of this history, by the way: notation, like ordinary language, is a dramatic divider of people. I mean, there’s somehow a big division between those who read a particular notation and those who don’t. It ends up seeming rather mystical. It’s like the alchemists and the occultists: mathematical notation is full of signs and symbols that people don’t normally use, and that most people don’t understand.
It’s kind of curious, actually, that just recently there’ve been quite a few consumer product and service ads that have started appearing that are sort of centered around math notation. I think for some reason in the last couple of years, mathematical notation is becoming chic. Here’s an example of an ad that’s running right now.
But the way one tends to see math notation used, for example in math education, reminds me awfully of things like symbols of secret societies and so on.
Well, so that’s a rough summary of some of the history of mathematical notation.
And after all that history, there’s a certain notation that’s ended up being used. Apart from a few areas like mathematical logic, it’s become fairly standardized. There are not a lot of differences in how it’s used. Whatever ordinary language a book or paper is written in, the math pretty much always looks the same.
But now the question is: can computers be set up to understand that notation?
That depends on how systematic it really is, and how much the meaning of a piece of math can really be deduced just from the way it’s written down.
Well, as I hope I’ve shown you, the notation we have today has arisen through a pretty haphazard historical process. There have been a few people, like Leibniz and Peano, who’ve tried to think about it systematically. But mostly it’s just developed through usage, pretty much like ordinary human languages do.
And one of the surprising things is that so far as I know, there’s almost never been any kind of introspective study done on the structure of mathematical notation.
For ordinary human language, people have been making grammars for ages. Certainly lots of Greek and Roman philosophers and orators talked about them a lot. And in fact, already from around 500 BC, there’s a remarkably clean grammar for Sanskrit written by a person called Panini. In fact, Panini’s grammar is set up remarkably like the kind of BNF specifications of production rules that we use now for computer languages.
And not only have there been grammars for language; in the last few centuries, there have been endless scholarly works on proper language usage and so on.
But despite all this activity about ordinary language, essentially absolutely nothing has been done for mathematical language and mathematical notation. It’s really quite strange.
There have even been mathematicians who’ve worked on grammars for ordinary language. An early example was John Wallis—who made up Wallis’ product formula for pi—who wrote a grammar for English in 1658. Wallis was also the character who started the whole fuss about when one should use “will” and when one should use “shall.”
In the early 1900s mathematical logicians talked quite a bit about different layers in well-formed mathematical expressions: variables inside functions inside predicates inside functions inside connectives inside quantifiers. But not really about what this meant for the notation for the expressions.
Things got a little more definite in the 1950s, when Chomsky and Backus, essentially independently, invented the idea of context-free languages. The idea came out of work on production systems in mathematical logic, particularly by Emil Post in the 1920s. But, curiously, both Chomsky and Backus came up with the same basic idea in the 1950s.
Backus applied it to computer languages: first Fortran, then ALGOL. And he certainly noticed that algebraic expressions could be represented by context-free grammars.
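Schematically, a context-free grammar for simple algebraic expressions looks something like this (a textbook-style sketch, not Backus's actual ALGOL productions):

expr   ::= expr "+" term   | term
term   ::= term "*" factor | factor
factor ::= number | variable | "(" expr ")"

The nesting of the productions is what gives expressions their tree structure.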
Chomsky applied the idea to ordinary human language. And he pointed out that to some approximation ordinary human languages can be represented by context-free grammars too.
Of course, linguists—including Chomsky—have spent years showing how much that isn’t really true. But the thing that I always find remarkable, and scientifically the most important, is that to a first approximation it is true that ordinary human languages are context-free.
So Chomsky studied ordinary language, and Backus studied things like ALGOL. But neither seems to have looked at more advanced kinds of math than simple algebraic language. And, so far as I can tell, nor has almost anyone else since then.
But if you want to see if you can interpret mathematical notation, you have to know what kind of grammar it uses.
Now I have to tell you that I had always assumed that mathematical notation was too haphazard to be used as any kind of thing that a computer could reasonably interpret in a rigorous way. But at the beginning of the 1990s we got interested in making Mathematica be able to interact with mathematical notation. And so we realized that we really had to figure out what was going on with mathematical notation.
Neil Soiffer had spent quite a number of years working on editing and interpreting mathematical notation, and when he joined our company in 1991, he started trying to convince me that one really could work with mathematical notation in a reasonable way, for both output and input.
The output side was pretty straightforward: after all, TROFF and TeX already did a moderately good job with that.
The issue was input.
Well, actually, one already learned something from output. One learned that at least at some level, a lot of mathematical notation could be represented in some kind of context-free form. Because one knew that in TeX, for instance, one could set things up in a tree of nested boxes.
But how about input? Well, one of the biggest things was something that always comes up in parsing: if you have a string of text, with operands and operators, how do you tell what groups with what?
So let’s say you have a math expression like this.
Sin[x+1]^2+ArcSin[x+1]+c(x+1)+f[x+1]
What does it mean? Well, to know that you have to know the precedence of the operators—which ones bind tighter to their operands and so on.
Well, I kind of suspected that there wasn't much consistency to that across all the different pieces of math that people were writing. But I decided to actually take a look at it. So I went through all sorts of math books, and started asking all kinds of people how they would interpret random lumps of mathematical notation. And I found a very surprising thing: out of many tens of operators, there is amazing consistency in people's conceptions of precedence. So one can really say: here's a definite precedence table for mathematical operators.
We can say with pretty high confidence that this is the precedence table that people imagine when they look at pieces of mathematical notation.
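One can get a feeling for what those precedences do by looking at how Mathematica's parser groups a textual expression (a little check of my own, not part of the original demonstration):

FullForm[Hold[Sin[x + 1]^2 + c (x + 1)]]
(* Hold[Plus[Power[Sin[Plus[x, 1]], 2], Times[c, Plus[x, 1]]]]:
   ^ binds tighter than +, and juxtaposition means multiplication *)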
Having found this fact, I got a lot more optimistic about us really being able to interpret mathematical notation input. One way one could always do this is by having templates. Like one has a template for an integral sign, and one just fills stuff into the integrand, the variable, and so on. And when the template pastes into a document it looks right, but it still maintains its information about what template it is, so a program knows how to interpret it. And indeed various programs work like this.
But generally it’s extremely frustrating. Because as soon as you try to type fast—or do editing—you just keep on finding that your computer is beeping at you, and refusing to let you do things that seem like you should obviously be able to do.
Letting people do free-form input is much harder. But that’s what we wanted to do.
So what’s involved in that?
Well, basically one needs a completely rigorous and unambiguous syntax for math. Obviously, one can have such a syntax if one just uses a regular computer-language-like, string-based syntax. But then you don't have familiar math notation.
Here’s the key problem: traditional math notation isn’t completely unambiguous. Or at least it isn’t if you try to make it decently general. Let’s take a simple example, “i”. Well, is that Sqrt[-1] or is it a variable “i”?
In the ordinary textual InputForm of Mathematica all those kinds of ambiguities are resolved by a simple convention: everything that’s built into Mathematica has a name that starts with a capital letter.
But capital “I” doesn’t look like what one’s used to seeing for Sqrt[-1] in math texts. So what can one do about it? Here we had a key idea: you make another character, that’s also a lowercase “i” but it’s not an ordinary lowercase “i” and you make that be the “i” that’s the square root of -1.
You might have thought: Well, why don’t we just have two “i” characters, that look the same, exactly like in a math text, but have one of them be somehow special? Well, of course that would be absurdly confusing. You might know which “i” it was when you typed it in, but if you ever moved it around or anything, you’d be quite lost.
So one has to have two “i”s. What should the special one look like?
Well, the idea we had—actually I think I was in the end responsible for it—was to use double-struck characters. We tried all sorts of other graphical forms. But the double struck idea was the best. Partly because it sort of follows a convention in math of having notation for specific objects be double struck.
So, for example, a capital R in mathematical text might be a variable. But double struck R represents a specific object: the set of all real numbers.
So then double-struck “i” is the specific object that we call ImaginaryI. And it works like this:
In[9]:=
Out[9]=
In[10]:=
Out[10]=
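In rough textual form (a sketch standing in for the notebook cells above, which used the double-struck character itself), the distinction is:

\[ImaginaryI]^2      (* the built-in ImaginaryI: evaluates to -1 *)
i^2                  (* an ordinary symbol i, squared: stays symbolic *)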
Well, this double-struck idea solves a lot of problems.
Here’s a big one that it solves: integrals. Let’s say you try to make syntax for integrals. Well, one of the key issues is what happens with the “d” in the integral? What happens if perhaps there’s a “d” as a parameter in the integrand? Or a variable? Things get horribly confused.
Well, as soon as you introduce DifferentialD, or double-struck “d”, everything becomes easy. And you end up with a completely well defined syntax.
We might integrate x to the power of d over the square root of x+1. It works like this:
In[11]:=
Out[11]=
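In textual InputForm, that two-dimensional integral input corresponds to something like this (a sketch; the d here is an ordinary symbolic parameter, quite distinct from the double-struck DifferentialD):

Integrate[x^d/Sqrt[x + 1], x]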
It turns out that there are actually very few tweaks that one has to make to the core of mathematical notation to make it unambiguous. It’s surprising. But it’s very nice. Because it means you can just enter free form stuff that’s essentially mathematical notation, and have it rigorously understood. And that’s what we implemented in Mathematica 3.
Of course, to make it really nice, there are lots of details that have to be right. One has to actually be able to type things in an efficient and easy-to-remember way. We thought very hard about that. And we came up with some rather nice general schemes for it.
One of them has to do with entering things like powers as superscripts. In ordinary textual input, when you enter a power you type ^. So what we did for math notation is to have it be that control-^ enters an explicit superscript. And with the same idea, control-/ enters a built-up fraction.
Well, having a clean set of principles like that is crucial to making this whole kind of thing work in practice. But it does. So here’s what it might look like to enter a slightly complicated expression.
In[12]:=
Out[12]=
But we can take pieces of this output and manipulate them.
And the point is that this expression is completely understandable to Mathematica, so you can evaluate it. And the thing that comes out is the same kind of object as the input, and you can edit it, pick it apart, use its pieces as input, and so on.
Well, to make all this work we’ve had to generalize ordinary computer languages and parsing somewhat. First of all, we’re allowing a whole zoo of special characters as operators. But probably more significant, we’re allowing two dimensional structures. So instead of just having things like prefix operators, we also have things like overfix operators, and so on.
If you look at the expression here you may complain that it doesn’t quite look like traditional math notation. It’s really close. And it certainly has all the various compactifying and structuring features of ordinary math notation. And the important thing is that nobody who knows ordinary math notation would be at all confused about what the expression means.
But at a cosmetic level, there are things that are different from the way they’d look in a traditional math textbook. Like the way trig functions are written, and so on.
Well, I would argue rather strongly that the Mathematica StandardForm, as we call it, is a better and clearer version of this expression. And in the book I’ve been writing for many years about the science project I’m doing, I use only Mathematica StandardForm to represent things.
But if one wants to be fully compatible with traditional textbooks one needs something different. And here’s another important idea that was in Mathematica 3: the idea of separating so-called StandardForm from so-called TraditionalForm.
Given any expression, I can always convert it to TraditionalForm.
And the actual TraditionalForm I get always contains enough internal information that it can unambiguously be turned back into StandardForm.
But the TraditionalForm looks just like traditional math notation. With all the slightly crazy things that are in traditional math notation, like writing sin squared x, instead of sin x squared, and so on.
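As a rough textual sketch of that round trip (assumed notation, not the actual conference notebook):

expr = Sin[x]^2 + Sqrt[x + 1];
TraditionalForm[expr]     (* typeset the traditional way, with sin squared and a radical *)

(* The TraditionalForm boxes carry enough information to get back to StandardForm: *)
ToExpression[ToBoxes[expr, TraditionalForm], TraditionalForm]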
So what about entering TraditionalForm?
You may notice those jaws on the right-hand side of the cell. Well, those mean there’s something dangerous here. But let’s try editing.
In[13]:=
We can edit just fine. Let’s see what happens if we try to evaluate this.
Well, we get a warning, or a disclaimer. But let’s go ahead anyway.
Out[13]=
Well, it figured out what we want.
Actually, we have a few hundred rules that are heuristics for understanding traditional form expressions. And they work fairly well. Sufficiently well, in fact, that one can really go through large volumes of legacy math notation—say specified in TeX—and expect to convert it automatically to unambiguously meaningful Mathematica input.
It’s kind of exciting that it’s possible to do this. Because if one was thinking of legacy ordinary language text, there’s just no way one can expect to convert it to something understandable. But with math there is.
Of course, there are some things with math, particularly on the output side, that are a lot trickier than text. Part of the issue is that with math one can expect to generate things automatically. One can’t generate too much text that actually means very much automatically. But with math, you do a computation, and out comes a huge expression.
So then you have to do things like figure out how to break the expression into lines elegantly, which is something we did a lot of work on in Mathematica. There are a lot of interesting issues, like the fact that if you edit an expression, its optimal line breaking can change all the time you’re editing it.
And that means there are nasty problems, like the fact that you can be typing more characters and suddenly your cursor jumps backwards. Well, that particular problem I think we solved in a particularly neat way. Let's do an example.
Did you see that? There was a funny blob that appeared just for a moment when the cursor had to move backwards. Perhaps you noticed the blob. But if you were typing, you probably wouldn’t notice that your cursor had jumped backwards, though you might notice the blob that appeared because that blob makes your eyes automatically move to the right place, without you noticing. Physiologically, I think it works by using nerve impulses that end up not in the ordinary visual cortex, but directly in the brain stem where eye motion is controlled. So it works by making you subconsciously move your eyes to the right place.
So we’ve managed to find a way to interpret standard mathematical notation. Does that mean we should turn everything Mathematica can do into math-like notation? Should we have special characters for all the various operations in Mathematica? We could certainly make very compact notation that way. But would it be sensible? Would it be readable?
The answer is basically no.
And I think there’s a fundamental principle here: one wants just so much notation, and no more.
One could have no special notation. Then one has Mathematica FullForm. But that gets pretty tiresome to read. And that’s probably why a language like LISP seems so difficult—because its syntax is basically Mathematica FullForm.
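Here, for instance, is what even small expressions look like at that extreme (just ordinary built-in behavior, shown for illustration):

FullForm[x^2 + 2 x + 1]    (* Plus[1, Times[2, x], Power[x, 2]] *)
FullForm[Sqrt[x + 1]]      (* Power[Plus[1, x], Rational[1, 2]] *)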
The other possibility is that everything could have a special notation. Well, then one has something like APL—or parts of mathematical logic. Here’s an example of that.
It’s fairly hard to read.
Here’s another example from Turing’s original paper showing the notation he made up for his original universal Turing machine, another not very satisfactory notation.
It’s pretty unreadable too.
The question is what’s right between the extremes of LISP and APL. I think it’s very much the same kind of issue that comes up with things like short command names.
Think about Unix. In early versions of Unix it seemed really nice that there were just a few quick-to-type commands. But then the system started getting bigger. And after a while there were zillions of few-letter commands. And most mere mortals couldn’t remember them. And the whole thing started looking completely incomprehensible.
Well, it’s the same kind of thing with mathematical notation, or any other kind of notation, for that matter. People can handle a modest number of special forms and special characters. Maybe a few tens of them. Kind of an alphabet’s worth. But not more. And if you try to give them more, particularly all at once, they just become confused and put off.
Well, one has to qualify that a bit. There are, for example, lots of relational operators.
But most of these are made up conceptually from a few elements, so there isn’t really a problem with them.
And, of course, it is in principle possible for people to learn lots and lots of different characters. Because languages like Chinese and Japanese have thousands of ideograms. But it takes people many extra years of school to learn to read those languages, compared to ones that just use alphabets.
Talking of characters, by the way, I think it’s considerably easier for people to handle extra ones that appear in variables than in operators. And it’s kind of interesting to see what’s happened historically with those.
One thing that’s very curious is that, almost without exception, only Latin and Greek characters are ever used. Well, Cantor introduced a Hebrew aleph for his infinite cardinal numbers. And some people say that a partial derivative is a Russian d, though I think historically it really isn’t. But there are no other characters that have really gotten imported from other languages.
By the way, you all know that in standard English, “e” is the most common letter, followed by “t,” and so on. Well, I was curious what that distribution was like for letters in math. So I had a look in MathWorld, which is a large website of mathematical information that has about 10,000 entries and looked at what the distribution of different letters was.
You can see that "e" is the most common. Actually, very strangely, "a" is the second most common. That's very unusual. And one can also see which Greek letters, lowercase and uppercase, turn up most often.
OK. I’ve talked a bit about notation that is somehow possible to use in math. But what notation is good to use?
Most people who actually use math notation have some feeling for that. But there isn’t an analog of something like Fowler’s Modern English Usage for math. There’s a little book called Mathematics into Type put out by the AMS, but it’s mostly about things like how putting scripts on top of each other requires cutting pieces of paper or film.
The result of this is that there aren’t well codified principles, analogous to things like split infinitives in English.
If you use Mathematica StandardForm, you don't need these much. Because anything you type will be unambiguously understandable. But for TraditionalForm, it would be good to have some principles. Like don't write Sin⁻²x, because it's not clear what that means.
Perhaps to finish off, let me talk a little about the future of mathematical notation.
If there is any new notation, what should it be, for example?
Well, in books of symbols there are perhaps 2500 symbols listed that are supposedly common in some field or another and aren’t letters in languages. And with the right drawing of characters, quite a few of these could be made perfectly to fit in with other mathematical characters.
What would one use them for?
Well, the most obvious possibility is notation for representing programs as well as mathematical operations. In Mathematica, for instance, there are quite a few textual operators that are used in programs. And I’ve long thought that it would be very nice to be able to use actual special characters for these, rather than combinations of ordinary ASCII characters.
It turns out that there's a very smooth way to do that sometimes. Because we picked the ASCII characters well, one can often get special characters that are visually very similar but more elegant. For example, if I type -> into Mathematica, it automatically gets turned into a nice arrow. And what makes all this work is that the parser for Mathematica can accept both the special character and non-special character forms of these kinds of operators.
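A small check of my own illustrates the point: the ASCII digraph and the special character parse to exactly the same expression.

FullForm[x -> x^2]                  (* Rule[x, Power[x, 2]] *)
(x -> x^2) === (x \[Rule] x^2)      (* True: same expression either way *)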
Well, I've often wondered about extensions to this. And gradually they're coming. Notice the number sign or pound sign—or is it called an octothorp—that we use for places where parameters go in a pure function. Well, it's a bit like a square, just with some tentacles. And in the future there'll probably be a nice square, with tiny little serifs, that is the function parameter thing. And it'll look really smooth, not like a piece of computer language input: more like something iconic.
How far can one go in that direction: making visual or iconic representations of things? It’s pretty clear that things like block diagrams in engineering, or commutative diagrams in pure mathematics, and flow charts and things work OK. At least up to a point. But how far can that go?
Well, I’m not sure it can go terribly far. You see, I think one is running into some fundamental limitations in human linguistic processing.
When languages are more or less context free—more or less structured like trees—one can do pretty well with them. Our buffer memory of five chunks, or whatever, seems to do well at allowing us to parse them. Of course, if we have too many subsidiary clauses, even in a context free language, we tend to run out of stack space and get confused. But if the stack doesn’t get too deep, we do well.
But what about networks? Can we understand arbitrary networks? I mean, why do we have to have operators that are just prefix, or infix, or overfix, or whatever? Why not operators that get their arguments by just pulling them in over arcs in some arbitrary network?
Well, I’ve been particularly interested in this because I’ve been doing some science things with networks. And I’d really like to be able to come up with a kind of language representation of networks. But even though I’ve tried pretty hard, I don’t think that at least my brain can deal with networks the way I deal with things like ordinary language or math that are structured in 1D or 2D in a context free way. So I think this may be a place where, in a sense, notation just can’t go.
Well, in general, as I mentioned earlier, it’s often the case that a language—or a notation—can limit what one manages to think about.
So what does that mean for mathematics?
Well, in my science project I’ve ended up developing some major generalizations of what people ordinarily think of as math. And one question is what notation might be used to think abstractly about those kinds of things.
Well, I haven’t completely solved the problem. But what I’ve found, at least in many cases, is that there are pictorial or graphical representations that really work much better than any ordinary language-like notation.
Actually, bringing us almost back to the beginning of this talk, it’s a bit like what’s happened for thousands of years in geometry. In geometry we know how to say things with diagrams. That’s been done since the Babylonians. And a little more than a hundred years ago, it became clear how to formulate geometrical questions in algebraic terms.
But actually we still don’t know a clean simple way to represent things like geometrical diagrams in a kind of language-like notation. And my guess is that actually of all the math-like stuff out there, only a comparatively small fraction can actually be represented well with language-like notation.
But we as humans really only grok easily this language-like notation. So the things that can be represented that way are the things we tend to study. Of course, those may not be the things that happen to be relevant in nature and the universe.
But that’d be a whole other talk, or much more. So I’d better stop here.
Thank you very much.
In the discussion after the talk, and in interactions with people at the conference, a few additional points came up.
They concerned: empirical laws for mathematical notations, printed vs. on-screen notation, graphical notation, fonts and characters, searching mathematical formulas, non-visual notation, proofs, character selection, the frequency distribution of symbols, and parts of speech in mathematical notation.
In the study of ordinary natural language there are various empirical historical laws that have been discovered. An example is Grimm’s Law, which describes general historical shifts in consonants in Indo-European languages. I have been curious whether empirical historical laws can be found for mathematical notation.
Dana Scott suggested one possibility: a trend towards the removal of explicit parameters.
As one example, in the 1860s it was still typical for each component of a vector to be a separately named variable. But then components started getting labelled with subscripts, as in a_i. And soon thereafter—particularly through the work of Gibbs—vectors began to be treated as single objects, denoted, say, by a single letter a written with an arrow over it or in boldface.
With tensors things are not so straightforward. Notation that avoids explicit subscripts is usually called “coordinate free.” And such notation is common in pure mathematics. But in physics it is still often considered excessively abstract, and explicit subscripts are used instead.
With functions, there have also been some trends to reduce the mention of explicit parameters. In pure mathematics, when functions are viewed as mappings, they are often referred to just by function names like f, without explicitly mentioning any parameters.
But this tends to work well only when functions have just one parameter. With more than one parameter it is usually not clear how the flow of data associated with each parameter works.
However, as early as the 1920s, it was pointed out that one could use so-called combinators to specify such data flow, without ever explicitly having to name parameters.
Combinators have not been used in mainstream mathematics, but at various times they have been somewhat popular in the theory of computation, though their popularity has suffered from being largely incompatible with the idea of data types.
Combinators are particularly easy to set up in Mathematica—essentially by building functions with composite heads. Here’s how the standard combinators can be defined:
k[x_][y_] := x
s[x_][y_][z_] := x[z][y[z]]
If one defines the integer n—effectively in unary—by Nest[s[s[k[s]][k]], k[s[k][k]], n], then addition is s[k[s]][s[k[s[k[s]]]][s[k[k]]]], multiplication is s[k[s]][k] and power is s[k[s[s[k][k]]]][k]. No variables are required.
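As a small sanity check, here’s a sketch of how one can confirm that this encoding behaves as claimed, by converting a combinator integer back into an ordinary one. The helper names int and toInteger are just mine, for illustration.

(* with the definitions of s and k above *)
int[n_] := Nest[s[s[k[s]][k]], k[s[k][k]], n]
plus = s[k[s]][s[k[s[k[s]]]][s[k[k]]]];
(* a combinator integer m applies a function m times, so feeding it the
   successor function and 0 recovers an ordinary integer *)
toInteger[m_] := m[# + 1 &][0]
toInteger[plus[int[2]][int[3]]]     (* gives 5 *)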
The problem is that the actual expressions one gets are almost irreducibly obscure. I have tried to find clear ways to represent them and their evaluation. I have made a little progress, but have certainly not been fully successful.
Some people asked about differences between what is possible in printed and on-screen notation.
Notation needs to be familiar to be understood, so the differences cannot be too sudden or dramatic.
But there are some obvious possibilities.
First, on screen one can routinely use color. One might imagine that it would somehow be useful to distinguish variables by giving them different colors. In my experience it is fine to do this in annotating a formula. But it becomes totally confusing if, for example, a red x and a green x are supposed to be distinct variables.
Another possibility is to have animated elements in a formula. I suspect that these will be as annoying as flashing text, and not immediately useful.
A better idea may be to have the capability of opening and closing sections of an expression—like cell groups in a Mathematica notebook. Then one has the possibility of getting an overall view of an expression, but being able to click to see more and more details if one is interested.
Several people thought I had been too hard on graphical notations in my talk.
I should have made it clearer that the area I have found graphical notations difficult to handle is in representing traditional mathematical actions and operations. In my new science I use graphics all the time, and I cannot imagine doing what I do any other way.
And in traditional science and mathematics there are certainly graphical notations that work just fine, though typically for fairly static constructs.
Graph theory is an obvious place where graphical representations are used.
Related to this are, for example, chemical structure diagrams in chemistry and Feynman diagrams in physics.
In mathematics, there are methods for doing group theoretical computations—particularly due to Predrag Cvitanović—that are based on graphical notation.
And then in linguistics, for example, it is common to “diagram” a sentence, showing the tree of derivations that can be used to build up the sentence.
All of these notations, however, become quite obscure if one has to use them in cases that are too big. In Feynman diagram calculations, for example, two loops are the most that are routinely considered, and five loops is the maximum for which explicit general computations have ever been done.
I had meant to say something in my talk about characters and fonts.
In Mathematica 3 we went to a lot of trouble to develop fonts for over 1100 characters of relevance to mathematical and technical notation.
Getting exactly the right forms—even for things like Greek letters—was often fairly difficult. We wanted to maintain some semblance of “classical correctness,” but we also wanted to be sure that Greek letters were as distinct as possible from English letters and other characters.
In the end, I actually drew sketches for most of the characters. Here’s what we ended up with for the Greek letters. We made both a Times-like font and a monospaced Courier-like font. (We’re currently also developing a sans serif font.) The Courier font was particularly challenging. It required, for example, working out how to stretch an iota so that it could sensibly fill a complete character slot.
Other challenges included script and Gothic (Fraktur) fonts. Often such fonts end up having letters that are so different in form from ordinary English letters that they become completely unreadable. We wanted letters that somehow communicated the appropriate script or Gothic theme, but nevertheless had the same overall forms as ordinary English letters.
Here’s what we ended up with:
Various people asked about searching mathematical formulas.
It’s obviously easy to specify what one means by searching plain text. The only issue usually is whether one considers upper- and lowercase letters to be equivalent.
For mathematical formulas things are more complicated, since there are many more forms that are straightforwardly equivalent. If one asks about all possible equivalences, things become impossibly difficult for basic mathematical reasons: deciding whether two arbitrary expressions are mathematically equal is, in general, undecidable. But if one asks about equivalences that more or less just involve substituting one variable for another, then one can always tell whether two expressions are equivalent.
However, it pretty much takes something with the power of the Mathematica pattern matcher to do this.
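As a rough illustration of that kind of equivalence (my own sketch, not the planned search implementation), here is a brute-force way to test whether two expressions are identical up to a consistent renaming of their variables:

(* try every one-to-one renaming of the listed variables; fine for small cases *)
sameUpToRenaming[e1_, e2_, vars1_List, vars2_List] :=
  Length[vars1] == Length[vars2] &&
   AnyTrue[Permutations[vars2], e2 === (e1 /. Thread[vars1 -> #]) &]

sameUpToRenaming[a^2 + b, y + x^2, {a, b}, {x, y}]     (* True *)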
We’re planning formula searching capabilities for our new web site functions.wolfram.com, though as of right now this has not actually been implemented there.
Someone asked about non-visual notation.
My first response was that human vision tends to be a lot more sensitive than, say, human hearing. After all, we have a million nerve fibers connected to our eyes, and only 50,000 connected to our ears.
Mathematica has had audio generation capabilities since version 2 in 1991. And there are some times when I’ve found this useful for understanding data.
But I at least have never found it at all useful for anything analogous to notation.
Someone asked about presentations of proofs.
The biggest challenge comes in presenting long proofs that were found automatically by computer.
A fair amount of work has been done on presenting proofs in Mathematica. An example is the Theorema project.
The most challenging proofs to present are probably ones—say in logic—that just involve a sequence of transformations on equations. Here’s an example of such a proof:
Given the Sheffer axioms of logic (f is the Nand operation):
{f[f[a,a],f[a,a]]==a,f[a,f[b,f[b,b]]]==f[a,a], f[f[a,f[b,c]],f[a,f[b,c]]]==f[f[f[b,b],a],f[f[c,c],a]]}
prove commutativity, i.e. f[a,b] == f[b,a]:
Note: in the proof, the infix form (a b) is shorthand for Nand[a,b], and L stands for Lemma, A for Axiom, and T for Theorem.
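The proof itself is a long chain of purely equational steps found by computer. As a much more modest sanity check (not a replacement for the proof), one can at least confirm in Mathematica that the axioms and the commutativity statement all hold when f is interpreted as Nand on explicit truth values:

(* brute-force check over all assignments of True and False to a, b, c *)
f = Nand;
check[a_, b_, c_] :=
  f[f[a, a], f[a, a]] === a &&
   f[a, f[b, f[b, b]]] === f[a, a] &&
   f[f[a, f[b, c]], f[a, f[b, c]]] === f[f[f[b, b], a], f[f[c, c], a]] &&
   f[a, b] === f[b, a]
And @@ (check @@@ Tuples[{True, False}, 3])     (* True *)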
I had meant to say something about selecting characters to use in mathematical notation.
There are about 2500 commonly-used symbols that do not appear in ordinary text.
Some are definitely too pictorial: a fragile sign, for example. Some are too ornate. Some have too much solid black in them, so they’d jump out too much on a page. (Think of a radioactive sign, for example.)
But a lot might be acceptable.
If one looks at history, it is fairly often the case that particular symbols get progressively simplified over the course of time.
A specific challenge that I had recently was to come up with a good symbol for the logic operations Nand, Nor, and Xor.
In the literature of logic, Nand has been variously denoted:
I was not keen on any of these. They mostly look too fragile and not blobby enough to be binary operators. But they do provide relevant reminders.
What I have ended up doing is to build a notation for Nand that is based on one of the standard ones, but is “interpreted” so as to have a better visual form. Here’s the current version of what I came up with:
In the talk I showed the frequency distribution for Greek letters in MathWorld.
To complement this, I also counted the number of different objects named by each letter, appearing in the Dictionary of Physics and Mathematics Abbreviations. Here are the results.
In early mathematical notation—say the 1600s—quite a few ordinary words were mixed in with symbols.
But increasingly in fields like mathematics and physics essentially no words are included in notation, and variables are named with just one or perhaps two letters.
In some areas of engineering and social science, where the use of mathematics is fairly recent and typically not too abstract, ordinary words are much more common as names of variables.
This follows modern conventions in programming. And it works quite well when formulas are very simple. But if they get complicated it typically throws off the visual balance of the formulas, and makes their overall structure hard to see.
In talking about the correspondence of mathematical language and ordinary language, I was going to mention the question of parts of speech.
So far as I know, all ordinary languages have verbs and nouns, and most have adjectives, adverbs, etc.
In mathematical notation, one can think of variables as nouns and operators as verbs.
What about other parts of speech?
Things like ∧ (And) sometimes play the role of conjunctions, just as they do in ordinary language. (Notably, all ordinary human languages seem to have single words for And and Or, but none has a single word for Nand.) And perhaps some prefix operators can be viewed as adjectives.
But it is not clear to what extent the kinds of linguistic structure associated with parts of speech in ordinary language are mirrored in mathematical notation.