[ Jocelyn Ireson-Paine's Home Page | Publications | Interactive Category Theory Demonstrations | n-Category Café thread about Graphical Category Theory Demonstrations ]

What Might Categories do for AI and Cognitive Science?

Introduction

In August 2008, I posted my answer to the question "Why should we be interested in categories" on the A Categorical Manifesto thread in the n-Category Café blog. The posting was long, and I had to feed it in in sections, which became interrupted by readers' replies. So to make it easier to read, I've put it up here, as one document. The original, with replies, is at the link just mentioned.

Contents

Category theory and the Interesting Truths

Here’s my answer. In general, of course, categories will be useful because they’ll help us understand the “Interesting Truths” of computer science: the “subtle unifying insights between broad classes of mathematical structures”, to quote the paragraph from Greg Egan’s novel Incandescence that I posted:

‘Interesting Truths’ referred to a kind of theorem which captured subtle unifying insights between broad classes of mathematical structures. In between strict isomorphism — where the same structure recurred exactly in different guises — and the loosest of poetic analogies, Interesting Truths gathered together a panoply of apparently disparate systems by showing them all to be reflections of each other, albeit in a suitably warped mirror. Knowing, for example, that multiplying two positive integers was really the same as adding their logarithms revealed an exact correspondence between two algebraic systems that was useful, but not very deep. Seeing how a more sophisticated version of the same could be established for a vast array of more complex systems — from rotations in space to the symmetries of subatomic particles — unified great tracts of physics and mathematics, without collapsing them all into mere copies of a single example.

But more specifically, I’m interested in them because I think they have huge — and under-used — potential in understanding human cognition and in building “intelligent” computers. I’m going to give several examples, which will make quite a long post. I hope that’s OK.

Unifying unification with holographic reduced representations

My first example concerns understanding how “logical unification” and “holographic reduced representations” are related. Logical unification is a kind of pattern matching. (It’s often just called “unification”, but I’ll preface with “logical” to avoid confusion with “unify” as in “insight”. Logicians reading this, please forgive such a cursory description.)

For instance, in the programming language Prolog, unifying the term 2*x+3*y with the term U+V, will bind the variable U to 2*x and the variable V to 3*y. Prolog programmers love this, because it’s such an easy way to get at pieces of structures.

But — computer science also knows a different kind of pattern matching, implemented by so-called “holographic reduced representations” or HRRs. These encode structures as bitwise or numeric vectors in very high-dimensional spaces, using circular convolution and circular correlation (or analogous operations) to build and decompose vectors. I’ll show an example in the next paragraph; for those who want to know more, these references give some background:

Here’s an example from Kanerva’s paper. The following equation computes a bitwise vector encoding data about France:

France = [Capital*Paris + Location*WesternEurope + Currency*Euro]

Here, + is vector addition, * is exclusive-or. The brackets [] denote a “normalised mean”, whose exact definition is given in Kanerva’s paper. Capital, Location and Currency are three randomly-generated vectors. The vector Capital represents a slot or field whose value will represent the capital of France; similarly for the vectors Location and Currency.

Paris, WesternEurope and Euro are three more randomly-generated vectors, representing the values of these slots. It’s important to note that all six vectors should be interpreted as atomic symbols: the actual values of their components have no significance.

Having computed the vector France, we can “probe” it by exclusive-oring with the vector Capital:

WhatCity = Capital *France
= Capital * [Capital*Paris + Location*WesternEurope + Currency*Euro]
= Capital*Capital*Paris + Capital*Location*WesternEurope + Capital*Currency*Euro
= Paris + (a bit of other stuff).

This works because * (that is, exclusive-or) is its own inverse, and because it distributes over the normalised mean [].

The final stage is to do what Kanerva calls a “clean-up”. You assume there is an associative memory that contains the six vectors originally generated. Fed any vector, it will quickly return whichever of the six closest to it. Which in this case,we hope will be Paris.

For this to work, various conditions on the distribution of vectors must be satisfied. But if they are, you can do some really nifty analogical reasoning. For example, you can ask how France is related to Paris by exclusive-oring the vector France with the vector Paris. This works the same way as above, and after the clean-up, will return Capital. Kanerva goes on to describe more ambitious reasoning tasks, such as forming a vector that represents Sweden, then asking “What is the Paris of Sweden”?

On the surface, holographic reduced representations are utterly different from logical unification. But I can’t help feeling that, at a deeper level, they are closely related. And there is a categorical formulation of logical unification, which regards it as finding coequalisers in a category whose objects are sets of terms and whose arrows are substitutions. This is described in the first reference below, by Rydeheard and Burstall. They say their formulation is derived from an observation by Goguen. So it may be based (I’m not an expert) on the ideas in the second reference:

So, can we extend that categorical formulation to holographic reduced representations? I don’t know. But if we could, we would better understand how they are related to logic programming, and we might gain new tools for analogical reasoning. It’s worth trying.

Unifying neural nets with logic

My theme is that category theory has heaps to offer Artificial Intelligence and cognitive science. We ought to be doing all we can to promote it, and to make it easy to learn. And we should be showing people the kinds of task it could be useful for. Which includes unifying deeply disparate areas of computing. Which AI and cognitive science are full of!

So here’s my second example: neural networks, and logic-based representations. I suppose most readers have come across neural networks, but for those who haven’t, I’ll describe them briefly. Neural networks consist of “nodes” connected in a network. The connections between nodes correspond to synapses between biological neurons. Typically, a node sums the signals from its input nodes, multiplying each input signal by the strength of the corresponding connection. It may then process the sum in some way — e.g. by thresholding — to calculate an output and an “activation”. The entire network can be regarded as a vector of these activation values, representing the data in the network.

Most kinds of neural network can learn by adjusting the strengths of their connections, thus mimicking — in a very simplified way — how the brain learns by adjusting synaptic strengths.

In Artificial Intelligence, there is a vast cultural chasm between neural nets and symbolic ways to represent knowledge. It’s rare for the same researcher to work on both, and students usually get taught about them in different courses. This reflects the differences between the two. The symbols in symbolic representations denote things. But in some kinds of neural net, the individual nodes don’t, although a whole set of nodes might. Therefore, such nets are often called “subsymbolic”, a word introduced by Paul Smolensky in the following paper:

(Holographic reduced representation vectors are also subsymbolic.)

As well as the “symbolic/subsymbolic divide”, there is a “compositional/non-compositional divide”. Symbolic ways to represent knowledge are “compositional”, in that the meaning of a sequence of symbols is determined by the meanings of its parts. Compositionality is good: if you can represent “coffee cup” and “cake”, it’s nice to be able to represent “coffee cup and cake” too. But if you have (say) an image-processing net which has learnt to represent images of coffee cups by one pattern of activations, and images of cakes by another, you can’t assume that superimposing or adding the patterns will represent a coffee cup and a piece of cake. So there has been a lot of argument about questions like: is the brain compositional; is composition by superimposing activation patterns the only kind of compositionality; and (as engineers) how can we design nets that are compositional?

Yet another difference is that symbolic ways to represent knowledge are often said to be “contextual”: they are leaky, in that what happens at one node can be very sensitive to data elsewhere in the network. Which is different from my Prolog program that will happily unify 2*x+3*y with U+V, giving the same results no matter what else is in my laptop’s memory. Again, there’s been a lot of argument about this. Unfortunately, a lot of the key papers on this and the other topics, though on-line, are locked up in pay-to-view sites. The reference below seems a good, and free, starting point:

I mention these differences not only to show how far apart neural nets are from symbolic systems, but also because category theory must have a lot to offer in answering such questions. Which is why the two references below are exciting:

In the first, Brown and Porter suggest using colimits to understand brain activity in terms of the component structures and neurons.

Healy goes further. He sets up a category Conc whose objects are theories representing concepts. Arrows represent “is a subconcept of”: there is an arrow from c1 to c2 if c1 is a subconcept of c2. He sets up another category Neur, whose objects are nodes in a particular neural network. Arrows represent the “can activate” relationship: there is an arrow from n1 to n2 if n1 can activate n2. Such an arrow tells us something about the neural network, namely that there is a path between n1 and n2, and that all the synaptic connections along this path are strong enough to propagate a signal from n1 to n2.

This wouldn’t be interesting unless the categories Conc and Neur were related, and indeed, Healy lets us define functors from Conc to Neur. In effect, each functor maps a network of concepts, represented as theories, to regions in a neural network where these concepts are represented.

What’s nice is that we can combine these neural-network representations by using natural transformations. I hope I have this right, because Healy doesn’t give much detail, but the point seems to be that a single neural network might learn different instances of the same concept.

For example, imagine an ambidextrous robot with mirror-identical arms and hands. Each hand has touch sensors: the left hand’s sensors are wired to one set of input nodes in the network, and the right hand’s to another set of input nodes. Assume the robot is taught to grasp an orange with its left hand, and then, independently, with its right hand. The region of its neural network fed from the left hand sensors learns a set of associations between finger pressure and movement; the region fed from the right hand learns, independently, a very similar set. This is an example I made up, by the way, so blame me if it turns out to be misleading.

The left hand and right hand associations can then be regarded as two functors, L and R, from Conc to Neur. I’m not sure what the logical content of the objects in Conc would be — theories describing what an orange is for, or what its shape is, or how to grasp it — but at any rate, they’d say something about oranges, and they’d be represented in two different regions in the neural network. We can merge these, Healy says, by setting aside a third region in the network which combines corresponding outputs from each of the first two.

But is this in fact a valid representation of the learnt concepts? Yes, if we let natural transformations guide us. As well as functors L and R, let there be a functor M which maps Conc to this third region. Set up mappings from L to M and R to M. Then, Healy shows, the third region is a valid representation if these mappings are natural transformations.

Can such ideas be applied to other kinds of neural network? Can we show other kinds of neural network to be reflections, in suitably warped Eganian mirrors, of logical theories? Can we characterise neural networks by properties of the categories containing them? (I suspect many of these categories won’t have products, because of interference between subnetworks, but that’s only a passing thought.)

And — very importantly, because we engineers have such small minds, so need to break problems into tiny pieces before we can understand them — can we use colimits and other tools to help us build modular neural nets?

Unifying dynamical systems with logic

My third example is similar in spirit to the last. In the early days of programming, one of Artificial Intelligence’s motivating ideas was the Physical Symbol System Hypothesis put forward by Allen Newell and Herbert A. Simon in the following paper:

This states that: “A physical symbol system has the necessary and sufficient means for intelligent action”. I suppose every hacker believes this, and hopes in their deepest soul that, one day, they will write the program that shows this to be true.

More recently, various researchers have proposed using dynamical systems ideas to understand cognition. This is the Dynamical Systems Hypothesis, and there’s some background on it in these references:

I was reminded of this by a nice paper I found, written by David Tall about the learning of mathematical concepts. He wrote it before the Dynamical Systems Hypothesis became popular, but it uses dynamical systems ideas to explain how learning one concept might be blocked by a conflicting earlier version of the same concept:

So, in a spirit similar to that of my neural network section, I ask: can we use category theory to help us map between logical descriptions of knowledge and their implementation as dynamical systems. Can we use it to help us build modular dynamical systems implementations?

Understanding analogical reasoning — the core of cognition and a chemistry of concepts?

My fourth example ratchets up the stakes, and turns — I’ll explain why in a minute — to Margaret Thatcher. In the 1980s, she was Prime Minister of Britain, and Denis Thatcher was her husband. Ronald Reagan was President of America. So, if you were to ask an American “Who is First Lady of Britain?”, what could they reply? A reasonable answer is “Denis Thatcher”. But to make it, you must be willing to “slip” the concept of First Lady, weakening it to First Spouse as you transfer from an American frame of reference to a British.

Now consider the following two number patterns:

A: 1 2 3 4 5 5 4 3 2 1

B: 1 2 3 4 4 3 2 1

What is to B as 4 is to A? Most people would answer 3. It’s to the left of the central pair of numbers in B, just as 4 is to the left of the central pair of numbers in A. 4 in A and 3 in B are like Nancy Reagan and Denis Thatcher: they’re not the “most distinguished point” in their landscape, but they are closely related to that point. They fill the same rôle.

Here is another number problem:

A: 1 2 3 4 5 5 4 3 2 1

D: 1 1 2 2 3 3 4 4 5 4 4 3 3 2 2 1 1

What is to D as 4 is to A? A brute-force attempt to find D’s central pair won’t work. But it seems reasonable to “slip” the concepts of “pair” and “singleton” as we transfer from A’s frame of reference to D’s. Which gives us the answer 4 4.

These problems come from a column that Douglas Hofstadter wrote for Scientific American. He collected them, with later additions, into a book, and I’m particularly interested here in two of its chapters:

Hofstadter argues that analogy is at the core of human thought and creativity. To quote from Chapter 24:

Don’t press an analogy too far, because it will always break down. In that case, what good are analogies? Why bother with them? What is the purpose of trying to establish a mapping between two things that do not map onto each other in reality? The answer is surely very complex, but the heart of it must be that it is good for our survival (or our genes’ survival) , because we do it all the time. Analogy and reminding, whether they are accurate or not, guide all our thought patterns. Being attuned to vague resemblances is the hallmark of intelligence, for better or for worse.

Therefore, he says, human analogical thought is what we need to study. And we should do so by within “microdomains” such as the number problems, because then — in proper scientific spirit — we can isolate analogical thought from other phenomena.

He was particularly interested in why we slip some concepts more than others, and in what one might call rôles versus literals. As an example of the first: I’m in the pub, and I accidentally spill my pint of beer over my shoes. Why is it natural for me to “slip” my situation to “Good job I don’t have to go anywhere smart tonight!”, but not to “Shame gravity isn’t weaker, so the beer would have stayed in the air!”?

As an example of the second, why, when solving the two number problems, do we perceive 4 as filling a rôle at all in pattern A, then try to find a value for the equivalent rôle in patterns B and D? Why don’t we just say, “Well, the obvious answer is the same value, 4”?

There are some lovely examples in those two chapters, which I do recommend to anyone interested. But what’s the connection with categories?

Hofstadter and his colleagues wrote several programs to implement their theories of analogical thought in microdomains. These programs include include Copycat:

And they include Tabletop:

Doesn’t that title sound like something to interest every red-blooded n-categorist? Anyway, Tabletop solves problems such as “I am sitting at a table and I touch my nose. What should you touch?”. Or “I touch the flower vase in the centre. What should you touch?”. Or, one more: “I touch the cup near me. You have a glass but no cup. What should you touch?”

Now, these programs are complicated. They contain swarms of biologically-inspired enzyme-like entities swimming around in a kind of symbol-filled cytoplasmic glop, glomming onto bits of concepts, and slowly crystallising out into a finished answer to the problem. They lack — as far as I know — any easily understandable semantics.

So, I’m going to suggest, can we use categories to give them such a semantics? Or, more generally, to formalise our ideas about the best solutions to the kind of analogy problems that Hofstadter and his colleagues discuss? A good starting point here would be Goguen’s ideas on conceptual blending, already mentioned earlier in this thread. For example:

Let me finish this section with two more Hofstadter quotes. As I said at the beginning, the stakes are high:

[…] I have been emphasising the idea of the internal structure of one concept. In my view, the way that concepts can bond together and form conceptual molecules on all levels of complexity is a consequence of their internal structure. What results from a bond may surprise us, but it will nonetheless always have been completely determined by the concepts involved in the fusion, if only we could understand how they are structured. Thus the crux of the matter is the internal structure of a single concept, and how it “reaches out” towards things it is not. The crux is not some magical, mysterious process that occurs when two indivisble concepts collide; it is a consequence of the divisibility of concepts into subconceptual elements. As must be clear from this, I am not one to believe that wholes elude descriptions in terms of their parts. I believe that if we come to understand the “physics of concepts”, then perhaps we can derive from it a “chemistry of creativity”, just as we can derive the principles of the chemistry of atoms and molecules from those of the physics of quanta and particles. But as I said earlier, it is not just around the corner. Mental bonds will probably turn out to be no less subtle than chemical bonds.

You can think of concepts as start, and knob-twiddling as carrying you from one point on an orbit to another point. If you twiddle enough, you may well find yourself deep within the attractive zone of an unexpected but interesting concept and be captured by it. You may thus migrate from concept to concept. In short, knob-twiddling is a device that carries you from one concept to another, taking advantage of their overlapping orbits.

Of course, all this cannot happen with a trivial model of concepts. We see it happening all the time in minds, but to make it happen in computers or to locate it physically in brains will require a fleshing-out of what concepts really are. It’s fine to talk of “orbits around concepts” as a metaphor, but developing it into a full scientific notion that either can be realised in a computer model or can be located inside a brain is a huge task. This is the task that faces cognitive scientists if they wish to make “concept” a legitimate scientific term. This goal, suggested at the start of this article, could be taken to be the central goal of cognitive science, although such things are often forgotten in the inane hoopla that is surrounding artificial intelligence more and more these days

“These days” were in the 1980s, so perhaps we can disregard the remark about the inane hoopla. Though there seems to be a lot of inane hoopla surrounding computing in general. But what about the rest of the quotes. Does it make sense to think about a “chemistry of concepts”; to make “concept” a legitimate scientific notion?

Actually, Michael Healy, who I mentioned earlier for his work on mapping concepts to neural nets, has with coauthors Thomas Caudell and Timothy Goldsmith, released another report:

This is their abstract:

Categorization and the judgement of similarity are fundamental in cognition. We propose that these and other activities are based upon an underlying structure of knowledge, or concept representation, in the brain. Further, we propose that this structure can be represented mathematically in a declarative form via category theory, the mathematical theory of structure. We test the resulting mathematical model in an experiment in which human subjects provide judgements of similarity for pairs of line drawings using a numerical scale to represent degrees of similarity. The resulting numerical similarities are compared with those derived from the category-theoretic model by comparing diagrams. The diagrams represent distributed concept structures underlying the line drawings. To compare with a more conventional analysis technique, we also compare the human judgements with those provided by a two-dimensional feature space model equipped with a distance metric for the line drawings. The results are equally favorable for both models. Because of this and the putative explanatory power of the category-theoretic model, we propose that this model is worthy of further exploration as a mathematical model for cognitive science.

Which is extremely interesting, and reminiscent of Hofstadter’s remark: “Thus the crux of the matter is the internal structure of a single concept […]”.

"A necessary tool in the construction of an adequately explicit science of knowing"

I’ll finish by talking about a Lawvere paper. It’s the following reference — which, I’m pleased to note, is public at Google Books — and I mention it because I found it in a book on cognition:

Here’s how Lawvere concludes the paper:

Despite some simplification in the above, needed for rapid description, I hope that I have made clear that there is a great deal of useful precision lying behind my illustrations, and a great deal to be developed on the same basis. Thus I believe to have demonstrated the plausibility of my thesis that category theory will be a necessary tool in the construction of an adequately explicit science of knowing.

Now, I don’t know enough to describe these “illustrations” succinctly, but they include his notion of unity-and-identity-of-opposites, which I discovered from this reference:

is also known as an adjoint cylinder. I mention this only because it sounds like something wonderful.

Lawvere’s illustrations also include the notion of an adequacy comonad. Now, I hadn’t come across adequacy comonads before, but they sound as though they might be very useful in understanding approximations to concepts. Moreover, the adequacy comonad sounds like an exquisite and intricate piece of mathematical machinery, and I hope someone will give me a working model. One I can pick up, prod, peer into, pull to pieces, and generally play with. Steam or electric.

However, I’ve another reason for mentioning the notion. Lawvere uses it to show how the study of closed categories contributes to the study of metric spaces (in this case, via the use of adequacy comonads in the geodesic remetrisation of a metric space). He then says that, conversely, we can take notions such as “radius” and “engulfing” from the study of metric spaces and create “precise analogues [of these notions] for concrete generals enriched in any (even non-posetal) closed categories V”.

I don’t need to fear the strange phrase “concrete general”, because John Baez explains it in:

A “concrete general” is the category of all the models of a theory. (Is there more to it than that?)

Now, I don’t know whether Lawvere is using the words “radius” and “engulfing” in some sense far removed from their everyday meaning. But if not, can his remark also apply to other spatial and dynamic notions? Such as “orbit” (around a concept); “attractive zone” (of a concept); “migrate” (from concept to concept). In other words, to the metaphors that Hofstadter uses, and that he says will need to be made precise before we can create a “chemistry of concepts”. If so, then category theory will be an invaluable tool for cognitive science.

Even if I’m completely wrong in that last paragraph, Lawvere says category theory “will be a necessary tool in the construction of an adequately explicit science of knowing”, which is good enough for me. So I did an experiment. I searched Google and CiteseerX on his main title, Tools for the Advancement of Objective Logic.

Apart from citations in library catalogues, and papers on philosophy or pure category theory, I found only two authors whose papers seemed, on a quick skim, to depend in some important way on Lawvere’s paper. For interest, I give the references below. (The second author, Zippora Arzi Gonczarowski, had one paper on a pay-to-view site that I couldn’t read, but the abstract shows me that it’s about the same topic as the paper below):

Lawvere’s paper seems filled with crystals of compressed insight, glittering through the surface of the text like quartz through a covering of dry pine-needles. I find them intensely satisfying to contemplate, in the same way I do some El Lissitzky posters and De Stijl typography. Perhaps, high in my brain’s abstraction hierarchy, the same neurons are activated by both: category theory as the ultimate abstract sculpture.

But, aesthetics aside, how can we bring such power to more researchers than the two who cited it? I’m using Lawvere’s paper as a representative of category theory generally, of course. Ask a first-year LISP hacker to iterate over a list, and they’ll reach naturally for recursion or a MAP function. Can we bring up a generation of category-theory hackers, who, when you ask them to model putting something together, will reach naturally for a colimit?

Assuming that category theory really does have something to offer AI and cognitive science.

By the way — spreadsheets!

I thought I’d add one final reason for my interest in category theory. Spreadsheets!

Excel spreadsheets are a big source of errors. One reason is that Excel doesn’t have the notion of “modules”: chunks of program that can be written, tested and debugged one at a time, then used in as many different spreadsheets as you want. If you develop a spreadsheet in Excel, and want to use part of it in another spreadsheet, you have to copy and paste cells from the first spreadsheet to the second. If you then need to change these cells — perhaps because they have bugs, or perhaps to make the formulae in them faster — you have to make the same changes to both spreadsheets. You can’t just change one component, and have the changes propagated to every use of it. So Excel programmers tend to accumulate numerous almost-identical copies of their code. Version-control becomes a nightmare.

I’ve been experimenting with ways to overcome this, via various schemes for modularising spreadsheets. And below are references to a demo of one such scheme — a prototype component library for Google’s online spreadsheet, which lets you add code to a spreadsheet much as you would bar-charts and other displays — and to a paper about the component library for Google’s spreadsheet and for Excel. Both are for spreadsheet users, so not very technical:

Both these were inspired by Joseph Goguen’s work on sheaf semantics, and his dogma that the behaviour of a system is the limit of the behaviours of its components, explained in:

I said a bit about this in a posting here. It seems that his work also inspired research into the algebraic specification languages for software.

It illustrates the power of category theory to project down into unsuspected regions of computing. Even if we will never see a book on Category Theory with Excel.

19th November 2009