[ Jocelyn Ireson-Paine's Home Page | Publications | Make Category Theory Intuitive! ]

In this document, which I am writing as an internal report of the Department of Informatics, University of Minho, I want to explore an intuition that, by using category theory, generalisation can be formalised as an adjunction. I believe that this can be made to apply to all forms of generalisation, including logical induction, statistical curve-fitting, and learning in neural nets. But it remains to be seen whether this is true, and if true, whether it is useful. If true, it could be very interesting indeed, because it would provide a way to unify different kinds of machine learning.

I'll write this main line of the report assuming a knowledge of category theory. But if I find time, which I don't have much of at the moment, I'll hyperlink to explanations of some of the concepts.

By generalisation, I mean taking a set of specimen cases and constructing a rule which is sufficient to "explain" all these, and to interpolate, extrapolate, or otherwise predict new cases which fit the rule. (I say "specimen" rather than "example", because I also need to talk about examples of the formalisation described here, and want to avoid ambiguity.) Instances of this include:

- The specimens are two-dimensional points (members of **R**^{2}). Generalisation is least-squares fitting. The rule is a line giving the best least-squares fit to the points.
- The specimens are logical sentences, classified as positive or negative. Generalisation is logical induction. The rule is a new sentence from which we can infer as many of the positive specimens as possible (hopefully all) and as few of the negative specimens as possible (hopefully none).
- The specimens are pairs *<v,w>* associating an element of the vector space **V** with an element of the vector space **W**. Generalisation is training a linear-associator neural net. The rule is a linear transformation which, if possible, maps all the *v* to the *w*, and if that's not possible, does the best it can by finding an optimum transform which minimises interference between the different *<v,w>* associations.
- The specimens are pairs *<p,l>* where *p* is a point in **R**^{n} and *l* is a member of some set **L** of symbolic labels. For instance, the *p* might be elements of some space whose dimensions measure a voter's preference for various policies; the *l* could then be the possible parties, *{green,labour,libdem,tory}*. Generalisation is nearest-neighbour prediction. The rule is a function which extends the set of pairs *<p,l>*, mapping every point in **R**^{n} to a label. It can therefore be used to predict the preferred party of other voters. I've taken this example from *Truth from Trash* by Chris Thornton.
- The specimens are the instances of generalisation given in this report. Generalisation is fitting each to a notion in category theory. The rule is a categorical construct...!
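
The first and fourth instances can be sketched concretely. The Python functions below are my own illustrations, not anything from the report; they show how, in each case, a set of specimens yields a rule that predicts new cases:

```python
# Illustrative only: "specimens -> rule" for two of the instances above.

def least_squares_line(points):
    """Generalise 2D specimen points to a rule: the best-fit line y = m*x + c."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    m = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    return m, (sy - m * sx) / n

def nearest_neighbour_rule(specimens):
    """Generalise labelled specimens <p, l> to a rule: a total function
    from points to labels, by nearest-neighbour prediction."""
    def predict(p):
        def dist2(pair):
            q, _ = pair
            return sum((a - b) ** 2 for a, b in zip(p, q))
        return min(specimens, key=dist2)[1]
    return predict

m, c = least_squares_line([(0, 1), (1, 3), (2, 5)])   # colinear specimens: y = 2x + 1
predict = nearest_neighbour_rule([((0.0, 0.0), "green"), ((1.0, 1.0), "tory")])
print(m, c, predict((0.9, 1.2)))
```

In both cases the rule is defined on points that were never specimens, which is what makes it a generalisation rather than a lookup table.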

My intuition is that the generalisation of a set of specimens is, in some sense, a limiting case or least upper bound of them. Ideally, it should contain just enough information to reconstruct them and the other cases to be predicted, but no more. Category theory provides several notions which can be used to formalise this, of which the simplest is that of limit. (Actually, I suppose product is even simpler.)

Here is one of the examples that inspired this intuition. Consider the category *Ex* whose objects are atomic propositions and the conjunctions formed from them, and whose arrows are the implications holding between them.
It seems reasonable to say that forming the conjunction of a set of propositions is one possible (and very crude) way of generalising from them. And informally speaking, the conjunction contains just enough information to imply them all, but none of the others in the category (unless they were implied by the originals anyway). Now, we also know that in this category, their conjunction is their limit. (More correctly, it's a limit of the diagram containing the propositions and the implication arrows between them.) But this formalises the notion expressed in the "just enough information" sentence, because of the universal mapping property of the limit. (That is, any other proposition which implies the originals also implies their conjunction.)
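This can be checked mechanically. In the sketch below (my own modelling assumption, not notation from the report), a proposition is represented as the frozenset of its atomic conjuncts, so that *P* implies *Q* exactly when *Q*'s conjuncts are a subset of *P*'s; conjunction is then union, and the universal mapping property can be tested directly:

```python
# A proposition is modelled as the frozenset of its atomic conjuncts;
# "p implies q" becomes "q's conjuncts are a subset of p's".

def implies(p, q):
    return q <= p

def conjunction(props):
    return frozenset().union(*props)

diagram = [frozenset({"p1"}), frozenset({"p2", "p3"})]
lim = conjunction(diagram)

# The limit cone: the conjunction implies every proposition in the diagram.
assert all(implies(lim, p) for p in diagram)

# Universal property: any proposition implying everything in the diagram
# also implies the conjunction.
r = frozenset({"p1", "p2", "p3", "p4"})
assert all(implies(r, p) for p in diagram) and implies(r, lim)
print(sorted(lim))
```

Because implication here is a preorder, limits are just meets, which is why the conjunction is "just enough information" and nothing more.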

Though this was one of the examples behind the original intuition, it is very simple, and we need more mathematical equipment to deal with other kinds of generalisation. One problem is that the language in which the generalisations are expressed will usually be different from that of the specimens.

As an aside (I'm not sure how relevant this is to the main point of the report), this isn't unreasonable. It's reasonable to argue that generalisation, in order to be useful, must achieve some compression. The generalisation formed from a set of specimens should take up less space, consume less computing time, or otherwise be easier to handle, than the original data, otherwise there's no point in having it. (Compression won't always be possible: see David Marr's distinction between type-1 and type-2 theories. Also, it's interesting to note that Nick Chater has argued that the saving of cognitive resources can be seen as a reason for the evolution of generalisation in animals.)

Anyway, one way to force compression is by having the language for generalisations more compact than that of the specimens. But this will probably be at a cost: the generalisation language won't be able to express all the information in the specimens, so some detail will be lost.

I'm going to try formulating this using more than one category, and functors. Let's now assume there are two categories: *Ex*, whose objects are sets of specimens, and *Gen*, whose objects are the possible generalisations.

In the logical example, let *Ex* be the category whose objects are the non-empty sets of sentences.

Let * Gen* be the category whose objects are sentences:
either the atomic sentences

Now introduce the functor *G: Ex ->
Gen*, defined as follows.

We could say that *G* implements a simple form of logical induction, rather more interesting than the earlier one. And there are now two languages, that of *Gen* being restricted compared to that of *Ex*.

Another example with a similar structure follows. Let *Ex* be the category whose objects are the non-empty sets of colinear elements of **R**^{2}, the arrows being the inclusions between them.

Let *Gen* be the category whose objects are either the singletons, i.e. the one-point sets, or the lines in **R**^{2}.

Then let the functor *G* map each singleton to itself, and map each set *S* with more than one element to the line going through all *S*'s points. *G* maps inclusions to inclusions. As with the previous instance, *G* flattens most of *Ex* into *Gen*: all the sets of points lying on the same line map to that one line.
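
A minimal sketch of such a functor, assuming exact (integer or rational) coordinates and using my own representation of lines as normalised coefficients of *ax + by = c*:

```python
from fractions import Fraction

def line_through(points):
    """The line through a set of colinear points, as normalised coefficients
    (a, b, c) of a*x + b*y = c, so that equal lines compare equal.
    Assumes exact (integer or rational) coordinates."""
    (x1, y1), (x2, y2) = sorted(points)[:2]
    a, b = Fraction(y2 - y1), Fraction(x1 - x2)
    c = a * x1 + b * y1
    scale = a if a != 0 else b          # divide by the first nonzero coefficient
    return (a / scale, b / scale, c / scale)

def G(s):
    """Map a singleton to itself, and a larger set to the line through its points."""
    return frozenset(s) if len(s) == 1 else line_through(s)

s1 = {(0, 1), (1, 3)}
s2 = {(0, 1), (1, 3), (2, 5)}          # a superset of s1, on the same line
assert G(s1) == G(s2)                  # G flattens both onto the line y = 2x + 1
print(G(s1))
```

The flattening is visible in the assertion: distinct objects of *Ex* become the same object of *Gen*, and the inclusion between them becomes that line's identity arrow.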

All the instances above can be formalised as adjunctions. *G* then
becomes a functor which maps a set of specimens to an object which is in some
sense the "completion" of that set. It acquires a right adjoint *E* which
maps this object back to the set of all possible specimens derivable from this
completion object.

The best way to think about what this means in each example may be to think in terms of the units and counits. An adjunction from *Ex* to *Gen* gives, for each object *g* in *Gen*, a counit arrow from *G(E(g))* to *g*: generalising from the set of all specimens derivable from a rule leads back to the rule itself.

In the other direction, given any object *e* in *Ex*, the functor *E∘G* maps it to the set of all specimens derivable from its generalisation, and the unit arrow from *e* to *E(G(e))* embeds the original specimens in that larger set.
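
For the line-fitting instance, the unit can be made concrete. In the sketch below, *E* is restricted to a finite grid of x-values (my own assumption, so that the set of derivable specimens stays finite), and the inclusion of the original specimens in *E(G(e))* can be checked directly:

```python
def G(points):
    """Generalise a set of colinear points to a line (m, c).
    Assumes at least two points and no vertical lines, for brevity."""
    (x1, y1), (x2, y2) = sorted(points)[:2]
    m = (y2 - y1) / (x2 - x1)
    return (m, y1 - m * x1)

def E(line, xs=range(-5, 6)):
    """Map a line back to the set of all specimens derivable from it,
    restricted to a finite grid of x-values so the set stays finite."""
    m, c = line
    return {(float(x), m * x + c) for x in xs}

e = {(0.0, 1.0), (1.0, 3.0)}
closure = E(G(e))
# The unit of the adjunction: the original specimens embed in E(G(e)).
assert e <= closure
print(len(closure))
```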

(I need to relate this to the notion of limiting amount of information, showing how that arises from the adjunction.)

In the least-squares
example, I stipulated that the sets of points in * Ex*
must be colinear. This isn't very realistic: in real life, most sets of
specimens will not exactly fit a line.

In general, given any useful generalisation method, some sets of specimens will not have an exact generalisation. You can say that this is due to experimental error, or to an inadequate statistical model, or to an insufficiently powerful description language, or to restricted storage capacity, or whatever.

In the line-fitting example, we can try fixing this by extending the
generalisation functor *G* so that it maps each set of non-colinear points
to the line with the least least-squares distance from them. [What if, in this
or other examples, there is no unique best generalisation?] The problem with
this is that *G* then no longer respects the arrows between the sets which
are the objects in *Ex*. Originally, if one set was included in another, both
mapped to the same line, so *G* could send the inclusion to that line's identity
arrow; with best-fit lines, a set and a superset of it may map to different
lines, and there is no obvious arrow in *Gen* for the inclusion to go to.
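
The failure can be seen numerically. In this sketch (the data points are my own), a set of specimens and a superset of it are both given best-fit lines, and the two lines differ:

```python
def best_fit_line(points):
    """Least-squares line (m, c) through possibly non-colinear points."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    m = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    return (m, (sy - m * sx) / n)

s1 = {(0.0, 0.0), (1.0, 1.0)}          # exactly on y = x
s2 = s1 | {(2.0, 0.0)}                 # add a third point off that line
# s1 is included in s2, but their best-fit lines differ, so the
# inclusion arrow has no obvious image among the lines.
assert best_fit_line(s1) != best_fit_line(s2)
print(best_fit_line(s1), best_fit_line(s2))
```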

One might argue that the way to fix this is not to decide a priori that the morphisms in *Ex* should be inclusions, but to let them be determined by the morphisms in *Gen*.

This report is incomplete in several ways, bibliography and proofs amongst them. I would also like to add explanations for those who don't know category theory. Apart from that, this is still very early in the research. I need to deal with noise, and impose sensible structures on the categories. It will then be necessary to show that enough instances of generalisation can be formalised as adjunction.

I also need to demonstrate that the formalisation is useful. One way to try using it would be to compare different generalisation methods. If we're using category theory correctly, it should make this quite easy. An interesting possibility would be to take non-symbolic learning methods such as neural nets, and to try laying them alongside some derived symbolic system, showing how symbols can emerge. Good examples to try this with might be those used by Stephen Harnad in his work on categorical perception.

Another possibility is to try deriving learning algorithms from the definition. Algorithms have been derived from categorical definitions in other areas of computer science, for example by Rod Burstall ...

I also need to restrict the definition. I do not claim that all adjunctions
correspond to generalisations - it's hard to see how the functor mapping sets to
their free groups would, for instance. So one thing we need to do is to restrict
the definition so as to capture the essential properties of generalisation. Part
of this may be that all elements of an object in *Gen*
are "on the same level" in the sense that the elements of a free group or monoid
are not. From the set *{a,b}*, for instance, the free monoid construction
generates elements such as *ab* and *aba* which are built out of the generators,
and so sit on a different level from them.
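
A concrete way to see the "levels": in the free monoid on *{a,b}*, words can be graded by length, with the generators at level 1 and their products strictly above. The enumeration below is my own illustration:

```python
from itertools import product

# Words over the generators, graded by length.
generators = ["a", "b"]
words_by_level = {1: generators[:]}
for level in (2, 3):
    words_by_level[level] = ["".join(w) for w in product(generators, repeat=level)]

# The generators sit at level 1; products such as "ab" and "aba" sit strictly
# above them, so the free monoid's elements are not all "on the same level".
print(words_by_level[1], words_by_level[2], words_by_level[3][:4])
```

By contrast, the objects of *Gen* in the line-fitting example, the lines, are not built out of one another in this way.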

It would be interesting to connect this research with that described in
Goguen's paper *What is
Unification?*. There, he formalises unification in terms of equalisers
and limits. The least-general generalisation of two terms, as used in logical
induction by inverse resolution, is in some sense dual to unification. So we
could try dualising his formalisation to model "anti-unification", and if this
works, try broadening it to other kinds of generalisation. (Perhaps doing this
will end up with a categorical formulation equivalent to the one I'm working on
here.)
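
As a hint of what the dual construction computes, here is a sketch of Plotkin-style anti-unification (least general generalisation) of two first-order terms, using my own term representation of nested tuples:

```python
def anti_unify(t1, t2, vars_seen=None):
    """Least general generalisation of two first-order terms.
    A term is an atom (string) or a tuple (functor, arg1, ..., argN)."""
    if vars_seen is None:
        vars_seen = {}
    if t1 == t2:
        return t1
    if (isinstance(t1, tuple) and isinstance(t2, tuple)
            and t1[0] == t2[0] and len(t1) == len(t2)):
        return (t1[0],) + tuple(anti_unify(a, b, vars_seen)
                                for a, b in zip(t1[1:], t2[1:]))
    # Differing subterms become a variable; the same pair of subterms
    # always gets the same variable, as the algorithm requires.
    if (t1, t2) not in vars_seen:
        vars_seen[(t1, t2)] = "X%d" % len(vars_seen)
    return vars_seen[(t1, t2)]

print(anti_unify(("f", "a", "b"), ("f", "c", "b")))
```

Where unification finds the most general common instance of two terms, this finds their least general common generalisation, which is the sense in which the two are dual.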

- Burstall, Rod, *...*
- Goguen, Joseph. Possibly in a paper called *Objects*. Most of the important content, the sheaf semantics, can be found in this later paper, although the remark about constructing an object description from partial observations cannot.
- Goguen, Joseph, *What is Unification?* In *Resolution of Equations in Algebraic Structures, Volume 1: Algebraic Techniques*, edited by Maurice Nivat and Hassan Ait-Kaci, Academic Press, 1989, pages 217-261. Link to PostScript version here.
- Goguen, Joseph, *Sheaf Semantics for Concurrent Interacting Objects*. In *Mathematical Structures in Computer Science*, Volume 2, 1992, pages 159-191. Link to PostScript version here.
- Harnad, Stephen, *The induction and representation of categories*. In *Categorical Perception: The Groundwork of Cognition*, New York: Cambridge University Press, 1987. Link to plain text version here.
- Marr, David, *Vision: A Computational Investigation into the Human Representation and Processing of Visual Information*, W.H. Freeman and Co., 1983.
- Thornton, Chris, *Truth from Trash: How Learning Makes Sense*, MIT Press, 2000. Publisher's blurb here; links to plain text, HTML and PostScript versions here.

25th June 2000.
