AI Expert Newsletter

AI - The art and science of making computers do interesting things that are not in their nature.

February 2005

Introduction

Nano-engineered robots crawl on muscle-powered silicon legs
- Links

The Anthill as Computer
Artificial Ants, Mindstorms, and StarLogo
- Links
Mathematics, Creativity, and Strong Black Coffee
Computational Creativity I - solving analogy problems with Copycat
Mathematics and Alchemy
Computational Creativity II - machine learning with Inductive Logic Programming and Aleph

Introduction

Welcome to the February AI Expert. As promised last month, I have the do-it-yourself on Inductive Logic Programming, and another do-it-yourself on the Copycat analogical-reasoning program, which explores the emergent subcognitive behaviour that its authors believe to be the crux of creativity. If it is, I believe it must also be the essence of aesthetics. To complement these, there are some quotes on creativity and emergence, and a pointer to StarLogo, a language for teaching about parallel and emergent behaviour. But first, some abnormally muscular robots.

Jocelyn Ireson-Paine

Nano-engineered robots crawl on muscle-powered silicon legs

From the number of times I've heard it reported, my first piece of news has grabbed media attention as science rarely does. Perhaps that's because it is claimed to use nanotechnology. In the UK, what we mainly hear of nanotechnology is the dangers. Grey goo will eat the Earth! Prince Charles warns of risks! Tiny particles will rot our lungs! (This last may be true, but is also a long way from the precision atomic-scale engineering that nanotechnology is really about.)

As reported by BBC News Online, University of California researchers have created robot "legs" less than a millimetre long, consisting of rat heart-muscle cells grown onto silicon or plastic "bones". Nano-scale engineering controls where the cells attach to one another as they grow, so that they form a muscle rather than some random blob of tissue; and the skeleton too is precision-engineered, with tiny hinges that allow it to move and bend.

A bit of Web searching turned up more information. The research was done by Carlo Montemagno and colleagues at the UCLA California NanoSystems Institute. From this New Scientist feature and the papers I've linked at the end, First self-assembled microrobots powered by muscle and Constructing Biological Motor Powered Nanomechanical Devices, it seems the crucial advance is in growing the muscles and attaching them to their skeleton. Previous muscle-mechanical systems have used muscle bundles extracted intact from animals; but here, the researchers persuaded the muscles to assemble themselves. The first step was to fabricate the skeleton from a silicon wafer, forming a 50-micrometre wide arch. The researchers coated this with a special polymer into which finely spaced patterns were etched, and a gold film deposited on top. Then flesh was put onto these bones. The silicon-polymer-gold combination was placed into a cell culture medium. The polymer took up water, forming a gel in which the cells could grow and differentiate, lining up along the patterns to form complete muscle bundles attached to the skeleton. And, as New Scientist says, when the researchers looked into their microscopes, they were amazed to see their musclebot crawling around.

Once we can build complete robots in this way, they will pose bizarre challenges to AI. How much autonomy will the muscle-drive have? What's the most appropriate control architecture? (That would make an interesting simulation project.) Must we still worry about inverse joint kinematics, or can we just hook up a handful of touch-sensory neurons and let the system learn to walk for itself? Most importantly, how do we wire in a sense of self-preservation? For, as student fees increase...

I'm terribly sorry, Professor, but I was working late last night, I'd missed lunch, and the vending machine was out of order. I've eaten your robot.

Links

news.bbc.co.uk/1/hi/sci/tech/4181197.stm - 'Living' robots powered by muscle. The BBC News feature.

news.bbc.co.uk/1/hi/sci/tech/3883749.stm - Prince warns of science 'risks', also from the BBC.

www.newscientist.com/article.ns?id=dn4714 - First robot moved by muscle power, from New Scientist.

http://www.sciencentral.com/articles/view.php3?article_id=218391960 - ScienCentral's popular-science page on biobots and molecular motors, including Montemagno's work. Some nice links.

www.cnsi.ucla.edu/faculty/montemagno_c.html - Carlo Montemagno's home page. The UCLA California NanoSystems Institute is at www.cnsi.ucla.edu/mainpage.html.

www.spie.org/paper/FirstSelf.pdf - First self-assembled microrobots powered by muscle, by Jianzhong Xi, Jacob Schmidt, and Carlo Montemagno. Technical descriptions of how the biobots were built; assumes knowledge of the nano-engineering techniques.

www.foresight.org/Conferences/MNT6/Papers/Montemagno/index.html - Constructing Biological Motor Powered Nanomechanical Devices by Carlo Montemagno, George Bachand, Scott Stelick, and Marlene Bachand. A draft paper for the Sixth Foresight Conference on Molecular Nanotechnology. Most of the paper is about molecular motors, but it also describes patterning techniques similar to those used in guiding the biobots' muscle growth. Quite apart from that, it's amazing to see serious plans to use single proteins as motors. I love the authors' phrase "Despite the superb performance of the F₁-ATPase motor protein ..." - it sounds like something from a review of motorbike engines.

The Anthill as Computer

A solitary ant, afield, cannot be considered to have much of anything on his mind; indeed, with only a few neurons strung together by fibers, he can't be imagined to have a mind at all, much less a thought. He is more like a ganglion on legs. Four ants together, or ten, encircling a dead moth on a path, begin to look more like an idea. They fumble and shove, gradually moving the food toward the Hill, but as though by blind chance. It is only when you watch the dense mass of thousands of ants, crowded together around the Hill, blackening the ground, that you begin to see the whole beast, and now you observe it thinking, planning, calculating. It is an intelligence, a kind of live computer, with crawling bits for its wits.

At a stage in the construction, twigs of a certain size are needed, and all the members forage obsessively for twigs of just this size. Later, when outer walls are to be finished, the size must change, and as though given orders by telephone, all the workers shift the search to the new twigs. If you disturb the arrangement of a part of the Hill, hundreds of ants will set it vibrating, shifting, until it is put right again. Distant sources of food are somehow sensed, and long lines, like tentacles, reach out over the ground, up over walls, behind boulders, to fetch it in.

From The Lives of a Cell by Lewis Thomas.

Artificial Ants, Mindstorms, and StarLogo

This can't be math I'm doing. It's fun and math ain't. So says one of the children quoted in Seymour Papert's book Mindstorms. Papert, co-founder with Marvin Minsky of the MIT Artificial Intelligence Lab, invented Logo, the Basic-like language in which children can write programs to control a computer-driven turtle, and hence learn about mathematical concepts such as polygons and interior-angle sums, as well as meta-cognitive concepts such as the need to debug one's learnt knowledge. He developed this into the idea of educational microworlds, a theme explored in Mindstorms.

I recently discovered a parellel implementation of Logo, StarLogo, through a mention in John Hiler's Weblog-related Weblog Microcontent News. He describes experiments with a StarLogo ant simulator, complete with food and pheremone trails. The behaviour generated by the simulator is complex, but emerges from four simple rules - and Hiler proposes that you could model blogging behaviour with this, for example, by replacing the rule "look for food" by "look for news".

Microcontent News appears to be no longer active, but the ant simulator is still available. StarLogo was designed as a tool for teaching children, by experts in teaching, and it's a simple way to demonstrate emergent behaviour, including ideas from Artificial Life.

Links

www.microcontentnews.com/entries/20021220-2589.htm - Microcontent News page for December 20, 2002, John Hiler's entry on the ant simulator. The simulator source is at www.cs.ucc.ie/~dgb/courses/ai/notes/myants.slogo.

education.mit.edu/starlogo/ - StarLogo.

web.media.mit.edu/~mres/ - Mitchel Resnick, developer of StarLogo and author of Turtles, Termites, and Traffic Jams: An Exploration in Massively Parallel Microworlds.

www.thepangburns.com/jesse/projects/ant_simulation.htm - Another StarLogo ant simulator, by Jesse Pangburn.

www.papert.org/ - Seymour Papert. Links to the LCSI company for constructivist educational technology using Logo microworlds, the Logo Foundation, LEGO Mindstorms, and many papers on topics such as school reform and how computers help children learn.

Mathematics, Creativity, and Strong Black Coffee

For fifteen days I strove to prove that there could not be any functions like those I have since called Fuchsian functions. I was then very ignorant; every day I seated myself at my work table, stayed an hour or two, tried a great number of combinations and reached no results. One evening, contrary to my custom, I drank black coffee and could not sleep. Ideas rose in crowds; I felt them collide until pairs interlocked, so to speak, making a stable combination. By the next morning I had established the existence of a class of Fuchsian functions, those which come from the hypergeometric series; I had only to write out the results, which took but a few hours. ...

... Permit me a rough comparison. Figure the future elements of our combinations [full-fledged ideas] as something like the hooked atoms of Epicurus. During the complete repose of the mind, these atoms are motionless, they are, so to speak, hooked to the wall; so this complete rest may be indefinitely prolonged without the atoms meeting, and consequently without any combination between them.

On the other hand, during a period of apparent rest and unconscious work, certain of them are detached from the wall and put in motion. They flash in every direction through the space (I was about to say the room) where they are enclosed, as would, for example, a swarm of gnats or, if you prefer a more learned comparison, like the molecules of gas in the kinematic theory of gases. Then their mutual impacts may produce new combinations...

Now our will did not choose them at random; it pursued a perfectly determined aim. The modified atoms are therefore not any atoms whatsoever; they are those from which we might reasonably expect the desired solution. Then the mobilised atoms undergo impacts which make them enter into combinations among themselves or with other atoms at rest which they struck against in their course. Again I beg pardon, my comparison is very rough, but I scarcely know how otherwise to make my thoughts understood.

By Henri Poincaré; quoted in The Psychology of Invention in the Mathematical Field by Jaques Hadamard.

Computational Creativity I - solving analogy problems with Copycat

Strange though it may seem, nondeliberate yet nonaccidental slippage permeates our mental processes, and is the very crux of fluid thought.

In the early 1980s, Douglas Hofstadter took over the Mathematical Games column of Scientific American, and wrote a number of essays under the heading Metamagical Themas, later collecting them into a book published under the same name. He covered many topics, from the arbitrariness of the genetic code to the threat of nuclear war; for me, the most interesting are about creativity, where he mentions ideas which he and colleagues at the Fluid Analogy Research Group went on to implement in their programs.

The quote above is from one of Hofstadter's essays, Variations on a Theme as the Crux of Creativity. Scott Bolland has written a nifty applet reimplementation of Hofstadter and Melanie Mitchell's Copycat program; it's easy to run, and I'd like to use it to introduce you to some of these ideas.

Conceptual slippage

Consider the following questions:

What is to ijk as abd is to abc?
What is to iijjkk as abd is to abc?
What is to xyz as abd is to abc?

These are all simple letter-string problems. To the first, you'd probably say the answer is ijl, since the c in abc has gone to a d, and to "do the same thing" to the k, we move it one letter forward too, making it an l.

The second problem is less straightforward, because iijjkk has no immediately obvious mapping onto abc, making it harder to see to which part of it we should "do the same thing" to as we did to the c. One possibility is to map the final k onto the c and move it one letter forward, giving us iijjkl. However, it seems more elegant to map two letters each of one sequence onto one of the other, making the final kk correspond to the d. Then we move both letters forward, giving the answer iijjll. This is what Hofstadter calls conceptual slippage. The kk isn't actually the same kind of thing as the c, and we have to "slip" our notion of it to perceive it as such.

People give a variety of answers to the third problem. One I, as well as Hofstadter, find particularly elegant, is wyz. Here, the a has been mapped to the z and the c has been mapped to the x, so that one sequence is reversed with respect to the other. Then, to preserve symmetry, the relation of alphabetic successor between c and d is also reversed, so that to find the answer to the problem, "doing the same thing" to x takes its alphabetic predecessor, converting it to w. The conceptual slippage here applies more to the relation between letters than to the letters themselves as it did in the previous example.

Why is analogical thought so important?

These letter-string puzzles are fun, but have they really anything to do with the heavy-duty creative thought practised by an Einstein, a Shakespeare, or a Turing? To try and justify Hofstadter's assertion that they do, I created the following analogy problems:

What is to Prolog (or logic) as classes are to Java?
What is to reasoning by association as Boolean algebra is to reasoning by deduction?
What is to liquid methane as Terrestrial biochemistry is to water?
What is to integers as union is to sets?
What is to 5-8 as 1 is to 5-4?
What is to 10÷3 as 2 is to 10÷5?
What is to -9 as 3 (and -3) are to 9?
What is to groups as primes are to integers?
What is to homosexual love as marriage is to heterosexual?
What is to "mimsy" (or "wabe", or "borogroves", ...) as German is to English?
What is to line drawing as minor chords are to music?

Two in computing; one (inspired by the Huygens probe) in xenobiology; five in mathematics (one solved by Lenat's AM program, four well-known in the history of the number system, one finally resolved 22 years ago by by group theorists); one for law and religion; one (translation of Carroll's Jabberwocky into Der Jammerwoch) in literary translation; and one for visual art (answered by Paul Klee's Pedagogical Sketchbook). So we can certainly couch a lot of problems as analogies. Whether an extension of the mechanisms Hofstadter proposes is essential to solving them, and how much else is necessary, I don't know. But it's interesting and perhaps important, and that's why I'm introducing Copycat to you.

Incidentally, Hofstadter also proposes, more prosaically, that perceiving the right analogy is essential for survival. Consider someone who argues that killing a person is the same as breaking a window because both are nasty and can be done with a brick: such reasoning is somehow missing the essence of the situation. Minds so faulty will be weeded out by natural selection, leaving behind those with a deeper perception.

How Copycat works

Paint the letters of which the problem is made onto little Velcro balls and put these into a jar. Shake hard. From time to time, two related letter-balls will collide. They may be related because one follows the other in the alphabet, or because one stood next to the other in the problem, or even because they are the same letter.

Now, imagine that as the letter-balls collide, they sometimes stick and form bonds, the strength of the bond depending on how strongly the letters are related. An i-j bond is stronger than an i-k bond. Both are much much stronger than an i-u bond, which, since the i and the u are just too distant in alphabetic order to be related, is almost nonexistent. The shaking will break bonds as well as make them. But the stronger bonds will persist for longer, forming relatively more stable combinations.

Occasionally also, these pairs will themselves bond together. An a-b may bond with another a-b through the force of identity. But - and this is the crux of conceptual slippage - it may also bond with a z-y. This happens because the a-b bond is an "alphabetic successor" bond, and the z-y bond is an "alphabetic predecessor" bond, and an "alphabetic successor" bond is related to an "alphabetic predecessor" bond by the force of oppositeness.

Now, we just watch the jar. As molecules form, we shake less vigorously, and then even less vigorously. The rate at which we cease to shake must be carefully chosen. Big molecules are almost complete solutions to the problem and we don't want to destroy them by shaking too violently. On the other hand, we don't want to stop too soon; if we do, the mixture will freeze before enough combinations have been tested.

My description above was inspired by Poincaré's quote on mathematical creativity. Hofstadter cites it in his essay, and it probably inspired some aspects of Copycat's implementation. In more technical terms, Copycat does a simulated annealing over structures formed by linking letters by their possible relationships. This cools at a rate dependent on the size and number of structures formed, and implements a parallel breadth-first search over the possible combinations of relationships between elements of the analogy problem. Hofstadter calls this a "parallel terraced scan".

Incidentally, my point about the difference between an i-j bond versus an i-u bond reflects a distinction Hofstadter draws between "roles" and "literals". In the "What is to ijk as abd is to abc" problem, the c and d can be perceived as related. It therefore makes sense to imitate this relation - "do the same thing" - to the k. But if the problem had been "What is to ijk as abx is to abc", then the c and x would be just too far apart. A sensible answer would regard the x as fixed - a literal - rather than filling a role, that of alphabetic successor. So we would make the answer ijx. Shallow minds that miss the essence of a problem may do so because they see literals when they ought to see roles.

I highly recommend reading Analogies and Roles in Human and Machine Thinking, another essay from Metamagical Themas, to see theses ideas, including the distinction between roles and literals, developed further. Copycat is a highly subtle attempt to model how we balance aesthetic forces such as symmetry, uniformity, good substructure, and boundary strength. Try designing a typeface, drawing a comic strip, or whatever your favourite artistry is; feel them; and compare with the little letter problems.

Running Copycat

Bolland's Copycat is an applet, so needs no messy installation, just a Java-enabled Web browser. Point your browser at the applet, www.psy.uq.edu.au/CogPsych/Copycat. You should see a page starting "Copycat" and "Please wait while the Copycat applet loads". A grey applet rectangle will shortly appear above them, followed by a separate pop-up containing the applet's user interface. This is initially blank, before changing to display its controls. On a fast University line, this took my system less than a minute: very good for a download from Queensland to Oxford, half-way round the world.

Once the applet is running, you may want to expand the window, which will also expand the contents, making it easier to read. Be warned that when I changed my browser to another page, the applet window went blank. Returning to the applet URL didn't restore it, and I had to close my browser, then open a new one and start right from the beginning.

Demonstration 1: a simple analogy problem

I'm always sceptical when running new software; so many things can go wrong. So let's see straight away whether this one will work on an analogy problem. Click on the menu bar to display the "File" menu, and select "New". This will pop up a little "New problem" window, with three text fields, for "Initial string", "Modified string", and "Target string". The fields should contain the strings abc, abd, and ijk. Click the "OK" button to enter this as the current problem.

The window will disappear, and your strings will reappear in the main window, under the heading "Workspace". This holds the problem that Copycat is going to think about next, and is roughly equivalent to human long-term memory. We now tell Copycat to begin. The grey bar to the right of the workspace contains four control symbols, of which the second is a right-pointing triangle. This stands for "Play". Click it, and Copycat will start its problem solving. Within a few moments, the solution will appear in the bottom-right box in the Workspace. Copycat is non-determininistic, which means that it can come up with different answers from run to run. However, in this case, the answer is very likely to be ijl. So Copycat has decided that it is ijl which is to as ijk as abc is to abd.

Once Copycat has displayed a solution, doing another run is easy. Click the leftmost of the four buttons, the square, to reset. A question mark should replace the previous answer. Then press the "Play" triangle again, and wait as before.

Instead of pressing "Play", you can single-step through a problem by repeatedly clicking the right-hand button, a triangle with a bar. Finally, pressing the third button, "||", interrupts Copycat's thoughts. Press Play or Single-Step to continue.

As Copycat runs, you will see the reading on the big red thermometer gradually decrease. This corresponds to the amount of shaking in my how Copycat works analogy, i.e the temperature in the simulated annealing. In the main window, you see the letter strings, with links forming and re-forming between them. These links are the "bonds" or relationships which Copycat currently perceives. You will also see a counter for "codelets". These are pattern recognisers which look for the relationships - the equivalent of bumping letter-balls together in the jar.

Demonstration 2: group runs and non-determinism

I said above that Copycat may think up different answers to the same problem in different runs. We can investigate this non-determinism by using group runs. Click "Run" on the menu bar, and select "Group Run". This will pop up a "New group run" window, which will invite you to type the name of the run. Any name will do - it's just an arbitrary identifier. My window had to be expanded vertically before I could see the field that holds the name.

Having typed a name, click OK, and select New from the File menu as we did above. Enter a problem. A good thing to try is to leave the initial string at abc and the modified string at abd, but to make the target string be xyz, as in my section on conceptual slippage.

After entering the problem, click OK in the problem window. This time, the main "Workspace" window will remain blank, except for four control buttons at its top right, to the left of the original four buttons we have already used. Press the second, which is another "Play" button. You have initiated a whole group of runs, and what appears this time is a bar chart whose bars grow to depict the frequency of each answer Copycat has found. How well does it manage conceptual slippage in the "What is to xyz as abd is to abc" problem?

If you got this far, you've managed to run Copycat, and I hope you're intrigued by these ideas. I am therefore going, abruptly, to stop. As well as his applet, Bolland has written an excellent tutorial on Copycat, which it would be silly for me to duplicate. For other reading, see the links below, and the essays in Metamagical Themas. And don't underestimate the subject's importance. As Hofstadter says at the end of Analogies and Roles in Human and Machine Thinking:

I feel confident that this tiny alphabetic world allows all of the key features of analogy to make their appearences. In fact I would go further and claim: Not only does the Copycat domain allow all the central features of analogy to emerge, but they emerge in a more crystal-clear way than in any other domain I've yet come across, precisely because of its stripped-down-ness. Paradoxically, Copycat's conceptual richness and beauty emanate directly from its apparent impoverishedness, just as the richness of the "ideal gas" metaphor emanates from its absolute simplicity. Time will tell if this limb I am out on is solid.

Links

www.psy.uq.edu.au/CogPsych/Copycat - Scott Bolland's Copycat applet. His tutorial, which assumes no previous knowledge, is at www2.psy.uq.edu.au/CogPsych/Copycat/Tutorial/.

www.itee.uq.edu.au/~cogs2010/Schedule2003.html - Bolland's timetable for a Queensland University course on models in cognitive science. The entry for Week 10 links to practical notes on Copycat, including exercises.

www.itee.uq.edu.au/events/seminars/archive/sem-0015.html - Using Copycat for context-dependent visual object recognition. Abstract of Bolland's Ph.D confirmation seminar on his object-recognition program VICOR, which uses analogical matching to recognise the same object in apparently unrelated visual data.

www.cigital.com/~gem/lspirit.html - The Letter Spirit project. See also the entry on Science, Simplification, and Special Somersaults at www.j-paine.org/ainewsletter/jan2005.html#s in the January Newsletter.

www.cs.pomona.edu/~marshall/metacat/ - James Marshall's Metacat Home Page, with links to his thesis and source code, and instructions on running the code. Metacat extends Copycat by being able to introspect, recognising patterns in its problem-solving processes as well as in its perceptions.

www.ulg.ac.be/cogsci/rfrench/elephants.pdf - When coffee cups are like old elephants or Why representation modules don't make sense. Preprint of Robert French's paper from Proceedings of the 1997 International Conference on New Trends in Cognitive Science, edited by A Riegler and M Peschl, Austrian Society for Cognitive Science. French argues that we can't separate the creation of perceptual representations from their use, and hence for programs and cognitive models that intertwine them as Copycat does.

www.ulg.ac.be/cogsci/rfrench/analogy.tics.pdf - The Computational Modeling of Analogy-Making. Preprint of French's paper from Trends in Cognitive Sciences, Volume 6, Number 5, 2002. A survey of previous programs and models of analogical reasoning.

www.aaai.org/AITopics/html/analogy.html - AAAI analogical reasoning page.

Mathematics and alchemy

Mathematicians often cultivate a certain smug attitude of superiority over the experimental sciences. There is, after all, a higher form of truth in mathematics which is impervious to the whims of time. A theorem proved 500 years ago is still true today, and while such theorems may be elementary, we often find them useful (and sometimes we even find them interesting). We still teach Newton to our undergraduates; a chemist who reached that far back would be teaching alchemy.

Yet mathematicians are rather miserable scientists; they have no sense of empirical classification or experimental control. In technique, mathematics and alchemy are virtually indistinguishable. Consider the two. The alchemists mixed together whatever was on hand and then observed the results which were, by the way, often quite ordinary. Of course they could repeat the process with the new products (again and again), but the further such experiments proceeded the more confused and uninteresting was the product. Their experiments were largely uncontrolled meanderings which only rarely proveded a glimmer of revelation. And modern mathematics? The alchemists would feel right at home.

From Mathematics and Alchemy, editorial by John Ewing in The Mathematical Intelligencer Volume 3, Number 4, 1981.

Computational Creativity II - machine learning with Inductive Logic Programming and Aleph

Pervez Musharraf could one day be taking advice from an Induced Logic Program. Inductive Logic Programming is a form of symbolic machine learning which learns, from examples expressed in Prolog or similar logic-based notation, logical rules stating what the examples have in common. Oxford University Computing Lab is one of many centres researching into ILP, and its Web site lists an abundance of applications. Amongst these - which include learning rules to identify over-performing stocks, diabetics at risk from kidney damage, the best embryos for transfer in in vitro fertilisation, and the best mesh-resolution for finite-element mesh designs - is Jamal Abdul-Nasir's work on classifying questions posed to the Speaker of the National Assembly of Pakistan.

Every day, about 100 parliamentary questions are submitted to the Questions Branch of the National Assembly, in the hope they will be put to ministers during Question Hour. Some questions are not permitted - such as those asking for publicly-available information that members could find out elsewhere - so the clerks of the Questions Branch have to sort the permissible from the impermissible. This takes about 5 minutes per question. That's not a lot, but because of the number of questions, any computational assistance would be welcome. Abdul-Nasir's research used ILP to learn permissible-vs-impermissible rules from a database of previously-classified questions, and concluded that this could save a substantial amount of time.

I can't imagine Tony Blair accepting advice from a computer program; but this example, and the others listed above, show how versatile ILP can be. Because of this versatility, the human-readability of its output, and its ability to handle complicated structures such as molecules and documents, ILP deserves to be better known. Several free implementations now exist, and I'm going to use one of these to work through some examples that you can try for yourself.

ILP, readable rules, and drug discovery

Before I launch into the downloads and demonstrations, I want to show you one great advantage of ILP, the readability of its rules. I'm going to illustrate this with an example from drug design. First, a bit about the subject.

Pharmacology - the study of drugs and their effects - is difficult. We may know that some compound lowers blood pressure, induces anaesthesia, or lifts depression, but have little idea why. Perhaps its effect was first discovered by accident, as legend has it happened with coffee when, one day, Kaldi the shepherd noticed his goats jumping around in the field. After eating the red berries from the shiny dark-leaved shrub the goats had been nibbling, Kaldi was jumping around too, and the rest is history: London coffee houses; Sartre and Hemingway scribbling away in Les Deux Magots; take-away cappuccino ventis on every street corner; and me bashing away at this feature in the Excelsior, Oxford.

To explain such effects, pharmacologists invented the "pharmacophore". An often-used analogy is that of lock and key. Designed drugs, and other active molecules such as caffeine, affect enzymes by slotting into regions of a particular size and shape, like a key fitting into a keyhole. It's only the key's teeth whose shape is important; it doesn't matter whether the head is oval or hexagonal, or threaded onto a keyring. Similarly, only a small part of the "ligand" - the drug or other molecule - interacts with the enzyme. The rest is scaffolding. The part that does interact needs to be a particular shape (though shape is a fuzzy concept when you're as small as an atom), as well as a particular distribution of electrons and charge. These properties are the pharmacophore. The pharmacophore for an enzyme determine the ligands that bind to it, as the tumblers in my bedroom lock determine the keys that open it.

One class of drugs very important for treating high blood pressure is the ACE-inhibitors. These block the action of Angiotensin-Converting Enzyme or ACE, which generates a protein that constricts blood vessels and raises blood pressure. It's not surprising that, in their quest for better treatments, pharmacologists have tried to determine the pharmacophore of the ACE-inhibitors.

And so has ILP. There is a paper, Pharmacophore Discovery using the Inductive Logic Programming System PROGOL, which describe a blindfold trial where ILP was applied to ACE-inhibitors for which earlier experimenters (not using machine learning) had already worked out a pharmacophore. The paper is, incidentally, a good introduction to both ILP and drug discovery, requiring little knowledge of either. As the authors say, this trial was nearly as straightforward as "(1) run Progol over the earlier experimenters' data and assumptions; (2) compare the output with their results". It was a success. Progol did rediscover the pharmacophore.

Moreover, its rediscovery was easy to read. Progol represented its rule internally as a Prolog clause. But it can translate these into a stylised English. Doing so here gave this output:

  Molecule A is an ACE inhibitor if: 
    molecule A can bind to zinc at a site B, and 
    molecule A contains a hydrogen acceptor C, and 
    the distance between B and C is 7.9 +/� 1.0 Angstroms, and 
    molecule A contains a hydrogen acceptor D, and 
    the distance between B and D is 8.5 +/� 1.0 Angstroms, and 
    the distance between C and D is 2.1 +/� 1.0 Angstroms, and 
    molecule A contains a hydrogen acceptor E, and 
    the distance between B and E is 4.9 +/� 1.0 Angstroms, and 
    the distance between C and E is 3.1 +/� 1.0 Angstroms, and 
    the distance between D and E is 3.8 +/� 1.0 Angstroms.

Other applications

I have a fondness for ILP in pharmacology, through working on this myself with one of the authors of the paper above. However, that's not the only reason I chose it to illustrate ILP. Molecules are messy, complex, structured. They branch, fold and tangle in three dimensions. They are built from atoms which we must specify geometrically (size), chemically (is it a hydrogen acceptor?), and in relation to one another (bonding). Even without the rest, the geometry can be daunting: those who work on protein folding know that you have to think about the sequence in which the amino-acid building blocks are connected (primary structure), and the way these sequences curve to form helices and strands (secondary structure), and the way these helices and strands then coil and loop to make the entire protein (tertiary structure). If ILP can cope with all this, surely little else can frighten it.

As the Oxford University applications page with which I introduced ILP shows, ILP is indeed not limited to pharmacology: its applications range from medical screening to learning about stocks and shares. Others will be found via my links below.

I should give two warnings. First, ILP still has shortcomings. Some of these are discussed in the recent paper ILP: A Short Look Back and a Longer Look Forward.

Second, you must know some theory to use ILP correctly; otherwise, its results may be subtly wrong without you realising. Drug designers prefer large pharmacophores to small, because they indicate that the molecules being tested have more in common, a result which is less likely to occur by chance. In ILP terms, this means the designers would prefer a large rule to a small one. ILP programs, however, prefer short rules to large. This is because they're simpler and thus more interesting as theories of what distinguishes examples of a concept from non-examples. (Einstein would not have been so popular had e equalled 0.0000000513 + mc^2.000056). The authors of our ACE paper had to correct for this mismatch; and had to know how ILP worked in order to do so.

Installing and running Prolog

I'm now going to start the demonstrations. The first step is to obtain a suitable Prolog. Aleph will run under SWI and YAP Prologs; I am using SWI, because that's my Prolog of choice anyway. Prolog experts may wish to try porting; otherwise, follow my instructions, and you should be able to run the demos even if you don't know Prolog. So go to Jan Wielemaker's SWI Prolog site at www.swi-prolog.org/, and then to the "Download" link in the left-hand column. This brings you to the "SWI-Prolog downloads" page; go to the "Development release" link and choose one of the top three downloads, depending on your operating system. On mine, Windows XP, the self-installing executable installs with no trouble: just press "Accept", "Next" or "Finish" at the appropriate questions, as you would when installing any other program.

If you've not used Prolog before, a bit of practice will help. I shall give instructions in Windowese; adapt them to your system as appropriate. Installing SWI should have left a Prolog icon on your desktop. Double-click it. This will bring up a white Prolog window displaying something like:

  Welcome to SWI-Prolog (Multi-threaded, Version 5.5.3)
  Copyright (c) 1990-2005 University of Amsterdam.
  [etc]
  For help, use ?- help(Topic). or ?- apropos(Word).

  1 ?-

Click in the window to get the cursor into it, and type the query

  1<2.

after the ?- prompt. Don't omit the dot at the end, otherwise Prolog won't know the query is complete. Hit RETURN. Prolog should reply

  Yes
  2 ?-

Prolog has interpreted your input as a query about the truth of the statement "1<2". Since the statement is true, it replied with a Yes.

Now type the query

  1>2.

You should see output similar to the above, but with a No instead of a Yes, indicating that the statement is false.

If you got this far, your Prolog is working. Should you wish to exit, you can close the window, but you may as well leave it awaiting the first demonstration. Before returning to the demo, I shall explain how to download the Aleph ILP program and the data to be learnt about, and then show what the data means.

Installing Aleph

Now you need Aleph. Aim your browser at Ashwin Srinivasan's Aleph site, web.comlab.ox.ac.uk/oucl/research/areas/machlearn/Aleph/. This will get you the Aleph manual.

Aleph itself lives at www.comlab.ox.ac.uk/oucl/research/areas/machlearn/Aleph/aleph.pl, which is linked from the "How to obtain Aleph" section of the manual. Download and save it, which should give you an aleph.pl file. Don't accept if your browser offers to execute the file: it's probably been set up to think the .pl extension denotes Perl.

The example datasets live at www.comlab.ox.ac.uk/oucl/research/areas/machlearn/Aleph/misc/examples.zip. This zip file contains a top level examples directory, with under it a number of subdirectories and a README with a one-line description of each. Each subdirectory contains .b, .f and .n files, the inputs to Aleph. There is usually also a "typescript" file showing a run, and possibly other files containing more information. In the paths I specify to Prolog, I shall assume you've unzipped the file so that its examples directory lives in the same directory in which you've put Aleph.

Ryszard Michalski's train challenge to machine learning

The first demonstration is a Prolog version of Ryszard Michalski's notorious trains challenge, posed to machine learning in 1974. You are given data describing ten trains, and told that five are eastbound and five are westbound. The challenge is to induce, from descriptions of the trains, a rule that distinguishes those in the "westbound" set from those in the "eastbound". These labels are arbitrary, by the way - there are no destination indicators on the trains, or anything else connected with directions, and the challenge would work as well if you were told that five trains were squamous and five were rugose. The point is just to find a rule that passes one set of trains and fails the others.

The reason this was such a challenge is that the trains are fairly complex structured objects. They consist of carriages - different trains have different numbers of carriages - which can be short or long, have open or closed roofs and differing numbers of wheels, and carry stylised loads which may be rectangles, triangles, or other shapes. The learning program must, therefore, be able to represent this structure and work with it when constructing its generalisations. At the time, such "relational" learning was a young subject, research in which had not progressed further than Patrick Winston's arch-learning program, which learnt from examples of simple geometric structures consisting of only three or four components each.

Describing trains in Prolog

Let's now see how the trains look in Prolog. Examine the train.f file in the trains subdirectory of the example datasets. The task which these datasets pose is to find a rule which is true of all the eastbound trains and false of all the westbound. That is, we want Aleph to find a concept of which all the eastbound trains are positive examples and all the westbound are negative examples. The positive examples are in train.f.

The first line of the file says that the predicate eastbound is true of a train called east1. Said more simply, train east1 is eastbound. Similarly, the second line tells us that train east2 is eastbound. And so on:

  eastbound(east1).
  eastbound(east2).
  eastbound(east3).
  eastbound(east4).
  eastbound(east5).

This is all very well, but so far, we only know these trains' names, but nothing of their make-up.This is described in the train.b file. Starting at line 58, we see:

  short(car_12).	
  closed(car_12).	
  long(car_11).	
  long(car_13).
  short(car_14).
  open_car(car_11).
  open_car(car_13).
  open_car(car_14).
  shape(car_11,rectangle).
  shape(car_12,rectangle).
  shape(car_13,rectangle).
  shape(car_14,rectangle).
  load(car_11,rectangle,3). 
  load(car_12,triangle,1).
  load(car_13,hexagon,1).
  load(car_14,circle,1).
  wheels(car_11,2).	  
  wheels(car_12,2).
  wheels(car_13,3).
  wheels(car_14,2).
  has_car(east1,car_11).
  has_car(east1,car_12).
  has_car(east1,car_13).
  has_car(east1,car_14).

These are facts about parts of trains. For example, the first line shown above says that the predicate short is true of car_12 - that is, carriage 12 is short. Similarly, carriage 12 is closed. Carriage 11 is long. And so on.

The predicates with two or more arguments, such as shape and load, work the same way. Carriage 11 has a rectangular shape. Carriage 12 carries 1 hexagon. Carriage 14 has 2 wheels.

The final four lines above associate a train with its carriages. Although we've told Prolog lots of stuff about the carriages' properties, we haven't told it which train these belong to. So the final four lines do this, saying that the train east1 has carriages car_11 to car_14; and similarly for the other trains.

I haven't yet mentioned the trains.n file. This contains "negative examples", i.e. descriptions of trains which are not eastbound. It's always easier to learn the boundaries of a concept if you have examples that lie outside them - as when little Johnny opens his copy of the Three Little Pigs and says "Look Mummy! Nice doggie!", pointing at a toothily grinning wolf. ILP has developed methods for doing without negative examples, but my two demonstrations will use them.

Demonstration 1: learning about the structure of trains

Let's now return to Prolog, which we left at the end of the section on Installing and running Prolog. The example datasets are nicely set up, and running them is easy.

Start Prolog again if you exited it. Then change its current directory to wherever you put Aleph. Do so by typing

  cd('d:/aleph').

but with the appropriate path for your files. The quotes round the path are single quotes, and the directory separators are forward slashes, even in Windows. Remember the final dot.

Then give the command

  [aleph].

(The square brackets round the filename are an idiosyncratic abbreviation for the file-loading command.) Your Prolog should reply something along these lines:

  3 ?- cd('d:/aleph').

  Yes
  4 ?- [aleph].
  %   library(pce) loaded into pce 0.05 sec, 170,024 bytes
  %  library(broadcast) compiled into broadcast 0.06 sec, 197,144 bytes
  %  library(time) compiled into time 0.00 sec, 2,956 bytes

  A L E P H
  Version 5
  Last modified: Sun Oct 10 06:59:50 BST 2004

  Manual: http://www.comlab.ox.ac.uk/oucl/groups/machlearn/Aleph/index.html

  % aleph compiled 0.26 sec, 690,528 bytes

  Yes
  5 ?-

Now you can run Aleph. Assuming that your examples are under the Aleph directory, change to their trainssubdirectory; then tell Aleph to read the files and learn!

  cd('examples/trains').
  read_all(train).
  induce.

The calls to read_all and induce generate a lot of output, ending with a list of the trial rules Aleph generated, the best rule (i.e. the smallest that describes all the positive examples but none of the negative), and a summary of the number of examples that passed or failed.

This is what you should see:

  eastbound(A) :-
    has_car(A, B), short(B), closed(B).

This translates into English as:

  For any train A, A is eastbound if
    A has a carriage B, and
    B is short, and
    B is closed.

In Prolog, names starting with capital letters are logical variables, which can stand for anything, as long as it's the same thing each time the variable occurs in a single rule. The ":-" means "if", and the comma means "and". My translation demonstrates how this works, but if you're going to use ILP seriously, you'll probably need to learn some Prolog.

Describing molecules in Prolog

To see how we could describe molecules to ILP, it's only necessary to think of them as trains. Instead of describing the properties of carriages, the facts could equally well be about the properties of atoms. Add new facts to describe the couplings - i.e. whether the atoms are singly, doubly or triply bonded - and allow the train to loop back on itself, and you have molecules. The classification into eastbound versus westbound can, in the simplest cases, be replaced by biologically-active versus biologically-inactive, where we know from experimental data which class each molecule goes in. The links at the end point to more information on this and other applications.

Demonstration 2: learning recursive programs

The name behind the acronym ILP is "Inductive Logic Programming". Not "Learning", not "Derivation", but "Programming". That's because the rules ILP learns are, in fact, programs. Aleph's eastbound trains rule is a train-recognising program. What I am shall now do is make Aleph learn a list-handling program. Even better, it will be a recursive list-handling program.

Go to the recursion subdirectory of the Aleph datasets, and examine mem.f. From its extension, it contains the positive examples. Some of these are:

  mem(1,[1]).
  mem(0,[0,0]).
  mem(0,[0,1]).
  mem(2,[2,3]).
  mem(3,[2,3]).
  mem(3,[4,2,3]).

Like many other languages, Prolog has lists - structures for representing sequences - built in. It writes them inside square brackets. Thus [1] is a list with the single element 1, and [4,2,3] is a list of three elements. Given this notation, we can interpret the facts as examples of list membership: in each, the first argument is a member of the list given as second argument.

Now compare mem.n, the negative examples:

  mem(0,[1,2]).
  mem(0,[1]).
  mem(3,[]).
  mem(2,[3]).

If we continue to interpret the predicate mem as list membership, then all these facts are clearly false, since the first argument does not occur in the second.

To run ILP over these, change Prolog's directory to the same recursion subdirectory. The following should work, if you left Prolog running after the trains demo:

  cd('../recursion').

Then type:

  read_all(mem).
  induce.

The output will be similar to that for the trains: a short "I've read the files" message from read_all, and then a long sequence of trial rules from induce, followed by the one it considers best:

  [theory]

  [Rule 1] [Pos cover = 12 Neg cover = 0]
  mem(A, B) :-
     B=[A|C].

  [Rule 2] [Pos cover = 10 Neg cover = 0]
  mem(A, B) :-
     B=[C|D], mem(A, D).

In this, the notation [ | ] is Prolog's way of splitting a list into its first element and the rest (its tail).

This time, Aleph needed two rules to describe the examples. Together, these form a recursive definition of list membership, the first being the base case and the second the recursive step:

  For any item A and list B, A is a member of B if 
    B is equal to A followed by something else, C.

  For any item A and list B, A is a member of B if
    B is equal to something followed by some list which we'll call D, and
    A is a member of D.

So here, Aleph has learnt a recursive program.

Getting non-Prolog data into Aleph

Both the trains and the recursive-programs datasets were written in Prolog. However, although what goes into Aleph has to be Prolog, our raw data does not, as long as we can translate it. Since I hope you'll try Aleph on some real-world applications, I'll show how we might translate data from Web forms and from Excel into Prolog.

Suppose we want to train on data typed into Web input forms. Each field in a form has its own name - the HTML "NAME" attribute. When the form is submitted, the browser sends its contents to the server as a sequence of name-value pairs, which, using the utilities provided with Perl, Python and other scripting languages, can be extracted and dumped into a database or converted immediately to Prolog. Thus, data on a user's finances might be converted to

  age(u_173,50).
  sex(u_173,f).
  children(u_173,4).
  owns_credit_card(u_173,visa).
  owns_credit_card(u_173,amex).
  annual_income(u_173,55000).

where the age, sex, etc. come from corresponding input fields, and u_173 is a unique ID generated for that particular submission. Compare the trains, where east1 could be regarded as an ID for a train.

Now suppose instead that we want to train on data entered into Excel - a plausible choice for data entry because of its handy tabular layout and ease of use. Recent Excels can save spreadsheets as XML, which is easier to unscramble than a .xls file, and for which there exists a range of parsers. Thus the spreadsheet

age sex children credit cards annual income

50 f 4 visa,amex 55000

34 m 0 mastercard 32500

age	sex	children	credit cards	annual income
50	f	4	visa,amex	55000
34	m	0	mastercard	32500

could also be translated into the facts shown just above. I've experimented with this using my spreadsheet compilation and analysis tool Model Master to pull data from Excel XML, and it works pretty well. Sometimes, Excel's simpler comma-separated value format suffices, but XML means you needn't avoid commas (or other separators) in strings, and it gives more information about data types.

Where next?

I hope these demonstrations showed how easy it now is to use ILP. There are several other examples provided with Aleph, and I do recommend running them all and working right through the Aleph manual. This will give you a good feel for the kinds of data ILP can learn from and the kinds of rule it can learn, as well as showing how many variations exist on the ILP algorithms. Information on how ILP works and on its applications can be found via the links.

If you want to use ILP on a real-world problem, read "On the appropriateness of Aleph" in the manual. There are many ILP programs; Aleph is easy to install and run, and flexible enough to implement a number of learning methods, but this flexibility makes it slower than more specialised programs. Consider also whether ILP itself is best for you. Srinivasan recommends asking whether other techniques such as statistical programs, genetic algorithms, neural networks, Bayesian nets, or tree-learners would do instead. Each has its own pros and cons - speed, storage, number and kind of training data, intelligibility of learnt knowledge, difficulty of understanding the technique - and ILP isn't always the best trade-off. But sometimes it is; and its successes on biological molecules, with all their structural complexity, are a compelling demonstration of its power.

Links

www.swi-prolog.org/ - SWI Prolog.

www.ncc.up.pt/~vsc/Yap/ - Yap Prolog.

web.comlab.ox.ac.uk/oucl/research/areas/machlearn/Aleph/ - Ashwin Srinivasan's Aleph manual at Oxford University Computing Lab. This is the contents page, and links to the rest of the manual. The Aleph download is at www.comlab.ox.ac.uk/oucl/research/areas/machlearn/Aleph/aleph.pl, and the examples are at www.comlab.ox.ac.uk/oucl/research/areas/machlearn/Aleph/misc/examples.zip.

www.cs.wisc.edu/~dpage/cs731/mlj_page.ps - Pharmacophore Discovery using the Inductive Logic Programming System PROGOL. Paper by P Finn, S Muggleton, D Page, and A Srinivasan, from Special issue of Machine Learning on Applications and the Knowledge Discovery Process, Kohavi and Provost (Guest Editors), Volume 30, Numbers 1 and 2, 1998.

www.doc.ic.ac.uk/~shm/progol.html - The site for Progol, including downloads. As with Aleph, contact the author if you want to use it commercially.

capra.iespana.es/capra/ingles/cafe/cafe.htm - A Spanish goat site's story of The coffee and the goats. According to an article by Canadian chemistry professor and writer Joe Schwarcz, spent coffee grounds are excellent for removing the smell of elephant urine. That article is no longer on the Web, but - returning to pharmacology - Joe's archives at www.tvo.org/yourhealth/joeschwarcz.html describe the effects of an assortment of plants, from cabbage and cranberries to chamomile and kava kava. Drug design sometimes begins from observations of such effects.

www.newdrugdesign.com/Rachel_Theory_02.html - The biochemistry of drugs, from the RACHEL Software Innovations site. Introduction to pharmacophores and the way drugs bind to biological molecules.

www.itakura.toyo.ac.jp/~chiekon/papers/wflp01.pdf - Inducing Differences among Documents using Aleph with Construction of Background Knowledge. Paper by Chieko Nakabasami describing preliminary research on Aleph and case-based reasoning for learning to extract information about the content of scientific papers.

www.cs.wisc.edu/~dpage/ - David Page. Contains links to his work on data mining and machine learning (especially ILP and other techniques for learning about complicated structures) applied to bioinformatics, chemoinformatics, and health sciences.

www.doc.ic.ac.uk/~shm/cbl.html - Imperial College Computational Bioinformatics Laboratory. Has an overview for non-experts of ILP-based knowledge discovery for pharmaceutical research, including diagrams and a sample hypothesis for protein folding. Also describes Progol's impressive performance in the National Toxicology Program's carcinogenicity prediction competition.

jmlr.csail.mit.edu/papers/volume4/page03a/page03a.pdf - ILP: A Short Look Back and a Longer Look Forward. Paper by Page and Srinivasan from Journal of Machine Learning Research, Volume 4, 2003. Looks at the most challenging applications of ILP - bioinformatics and natural language - and how its remaining shortcomings might be overcome.

web.comlab.ox.ac.uk/oucl/research/areas/machlearn/applications.html - Oxford University Computing Lab pages on applications of ILP.

www-users.cs.york.ac.uk/~jc/teaching/GSLT/ilp-goth.html - James Cussens's Day Course on inductive logic programming and natural-language processing for the Graduate School in Language Technology, Göteborg, with practicals on various examples with Aleph. Solutions to exercises can be found at www.sics.se/~fredriko/courses/ml03/. Other ILP links are available from James's home page at www-users.cs.york.ac.uk/~jc/.

web.comlab.ox.ac.uk/oucl/research/areas/machlearn/Aleph/misc/basic.html - Basic Ideas Relevant to ILP by Ashwin Srinivasan.

www.sciencemag.org/feature/data/compsci/machine_learning.shtml - David Aha's very comprehensive Machine Learning Resources page. Includes ILP resource sites, datasets for experimenting with machine learning, and links for genetic algorithms and neural nets.

robotics.stanford.edu/people/nilsson/mlbook.html - Online draft of Nils Nilsson's book on Machine Learning. Written in 1996, the notes are becoming outdated - there have been many advances - but are still a good introduction to the important ideas.

www.aaai.org/AITopics/html/machine.html - AAAI machine learning page.

www.aaai.org/Library/Magazine/Vol04/04-03/Papers/AIMag04-03-007.pdf - Machine Learning: A Historical and Methodological Analysis by Jaime Carbonell, Ryszard Michalski and Tom Mitchell. For historical perspective, this shows where machine learning was 20 years ago.

www.rci.rutgers.edu/~cfs/472_html/Learn/LearningToc.html - An Early Machine Learning approach to Concept Learning. Charles Schmidt's explanation of WInston's arch-learning program.

[ Jocelyn Ireson-Paine's Home Page ]