I'm Jocelyn. I'm a software consultant, and I've worked for the University. I used to teach artificial intelligence to undergraduates at Experimental Psychology, and wrote numerous Prolog and Pop-11 programs to help students with AI. One of my programs implemented virtual creatures which could understand simple English commands and plan how to obey them. I helped Joseph Goguen and his students in the Computing Lab, writing Lisp to implement a programming language called Eqlog. And, since 2008, I've worked with Andrew Moore of the Oxford Pain Research Group, using R and Weka to analyse trials of pain-relief drugs.
In the list below, I've written about some of the work that most interests me and that I have the most experience in.
If you're used to "imperative" languages such as Fortran, Java and R, Prolog is unusual. It's a "logic programming" language, meaning that you program by writing a database of logical facts. To run a program, you give Prolog a "goal": one or more statements that you'd like it to try to prove from those facts. A standard example is for the facts to be "all humans are mortal" and "Socrates is a human", and for the goal to be "Prove Socrates is mortal". Prolog will search its database for facts that chain together to prove the goal.
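To make the idea concrete, here's a toy sketch in Python — not Prolog, and nothing like a real Prolog engine — of proving a goal from a fact database, using the Socrates example above:

```python
# A minimal sketch of Prolog-style inference in Python: facts are tuples,
# and there is one rule, "X is mortal if X is human".
facts = {("human", "socrates")}
rules = [(("mortal", "X"), [("human", "X")])]  # head :- body

def prove(goal):
    """Try to prove goal from the facts, chaining through the rules."""
    if goal in facts:
        return True
    pred, arg = goal
    for (head_pred, _), body in rules:
        if head_pred == pred:
            # Substitute the goal's argument for the rule's variable,
            # and try to prove every statement in the rule's body.
            if all(prove((body_pred, arg)) for body_pred, _ in body):
                return True
    return False

print(prove(("mortal", "socrates")))  # True: socrates is human, humans are mortal
print(prove(("mortal", "zeus")))      # False: no supporting fact
```

Real Prolog adds proper unification of variables, backtracking, and much more; the sketch only shows the chaining idea.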
Why use Prolog? It's useful for coding heterogeneous facts about things, in a way that's fairly easy for non-programmers to read. Molecules, for example: you can represent the positions of atoms, which element each is, what it's connected to, its bonding state, and what functional group it's part of. The things don't need to be scientific: one English teacher-turned-computer scientist used Prolog to represent events in William Faulkner's story "A Rose for Emily" and sort them into chronological order, making this confusingly flashback-laden story easier for students to understand. I have considered using Prolog to represent societies as treated by different sociological theories: for example, describing Lord of the Flies in Prolog in terms of functionalism, conflict theory, and symbolic interactionism. That ought to explain the differences and similarities between these theories much more clearly than the literature usually seems to. If you want to see more of Prolog, I blogged a tutorial for the online computer magazine Dr Dobbs at "The Prolog Lightbulb Joke".
As mentioned above, I wrote numerous Prolog programs when teaching AI. One of these implemented virtual creatures which could understand simple English commands and plan how to obey them. It included a natural-language parser, a semantic analyser, and a planner, and used their outputs to demonstrate concepts such as different kinds of representation, predicates, means-end planning, and the symbol-grounding problem. There's a writeup for students at "AI and PopBeast".
Whereas Prolog is a logic-programming language, Pyret is "functional". You think of your program as a big function that returns the result you want, and refine it by breaking this up into smaller functions. Much older than Pyret, and also a functional language (though with many other bits and pieces too), is Pop-11. I have used this in teaching students about the semantics of programming languages. One concept that some find confusing at first is "continuations", a way of representing the semantics of labels and jumps as functions. I've used Pop-11 to implement these as actual tangible functions that students can manipulate. I blogged about this for Dr Dobbs, at "Poplog, continuations, Eliza, AI education, and Prolog".
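To illustrate the idea (in Python here, rather than Pop-11), a continuation reifies "the rest of the program" as an ordinary function that students can pass around: each function takes an extra argument, the continuation, and "returning" a value means calling it.

```python
# Continuation-passing style: instead of returning x + y, hand it to
# the continuation k, which represents everything still to be done.
def add_cps(x, y, k):
    k(x + y)

def mul_cps(x, y, k):
    k(x * y)

results = []
# Compute (2 + 3) * 4 by chaining continuations: each step passes its
# value to the function standing for the remainder of the computation.
add_cps(2, 3, lambda s: mul_cps(s, 4, lambda p: results.append(p)))
print(results[0])  # 20
```

A jump to a label is then just a call to the label's continuation, one that never returns to its caller — which is the point the tangible functions were meant to make.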
I used to work with the Institute for Fiscal Studies, an independent research group which analyses government financial policy and educates the public about it. We did several projects to put economic models onto the Web. The most ambitious was "Be Your Own Chancellor". Users could act as Chancellor, setting taxes and benefits via an input form. BYOC, running on our servers, would send back graphs and tables forecasting how these changes would affect macroeconomic variables such as unemployment and inflation, and a selection of people representing the microeconomy.
The IFS works with the BBC to help analyse each year's Budget, and when we launched BYOC, they asked us to make a special version to help people understand the Budget. We continued doing this for about eight years, and even had accounts on the BBC's Web servers. Aided by a Nuffield grant, we also extended BYOC to make "Virtual Economy", a big system for teaching economics, complete with online economics lessons for beginners. Since then, I and colleagues have worked on many other simulations, including ones of the Russian and Flemish economies. There's more about all this on the Virtual Worlds Research site.
I haven't only built economic-modelling Web sites. For example, Traveller is a simple buying-and-selling AI game which I've made into an interactive Web page. As I'll mention later, I've also written an interactive page for teaching a branch of maths called category theory.
Spreadsheets are dangerous. Formulae are hard to read because you have to use A1-style notation instead of meaningful names; it's easy to mistype a row or column number, or to slip and put data in the wrong cell; and Excel gives no way to build spreadsheets one piece at a time. You have to build them all at once, and because every cell is visible to every other, errors leak everywhere.
This is something I've tried to solve. I said I'd helped Joseph Goguen implement a language called Eqlog. This is one member of the family of "OBJ algebraic specification languages" Goguen created, where you program by writing equations that specify the behaviour of whatever you're modelling. All OBJ languages have elegant module systems, which enable you to build big complicated programs from parts that are small enough to be understood in their entirety, and that can be tested, debugged and documented one at a time. I used these ideas to develop module systems for building spreadsheets from parts that can likewise be tested, debugged and documented one at a time. At "Spreadsheet Components, Google Spreadsheets, and Code Reuse", there's a demo I blogged for Dr Dobbs. This shows a form-based interface for inserting modules into existing spreadsheets much as one inserts charts. And the mathematical physicist John Baez wrote an article about my work, followed by comments from readers, at "CATEGORY THEORY TO THE RESCUE!".
My module system goes along with "Excelsior": a language and compiler that let you build spreadsheets by specifying them as layout-independent equations over arrays. Such programs are much easier to read and get an overview of than is raw Excel. Unlike in Excel, my language lets you use meaningful names.
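As an illustration of the general idea — this is a toy, not Excelsior's actual syntax or compiler — here's how an equation over named arrays might be compiled down to A1-style formulas, once a layout assigns each array a column:

```python
# A toy "layout-independent equation to cells" compiler. The equation
# total[i] = price[i] * qty[i] and the column layout are invented for
# illustration; changing the layout changes the cells, not the equation.
layout = {"price": "A", "qty": "B", "total": "C"}

def compile_equation(target, operands, op, rows):
    """Emit one Excel formula per row for target[i] = operands[0] op operands[1]."""
    formulas = {}
    for i in range(1, rows + 1):
        cell = f"{layout[target]}{i}"
        formulas[cell] = f"={layout[operands[0]]}{i}{op}{layout[operands[1]]}{i}"
    return formulas

for cell, formula in compile_equation("total", ["price", "qty"], "*", 3).items():
    print(cell, formula)
# C1 =A1*B1
# C2 =A2*B2
# C3 =A3*B3
```

The point is that the meaningful-name form is the program; the A1 notation becomes generated output, like object code.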
I've also experimented with two ideas for improving how spreadsheets are documented: coding them as semantic wikis, and using "literate programming". Literate programming is an idea from computer scientist Donald Knuth, where you write a program as if it were an expository essay. The main part of your program, which you should write first, is explanatory text. Code is thought of as an insert, like equations in a mathematical paper. For an example, see "Kaprekar's Constant in Excel and Excelsior". There's a bigger, and hopefully amusing, program at "'Earth falls toward a black hole and everyone dies'". This is a science-fiction plot generator coded in Excelsior, which manages to do recursion entirely in Excel, without using Visual Basic. To run it, just follow the instructions in the article. There's more on my experiments, including the semantic-wiki stuff, in "It Ain't What You View, But The Way That You View It: documenting spreadsheets with Excelsior, semantic wikis, and literate programming".
Because Excelsior is easier to read than Excel, I've also implemented a decompiler which translates existing spreadsheets into this language. This is useful when struggling to make sense of "legacy" spreadsheets. There's a short demonstration at "How to Reveal Implicit Structure in Spreadsheets".
Category theory is a branch of mathematics not much known outside maths departments, but important in — amongst other things — cosmology, quantum physics, and topology. It's also used in computing, and is what my spreadsheet module system is based on, hence the title of John Baez's article. I wanted to encourage computer scientists to learn it, and coded an interactive Web page which demonstrates common category-theoretic operations such as product, equaliser, limit and colimit. Via an HTML form, students input the sets and functions which the operations act on. A Prolog program running on my hosting company's server then calculates the result and sends it back as a "diagram". This is a network whose nodes are sets, and whose edges represent functions between them. To try the program, visit "Category Theory Demonstrations". If you don't know any category theory and just want to see it run, you can let it choose its own inputs: merely leave the default text in the input fields. My program was based on an earlier one that compiles descriptions of (very) simple systems of interacting objects into category-theoretic terms, then executes the result: that can be seen at "System Limit Programming".
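For readers who'd like to see what two of these operations actually compute, here's a small illustration on finite sets — in Python, whereas my page does the calculation in Prolog on the server:

```python
from itertools import product as cartesian

def set_product(A, B):
    """Categorical product of finite sets: the Cartesian product,
    together with its two projection functions."""
    P = set(cartesian(A, B))
    proj1 = {p: p[0] for p in P}
    proj2 = {p: p[1] for p in P}
    return P, proj1, proj2

def equaliser(A, f, g):
    """Equaliser of f, g : A -> B: the subset of A on which they agree."""
    return {a for a in A if f[a] == g[a]}

A = {1, 2, 3}
f = {1: "x", 2: "y", 3: "x"}  # two functions from A, as lookup tables
g = {1: "x", 2: "x", 3: "x"}
print(equaliser(A, f, g))  # {1, 3}: f and g agree exactly on 1 and 3
```

Limits and colimits generalise these, but product and equaliser are the easiest to grasp from concrete inputs like the ones the form accepts.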
Internally, my program represents diagrams as acyclic graphs. To draw them, it runs the open-source graph visualisation program Graphviz, generating a GIF file from a Graphviz representation of the graph. I also made Graphviz emit the diagrams as VRML, meaning that people equipped with a suitable browser can pick them up and turn them around. Perhaps that will help make category theory seem more tangible. There's more about how my program works, and a discussion about improvements, at the "Graphical Category Theory Demonstrations" thread guest-posted for me by Urs Schreiber on the n-Category Café blog. The thread includes a really generous comment from John Baez:
I applaud your Java applets, Jocelyn! This is the sort of thing where it's very easy for mathematicians to dream big — but only a few brave people dare to actually do something.
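To sketch the drawing step — illustratively, not with my actual code — a diagram of sets and functions can be emitted as Graphviz's DOT text, which the dot tool then renders, e.g. with `dot -Tgif diagram.dot -o diagram.gif`:

```python
# Emit a diagram as Graphviz DOT: nodes are sets, labelled edges are
# functions between them. The node and edge names are invented examples.
def to_dot(nodes, edges):
    """edges: list of (source, label, target) triples for named functions."""
    lines = ["digraph diagram {", "  rankdir=LR;"]
    for n in nodes:
        lines.append(f'  "{n}" [shape=box];')
    for src, label, dst in edges:
        lines.append(f'  "{src}" -> "{dst}" [label="{label}"];')
    lines.append("}")
    return "\n".join(lines)

print(to_dot(["AxB", "A", "B"],
             [("AxB", "proj1", "A"), ("AxB", "proj2", "B")]))
```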
I've collaborated on compilers for Fortran, Basic, and Prolog; I've written programs that translated expert-system languages into Prolog; and as mentioned above, I've developed a compiler that translates programs into Excel. And I've used that experience in writing compilers for "little languages". What does that mean? A "little language" is one that's not a complete programming language, but a language for describing some particular kind of data, or for giving a small set of specialised commands. The languages that some plotting packages use to specify graphs are little languages; I think regular expressions and HTML would also count.
One of my customers, who I've had for over fifteen years, runs a market-research company. I built him a Web site over which he could run all his programs for analysing answers to questionnaires, using HTML forms to specify data filenames and other parameters. He also wanted to put questionnaires online — and to design them online. I started by writing him a visual questionnaire-builder, using forms to specify questions, and drag-and-drop to put them in the right place in the questionnaire. But we both found this cumbersome, and designed a text-based language for describing questionnaires instead. So I wrote a compiler that translated this language to Java data structures, using the JavaCC parser-generator. There's an article about this, which I blogged for Dr Dobbs, at "An Online Budget Questionnaire, JavaCC, and the Three Ways of Putting Together".
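To give the flavour — this miniature language and its parser are invented for illustration, whereas the real compiler was built with JavaCC and emitted Java data structures — a little language for questionnaires might look like this, with a tiny compiler alongside:

```python
import re

# One question per line: question <id> "<text>" options <opt>|<opt>|...
SOURCE = '''
question q1 "How old are you?" options under 25|25 to 44|45 and over
question q2 "Do you shop online?" options yes|no
'''

LINE = re.compile(r'question\s+(\w+)\s+"([^"]*)"\s+options\s+(.+)')

def compile_questionnaire(src):
    """Parse the little language into a list of question records."""
    questions = []
    for line in src.strip().splitlines():
        m = LINE.match(line.strip())
        if not m:
            raise SyntaxError(f"bad question line: {line!r}")
        qid, text, opts = m.groups()
        questions.append({"id": qid, "text": text,
                          "options": [o.strip() for o in opts.split("|")]})
    return questions

for q in compile_questionnaire(SOURCE):
    print(q["id"], q["text"], q["options"])
```

Even a toy like this shows why we preferred text to drag-and-drop: a questionnaire becomes something you can read, diff, and email.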
Analysing drug trials
Between 2008 and 2015, I collaborated part-time with Andrew Moore of the Oxford Pain Research Group. He works in evidence-based medicine: finding simple ways to help doctors understand when a drug is likely to benefit a patient, and, if there's a choice of drug or dose, which one will do so best. One easy-to-understand measure of a drug's efficacy is "number needed to treat" (NNT): how many patients do you need to treat in order for one of them to benefit? The smaller the NNT, the better the drug. I used Java and R to calculate and display NNTs from spreadsheet data which Andrew obtained from drug companies. An early account of our work, co-blogged by me and our colleague Sebastian Straube for Dr Dobbs, is "Trials and tribulations: measuring drug efficacy in clinical trials, plotting graphs in Java with gnuplot, and reading Excel with JExcelAPI".
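The NNT arithmetic itself is simple — it's the reciprocal of the difference in benefit rates between drug and placebo. Here it is in Python with invented numbers (our real calculations, on real trial data, were done in Java and R):

```python
# NNT = 1 / (proportion benefiting on the drug - proportion on placebo).
def nnt(benefit_treated, n_treated, benefit_control, n_control):
    risk_difference = benefit_treated / n_treated - benefit_control / n_control
    return 1.0 / risk_difference

# Illustrative figures: if 60 of 100 patients benefit on the drug and
# 35 of 100 on placebo, you must treat 4 patients for one to benefit
# who wouldn't have done anyway.
print(nnt(60, 100, 35, 100))  # 4.0
```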
I used Java at first, but switched to R because it was less verbose and much more versatile. The data given us by the drug companies was typically between 500 and 10,000 rows in a spreadsheet, with one row for each time a patient was given a dose of drug. Patients would sometimes miss a dose or be given extra doses; they might not continue to the end of their treatment; sometimes we had to piece together data from several sheets; and sometimes the spreadsheets would contain repeated rows, patient IDs which didn't match up between sheets, unexpected codes in numeric fields, and other errors. I had to clean these up, then normalise the data into a single row per patient, with successive columns representing successive doses. Here, I was greatly helped by R's data-reshaping functions. These and other functions for manipulating aggregates also helped with calculating measures of efficacy from our normalised data: not just NNTs, but also — for example — cluster analyses, to group similar patients together, and counts of how many patients had their pain reduced below a given threshold. And finally, R also has a variety of plotting packages, which we used to generate violin plots, histograms, and Venn diagrams, as well as straightforward 2D and 3D scattergrams and line plots.
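To show the shape of that normalisation step — sketched in Python with invented rows, whereas our real code was R — here's the pivot from one row per dose to one row per patient:

```python
from collections import defaultdict

# Invented one-row-per-dose data: (patient ID, dose number, mg given).
dose_rows = [
    ("p1", 1, 50), ("p1", 2, 50), ("p1", 2, 50),   # last row is a duplicate
    ("p2", 1, 100),                                 # p2 stopped early
]

def normalise(rows):
    """Collapse dose rows into one row per patient, with successive
    columns for successive doses; None marks a missed dose or withdrawal."""
    by_patient = defaultdict(dict)
    for patient, dose_no, mg in sorted(set(rows)):   # de-duplicate first
        by_patient[patient][dose_no] = mg
    width = max(max(doses) for doses in by_patient.values())
    return {p: [doses.get(i) for i in range(1, width + 1)]
            for p, doses in by_patient.items()}

print(normalise(dose_rows))  # {'p1': [50, 50], 'p2': [100, None]}
```

The real job was messier, of course — mismatched IDs and stray codes needed cleaning before any pivot could work — but this is the target shape the analyses ran on.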
We also experimented with Weka, the machine-learning package. As well as running various machine-learning algorithms in Weka, we used R to reshape the data into a suitable input format for it.
One final thing that I'm interested in, because of my cartooning, is how line drawings represent their meaning. I gave a short talk on this last year in Athens for Workshop Thales: "Semantics of line drawings". This ties up with recent work at Princeton and elsewhere on getting computers to draw, for example "Where Do People Draw Lines". The potential for helping artists learn to draw is obvious, and something I'd like to explore. One interesting idea is to ask artists to state the purpose of each line in a selection of drawings, then use machine learning to look for commonalities. By purpose, I don't just mean what a line says about the geometry of an object, but also its emotional content.