[ Jocelyn Ireson-Paine's Main Home Page ]

Understanding object-oriented programming by using CafeOBJ


Some time ago, I was thinking about implementing a program-transformation system to operate on Java. There are many high-level transformations that would be useful and that existing tools completely ignore. For example, wrapping one class inside another; composing two classes. But before we can implement these, we need to understand what the classes and their methods denote. This little report contains some thoughts on specifying this in CafeOBJ.

Start with this class (all examples are in Java):
  class Pair { int x; int y; }
Having no methods, this very simple example is similar to a record; the class is a product sort, so it seems reasonable to specify it thus:
  module! Pair
    [ Pair ]
    ops (_.x) (_.y) : Pair -> Int
    op new Pair() : -> Pair
where I'm mapping the field names (aka instance variables)onto operations. In Java, given an instance p of Pair, you'd write p.x or p.y to get its fields. You'd create the instance by writing new Pair(): if a class doesn't contain a constructor, Java supplies a default no-argument one which doesn't initialise any of the fields.

We could extend Int with an 'undefined' value and add equations to say that (new Pair()).x and .y are undefined, but that's not what I'm interested in now.

It's useful to provide the class with a constructor that makes a Pair from a pair of integers. And it's conventional not to access the fields directly (at least, not from outside the class), but to provide access methods. With these, it becomes

  class Pair
    int x; int y;

    Pair( int x, int y )
    { this.x = x; this.y = y; }    // Constructor.

    int getX() { return this.x; }  // Get the x field.

    int getY() { return this.y; }  // Get the y field.
and we can augment the specification by making it
  module! Pair
    [ Pair ]
    ops (_.getX()) (_.getY()) : Pair -> Int
    op new Pair(_,_) : Int Int -> Pair
    eq (new Pair(X:Int,Y:Int)).getX() = X .
    eq (new Pair(X:Int,Y:Int)).getY() = Y .
Since users of a class almost always see it through its access methods rather than its fields, I've omitted the .x and .y operations.

That's all obvious. In general, a class C which is intended to be instantiated will be a glorified record, containing n fields, where the field named Fi is of sort Si; n access methods, named getFi; and a constructor taking n arguments. So this becomes a module with one principal sort (the class), and n projection operations;

  module C! principal-sort C
    pr( S1 + S2 + .... + Sn )
    [ C ]
    ... op (_.getFi()) : C -> Si ...
    op new C(_,_,....,_) : S1 S2 ... Sn -> C
    ... and the selector equations ...
The module needs to import one module for each of the different field sorts (I've written the sum of all of them, since repeated modules will be shared), and if these also specify classes, they'll have one principal sort just as this does.

So far, this is nice. It looks as though classes map neatly onto theories. Can we continue doing so as we add more features?

One feature to try is methods that don't just select fields, but perform some computation on them. If we augment class Pair above with

    int doubleX() { return this.x * 2 }
    int sum() { return this.x + this.y; }
they become operations on the principal sort
    op doubleX : Pair -> Int
    op sum : Pair -> Int
with the obvious equations. So there's still a nice mapping from classes to theories.

But this breaks down, as I'll show. Suppose we want to combine two Pairs:

    op add : Pair Pair -> Pair
    eq add( Pair(X:Int,Y:Int), Pair(X':Int,Y':Int) ) =
         new Pair( X + X', Y + Y' ) .
To define a method, you have to define it inside a class; when called, it's textually attached to the instance on which it operates - a kind of implicit argument. So to implement the add method above in Java, the nearest I can get is
    class Pair
      ... as above ...
      Pair add( Pair arg )
      { return new Pair( this.x + arg.x, this.y + arg.y ); }
which would be called like this
    Pair p1 = new Pair( 1, 2 );
    Pair p2 = new Pair( 3, 4 );
    System.out.println( p1.add(p2) );

So to implement a perfectly reasonable and simple specification, you are forced to introduce a nasty asymmetry. I believe there are some OOP languages, such as CLOS (Lisp), which allow what's known as multiple dispatch, where each method can in effect be attached to more than one class. But this isn't very common.

In single-dispatch languages like Java, you need to use static methods to overcome this asymmetry. A static method is one that can be called without instantiating the class inside which it was defined:

  class Maths
    static int max( int a, int b )
    { return ( a > b ? a : b ); }
This can be called thus:
  System.out.println( Maths.max(1,2) );
and corresponds to a module which (since a class can't define globally acessible classes) introduces only new operations, but not sorts:
  module! Maths
    pr( INT )
    op max: Int Int -> Int

To use this trick for the add operation on pairs, you'd write

  class Pairs
    static Pair add( Pair a, Pair b )
    { return new Pair( a.getX()+b.getX(), a.getY()+b.getY() ); }

But if you're going to do that, why not do it for the no-argument methods like doubleX() as well?

    static int doubleX( Pair a )
    { return a.getX() * 2; }

So I conclude that there is no neat mapping between classes and theories. A theory which defines only one sort, with no operations having more than one occurrence of it in their arity, may map reasonable well to a class implementing that sort, its methods implementing the operations. But operations whose arity contains more than one occurrence of such a sort are best defined, to avoid asymmetrical notation in their calls, as static methods in a different "library" class. Attribute methods whose arities contain only one occurrence of the sort could be defined in the class implementing the sort, or in the library class.

I tend to define the methods which implement the projection operations in the class implementing the sort; to put input and output methods and very simple transformations (such as getting the size of a field) in this class, and to put everything else in a library class whose name is the plural of the principal sort. This is definitely not the way most people do OOP, and probably runs counter to all the recipe books on design through patterns, but it keeps things clean.

Why does OOP (if you ignore multiple dispatch) insist on attaching methods to classes? Probably because the original designers thought in terms of objects, each object having procedures which either implemented state transitions on it, or returned one of its attribute values. For both of those, it is natural to attach the procedures to the object and make it this privileged argument whose state is being updated or interrogated. But the approach fails elsewhere. Unfortunately, designers overgeneralised, trying to treat everything as an object (i.e. class).

One reason for this is information hiding. Objects can hide information about their fields; and modules (now I'm using the word in the conventional programming sense, not the CafeOBJ sense) need to hide information about unexported data. So OOP researchers decided you could implement modules as classes. You can see that already in the Simula-67 books. This misguided approach got a boost because some modules need to preserve hidden state. For example a module that exports a random number generator has to hide the seed in some internal variable that the caller can't access directly. So it would seem a good idea to implement the module as a class, the internal variable as an instance variable, and the exported functions as methods. This appeared particularly elegant because it enabled you to have more than one copy of the module, each with its own seed, and to create these copies dynamically. Pascal-Plus made a big thing of this. (And let's be fair: in CafeOBJ, you'd have to pass the seed and any other state around in an extra argument, which gets verbose.)

If your exported functions don't depend on any hidden state, you could define them as static methods, as I did earlier.

So perhaps that's why the OOP people made classes usurp modules. In some languages, Java being one, they also made them usurp all data types (with the exception, in Java, of primitive machine types like integer). I'm not sure why this was. Smalltalk did it, I think, and that was in 1979 or earlier. Perhaps partly laziness, not wanting two different storage and parameter-passing mechanisms. The hardware encourages it anyway, because it's cheaper to copy pointers to aggregate values than it is to copy the values, and once you've done that, you immediately allow multiple copies of aggregates that have the same value, but are nevertheless not identical and can be updated in place.

Of course, this discussion neglects lots of other features of OOP. I've ignored methods which update the fields of their class, and what happens if youre semantics want to regard different instances containing identical fields as different objects.

28th June 2000.

[ Jocelyn Ireson-Paine's Main Home Page ]