Equality in Java - Pragmatic Software Engineering

We talked a bit about some of the quirks associated with object identity. Today I’d like to investigate some of the potential pitfalls associated with another form of equality.

In Java, Object.equals() is the mechanism used to compare two objects for equality. The default implementation, defined on Object, implements object identity, but the method is overridable and so authors are free to implement alternative behaviour in their own classes.

Avoiding surprises is one of the first rules of software engineering so we should always start with what makes sense for any implementation of the equals() method. The mathematical properties which apply to equality are:

Reflexivity – a.equals(a) is always true
Transitivity – if a.equals(b) and b.equals(c) then a.equals(c)
Commutativity (more formally, symmetry) – if a.equals(b) then b.equals(a)

Java has a further convention about equals() which it’s worth remembering, known as the EqualsHashcode contract:

EqualsHashcode Contract – if a.equals(b) then a.hashCode() == b.hashCode()
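As a minimal sketch of a class that honours this contract, consider the following. The names ContractSketch and Point are hypothetical and exist only for illustration; the point is simply that hashCode() is built from exactly the fields that equals() compares.

```java
import java.util.Objects;

// Hypothetical wrapper and value class, for illustration only.
class ContractSketch {

    static final class Point {
        private final int x;
        private final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) {
                return true;
            }
            if (!(o instanceof Point)) {
                return false;
            }
            final Point other = (Point) o;
            return x == other.x && y == other.y;
        }

        // Uses the same fields as equals(), so equal points
        // always produce equal hash codes.
        @Override
        public int hashCode() {
            return Objects.hash(x, y);
        }
    }

    public static void main(String[] args) {
        Point a = new Point(1, 2);
        Point b = new Point(1, 2);
        System.out.println(a.equals(b));                  // true
        System.out.println(a.hashCode() == b.hashCode()); // true
    }
}
```

Note that the contract only runs one way: two objects with the same hash code are not required to be equal.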

With that in mind, let’s go through some of the guidelines which you should follow in custom implementations of equals().

Implement equals() in terms of immutable instance fields

The EqualsHashcode contract exists to allow objects to be stored and retrieved quickly in hash-based collections such as HashSet and HashMap. When an entry is inserted into a HashMap, its key’s hashCode() is calculated and used to assign the entry to a hash bucket. Keys which have the same hash code end up in the same bucket. On retrieval the HashMap implementation can go straight to the right bucket using the key’s hash code, which reduces the number of explicit calls to equals() needed to locate the correct entry. If a mutable value is used for calculating the key’s hash code then the hash code may change between calls to ‘put’ and ‘get’, and the lookup then searches in the wrong place: essentially, items can become abandoned in the collection. Similarly, if the result of equals() can change over time then it may appear that a key is no longer ‘in’ the collection even though it is.
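Here is a small sketch of an entry being abandoned in this way. MutableKeySketch and MutableKey are hypothetical names of my own; the only assumption is a key type whose equals() and hashCode() depend on a field that can change after insertion.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical wrapper and key class, for illustration only.
class MutableKeySketch {

    static final class MutableKey {
        private int value; // deliberately mutable

        MutableKey(int value) {
            this.value = value;
        }

        void setValue(int value) {
            this.value = value;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof MutableKey && value == ((MutableKey) o).value;
        }

        @Override
        public int hashCode() {
            return Integer.hashCode(value);
        }
    }

    public static void main(String[] args) {
        Set<MutableKey> keys = new HashSet<>();
        MutableKey key = new MutableKey(1);
        keys.add(key);

        // Mutating the key changes its hash code after it was stored.
        key.setValue(2);

        System.out.println(keys.contains(key)); // false
        System.out.println(keys.size());        // 1
    }
}
```

The set still holds the object, but a lookup with the very same reference can no longer find it.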

This might seem like quite a specific problem with how the JDK works but it’s a demonstration of a more general issue around the principle of least surprise. If the equality of two instances can change over time then there’s no way to make any decisions based on comparing those instances. Essentially the method becomes unusable if it can return different values over the lifetime of the object.

Consider the following code:

    Object o1 = someOtherRef;
    Object o2 = someRef;
    if (o1.equals(o2)) {
        doSomethingWith(o1);
    }

With this we make a check on o1 versus o2 for equality. It seems reasonable that if o1 and o2 are equal then the ‘doSomethingWith(…)’ line could just as easily have o1 or o2 passed into it. But if the equality of o1 and o2 can change over time, that may no longer be true by the time the call is made, particularly when another thread can modify either object between the check and the call. And, these days, most environments are multi-threaded.

Declare implementations as `final`

This may seem strange given that the JDK implementation is overridable. But it really follows from the mathematical properties of equals that any objects which are being compared for equality should use the same mechanism for evaluating equality or, at least, a compatible mechanism. The implementation of equals() in Object is identity-based so it’s always going to obey the laws of transitivity and commutativity: if ‘a’ and ‘b’ refer to identical objects then they’ll always be using exactly the same equals method.

Things become a bit more complicated when you have an inheritance hierarchy. Consider the following for example:

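class A {
    protected final int a1;

    A(int a1) {
        this.a1 = a1;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof A)) {
            return false;
        }
        final A other = (A) o;
        return this.a1 == other.a1;
    }

    @Override
    public int hashCode() {
        return Integer.hashCode(a1);
    }
}

class B extends A {
    private final int b1;

    B(int a1, int b1) {
        super(a1);
        this.b1 = b1;
    }

    @Override
    public boolean equals(Object o) {
        // Must test for B, not A: casting a plain A to B would throw ClassCastException.
        if (!(o instanceof B)) {
            return false;
        }
        final B other = (B) o;
        return super.equals(o) && this.b1 == other.b1;
    }
}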

Both A and B implement equals(), A using just field a1 but B ensuring that field b1 is equal in both objects as well. If I have an instance ‘a’ of type ‘A’ and ‘b’ of type ‘B’, then it’s entirely possible that a.equals(b) is true (if their a1 fields match), but b.equals(a) is false (a has no b1 field and is not of type B).
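To make the asymmetry concrete, here is a runnable sketch of the two classes. The wrapper class AsymmetrySketch, the constructors and the hashCode() on A are my own additions so that it compiles, and I’m assuming B checks ‘instanceof B’, as the explanation above implies.

```java
// Hypothetical wrapper class, for illustration only.
class AsymmetrySketch {

    static class A {
        protected final int a1;

        A(int a1) {
            this.a1 = a1;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof A)) {
                return false;
            }
            return this.a1 == ((A) o).a1;
        }

        @Override
        public int hashCode() {
            return Integer.hashCode(a1);
        }
    }

    static class B extends A {
        private final int b1;

        B(int a1, int b1) {
            super(a1);
            this.b1 = b1;
        }

        // Overrides A's equals() with a stricter check.
        @Override
        public boolean equals(Object o) {
            if (!(o instanceof B)) {
                return false;
            }
            return super.equals(o) && this.b1 == ((B) o).b1;
        }
    }

    public static void main(String[] args) {
        A a = new A(1);
        A b = new B(1, 2);
        System.out.println(a.equals(b)); // true:  A only looks at a1
        System.out.println(b.equals(a)); // false: a is not a B
    }
}
```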

If, instead, the equals method on A is made final, all ‘a’ and ‘b’ objects are ‘compatible’ in terms of their equality and the commutativity relationship is saved. The caveat here is that objects of type B can still be equal to objects of type A; that’s quite an odd situation and should be investigated. Typically, it makes sense to think that two objects of two different types aren’t equal. Where it can make sense is when the top of the hierarchy (‘A’ in this case) is an entity type. Entity types are typically equal based on some kind of identifier, and their sub-classes typically represent different implementation concerns (e.g. in-memory rather than in a DB) rather than different types of business entity.
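A rough sketch of that entity case might look like the following. Customer, InMemoryCustomer and DatabaseCustomer are hypothetical names I’ve made up for illustration; the only thing that matters is that equals() and hashCode() are final on the base type and use nothing but the identifier.

```java
// Hypothetical wrapper and entity classes, for illustration only.
class EntitySketch {

    abstract static class Customer {
        private final long id;

        Customer(long id) {
            this.id = id;
        }

        // final: no sub-class can introduce an incompatible notion of equality.
        @Override
        public final boolean equals(Object o) {
            if (!(o instanceof Customer)) {
                return false;
            }
            return id == ((Customer) o).id;
        }

        @Override
        public final int hashCode() {
            return Long.hashCode(id);
        }
    }

    static final class InMemoryCustomer extends Customer {
        InMemoryCustomer(long id) {
            super(id);
        }
    }

    static final class DatabaseCustomer extends Customer {
        DatabaseCustomer(long id) {
            super(id);
        }
    }

    public static void main(String[] args) {
        Customer inMemory = new InMemoryCustomer(42L);
        Customer fromDb = new DatabaseCustomer(42L);
        System.out.println(inMemory.equals(fromDb)); // true
        System.out.println(fromDb.equals(inMemory)); // true
    }
}
```

Because both sub-classes share the single final implementation, the comparison gives the same answer whichever way round it is made.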

Only instances of the same type should be equal

Consider this code:

class A {
    protected final int a1;
    protected final int a2;

    A(int a1, int a2) {
        this.a1 = a1;
        this.a2 = a2;
    }
}

class B extends A {
    B(int a1, int a2) {
        super(a1, a2);
    }

    @Override
    public final boolean equals(Object o) {
        if (!(o instanceof A)) {
            return false;
        }
        final A other = (A) o;
        return (this.a1 == other.a1) && (this.a2 == other.a2);
    }
}

class C extends A {
    C(int a1, int a2) {
        super(a1, a2);
    }

    @Override
    public final boolean equals(Object o) {
        if (!(o instanceof A)) {
            return false;
        }
        final A other = (A) o;
        return this.a1 == other.a1;
    }
}

Now we can have a situation where c.equals(b) is true (their a1 fields match) but b.equals(c) is false (their a2 fields differ). The problem really is that, here, objects of type C should *never* be considered equal to objects of type B. That just shouldn’t be possible given that their equals methods are incompatible. It’s possible you *could* implement ‘equals’ methods on C and B which were compatible but it would be impossible (or at least incredibly difficult) to ensure that they remain compatible, or even that new implementations remained so.
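One common way to enforce the same-type rule is to compare runtime classes with getClass() rather than using instanceof. The sketch below uses hypothetical Base and Derived classes of my own; it is one possible approach under that assumption rather than the only one.

```java
// Hypothetical wrapper and classes, for illustration only.
class SameTypeSketch {

    static class Base {
        protected final int a1;

        Base(int a1) {
            this.a1 = a1;
        }

        @Override
        public boolean equals(Object o) {
            // Only objects of exactly the same runtime class can be equal.
            if (o == null || getClass() != o.getClass()) {
                return false;
            }
            return this.a1 == ((Base) o).a1;
        }

        @Override
        public int hashCode() {
            return Integer.hashCode(a1);
        }
    }

    static class Derived extends Base {
        Derived(int a1) {
            super(a1);
        }
    }

    public static void main(String[] args) {
        Base base = new Base(1);
        Base derived = new Derived(1);
        System.out.println(base.equals(derived));           // false
        System.out.println(derived.equals(base));           // false
        System.out.println(derived.equals(new Derived(1))); // true
    }
}
```

The result is the same in both directions, so commutativity holds even across the hierarchy.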

Value types should implement equals() on concrete rather than abstract types

This is more of a guideline than a rule per se. Generally, when you’ve got some value types modelling some domain concept they’re concrete types. For these types you shouldn’t find a need to have an equals method on any abstract class. When a type becomes more complex, perhaps one modelling a domain entity, then you need to take a look at what you’re trying to achieve with the ‘equals()’ implementation. It can be tempting to put ‘equals’ in an abstract base class, implemented in terms of some kind of identifier, to allow different implementations (e.g. one read from the DB and one constructed in-memory for some reason) to be compared for equality. It’s usually not as useful as it sounds and the tradeoff is some added complexity which is arguably unexpected in an equals method. When such functionality *is* valuable, it may be better modelled as some form of comparison rather than real equality.

Summary

It seems like a really simple little method but getting equality right is actually quite tricky. The key thing to remember is that our intuitive notion of equality is founded on its mathematical properties of transitivity and commutativity. Ignore them at your peril. Also, the principle of least surprise suggests we should avoid trying to do anything ‘clever’ with equality.