Why not?: Why You Can Throw Away "If", and Why You Shouldn't

Thursday, August 14, 2008

Why You Can Throw Away "If", and Why You Shouldn't

Introduction

Most reasonably experienced object-oriented programmers have probably stumbled upon the same realization; namely, that it's possible to replace if statements with polymorphism. Polymorphism is simply a way to delay a decision until runtime. The if statement does the same thing. In fact, procedural programmers need to resort to things like if and switch statements because they have no other tool. Functional programmers, on the other hand, simply toss functions around willy nilly.

This realization can be powerful. It can also really hurt a code base (I know - I've smashed my share of algorithmic china with this hammer). I recently ran into a place on a project where it was a great idea, and I thought I would share why I thought it worked so well.

Current Implementation

Suppose you have 2 methods:

public void storeNewItem() {
    Item item = new Item();
    item.name = request["name"];
    item.description = request["description"];
    item.quantity = request["quantity"];
    item.value = someComplexCalculation();
    item.totalValue = item.quantity * item.value;
    // calculate and store some more fields here
    items.addNewItem(item);
}

public void storeExistingItem() {
    Item item = items.get(request["itemId"]);
    item.name = request["name"];
    item.description = request["description"];
    item.quantity = request["quantity"];
    item.value = someComplexCalculation();
    item.totalValue = item.quantity * item.value;
    // calculate and store some more fields here
    item.update();
}

These two functions should look pretty similar. In fact, they are nearly identical. Both acquire an item, populate it with data, and then store it. The only difference is the way that the item is acquired and the way that the item is stored.

First Attempt

I wanted to merge these methods, and this was my first attempt.

public void storeItem() {
    Item item;
    if (request["itemId"] == null) {
        item = new Item();
    } else {
        item = items.get(request["itemId"]);
    }

    item.name = request["name"];
    item.description = request["description"];
    item.quantity = request["quantity"];
    item.value = someComplexCalculation();
    item.totalValue = item.quantity * item.value;
    // calculate and store some more fields here

    if (request["itemId"] == null) {
        items.addNewItem(item);
    } else {
        item.update();
    }
}

This works, but is obvious crap. I found myself saying "I wish Item were able to handle those details by itself".

Second Attempt

Well, I wasn't brave enough to change Item, so I instead wrapped it.

public void storeItem() {
    Persister persister;
    if (request["itemId"] == null) {
        persister = new NewItemPersister();
    } else {
        persister = new ExistingItemPersister(request["itemId"]);
    }

    Item item = persister.getItem();
    item.name = request["name"];
    item.description = request["description"];
    item.quantity = request["quantity"];
    item.value = someComplexCalculation();
    item.totalValue = item.quantity * item.value;
    // calculate and store some more fields here
    persister.persist();
}

interface Persister {
    Item getItem();
    void persist();
}

class NewItemPersister implements Persister {
    private Item item = new Item();
    
    public Item getItem() { return item; }
    
    public void persist() { items.addNewItem(item); }
}

class ExistingItemPersister implements Persister {
    private Item item;
    
    public ExistingItemPersister(String itemId) {
        item = items.get(request["itemId"]);
    }
    
    public Item getItem() { return item; }
    
    public void persist() { item.update(); }
}

We still have an ugly if at the top of the function, and we have certainly ballooned the code. I still think that this is better than what we started with.

There is less duplication, which will make maintenance here much easier.
The Persister interface could be made into a generic class, and the implementations could be re-used all over the system. Some reflection here could really simplify your life.
A good web framework would allow you to remove that pesky initial if statement. In a less good framework, you could hide this behind some sort of object that knows how to generate a persister from an itemId (or null).

The practical upshot is that these changes should make it easier to apply metaprogramming techniques to this chunk of code. The only code that can't really be made declarative is some of the code which assigns values to fields.

There is one thing that bothers me, though. We have made the Persister implementors responsible for the lifetime of the Item. That's not at all clear from the interface, but it is obvious from the use. The tell-tale sign is that we have a getItem() method. Getters that expose their class' internals like this are evil, and if you don't believe me, you're just plain wrong. I won't try to justify that statement in this post, but trust me.

Third Attempt

To solve this, we could change the interface yet again (and I will switch to Javascript, because Java doesn't have any convenient lambda syntax).

function storeItem() {
    if (request["itemId"] == null) {
        var persister = newItemPersister;
    } else {
        var persister = new ExistingItemPersister(request["itemId"]);
    }
    
    persister.update(function(item) {
        item.name = request["name"];
        item.description = request["description"];
        item.quantity = request["quantity"];
        item.value = someComplexCalculation();
        item.totalValue = item.quantity * item.value;
        // calculate and store some more fields here
    });
}

var newItemPersister {
    update:function(f) {
        var item = new Item();
        f(item);
        items.addNewItem(item);
    }
}

function ExistingItemPersister(itemId) {
    this.itemId = itemId;
}

ExistingItemPersister.prototype.update = function(f) {
    var item = items.get(request["itemId"]);
    f(item);
    item.update();
}

Now, the item's lifetime is only as long as a call to update() is on the stack. This is a common idiom in Ruby, as well.

Conclusion

In the end, I wasn't completely happy with any of these solutions. I think that things are better than they were before. There are also a number of other permutations that will get it marginally closer to ideal. I think that the real solution is to update Item so that you can create a new item and save an existing item with a single method. After that, the code to choose whether to create a new object or fetch an existing object should be minimal and extractable.

I did learn a rule of thumb for deciding when to replace an if statement with polymorphism. If you find yourself saying "I don't want to deal with this, I wish it were handled for me," there's a good chance that you could benefit from some polymorphism. Also, if you find yourself checking the same condition multiple times in a function (as we had in the original implementation), you might want to consider whether polymorphism will help you.

9 comments:

Anonymous said...: Uhm.. Why not just:

public void storeItem()
{
item = items.get(request["_itemid"]) || items.addNewItem(new Item);

item.name = request["name"];
item.description = request["description"];
item.quantity = request["quantity"];
item.value = someComplexCalculation();
item.totalValue = item.quantity * item.value;
// calculate and store some more fields here

item.update();
}; 8/15/2008 4:27 AM
Dan said...: Let me start by saying that your approach is certainly simpler. I like that. The primary difference is that you are immediately storing a new item in the database. After the first line, there will be an item in the database for sure, so everything after that line can be written from the point of view of updating an item, rather than storing a new item.

I see two potential problems. The first is that the first line wouldn't compile in Java. You can use || with arbitrary objects in Javascript or Ruby, but Java would complain because it's not operating on booleans. (From the JLS: "Each operand of || must be of type boolean, or a compile-time error occurs. The type of a conditional-or expression is always boolean."). This is not a big deal, since you can replace it with something that will work in Java. It's just not going to be quite so concise.

The other problem is that this code assumes that you can add empty items to the 'items' object. Something that may not have been clear in my original example is that items.addNewItem and item.update actually do eventually get into database persistence code. If you database has NOT NULL constraints, or if for some other reason an empty object is illegal in your schema, this code won't work at runtime.

In my case, I was taking some existing code and trying to change its structure while maintaining its behavior. I don't know what would happen if I tried to save an empty object, and that wasn't a can or worms that I wanted to open.

Finally, there's something that I like about making the code behave in a simple way, even if it may make the code itself more complex. Suppose this were the real world, and instead of populating an item, I were filling out a tax form. In the real world, it would not make sense for me to submit an empty form, then ask for the form back so that I can fill it out. I would fill it out before submitting it the first time. This is purely an aesthetic choice, and in this case it's a pretty weak argument for making the code overly complicated. However, when code behaves in a simple, intuitive way, it will be easier for new developers (or, in fact, your future self) to work with the code base.; 8/15/2008 9:20 AM
Anonymous said...: Haha yeah that was what I was thinking--except without the super-slick ||. I would have wrapped the items cache in a class, not just the "persisting" operation. Maybe this would have changed the lifetime of an item or whatever, but you don't explain the significance of that issue in your post. Does the change in item lifetime hurt something? Or is it likely to start hurting something in a way that will be hard to track down when it happens? If not, don't worry about it.; 8/15/2008 9:21 AM
Anonymous said...: Hm well okay you're dealing with database crap. Well then your code will be at least a little bit ugly. QED; 8/15/2008 9:25 AM
Anonymous said...: public void storeNewItem() {
Item item = new Item();
updateItemFields(item);
items.addNewItem(item);
}

public void storeExistingItem() {
Item item = items.get(request["itemId"]);
updateItemFields(item);
item.update();
}

void updateItemFields(Item item)
{
item.name = request["name"];
item.description = request["description"];
item.quantity = request["quantity"];
item.value = someComplexCalculation();
item.totalValue = item.quantity * item.value;
// calculate and store some more fields here
}; 8/15/2008 9:57 AM
Dan said...: I had been trying to unify both methods. I didn't want to force the JSP to decide which URL (and, thus, which method) to use. Instead, I wanted to push that decision down as far as possible. Given that, which of the following is better?

public void storeItem() {
    if (request["itemId"] == null) {
        Item item = new Item();
        updateItemFields(item);
        items.addNewItem(item);
    } else {
        Item item = items.get(request["itemId"]);
        updateItemFields(item);
        item.update();
    }
}

void updateItemFields(Item item) {
    item.name = request["name"];
    item.description = request["description"];
    item.quantity = request["quantity"];
    item.value = someComplexCalculation();
    item.totalValue = item.quantity * item.value;
    // calculate and store some more fields here
}

or this

function storeItem() {
    //this first section could actually be replaced with custom parameter binding rules
    if (request["itemId"] == null) {
        var persister = newItemPersister;
    } else {
        var persister = new ExistingItemPersister(request["itemId"]);
    }

    persister.update(function(item) {
        item.name = request["name"];
        item.description = request["description"];
        item.quantity = request["quantity"];
        item.value = someComplexCalculation();
        item.totalValue = item.quantity * item.value;
        // calculate and store some more fields here
    });
}

I find the second to be easier to understand. Everybody has their preference, but neither of these is absolutely better than the other.; 8/16/2008 4:24 PM
Anonymous said...: Do you have something like an update.jsp and a createnew.jsp, or do both operations share one jsp?

I find that trying to keep your update and create new views/controllers the same tends to cause all sorts of problems since there will inevitably be display and logic differences between creating and updating, and there is little reason they can't just be separated out. This is a huge advantage actually, since they have domain knowledge that the rest of the system would have to try to figure out, and this information can easily be passed down into the deeper layers of the system.

Your storeItem function may then look like:

function storeItem(persister) {
persister.update(function(item) {
item.name = request["name"];
item.description = request["description"];
item.quantity = request["quantity"];
item.value = someComplexCalculation();
item.totalValue = item.quantity * item.value;
// calculate and store some more fields here
});
}

Your various controllers just pass in an appropriate persister, a la:

createnew -
...
storeItem(new NewItemPersister());
...

update -
...
storeItem(new ExistingItemPersister());
...

This is more modular. For example, if particular items need to be persisted in several locations, you can aggregate the persister thusly:

storeItem(new MultiPersister(new DatabasePersister(), new FileSystemPersister(), new SecretOffsitePersonalInfomationPersister()));; 9/01/2008 2:55 PM
Dan said...: In my case, I had one JSP that was using an inline if statment to choose which URL it should use to post. In addition, it used further conditionals to figure out what form fields to render into the form (sometimes it would include various hidden form fields, sometimes it would not). In the end, I ended up getting rid of all the conditions except one around the hidden form field itemId. That way, the various pieces of data are always posted to the action, and the action uses the presence of the itemId field to differentiate between a create and an update.

I don't yet quite get REST, but I think that the RESTful approach would be to make the update URL different from the create URL. In particular, the item ID should be part of the update URL, whereas it can't be part of the create URL.

Maybe it's just me, but I see a lot of similarity between creating and updating. Both take mostly the same data, and do mostly the same things with that data. In most interfaces that I've seen, the user is presented with the same fields. That's not universally true, but it's pretty close. This is why a lot of Rails apps use a shared partial.

But you're right, it probably makes sense to keep them separate, and re-use code from the inside-out (i.e. by calling shared methods, rather than using a shared wrapper). No matter what, there is a lot less duplication now than there was before. That's the thing that really matters.; 9/02/2008 7:26 PM
Anonymous said...: Every create new and update view I've ever seen has had a few differences, even if it was as trivial as a title/header change. Reuse is a good thing of course if the forms really are displaying the exact same data, and I don't mean to discourage that at all, except when I see a developer using a lot of conditionals to handle the differences (I'd prefer two duplicated pages to one unreadable page cluttered with if statements -- really it's just a trade off between two different forms of duplication, but the duplicated ifs are harder to read, modify, and change independently).

My preference would be to reduce the page to a single if (or zero ifs using something like a rails partial), as you did, but to have that if surround the form url so that create went to a create link and update went to an update link that takes an id. That way, the decision can be made once, on page rendering time, that we are doing an update, not twice (when detecting the presence or absence of an id), or multiple times (as it was before your refactoring). This may avoid problems as well; for example, what happens if a bug causes the id to not appear, or if the id's field's name changed in the view but not the request handler? Would the code mistake an update for a create?

But it sounds like you already vastly improved the code, so my suggestions might not be a big win now compared to other needs that the codebase may have.; 9/04/2008 12:58 AM