Why not?: August 2008

Sunday, August 31, 2008

Terseness for Terseness' Sake

I've been reading up on Scala, since it seems like it may be a better Java than Java itself. As I was reading through the pre-release PDF of Programming in Scala, I came across something goofy.

Scala, like (as I understand it) F#, tries to nestle itself comfortably between the functional and imperative camps. It has a syntax that supports both schools of thought. So, as you might expect, some functions in Scala will behave nicely and will return a value without any side effects. Other functions will be executed solely for the side effects, and will return nothing (the Unit type in Scala). To further the Functional mindset, Scala does not require an explicit return statement at the end of a function. Instead, the last value in the function is used as the value of the function. Programming in Scala is quick to point out that, if you want, you can just as easily use explicit return statements (if that floats your boat).

The functional and imperative worlds collide in a shower of fireworks. From Programming in Scala:

One puzzler to watch out for is that whenever you leave off the equals sign before the body of a function, its result type will definitely be Unit. This is true no matter what the body contains, because the Scala compiler can convert any type to Unit. For example, if the last result of a method is a String, but the method’s result type is declared to be Unit, the String will be converted to Unit and its value lost.

The book then goes on to provide an example where a function's value is accidentally lost.

Now, I'm all for shortening my programs. The less I have to type, the better. This is, in fact, one of the big advantages Scala has over Java. But wait just a minute! I thought that our compilers were supposed to help us, not trip us up! Here's a situation where 2 different things (a function's return type and the function's return statement) are optional. If they are not specified, they are inferred. In that case, the only difference between retaining and losing your return value is a single character - an '='.

To get all concrete, here are a pair of Scala programs that do different things.

package org.balefrost.demo

object Sample {
  def foo {
    "bar"
  }
  
  def main(args : Array[String]) : Unit = {
    val baz = foo
    println(baz)
  }
}

=> ()

package org.balefrost.demo

object Sample {
  def foo = {
    "bar"
  }
  
  def main(args : Array[String]) : Unit = {
    val baz = foo
    println(baz)
  }
}

=> "bar"

I don't know. To me, that's goofy. Other people might find it completely reasonable. Of course, you can protect yourself with explicit types.

package org.balefrost.demo

object Sample {
  def foo {
    "bar"
  }
  
  def main(args : Array[String]) : Unit = {
    val baz:String = foo  //compiler error: can't assign Unit to String
    println(baz)
  }
}

Anyway, kudos to Programming in Scala for pointing out the potential cause of a hair-yankingly-frustrating bug. Now that I understand what's going on, I will probably be better able to handle it when it comes up in a real program.

Thursday, August 14, 2008

Should I Squash Underhanded Corporate Comments or Let Them Live?

At this point, my Memeo Autosync post has gotten a few comments that clearly originate from somebody who works for (or otherwise has a stake in) Memeo. On one hand, I really dislike this corporate intrusion in an otherwise pristine blog. They have masqueraded as a genuine user, which is misleading and underhanded. On the other hand, it appears that they have offered a discount on Memeo software.

What do other bloggers do with these situations? Do they squash comments that are subversive like this? Do they just allow them, realizing that blog readers are intelligent individuals and will notice the obvious deception? What do you think?

Why You Can Throw Away "If", and Why You Shouldn't

Introduction

Most reasonably experienced object-oriented programmers have probably stumbled upon the same realization; namely, that it's possible to replace if statements with polymorphism. Polymorphism is simply a way to delay a decision until runtime. The if statement does the same thing. In fact, procedural programmers need to resort to things like if and switch statements because they have no other tool. Functional programmers, on the other hand, simply toss functions around willy nilly.

This realization can be powerful. It can also really hurt a code base (I know - I've smashed my share of algorithmic china with this hammer). I recently ran into a place on a project where it was a great idea, and I thought I would share why I thought it worked so well.

Current Implementation

Suppose you have 2 methods:

public void storeNewItem() {
    Item item = new Item();
    item.name = request["name"];
    item.description = request["description"];
    item.quantity = request["quantity"];
    item.value = someComplexCalculation();
    item.totalValue = item.quantity * item.value;
    // calculate and store some more fields here
    items.addNewItem(item);
}

public void storeExistingItem() {
    Item item = items.get(request["itemId"]);
    item.name = request["name"];
    item.description = request["description"];
    item.quantity = request["quantity"];
    item.value = someComplexCalculation();
    item.totalValue = item.quantity * item.value;
    // calculate and store some more fields here
    item.update();
}

These two functions should look pretty similar. In fact, they are nearly identical. Both acquire an item, populate it with data, and then store it. The only difference is the way that the item is acquired and the way that the item is stored.

First Attempt

I wanted to merge these methods, and this was my first attempt.

public void storeItem() {
    Item item;
    if (request["itemId"] == null) {
        item = new Item();
    } else {
        item = items.get(request["itemId"]);
    }

    item.name = request["name"];
    item.description = request["description"];
    item.quantity = request["quantity"];
    item.value = someComplexCalculation();
    item.totalValue = item.quantity * item.value;
    // calculate and store some more fields here

    if (request["itemId"] == null) {
        items.addNewItem(item);
    } else {
        item.update();
    }
}

This works, but is obvious crap. I found myself saying "I wish Item were able to handle those details by itself".

Second Attempt

Well, I wasn't brave enough to change Item, so I instead wrapped it.

public void storeItem() {
    Persister persister;
    if (request["itemId"] == null) {
        persister = new NewItemPersister();
    } else {
        persister = new ExistingItemPersister(request["itemId"]);
    }

    Item item = persister.getItem();
    item.name = request["name"];
    item.description = request["description"];
    item.quantity = request["quantity"];
    item.value = someComplexCalculation();
    item.totalValue = item.quantity * item.value;
    // calculate and store some more fields here
    persister.persist();
}

interface Persister {
    Item getItem();
    void persist();
}

class NewItemPersister implements Persister {
    private Item item = new Item();
    
    public Item getItem() { return item; }
    
    public void persist() { items.addNewItem(item); }
}

class ExistingItemPersister implements Persister {
    private Item item;
    
    public ExistingItemPersister(String itemId) {
        item = items.get(itemId);
    }
    
    public Item getItem() { return item; }
    
    public void persist() { item.update(); }
}

We still have an ugly if at the top of the function, and we have certainly ballooned the code. I still think that this is better than what we started with.

There is less duplication, which will make maintenance here much easier.
The Persister interface could be made into a generic class, and the implementations could be re-used all over the system. Some reflection here could really simplify your life.
A good web framework would allow you to remove that pesky initial if statement. In a less good framework, you could hide this behind some sort of object that knows how to generate a persister from an itemId (or null).

The practical upshot is that these changes should make it easier to apply metaprogramming techniques to this chunk of code. The only code that can't really be made declarative is some of the code which assigns values to fields.

There is one thing that bothers me, though. We have made the Persister implementors responsible for the lifetime of the Item. That's not at all clear from the interface, but it is obvious from the use. The tell-tale sign is that we have a getItem() method. Getters that expose their class' internals like this are evil, and if you don't believe me, you're just plain wrong. I won't try to justify that statement in this post, but trust me.

Third Attempt

To solve this, we could change the interface yet again (and I will switch to Javascript, because Java doesn't have any convenient lambda syntax).

function storeItem() {
    var persister;
    if (request["itemId"] == null) {
        persister = newItemPersister;
    } else {
        persister = new ExistingItemPersister(request["itemId"]);
    }
    
    persister.update(function(item) {
        item.name = request["name"];
        item.description = request["description"];
        item.quantity = request["quantity"];
        item.value = someComplexCalculation();
        item.totalValue = item.quantity * item.value;
        // calculate and store some more fields here
    });
}

var newItemPersister = {
    update:function(f) {
        var item = new Item();
        f(item);
        items.addNewItem(item);
    }
};

function ExistingItemPersister(itemId) {
    this.itemId = itemId;
}

ExistingItemPersister.prototype.update = function(f) {
    // use the id captured by the constructor rather than reaching back into the request
    var item = items.get(this.itemId);
    f(item);
    item.update();
};

Now, the item's lifetime is only as long as a call to update() is on the stack. This is a common idiom in Ruby, as well.
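
The earlier idea of hiding the initial if behind some object that knows how to generate a persister from an itemId (or null) could look roughly like the sketch below. Everything outside the two persisters is a stand-in: items is an in-memory fake, Item.update only records that it was called, persisterFor is a name I made up, and storeItem takes the request as a parameter so the sketch can run on its own.

```javascript
// In-memory stand-in for the real item store (an assumption for this sketch).
var items = {
    stored: {},
    nextId: 1,
    addNewItem: function(item) {
        item.id = String(this.nextId++);
        this.stored[item.id] = item;
    },
    get: function(itemId) { return this.stored[itemId]; }
};

function Item() {}
// The real update() would write to storage; the fake only records the call.
Item.prototype.update = function() { this.updated = true; };

var newItemPersister = {
    update: function(f) {
        var item = new Item();
        f(item);
        items.addNewItem(item);
    }
};

function ExistingItemPersister(itemId) {
    this.itemId = itemId;
}

ExistingItemPersister.prototype.update = function(f) {
    var item = items.get(this.itemId);
    f(item);
    item.update();
};

// Hypothetical factory: the only place that still needs the if.
function persisterFor(itemId) {
    if (itemId == null) {
        return newItemPersister;
    }
    return new ExistingItemPersister(itemId);
}

function storeItem(request) {
    persisterFor(request["itemId"]).update(function(item) {
        item.name = request["name"];
        item.quantity = request["quantity"];
    });
}
```

With this in place, storeItem no longer cares whether it is creating or updating; that decision lives entirely in persisterFor.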

Conclusion

In the end, I wasn't completely happy with any of these solutions. I think that things are better than they were before. There are also a number of other permutations that will get it marginally closer to ideal. I think that the real solution is to update Item so that you can create a new item and save an existing item with a single method. After that, the code to choose whether to create a new object or fetch an existing object should be minimal and extractable.
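
As a rough illustration of that last idea, an Item that knows whether it has been stored yet could expose a single save() method. This is only a sketch under my own assumptions: the id field, the save and itemFor names, and the in-memory itemStore are all invented here and are not the project's actual API.

```javascript
// Hypothetical in-memory store standing in for the real persistence layer.
var itemStore = { rows: {}, nextId: 1 };

function Item(id) {
    // An Item without an id has never been stored.
    this.id = id || null;
}

// Single entry point: inserts a new item or overwrites an existing one.
Item.prototype.save = function() {
    if (this.id === null) {
        this.id = String(itemStore.nextId++);
    }
    itemStore.rows[this.id] = { name: this.name, quantity: this.quantity };
    return this.id;
};

// The only remaining decision: create a new object or fetch an existing one.
function itemFor(itemId) {
    if (itemId == null) {
        return new Item();
    }
    var row = itemStore.rows[itemId];
    var item = new Item(itemId);
    item.name = row.name;
    item.quantity = row.quantity;
    return item;
}
```

The calling code then shrinks to fetching an item with itemFor, assigning fields, and calling save().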

I did learn a rule of thumb for deciding when to replace an if statement with polymorphism. If you find yourself saying "I don't want to deal with this, I wish it were handled for me," there's a good chance that you could benefit from some polymorphism. Also, if you find yourself checking the same condition multiple times in a function (as we had in the original implementation), you might want to consider whether polymorphism will help you.

Thursday, August 07, 2008

Experiments in Firmware Hacking

I got a new ethernet-ready printer today, and wanted to add it to my existing wireless network. This is not the usual home wireless network use case - most people want to share an upstream connection with a bunch of wireless clients. I wanted to connect a wired device to a wireless network. I first tried using a spare Airport Express. That worked perfectly. Then, I decided to try getting my Linksys WRT54G v2 to work. When I realized that the stock firmware was definitely not up to the task, I grabbed Tomato. It claims to be solid, fast, and AJAX-y with realtime, SVG charts. How could I resist? However, the settings that I needed weren't obvious at first. After some fiddling, I think I've made it work. I'll share the settings here in case they're useful to somebody.

The most important setting is Wireless Mode (under Basic/Network). Here's my current best understanding of these modes:

Access Point	This is what the Linksys router would do with the default firmware. It allows wireless clients to connect to it in infrastructure mode, and will route packets between the wireless network, the LAN network, and the WAN network (with NAT).
Access Point + WDS	I think this may work like the Airport Express' WDS Remote mode. That would mean that it can accept wireless clients and simultaneously connect to a WDS network.
Wireless Client	This mode appeared to work like the Wireless Ethernet Bridge mode, except with NAT. It appeared that the router can run a DHCP server on the LAN interfaces. It also requires that the WAN port be configured, which seems very strange to me.
Wireless Ethernet Bridge	This is the one that ended up doing what I need. As far as I can tell, the WAN port is disabled. The device connects to an existing wireless network. It will then route packets between the wired and wireless network without NAT. Furthermore, contrary to other reports, it appears that you can connect devices to more than one of the LAN ports. I had both my printer and my laptop connected to LAN ports, and things still seemed to be working.
WDS	I think this may be similar to the Airport Express' "WDS Relay" mode.

It might be fun to pick up a WRT54GL. (The 54G has been simplified and will no longer work with most custom firmware. The 54GL restores the missing features. It appears to have been created specifically so that people can continue to use alternative firmware.)

Sunday, August 03, 2008

Date Formatting in Javascript

Problem

I found myself doing some stuff in Javascript. In particular, I needed to be able to turn a Date into HTML specific to the application that I'm working on. Let's say that the server sends us a task's creation date. We need to format it:

var fromServer = new Date("Sun Aug 03 2008 23:12:52 GMT-0400 (EDT)");  // Date.parse alone returns a number, not a Date

td.innerHTML = format_date(fromServer);

<td>
    <span class="date">2008-08-03</span><span class="time">11:12 PM</span>
</td>

However, I have no guarantee that it will always be a parseable date. In fact, the server sends '-' for dates that don't exist. We still need to output something.

var fromServer = "-";

td.innerHTML = format_date(fromServer);

<td>-</td>

I would like format_date to accept either a Date that should be formatted, or a string that should be passed along verbatim (with only necessary HTML character entity escaping). How can we do this in an object-oriented fashion?

Attempt 1

The default object-oriented mindset would encourage us to use polymorphism. We have an object, and we want to be able to call a method on that object. Well, we have a Date object.

Since this is Javascript, we could stick an extra method onto Date.prototype that would let us do this. While we're at it, we can put a similar function onto String.prototype:

Date.prototype.formatAsAppSpecificHTML = function() {
    return "<span>" + this.getFullYear() + ... + "</span><span>" + this.getHours() ... + "</span>";
}

String.prototype.formatAsAppSpecificHTML = function() {
    return this;
}

function format_date(o) {
    return o.formatAsAppSpecificHTML();
}

There are some problems with this.

We have an ugly, ugly method name. This is because we're mixing abstractions. A Date, in general, shouldn't know how to format itself in this way. Why not? Because it probably doesn't apply to most usages of Date. It may be a common behavior in my application, but it's a nonsensical behavior in your application. Since the method is very context-specific, the name has to be equally specific.
This only works in an "open" language that lets us add methods to an existing class/prototype (Ruby and Lua (and arguably C#) fall into this camp, Java and C++ do not). Even if your language has the necessary support, you still have to wonder whether it's a good idea to handle stuff this magical.
It's not obvious. You don't normally connect Date and String. You don't expect to see methods shared between them. They are very orthogonal primitives. Yet we've tied them together in an unnatural way. In order for somebody to discover this, they need to think to look in two (potentially distant) places in the code.

Attempt 2

Polymorphism is a form of runtime decision making. Rather than use language constructs (such as 'if' and '? :'), polymorphism leverages the power of pointers. Since polymorphism created some problems in this example, what if we switch to a more traditional (i.e. not object-oriented) solution?

function format_date(o) {
    if (o.constructor === Date) {
        return "<span>" + o.getFullYear() + ... + "</span><span>" + o.getHours() ... + "</span>";
    } else {
        return o.toString();
    }
}

This approach also creates some problems. We've simply moved the complexity further up the ladder. Before, the knowledge that a task may or may not have a creation date was strewn across 2 types: Date and String. Looking at either type in isolation, you only see half of the picture. You might not realize that you can take anything that comes from the server and format it correctly. Now, however, that knowledge is pushed in your face. We're making explicit decisions about concrete types wherever we need to. It's easy to get it right once. So far, we're only handling formatting. What if we also want to draw a timeline? What if we want to find the earliest task in a list of tasks? What if we want to relate a task to source control submissions that occurred while the task was active? In all of these cases, you will need to deal with the fact that a task might or might not have a starting date. At some point, somebody's going to forget that they need to check this, and there's going to be a bug.
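
For reference, a filled-in version of this type-checking approach might look like the sketch below. The output format is taken from the sample markup in the Problem section; the helper names pad2 and escapeHtml are mine, I use instanceof rather than comparing constructors, and the date is formatted in the browser's local time zone, which is an assumption about what the application wants.

```javascript
// Left-pad a number to two digits, e.g. 8 -> "08".
function pad2(n) {
    return (n < 10 ? "0" : "") + n;
}

// Escape the characters that are special in HTML text.
function escapeHtml(s) {
    return s.replace(/&/g, "&amp;")
            .replace(/</g, "&lt;")
            .replace(/>/g, "&gt;")
            .replace(/"/g, "&quot;");
}

function format_date(o) {
    if (o instanceof Date) {
        var hours24 = o.getHours();
        // Convert to a 12-hour clock, where both 0 and 12 display as 12.
        var hours12 = hours24 % 12 === 0 ? 12 : hours24 % 12;
        var date = o.getFullYear() + "-" + pad2(o.getMonth() + 1) + "-" + pad2(o.getDate());
        var time = hours12 + ":" + pad2(o.getMinutes()) + " " + (hours24 < 12 ? "AM" : "PM");
        return '<span class="date">' + date + '</span><span class="time">' + time + '</span>';
    } else {
        // Anything that is not a Date is passed along, escaped for HTML.
        return escapeHtml(String(o));
    }
}
```

Even fully written out, the shape of the problem is the same: every caller that does something other than formatting has to repeat the Date-or-not check for itself.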

Attempt 3

If object-oriented didn't work, and procedural didn't work, what are we going to do? Well, actually, I lied. The first attempt used one form of object-oriented abstraction. There are many more. Both of the attempts so far have suffered from primitive obsession. They dealt with both Strings and Dates. In actuality, we don't want to concern ourselves with either of these. We actually have something different - we have an OptionalDate. An OptionalDate knows whether it represents an actual date or whether it represents no date at all. It can format itself correctly in either case, and can be compared to other OptionalDates for sorting purposes. In fact, OptionalDate handles any operation that needs to work with both actual dates and "not dates".

function format_date(o) {
    return o.format();
}

function OptionalDate(d) {
    this.d = d;
}

OptionalDate.prototype.format = function() {
    if (this.d) {
        return "<span>" + this.d.getFullYear() + ... + "</span><span>" + this.d.getHours() ... + "</span>";
    } else {
        return "-";
    }
};

What makes this a better solution? After all, the code looks very similar to the code in attempt 2.

It localizes the code better than attempt 2. Rather than checking for the presence of a date all throughout your code base, you can collect all those if/else statements in one place. You also get the chance for some pretty cool higher-order programming, where OptionalDate has a method that takes a function to be called if the OptionalDate actually has a date.
It also gives us a better place to hang domain-specific code. Hey, business logic has to live somewhere. It never seems right to put it on the primitive objects, and it also doesn't make sense to put it at the highest level of abstraction. Business logic is the foundation upon which you build an application. As a foundation, it needs a separate place to live.
It makes more sense. When a new programmer is brought onto the team, they will be able to better understand just what is going on. This is extremely important. I believe that, if a person ever has a question about the code, it's a good sign that the code should change. That doesn't mean that you actually take the time to refactor the code, but it's a sign that this is a place that could use some attention.
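
To make the higher-order and sorting ideas a little more concrete, here is a minimal sketch of what else could hang off OptionalDate. The method names ifPresent, compareTo, and fromServer are hypothetical, and sorting "no date" after every real date is an arbitrary choice made for the example.

```javascript
function OptionalDate(d) {
    // d is a Date, or null/undefined when there is no date.
    this.d = d || null;
}

// Calls f with the underlying Date if there is one; otherwise returns fallback.
OptionalDate.prototype.ifPresent = function(f, fallback) {
    return this.d ? f(this.d) : fallback;
};

// Orders real dates chronologically and puts "no date" after all of them.
OptionalDate.prototype.compareTo = function(other) {
    if (!this.d && !other.d) return 0;
    if (!this.d) return 1;
    if (!other.d) return -1;
    return this.d.getTime() - other.d.getTime();
};

// Hypothetical helper: turn whatever the server sent into an OptionalDate.
// The server's '-' placeholder and anything unparseable both become "no date".
OptionalDate.fromServer = function(s) {
    if (s === "-") {
        return new OptionalDate(null);
    }
    var ms = Date.parse(s);
    return new OptionalDate(isNaN(ms) ? null : new Date(ms));
};
```

Finding the earliest task then becomes an ordinary sort with compareTo, and code that only makes sense for real dates can go through ifPresent instead of repeating the check.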

My Solution

In the end, I went with something close to Attempt 2. This was actually my first choice; Attempt 1 was purely synthetic. I'm working with a legacy code base, and I'm a little wary about introducing big changes just yet. I'm also very conscious about time. In any case, this is an improvement over what was there before (it just treated the date/time as an opaque string, which wouldn't work at all for my requirements).

Conclusion

There are definitely some problems for which the object-oriented noun/verb paradigm breaks down. Or, perhaps stated more precisely, there are problems where that paradigm confuses more than it helps. However, that isn't true in this case. We were able to introduce some good refactorings even while staying true to the spirit of good design.

You may wonder why I care so much. I mean, any of the attempts would have solved the problem perfectly well. Why spend time even thinking about it? I believe that pragmatism is an important trait in programmers, but so too is learning. Whenever you start working on a problem, you need to choose a direction to pursue. Until you start walking, you'll never make progress. You may find yourself at a dead-end, but you wouldn't have known if you hadn't gone that way. My goal is to develop a strong enough design sense that the path that I choose with little thought tends to be one that will work out in the long run. Programming is both tactical and strategic. Most programmers develop their tactical skill as a natural part of writing code. I'm trying to sharpen my strategic skill.

Blogger Timestamping

Interesting thing about Blogger - it looks like the "Posted by... at..." clause uses the time the post was started, not the time that you eventually push the big Publish button.

F is for Fail

It's really sad to see code copied-and-pasted. I feel like a teacher who is grading a test, only to find that two kids have exactly the same answers. It's even sadder to see copied-and-pasted comments. That's like seeing that two kids have exactly the same answers, and both kids' answers are wrong.