Monday, March 12, 2007

Say what you mean, mean what you say

Reg Braithwaite wrote an interesting tribute to John Hughes' paper Why Functional Programming Matters. In it, Reg points out the difficulty of using iterative methods to express underlying intent:
int numberOfOldTimers = 0;
for (Employee emp: employeeList) {
for (Department dept: departmentsInCompany) {
if (emp.getDepartmentId() == dept.getId()
&& emp.getYearsOfService() > dept.getAge()) {
++numberOfOldTimers;
}
}
}
This just goes to show that there is simply too much code necessary to solve this simple problem in an imperative fashion. It's not an indication that the code is poorly written, but rather, that imperative techniques are a bad fit because they obscure the overall intent.

The nested for loop isn't the goal. The goal is finding the number of employees who have been in the company longer than their department (i.e. "old timers"). As I read this code, I kept re-reading it because I thought I found a bug. I thought the code was counting all employees who have been in the company longer than any department, but that was my misreading. I kept glossing over this clause, which focuses the search to employees who have been with the company longer than their own department:
emp.getDepartmentId() == dept.getId()
Sadly, it's too easy to fall into this trap. There's a lot of stuff to read in this small block of code, and I certainly found it easy to gloss over an important detail. Undoubtedly, the opposite problem is also common: neglecting an important detail in the middle of a series of nested loops; finding those bugs also takes skill and effort to identify and fix.

Reg does a good job explaining why the goal isn't a nested for loop, and why using functional programming techniques (like generating a cartesian product, and filtering the result) bring the code closer to the goal -- counting the number of old timers.

Reg makes the point that what really matters are the functional techniques, not the language or the syntax used. Here's his revised solution, in Ruby (with an extension to support cartesian products on lazy lists):
old_timers = (employees * departments).select do |emp, dept|
emp.department_id == dept.id && emp.years_of_service > dept.age
end
number_of_old_timers = old_timers.size
Generating cartesian products just might be one of the fundamental ideas of computing. In an imperative language, they usually take the form of nested loops. In a language that supports list comprehensions (like Haskell or Python) it becomes a basic idea that melts away with just a little syntax:
[(x, y) | x <- [1..5], y <- [1..10]]
Not only are the products computed as a list, but in Haskell, they're computed lazily, so you only compute what you need.

Here's a revised solution to the old timers problem using list comprehensions:
length [1 | e <- emps, d <- depts, 
(empDeptId e) == (deptId d),
(yearsOfService e) > (age d)]
Note that the actual values returned in this list are inconsequential; the only thing that matters is counting the number of matches. Using length on this list comprehension highlights the fact that the goal here is to produce a count of matching elements. This intention is muddled within the nested for loops, and isn't precisely clear within the Ruby version.

6 comments:

jaydonnell said...

Isn't that imperative code akin to a strawman? Wouldn't it most likely be written something like this?

int numberOfOldTimers = 0;
for(Employee emp: employeeList){
if(dept = emp.getDepartment() && emp.getYearsOfService() > dept.getAge()){
++numberOfOldTimers;
}
}

The select is cleaner, but imperative code isn't as bad as the example would make it seem.

Anonymous said...

I think this is simply bias.

I find the iterative one more readable. The for loops you sort of skip by; you recognize them virtually immediatly as the 'framing' of the logic, exactly analogous to how the 'select' operation in the FP example does a similar thing; it separtes the WHAT from the HOW. As long as this is easy (and it seems to be in both) we're in good shape.

Then the math: This is exactly the same in both samples; an equals and a numeric compare joined together in logical and. The syntax is virtually identical, in fact.

Finally, there's the act of displaying the total. This is arguably nicer in the FP example, but further inspection shows this is a red herring: There's absolutely no reason why the imperative code counts instead of adds, so in theory both could have finished with some sort of 'selectedEmployees.size()' with no problem.

Thus, I conclude this is simply bias: I find the imperative example more readable because I tend to work more with imperative code, and I'm betting you find the FP example more readable because that's how your brain prefers to interpret code.

This example doesn't strike me as a particularly appropriate showcase for inverting the logic order, by the way; there are plenty of examples where a select() operation really does come across as much nicer generally, even to a brain that is pre-disposed to reading for loops more than selections.

Of course, the opposite is true as well, and thus it helps if a language can do both, but on the whole this doesn't really make or break it for me; it's far too trivial and irregardless easily duplicated in any language that has some form of 'function block' notation, and I don't know any short of older variants of basic that don't have that.

Unknown said...

Python list comprehensions won't allow cartesian products AFAIK. [(x, y) for x, y in cartesian (xs, ys)] is as good as it gets but only if you write the cartesian function (as a lazy generator probably).

Anonymous said...

@ Jakub:

>>> [(x, y) for x in range(2) for y in range(3)]
[(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)]

Anonymous said...

jaydonnell: "Isn't that imperative code akin to a strawman?"

Yep, that's why functional programming matters!

Basically, this guy takes a silly solution to a silly problem, shows how that solution is wordy in an imperative language and less so in a functional language, and pulls out his desired conclusion: that Functional Programming is The Solution. Gah!

Frédéric said...

The link to John Hughes' paper is broken.
His paper can now be found here (it is not hosted by the "math" department anymore but the "cse" dpt.)