Tuesday, December 25, 2007

Taxicab Numbers

Futurama is one of my favorite TV shows. It's the quintessential show about nerds, produced by nerds, for nerds, and it's chock full of inside jokes in math, science, sci-fi and pop culture.

I'm happy to note that although Futurama, the series, was cancelled after the fourth season, the show is back with a set of direct-to-DVD features. The first one, Bender's Big Score, is available now, and includes lecture by Sarah Greenwald as a special feature on the various math jokes that have appeared in the show. Some are silly, like π-in-one oil, or Klein's Beer (in a Klein bottle, of course). Others are subtle, like the repeated appearance of taxicab numbers.

Taxicab numbers are an interesting story in and of themselves. The story comes to us from G. H. Hardy, who was visiting Srinivasa Ramanujan:

I remember once going to see him when he was lying ill at Putney. I had ridden in taxi-cab No. 1729, and remarked that the number seemed to be rather a dull one, and that I hoped it was not an unfavourable omen. "No", he replied, "it is a very interesting number; it is the smallest number expressible as the sum of two [positive] cubes in two different ways."

Many of the staff involved in creating Futurama received advanced degrees in math, computer science and physics, so they enjoy making these obscure references, because they know their fans will find them. It shouldn't be surprising that the number 1729 makes many appearances in Futurama, or that the value 87539319 appears as an actuall taxicab number (87539319 is the sum of 2 cubes in 3 different ways).

Of course, the idea of taxicab numbers is interesting, the details less so. After watching the new feature, I wondered what Hardy's taxicab number was, and what two pairs of cubes add up to that value. If I were more mathematically inclined, I'd probably sit down with pencil and paper and just do the math until I found something. If I were truly lazy, I'd just Google for the answer. Instead, I sat down, fired up vim and typed this code up, pretty much as you see it here:

cube x = x * x * x

taxicab n = [(cube a + cube b, (a, b), (c, d))
| a <- [1..n],
b <- [(a+1)..n],
c <- [(a+1)..n],
d <- [(c+1)..n],
(cube a + cube b) == (cube c + cube d)]

I want to find the smallest taxicab number, but searching an unbounded range will not lead to a result. Therefore, it's necessary to search a bounded range. If there are any taxicab numbers among the integers [1..n], then this function will find them. Moreover, if there are many solutions to this problem in that range, this function will find all of them.

I could go into details, but my explanation would not be as clear and precise as the code above.

Here are some results:

*Main> taxicab 10
*Main> taxicab 12
*Main> taxicab 20

Ordinarily, I wouldn't blog about something so trivial, but as I stared at the code and the output from ghci, it reminded me of an incident in college. The Chudnovsky brothers came up with a new algorithm to compute π that was both very precise and very quick. A professor of mine wrote a quick, throwaway implementation of this algorithm in Maple to demonstrate it. Most of my work in those days was in C. Maple wasn't a tool I'd normally use, but it was easily the best tool for the job.

The lesson my professor was trying to teach me wasn't to use computer algebra systems to indulge a fixation with transcendental numbers. Instead, he was trying to show me that a true computer scientist is presented with a problem, writes a program to solve it, and moves onto the next problem.

I could have written my taxicab number search in C or Java, but I would have lost interest somewhere between #include <stdio.h> or public static void main(String args[]) and firing up the compiler. I could have written the code in Perl, but I would have likely lost interest around the time I saw a nested for loop, and contemplated the four levels of nesting.

Perl programmers know Larry Wall's vision for Perl - to make the easy things easy and the hard things possible. As I stared at this code, I realized that Larry's dictum misses an important cornerstone: to make the trivial things trivial.

Haskell may get a bad rap as making hard things easy and easy things hard, but it does make trivial things trivial. Whether it's list comprehensions solve a problem in a single statement, or parser combinators that make parsing code read like the grammar it implements, there's a lot to be said for a language that gets out of your way and lets you say precisely what you mean.

Wednesday, December 19, 2007

More thoughts on rewriting software

I started writing about when it is acceptable to consider rewriting a software project a few months back (part one and two). I remain convinced that it is occasionally economically sound to toss a problematic codebase and start anew. There are dozens of pesky little details to consider, like:

  • Is the code horrifically overcomplicated, or just aesthetically unpleasing?

  • How soon before the new system can replace the old?

  • How soon before there's a net benefit to the rewrite?

  • Are you rewriting to address fundamental flaws, or adopt this week's fashionable language/library/architecture/buzzword?

  • Do you understand the problem the code attempts to solve? Really? Really?

  • Since the project began, is there a new piece of off-the-shelf piece of software that makes a lot of this pain simply go away?

Steve Yegge looks at the problem from a different angle in his essay Code's Worst Enemy. (See Reginald Braithwaite's synopsis if you want to read just the choice quotes.) Steve argues that code size is easily the riskiest element in a project:

My minority opinion is that a mountain of code is the worst thing that can befall a person, a team, a company. I believe that code weight wrecks projects and companies, that it forces rewrites after a certain size, and that smart teams will do everything in their power to keep their code base from becoming a mountain. Tools or no tools. That's what I believe.

Steve's essay focuses on behaviors endemic among many Java programmers that lead to large codebases that only get larger. As codebases grow, they become difficult to manage, both in terms of the tools we use to build software from the code (IDEs and the like), as well as the cognitive load needed to keep track of what 20,000 pages of code do on a 1 million line project.

The issues aren't specific to Java, or even Java programmers. Rather, they stem from an idea that the size of a code base has no impact on the overall health of a project.

The solution? Be concise. Do what it takes to keep a project small and manageable.

Of course, there are many ways to achieve this goal. One way is to rewrite large, overgrown projects and rebuild them from scratch. This could be done in the same language (building upon the skills of the developers currently on the project), or porting the project to a new language. Another strategy involves splitting a large project into two or more isolated projects with clearly defined interfaces and responsibilities. Many more alternatives exist that don't involve ignoring the problem and letting a 1 million line project fester into a 5 million line project.

Interestingly, Jeff Atwood touched upon the issue as well, in his essay Nobody Cares What Your Code Looks Like. Jeff re-examines the great Bugzilla rewrite, when the team at Netscape decided in 1998 to convert the code from Tcl to Perl. The goal was ostensibly to encourage more contributions, since few people would be interested in learning Tcl to make contributions to Bugzilla. Nine years later, and the Bugzilla team considers Perl to be its biggest liability, because Perl is stagnating, and Perl culture values the ability of every Perl programmer to speak a slightly different dialect of Perl. Newer projects written in PHP, Python, Java and Ruby outcompete Bugzilla because (in a couple cases) they are comparable to Bugzilla today (without taking nine years of development to reach that plateau), and can deliver new features faster than the Bugzilla team can.

Nevertheless, Jeff stresses that although it may be ugly, harder to customize and extend, Bugzilla works. And it has worked for nearly a decade. And numerous large projects use it, and have no intentions to switch to the new bug tracker of the week anytime soon. So what if the code is ugly?

There's a lesson here, and I don't think it's Jeff's idea that nobody cares what your code looks like. Jeff is right that delivering value to customers is always most important, whether they are paying customers that keep your company in business, or users who rely on your open source project to supply a critical piece of infrastructure. He's also right that customers couldn't care less if your code uses Factories, Decorators and Iterators, or Closures, Catamorphisms and Currying. It is a disservice to your customers if you force them to wait while you rewrite your software from a static language (like Java) to a dynamic language (like Ruby), because dynamic languages are all the rage this year. Are you going to go through the same song and dance when type inferencing becomes popular? Massive concurrency?

While customers don't care about how a product is written, they do care about the effects of that choice. If your software crashes a lot, needs frequent security patches and occasionally corrupts data, well, maybe you shouldn't have written your timesheet application in C. If your rendering application is slow, uses ludicrous amounts of memory, then maybe you shouldn't have written it in Ruby. If your application provides a key piece of infrastructure, and there's literally no one who could replace you if you get hit by a bus, then maybe you shouldn't have written it in APL.

And if your application is written in a manner that leads to a huge codebase that makes it hard to find and fix bugs, costly to extend and requires a floor full of programmers to maintain, maybe you owe it to your customer to find a way to reduce the size of your codebase and deliver more value. Because sooner or later, that customer just might find a similar application, built with a smaller codebase and less internal complexity that's cheaper own and faster to customize and fix. And when it becomes too costly or too painful to maintain your application, they will switch.