Notes on Haskell: August 2007

Tuesday, August 21, 2007

Rewriting Software

Ovid is wondering about rewrite projects. It’s a frequent topic in software, and there’s no one answer that fits all situations.

One of the clearest opinions is from Joel Spolsky, who says rewrites are “the single worst strategic mistake that any software company can make”. His essay is seven years old, and in it, he takes Netscape to task for open sourcing Mozilla, and immediately throwing all the code away and rewriting it from scratch. Joel was right, and for a few years Mozilla was a festering wasteland of nothingness, wrapped up in abstractions, with an unhealthy dose of gratuitous complexity sprinkled on top. But this is open source, and open source projects have a habit of over-estimating the short term and under-estimating the long term. Adam Wiggins revisited the big Mozilla rewrite issue earlier this year when he said:

[W]hat Joel called a huge mistake turned into Firefox, which is the best thing that ever happened to the web, maybe even the best thing that’s ever happened to software in general. Some “mistake.”

What’s missing from the discussion is an idea from Brian Eno about the differences between the “small here” vs. the “big here”, and the “short now” vs. the “long now”. Capsule summary: we can either live in a “small here” (a great apartment in a crappy part of town) or a “big here” (a beautiful city in a great location with perfect weather and amazing vistas), and we can live in a “short now” (my deadline is my life) or a “long now” (how does this project change the company, the industry or the planet?).

On the one hand, Joel’s logic is irrefutable. If you’re dealing with a small here and a short now, then there is no time to rewrite software. There are revenue goals to meet, and time spent redoing work is retrograde, and in nearly every case poses a risk to the bottom line because it doesn’t deliver end user value in a timely fashion.

On the other hand, Joel’s logic has got more holes in it than a fishing net. If you’re dealing with a big here and a long now, whatever work you do right now is completely inconsequential compared to where the project will be five years from today or five million users from now. Requirements change, platforms go away, and yesterday’s baggage has negative value — it leads to hard-to-diagnose bugs in obscure edge cases everyone has forgotten about. The best way to deal with this code is to rewrite, refactor or remove it.

Joel Spolsky is arguing that the Great Mozilla rewrite was a horrible decision in the short term, while Adam Wiggins is arguing that the same project was a wild success in the long term. Note that these positions do not contradict each other. Clearly, there is no one rule that fits all situations.

The key to estimating whether a rewrite project is likely to succeed is to first understand when it needs to succeed. If it will be evaluated in the short term (because the team lives in a small here and a short now), then a rewrite project is quite likely to fail horribly. On the other hand, if the rewrite will be evaluated in the long term (because the team lives in a big here and a long now), then a large rewrite project just might succeed and be a healthy move for the project.

Finally, there’s the “right here” and “right now” kind of project. Ovid talks about them briefly:

If something is a tiny project, refactoring is often trivial and if you want to do a rewrite, so be it.

In my experience, there are plenty of small projects discussed in meetings where the number of man hours discussing a change or rewrite far exceeds the amount of time to perform the work, often by a factor of ten or more. Here, the answer is clear — just do the work, keep a back up for when you screw up, and forget the dogma about rewriting code. If it was a mistake, rolling back the change will also take less time than the post mortem discussion.

Ovid raises another interesting point: large projects start out from smaller ones, so if it’s OK to rewrite small projects, and small projects slowly turn into large projects, when does it become unwise to rewrite a project?

The answer here isn’t to extrapolate based on project size, but rather on the horizon. A quick little hack that slowly morphs into something like MS Word will eventually become rewrite-averse due to short term pressures. A quick little hack that slowly morphs into something like Firefox will remain somewhat malleable, so long as it can take a long time to succeed.

Monday, August 20, 2007

Universal Floating Point Errors

Steve Holden writes about Euler’s Identity, and how Python can’t quite calculate it correctly. Specifically,

e^{i π} + 1 = 0

However, in Python, this isn’t quite true:

>>> import math
>>> math.e**(math.pi*1j) + 1
1.2246063538223773e-16j

If you note, the imaginary component is quite small: -1 x 10^-16.

Python is Steve’s tool of choice, so it’s possible to misread his post and believe that python got the answer wrong. However, the error is fundamental. Witness:

$ ghci
Prelude> :m + Data.Complex 
Prelude Data.Complex> let e = exp 1 :+ 0
Prelude Data.Complex> let ipi = 0 :+ pi
Prelude Data.Complex> e
2.718281828459045 :+ 0.0
Prelude Data.Complex> ipi
0.0 :+ 3.141592653589793
Prelude Data.Complex> e ** ipi + 1
0.0 :+ 1.2246063538223773e-16

As I said, it would be possible to misread Steve’s post as a complaint against Python. It is not. As he says:

I believe the results would be just as disappointing in any other language

And indeed they are, thanks to irrational numbers like π and the limitations of IEEE doubles.

Updated: corrected uses of -iπ with the proper exponent, iπ.

Wednesday, August 15, 2007

Does Syntax Matter?

An anonymous commenter on yesterday’s post posits that Haskell won’t become mainstream because of the familiar pair of leg irons:

I think one of the biggest problems in Haskell, aside from it not being very easy (whats a monad?), is syntax.

There are many reasons why Haskell may not become mainstream, but syntax and monads aren’t two of them. I’m a populist, so I get offended when a language designer builds something that’s explicitly designed to drag masses of dumb, lumbering programmers about half way to Lisp, Smalltalk, or some other great language. I want to use a language built by great language designers that they themselves not only want to use, but want to invite others to use.

I could be wrong here. Maybe being a ‘mainstream programming language’ is means designing something down to the level of the great unwashed. I hope not. I really hope not. But it could be so. And if it is, that’s probably the one and only reason why Haskell won’t be the next big boost in programming language productivity. That would also disqualify O’Caml, Erlang and perhaps Scala as well. Time will tell.

But syntax? Sorry, not a huge issue.

Sure, C and its descendants have a stranglehold on what a programming language should look like to most programmers, but that’s the least important feature a language provides. Functional programmers, especially Lisp hackers have been saying this for decades. Decades.

A language’s syntax is a halfway point between simplifying the job of the compiler writer and simplifying the job of the programmer. No one is going back and borrowing syntax from COBOL, because it’s just too damn verbose and painful to type. C is a crisp, minimal, elegant set of constructs for ordering statements and expressions, compared to its predecessors.

Twenty years ago, the clean syntax like C provided made programming in all caps in Pascal, Fortran, Basic or COBOL seem quaint. Twenty years from now, programming with curly braces and semicolons could be just as quaint. Curly braces and semicolons aren't necessary, they're just a crutch for the compiler writer.

To prove that syntax doesn’t matter, I offer 3 similar looking languages: C, Java (or C#, if you prefer) and JavaScript. They all use a syntax derives from C, but they are completely separate languages. C is a straight forward procedural language, Java is a full blown object oriented language (with some annoying edge cases), and JavaScript is a dynamic, prototype-based object oriented language. Just because a for loop looks the same in these three languages means absolutely nothing.

Knowing C doesn’t help you navigate the public static final nonsense in Java, nor does it help you understand annotations, inner classes, interfaces, polymorphism, or design patterns. Going backward from Java to C doesn’t help you write const-correct code, or understand memory allocation patterns.

Knowing C or Java doesn’t help much when trying to use JavaScript to its full potential. Neither language has anything resembling JavaScript’s dynamic, monkeypatch everything at runtime behavior. And even if you have a deep background in class-based object oriented languages, JavaScript’s use of prototypes will strike you as something between downright lovely and outright weird.

If that doesn’t convince you, consider the fact that any programmer worthy of the title already uses multiple languages with multiple syntaxes. These typically include their language of choice, some SQL, various XML vocabularies, a few config file syntaxes, a couple of template syntaxes, some level of perl-compatible regular expressions, a shell or two, and perhaps a version or two of make or a similar utility (like Ant or Maven).

Add that up, and a programmer can easily come across two dozen different syntaxes in a single project. If they can’t count that high, it’s not because they do all their work in a single syntax[1], but because it takes too much effort to stop and count all of the inconsequential little syntaxes. (Do Apache pseudo-XML config files count as a separate syntax? Yeah, I guess they do. It took that Elbonian consultant a day to track down a misconfigured directive last year…)

So, no, Mr. Anonymous. Haskell’s syntax isn’t a stumbling block. You can learn the basics in an afternoon, get comfortable within a week, and learn the corner cases in a month or two.

Now, as for monads - the problem with monads is that they seem harder to understand than they really are. That is, it is more difficult to explain what a monad is than it is to gain a visceral understanding of what they do. (I had this same problem when I was learning C — it was hard to believe that it was really that simple.)

If you caught my introduction to Haskell on ONLamp (parts 1, 2 and 3), you may have seen this tidbit right before the end of part 3:

[M]onads enforce an order of execution on their statements. With pure functions, sub-expressions may be evaluated in any order without changing their meaning. With monadic functions, the order of execution is very important.

That is, monads allow easy function composition that also ensures linear execution, much like you would expect from writing a series of statements within a function in C, a method in Java, or a block of Javascript. There are other interesting properties of monads, but this is the most fundamental.

[1]: Lisp and Smalltalk programmers might honestly count one single syntax for all their work. :-)

More SPJ

If, like me, you couldn’t make it to OSCon this year and missed Simon Peyton Jones’ presentations, then you may want to catch up with these videos:

A Taste of Haskell
- Part 1 (download), 1:18
- Part 2 (download), 1:51
- Slides

Nested Data Parallelism
(an earlier version, presented to λondon Haskell Users Group)
- Video, 1:21
- Slides

Enjoy!

Haskell: more than just hype?

Sometimes I wonder if the Haskell echo chamber is getting louder, or if programmers really are fed up with the status quo and slowly drifting towards functional programming. My hunch is that this is more than a mere echo chamber, and interest in Haskell and functional programming is for real.

I’m hardly an objective observer here, since I’m solidly within the echo chamber (note the name of this blog). Functional programming has been on my radar for at least a decade, when I first started playing with DSSSL. Coding with closures now feels more natural than building class hierarchies and reusing design patterns, regardless of the language I am currently using.

If you still aren’t convinced that Haskell is more than a shiny bauble lovingly praised by a lunatic fringe, here are some recent data points to consider, all involving Simon Peyton Jones:

Bryan O’Sullivan pointed out last week that the Simon’s talks are the most popular videos from OSCon:
“Simon’s Haskell language talks are the most popular of the OSCON videos, and have been viewed over 50% more times than the next ten most popular videos combined.”

In Simon’s keynote presentation at OSCon last month, he points out that threading as a concurrency model is decades old, easy enough for undergraduates to master in the small, but damn near impossible to scale up to tens, hundreds or thousands of nodes. The research on threads has stalled for decades, and it’s time to find a better concurrency model that can help us target multiprocessor and distributed systems. (Nested data parallelism and software transactional memory both help, and are both available for experimentation in Haskell today.)

An interview with Simon and Erik Meijer that introduces the topic of nirvana in programming.

Start by separating programming languages along two axes: pure/impure and useful/useless. The top left has impure and useful languages like C, Java and C#. The bottom right has pure and useless languages like Haskell (at least before monadic I/O). The sweet spot is the top right, where a pure, useful language would be found, if it existed.

However, the C# team is borrowing heavily from Haskell to produce LINQ, and the Haskell research community is folding the idea back into Haskell in the form of comprehensive comprehensions. Both languages are slowly converging on nirvana: C# is getting more pure, and Haskell is getting more useful.

The subtext in all of these tidbits is that computing is not a static discipline. We know that hardware improves at a predictable pace, in the form of faster CPUs, cheaper memory, denser storage and wider networks.

Software, or at least the way we construct software, also improves over time. One unscientific way to express this is through relative differences in productivity between programming languages. It’s reasonable to expect a project to take less time to create when using Java instead of C, and even less time when using Ruby or Python instead of Java or C#.

By extension, there should be a new language that is even more productive than Python or Ruby. I happen to think that language will be Haskell, or at least very similar to Haskell. But it might also be OCaml, Erlang or Scala. In any case, Simon Peyton Jones is right — the way forward involves functional programming, whether it means choosing a language like Haskell, or integrating ideas from Haskell into your language of choice.

Notes on Haskell