tag:blogger.com,1999:blog-66557461120527209332024-03-07T03:08:47.163-05:00Notes on HaskellAdam Turoffhttp://www.blogger.com/profile/11941071792943377879noreply@blogger.comBlogger44125tag:blogger.com,1999:blog-6655746112052720933.post-15034692434134010032010-03-24T22:47:00.002-05:002010-03-24T22:50:36.105-05:00Celebrating Ada Lovelace Day - Dr. Rebecca Mercuri<p>In honor of <a href="http://findingada.com/about/">Ada Lovelace Day</a>, I want to shine a light on a very important woman in modern computing: <a href="http://en.wikipedia.org/wiki/Rebecca_Mercuri">Dr. Rebecca Mercuri</a>. </p><p>I was lucky enough to have Dr. Mercuri (while she was working on her Ph. D.) as my professor for Programming Language Concepts, the first serious comp sci course in the curriculum when I was in school. This was the course after the intro programming courses where you learned the mechanics and basic data structures, but before the serious topics like system architecture, operating systems and artificial intelligence. </p><p>The course had a rather large footprint. It was the first time we used C as the language of instruction, and the first time we had to work on a project that spanned the entire term. The course was all about parsing (specifically writing recursive descent parsers by hand), and slowly moved towards writing a small Scheme interpreter by the end of it all. That's a lot of material to cover in 10 weeks, especially for a sophomore. </p><p>Although the class was daunting, the goal wasn't to learn Scheme, C or basic parsing theory. Dr. Mercuri told us in as many words that this was "a class about learning how to learn a programming language". A good computer science curriculum isn't about learning today's set of fashionable skills, it's about learning the fundamentals. Languages come and go (how many have <i>you</i> used over the past 5 years? 10 years?), but the fundamentals don't change nearly as much. </p><p>Sometimes, I think I owe my career to Dr. Mercuri. 
Whenever I go back to the well and pull something out of my undergrad education, more often than not it's something I learned in that one course. Thank you, Dr. Mercuri. </p><p>While I feel privileged to have been one of her many students over the years, that's probably not why you should know about her work. Dr. Mercuri is one of the few people who have spent the last few <i>decades</i> speaking out about the <a href="http://www.notablesoftware.com/evote.html">evils of electronic voting machines</a>. When shrouded in secrecy, these inscrutable proprietary systems subvert the very democratic processes they claim to promote. (HBO's documentary <a href="http://www.hackingdemocracy.com/">Hacking Democracy</a> illustrates some of the problems with actual systems in use in the field.) </p><p>The issue isn't that electronic voting machines are an inherently bad idea. Electronic voting machines <i>can</i> simplify balloting and speed tabulation in a secure manner, if they offer voter-verified balloting (also known as the "Mercuri Method"). Current systems merely speed tabulation with no way to detect tampering or perform a recount, making their results worth less than the paper they're printed on. </p><p>The world needs more people like Rebecca Mercuri. And we all need to join her to help voter verified ballot systems sweep away egregiously bad electronic voting machines.</p>Adam Turoffhttp://www.blogger.com/profile/11941071792943377879noreply@blogger.com0tag:blogger.com,1999:blog-6655746112052720933.post-7679221640718727732008-09-16T22:39:00.002-05:002008-09-16T23:04:11.971-05:00Closures and Scoping Rules<p>Although the <i>concept</i> of a closure as a programming language construct is fairly clear, some of the details are a little fuzzy around the edges. The central ideas are that (a) closures are anonymous functions (b) created as first class values that (c) capture variables in the enclosing lexical scope. 
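These three properties are easy to see in a language that has had closures from the start. Here is a minimal sketch in Haskell (the names <tt>makeAdder</tt> and <tt>addFive</tt> are purely illustrative):

```haskell
-- makeAdder returns an anonymous function (\x -> x + n) that
-- (a) is a lambda, (b) is handed back as a first class value, and
-- (c) captures the variable n from the enclosing lexical scope.
makeAdder :: Int -> (Int -> Int)
makeAdder n = \x -> x + n

-- addFive carries its captured n = 5 wherever it goes.
addFive :: Int -> Int
addFive = makeAdder 5

main :: IO ()
main = do
  print (addFive 10)                     -- prints 15
  print (map (makeAdder 100) [1, 2, 3])  -- prints [101,102,103]
```

Each call to <tt>makeAdder</tt> produces a new function closed over its own <tt>n</tt>; that function can be stored, passed around, and applied long after <tt>makeAdder</tt> has returned.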
Unfortunately, the notions of "variable" and "enclosing lexical scope" vary subtly across languages. These variations lead to odd edge cases that can be demonstrated with contrived examples, but can also lead to unexpected behaviors in legitimate code in some languages.</p><p>Take, for example, the much <a href="http://notes-on-haskell.blogspot.com/2007/02/whats-wrong-with-for-loop.html">maligned</a> <tt>for</tt> loop in C:</p><pre> if (x) {<br /> for (int i = 0; i < 10; i++) {<br /> if (i == 2) <br /> break;<br /> /* ... */<br /> }<br /> <br /> if (i < 10) {<br /> /* ... */<br /> }<br /> }</pre><p>This snippet raises two important questions: What is the scope of <tt>int i</tt>? And what <i>should</i> it be? Early C compilers avoided the question entirely by forcing variables to be declared before the first statement. C++ and C99 relaxed those rules to allow variables to be declared within the body of a function.</p><p>In a language that doesn't support closures or variable capture, these questions are mostly meaningless, unless you're writing a compiler or trying to out-geek the nerd in the next cube. Now that closures are on the table, the finer points of language design have important and subtle consequences.</p><p>There are a few ways to interpret this snippet of C code. One way is to consider the <tt>for</tt> loop a lexical block, and the lifespan of <tt>int i</tt> is limited to the loop itself; the initialization, condition and increment clauses are merely syntactic sugar that are still tied to the loop itself.</p><p>Another way to look at the code is to interpret the <tt>int i</tt> declaration as appearing immediately before the loop, making it visible to any subsequent point in the same block (the <tt>if</tt> block in this example). 
This is useful if you want to inspect the value of the loop index <i>after</i> the loop terminates to check whether the loop exited prematurely.</p><p>A third way to look at the code is to have the declaration of <tt>int i</tt> be visible to <i>any</i> subsequent point in the body of the function that declares it. This would mean that although the declaration may appear within the <tt>if</tt> statement, the variable could be used <i>after</i> the <tt>if</tt> statement, even if the body of the if statement were never executed.</p><p>A fourth way would be to use the declaration of <tt>int i</tt> as a kind of guidance for the parser. That is to say, the parser would enforce the desired scoping rules, but the compiler would always allocate space for the variable that is visible for the entire body of the enclosing function.</p><p>There are other possible interpretations. Sometimes they are actively chosen as the desired way to handle variable scoping. Other times, they are merely a quirk of a buggy implementation. Ruby (pre-1.9) lets a variable from an inner block leak into the enclosing scope:</p><pre> $ ruby --version<br /> ruby 1.8.6 (2008-03-03 patchlevel 114) [universal-darwin9.0]<br /> $ irb<br /> irb> i=0; [1,2,3].each {|i|}; i<br /> => 3</pre><p>(from Sam Ruby's <a href="http://intertwingly.net/slides/2008/oscon/ruby19/1">presentation on Ruby 1.9</a> from OSCon 2008)</p><br /><p>There are many ways to get lexical scoping wrong. But how <i>should</i> it work?</p><p>One of the best examples comes from <a href="http://mitpress.mit.edu/sicp/full-text/book/book.html">SICP</a>, using Scheme, of course:</p><pre> (define (make-account balance)<br /> (define (withdraw amount)<br /> (if (>= balance amount)<br /> (begin (set! balance (- balance amount))<br /> balance)<br /> "Insufficient funds"))<br /> (define (deposit amount)<br /> (set! balance (+ balance amount))<br /> balance)<br /> (define (dispatch m)<br /> (cond ((eq? m 'withdraw) withdraw)<br /> ((eq? 
m 'deposit) deposit)<br /> (else (error "Unknown request -- MAKE-ACCOUNT"<br /> m))))<br /> dispatch)</pre><p>Binding a variable to the result of <tt>(make-account 100)</tt> creates a new object that contains a single variable <tt>balance</tt> with the value of <tt>100</tt>, and two local functions which reference that same variable. Passing the token <tt>'deposit</tt> or <tt>'withdraw</tt> to that object updates the shared reference to <tt>balance</tt>.</p><p>With this example in mind, the C# team at Microsoft wrote a specification for their language that defined a clear behavior for all of these dark corners, and got them right (or at least resolved the ambiguity in previous languages). Of course, they got lexical scoping right, so closures work properly.</p><p>Yet, even with properly implemented lexical scoping, there are still some odd cases programmers need to be aware of. <a href="http://research.microsoft.com/~emeijer/">Erik Meijer</a> recently <a href="http://research.microsoft.com/~emeijer/Blog/LookClosure.html">posted</a> an example in C# that can be confusing:</p><pre> var DelayedActions = new List<Func<int>>();<br /> <br /> for (var i = 4; i < 7; i++)<br /> {<br /> DelayedActions.Add(delegate(){ return i; });<br /> }</pre><p>When the contents of <tt>DelayedActions</tt> are evaluated, they will return the values 7, 7 and 7. And believe it or not, this behavior is correct and proper, even if it is unexpected. Each closure captures a reference to the single variable <tt>i</tt>. 
Once the loop terminates, the variable persists, and each closure has a reference to the final value of that variable, 7.</p><p>A slight change to the code produces a different result:</p><pre> var DelayedActions = new List<Func<int>>();<br /> <br /> for (var i = 4; i < 7; i++)<br /> {<br /> int j = i;<br /> DelayedActions.Add(delegate(){ return j; });<br /> }</pre><p>In this code, each iteration through the loop creates a new variable <tt>j</tt>, which copies the current value of <tt>i</tt>. The results of evaluating this set of closures are the values 4, 5, and 6. The difference is subtle, but important.</p><p>Erik Meijer noticed that with only a few minor tweaks, the C# code could be run as valid JavaScript. The first example, when translated, still produces the values 7, 7, 7. However, the second example produces the values 6, 6, 6.</p><p>Although JavaScript and C# have a C-like syntax in common, they have very different semantics. In JavaScript, declaring a variable inside a <tt>for</tt> loop does <i>not</i> create a new variable each time through the loop. Instead, it is allocated at the <i>start of the function body!</i> Oops. </p><p>All of a sudden, issues that used to be relegated to language lawyers are becoming relevant to everyday developers. And developers today frequently switch between languages, like, say, C# and JavaScript. You can blame Microsoft for doing a lot of things wrong, but their language research group isn't at fault here. If anything, JavaScript is exposing an optimization of dubious value, and letting it change the semantics of the language.</p><br /><p>Incidentally, this kind of misunderstanding simply <i>can't</i> happen in Haskell. That's because closures (lambdas and curried functions) don't capture <i>variables</i>, but rather <i>values</i>. (Remember, variables don't vary in Haskell.) 
Here is a contrived example (using a Ruby-ish approach) that mimics the first C# snippet:</p><pre> each :: [a] -> (a -> b) -> [b]<br /> each = flip map<br /> <br /> example1 :: [a -> Int]<br /> example1 = [4, 5, 6] `each` (\i -> (\_ -> i))<br /> <br /> map (\f -> f ()) example1 -- [4, 5, 6]</pre><p>This example seems to implement the behavior of the first C# loop, but actually implements the behavior of the second C# loop. That's because each delayed action is capturing a distinct <i>value</i>. To capture a shared reference takes more work, because shared references are less idiomatic:</p><pre> import Data.IORef<br /> <br /> eachM :: [a] -> (a -> IO b) -> IO [b]<br /> eachM = flip mapM<br /> <br /> example2 :: IO [t -> IO Int]<br /> example2 = do i <- newIORef 0<br /> r <- ([4, 5, 6] `eachM` \j -> do writeIORef i j<br /> return (\_ -> readIORef i))<br /> modifyIORef i (+ 1)<br /> return r</pre><p>Note the odd type for <tt>example2</tt>. The outer <tt>IO</tt> is needed to actually construct the delayed actions using some form of shared state. Once the list of delayed actions is constructed, then pulling out the values requires doing something stateful. (The IO monad was used here for simplicity, but another stateful monad could be used instead.)</p><p>Evaluating the list of delayed actions is also trickier:</p><pre> example2 >>= mapM (\f -> f ()) -- [7, 7, 7]</pre><p>What's the lesson here? Closures are coming, but the rules around scoping still vary across languages. Programmers tend to hop around between languages today, much more than we did 10 or 20 years ago. 
As we adapt to use closures in our programs, we need to be aware of these arcane issues, and how to avoid them.</p>Adam Turoffhttp://www.blogger.com/profile/11941071792943377879noreply@blogger.com2tag:blogger.com,1999:blog-6655746112052720933.post-11878855867181974192008-09-07T14:22:00.003-05:002008-09-07T21:46:01.949-05:00The Closures are Coming<p>Last month, Chris Lattner at Apple <a href="http://lists.cs.uiuc.edu/pipermail/cfe-dev/2008-August/002670.html">announced</a> support for <a href="http://en.wikipedia.org/wiki/Closure_(computer_science)">closures</a> in C for <a href="http://clang.llvm.org/">Clang</a>, the <a href="http://llvm.org/">llvm</a>-based C compiler Apple has been working on as a potential replacement for the venerable <tt>gcc</tt>. Because Apple is working on this extension, it means that closures are coming to both C and Objective-C.</p> <p>This is a sorely needed feature C has been missing for quite a number of years. The need may not have been pressing two decades ago, in the era of smaller programs on early 32-bit CPUs with kilobytes or maybe a couple of megabytes of RAM. But those days are long behind us, and constructing programs <i>without</i> closures feels clunky, cumbersome, and like a throwback to the era of 8-bit home "computers" running some quirky BASIC dialect.</p> <p>Nearly every major general purpose programming language in use today supports closures or soon will. There are a few exceptions, like <span style="text-decoration:line-through;">PHP</span>, Tcl, Cobol and Fortran that don't have them, and it's debatable whether these languages are either "major", or will get closures "soon". (<b>Update:</b> PHP 5.3 has/will have closures. See below.)</p> <p>C# has them, and has made them easier to use in the most recent version of the language by providing <a href="http://msdn.microsoft.com/en-us/library/bb397687.aspx">lambda expressions</a>. 
Java is likely to get them, and the <a href="http://www.javac.info/">BGGA proposal</a> looks pretty strong. And even if these languages didn't, <a href="http://msdn.microsoft.com/en-us/fsharp/default.aspx">F#</a> (which is basically OCaml by another name) and <a href="http://www.scala-lang.org/">Scala</a> are merging the functional and object oriented disciplines on the CLR and JVM, bringing features like closures and currying to these platforms.</p> <p>The most conspicuous holdout is C, and thanks to Chris Lattner and the llvm team, this is changing. Hopefully their changes will be accepted widely, and this feature will be added to the next C standard.</p> <p>Adding closures to C exposes a few leaky abstractions, though. Local variables in C (including parameter values) live on the stack, which doesn't do much good when a closure needs to be invoked after its enclosing stack frame is gone. Therefore, this change adds a few new features that allow closures to be copied to the heap (via <tt>persistent = _Block_copy(stack_closure)</tt>), reference counted while on the heap, and freed when no longer in use (via <tt>_Block_release(persistent)</tt>). There are a few other wrinkles, and the usage is slightly cumbersome, but, well, it's C, and you should be used to that by now.</p> <p>However, the implications of this change are much more profound.</p> <p>Adding closures to C is tantamount to saying it is unacceptable for a general purpose programming language to omit support for closures. All of the old arguments for withholding them are now void: it <i>can</i> be done in a compiled language. It <i>can</i> be done without using a virtual machine. It <i>can</i> be done in a mainstream language. 
Closures <i>are</i> useful beyond the functional ghetto where they started (and offering just pointers to functions simply doesn't cut it).</p> <p>If closures are available in C, one of the oldest and most widely used languages without closures, what's your excuse for <i>not</i> providing them in your pet language? It may be hard to do, but the techniques are proven, well known, and open for inspection. The utility of providing programmers with the means to use idioms like <tt>map</tt>, <tt>filter</tt> and <tt>fold</tt> (or <tt>reduce</tt>) is too overwhelming to claim it's not worth the time, cost or complexity to implement. </p> <p>The writing is on the wall. If you intend to develop software in the 21st century, you will be using closures, or going to great lengths to avoid them.</p> <p>This may sound like a bold statement, but it really isn't. Programming languages have co-evolved in fits and starts for decades. Some feature, like recursion, proves itself in academia, but is widely regarded as unnecessary in industry. Slowly, industry adapts, and that feature becomes a necessity in all future programming languages.</p> <p>Recursion may sound like an esoteric feature to require in a programming language. After all, it's not something that you expect to use every day. But what about lexical scoping? Believe it or not, block scoped variables weren't standard practice until the mid-1980s or so; having only global variables or dynamically scoped variables was once deemed "sufficient" until the consensus formed around lexical scoping as simply the right way to do it. A similar history surrounds object oriented programming and garbage collection.</p> <p>What's next? Who knows. The jury is still out on static vs. dynamic programming environments, live code vs. dead code, metaprogramming, macros, virtual machines vs. native compilation, type inferencing, strong static typing, prototypes vs. metaclasses, and a dozen other features where languages differ. 
Closures, at least, are on their way to becoming a required feature for every important programming language.</p>Adam Turoffhttp://www.blogger.com/profile/11941071792943377879noreply@blogger.com8tag:blogger.com,1999:blog-6655746112052720933.post-39763018463119315152008-08-24T12:18:00.001-05:002008-08-24T12:21:12.925-05:00Arrows in Javascript<p> Last year, I mentioned a <a href="http://notes-on-haskell.blogspot.com/2007/05/lessons-from-white-bearded-professor.html">lesson</a> I learned from one of my CS professors many years ago:</p><blockquote><i> He began by recommending every project start by hiring a mathematician for a week or two to study the problem. If a mathematician could find an interesting property or algorithm, then it would be money well spent, and drastically reduce the time/money/effort needed to develop a system. </i></blockquote><p> A few weeks ago, I came across a paper about <i><a href="http://www.cs.umd.edu/~jfoster/arrows.pdf">Arrows in Javascript</a></i>. It's an interesting adaptation of <a href="http://www.haskell.org/arrows/">arrows</a> to do asynchronous, event driven programming in Javascript. It makes an excellent example of this dictum. </p><p> Here's the problem in a nutshell. There's a very interesting language lurking underneath Javascript. It's an object oriented language with <a href="http://yuiblog.com/blog/2007/06/12/module-pattern">namespaces</a>, closures, and dynamic typing. Doug Crockford has <a href="http://oreilly.com/catalog/9780596517748/">written</a> and <a href="http://www.techpresentations.org/JavaScript:_The_Good_Parts">presented</a> on these concepts over the last few years. </p><p> As good as Javascript is on its own merits (albeit abused and misunderstood), it is generally used in a pretty horrendous environment. Browsers have compatibility bugs in their language support and the cross platform APIs they expose. Most of all, browsers execute Javascript in a single-threaded manner. 
This makes asynchronous programming difficult, especially when there are two interwoven threads of execution, like drag and drop and dynamic page updates. </p><p> This paper by a team at UMD starts by bringing continuation passing to Javascript, and implementing arrows on top of that. Once arrows are available, building a library for asynchronous event programming is much easier than brute force chaining of event callbacks. For example, drag and drop programming can be specified using common method chaining: </p><pre><br /> ConstA(document.getElementById("dragtarget")) <br /> .next(EventA("mouseover")) <br /> .next(setupA) <br /> .next(dragDropCancelA) <br /> .run()</pre><p> The plumbing inside the library can be a little hairy, especially if you find CPS or arrows befuddling. But the exposed API is quite elegant and appears to be easy to use. It'll be interesting to watch the Javascript library space to see when and where it pops up. </p>Adam Turoffhttp://www.blogger.com/profile/11941071792943377879noreply@blogger.com7tag:blogger.com,1999:blog-6655746112052720933.post-56985083990001187252008-05-14T17:53:00.001-05:002008-05-14T17:54:41.974-05:00Thought of the Day<p>Also from Steve Yegge's <a href="http://steve-yegge.blogspot.com/2008/05/dynamic-languages-strike-back.html">talk</a>:</p><blockquote><i>[W]hen we talk about performance, it's all crap. The most important thing is that you have a small system. And then the performance will just fall out of it naturally.</i></blockquote>Adam Turoffhttp://www.blogger.com/profile/11941071792943377879noreply@blogger.com0tag:blogger.com,1999:blog-6655746112052720933.post-78344877664769367752008-05-14T15:47:00.004-05:002008-05-15T13:34:28.838-05:00Static vs. Dynamic Languages<p>One permatopic across programming blogs is the good ol' static-vs-dynamic languages debate. Static languages (like C, C++, C#, C--, Java, etc.) 
require variables to have type declarations primarily to generate compiled code and secondarily to catch errors or emit warnings at compile time. Dynamic languages (like Perl, Python, PHP, Ruby, Tcl, Lisp, Scheme, Smalltalk, etc.) do away with all those needless keystrokes, allow programmers to be more productive, and generally aren't as dangerous as the C-programmers would have you believe. Furthermore, dynamic language programmers are quick to point out that any <i>real</i> program in a static language requires typecasting, which throws all of the "guarantees" out the window, and brings in all of the "dangers" of dynamic languages with none of the benefits.</p><p>However, this debate isn't limited to these opposite points of view. There's another perspective, which I heard from <a href="http://blog.plover.com/">Mark-Jason Dominus</a> about a decade ago: <i>static typing in languages like C and Pascal isn't really static typing!</i> (<b>Correction</b>: Mark <a href="http://perl.plover.com/yak/typing/">actually said</a> that "<i>strong</i>" typing in C and Pascal <a href="http://perl.plover.com/yak/typing/samples/slide011.html">isn't really strong typing</a>.)</p><p>The kind of static typing supported by C, its ancestors and its descendants, dates back to the 1950s and Fortran (or, rather FORTRAN, since lower case letters hadn't been invented yet). Type annotations in these languages are guides to the compiler in how much storage to allocate for each variable. Eventually, compilers began to emit warnings or compiler errors when mixing variables of incompatible types (like floating point numbers and strings), because those kinds of operations don't make sense. If the irksome compiler gets in your way, you can always tell it to shut up and do the operation anyway and maybe let the program crash at runtime. Or maybe silently corrupt data. Or maybe something completely different. 
None of which makes much sense, but there you go.</p><p>Modern static typing, on the other hand, uses a strong dose of the <a href="http://en.wikipedia.org/wiki/Type_inference">Hindley-Milner</a> type inference algorithm to (a) determine what values a variable may contain and (b) prevent the use of incompatible values in expressions. Furthermore, Hindley-Milner prevents values from being cast from one type to another on a whim (unlike C), and thus <i>prevents</i> odd runtime errors through abuse of the type system. This leads to a strong, unbreakable type system that is enforced by the compiler, and prevents a whole class of loopholes allowed by lesser type systems. Because types can be <i>inferred</i>, explicit type annotations aren't always needed, which leads to an interesting property: languages that are safer than "safe" static languages like C and Java, and programs that are free of those troublesome, noisy type declarations, just like dynamic languages.</p><p>Use of the Hindley-Milner type system comes to us through <a href="http://en.wikipedia.org/wiki/ML_programming_language">ML</a>, and descendants like Haskell, OCaml, SML, F# and the like. It's <i>such</i> a good idea that it's slowly making its way into new programming languages like Perl 6, Fortress, <span style="text-decoration:line-through">Factor</span>, Scala, and future versions of C++, C#, and VB.Net.</p><p>So, really, the whole static-vs-dynamic languages debate is both misleading and moot. It's misleading, because it's not about static vs. dynamic typing, it's about <i>weak</i> compile-time typing vs. runtime typing. It's moot, because the writing is on the wall, and weakly typed languages are on the way out, to be replaced (eventually) by strongly-typed languages of one form or another. (It also tends to be a meaningless "debate", because it's typically a shouting match between C/C++/Java/etc. acolytes who are uncomfortable with Perl/Python/Ruby/etc., and Perl/Python/Ruby/etc. 
acolytes who have a strong distaste for C/C++/Java/etc., and neither group has a hope of persuading the other.)</p><p>Normally, I would avoid this topic entirely, but today, I couldn't resist. Steve Yegge posted a <a href="http://steve-yegge.blogspot.com/2008/05/dynamic-languages-strike-back.html">transcription</a> of his talk on dynamic languages, where he covers both sides of the debate, and moves beyond the two-sides-talking-past-each-other and uncovers the core issues. He covers the meat of the discussion, not the typical push-button issues that go nowhere and convince no one. (I think he misses the point behind Hindley-Milner, but that's an insignificant side issue.)</p><p></p><ul><li>Most of the work in compiler optimization in the last few decades has focused on weakly-typed static languages.</li><li>Dynamic languages are, in general, slower, only because there has been relatively little research interest in making them faster.</li><li>There are some interesting approaches to optimizing dynamic languages that are now ready for prime time. Much of this comes from <a href="http://www.strongtalk.org/">Strongtalk</a>, an effort to optimize Smalltalk that led to Java's HotSpot VM.</li><li>Some of the "benefits" of "static" typing, like refactoring IDEs, aren't as good as they appear at first. Refactoring dynamic languages is <i>different</i> but IDEs could do about as good a job as they do for C++/C#/Java. Refactoring properly is a <i>hard</i> problem that no one has solved completely, yet.</li><li>"Fast" is relative. C++ used to be the "fast" language because it compiles down to native code. 
But in a multicore world, C/C++ is becoming slow, because C/C++ programs cannot take advantage of multiple CPUs easily, unlike Java.</li><li>Dynamic languages of the 1990s (Perl/Python/Ruby/Tcl) don't have a reasonable answer to scaling across CPUs, either.</li><li>Making Java bytecode run quickly takes more work than you probably realized.</li><li>Any sufficiently large system built with a static language will need to build in the kind of introspection that's available with a dynamic language and REPL. (<a href="http://en.wikipedia.org/wiki/Greenspun's_Tenth_Rule">Greenspun's Tenth Rule</a> in action)</li><li>It was easy for the industry to switch programming languages every decade or so, up until 1995. Today, the bar is higher, and the amount of effort invested in existing code makes it <i>much</i> more difficult to migrate to a new language and platform.</li><li>Google has some of the most brilliant people and language designers in the industry, but they use C++, Java, Python and JavaScript, and none of the esoteric languages the smart kids love. This is because it's more important to standardize on a set of technologies and get stuff done than let the geeks run wild and free. But you probably knew that already. ;-)</li></ul><p></p><p><a href="http://steve-yegge.blogspot.com/2008/05/dynamic-languages-strike-back.html">Steve's talk</a> is long and rambling, but certainly worth the time to read, or at least skim.</p>Adam Turoffhttp://www.blogger.com/profile/11941071792943377879noreply@blogger.com12tag:blogger.com,1999:blog-6655746112052720933.post-47251553106386926882008-03-31T20:52:00.003-05:002008-03-31T21:08:56.894-05:00Finger Exercises<p>Wow. Three months, no update. Hm.</p><p>I've been working on a bunch of little projects recently that individually don't amount to much, yet. 
I should have something to post about that soon.</p><p>In the meantime, here is a cute little problem (via <a href="http://www.nntp.perl.org/group/perl.fwp/2008/03/msg4095.html">Uri</a> to the fun with Perl list):</p><blockquote><i>What is an easy way to get all possible combinations of Y/N of length 5? (i.e., "YYYYY", "YYYYN", etc.) <br /></i></blockquote><p>Coming up with a solution to this exact problem seemed to cry out for a list comprehension:</p><blockquote><pre>[[a,b,c,d,e] | let l = "YN", <br /> a <- l, b <- l, c <- l, d <- l, e <- l]</pre></blockquote><p>but as soon as I wrote that, I saw how unsatisfactory it was. A better solution would produce all possible combinations of Y and N for any integer length <i>n</i> > 0:</p><blockquote><pre>yn i = (iterate f [""]) !! i<br /> where f l = map ('Y':) l ++ map ('N':) l</pre></blockquote><p>This problem itself isn't very interesting, but it's good to have little diversions like this when you want to take a break while the coffee is steeping, waiting for the Guinness to settle, or whatnot. (Something that's more than a function from the Prelude, but less involved than something from <a href="http://projecteuler.net/">Project Euler</a>.)</p><p>Do you have any cute little "finger exercise" problems like these? I'd love to hear about them!</p>Adam Turoffhttp://www.blogger.com/profile/11941071792943377879noreply@blogger.com19tag:blogger.com,1999:blog-6655746112052720933.post-54376332524151841552007-12-25T16:12:00.000-05:002007-12-25T16:20:13.747-05:00Taxicab Numbers<p><a href="http://www.imdb.com/title/tt0149460/">Futurama</a> is one of my favorite TV shows. It's the quintessential show about nerds, produced by nerds, for nerds, and it's chock full of inside jokes in math, science, sci-fi and pop culture. </p><p> I'm happy to note that although Futurama, the series, was cancelled after the fourth season, the show is back with a set of direct-to-DVD features. 
The first one, <a href="http://www.imdb.com/title/tt0471711/">Bender's Big Score</a>, is available now, and includes a lecture by Sarah Greenwald as a special feature on the various math jokes that have appeared in the show. Some are silly, like π-in-one oil, or Klein's Beer (in a Klein bottle, of course). Others are subtle, like the repeated appearance of <a href="http://en.wikipedia.org/wiki/Taxicab_numbers">taxicab numbers</a>. </p><p> Taxicab numbers are an interesting story in and of themselves. The story comes to us from G. H. Hardy, who was visiting Srinivasa Ramanujan:</p><blockquote><i> I remember once going to see him when he was lying ill at Putney. I had ridden in taxi-cab No. 1729, and remarked that the number seemed to be rather a dull one, and that I hoped it was not an unfavourable omen. "No", he replied, "it is a very interesting number; it is the smallest number expressible as the sum of two [positive] cubes in two different ways." </i></blockquote> <p> Many of the staff involved in creating Futurama received advanced degrees in math, computer science and physics, so they enjoy making these obscure references, because they know their fans will find them. It shouldn't be surprising that the number 1729 makes many appearances in Futurama, or that the value 87539319 appears as an actual taxicab number (87539319 is the sum of 2 cubes in 3 different ways). </p><p> Of course, the <i>idea</i> of taxicab numbers is interesting, the details less so. After watching the new feature, I wondered what Hardy's taxicab number was, and what two pairs of cubes add up to that value. If I were more mathematically inclined, I'd probably sit down with pencil and paper and just do the math until I found something. If I were truly lazy, I'd just Google for the answer. 
Instead, I sat down, fired up <code>vim</code> and typed this code up, pretty much as you see it here: </p><pre>cube x = x * x * x<br /><br />taxicab n = [(cube a + cube b, (a, b), (c, d))<br /> | a <- [1..n],<br /> b <- [(a+1)..n], <br /> c <- [(a+1)..n], <br /> d <- [(c+1)..n],<br /> (cube a + cube b) == (cube c + cube d)]</pre><p> I want to find the smallest taxicab number, but searching an unbounded range will not lead to a result. Therefore, it's necessary to search a bounded range. If there are any taxicab numbers among the integers <code>[1..n]</code>, then this function will find them. Moreover, if there are many solutions to this problem in that range, this function will find <i>all</i> of them. </p><p> I could go into details, but my explanation would not be as clear and precise as the code above. </p><p> Here are some results: </p><pre>*Main> taxicab 10<br />[]<br />*Main> taxicab 12<br />[(1729,(1,12),(9,10))]<br />*Main> taxicab 20<br />[(1729,(1,12),(9,10)),(4104,(2,16),(9,15))]</pre><p> Ordinarily, I wouldn't blog about something so trivial, but as I stared at the code and the output from <code>ghci</code>, it reminded me of an incident in college. The <a href="http://en.wikipedia.org/wiki/Chudnovsky_brothers">Chudnovsky brothers</a> came up with a new algorithm to compute π that was both very precise and very quick. A professor of mine wrote a quick, throwaway implementation of this algorithm in <a href="http://en.wikipedia.org/wiki/Maple_%28software%29">Maple</a> to demonstrate it. Most of my work in those days was in C. Maple wasn't a tool I'd normally use, but it was easily the best tool for the job. </p><p> The lesson my professor was trying to teach me wasn't to use computer algebra systems to indulge a fixation with transcendental numbers. Instead, he was trying to show me that a true computer scientist is presented with a problem, writes a program to solve it, <i>and moves on to the next problem</i>. 
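In that spirit, the same search can be recast by grouping pairs by the sum of their cubes, trading the four nested generators for a single pass over a map (my own variation using <code>Data.Map</code>, not part of the original post):

```haskell
import qualified Data.Map as M

cube :: Integer -> Integer
cube x = x * x * x

-- Group every pair (a, b) with a < b <= n by the sum of their cubes,
-- then keep only those sums reachable in more than one way.
taxicabs :: Integer -> [(Integer, [(Integer, Integer)])]
taxicabs n = M.toAscList (M.filter ((> 1) . length) sums)
  where
    sums = M.fromListWith (++)
             [ (cube a + cube b, [(a, b)])
             | a <- [1 .. n], b <- [a + 1 .. n] ]
```

Where the comprehension above enumerates roughly n⁴ candidate tuples, this sketch makes one pass over the n²/2 pairs; `taxicabs 20` finds the same two numbers, 1729 and 4104, each with its two pairs of cubes.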
</p><p> I could have written my taxicab number search in C or Java, but I would have lost interest somewhere between <code>#include <stdio.h></code> or <code>public static void main(String args[])</code> and firing up the compiler. I could have written the code in Perl, but I would have likely lost interest around the time I saw a nested for loop, and contemplated the four levels of nesting. </p><p> Perl programmers know Larry Wall's vision for Perl - to make the easy things easy and the hard things possible. As I stared at this code, I realized that Larry's dictum misses an important cornerstone: to make the trivial things <i>trivial</i>. </p><p> Haskell may get a bad rap as making hard things easy and easy things hard, but it <i>does</i> make trivial things trivial. Whether it's list comprehensions that solve a problem in a single statement, or parser combinators that make parsing code read like the grammar it implements, there's a lot to be said for a language that gets out of your way and lets you say precisely what you mean. </p>Adam Turoffhttp://www.blogger.com/profile/11941071792943377879noreply@blogger.com6tag:blogger.com,1999:blog-6655746112052720933.post-75916830349876388562007-12-19T23:56:00.001-05:002007-12-20T01:22:12.708-05:00More thoughts on rewriting software<p>I started writing about when it is acceptable to consider rewriting a software project a few months back (part <a href="http://notes-on-haskell.blogspot.com/2007/08/rewriting-software.html">one</a> and <a href="http://notes-on-haskell.blogspot.com/2007/09/rewriting-software-part-2.html">two</a>). I remain convinced that it is occasionally economically sound to toss a problematic codebase and start anew. 
There are dozens of pesky little details to consider, like:</p><ul><br /><li>Is the code horrifically overcomplicated, or just aesthetically unpleasing?</li><br /><li>How soon before the new system can replace the old?</li><br /><li>How soon before there's a net benefit to the rewrite?</li><br /><li>Are you rewriting to address fundamental flaws, or adopt this week's fashionable language/library/architecture/buzzword?</li><br /><li>Do you understand the problem the code attempts to solve? Really? <i>Really?</i></li><br /><li>Since the project began, is there a new piece of off-the-shelf software that makes a lot of this pain simply go away?</li><br /></ul><p>Steve Yegge looks at the problem from a different angle in his essay <a href="http://steve-yegge.blogspot.com/2007/12/codes-worst-enemy.html">Code's Worst Enemy</a>. (See Reginald Braithwaite's <a href="http://weblog.raganwald.com/2007/12/growing-sense-of-doom-washed-over-me-as.html">synopsis</a> if you want to read just the choice quotes.) Steve argues that code size is easily the riskiest element in a project:</p><blockquote><i>My minority opinion is that a mountain of code is the worst thing that can befall a person, a team, a company. I believe that code weight wrecks projects and companies, that it forces rewrites after a certain size, and that smart teams will do <b>everything in their power</b> to keep their code base from becoming a mountain. Tools or no tools. That's what I believe.<br /></i></blockquote><p>Steve's essay focuses on behaviors endemic among many Java programmers that lead to large codebases that only get larger. As codebases grow, they become difficult to manage, both in terms of the tools we use to build software from the code (IDEs and the like), as well as the cognitive load needed to keep track of what 20,000 pages of code <i>do</i> on a 1 million line project.</p><p>The issues aren't specific to Java, or even Java programmers. 
Rather, they stem from an idea that the size of a code base has no impact on the overall health of a project.</p><p>The solution? Be concise. Do what it takes to keep a project small and manageable.</p><p>Of course, there are many ways to achieve this goal. One way is to rewrite large, overgrown projects and rebuild them from scratch. This could be done in the same language (building upon the skills of the developers currently on the project), or porting the project to a new language. Another strategy involves splitting a large project into two or more isolated projects with clearly defined interfaces and responsibilities. Many more alternatives exist that don't involve ignoring the problem and letting a 1 million line project fester into a 5 million line project.</p><br /><p>Interestingly, Jeff Atwood touched upon the issue as well, in his essay <a href="http://www.codinghorror.com/blog/archives/001022.html">Nobody Cares What Your Code Looks Like</a>. Jeff re-examines the great Bugzilla rewrite, when the team at Netscape decided in 1998 to convert the code from Tcl to Perl. The goal was ostensibly to encourage more contributions, since few people would be interested in learning Tcl to make contributions to Bugzilla. Nine years later, the Bugzilla team considers Perl to be its biggest liability, because Perl is stagnating, and Perl culture values the ability of every Perl programmer to speak a slightly different dialect of Perl. Newer projects written in PHP, Python, Java and Ruby outcompete Bugzilla because (in a couple of cases) they are comparable to Bugzilla today (without taking nine years of development to reach that plateau), and can deliver new features faster than the Bugzilla team can.</p><p>Nevertheless, Jeff stresses that although it may be ugly, harder to customize and extend, Bugzilla <i>works</i>. And it has worked for nearly a decade. 
And numerous large projects use it, and have no intentions to switch to the new bug tracker of the week anytime soon. So what if the code is ugly?</p><p>There's a lesson here, and I don't think it's Jeff's idea that <i>nobody cares what your code looks like</i>. Jeff is right that delivering value to customers is always most important, whether they are paying customers that keep your company in business, or users who rely on your open source project to supply a critical piece of infrastructure. He's also right that <i>customers</i> couldn't care less if your code uses Factories, Decorators and Iterators, or Closures, Catamorphisms and Currying. It <i>is</i> a disservice to your customers if you force them to wait while you rewrite your software from a static language (like Java) to a dynamic language (like Ruby), because dynamic languages are all the rage this year. Are you going to go through the same song and dance when type inferencing becomes popular? Massive concurrency?</p><p>While customers don't care about how a product is written, they do care about the effects of that choice. If your software crashes a lot, needs frequent security patches and occasionally corrupts data, well, maybe you shouldn't have written your timesheet application in C. If your rendering application is slow, uses ludicrous amounts of memory, then maybe you shouldn't have written it in Ruby. If your application provides a key piece of infrastructure, and there's literally no one who could replace you if you get hit by a bus, then maybe you shouldn't have written it in APL.</p><p>And if your application is written in a manner that leads to a huge codebase that makes it hard to find and fix bugs, costly to extend and requires a floor full of programmers to maintain, maybe you owe it to your customer to find a way to reduce the size of your codebase and deliver more value. 
Because sooner or later, that customer just might find a similar application, built with a smaller codebase and less internal complexity that's cheaper to own and faster to customize and fix. And when it becomes too costly or too painful to maintain your application, they <i>will</i> switch.</p>Adam Turoffhttp://www.blogger.com/profile/11941071792943377879noreply@blogger.com1tag:blogger.com,1999:blog-6655746112052720933.post-76906423494125935502007-10-03T23:43:00.000-05:002007-10-04T00:06:53.688-05:00Defanging the Multi-Core Werewolf<p>Many developers have a nagging sense of fear about the upcoming “multi-core apocalypse”. Most of the software we write and use is written in imperative languages, which are fundamentally serial in nature. Parallelizing a serialized program is painful; it usually involves semaphores and locks, and comes with problems like <a href="http://en.wikipedia.org/wiki/Deadlock">deadlock</a>, <a href="http://en.wikipedia.org/wiki/Livelock#Livelock">livelock</a>, <a href="http://en.wikipedia.org/wiki/Resource_starvation">starvation</a> and irreproducible bugs. </p><p>If we’re doomed to live in a future where nearly every machine uses a multi-core architecture, then either (a) we have a lot of work ahead of us to convert our serialized programs into parallelized programs, or (b) we’re going to be wasting a lot of CPU power when our programs only use a single processor element on a 4-, 8-, or even 64-core CPU.</p><p>At least that’s the received wisdom. I don’t buy it.</p><p>Yes, software that knows how to exploit parallelism will be different. I don’t know that it will be much harder to write, given decent tools. And I certainly don’t expect it to be the norm. 
</p><p>Here is some evidence that the multi-core monster is more of a dust bunny than a werewolf.</p><p>First, there’s an offhand remark that Rob Pike made during his <a href="http://video.google.com/url?docid=810232012617965344&esrc=sr1&ev=v&len=3422&q=newsqueak&srcurl=http%3A%2F%2Fvideo.google.com%2Fvideoplay%3Fdocid%3D810232012617965344&vidurl=%2Fvideoplay%3Fdocid%3D810232012617965344%26q%3Dnewsqueak%26total%3D1%26start%3D0%26num%3D10%26so%3D0%26type%3Dsearch%26plindex%3D0&usg=AL29H216WF74f7XJB8fTv03cDX13qURj4A">Google Tech Talk</a> on <a href="http://swtch.com/~rsc/thread/newsqueak.pdf">Newsqueak</a>, a programming language he created 20 years ago at Bell Labs to explore concurrent programming. It’s based on Tony Hoare’s <a href="http://en.wikipedia.org/wiki/Communicating_sequential_processes">CSP</a>, the same model used in Erlang. During the talk, Rob mentioned that the fundamental concurrency primitives are semaphores and locks, which are necessary when adding concurrency to an operating system, but horrible to deal with in application code. A concurrent application really needs a better set of primitives that hide these low level details. Newsqueak and Erlang improve upon these troublesome primitives by offering channels and mailboxes, which make most of the pain of concurrent programming go away. </p><p>Then, there’s Timothy Mattson of Intel, who <a href="http://blogs.intel.com/research/2007/10/parallel_programming_environme.html">says</a> that there are just too many languages, libraries and environments available today for writing parallelized software. Timothy is a <a href="http://blogs.intel.com/research/authors#timothy_mattson">researcher</a> in the field of parallel computing, and when someone with such a deep background in the field says the tools are too complicated, I’ll take his word for it. The good news is that very few programmers work on the kinds of embarrassingly parallel problems that require these tools. 
Working on parallel machines isn’t going to change that for us, either. In the future, shell scripts will continue to execute one statement at a time, on a single CPU, regardless of how many CPUs are available, with or without libraries like Pthreads, PVM or MPI. Parallel programmers are still in a world of hurt, but at least most of us will continue to be spared that pain.</p><p>Then there’s Kevin Farnham, who <a href="http://www.oreillynet.com/linux/blog/2007/09/open_source_thoughts_parrot_an_1.html">posted</a> an idea of wrapping existing computationally intensive libraries with Intel’s <a href="http://www.threadingbuildingblocks.org/">Thread Building Blocks</a>, and loading those wrapped libraries into <a href="http://www.parrotcode.org/">Parrot</a>. If all goes well and the stars are properly aligned, this would allow computationally intensive libraries to be used from languages like Perl/Python/Ruby/etc. without the need to port <em>M</em> libraries to <em>N</em> languages. (Tim O’Reilly thought it was an important enough meme that he <a href="http://radar.oreilly.com/archives/2007/09/parrot_and_mult.html">drew attention to it</a> on the O’Reilly Radar.)</p><p>This sounds like a hard problem, but adding Parrot to the equation feels like replacing one bad problem with five worse problems. If we’re going to live in a world where CPUs are cheap and parallelism is the norm, then we need to think in those terms. If we need Perl/Python/Ruby/etc. programs to interact with parallelized libraries written in C/C++/Fortran, where’s the harm in <em>spawning another process</em>? Let the two halves of the program communicate over some IPC mechanism (sockets, or perhaps HTTP + REST). That model is well known, well tested, well-understood, widely deployed and has been shipping for decades. Plus, it is at least as language-agnostic as Parrot hopes to become. 
(+2 points if the solution uses JSON instead of XML.)</p><p>Fourth, there’s Patrick Logan, who rightly <a href="http://patricklogan.blogspot.com/2007/09/many-node-has-many-more-implications.html">points out</a> the issue simply isn’t about a multi-core future, but also a <em>multi-node</em> future. Some applications will run in parallel on a single machine, others will run across multiple nodes on a network, and still others will be a hybrid of both approaches. Running programs across a network of nodes is done today, with tools like <a href="http://labs.google.com/papers/mapreduce.html">MapReduce</a>, <a href="http://lucene.apache.org/hadoop/">Hadoop</a> and their kin. </p><p>If you have a grid of dual-core machines today, and need to plan out how to best use the network of 64-core machines you will have a decade from now, here’s a very simple migration plan for you: <em>run 32x as many processes on each node!</em></p><p>With that said, here is my recipe for taming the multi-core dust bunny:</p><ul><li>Determine what kind of parallelism makes sense for you: none, flyweight, fine-grained or coarse-grained.</li><li>Avoid troublesome low-level concurrency primitives wherever possible.</li><li>Use tools like GHC’s <a href="http://haskell.org/haskellwiki/GHC/Data_Parallel_Haskell">Nested Data Parallelism</a> for flyweight concurrency (one program, lots of data, spread out over multiple CPUs on a single machine).</li><li>Use tools like GHC’s <a href="http://www.haskell.org/haskellwiki/Software_transactional_memory">Software Transactional Memory</a> for lightweight concurrency (many interoperating processes managing shared data on a single machine).</li><li>Use tools like MapReduce and friends for heavyweight concurrency (work spread out across multiple cooperating processes, running on one or many machines).</li></ul><p>As Timothy Mattson points out, parallel programming is fundamentally hard, and no one language, tool or environment is going to slay the dragon. 
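To give the STM item in that recipe some flesh, here is a minimal sketch of the style using GHC’s stm library (my own example, not from the original post):

```haskell
import Control.Concurrent.STM

-- Move money between two accounts in one atomic transaction.
-- No locks appear in the code; if the balance check fails, the
-- transaction blocks and automatically retries when the TVars change.
transfer :: TVar Int -> TVar Int -> Int -> STM ()
transfer from to amount = do
  balance <- readTVar from
  check (balance >= amount)
  writeTVar from (balance - amount)
  modifyTVar' to (+ amount)
```

Create the accounts with `newTVarIO` and run `atomically (transfer a b 30)`; the composed transaction commits or retries as a whole, which is exactly the property that makes the low-level primitives unnecessary.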
I cite NDP here not as a perfect solution, but as a placeholder for a whole class of tools that exhibit <a href="http://en.wikipedia.org/wiki/SIMD">SIMD</a> parallelism. Similarly, STM is a placeholder for a whole class of tools that exhibit <a href="http://en.wikipedia.org/wiki/MIMD">MIMD</a> parallelism. Sometimes you need one, sometimes you need the other, sometimes you need both, and sometimes you need neither.</p><p>And then there is the issue of virtualization. Perhaps the best use of a multi-core system isn't to use it as a single multiprocessing computer, but as a host for a series of virtual machines. Such a usage sidesteps all of the thorny issues around parallelism entirely, focusing instead on cost savings that accrue from server consolidation and simplified management. This is a very old idea that becomes more and more important as power efficiency in our data centers becomes a hot button issue.</p><p>Finally, there’s a looming question about what to do about the desktop. If your laptop has 32 cores, what do you <em>do</em> with them? The simple answer is <em>nothing</em>. As CPUs get faster, they spend more and more of their time in an idle state. The only thing that changes in a multi-core world is that more CPUs are idle. Desktop programmers can spend a lot of time evenly distributing that idleness across all CPUs, or make very few changes, and use only as many CPUs as necessary. Operating systems and basic tools (emulators, compilers, VMs, database engines, web servers, etc.) will need to be multi-core aware and distribute their work across as many CPUs as are available. Some of that work is already done -- <tt>make -j</tt> has been around for years. Processing intensive applications, like audio/video codecs, image manipulation and the like, will also need to be multi-core aware. The vast majority of the programs we write will continue to be mostly serial, and rarely care about parallelism. 
</p><p>After all, authenticating a user doesn’t get 32x faster on a 32-core machine.</p>Adam Turoffhttp://www.blogger.com/profile/11941071792943377879noreply@blogger.com2tag:blogger.com,1999:blog-6655746112052720933.post-6137837346065423482007-09-30T15:47:00.000-05:002007-09-30T17:36:13.545-05:00Rewriting Software, Part 2<p>When I wrote my <a href="http://notes-on-haskell.blogspot.com/2007/08/rewriting-software.html">previous post</a> about rewriting software, I had a couple of simple goals in mind. First was answering <a href="http://www.oreillynet.com/onlamp/blog/2007/08/informal_survey_do_rewrites_re.html">Ovid’s question</a> on when rewriting software is wise, and when it is foolish. Second was to re-examine Joel Spolsky’s dictum, <a href="http://www.joelonsoftware.com/articles/fog0000000069.html">never rewrite software from scratch</a>, and show how software is so complex that no one answer fits all circumstances. Finally, I wanted to highlight that there are many contexts to examine, ranging from the “small here and short now” to the “big here and long now” (to use Brian Eno’s terminology).</p><p>When I wrote that post, I thought there would be enough material for two or three followup posts that would meander around the theme of “yes, it’s OK to rewrite software”, and move on. The more I wrote, the more I found to write about, and the longer it took to condense that material into something worth reading. </p><p>Rather than post long rambling screeds on the benefits of rewriting software, I decided to take some time to plan out a small series of articles, each limited to a few points. Unfortunately, I got distracted and I haven’t posted any material to this blog in over a month. My apologies to you, dear reader.</p><br /><p>Of course, there’s something poetic about writing a blog post about rewriting software, and finishing about a month late because I couldn’t stop rewriting my post. There’s a lesson to learn here, also from Joel Spolsky. 
His essay <a href="http://www.joelonsoftware.com/articles/fog0000000339.html">Fire and Motion</a> is certainly worth reading in times like these. I try to re-read it, or at least recall his lessons, whenever I get stuck in a quagmire and can’t see my way out.</p><p>In that spirit, here’s a small nugget to ponder. </p><p>If you are a writer, or have ever taken a writing class, you’ve probably come across <a href="http://uts.cc.utexas.edu/~rhart/courses/materials/papers/trimble.html">John R. Trimble’s</a> assertion that “<em>all writing is rewriting.</em>” Isn’t it funny that software is something developers <em>write</em> yet fear <em>rewriting</em>? </p><p>There’s a deep seated prejudice in this industry against taking a working piece of software and tinkering with it, except when it involves fixing a bug or adding a feature. It doesn’t matter if we’re talking about some small-scale refactoring, rewriting an entire project from scratch, or something in between. The prejudice probably comes from engineering — there’s no good reason to take a working watch or an engine apart because it looks “ugly” and you want to make it more “elegant.” </p><p>Software sits at the intersection of writing and engineering. Unlike pure writing, there are times when rewriting software is simply a bad idea. Unlike pure engineering, there are times when it is necessary and worthwhile to rewrite working code.</p><p>As <a href="http://mitpress.mit.edu/sicp/">Abelson and Sussman</a> point out, “<a href="http://mitpress.mit.edu/sicp/full-text/book/book-Z-H-7.html"><em>programs must be written for people to read, and only incidentally for machines to execute</em></a>.” Rewriting software is necessary to keep code concise and easy to understand. 
Rewriting software to follow the herd or track the latest trend is pointless and a wasted effort.</p>Adam Turoffhttp://www.blogger.com/profile/11941071792943377879noreply@blogger.com0tag:blogger.com,1999:blog-6655746112052720933.post-20799547570316086542007-08-21T00:54:00.000-05:002007-08-21T01:03:47.796-05:00Rewriting SoftwareOvid is <a href="http://www.oreillynet.com/onlamp/blog/2007/08/informal_survey_do_rewrites_re.html">wondering</a> about rewrite projects. It’s a frequent topic in software, and there’s no one answer that fits all situations.<br /><br />One of the clearest opinions is from Joel Spolsky, who <a href="http://www.joelonsoftware.com/articles/fog0000000069.html">says</a> rewrites are “<em>the single worst strategic mistake that any software company can make</em>”. His essay is seven years old, and in it, he takes Netscape to task for open sourcing Mozilla, and immediately throwing all the code away and rewriting it from scratch. Joel was right, and for a few years Mozilla <em>was</em> a festering wasteland of nothingness, wrapped up in abstractions, with an unhealthy dose of gratuitous complexity sprinkled on top. But this is open source, and open source projects have a habit of over-estimating the short term and under-estimating the long term. Adam Wiggins <a href="http://adam.blogs.bitscribe.net/2007/01/11/go-ahead-do-the-big-rewrite/">revisited</a> the big Mozilla rewrite issue earlier this year when he said:<blockquote><em>[W]hat Joel called a huge mistake turned into Firefox, which is the best thing that ever happened to the web, maybe even the best thing that’s ever happened to software in general. Some “mistake.”</em></blockquote>What’s missing from the discussion is an <a href="http://www.longnow.org/views/essays/articles/BrianEnoLongNow.php">idea</a> from Brian Eno about the differences between the “small here” vs. the “big here”, and the “short now” vs. the “<a href="http://isbn.nu/0465007805">long now</a>”. 
Capsule summary: we can either live in a “small here” (a great apartment in a crappy part of town) or a “big here” (a beautiful city in a great location with perfect weather and amazing vistas), and we can live in a “short now” (my deadline is my life) or a “long now” (how does this project change the company, the industry or the planet?).<br /><br />On the one hand, Joel’s logic is irrefutable. If you’re dealing with a small here and a short now, then there is no time to rewrite software. There are revenue goals to meet, and time spent redoing work is retrograde, and in nearly every case poses a risk to the bottom line because it doesn’t deliver end user value in a timely fashion.<br /><br />On the other hand, Joel’s logic has got more holes in it than a fishing net. If you’re dealing with a big here and a long now, whatever work you do <em>right now</em> is completely inconsequential compared to where the project will be five years from today or five million users from now. Requirements change, platforms go away, and yesterday’s baggage has negative value — it leads to hard-to-diagnose bugs in obscure edge cases everyone has forgotten about. The best way to deal with this code is to rewrite, refactor or remove it.<br /><br />Joel Spolsky is arguing that the Great Mozilla rewrite was a horrible decision in the short term, while Adam Wiggins is arguing that the same project was a wild success in the long term. Note that these positions do not contradict each other. Clearly, there is no one rule that fits all situations.<br /><br />The key to estimating whether a rewrite project is likely to succeed is to first understand <em>when</em> it needs to succeed. If it will be evaluated in the short term (because the team lives in a small here and a short now), then a rewrite project is quite likely to fail horribly. 
On the other hand, if the rewrite will be evaluated in the long term (because the team lives in a big here and a long now), then a large rewrite project just might succeed and be a healthy move for the project.<br /><br />Finally, there’s the “right here” and “right now” kind of project. Ovid talks about them briefly:<blockquote><em>If something is a tiny project, refactoring is often trivial and if you want to do a rewrite, so be it.</em></blockquote>In my experience, there are plenty of small projects discussed in meetings where the number of man-hours spent discussing a change or rewrite far exceeds the amount of time to <em>perform the work</em>, often by a factor of ten or more. Here, the answer is clear — just do the work, keep a backup for when you screw up, and forget the dogma about rewriting code. If it was a mistake, rolling back the change will also take less time than the post-mortem discussion.<br /><br /><br />Ovid raises another interesting point: large projects start out from smaller ones, so if it’s OK to rewrite small projects, and small projects slowly turn into large projects, when does it become unwise to rewrite a project?<br /><br />The answer here isn’t to extrapolate based on project size, but rather on the horizon. A quick little hack that slowly morphs into something like MS Word will eventually become rewrite-averse due to short term pressures. 
A quick little hack that slowly morphs into something like Firefox will remain somewhat malleable, so long as it can take a long time to succeed.Adam Turoffhttp://www.blogger.com/profile/11941071792943377879noreply@blogger.com9tag:blogger.com,1999:blog-6655746112052720933.post-85548286341574868442007-08-20T12:00:00.000-05:002007-08-20T21:11:38.430-05:00Universal Floating Point ErrorsSteve Holden <a href="http://holdenweb.blogspot.com/2007/08/close-enough.html">writes</a> about <a href="http://en.wikipedia.org/wiki/Euler's_identity">Euler’s Identity</a>, and how Python can’t quite calculate it correctly. Specifically, <blockquote><i>e<sup> i π</sup></i> + 1 = 0</blockquote>However, in Python, this isn’t quite true:<blockquote><pre><code>>>> import math<br />>>> math.e**(math.pi*1j) + 1<br />1.2246063538223773e-16j<br /></code></pre></blockquote>If you note, the imaginary component is quite small: about 1.2 x 10<sup>-16</sup>. <br /><br />Python is Steve’s tool of choice, so it’s possible to misread his post and believe that <em>Python</em> got the answer wrong. However, the error is fundamental. Witness:<blockquote><pre><code>$ ghci<br />Prelude> :m + Data.Complex <br />Prelude Data.Complex> let e = exp 1 :+ 0<br />Prelude Data.Complex> let ipi = 0 :+ pi<br />Prelude Data.Complex> e<br />2.718281828459045 :+ 0.0<br />Prelude Data.Complex> ipi<br />0.0 :+ 3.141592653589793<br />Prelude Data.Complex> e ** ipi + 1<br />0.0 :+ 1.2246063538223773e-16</code></pre></blockquote>As I said, it would be possible to <em>misread</em> Steve’s post as a complaint against Python. It is not. 
As he says:<blockquote><em>I believe the results would be just as disappointing in any other language</em></blockquote>And indeed they are, thanks to irrational numbers like π and the limitations of <a href="http://en.wikipedia.org/wiki/IEEE_754">IEEE doubles</a>.<br /><br /><b>Updated</b>: corrected uses of <i>-iπ</i> with the proper exponent, <i>iπ</i>.Adam Turoffhttp://www.blogger.com/profile/11941071792943377879noreply@blogger.com6tag:blogger.com,1999:blog-6655746112052720933.post-69376059736161917552007-08-15T21:55:00.000-05:002007-08-15T22:09:21.178-05:00Does Syntax Matter?An anonymous commenter on <a href="http://notes-on-haskell.blogspot.com/2007/08/haskell-more-than-just-hype.html">yesterday’s post</a> <a href="http://notes-on-haskell.blogspot.com/2007/08/haskell-more-than-just-hype.html#comment-1129441161516491290">posits</a> that Haskell won’t become mainstream because of the familiar pair of leg irons:<blockquote><i>I think one of the biggest problems in Haskell, aside from it not being very easy (whats a monad?), is syntax.</i></blockquote>There are many reasons why Haskell may not become mainstream, but syntax and monads aren’t two of them. I’m a populist, so I get offended when a language designer builds something that’s explicitly designed to drag masses of dumb, lumbering programmers about halfway to Lisp, Smalltalk, or some other great language. I want to use a language built by great language designers that they themselves not only want to use, but want to invite others to use.<br /><br />I could be wrong here. Maybe being a ‘mainstream programming language’ means designing something down to the level of the great unwashed. I hope not. I <em>really</em> hope not. But it could be so. And if it is, that’s probably the one and only reason why Haskell won’t be the next big boost in programming language productivity. That would also disqualify O’Caml, Erlang and perhaps Scala as well. Time will tell.<br /><br />But syntax? 
Sorry, not a huge issue.<br /><br />Sure, C and its descendants have a stranglehold on what a programming language should <em>look</em> like to most programmers, but that’s the least important feature a language provides. Functional programmers, especially Lisp hackers, have been saying this for decades. Decades. <br /><br />A language’s syntax is a halfway point between simplifying the job of the compiler writer and simplifying the job of the programmer. No one is going back and borrowing syntax from COBOL, because it’s just too damn verbose and painful to type. C is a crisp, minimal, elegant set of constructs for ordering statements and expressions, compared to its predecessors. <br /><br />Twenty years ago, the kind of clean syntax C provided made programming in all caps in Pascal, Fortran, Basic or COBOL seem quaint. Twenty years from now, programming with curly braces and semicolons could be just as quaint. Curly braces and semicolons aren't necessary; they're just a crutch for the compiler writer.<br /><br />To prove that syntax doesn’t matter, I offer three similar-looking languages: C, Java (or C#, if you prefer) and JavaScript. They all use a syntax derived from C, but they are <em>completely separate languages</em>. C is a straightforward procedural language, Java is a full-blown object oriented language (with some annoying edge cases), and JavaScript is a dynamic, prototype-based object oriented language. The fact that a <code>for</code> loop looks the same in these three languages means absolutely <strong>nothing</strong>. <br /><br />Knowing C doesn’t help you navigate the <code>public static final</code> nonsense in Java, nor does it help you understand annotations, inner classes, interfaces, polymorphism, or design patterns. Going backward from Java to C doesn’t help you write <code>const</code>-correct code, or understand memory allocation patterns. <br /><br />Knowing C or Java doesn’t help much when trying to use JavaScript to its full potential. 
Neither language has anything resembling JavaScript’s dynamic, monkeypatch-everything-at-runtime behavior. And even if you have a deep background in class-based object-oriented languages, JavaScript’s use of prototypes will strike you as something between downright lovely and outright weird.<br /><br />If that doesn’t convince you, consider the fact that any programmer worthy of the title already uses multiple languages with multiple syntaxes. These typically include their language of choice, some SQL, various XML vocabularies, a few config file syntaxes, a couple of template syntaxes, some level of Perl-compatible regular expressions, a shell or two, and perhaps a version or two of <code>make</code> or a similar utility (like Ant or Maven). <br /><br />Add that up, and a programmer can easily come across two dozen different syntaxes in a single project. If they can’t count that high, it’s not because they do all their work in a single syntax[1], but because it takes too much effort to stop and count all of the inconsequential little syntaxes. (<em>Do Apache pseudo-XML config files count as a separate syntax? Yeah, I guess they do. It took that Elbonian consultant a day to track down a misconfigured directive last year…</em>)<br /><br />So, no, Mr. Anonymous. Haskell’s syntax isn’t a stumbling block. You can learn the basics in an afternoon, get comfortable within a week, and learn the corner cases in a month or two.<br /><br /><br />Now, as for monads - the problem with monads is that they seem harder to understand than they really are. That is, it is more difficult to explain what a monad is than it is to gain a visceral understanding of what they do.
(I had this same problem when I was learning C — it was hard to believe that it was really <em>that</em> simple.)<br /><br />If you caught my introduction to Haskell on ONLamp (parts <a href="http://www.onlamp.com/pub/a/onlamp/2007/05/21/an-introduction-to-haskell---part-1-why-haskell.html">1</a>, <a href="http://www.onlamp.com/pub/a/onlamp/2007/07/12/introduction-to-haskell-pure-functions.html">2</a> and <a href="http://www.onlamp.com/pub/a/onlamp/2007/08/02/introduction-to-haskell-pure-functions.html">3</a>), you may have seen this tidbit right before the end of part 3:<blockquote><em>[M]onads enforce an order of execution on their statements. With pure functions, sub-expressions may be evaluated in any order without changing their meaning. With monadic functions, the order of execution is very important.</em></blockquote>That is, monads allow easy function composition that also ensures linear execution, much like you would expect from writing a series of statements within a function in C, a method in Java, or a block of JavaScript. There are other interesting properties of monads, but this is the most fundamental. <br /><br /><br /><br />[1]: Lisp and Smalltalk programmers might honestly count one single syntax for all their work.
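That ordering guarantee is easy to see in a few lines of Haskell. Here is a minimal sketch (the names <tt>pureSum</tt> and <tt>counterDemo</tt> are mine, for illustration only, not code from the ONLamp articles): in the pure expression, either operand may be reduced first without changing the result; in the <tt>IO</tt> action, each statement runs strictly after the previous one.

```haskell
import Data.IORef

-- Pure sub-expressions: evaluation order cannot change the result.
pureSum :: Int
pureSum = (1 + 2) + (3 * 4)

-- Monadic statements: (>>=) threads each result into the next action,
-- so the two updates below run in exactly this order.
counterDemo :: IO Int
counterDemo = do
  ref <- newIORef (0 :: Int)
  modifyIORef ref (+ 1)   -- runs first:  0 + 1  = 1
  modifyIORef ref (* 10)  -- runs second: 1 * 10 = 10
  readIORef ref           -- yields 10; swapping the updates would yield 1
```

Reordering the pure operands is invisible; reordering the monadic statements changes the answer. That asymmetry is the whole point.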
:-)Adam Turoffhttp://www.blogger.com/profile/11941071792943377879noreply@blogger.com11tag:blogger.com,1999:blog-6655746112052720933.post-69464744767051850332007-08-15T00:14:00.000-05:002007-08-15T00:30:26.173-05:00More SPJIf, like me, you couldn’t make it to OSCon this year and missed Simon Peyton Jones’ presentations, then you may want to catch up with these videos:<ul><li>A Taste of Haskell<ul><li><a href="http://oscon.blip.tv/file/324976/">Part 1</a> (<a href="http://blip.tv/file/get/OSCON-OSCON2007SimonPeytonJonesATasteOfHaskellPartI455.mov">download</a>), 1:18</li><li><a href="http://oscon.blip.tv/file/325646/">Part 2</a> (<a href="http://blip.tv/file/get/OSCON-OSCON2007SimonPeytonJonesATasteOfHaskellPartII749.mov">download</a>), 1:51</li><li><a href="http://conferences.oreillynet.com/presentations/os2007/os_peytonjones.pdf">Slides</a></li></ul></li><br /><li>Nested Data Parallelism <br />(an earlier version, presented to <a href="http://www.londonhug.net/">λondon Haskell Users Group</a>)<ul><li><a href="http://video.google.com/videoplay?docid=370317485066035666&hl=en-GB">Video</a>, 1:21</li><li><a href="http://research.microsoft.com/~simonpj/papers/ndp/NdpSlides.pdf">Slides</a></li></ul></li></ul>Enjoy!Adam Turoffhttp://www.blogger.com/profile/11941071792943377879noreply@blogger.com0tag:blogger.com,1999:blog-6655746112052720933.post-18616649856315296902007-08-15T00:04:00.000-05:002007-08-15T00:14:05.673-05:00Haskell: more than just hype?Sometimes I wonder if the Haskell echo chamber is getting louder, or if programmers really are fed up with the status quo and slowly drifting towards functional programming. My <em>hunch</em> is that this is more than a mere echo chamber, and interest in Haskell and functional programming is for real.<br /><br />I’m hardly an objective observer here, since I’m solidly within the echo chamber (note the name of this blog).
Functional programming has been on my radar for at least a decade, when I first started playing with <a href="http://en.wikipedia.org/wiki/DSSSL">DSSSL</a>. Coding with closures now feels more natural than building class hierarchies and reusing design patterns, regardless of the language I am currently using. <br /><br />If you still aren’t convinced that Haskell is more than a shiny bauble lovingly praised by a lunatic fringe, here are some recent data points to consider, all involving <a href="http://research.microsoft.com/~simonpj/">Simon Peyton Jones</a>:<ul><li>Bryan O’Sullivan <a href="http://www.realworldhaskell.org/blog/2007/08/07/wow-oscon-video-viewing-statistics/">pointed out</a> last week that Simon’s talks are the most popular <a href="http://oscon.blip.tv/">videos from OSCon</a>: <blockquote><i>“Simon’s Haskell language talks are the most popular of the OSCON videos, and have been viewed over 50% more times than the next ten most popular videos <b>combined</b>.”</i></blockquote></li><br /><li>In Simon’s <a href="http://oscon.blip.tv/file/317758/">keynote presentation</a> at OSCon last month, he points out that threading as a concurrency model is decades old, easy enough for undergraduates to master in the small, but damn near impossible to scale up to tens, hundreds or thousands of nodes. The research on threads has stalled for <em>decades</em>, and it’s time to find a better concurrency model that can help us target multiprocessor and distributed systems. (Nested data parallelism and software transactional memory both help, and are both available for experimentation in Haskell today.)</li><br /><li>An <a href="http://justtesting.org/post/6253267">interview</a> with Simon and Erik Meijer that introduces the topic of <em>nirvana in programming</em>. <br /><br />Start by separating programming languages along two axes: pure/impure and useful/useless. The top left has impure and useful languages like C, Java and C#.
The bottom right has pure and useless languages like Haskell (at least before monadic I/O). The sweet spot is the top right, where a pure, useful language would be found, if it existed.<br /><br />However, the C# team is borrowing heavily from Haskell to produce <a href="http://msdn2.microsoft.com/en-us/netframework/aa904594.aspx">LINQ</a>, and the Haskell research community is folding the idea back into Haskell in the form of <a href="http://research.microsoft.com/Users/simonpj/papers/list-comp/index.htm">comprehensive comprehensions</a>. Both languages are slowly converging on <em>nirvana</em>: C# is getting more pure, and Haskell is getting more useful.</li><br /></ul>The subtext in all of these tidbits is that <em>computing is not a static discipline</em>. We know that hardware improves at a predictable pace, in the form of faster CPUs, cheaper memory, denser storage and wider networks. <br /><br />Software, or at least the way we construct software, also improves over time. One unscientific way to express this is through relative differences in productivity between programming languages. It’s reasonable to expect a project to take less time to create when using Java instead of C, and even less time when using Ruby or Python instead of Java or C#. <br /><br />By extension, there should be a new language that is even more productive than Python or Ruby. I happen to think that language will be Haskell, or at least very similar to Haskell. But it might also be OCaml, Erlang or Scala. 
In any case, Simon Peyton Jones is right — the way forward involves functional programming, whether it means choosing a language like Haskell, or integrating ideas from Haskell into your language of choice.Adam Turoffhttp://www.blogger.com/profile/11941071792943377879noreply@blogger.com5tag:blogger.com,1999:blog-6655746112052720933.post-28778552448440695172007-07-24T11:25:00.000-05:002007-07-24T11:56:00.126-05:00Intro to Haskell, Parts 1, 2 and 3It's been a while since I've posted here (too long, in fact!).<br /><br />Readers of this blog would probably be interested in a set of three articles I'm writing for the <a href="http://www.onlamp.com/">O'Reilly Network</a>. The first two are available now:<ul><li><a href="http://www.onlamp.com/pub/a/onlamp/2007/05/21/an-introduction-to-haskell---part-1-why-haskell.html">Part 1: Why Haskell?</a></li><li><a href="http://www.onlamp.com/pub/a/onlamp/2007/07/12/introduction-to-haskell-pure-functions.html">Part 2: Pure Functions</a></li></ul>My goal here is to reach the core audience for ONLamp: Perl, Python, PHP and Ruby programmers. Haskell is a very big step away from dynamic languages, and it's also a big change from statically typed languages in the ALGOL family (C, and its descendants C++, Java and C#). The first article makes the case that, although Haskell is strange and different, it's worth the effort to learn -- even if you never manage to use it on a job or a side project. The second article shows how it is possible to write programs using pure functions in the absence of all the familiar tools -- classes, objects, globals and mutable variables.<br /><br />I'm working on part 3, which will be an introduction to monads.
This is something of a rite of passage, because writing monad intros is a cottage industry in the Haskell world -- either you've written one, or you're still a dabbler, beginner, or casually interested in Haskell (and not committed to delving into it).<br /><br />Articles on ONLamp should be article length, oddly enough. That means roughly 2000 words or so. Given that tutorials like <a href="http://www.haskell.org/all_about_monads/html/">All About Monads</a> run dozens of pages, condensing a full tutorial into a single article is hopeless.<br /><br />Rather than attempt to outdo what many other fine tutorials already do very well, I'm taking a different approach. The one thing I want to focus on is the mechanics of monads. That is, how they work. Maybe that will be the missing link that the next wave of programmers needs to see before they can study Haskell.<br /><br />Part 3 should be published in a couple of weeks.Adam Turoffhttp://www.blogger.com/profile/11941071792943377879noreply@blogger.com2tag:blogger.com,1999:blog-6655746112052720933.post-56971077942411219292007-06-06T22:28:00.001-05:002007-06-06T22:36:23.308-05:00Solving Collatz SequencesSteve, over at <a href="http://cod3po37ry.blogspot.com/">cod3po37ry</a>, is learning Haskell and working through some <a href="http://acm.uva.es/problemset/">contest problems</a>, starting with the <a href="http://cod3po37ry.blogspot.com/2007/06/simple-command-line-stuff-in-haskell.html">Collatz sequence</a>. Steve doesn't mention it explicitly, but this is an interesting <a href="http://en.wikipedia.org/wiki/Collatz_conjecture">unsolved problem in mathematics</a>.
The problem starts out with a simple function over a single positive integer <i>n</i>: <ul> <li>if n is even, halve it</li> <li>if n is odd, triple it and add one</li> </ul>A sequence can be generated by starting with some positive integer <i>n</i>, generating the result of this function, and continually applying this function on the series of results until it produces a value of 1: <blockquote><pre>f 2 = [2,1]<br />f 3 = [3,10,5,16,8,4,2,1]<br />f 4 = [4,2,1]<br />f 5 = [5,16,8,4,2,1]<br />f 6 = [6,3,10,5,16,8,4,2,1]<br />f 7 = [7,22,11,34,17,52,26,13,40,20,10,5,16,8,4,2,1] </pre></blockquote> It's an open question as to whether or not this function always converges on a value of 1 for any positive integer <i>n</i>. Wikipedia claims that this property has been proven up through at least 10 × 2<sup>58</sup>, but it's unclear if it's true for all positive integers.<br /><br />Steve's solution is a little, um, cumbersome, and he makes it clear that his goal is to learn: <blockquote><i> This post follows the Law of Usenet: if you post a question, nobody will answer it. If you post an answer, they'll all rush to correct it. I welcome corrections. </i></blockquote> In that spirit, here is my response. :-)<br /><br />Unlike most languages I have used, Haskell has a very interesting property -- if you find yourself writing a lot of code, chances are you're doing something wrong. Generally, this is because the Prelude and the standard library have a rich set of tools for building solutions from the bottom up. Writing a lot of code usually means that you're ignoring a key abstraction somewhere. Haskell programs tend to be short because they <i>can</i> be short; many common problems have generalized solutions in the library already. 
If you find a general problem that's not solved in the standard library, implementing a general solution typically involves writing one missing function.<br /><br />The first step to solving this problem involves the <tt>collatz</tt> function which implements the interesting <i>3n + 1</i> property:<blockquote><pre>collatz :: Int -> Int<br />collatz 1 = 1<br />collatz n = if (odd n)<br /> then (3 * n + 1)<br /> else n `div` 2</pre></blockquote>Next, there's the function to produce a "collatz sequence" starting with a number <tt>n</tt> and (hopefully) terminating with the number <tt>1</tt>. Fortunately, this behavior is precisely what the <tt>iterate</tt> function in the Prelude provides:<blockquote><pre>iterate :: (a -> a) -> a -> [a]</pre></blockquote>That is, <tt>iterate</tt> takes a function and a seed value, and returns a list that contains the seed, and an infinite sequence of values produced by applying the function to the previous value.<br /><br />Therefore, expanding this simple <tt>collatz</tt> function to a function that produces a collatz sequence is simply:<blockquote><pre>collatzSequence :: Int -> [Int]<br />collatzSequence = iterate collatz</pre></blockquote>Actually, that's not quite correct, since a collatz sequence terminates with a value of 1:<blockquote><pre>collatzSequence :: Int -> [Int]<br />collatzSequence = terminate . iterate collatz<br /> where<br /> terminate (1:_) = [1]<br /> terminate (x:xs) = x:terminate xs</pre></blockquote>Sure enough, this function works as expected:<blockquote><pre>*Main> collatzSequence 7<br />[7,22,11,34,17,52,26,13,40,20,10,5,16,8,4,2,1]</pre></blockquote>That solves the problem of generating collatz sequences. But the contest problem was to find the longest collatz sequence within a range of numbers. 
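As an aside, the <tt>terminate</tt> helper can also be written away entirely using <tt>takeWhile</tt> from the Prelude. This is a sketch of an equivalent formulation (mine, not Steve's): drop everything from the first 1 onward, then append the terminating 1 by hand.

```haskell
collatz :: Int -> Int
collatz n = if odd n then 3 * n + 1 else n `div` 2

-- Keep the prefix before the first 1, then put the 1 back.
-- Note: no base case for collatz 1 is needed here, because takeWhile
-- stops before the 1 is ever consumed.
collatzSequence :: Int -> [Int]
collatzSequence n = takeWhile (/= 1) (iterate collatz n) ++ [1]
```

Both formulations agree on the examples above; for instance, <tt>collatzSequence 7</tt> still produces the same 17-element sequence.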
Given:<blockquote><pre>1 10<br />100 200<br />201 210<br />900 1000</pre></blockquote>produce this output:<blockquote><pre>1 10 20<br />100 200 125<br />201 210 89<br />900 1000 174</pre></blockquote>Generating these results is simple enough. First, convert a collatz sequence to its length:<blockquote><pre>*Main> length $ collatzSequence 7<br />17</pre></blockquote>Next, convert a range of integers into their collatz lengths:<blockquote><pre><br />*Main> map (length . collatzSequence) [1..10]<br />[1,2,8,3,6,9,17,4,20,7]</pre></blockquote>Next, pick out the largest sequence in the range:<blockquote><pre>*Main> maximum $ map (length . collatzSequence) [1..10]<br />17</pre></blockquote>Next, consume a line of input, perform this calculation, and produce a line of output:<blockquote><pre>run :: String -> IO ()<br />run s = do let (i:j:_) = map read (words s)<br /> m = maximum $ map (length . collatzSequence) [i..j]<br /> putStrLn (concat [show i, " ", show j, " ", show m])</pre></blockquote>And finally, write the main function that consumes all input and produces the desired output:<blockquote><pre>main = do inp <- getContents<br /> mapM_ run (lines inp)</pre></blockquote>Here is the completed program in its entirety:<blockquote><pre>collatz :: Int -> Int<br />collatz 1 = 1<br />collatz n = if (odd n) <br /> then (3 * n + 1) <br /> else n `div` 2<br /><br />collatzSequence :: Int -> [Int]<br />collatzSequence = terminate . iterate collatz<br /> where<br /> terminate (1:_) = [1]<br /> terminate (x:xs) = x:terminate xs<br /><br />run :: String -> IO ()<br />run s = do let (i:j:_) = map read (words s)<br /> m = maximum $ map (length . 
collatzSequence) [i..j]<br /> putStrLn (concat [show i, " ", show j, " ", show m])<br /><br />main = do inp <- getContents<br /> mapM_ run (lines inp)</pre></blockquote>Hope this helps!Adam Turoffhttp://www.blogger.com/profile/11941071792943377879noreply@blogger.com5tag:blogger.com,1999:blog-6655746112052720933.post-67618456653626613982007-05-24T16:42:00.000-05:002007-05-24T16:54:29.853-05:00Lessons from the White-bearded ProfessorThe <a href="http://www.onlamp.com/pub/a/onlamp/2007/05/21/an-introduction-to-haskell---part-1-why-haskell.html">first part</a> of my Introduction to Haskell series came out on ONLamp.com today. As always, there was a lot of material I wanted to mention that didn't fit. This first part is an exploration of why haskell deserves more widespread attention. (The next two parts cover functions and monads.)<br /><br />One topic I wanted to cover dates back to when I was an undergrad. One of my professors, Jim Maginnis, was something of a village elder in computing. He wasn't one of the pioneers in computing that won a Turing Award, or wrote prolifically about his favorite pet trends in computing, or a discoverer of anything fundamental.<br /><br />This white-bearded, gentle professor was happy teaching decades' worth of students the skills they needed to go out in the world and solve real problems. He felt great joy in both the topics and the students he taught, and that feeling was heartfelt and infectious.<br /><br />One of the stories Prof. Maginnis told dates back to when he consulted with big businesses as they started computerizing their operations in the 50s, 60s and 70s. He began by recommending every project start by hiring a mathematician for a week or two to study the problem. (Heck, hire two grad students -- they're cheaper!) His point was that if a mathematician could find an interesting property or algorithm, then it would be money well spent, and drastically reduce the time/money/effort needed to develop a system. 
If they didn't find anything, well, it was only a week or two, and mathematicians are cheap, anyway.<br /><br />That always struck me as sound advice. It was certainly more practical in the early days of computing, when everything was new. Today, most projects feel a lot more mundane and predictable, and maybe it isn't as necessary.<br /><br />But there's always room for good research and deep thought. That's the kind of thinking that gave us relational database engines, regular expressions, automatic garbage collection and compiler generators.<br /><br />I keep thinking about this anecdote when I describe Haskell to someone for the first time. Instead of taking two grad students for two weeks, get a few dozen PhDs around the world focused on a problem for a couple of decades. Haskell is one of the things you might end up with. So are Unix, Erlang and Plan9, for that matter.<br /><br />I wonder what Prof. Maginnis would think of the world of computing if he were alive today. I can't say for sure, but more than a few brilliant computer scientists have been working for more than a few weeks on solving some really hard problems. And I think he would be very happy with the results.Adam Turoffhttp://www.blogger.com/profile/11941071792943377879noreply@blogger.com0tag:blogger.com,1999:blog-6655746112052720933.post-81162112525460876472007-05-23T17:03:00.000-05:002007-05-23T17:21:44.909-05:00Haskell: Ready for Prime TimeBryan O’Sullivan, Don Stewart and John Goerzen <a href="http://www.realworldhaskell.org/blog/2007/05/23/real-world-haskell-its-time/">announced</a> today that they are working together on a book for O'Reilly that covers Haskell programming. Their mission is to cover the topics that are left uncovered in Haskell texts that focus on being undergraduate textbooks. The draft outline reads like a set of diffs that a Java/Perl/Python/Ruby programmer needs to understand to solve the problems they already know in a new language.
And that is a very good thing indeed.<br /><br />Last week, Eric also <a href="http://blogs.nubgames.com/code/?p=23">announced</a> that he is working on a Haskell book for the <a href="http://www.pragmaticprogrammer.com/">Pragmatic Programmers</a>. <br /><br />It sounds like at least two publishers think there is pent-up demand for a decent, practical Haskell book that could sell more than "<a href="http://notes-on-haskell.blogspot.com/2007/05/analyzing-book-sales.html">30 units per month</a>". And that is certainly a <i>good</i> thing.<br /><br />Good luck, everyone. I can't wait to read your books. :-)Adam Turoffhttp://www.blogger.com/profile/11941071792943377879noreply@blogger.com2tag:blogger.com,1999:blog-6655746112052720933.post-38239820998697488682007-05-17T22:34:00.000-05:002007-05-17T23:32:32.909-05:00Analyzing Book SalesO'Reilly does the tech community a great service by publishing their quarterly analyses of tech book sales. Yesterday, Mike Hendrickson posted <a href="http://radar.oreilly.com/archives/2007/05/state_of_the_co_10.html">part 4</a> of the Q1 07 analysis, which details programming languages.<br /><br />The book sales statistics aren't meaningful in and of themselves, because they simultaneously overstate and understate what's actually happening within the tech world. The general opinion is that Java is important, Ruby is hot, and Fortran is deader than dead, and the book sales statistics seem to back this up. Other data, like counting job postings on the web, can lead to similar conclusions. 
However, this isn't the entire story -- Fortran is still an incredibly useful language in certain circles (physics and climate simulations, for example), even if the people who rely on it don't buy books or hire frequently.<br /><br />Although book sales don't paint the whole story, they do show some interesting trends and point to questions that deserve further investigation.<br /><br />For example, Java and C# books each sell at a rate of over 50,000 units per quarter, which <i>reinforces</i> (but does not prove) the view that most of the 'developer mass' is focused on these languages. JavaScript, Ruby, and PHP are among the languages that are now shipping over 25,000 units per language per quarter.<br /><br />In Mike's 'second tier' grouping are languages like Perl and Python, which are now shipping roughly 10,000 units per quarter. That probably deserves some additional analysis; Perl is somewhat stagnant, due to Perl 5 appearing to be in maintenance mode as Perl hackers wait (and wait, and wait) for Perl 6. Also, many recent Perl titles have been hyper-focused on specific modules that aren't widely used, or are only marginally better than the online documentation. So perhaps this indicates that Perl programmers don't buy many Perl books, or perhaps it means that Perl programmers are moving to Java/C#/Ruby to pay the mortgage. Either way, this data is inconclusive.<br /><br />In the penultimate tier are languages that sell fewer than 1,000 units per quarter, and include Tcl, Lisp, Scheme, Lua and Haskell. Interestingly, 345 Haskell books sold in Q1 07, compared to 47 in Q1 06, an over 600% increase year-on-year. However, Mike also points out: "<i>[t]he four Haskell titles are averaging fewer than 30 copies per month</i>".[1]<br /><br />Again, this data is interesting, but inconclusive.
Ruby titles experienced a meteoric rise a couple of years ago, when the <a href="http://www.pragmaticprogrammer.com/">Pragmatic Programmers</a> released two bestsellers around Ruby. Before that, Ruby titles sold poorly; recently, they are selling over 25,000 units per quarter, and the trend is increasing.<br /><br />Maybe the answer here is that Haskell is poised to make a similar meteoric rise, and leave the functional programming ghetto. Maybe the answer is that people interested in learning Haskell aren't buying expensive new books, but rather buying used copies, borrowing books, or learning from online references. Maybe the answer is that the four English-language titles on Haskell aren't delivering what the book-buying public needs, indicating that the time is right for a <a href="http://blogs.nubgames.com/code/?p=23">Pragmatic Haskell</a> to repeat the success of <a href="http://www.pragmaticprogrammer.com/titles/ruby/index.html">Programming Ruby</a>.<br /><br />Somehow, I think the answer lies in a combination of these three possibilities.<br /><br />I know that when I was going through <a href="http://www.amazon.com/Haskell-Craft-Functional-Programming-2nd/dp/0201342758">Haskell: The Craft of Functional Programming</a> and <a href="http://www.amazon.com/Haskell-School-Expression-Functional-Programming/dp/0521644089">The Haskell School of Expression</a>, I read through them to figure out what the compiler was trying to do with the code I was trying to write. Once I got comfortable, I didn't refer back to them at all. Instead, I read through numerous papers and online tutorials. The online docs and <tt>ghci</tt> were what I referred to most to explain what was going on. Today, I am quite comfortable lending out my Haskell books to a close friend. 
When I was programming in Perl for a living, I would have been much more likely to order a second copy of a Perl book (through work, of course) to lend to a friend than to let go of my only copy of a book.<br /><br />I wonder if my experience is typical or aberrant, and what that means to the future prospects of selling Haskell books in 2008 or 2009...<br /><br /><br /><br />[1] The same analysis points out that <a href="http://www.apress.com/book/bookDisplay.html?bID=10146">Practical OCaml</a> is selling under 40 units per quarter. With so little data to analyze, I wonder if this reflects the lack of buzz around OCaml, a general lack of interest in OCaml, or a general lack of interest in this particular title...Adam Turoffhttp://www.blogger.com/profile/11941071792943377879noreply@blogger.com1tag:blogger.com,1999:blog-6655746112052720933.post-72809368549143095022007-05-03T22:12:00.000-05:002007-05-03T22:25:17.063-05:00Parsing JSONHere is the JSON parser I referred to in the <a href="http://notes-on-haskell.blogspot.com/2007/05/namespaces-confusion.html">previous post</a>. It's not heavily documented because it's pretty close to a 1:1 translation of the specification.<br /><br />There are some rough edges in this parser. The test suite includes the number <tt>23456789012E666</tt>, which is out of range of IEEE doubles, and is read in as <tt>Infinity</tt>. While this value can be read in as something meaningful, it cannot be emitted, since there is no provision in JSON to express values like <tt>Infinity</tt>, <tt>-Infinity</tt> or <tt>NaN</tt>. The pretty-printer does not re-encode strings containing Unicode characters or escaped characters into a JSON-readable format. 
Finally, malformed inputs cause exceptions (<tt>error</tt>).<br /><br /><blockquote><pre>module JSON where<br /><br />import Data.Char<br />import Data.Map hiding (map)<br />import Text.ParserCombinators.Parsec hiding (token)<br /><br />--------------------------------------------------------------------------<br /><br />data JsonValue = JsonString String<br /> | JsonNumber Double<br /> | JsonObject (Map String JsonValue)<br /> | JsonArray [JsonValue]<br /> | JsonTrue<br /> | JsonFalse<br /> | JsonNull<br /> deriving (Show, Eq)<br /><br />--------------------------------------------------------------------------<br />-- Convenient parse combinators<br /><br />token :: Parser a -> Parser a<br />token p = do r <- p<br /> spaces<br /> return r<br /><br />comma :: Parser Char<br />comma = token (char ',')<br /><br />--------------------------------------------------------------------------<br /><br />parseJSON :: String -> JsonValue<br />parseJSON str = case (parse jsonFile "" str) of<br /> Left s -> error (show s)<br /> Right v -> v<br /><br />jsonFile :: Parser JsonValue<br />jsonFile = do contents <- jsonObject <|> jsonArray<br /> eof<br /> return contents<br /><br />-- JSON Object<br />jsonObject :: Parser JsonValue<br />jsonObject = do pairs <- between open close (sepBy jsonPair comma)<br /> return $ JsonObject $ fromList pairs<br /> where<br /> open = token (char '{')<br /> close = token (char '}')<br /><br />jsonPair :: Parser (String, JsonValue)<br />jsonPair = do key <- token(jsonString)<br /> token (char ':')<br /> value <- token(jsonValue)<br /> return (toString key, value)<br /> where<br /> toString (JsonString s) = s<br /> toString _ = ""<br /><br />-- JSON Array<br />jsonArray :: Parser JsonValue<br />jsonArray = do values <- between open close (sepBy (token jsonValue) comma)<br /> return $ JsonArray values<br /> where<br /> open = token (char '[')<br /> close = token (char ']')<br /><br /><br />-- Any JSON Value<br />jsonValue :: Parser JsonValue<br 
/>jsonValue = do spaces<br /> obj <- token(jsonString <br /> <|> jsonNumber<br /> <|> jsonObject<br /> <|> jsonArray<br /> <|> jsonTrue<br /> <|> jsonFalse<br /> <|> jsonNull<br /> )<br /> return obj<br /><br />-- JSON String<br />jsonString :: Parser JsonValue<br />jsonString = do s <- between (char '"' ) (char '"' ) (many jsonChar)<br /> return (JsonString s)<br /><br />isValidJsonChar ch = (isAscii ch) && (isPrint ch) && (ch /= '\\') && (ch /= '"')<br /><br />hexToInt s = foldl (\i j -> (16 * i) + j) 0 (map digitToInt s)<br /><br />jsonChar = satisfy isValidJsonChar<br /> <|> do char '\\' -- escaping backslash<br /> char '\\' -- escaped character<br /> <|> char '"'<br /> <|> char '/'<br /> <|> (char 'b' >> return '\b')<br /> <|> (char 'f' >> return '\f')<br /> <|> (char 'n' >> return '\n')<br /> <|> (char 'r' >> return '\r')<br /> <|> (char 't' >> return '\t')<br /> <|> do char 'u'<br /> hex <- count 4 (satisfy isHexDigit)<br /> return $ chr (hexToInt hex)<br /><br />-- JSON Number<br />jsonNumber :: Parser JsonValue<br />jsonNumber = do i <- int<br /> frac <- option "" frac<br /> e <- option "" expo<br /> return $ JsonNumber (read (i ++ frac ++ e))<br /><br />int :: Parser String<br />int = do sign <- option "" (string "-")<br /> value <- (string "0" <|> many1 digit)<br /> return (sign ++ value)<br /><br />frac :: Parser String<br />frac = do char '.'<br /> digits <- many1 digit<br /> return ( '.':digits)<br /><br />expo :: Parser String<br />expo = do e <- oneOf "eE"<br /> p <- option '+' (oneOf "+-")<br /> n <- many1 digit<br /> return (e : p : n)<br /><br /> <br />-- JSON Constants<br />jsonTrue = token (string "true") >> return JsonTrue<br />jsonFalse = token (string "false") >> return JsonFalse<br />jsonNull = token (string "null") >> return JsonNull<br /><br />--------------------------------------------------------------------------<br />-- A JSON Pretty Printer<br />--------------------------------------------------------------------------<br />pprint v 
= toString "" v<br /><br />toString indent (JsonString s) = show s<br />toString indent (JsonNumber d) = show d<br />toString indent (JsonObject o) = <br /> if (o == empty)<br /> then "{}"<br /> else "{\n" ++ showObjs (indent ++ " ") (toList o) ++ "\n" ++ indent ++ "}"<br />toString indent (JsonArray []) = "[]"<br />toString indent (JsonArray a) = "[\n" ++ showArray (indent ++ " ") a ++ "\n" ++ indent ++ "]"<br />toString indent (JsonTrue) = "true"<br />toString indent (JsonFalse) = "false"<br />toString indent (JsonNull) = "null"<br /><br />showKeyValue i k v = i ++ show k ++ ": " ++ toString i v<br /><br />showObjs i [] = ""<br />showObjs i [(k ,v)] = showKeyValue i k v<br />showObjs i ((k, v):t) = showKeyValue i k v ++ ",\n" ++ showObjs i t<br /><br />showArray i [] = ""<br />showArray i [a] = i ++ toString i a<br />showArray i (h:t) = i ++ toString i h ++ ",\n" ++ showArray i t<br /><br />--------------------------------------------------------------------------</pre></blockquote>Adam Turoffhttp://www.blogger.com/profile/11941071792943377879noreply@blogger.com2tag:blogger.com,1999:blog-6655746112052720933.post-69894853690073594282007-05-03T21:39:00.000-05:002007-05-03T22:12:22.670-05:00Namespaces ConfusionI haven't said much about my <a href="http://notes-on-haskell.blogspot.com/2007/03/haskell-cooks-tour.html">talk</a> for FringeDC in March. I've been meaning to write this up for a while.<br /><br />This is a story about a horrific blunder. Thankfully, no <a href="http://www.imdb.com/title/tt0086190/quotes">Bothans</a> died bringing this information to you.<br /><br />As I mentioned previously, I wrote a JSON parser to demonstrate how to write a real, live working Haskell program. I started by working off of the pseudo-BNF found on the <a href="http://www.json.org/">JSON</a> homepage. 
From the perspective of the JSON grammar, the constructs it deals with are <i>objects</i> (otherwise known as maps, hashes, dicts or associative arrays), <i>arrays</i>, <i>strings</i>, <i>numbers</i>, and the three magic values <i>true</i>, <i>false</i> and <i>null</i>. <br /><br />My first task was to create a data type that captures the values that can be expressed in this language:<blockquote><pre>data Value = String String<br /> | Number Double<br /> | Object (Map String Value)<br /> | Array [Value]<br /> | True<br /> | False<br /> | Null<br /> deriving (Eq)</pre></blockquote>With the datatype in place, I then started writing parsing functions to build <i>objects</i>, <i>arrays</i>, and so on. Pretty soon, I had a JSON parser that passed the validation tests.<br /><br />I used this piece of working Haskell code during my presentation, highlighting how all the parts worked together -- the parsers that returned specific kinds of <tt>Value</tt> types, those that returned <tt>String</tt> values, and so on.<br /><br />Pretty soon I got tongue-tied, talking about how <tt>Value</tt> was a type, and why <tt>String</tt> was a type in some contexts, and a data constructor for <tt>Value</tt> types in other contexts. And how <tt>Number</tt> wasn't a number, but a <tt>Value</tt>.<br /><br />I'm surprised anyone managed to follow that code. <br /><br />The problem, as I see it, is that I was so totally focused on the JSON domain that I didn't think about the Haskell domain. My type was called <tt>Value</tt>, because that's what it's called in the JSON grammar. It never occurred to me as I was writing the code that a type called <tt>Value</tt> is pretty silly.
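The confusion is easy to reproduce in miniature (a contrived sketch, not the code from the talk; <tt>describe</tt> is a made-up helper):

```haskell
-- "String" appears here as both a data constructor (left of the =)
-- and the standard type (right of it) -- and both uses are legal.
data Value = String String
           | Null
           deriving (Show, Eq)

describe :: Value -> String            -- here "String" is the type
describe (String s) = "string: " ++ s  -- here it's the constructor
describe Null       = "null"

main :: IO ()
main = putStrLn (describe (String "hi"))
```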
And, because types and functions are in separate namespaces, I never noticed that the data constructor for strings was called <tt>String</tt>.<br /><br />Thankfully, the code was in my editor, so I changed things on the fly during the presentation to make these declarations more (ahem) sane:<blockquote><pre>data JsonValue = JsonString String<br /> | JsonNumber Double<br /> | JsonObject (Map String JsonValue)<br /> | JsonArray [JsonValue]<br /> | JsonTrue<br /> | JsonFalse<br /> | JsonNull<br /> deriving (Show, Eq)</pre></blockquote>I think that helped to clarify that <tt>String</tt> is a pre-defined type, and <tt>JsonString</tt> is a value constructor that returns something of type <tt>JsonValue</tt>.<br /><br />When I gave this presentation again a couple of weeks ago, the discussion around this JSON parser was much less confusing.<br /><br />Lesson learned: let the compiler <b>and</b> another person read your code to check that it makes sense. ;-)Adam Turoffhttp://www.blogger.com/profile/11941071792943377879noreply@blogger.com0tag:blogger.com,1999:blog-6655746112052720933.post-89351841444913125402007-04-11T22:08:00.000-05:002007-04-12T00:42:42.935-05:00Fight the next battleTim O'Reilly <a href="http://radar.oreilly.com/archives/2007/04/php_becoming_ma.html">posted</a> some book sales data that indicates PHP is going mainstream:<blockquote><i>We've noticed that one of the signs that a language is becoming mainstream (and perhaps being abandoned by the cutting edge developers) is that the For Dummies book becomes the top seller. [...]<br /><br />In Q1 of 2005, the Dummies book was #7; in Q1 2006, #5; in Q1 2007 it's #1. 
Not only is the Dummies book now #1, four of the top five titles are now introductory.</i></blockquote>Tim raises an important point: PHP is going mainstream, and cutting edge developers are looking elsewhere.<br /><br />Many of the want ads I've seen for PHP recently read like <i>we're looking for someone young and cheap to work like mad and crank out web pages</i>. This is a little depressing, but not entirely surprising. The web has been the dominant development platform for over a decade; no one needs to pay top dollar to <i>just</i> build web apps (or worse, maintain and configure them). <br /><br />Why would a professional developer want to use PHP and work on a project where someone who picked up <i>PHP For Dummies</i> last month is nearly as qualified? The whole point behind being a professional developer is accepting progressively more challenging tasks, not switching from one entry level skill to the next.<br /><br />None of this reflects on PHP itself, just PHP's status as a mainstream tool suitable for beginners. Sure, there's interesting work for PHP experts, but the bulk of PHP work doesn't need an expert of any stripe.<br /><br /><br />Tim's point is an expression of Paul Graham's <a href="http://www.paulgraham.com/pypar.html">Python Paradox</a>, but in reverse:<blockquote><i>if a company chooses to write its software in a comparatively esoteric language, they'll be able to hire better programmers, because they'll attract only those who cared enough to learn it. And for programmers the paradox is even more pronounced: the language to learn, if you want to get a good job, is a language that people don't learn merely to get a job.</i></blockquote>Because PHP skews to beginners and is becoming mainstream, professional programmers are moving away from it. <br /><br /><br />The problem with the Python Paradox is that it's both vague and fuzzy. 
By Paul's logic, if I'm starting a company, I should be using <a href="http://factorcode.org/">Factor</a> or <a href="http://iolanguage.com/about/">Io</a>, because those languages are extremely esoteric.<br /><br />Somehow, I don't think esoteric-ness is the important quality.<br /><br />People who program professionally and truly care about their craft <i>want</i> to solve increasingly hard problems. They want to walk away from the old drudgery and focus on what matters: solving the problem. Using the same tools over and over again means each project starts by fighting the last battle over again. Only then can the real work start: fighting the <i>next</i> battle, and solving the <i>new</i> problem.<br /><br />The "fight the last battle" syndrome helps explain why languages like Python and Ruby are on the rise with professional programmers, while <a href="http://www.oreillynet.com/onlamp/blog/2007/03/language_dimensionsdementia.html">languages like Java and PHP are on the decline</a>. Java uses a heavyweight, cumbersome, static object model, while Python and Ruby use a lightweight, dynamic object model, supplemented with powerful core types like lists and dictionaries. PHP makes it possible to build web applications, while Ruby and Python offer a variety of frameworks that make web applications easier to build with much less effort. <br /><br />Thus, Java and PHP are on the decline because they use old ideas and enforce older, more cumbersome styles of programming than those offered by Python and Ruby.<br /><br /><br />As Ruby and Python make dynamic languages respectable, the whole "static vs. dynamic" language debate sounds like fighting the last battle to me. It's tired. It's boring. It's a solved problem, they both work, and it's time to move on to something new.<br /><br />What's the next battle?
I'm not sure, but I've got a few ideas:<ul><li>Functional programming over OOP and imperative programming</li><li>Strong typing over weak static typing or dynamic typing</li><li>Mathematical foundations for concurrency over ad hoc models (e.g. threads)</li><li>Parsing over regex hackery</li><li>Implicit parallelism</li></ul>I guess this is why I'm more and more comfortable spending time with Haskell these days. Haskell may not be the next thing, but whatever the next big thing is, it's probably going to have Haskelly fingerprints all over it.Adam Turoffhttp://www.blogger.com/profile/11941071792943377879noreply@blogger.com3tag:blogger.com,1999:blog-6655746112052720933.post-5950782950806388262007-04-11T21:24:00.000-05:002007-04-11T22:06:18.332-05:00Feedback on H:ACTI haven't posted anything since I gave my <a href="http://notes-on-haskell.blogspot.com/2007/03/haskell-cooks-tour.html">talk</a> to FringeDC last month. Here's how it went.<br /><br />First, I want to thank Tom Moertel for his <a href="http://notes-on-haskell.blogspot.com/2007/03/haskell-cooks-tour.html#comment-3841837368717256215">comment</a> on limiting the amount of material to cover. I had outlined a lot of stuff that I wanted to cover, and it didn't really fit or flow very well. So I just started cutting stuff out. That actually worked very well.<br /><br />It was an informal talk, and mostly extemporaneous. The talk took about 90 minutes, with a decent amount of time for questions and discussion throughout.
The slides were about the first half, and hit the highlights: <ul><li>program at the highest layer of abstraction possible</li><li>formal mathematical models for programming aren't scary</li><li>you understand lambda calculus, even if you didn't know you understood it</li><li>type inferencing is good</li><li>explicit typing is better, and not odious</li><li>monads are wonderful</li></ul>The second half was a walkthrough of two Parsec parsers I wrote -- one to parse JSON in about a page of code (including a pretty printer), and a Roman numeral parser. Both are trivial programs that do something more meaningful than the overused examples of Fibonacci and factorial. There was some interesting discussion in this part of the talk.<br /><br />At the end of it all, Conrad Barski presented his <a href="http://www.tellstuff.com/">demo</a> of a project he's working on to simplify tagging. It was a pretty interesting demo of something he whipped up in MzScheme.<br /><br />About 20-25 people showed up for the talk, which is pretty impressive in DC. Many had some Haskell experience, dabbled in it, or were in the process of learning it. (I saw a few copies of various Haskell texts sitting around the room.)<br /><br />The most embarrassing part of the meeting came during the end of my slides, when I was discussing one benefit of the type system. Programming using the standard types is useful, but doesn't let the type checker know what you're trying to do. The true benefit is defining domain-specific types and letting the type checker determine whether your program is sane.<br /><br />The example I used was the Mars Climate Orbiter, the mission that crashed into Mars because of a confusion between <a href="http://en.wikipedia.org/wiki/Pound_force">lbf</a> and Newtons. Then I threw up some handwavy examples to show how operations could be defined in terms of either Newtons or pounds of force, and let the compiler prevent this kind of mixup.
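In spirit, the trick looks like this (my own minimal reconstruction with plain newtypes; <tt>Newton</tt> and <tt>PoundForce</tt> here are hypothetical types, not the actual library's API):

```haskell
-- Distinct types for distinct units; mixing them is a type error,
-- not a runtime surprise.
newtype Newton     = Newton Double     deriving (Show, Eq)
newtype PoundForce = PoundForce Double deriving (Show, Eq)

-- Conversion must be explicit: 1 lbf is roughly 4.44822 N.
toNewtons :: PoundForce -> Newton
toNewtons (PoundForce lbf) = Newton (lbf * 4.44822)

-- Addition is only defined for matching units.
addForce :: Newton -> Newton -> Newton
addForce (Newton a) (Newton b) = Newton (a + b)

main :: IO ()
main = print (addForce (Newton 10) (toNewtons (PoundForce 1)))
-- addForce (Newton 10) (PoundForce 1) would be rejected by the compiler
```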
The examples were handwavy because I wanted to refer people to the <a href="http://code.google.com/p/dimensional/">dimensional</a> library, which automatically converts units of force, and prevents stupid things like adding feet to seconds and coming up with nonsense.<br /><br />Why was this embarrassing? Because I said that this was Bjorn Bringert's library, when actually it's Bjorn Buckwalter's library. And, unbeknownst to me, Bjorn Buckwalter was sitting right in front of me during the entire presentation. What does he use this library for? Making sure things like the Mars Climate Orbiter don't happen again on his watch. :-)<br /><br />Overall, the talk went rather well. I got a lot of good feedback, and I think I covered <i>just</i> enough material.Adam Turoffhttp://www.blogger.com/profile/11941071792943377879noreply@blogger.com0