Thursday, May 24, 2007

Lessons from the White-bearded Professor

The first part of my Introduction to Haskell series came out on ONLamp.com today. As always, there was a lot of material I wanted to mention that didn't fit. This first part is an exploration of why Haskell deserves more widespread attention. (The next two parts cover functions and monads.)

One topic I wanted to cover dates back to when I was an undergrad. One of my professors, Jim Maginnis, was something of a village elder in computing. He wasn't one of the pioneers of computing who won a Turing Award, nor a prolific writer on his favorite trends in computing, nor the discoverer of anything fundamental.

This white-bearded, gentle professor was happy teaching decades' worth of students the skills they needed to go out in the world and solve real problems. He felt great joy in both the topics and the students he taught, and that feeling was heartfelt and infectious.

One of the stories Prof. Maginnis told dates back to when he consulted with big businesses as they started computerizing their operations in the 50s, 60s and 70s. He began by recommending every project start by hiring a mathematician for a week or two to study the problem. (Heck, hire two grad students -- they're cheaper!) His point was that if a mathematician could find an interesting property or algorithm, then it would be money well spent, and drastically reduce the time/money/effort needed to develop a system. If they didn't find anything, well, it was only a week or two, and mathematicians are cheap, anyway.

That always struck me as sound advice. It was certainly more practical in the early days of computing, when everything was new. Today, most projects feel far more mundane and predictable, so maybe it isn't as necessary.

But there's always room for good research and deep thought. That's the kind of thinking that gave us relational database engines, regular expressions, automatic garbage collection and compiler generators.

I keep thinking about this anecdote when I describe Haskell to someone for the first time. Instead of taking two grad students for two weeks, get a few dozen PhDs around the world focused on a problem for a couple of decades. Haskell is one of the things you might end up with. So are Unix, Erlang and Plan 9, for that matter.

I wonder what Prof. Maginnis would think of the world of computing if he were alive today. I can't say for sure, but more than a few brilliant computer scientists have been working for more than a few weeks on solving some really hard problems. And I think he would be very happy with the results.

Wednesday, May 23, 2007

Haskell: Ready for Prime Time

Bryan O’Sullivan, Don Stewart and John Goerzen announced today that they are working together on a book for O'Reilly that covers Haskell programming. Their mission is to cover the topics left untouched by the Haskell texts that aim to be undergraduate textbooks. The draft outline reads like the set of diffs a Java/Perl/Python/Ruby programmer needs in order to solve familiar problems in a new language. And that is a very good thing indeed.

Last week, Eric also announced that he is working on a Haskell book for the Pragmatic Programmers.

It sounds like at least two publishers think there is pent-up demand for a decent, practical Haskell book that could sell more than "30 units per month". And that is certainly a good thing.

Good luck, everyone. I can't wait to read your books. :-)

Thursday, May 17, 2007

Analyzing Book Sales

O'Reilly does the tech community a great service by publishing their quarterly analyses of tech book sales. Yesterday, Mike Hendrickson posted part 4 of the Q1 07 analysis, which details programming languages.

The book sales statistics aren't meaningful in and of themselves, because they simultaneously overstate and understate what's actually happening within the tech world. The general opinion is that Java is important, Ruby is hot, and Fortran is deader than dead, and the book sales statistics seem to back this up. Other data, like counting job postings on the web, can lead to similar conclusions. However, this isn't the entire story -- Fortran is still an incredibly useful language in certain circles (physics and climate simulations, for example), even if the people who rely on it don't buy books or hire frequently.

Although book sales don't tell the whole story, they do reveal some interesting trends and point to questions that deserve further investigation.

For example, Java and C# books are each selling over 50,000 units per quarter across all titles, which reinforces (but does not prove) the view that most of the 'developer mass' is focused on these languages. JavaScript, Ruby, and PHP are among the languages now shipping over 25,000 units per language per quarter.

In Mike's 'second tier' grouping are languages like Perl and Python, which are now shipping roughly 10,000 units per quarter. That probably deserves some additional analysis. Perl seems somewhat stagnant: Perl 5 appears to be in maintenance mode as Perl hackers wait (and wait, and wait) for Perl 6, and many recent Perl titles have been hyper-focused on specific modules that aren't widely used, or are only marginally better than the online documentation. So perhaps this indicates that Perl programmers don't buy many Perl books, or perhaps it means that Perl programmers are moving to Java/C#/Ruby to pay the mortgage. Either way, the data is inconclusive.

In the penultimate tier are languages that sell fewer than 1,000 units per quarter, including Tcl, Lisp, Scheme, Lua and Haskell. Interestingly, 345 Haskell books sold in Q1 07, compared to 47 in Q1 06, an increase of over 600% year on year. However, Mike also points out: "[t]he four Haskell titles are averaging fewer than 30 copies per month".[1]

Again, this data is interesting, but inconclusive. Ruby titles saw a meteoric rise a couple of years ago, when the Pragmatic Programmers released two bestsellers on Ruby. Before that, Ruby titles sold poorly; now they are selling over 25,000 units per quarter, and the trend is upward.

Maybe the answer here is that Haskell is poised to make a similar meteoric rise, and leave the functional programming ghetto. Maybe the answer is that people interested in learning Haskell aren't buying expensive new books, but rather buying used copies, borrowing books, or learning from online references. Maybe the answer is that the four English-language titles on Haskell aren't delivering what the book-buying public needs, indicating that the time is right for a Pragmatic Haskell to repeat the success of Programming Ruby.

Somehow, I think the answer lies in a combination of these three possibilities.

I know that when I was going through Haskell: The Craft of Functional Programming and The Haskell School of Expression, I read through them to figure out what the compiler was trying to do with the code I was trying to write. Once I got comfortable, I didn't refer back to them at all. Instead, I read through numerous papers and online tutorials. The online docs and ghci were what I referred to most to explain what was going on. Today, I am quite comfortable lending out my Haskell books to a close friend. When I was programming in Perl for a living, I would have been much more likely to order a second copy of a Perl book (through work, of course) to lend to a friend than to let go of my only copy of a book.

I wonder if my experience is typical or aberrant, and what that means to the future prospects of selling Haskell books in 2008 or 2009...



[1] The same analysis points out that Practical OCaml is selling under 40 units per quarter. With so little data to analyze, I wonder if this reflects the lack of buzz around OCaml, a general lack of interest in OCaml, or a general lack of interest in this particular title...

Thursday, May 3, 2007

Parsing JSON

Here is the JSON parser I referred to in the previous post. It's not heavily documented because it's pretty close to a 1:1 translation of the specification.

There are some rough edges in this parser. The test suite includes the number 23456789012E666, which is out of range of IEEE doubles, and is read in as Infinity. While this value can be read in as something meaningful, it cannot be emitted, since there is no provision in JSON to express values like Infinity, -Infinity or NaN. The pretty-printer does not re-encode strings containing Unicode characters or escaped characters into a JSON-readable format. Finally, malformed inputs cause exceptions (error).
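Both rough edges are easy to see from ghci; here is a small standalone sketch (separate from the parser below) illustrating them:

```haskell
main :: IO ()
main = do
    -- The out-of-range literal from the test suite overflows an IEEE double:
    let d = read "23456789012E666" :: Double
    print (isInfinite d)       -- True
    -- show renders it as "Infinity", which is not legal JSON:
    putStrLn (show d)
    -- show also escapes non-ASCII characters Haskell-style ("caf\233"),
    -- not with JSON \u-escapes:
    putStrLn (show "caf\xe9")
```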

module JSON where

import Data.Char
import Data.Map hiding (map)
import Text.ParserCombinators.Parsec hiding (token)

--------------------------------------------------------------------------

data JsonValue = JsonString String
               | JsonNumber Double
               | JsonObject (Map String JsonValue)
               | JsonArray [JsonValue]
               | JsonTrue
               | JsonFalse
               | JsonNull
    deriving (Show, Eq)

--------------------------------------------------------------------------
-- Convenient parse combinators

token :: Parser a -> Parser a
token p = do r <- p
             spaces
             return r

comma :: Parser Char
comma = token (char ',')

--------------------------------------------------------------------------

parseJSON :: String -> JsonValue
parseJSON str = case parse jsonFile "" str of
                    Left  s -> error (show s)
                    Right v -> v

jsonFile :: Parser JsonValue
jsonFile = do contents <- jsonObject <|> jsonArray
              eof
              return contents

-- JSON Object
jsonObject :: Parser JsonValue
jsonObject = do pairs <- between open close (sepBy jsonPair comma)
                return $ JsonObject $ fromList pairs
    where
        open  = token (char '{')
        close = token (char '}')

jsonPair :: Parser (String, JsonValue)
jsonPair = do key <- token jsonString
              token (char ':')
              value <- token jsonValue
              return (toString key, value)
    where
        toString (JsonString s) = s
        toString _              = ""

-- JSON Array
jsonArray :: Parser JsonValue
jsonArray = do values <- between open close (sepBy (token jsonValue) comma)
               return $ JsonArray values
    where
        open  = token (char '[')
        close = token (char ']')


-- Any JSON Value
jsonValue :: Parser JsonValue
jsonValue = do spaces
               obj <- token (jsonString
                             <|> jsonNumber
                             <|> jsonObject
                             <|> jsonArray
                             <|> jsonTrue
                             <|> jsonFalse
                             <|> jsonNull)
               return obj

-- JSON String
jsonString :: Parser JsonValue
jsonString = do s <- between (char '"') (char '"') (many jsonChar)
                return (JsonString s)

isValidJsonChar ch = isAscii ch && isPrint ch && ch /= '\\' && ch /= '"'

hexToInt s = foldl (\i j -> 16 * i + j) 0 (map digitToInt s)

jsonChar = satisfy isValidJsonChar
       <|> do char '\\'                          -- escaping backslash
              char '\\'                          -- escaped character
                <|> char '"'
                <|> char '/'
                <|> (char 'b' >> return '\b')
                <|> (char 'f' >> return '\f')
                <|> (char 'n' >> return '\n')
                <|> (char 'r' >> return '\r')
                <|> (char 't' >> return '\t')
                <|> do char 'u'
                       hex <- count 4 (satisfy isHexDigit)
                       return $ chr (hexToInt hex)

-- JSON Number
jsonNumber :: Parser JsonValue
jsonNumber = do i    <- int
                frac <- option "" frac
                e    <- option "" expo
                return $ JsonNumber (read (i ++ frac ++ e))

int :: Parser String
int = do sign  <- option "" (string "-")
         value <- string "0" <|> many1 digit
         return (sign ++ value)

frac :: Parser String
frac = do char '.'
          digits <- many1 digit
          return ('.' : digits)

expo :: Parser String
expo = do e <- oneOf "eE"
          p <- option '+' (oneOf "+-")
          n <- many1 digit
          return (e : p : n)


-- JSON Constants
jsonTrue  = token (string "true")  >> return JsonTrue
jsonFalse = token (string "false") >> return JsonFalse
jsonNull  = token (string "null")  >> return JsonNull

--------------------------------------------------------------------------
-- A JSON Pretty Printer
--------------------------------------------------------------------------
pprint v = toString "" v

toString indent (JsonString s) = show s
toString indent (JsonNumber d) = show d
toString indent (JsonObject o) =
    if o == empty
        then "{}"
        else "{\n" ++ showObjs (indent ++ " ") (toList o) ++ "\n" ++ indent ++ "}"
toString indent (JsonArray []) = "[]"
toString indent (JsonArray a)  = "[\n" ++ showArray (indent ++ " ") a ++ "\n" ++ indent ++ "]"
toString indent JsonTrue       = "true"
toString indent JsonFalse      = "false"
toString indent JsonNull       = "null"

showKeyValue i k v = i ++ show k ++ ": " ++ toString i v

showObjs i []         = ""
showObjs i [(k, v)]   = showKeyValue i k v
showObjs i ((k, v):t) = showKeyValue i k v ++ ",\n" ++ showObjs i t

showArray i []    = ""
showArray i [a]   = i ++ toString i a
showArray i (h:t) = i ++ toString i h ++ ",\n" ++ showArray i t

--------------------------------------------------------------------------

Namespaces Confusion

I haven't said much about my talk for FringeDC in March. I've been meaning to write this up for a while.

This is a story about a horrific blunder. Thankfully, no Bothans died bringing this information to you.

As I mentioned previously, I wrote a JSON parser to demonstrate how to write a real, live working Haskell program. I started by working off of the pseudo-BNF found on the JSON homepage. From the perspective of the JSON grammar, the constructs it deals with are objects (otherwise known as Maps, hashes, dicts or associative arrays), arrays, strings, numbers, and the three magic values true, false and null.

My first task was to create a data type that captures the values that can be expressed in this language:
data Value = String String
           | Number Double
           | Object (Map String Value)
           | Array [Value]
           | True
           | False
           | Null
    deriving (Eq)
With the datatype in place, I then started writing parsing functions to build objects, arrays, and so on. Pretty soon, I had a JSON parser that passed the validation tests.

I used this piece of working Haskell code during my presentation, highlighting how all the parts worked together -- the parsers that returned specific kinds of Value types, those that returned String values, and so on.

Pretty soon I got tongue tied, talking about how Value was a type, and why String was a type in some contexts, and a data constructor for Value types in other contexts. And how Number wasn't a number, but a Value.

I'm surprised anyone managed to follow that code.

The problem, as I see it, is that I was so totally focused on the JSON domain that I didn't think about the Haskell domain. My type was called Value, because that's what it's called in the JSON grammar. It never occurred to me as I was writing the code that a type called Value is pretty silly. And, because types and functions are in separate namespaces, I never noticed that the data constructor for strings was called String.
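A minimal standalone example (with hypothetical names, not the talk's actual code) shows why the compiler happily accepted all this: a data constructor named String can coexist with the Prelude type String, because the two live in separate namespaces.

```haskell
-- The constructor String (left of the field) and the type String
-- (the field itself) never collide -- only human readers do.
data Value = String String
           | Number Double
           deriving Show

main :: IO ()
main = print (String "hello")   -- prints: String "hello"
```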

Thankfully, the code was in my editor, so I changed things on the fly during the presentation to make these declarations more (ahem) sane:
data JsonValue = JsonString String
               | JsonNumber Double
               | JsonObject (Map String JsonValue)
               | JsonArray [JsonValue]
               | JsonTrue
               | JsonFalse
               | JsonNull
    deriving (Show, Eq)
I think that helped to clarify that String is a pre-defined type, and JsonString is a value constructor that returns something of type JsonValue.

When I gave this presentation again a couple of weeks ago, the discussion around this JSON parser was much less confusing.

Lesson learned: let the compiler and another person read your code to check that it makes sense. ;-)