Sunday, February 18, 2007

Haskell: Raising the bar

Last week, Steve Yegge wrote up his thoughts on the key features a language needs to provide today in order to be "great". It's a grab-bag of holy grails, and reads like a new interpretation of Paul Graham's hundred-year language.

Paul's essay, now almost 4 years old, is a long-term vision of how Lisp is the ultimate language, or at least how the ultimate language will appear lispy to a Lisp hacker. Steve's essay focuses on near-term practical issues, like a C-like syntax as a prerequisite for widespread adoption, minimum performance requirements, and platform agnosticism. Steve also lists the language features that any new language needs to provide today to "not suck":
  1. Object-literal syntax for arrays and hashes
  2. Array slicing and other intelligent collection operators
  3. Perl 5 compatible regular expression literals
  4. Destructuring bind (e.g. x, y = returnTwoValues())
  5. Function literals and first-class, non-broken closures
  6. Standard OOP with classes, instances, interfaces, polymorphism, etc.
  7. Visibility quantifiers (public/private/protected)
  8. Iterators and generators
  9. List comprehensions
  10. Namespaces and packages
  11. Cross-platform GUI
  12. Operator overloading
  13. Keyword and rest parameters
  14. First-class parser and AST support
  15. Static typing and duck typing
  16. Type expressions and statically checkable semantics
  17. Solid string and collection libraries
  18. Strings and streams act like collections
Steve's post led to a lot of speculation that $YOUR_FAVORITE_LANGUAGE is actually poised to be the next big language. I'm not going to comment on that here, except to say that Haskell doesn't meet some of Steve's criteria. Haskell isn't especially object-oriented, and the programming community at large is still quite comfortable with objects, uneasy with functional programming, and downright hostile to monads and typeclasses.

Although Haskell may not be Steve's "next big language", it is certainly close to Paul's "hundred-year language", or at least on a path leading to it. In other words, Haskell may not be important to the mainstream today, or even within the next five years, but it will certainly prove important to the industry as a whole.

Steve's focus is primarily on language issues. Paul talks a little about both language and implementation issues. Here is a list of implementation characteristics necessary for a new language to "not suck":

An Open Source Implementation: "Open Source" as a concept label is 10 years old this year, and the advantages of open source (given an active, functioning community) are pretty obvious. The pressure to go open is now hitting Java, and helping JavaScript settle down and grow. The economics aren't in favor of a new, closed language becoming big. Closed languages like K will continue to pop up from time to time, but they will always serve niche markets and never become "big".

A Foreign Function Interface: One of the main advantages of being open source is the ability to reuse code. However, a huge body of code already exists, and isn't written (and won't be rewritten) in a "next big language". Integrating with low-level libraries like database drivers, crypto toolkits, XML parsers and graphics processing toolkits is best done through interfaces to native code. Writing glue code that mediates between a language runtime and native code really isn't a productive use of time; it's better to just write declarations in the target language, and let the runtime (or compiler) figure out the rest.
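Haskell's FFI works exactly this way. As a minimal sketch, a single foreign import declaration is enough to bind C's sqrt from the standard math library; the compiler generates all the marshalling glue:

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}

import Foreign.C.Types (CDouble)

-- One declaration in the target language; GHC figures out the rest.
foreign import ccall unsafe "math.h sqrt"
  c_sqrt :: CDouble -> CDouble

main :: IO ()
main = print (c_sqrt 2)  -- calls the C library directly, no hand-written glue
```

No wrapper code in C is needed at all; the declaration alone is the interface.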

A Big Library: One of the reasons why Java gained widespread popularity in the late 1990s is that development was moving to the web, and Java provided network libraries (including HTTP) in the core environment. Today, a popular language needs libraries that provide HTTP and related protocols, XML and HTML parsing, scripted web client libraries (like WWW::Mechanize), popular database drivers, and a database-agnostic modelling library (like Perl's DBI or Haskell's HDBC). If these libraries aren't available by default, they should be easily installed through a package management system.

A Package Management System: It's cliché to talk about how Perl's greatest strength is CPAN. But CPAN isn't just a well-structured repository of modules, it's also a utility to install modules from the repository. Thankfully, many language communities have learned the lesson that you need both: Ruby has rubygems and gem, and Haskell has Cabal and cabal-get. The next big language will need to follow the trend and provide both a common repository of libraries, and a means to install them easily.

A Compiler: Steve mentions that the next big language should probably work on both the JVM and .Net, which implies at least two bytecode compilers. But if you haven't chosen a VM for your project yet, you probably want to keep avoiding them. Compiling down to native code (perhaps via C, perhaps using an abstract model like the G-Machine) will certainly help those who don't want to download (and maintain) a big VM just to run your program.

An Interpreter: Paul points out that having an interpreter gives developers flexibility, at least during development. Perl, Python and Ruby programmers all know that deployment with an interpreter isn't so bad these days, either. Not having an interpreter makes programming in C (and C++, and Java, and C#) a real PITA. Firing up an interpreter to play with a library or a module really helps answer questions (and isolate bugs) quickly.

A Source-level Debugger: Sometimes, the easiest way to see what a chunk of code does is to step through it line by line. Tools like perl -d and gdb can remove all doubt by showing you what happens inside a running program, without modifying it by adding print statements everywhere.

In practice, you probably don't need both an interpreter and a source-level debugger. One or the other should suffice. (Perl doesn't have an "interpreter" per se, but the debugger is often invoked as an interpreter: perl -de0 does the trick.) Having both will help a language become the "next big language," especially when many programmers are coming from environments that have debuggers, but no interpreters.

A Profiler: As Paul points out, it's best to get a program right, then make it fast. Ideally, a program could be written using simple but inefficient data structures. If that's fast enough, then fine. If not, then perhaps a few type declarations could be added to swap out inefficient data structures for more efficient ones. If that doesn't work, then a few hot spots could be optimized by rewriting code. However, this rewriting should be driven by performance data, not dubious dictums like "avoid bignums", or other such nonsense.

A Good Concurrency Model: Steve's list of features covers things a language must provide to not suck. Erlang-style concurrency isn't on that list, but it is on his short list of features needed to make the next big language "great". Paul thinks the ultimate language needs a concurrency model deeply ingrained into it, but one that you ask for specifically, as an optimization. Paul is right on this one; multi-core CPUs are here and getting denser. Any software we write today will probably be running on some kind of multi-core/multi-CPU architecture. If there's room for a new mainstream programming language today, it needs to get concurrency right, or at least do better than raw threads. If not, we'll have to wait for the "next next big language" to fix it.

Open Classes: Language designers may be fantastically brilliant individuals, but they generally aren't clairvoyant. A language designer may create a perfect set of fundamental types, and miss a few operations on those types. Java's designers, for example, made strings a little too complex, and completely botched the date and time types. (To be fair, date and time types are almost always botched.) In order for a language to evolve, types need to evolve, including the fundamental ones. Ruby allows this kind of evolution via monkey patching. The next big language should allow this kind of evolution, ideally scoped to a single module rather than changed globally throughout an entire program.

If you read between the lines, GHC delivers most of these features today. This is a testament to the abilities and foresight of the GHC team, as well as the functional programming community as a whole. (Many of these features have been available in various Lisps for decades; others have much shorter histories.)

Haskell in general, and GHC in particular, are a little short in some areas. There isn't a big library of Haskell code yet, but the Cabal ecosystem is vibrant and growing. Source-level debugging is still a holy grail, but work on the GHCi debugger is progressing. (Thanks, mnislaih!)

Interestingly, the way Haskell implements "open classes" is lexically scoped by default. There's nothing stopping you from adding new numeric types (like vectors or sparse matrices), or even declaring strings to be numeric. It's actually much cleaner than monkey patching in Ruby.
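As a minimal sketch of adding a new numeric type, here is a hypothetical 2-D vector type (V2 is not a standard library type) made numeric simply by declaring a Num instance — no changes to the language's fundamental types required:

```haskell
-- A hypothetical 2-D vector type, made numeric via a Num instance.
data V2 = V2 Double Double deriving (Eq, Show)

instance Num V2 where
  V2 a b + V2 c d = V2 (a + c) (b + d)
  V2 a b - V2 c d = V2 (a - c) (b - d)
  V2 a b * V2 c d = V2 (a * c) (b * d)  -- component-wise, for illustration
  abs (V2 a b)    = V2 (abs a) (abs b)
  signum (V2 a b) = V2 (signum a) (signum b)
  fromInteger n   = V2 (fromInteger n) (fromInteger n)

main :: IO ()
main = print (V2 1 2 + V2 3 4)  -- prints V2 4.0 6.0
```

Thanks to fromInteger, even integer literals like 5 quietly become vectors in a V2 context; the existing numeric machinery extends to the new type without any patching of globals.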

On top of all that, Haskell does provide many of Steve's necessary language features:
  • Destructuring bind (e.g. x, y = returnTwoValues())
  • Function literals and first-class, non-broken closures
  • List comprehensions
  • Namespaces and packages
  • Cross-platform GUI
  • Operator overloading
  • First-class parser and AST support
  • Type expressions and statically checkable semantics
  • Solid string and collection libraries
  • Strings and streams act like collections
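Two items from the list above, sketched in Haskell (returnTwoValues is a made-up function, echoing Steve's own example):

```haskell
-- Destructuring bind: pattern matching pulls both values apart at once.
returnTwoValues :: (Int, Int)
returnTwoValues = (3, 4)

main :: IO ()
main = do
  let (x, y) = returnTwoValues        -- destructuring bind
  print [ a * a | a <- [1 .. x + y] ] -- list comprehension
  -- prints [1,4,9,16,25,36,49]
```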
Add that up, and while Haskell may not be the next big language (for the mainstream), it's on the road towards the hundred-year language, and it certainly doesn't suck.


Harald Korneliussen said...

The last point, about "open classes", is worth noticing, especially as it concerns strings. Haskell, I maintain, initially got strings wrong. Strings as lists, like Erlang does it, just doesn't seem to be a good idea for anything but education :-/ ... You have to switch them to another representation in order to do just about any reasonably fast operation, and then there is the pain of switching back every time you need to use a function that takes a regular string, for instance to open a file.
But strings as linked lists is not the only way to get strings wrong. Take a look at Ada, for instance. You either use the default strings, with severely limited sizes and serious restrictions on how they can be passed to functions, or you use Ada.Strings.Unbounded and do the switching dance every time you need to print them, use them as a file name, etc.
I _don't_ think Haskell has managed to work around this, the last point on the list. At least not yet. It has much of the framework in place, though. If String was a typeclass instead of a type, and you could specify in the module header that all literal strings in this module are ByteStrings, for instance, then the flexibility would be there, and evolution of the basic types would be possible. It would perhaps be a good idea for the other literals as well - I hear there are issues about defaulting to Int.

Neil Bartlett said...

Nice post, but I have just one quibble... surely "Open Source" is far more than 10 years old?

Stallman published the GNU Manifesto in 1985, and I'm sure there were things going on before that which could be called "Open Source", even if they weren't called that at the time. So as a concept it's more than 20 years old.

Adam Turoff said...

surely "Open Source" is far more than 10 years old?

The open source definition dates back to a discussion in 1997.

The benefits of an open source language implementation weren't exactly clear before the open source movement got its foothold. I was trying to make that connection, not a connection to the deeper ideals of free software.

Calling it a "concept" doesn't really make that clear. Updated to refer to the age of the label. Thanks. :-)

augustss said...

The HEAD version of GHC has overloaded string literals, so you can use the usual syntax for String, ByteString, or anything else you declare to be an instance of class IsString.