Wednesday, May 14, 2008

Thought of the Day

Also from Steve Yegge's talk:

[W]hen we talk about performance, it's all crap. The most important thing is that you have a small system. And then the performance will just fall out of it naturally.

Static vs. Dynamic Languages

One permatopic across programming blogs is the good ol' static-vs-dynamic languages debate. Static languages (like C, C++, C#, C--, Java, etc.) require variables to have type declarations primarily to generate compiled code and secondarily to catch errors or emit warnings at compile time. Dynamic languages (like Perl, Python, PHP, Ruby, Tcl, Lisp, Scheme, Smalltalk, etc.) do away with all those needless keystrokes, allow programmers to be more productive, and generally aren't as dangerous as the C-programmers would have you believe. Furthermore, dynamic language programmers are quick to point out that any real program in a static language requires typecasting, which throws all of the "guarantees" out the window, and bring in all of the "dangers" of dynamic languages with none of the benefits.

However, this debate isn't limited to these opposite points of view. There's another perspective, which I heard from Mark-Jason Dominus about a decade ago: static typing in languages like C and Pascal isn't really static typing! (Correction: Mark actually said that "strong" typing in C and Pascal isn't really strong typing.)

The kind of static typing supported by C, its ancestors and its descendants, dates back to the 1950s and Fortran (or, rather FORTRAN, since lower case letters hadn't been invented yet). Type annotations in these languages are guides to the compiler in how much storage to allocate for each variable. Eventually, compilers began to emit warnings or compiler errors mixing variables of incompatible types (like floating point numbers and strings), because those kinds of operation don't make sense. If the irksome compiler gets in your way, you can always tell the it to shut up and do the operation anyway and maybe let the program crash at runtime. Or maybe silently corrupt data. Or maybe something completely different. None of which makes much sense, but there you go.

Modern static typing, on the other hand, uses a strong dose of the Hindley-Milner type inference algorithm to determine (a) what values a variable may contain and (b) prevent the use of incompatible values in expressions. Furthermore, Hindley-Milner prevents values being cast from one type to another on a whim (unlike C), and thus prevents odd runtime errors through abuse of the type system. This leads to an strong, unbreakable type system that is enforced by the compiler, and prevents a whole class of loopholes allowed by lesser type systems. Because types can be inferred, explicit type annotations aren't always needed, which leads to an interesting property: languages that are safer than "safe" static languages like C and Java, and programs that are free of those troublesome, noisy type declarations, just like dynamic languages.

Use of the Hindley-Milner type system comes to us through ML, and descendants like Haskell, OCaml, SML, F# and the like. It's such a good idea, that it's slowly making its way into new programming languages like Perl 6, Fortress, Factor, Scala, and future versions of C++, C#, and VB.Net.

So, really, the whole static-vs-dynamic languages debate is both misleading and moot. It's misleading, because it's not about static vs. dynamic typing, it's about weak compile-typing vs. runtime-typing. It's moot, because the writing is on the wall, and weakly typed languages are on the way out, to be replaced (eventually) by strongly-typed languages of one form or another. (It also tends to be a meaningless "debate", because it's typically a shouting match between C/C++/Java/etc. acolytes who are uncomfortable with Perl/Python/Ruby/etc., and Perl/Python/Ruby/etc. acolytes who have a strong distaste for C/C++/Java/etc., and neither group has a hope of persuading the other.)

Normally, I would avoid topic entirely, but today, I couldn't resist. Steve Yegge posted a transcription of his talk on dynamic languages, where he covers both sides of the debate, and moves beyond the two-sides-talking-past-each-other and uncovers the core issues. He covers the meat of the discussion, not the typical push-button issues that go nowhere and convince no one. (I think he misses the point behind Hindley-Milner, but that's an insignificant side issue.)

  • Most of the work in compiler optimization in the last few decades has focused on weakly-typed static languages.
  • Dynamic languages are, in general, slower, only because there has been relatively little research interest in making them faster.
  • There are some interesting approaches to optimizing dynamic languages that are now ready for prime time. Much of this comes from Strongtalk, an effort to optimize Smalltalk that led to Java's HotSpot VM.
  • Some of the "benefits" of "static" typing, like refactoring IDEs, aren't as good as they appear at first. Refactoring dynamic languages is different but IDEs could do about as good a job as they do for C++/C#/Java. Refactoring properly is a hard problem that no one has solved completely, yet.
  • "Fast" is relative. C++ used to be the "fast" language because it compiles down to native code. But in a multicore world, C/C++ is becoming slow, because C/C++ programs cannot take advantage of multiple CPUs easily, unlike Java.
  • Dynamic languages of the 1990s (Perl/Python/Ruby/Tcl) don't have a reasonable answer to scaling across CPUs, either.
  • Making Java bytecode run quickly takes more work than you probably realized.
  • Any sufficiently large system built with a static language will need to build in the kind of introspection that's available with a dynamic language and REPL. (Greenspun's Tenth Rule in action)
  • It was easy for the industry to switch programming languages every decade or so, up until 1995. Today, the bar is higher, and the amount of effort invested in existing code makes it much more difficult to migrate to a new language and platform.
  • Google has some of the most brilliant people and language designers in the industry, but they use C++, Java, Python and JavaScript, and none of the esoteric languages the smart kids love. This is because it's more important to standardize on a set of technologies and get stuff done than let the geeks run wild and free. But you probably knew that already. ;-)

Steve's talk is long and rambling, but certainly worth the time read or at least skim through.