Wednesday, May 14, 2008

Static vs. Dynamic Languages

One permatopic across programming blogs is the good ol' static-vs-dynamic languages debate. Static languages (like C, C++, C#, C--, Java, etc.) require variables to have type declarations, primarily to generate compiled code and secondarily to catch errors or emit warnings at compile time. Dynamic languages (like Perl, Python, PHP, Ruby, Tcl, Lisp, Scheme, Smalltalk, etc.) do away with all those needless keystrokes, let programmers be more productive, and generally aren't as dangerous as the C programmers would have you believe. Furthermore, dynamic language programmers are quick to point out that any real program in a static language requires typecasting, which throws all of the "guarantees" out the window and brings in all of the "dangers" of dynamic languages with none of the benefits.

However, this debate isn't limited to these opposite points of view. There's another perspective, which I heard from Mark-Jason Dominus about a decade ago: static typing in languages like C and Pascal isn't really static typing! (Correction: Mark actually said that "strong" typing in C and Pascal isn't really strong typing.)

The kind of static typing supported by C, its ancestors, and its descendants dates back to the 1950s and Fortran (or, rather, FORTRAN, since lowercase letters hadn't been invented yet). Type annotations in these languages tell the compiler how much storage to allocate for each variable. Eventually, compilers began to emit warnings or errors when mixing variables of incompatible types (like floating point numbers and strings), because those kinds of operations don't make sense. If the irksome compiler gets in your way, you can always tell it to shut up and do the operation anyway, and maybe let the program crash at runtime. Or maybe silently corrupt data. Or maybe something completely different. None of which makes much sense, but there you go.
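
To see what "tell it to shut up and do the operation anyway" looks like in practice, here's a minimal sketch. It uses Haskell's unsafeCoerce escape hatch as a stand-in for a C cast; the consequences are the same.

```haskell
import Unsafe.Coerce (unsafeCoerce)

main :: IO ()
main = do
  let price = 1.5 :: Double
  -- The moral equivalent of a C cast: tell the type checker to shut up
  -- and treat the same value as a completely different type.
  let bogus = unsafeCoerce price :: Int
  print bogus  -- on a typical 64-bit GHC, this prints the Double's raw bits
               -- as one huge nonsense Int; crashing is also fair game
```

(The difference from C is that nothing this dangerous happens unless you explicitly import Unsafe.Coerce and ask for it.)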

Modern static typing, on the other hand, uses a strong dose of the Hindley-Milner type inference algorithm to (a) determine what values a variable may contain and (b) prevent the use of incompatible values in expressions. Furthermore, Hindley-Milner prevents values from being cast from one type to another on a whim (unlike C), and thus prevents odd runtime errors caused by abuse of the type system. This leads to a strong, unbreakable type system that is enforced by the compiler, closing a whole class of loopholes allowed by lesser type systems. Because types can be inferred, explicit type annotations aren't always needed, which leads to an interesting property: languages that are safer than "safe" static languages like C and Java, and programs that are free of those troublesome, noisy type declarations, just like dynamic languages.
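
To make that property concrete, here's a tiny Haskell sketch (any ML-family language would do equally well): nothing is annotated, yet everything is statically typed, and the bad program never gets to run.

```haskell
double x = x + x          -- inferred: Num a => a -> a
pairUp xs = zip xs xs     -- inferred: [a] -> [(a, a)]

main = do
  print (double 21)       -- 42
  print (pairUp "hi")     -- [('h','h'),('i','i')]
  -- The next line would be rejected at compile time, not at runtime:
  -- print (double True)  -- error: No instance for (Num Bool)
```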

Use of the Hindley-Milner type system comes to us through ML and descendants like Haskell, OCaml, SML, F# and the like. It's such a good idea that it (or at least some form of type inference) is slowly making its way into newer programming languages like Perl 6, Fortress, Scala, and future versions of C++, C#, and VB.Net.

So, really, the whole static-vs-dynamic languages debate is both misleading and moot. It's misleading because it's not about static vs. dynamic typing; it's about weak compile-time typing vs. runtime typing. It's moot because the writing is on the wall: weakly typed languages are on the way out, to be replaced (eventually) by strongly typed languages of one form or another. (It also tends to be a meaningless "debate", because it's typically a shouting match between C/C++/Java/etc. acolytes who are uncomfortable with Perl/Python/Ruby/etc., and Perl/Python/Ruby/etc. acolytes who have a strong distaste for C/C++/Java/etc., and neither group has a hope of persuading the other.)

Normally, I would avoid the topic entirely, but today I couldn't resist. Steve Yegge posted a transcription of his talk on dynamic languages, where he covers both sides of the debate, moves beyond the two sides talking past each other, and uncovers the core issues. He covers the meat of the discussion, not the typical push-button issues that go nowhere and convince no one. (I think he misses the point behind Hindley-Milner, but that's an insignificant side issue.)

  • Most of the work in compiler optimization in the last few decades has focused on weakly-typed static languages.
  • Dynamic languages are, in general, slower, only because there has been relatively little research interest in making them faster.
  • There are some interesting approaches to optimizing dynamic languages that are now ready for prime time. Much of this comes from Strongtalk, an effort to optimize Smalltalk that led to Java's HotSpot VM.
  • Some of the "benefits" of "static" typing, like refactoring IDEs, aren't as good as they appear at first. Refactoring dynamic languages is different, but IDEs could do about as good a job as they do for C++/C#/Java. Refactoring properly is a hard problem that no one has solved completely yet.
  • "Fast" is relative. C++ used to be the "fast" language because it compiles down to native code. But in a multicore world, C/C++ is becoming slow, because C/C++ programs cannot take advantage of multiple CPUs easily, unlike Java.
  • Dynamic languages of the 1990s (Perl/Python/Ruby/Tcl) don't have a reasonable answer to scaling across CPUs, either.
  • Making Java bytecode run quickly takes more work than you probably realized.
  • Any sufficiently large system built with a static language will need to build in the kind of introspection that's available with a dynamic language and REPL. (Greenspun's Tenth Rule in action)
  • It was easy for the industry to switch programming languages every decade or so, up until 1995. Today, the bar is higher, and the amount of effort invested in existing code makes it much more difficult to migrate to a new language and platform.
  • Google has some of the most brilliant people and language designers in the industry, but they use C++, Java, Python and JavaScript, and none of the esoteric languages the smart kids love. This is because it's more important to standardize on a set of technologies and get stuff done than let the geeks run wild and free. But you probably knew that already. ;-)

Steve's talk is long and rambling, but certainly worth the time to read, or at least skim through.

12 comments:

Slava Pestov said...

> It's such a good idea, that it's slowly making its way into new programming languages like Perl 6, Fortress, Factor, Scala, and future versions of C++, C#, and VB.Net.

Factor is dynamically typed and doesn't use a Hindley-Milner type system.

Anonymous said...

Agreed 99%. However, even ML derivatives would do well to support optional dynamic types for those cases where types depend on external data, such as databases or XML files without a schema.

Unknown said...

One small note: Tcl was NOT included in Steve's list of languages that have no concurrency answer, because from that list Tcl is the only one that HAS native threads without a global interpreter lock, with a sane message-passing-based threading model. It scales quite fine over your available CPUs if you use it that way (no automagical threading).

Adam Turoff said...

Factor is dynamically typed and doesn't use a Hindley-Milner type system.

My apologies, Slava. I remember reading that you had added type/stack effect inference to Factor, and I must have conflated the two. Corrected.

Anonymous said...

Scala uses local type inference, not Hindley-Milner. You still have to declare function signatures, but you can omit types for local variables.

The proposed type inference for C++0x is local as well.

In general, Hindley-Milner type systems feel like ML. Add anything to the type system, and HM breaks down. Most interesting developments in type systems that I'm currently aware of, such as Typed Scheme, are based on explicit typing (plus local type inference).

shapr said...

Actually, poly-inline caching was developed for Self, not Strongtalk.

shapr said...

For more details, here's the link to the (as far as I know) first poly-inline caching paper: Optimizing Dynamically-Typed Object-Oriented Programming Languages with Polymorphic Inline Caches.

Anonymous said...

That's a cool point that you attributed to me, but honesty compels me to admit that it isn't a point that I have ever made. Or at least not on purpose.

My "Strong Typing Doesn't Have to Suck" talk from 1999 pretty much says the opposite: the type system of FORTRAN is a real type system because it lets the programmer inform the compiler that some variable will be an INTEGER*4 or a REAL*4 or a BOOLEAN*4 or an ARRAY[2] of INTEGER*2 or whatever.

Moritz said...

I follow the Perl 6 development quite closely, and I couldn't find a reference to compile-time type inference in the specs, or any current discussions about it on perl6-language@perl.org.

Actually the type system is duck typing + optional static typing, and Larry maintains the view that it shouldn't become more complicated than that.

It's not necessarily pretty, but it works well so far.

Of course, all implementations are free to do any compile-time analysis they want, but there is no guarantee that type inference is carried out as far as possible or that compile-time errors are thrown.

Adam Turoff said...

Mark, thanks for the link to your talk. That's obviously the one I was thinking about.

What I remember most was the discussion that strong typing as offered by C/Pascal sucks, and the type system to strive for is ML's, because of inferencing of both specific (Int) and general types (lists, 2-tuples).

I've corrected the misstatement above. Sorry for the confusion.

As far as Fortran's type system, I was probably overgeneralizing there as well. Fortran has a perfectly sane type system for what it does. IIRC, it only uses types to guide code generation, and doesn't allow casts. Adding casts (a la C/Pascal and their descendants) is the mistake, and a small jump away from Fortran. That point could have been made more clearly.

Adam Turoff said...

Moritz,

The reference to Perl6 came from the wikipedia article on type inferencing. The link to Hindley-Milner redirects there, and that page describes a whole raft of languages that offer or will soon offer some form of type inferencing, not necessarily Hindley-Milner.

I didn't realize that the article talks about other forms of type inferencing, which is why I included Factor and Perl6 in the list, and I should have checked the references more thoroughly. I remember some form of type inferencing being discussed in both languages, and a quick skim of the wikipedia article seemed to confirm some usage of H-M, when a more careful reading actually confirms something completely different.

fogus said...

This is a generally nice post. I like to think about these topics from time to time to keep them fresh in my mind. However, it seems to fall into the common trap of mixing the definition of strong and static typing.

But that is the pitfall of the whole perpetual typing debate where these definitions are munged together and used almost interchangeably.
-m