Wednesday, August 15, 2007

Does Syntax Matter?

An anonymous commenter on yesterday’s post posits that Haskell won’t become mainstream because of the familiar pair of leg irons:
I think one of the biggest problems in Haskell, aside from it not being very easy (whats a monad?), is syntax.
There are many reasons why Haskell may not become mainstream, but syntax and monads aren’t two of them. I’m a populist, so I get offended when a language designer builds something that’s explicitly designed to drag masses of dumb, lumbering programmers about halfway to Lisp, Smalltalk, or some other great language. I want to use a language built by great language designers that they themselves not only want to use, but want to invite others to use.

I could be wrong here. Maybe being a ‘mainstream programming language’ means designing something down to the level of the great unwashed. I hope not. I really hope not. But it could be so. And if it is, that’s probably the one and only reason why Haskell won’t be the next big boost in programming language productivity. That would also disqualify O’Caml, Erlang and perhaps Scala as well. Time will tell.

But syntax? Sorry, not a huge issue.

Sure, C and its descendants have a stranglehold on what a programming language should look like to most programmers, but that’s the least important feature a language provides. Functional programmers, especially Lisp hackers, have been saying this for decades. Decades.

A language’s syntax is a halfway point between simplifying the job of the compiler writer and simplifying the job of the programmer. No one is going back and borrowing syntax from COBOL, because it’s just too damn verbose and painful to type. C is a crisp, minimal, elegant set of constructs for ordering statements and expressions, compared to its predecessors.

Twenty years ago, the clean syntax that C provided made programming in all caps in Pascal, Fortran, Basic or COBOL seem quaint. Twenty years from now, programming with curly braces and semicolons could be just as quaint. Curly braces and semicolons aren't necessary; they're just a crutch for the compiler writer.

To prove that syntax doesn’t matter, I offer three similar-looking languages: C, Java (or C#, if you prefer) and JavaScript. They all use a syntax derived from C, but they are completely separate languages. C is a straightforward procedural language, Java is a full-blown object-oriented language (with some annoying edge cases), and JavaScript is a dynamic, prototype-based object-oriented language. The fact that a for loop looks the same in these three languages means absolutely nothing.

Knowing C doesn’t help you navigate the public static final nonsense in Java, nor does it help you understand annotations, inner classes, interfaces, polymorphism, or design patterns. Going backward from Java to C doesn’t help you write const-correct code, or understand memory allocation patterns.

Knowing C or Java doesn’t help much when trying to use JavaScript to its full potential. Neither language has anything resembling JavaScript’s dynamic, monkeypatch-everything-at-runtime behavior. And even if you have a deep background in class-based object-oriented languages, JavaScript’s use of prototypes will strike you as something between downright lovely and outright weird.

If that doesn’t convince you, consider the fact that any programmer worthy of the title already uses multiple languages with multiple syntaxes. These typically include their language of choice, some SQL, various XML vocabularies, a few config file syntaxes, a couple of template syntaxes, some level of Perl-compatible regular expressions, a shell or two, and perhaps a version or two of make or a similar utility (like Ant or Maven).

Add that up, and a programmer can easily come across two dozen different syntaxes in a single project. If they can’t count that high, it’s not because they do all their work in a single syntax[1], but because it takes too much effort to stop and count all of the inconsequential little syntaxes. (Do Apache pseudo-XML config files count as a separate syntax? Yeah, I guess they do. It took that Elbonian consultant a day to track down a misconfigured directive last year…)

So, no, Mr. Anonymous. Haskell’s syntax isn’t a stumbling block. You can learn the basics in an afternoon, get comfortable within a week, and learn the corner cases in a month or two.

Now, as for monads: the problem with monads is that they seem harder to understand than they really are. That is, it is more difficult to explain what a monad is than it is to gain a visceral understanding of what one does. (I had this same problem when I was learning C: it was hard to believe that it was really that simple.)

If you caught my introduction to Haskell on ONLamp (parts 1, 2 and 3), you may have seen this tidbit right before the end of part 3:
[M]onads enforce an order of execution on their statements. With pure functions, sub-expressions may be evaluated in any order without changing their meaning. With monadic functions, the order of execution is very important.
That is, monads allow easy function composition that also ensures linear execution, much like you would expect from writing a series of statements within a function in C, a method in Java, or a block of JavaScript. There are other interesting properties of monads, but this is the most fundamental.
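To make that concrete, here is a minimal sketch rather than a definitive treatment. It contrasts a pure expression, whose sub-expressions could be evaluated in any order, with a block in the IO monad whose steps run one after another. The names `pureSum`, `logSteps` and `logStepsDesugared` are made up for this illustration; only the `Data.IORef` functions come from the standard library.

```haskell
import Data.IORef (newIORef, modifyIORef, readIORef)

-- A pure expression: the two products may be evaluated in either
-- order without changing the result.
pureSum :: Int
pureSum = (2 * 3) + (4 * 5)

-- A monadic block in IO: each step runs after the one before it.
-- (logSteps is a hypothetical name used only for this sketch.)
logSteps :: IO [String]
logSteps = do
  ref <- newIORef []
  modifyIORef ref ("first" :)
  modifyIORef ref ("second" :)
  modifyIORef ref ("third" :)
  reverse <$> readIORef ref

-- The same block with the do-notation written out as explicit
-- (>>=) and (>>) calls.
logStepsDesugared :: IO [String]
logStepsDesugared =
  newIORef [] >>= \ref ->
    modifyIORef ref ("first" :) >>
    modifyIORef ref ("second" :) >>
    modifyIORef ref ("third" :) >>
    (reverse <$> readIORef ref)

main :: IO ()
main = do
  print pureSum
  logSteps >>= print
  logStepsDesugared >>= print
```

Both IO versions record the steps in the order they were written, which is the "series of statements" feel described above.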

[1]: Lisp and Smalltalk programmers might honestly count one single syntax for all their work. :-)


Anonymous said...

Here's the thing you're missing: Syntax "doesn't matter" in the sense that it's easy to parse something relatively unsurprising. Lisp ("everything must be prefix parenthesized") and Smalltalk ("precedence? for the not-1337!") get this wrong by being dogmatic. Haskell gets it right by being more pragmatic.

Syntax *does* matter in the sense that it's easy to make it a non-issue, and really annoying to refuse to do so because of some personal dogma.

Michael Nischt said...

I used to code in Haskell at the university and liked it very much...

IMHO readable code is one of the most important things and a reason why pure C and Java are still that popular.

Lately, I gave Erlang a try* and I believe it is even easier to learn than Haskell - the syntax feels easier to read...

Anyway, any functional language without a shared memory model would be a great improvement.

*you know, with all the hype, who can resist ;-)

Tom Moertel said...

I tend to judge syntaxes not by how they look but by how much they charge me to write my code. So, for me at least, syntax does matter: each syntax implies a programming cost (one might call it a tax).

XML, for example, is a rather "expensive" way of representing serialized data structures and configuration files (but it's fine for text). JSON and YAML are alternative syntaxes that offer much lower costs for these applications.

Fortunately, Haskell's syntax tax is impressively low. Even people who initially don't like the look of Haskell tend to warm up to the syntax after working with it on a project or two.

Cheers! --Tom

igouy said...

anonymous said Syntax "... doesn't matter" in the sense that it's easy to parse something relatively unsurprising. ... and really annoying to refuse to do so because of some personal dogma.

The easiest way not to be surprised is to not look at anything new.

Smalltalk unary, binary and keyword method precedence is utterly consistent and trivial to learn.

The problems newbies had with Smalltalk were firstly problems understanding basic OO, and then understanding the structure of programs expressed as a bundle of cooperating objects - "where's the program?"

Vagif Verdi said...

Syntax matters of course - just look at Perl :))
Seriously, though - look at all the rave over DSLs - where syntax matters most.

And Haskell has much better syntax than any "mainstream" language.
So why downplay such an advantage?
I think it is time to realize - Haskell is past the defense stage. It has gained enough popularity amongst programmers to stop being so shy about its advantages.

To anonymous: I'd rather spend a year writing-reading Lisp than a day writing-reading Java.

Anonymous said...

@vagif: I'd also prefer the Lisp, but because of semantics, not syntax. Besides, Java's syntax is bletcherous enough..

@igouy: AFAICT the only reason to have binary messages at all is for their half-ass imitation of infix math. But if a language is going to do such a lousy job, why bother at all? I'd rather see them left out entirely than setting a trap for the unwary by less-than-half-solving some problem. (Full disclosure: Smalltalk strikes me as one of the most dogmatic single-paradigm languages around, and I find that tooth-grindingly painful.)

-- (same anonymous)

Anonymous said...

Personally, while I love its semantics, I do have some deeper issues with Haskell's syntax other than the trivial "oh but it doesn't look like C" kind.

Part of it's syntax, but part of it is the common usage patterns encouraged by the syntax:

* Firstly, the ability to create custom operators with custom precedences can make Haskell code quite daunting to read.

Increasingly, the Haskell code I see turns to creating its own three-or-four-ascii-symbol-long operators - which are frankly horrible, cryptic and ugly to read.
>>>, <<<, .==., -->, <<-, etc. Line noise!

One's then often required to remember their associativity and precedence when mentally parsing the code - which again is hard.

It makes haskell look like poorly-typeset and cryptic mathematical expressions, rather than program code.

I totally recognise that this is a matter of coding style - but I feel haskell's syntax encourages it and the haskell community does too little to discourage this kind of use.

* The tendency towards a plethora of short, cryptic variable names, and to some extent function names too.

Again a matter of coding style, but again something I feel is encouraged by haskell's syntax and where too little is done by the community to discourage it.

* The fact that, in most other languages, common ascii symbols are reserved for "punctuation" style use to clarify the structure of the program in a helpful and standard way. Especially the structure of function calls. While haskell's straightforward applicative style has a certain elegance, it doesn't in my opinion contain enough 'punctuation'-type syntactic elements to be easily readable in many cases.

* Haskell code is frequently just too dense to be easily readable. In a way this is an advantage - you can say an awful lot in just a few lines - but unpacking those few lines mentally later on can be hard work, and I think this is a trade-off which many Haskellers make poor choices on.
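As a small illustration of the first point (a sketch only: the operators `|-|` and `|~|` are invented here and come from no library), two operators with identical definitions can give different answers purely because of their fixity declarations, which is exactly what a reader has to keep in their head:

```haskell
-- Hypothetical operators, invented for this illustration only.
-- They are not taken from any library.

-- Left-associative subtraction: a |-| b |-| c parses as (a |-| b) |-| c.
infixl 6 |-|
(|-|) :: Int -> Int -> Int
a |-| b = a - b

-- The same function, declared right-associative:
-- a |~| b |~| c parses as a |~| (b |~| c).
infixr 6 |~|
(|~|) :: Int -> Int -> Int
a |~| b = a - b

leftResult :: Int
leftResult = 10 |-| 3 |-| 2   -- (10 - 3) - 2

rightResult :: Int
rightResult = 10 |~| 3 |~| 2  -- 10 - (3 - 2)

main :: IO ()
main = do
  print leftResult   -- 5
  print rightResult  -- 9
```

In GHCi, `:info (|-|)` reports the declared fixity alongside the type, so the information is recoverable, but it is not visible at the point of use.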

Josef said...

I agree with everything you say about syntax and its importance to a language. Yet, I still think you're wrong when it comes to the adoption of a new language. As you say, C, Java and JavaScript are three very different languages, and just because their syntax is similar doesn't mean that programming in them is the same. But the point is *Joe Programmer doesn't understand this*. When he sees C-like syntax he thinks "Oh, I know that". There are many programmers who don't think further than the level of syntax; even I was at that level once. Hence when they see Haskell they think "Weird". That is the importance of syntax, I'm sad to say.

Of course there are also quite a few programmers who understand the difference between syntax and semantics and these are typically the good programmers. Hopefully they can help educate those who understand less.

Tac said...

I agree with the anon above. Operator overloading is nice, but it should be used with extreme hesitation. Operators are harder to read because they both obscure function name and order of operation (off the top of your head, what is the precedence of +++?)

I also have issues with people spelling out lists like
[ 1
, 2
, 3 ].

It just looks gross! (I do mostly Python btw, can you tell?)

I think Haskell could use a well thought-out style guide in the near future.

Anonymous said...

Complaining about Haskell's expressive operator overloading model sounds like you're from a Java background or mindset.

The whole point of Haskell's flexibility in operator overloading is to aid the development of Domain Specific Embedded Languages (DSELs), getting as close to the ideal abstraction as you could possibly get without the need for a macro syntax extension system.

You should try to understand the significance of this.

Modern C++ generic libraries try their best to achieve this, but C++'s operator overloading model is limited and rigid. It's an absolute pain, so library authors have to resort to disgusting hackery to achieve what Haskell does elegantly and effortlessly.

So basically you are asking the Haskell community to "discourage" the writing of DSELs, which make a lot of sense...

About memorizing associativity & precedence your point is moot for a number of reasons.

Let's start with the obvious reason: if somebody really can't remember or doesn't know the associativity and/or precedence of a particular operator overload, they can look at the type signature of that operator/function. There are various ways of getting the type signature even when none is explicitly provided/defined (Haskell is a type-inferred language).

DSELs are typically used by people who have domain knowledge or are even domain experts in that field/subject, and hence will instantly recognize what the symbol means and what its associativity & precedence is/should be.

Furthermore, if there are no such direct mappings with the problem domain, symbols/operators and their associativity & precedence are chosen with common sense and intuition applied.

No one is silly enough to overload a typically left-associative operator with right associativity if they can avoid it. That wouldn't be very intuitive, would it?

Anonymous said...

Slightly aside from the main point;

[M]onads enforce an order of execution on their statements. ... With monadic functions, the order of execution is very important.

Actually, this isn't true about monads in general. But it is a side effect of what you think you see...with most monads you use, there is an order of execution enforced on their statements - but this comes through data dependence - Failure (Maybe/Either), State transformation, IO etc all use data dependence to ensure the ordering.

However, this isn't something fundamental about monads. The (lazy) Reader and (->) monads (and maybe Cont) don't enforce any ordering on their statements other than what is forced as per normal Haskell.
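A rough sketch of that difference, using only `base` (the names `readerExample` and `maybeExample` are hypothetical, chosen for this illustration): in the function monad the first "statement" is never evaluated because nothing depends on its result, while in Maybe the bind has to inspect the first step before it can continue.

```haskell
import Control.Exception (SomeException, evaluate, try)

-- In the function ("reader") monad, (>>=) passes the shared argument
-- to both steps; the first step's result is only computed if
-- something later uses it. Here nothing does.
readerExample :: Int -> Int
readerExample = do
  _unused <- const (error "never evaluated")
  doubled <- (* 2)
  return (doubled + 1)

-- In Maybe, (>>=) must pattern-match on the first step to decide
-- whether to continue, so the first step is forced.
maybeExample :: Maybe Int
maybeExample = do
  _unused <- (error "evaluated by >>=" :: Maybe Int)
  return 1

main :: IO ()
main = do
  print (readerExample 5)
  result <- try (evaluate maybeExample) :: IO (Either SomeException (Maybe Int))
  putStrLn (either (const "Maybe forced the first step") show result)
```

The ordering in the Maybe case comes from data dependence inside its `(>>=)`, not from the monad interface itself.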