Wednesday, January 31, 2007

Ruby vs. Haskell: Choose what works

I'm working on a Ruby on Rails project for $WORK at the moment. The bulk of the work involves modeling a large corpus of XML documents, extracting metadata, and providing a browse-and-edit interface for that metadata. Standard webapp. Nothing special.

Except that my work is focused on the workflow system: the part that actually reads documents, parses the metadata, and shoves it into the database.

For the front end, it makes sense that the application uses the full Rails stack: ActiveRecord, ActionController and the like. It's just a webapp, and the CPU overhead of running the application code in Ruby doesn't matter when compared to the overhead of talking to the database and the client. The advantages of using Ruby on Rails far outweigh the meager amount of hardware it requires.

However, the backend is all about throughput, and the added overhead of running Ruby is quite noticeable. Doubly so because it uses Rails' ActiveRecord models, which spend a lot of time doing reflection on the fly. Yet none of that is really necessary; all the backend is doing is reading the filesystem, parsing XML, and stuffing it into a database. The logic for all three parts is pretty static, and there's no real benefit to using a highly dynamic runtime, especially when it costs this much CPU.

In the bad old days, problems like this had one solution: rewrite the code in C.

Today, there's an alternative: rewrite the code in Haskell.

The goal then, as now, is to remove the interpreter overhead from a long running, high throughput process. In the bad old days, that meant leaving features like polymorphism and garbage collection behind, and dealing with segfaults and memory leaks just to get something close to acceptable performance.

Today, none of those tradeoffs are necessary. Haskell can compile down to machine code, and eliminates the interpreter overhead (Thank You, Thank You, Thank You, Simon Peyton Jones!). But it also provides garbage collection and polymorphism. It's the best of both worlds: a high level language that compiles down to a native executable. Additionally, once a program gets past the type checker, the compiler pretty much guarantees that segfaults go away. Time and space leaks can still occur, but the profiler can help isolate the misbehaving code with a little effort.
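To give a feel for how static that backend logic is, here is a minimal sketch of the three steps (read the filesystem, pull out metadata, emit rows for the database). Everything specific in it is an assumption made for illustration: the `corpus` directory, the `Metadata` record, and the idea that a document's metadata is just its `<title>`. The string-matching `extractTitle` is a naive stand-in for a real XML parser, and the tab-separated output stands in for whatever database loader would actually be used. It relies only on the `base`, `directory`, and `filepath` packages that ship with GHC.

```haskell
module Main where

import Control.Exception (evaluate)
import Control.Monad (forM_)
import Data.List (isPrefixOf, isSuffixOf, stripPrefix)
import System.Directory (listDirectory)
import System.FilePath ((</>))
import System.IO (hPutStrLn, stderr)

-- Hypothetical metadata record; the real schema is not described in this post.
data Metadata = Metadata
  { docPath  :: FilePath
  , docTitle :: String
  } deriving (Show, Eq)

-- Text following the first occurrence of the marker, if the marker occurs.
after :: String -> String -> Maybe String
after _ [] = Nothing
after marker s@(_ : rest) =
  case stripPrefix marker s of
    Just r  -> Just r
    Nothing -> after marker rest

-- Text preceding the first occurrence of the marker, if the marker occurs.
before :: String -> String -> Maybe String
before _ [] = Nothing
before marker s@(c : rest)
  | marker `isPrefixOf` s = Just []
  | otherwise             = (c :) <$> before marker rest

-- Naive stand-in for XML parsing: the text between <title> and </title>.
extractTitle :: String -> Maybe String
extractTitle s = after "<title>" s >>= before "</title>"

-- One tab-separated row per document, suitable for a bulk loader (assumed).
toRow :: Metadata -> String
toRow m = docPath m ++ "\t" ++ docTitle m

main :: IO ()
main = do
  let dir = "corpus" -- hypothetical location of the XML documents
  names <- listDirectory dir
  forM_ (filter (".xml" `isSuffixOf`) names) $ \name -> do
    let path = dir </> name
    contents <- readFile path
    -- Force the whole file so lazy IO releases the handle before the next one.
    _ <- evaluate (length contents)
    case extractTitle contents of
      Just t  -> putStrLn (toRow (Metadata path t))
      Nothing -> hPutStrLn stderr ("no title found: " ++ path)
```

Compiled with `ghc -O2`, something shaped like this runs as a native executable with no interpreter and no per-record reflection, which is the whole point of moving the workflow code out of Rails.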

The last time I came across a problem like this, nearly a decade ago, there really was no way out. Rewriting a slow Perl program in C was generally seen as distasteful, tedious, error-prone, but necessary. Today, rewriting a slow Ruby program in Haskell is one of many options available, and certainly isn't that distasteful or tedious anymore. Perhaps in another decade, there will be more alternatives. Perhaps this kind of problem will just go away. Who knows.

While at Rails Edge, I mentioned this strategy to a few people, all of whom were working on Rails apps. No one disagreed that Rails can be a poor choice for backend processing, because of the high CPU overhead and low throughput. Rewriting a Ruby app in Haskell certainly seemed to make sense to everyone I talked to, at least in this one particular scenario.

8 comments:

Anonymous said...

You might also think about doing the work in Java. Among other things, you get smoking XML performance, correct Unicode support, and you can integrate your favorite Ruby bits via JRuby. (Not that I'd dissuade you from doing it in Haskell, but Java is looking more and more like an excellent adjunct to Ruby and vice versa.)

Kartesus said...

You can stay with Ruby using Nitro (http://www.nitroproject.org). You can go with Python that compiles to machine code too. But if you like Haskell you can obviously rewrite to it too ;)

Keith Lancaster said...

Makes perfect sense to me - in fact I've been pondering both the use of compiled Haskell as a backend as well as whether it would be possible to call Haskell code directly from my Rails/Ruby apps.

Anonymous said...

i think it is funny you mention ActiveRecord on the backend and then mention that you will just be doing file and xml operations on the data - why mention it, if you are not going to use it? that's a nice bait and switch premise.

you would have had a more convincing story if you had coded it in ruby and then told us it wasn't fast enough after optimization. and why haskell? java is the de facto server side language - why did you deviate - is there something wrong with java? now your customer has 2 languages that are not mainstream to support.

filippo said...

Just wondering.
How does the Ruby front end interact with the haskell back end?
Only through the database?

gnupate said...

Any thought of working on RubyInline::Haskell? That could be a fun way to bring the two languages closer together.

Jon Harrop said...

OCaml is also an excellent language and the derived language F# gives you access to .NET's XML parsing and Unicode support whilst probably being a lot faster than Haskell (and definitely Ruby, Python etc.).

Have a look at F#.

Tim Watson said...

Hmn,
Yes F# is pretty cool - given it's an OCaml descendant, that's hardly surprising! ;)

F# does give you access to .NET (the runtime, APIs, etc) and you can presumably integrate Ruby code using IronRuby (though how mature that is right now, I'm not sure).

I'd hazard a guess that our host is probably not able to choose a Microsoft technology (such as F#) over something with *slightly lower costs*, e.g. RoR. Yes, there are free IDEs & tools and yes I know all about Mono (which is v. cool by the way), but for most peeps, choosing .NET means choosing Windows, both of which are often political choices more than technological.

Both OCaml and Haskell have decent foreign function interfaces (OCaml's is good but a bit clunky, although ocamlidl helps minimize the pain!) and Haskell's is excellent and quite simple to follow. Because of these two points, interfacing between Ruby and OCaml and/or Haskell is possible.

But wait! Didn't we want to get the heck away from using 'C' to write code! Arrgghhh! :P

So my suggestion is this: leave your web front end alone, providing it works. Put something on the back end that is fast and reliable and interface to it *simply* - this probably means *not* using FFI and the like. Pipes or Sockets would make a good choice, unless you're extremely worried about traffic - which I doubt because you're running Rails! :P

OCaml has some particularly nice IO features that could help with this - including functions that operate on (unix) file descriptors or sockets alike, using buffered I/O. In fact, I suspect it wouldn't be at all hard to make (* some OCaml *) look like a file descriptor to Ruby, giving you a simple touch point.

Paul's point about Java is a good one. You might not like it, but its library support is awesome, especially where things like xml/xslt are concerned.

This point bears some repeating, I think. I'm writing some server side *stuff* at the moment using Erlang (with bits of OCaml floating around beside it), and the lack of decent library support is a pain, especially where it comes to xml/xslt processing - which I really need - because there isn't any; at least none that I can stand to work with [e.g. xmerl - yuk]. This fact means that I'm stuck writing FFI code in C/C++ so that my Erlang code can call into libxslt. Not an enjoyable way to spend one's time (although it is - educational, let's say).

Think hard about library support pertinent to your problem domain, before rushing into any decisions.

Having said that - I love Haskell and would probably choose it in a pinch!