Wednesday, January 31, 2007

Proxies and Delegation vs. Existential Types

Haskell may not be especially object-oriented, but all of the core concepts are still there. They just look a little different.

Consider the following problem: you want to manage multiple kinds of "archive" files (e.g. zip, tar.gz, tar.bz2, .tar, etc.). You want to write one function that returns the most appropriate kind of archive, and functions that work on generic archive objects.

The problem is that the type checker forces you to return a concrete type. Something like this isn't allowed past the type checker:
makeArchive :: (Archive a) => String -> a
Why not? Because makeArchive makes no guarantees on what it's returning. Any function that uses it doesn't really know what type it's getting back. Sure, it looks similar to read:

read :: (Read a) => String -> a
Except that the type checker can infer which version of read you're trying to use. If you're expecting to read an integer, the type checker will infer that you want the version of read that returns an integer. If you're expecting to read some indeterminate type, the type checker will tell you give you an error ("Ambiguous type variable..."):
works = do s <- getLine  
let i = read s
print (i + 1) -- read must provide an Int

doesntWork = do s <- getLine
let i = read s
print i -- no constraint what read provides
The problem is that a function like makeArchive cannot return any instance of a typeclass like Archive, it needs to return a specific type. Another way of looking at it is that you cannot write a version of makeArchive that returns one instance of Archive in one place, and a different instance of Archive in another place. That would be like writing a function that sometimes returns strings, sometimes returns integers, and sometimes returns lists of tuples -- such a function is simply not allowed by the type system.

In object oriented programs, a common solution would be to return a proxy object that delegates all of its operations to some specific object. Fortunately, that's exactly what we need to do in Haskell. All of the necessary types, including the proxy type, will be related to each other because they will be members of the same typeclass:
class Archive a where
getContents :: a -> [String]
...

data ZipArchive = ...
instance Archive ZipArchive where
getContents = ...

data TarArchive = ...
instance Archive TarArchive where
getContents = ...

data TarGzArchive = ...
instance Archive TarGzArchive where
getContents = ...

data TarBz2Archive = ...
instance Archive TarBz2Archive where
getContents = ...
Next, we need to define a proxy type that will wrap any type that is an Archive. To do this, we need to use existential types:
{-# OPTIONS -fglasgow-exts #-}
data ArchiveProxy = forall a. (Archive a) => ArchiveProxy a
instance Archive ArchiveProxy where
getContents (ArchiveProxy a) = getContents a
Now, we can write makeArchive to create a specific concrete type (like ZipArchive, TarArchive, or whatnot), and wrap that in the ArchiveProxy constructor to return something of type ArchiveProxy:
makeArchive :: String -> ArchiveProxy
makeArchive fn
| ".zip" `isSuffixOf` fn = ArchiveProxy (ZipArchive ...)
| ".tar" `isSuffixOf` fn = ArchiveProxy (TarArchive ...)
| ...
The value that's returned by makeArchive is opaque, and can perform any methods that are defined by the Archive typeclass. The benefit is that functions can be written in terms of this generic proxy, but always execute the most specific operation for the archive under consideration.

All of this is explained in the Haskell Wiki under existential types, but that discussion approaches the topic from an academic and mathematical perspective. I suspect many professional programmers would be more comfortable with this object oriented view of the world, using proxies and delegates. The terms are different, but the concept is the same.

3 comments:

TuringTest said...

Your data declaration has only one constructor and it takes only one parameter. Therefore you can use a "newtype" declaration:


newtype ArchiveProxy = forall a. (Archive a) => ArchiveProxy a


The "newtype" creates a distinct type just like "data" but the compiler knows it is really an alias for "forall a. Archive a => a" and after type checking it deletes the type from the program.

So the running program is guaranteed to have zero overhead from constructing/wrapping and destructing/unwrapping the Archive type.

Anonymous said...

I'm copying my comment from Reddit in case you don't frequent that site:

Maybe I'm missing something, but I would do that example as

data Archive = ZipArchive ZipArchive
| TarArchive TarArchive
| TarGzArchive TarGzArchive
| TarBz2Archive TarBz2Archive

makeArchive :: String -> Archive
makeArchive fn
| ".zip" `isSuffixOf` fn = ZipArchive (makeZipArchive fn)
| ".tar" `isSuffixOf` fn = TarArchive (makeTarArchive fn)
| ...

Anonymous said...

To anonymous: That only works if you have an known number of fixed types to begin with. Part of the advantage of using objects is that you can extend functionality at any time by just adding a new class. The URL class in java has a factory that can generate one of many types of URLs given the url string. Anyone can add a new url type (ie- blah:// in addition to http://, ftp:// etc.) and simply register it for use with the system. You need existential types in Haskell to do the same thing.