Wednesday, June 6, 2007

Solving Collatz Sequences

Steve, over at cod3po37ry, is learning Haskell and working through some contest problems, starting with the Collatz sequence. Steve doesn't mention it explicitly, but this is an interesting unsolved problem in mathematics. The problem starts out with a simple function over a single positive integer n:
  • if n is even, halve it
  • if n is odd, triple it and add one
A sequence can be generated by starting with some positive integer n, generating the result of this function, and continually applying this function on the series of results until it produces a value of 1:
f 2 = [2,1]
f 3 = [3,10,5,16,8,4,2,1]
f 4 = [4,2,1]
f 5 = [5,16,8,4,2,1]
f 6 = [6,3,10,5,16,8,4,2,1]
f 7 = [7,22,11,34,17,52,26,13,40,20,10,5,16,8,4,2,1]
It's an open question as to whether or not this function always converges on a value of 1 for any positive integer n. Wikipedia claims that this property has been proven up through at least 10 × 258, but it's unclear if it's true for all positive integers.

Steve's solution is a little, um, cumbersome, and he makes it clear that his goal is to learn:
This post follows the Law of Usenet: if you post a question, nobody will answer it. If you post an answer, they'll all rush to correct it. I welcome corrections.
In that spirit, here is my response. :-)

Unlike most languages I have used, Haskell has a very interesting property -- if you find yourself writing a lot of code, chances are you're doing something wrong. Generally, this is because the Prelude and the standard library have a rich set of tools for building solutions from the bottom up. Writing a lot of code usually means that you're ignoring a key abstraction somewhere. Haskell programs tend to be short because they can be short; many common problems have generalized solutions in the library already. If you find a general problem that's not solved in the standard library, implementing a general solution typically involves writing one missing function.

The first step to solving this problem involves the collatz function which implements the interesting 3n + 1 property:
collatz :: Int -> Int
collatz 1 = 1
collatz n = if (odd n)
then (3 * n + 1)
else n `div` 2
Next, there's the function to produce a "collatz sequence" starting with a number n and (hopefully) terminating with the number 1. Fortunately, this behavior is precisely what the iterate function in the Prelude provides:
iterate :: (a -> a) -> a -> [a]
That is, iterate takes a function and a seed value, and returns a list that contains the seed, and an infinite sequence of values produced by applying the function to the previous value.

Therefore, expanding this simple collatz function to a function that produces a collatz sequence is simply:
collatzSequence :: Int -> [Int]
collatzSequence = iterate collatz
Actually, that's not quite correct, since a collatz sequence terminates with a value of 1:
collatzSequence :: Int -> [Int]
collatzSequence = terminate . iterate collatz
where
terminate (1:_) = [1]
terminate (x:xs) = x:terminate xs
Sure enough, this function works as expected:
*Main> collatzSequence 7
[7,22,11,34,17,52,26,13,40,20,10,5,16,8,4,2,1]
That solves the problem of generating collatz sequences. But the contest problem was to find the longest collatz sequence within a range of numbers. Given:
1 10
100 200
201 210
900 1000
produce this output:
1 10 20
100 200 125
201 210 89
900 1000 174
Generating these results is simple enough. First, convert a collatz sequence to its length:
*Main> length $ collatzSequence 7
17
Next, convert a range of integers into their collatz lengths:

*Main> map (length . collatzSequence) [1..10]
[1,2,8,3,6,9,17,4,20,7]
Next, pick out the largest sequence in the range:
*Main> maximum $ map (length . collatzSequence) [1..10]
17
Next, consume a line of input, perform this calculation, and produce a line of output:
run :: String -> IO ()
run s = do let (i:j:_) = map read (words s)
m = maximum $ map (length . collatzSequence) [i..j]
putStrLn (concat [show i, " ", show j, " ", show m])
And finally, write the main function that consumes all input and produces the desired output:
main = do inp <- getContents
mapM_ run (lines inp)
Here is the completed program in its entirety:
collatz :: Int -> Int
collatz 1 = 1
collatz n = if (odd n)
then (3 * n + 1)
else n `div` 2

collatzSequence :: Int -> [Int]
collatzSequence = terminate . iterate collatz
where
terminate (1:_) = [1]
terminate (x:xs) = x:terminate xs

run :: String -> IO ()
run s = do let (i:j:_) = map read (words s)
m = maximum $ map (length . collatzSequence) [i..j]
putStrLn (concat [show i, " ", show j, " ", show m])

main = do inp <- getContents
mapM_ run (lines inp)
Hope this helps!