Wednesday, June 6, 2007

Solving Collatz Sequences

Steve, over at cod3po37ry, is learning Haskell and working through some contest problems, starting with the Collatz sequence. Steve doesn't mention it explicitly, but this is an interesting unsolved problem in mathematics. The problem starts out with a simple function over a single positive integer n:
  • if n is even, halve it
  • if n is odd, triple it and add one
A sequence can be generated by starting with some positive integer n, generating the result of this function, and continually applying this function on the series of results until it produces a value of 1:
f 2 = [2,1]
f 3 = [3,10,5,16,8,4,2,1]
f 4 = [4,2,1]
f 5 = [5,16,8,4,2,1]
f 6 = [6,3,10,5,16,8,4,2,1]
f 7 = [7,22,11,34,17,52,26,13,40,20,10,5,16,8,4,2,1]
It's an open question as to whether or not this function always converges on a value of 1 for any positive integer n. Wikipedia claims that this property has been proven up through at least 10 × 258, but it's unclear if it's true for all positive integers.

Steve's solution is a little, um, cumbersome, and he makes it clear that his goal is to learn:
This post follows the Law of Usenet: if you post a question, nobody will answer it. If you post an answer, they'll all rush to correct it. I welcome corrections.
In that spirit, here is my response. :-)

Unlike most languages I have used, Haskell has a very interesting property -- if you find yourself writing a lot of code, chances are you're doing something wrong. Generally, this is because the Prelude and the standard library have a rich set of tools for building solutions from the bottom up. Writing a lot of code usually means that you're ignoring a key abstraction somewhere. Haskell programs tend to be short because they can be short; many common problems have generalized solutions in the library already. If you find a general problem that's not solved in the standard library, implementing a general solution typically involves writing one missing function.

The first step to solving this problem involves the collatz function which implements the interesting 3n + 1 property:
collatz :: Int -> Int
collatz 1 = 1
collatz n = if (odd n)
then (3 * n + 1)
else n `div` 2
Next, there's the function to produce a "collatz sequence" starting with a number n and (hopefully) terminating with the number 1. Fortunately, this behavior is precisely what the iterate function in the Prelude provides:
iterate :: (a -> a) -> a -> [a]
That is, iterate takes a function and a seed value, and returns a list that contains the seed, and an infinite sequence of values produced by applying the function to the previous value.

Therefore, expanding this simple collatz function to a function that produces a collatz sequence is simply:
collatzSequence :: Int -> [Int]
collatzSequence = iterate collatz
Actually, that's not quite correct, since a collatz sequence terminates with a value of 1:
collatzSequence :: Int -> [Int]
collatzSequence = terminate . iterate collatz
where
terminate (1:_) = [1]
terminate (x:xs) = x:terminate xs
Sure enough, this function works as expected:
*Main> collatzSequence 7
[7,22,11,34,17,52,26,13,40,20,10,5,16,8,4,2,1]
That solves the problem of generating collatz sequences. But the contest problem was to find the longest collatz sequence within a range of numbers. Given:
1 10
100 200
201 210
900 1000
produce this output:
1 10 20
100 200 125
201 210 89
900 1000 174
Generating these results is simple enough. First, convert a collatz sequence to its length:
*Main> length $ collatzSequence 7
17
Next, convert a range of integers into their collatz lengths:

*Main> map (length . collatzSequence) [1..10]
[1,2,8,3,6,9,17,4,20,7]
Next, pick out the largest sequence in the range:
*Main> maximum $ map (length . collatzSequence) [1..10]
17
Next, consume a line of input, perform this calculation, and produce a line of output:
run :: String -> IO ()
run s = do let (i:j:_) = map read (words s)
m = maximum $ map (length . collatzSequence) [i..j]
putStrLn (concat [show i, " ", show j, " ", show m])
And finally, write the main function that consumes all input and produces the desired output:
main = do inp <- getContents
mapM_ run (lines inp)
Here is the completed program in its entirety:
collatz :: Int -> Int
collatz 1 = 1
collatz n = if (odd n)
then (3 * n + 1)
else n `div` 2

collatzSequence :: Int -> [Int]
collatzSequence = terminate . iterate collatz
where
terminate (1:_) = [1]
terminate (x:xs) = x:terminate xs

run :: String -> IO ()
run s = do let (i:j:_) = map read (words s)
m = maximum $ map (length . collatzSequence) [i..j]
putStrLn (concat [show i, " ", show j, " ", show m])

main = do inp <- getContents
mapM_ run (lines inp)
Hope this helps!

5 comments:

Anonymous said...

And, of course, your list-generating function can be written using 'unfoldr':

collatzSequence :: Int -> [Int]
collatzSequence = unfoldr collatz
   where
         collatz 0 = Nothing
         collatz 1 = Just (1, 0)
         collatz x = if odd x
                     then Just (x, 3 * x + 1)
                     else Just (x, x `div` 2)

Since zero cannot be generated by collatz, I used it as the terminating value for 'unfoldr'. Sorry for the poor formatting; I couldn't use the <pre> tag.

Keep up the good work! This is an informative blog...

Unknown said...

Small world. I'm going through the same list of problems in an effort to pick up Haskell. I like the iterate/unfoldr solutions more than the one I came up with; not perverting the collatz function such that it returns a list is much more clear. Thanks for posting this! Very helpful.

Luke said...

Nice code. I'm still learning Haskell as well, and don't remember seeing "$" before; I'll have to see if I can use it to make my code less cumbersome.

Couple of nit-picks:
* I've always thought for some reason that the sequence ends in a 3-cycle:
1, 4, 2, 1, 4, 2 ...
(see the Wikipedia page). With this assumption, you can remove the pattern match
collatz 1 = 1
- the terminate function should still work (at least for positive inputs :)

* In the outputs, the length of the Collatz sequence for 9 is given as 20, but the maximum of the lengths when the inputs range from 1 to 10 is 17?

newsham said...

Is there a reason you prefer to define terminate rather than use takeWhile?

Adam Turoff said...

I defined terminate instead of using takeWhile, because (takeWhile (/= 1)) will remove the last element of the sequence, while terminate (as defined here) preserves it.