Wednesday, September 27, 2017

Sum of multiples problem

One of the exercism.io problems generalizes a Project Euler problem: what's the sum of multiples of 3 and 5 less than 1000? Seems simple enough:
sum [3, 6..999] + sum [5, 10..999]
but that's too big; it counts some values twice. With a little thought, though, you'll realize that the duplicates are the multiples of 15, so
sum [3, 6..999] + sum [5, 10..999] - sum[15, 30..999]
will do the trick. Why 15? Well, it's 3 * 5... but more on that later.

The exercism.io problem makes the upper bound a parameter, and allows an array (or list, depending on the language) of factors. Looking around (which you can do on the site after you've submitted your own solution), one sees two typical approaches:
  • Go through [1..limit - 1], pick out the values that are multiples of one or more of the factors, and add up the result.
  • Create a set or hash table from [[f, 2*f..limit - 1] f <- factors] (thus getting rid of the duplicates), and add up the elements of the set/keys of the hash table.
Both work, but they have drawbacks:
  • Divisibility checking means using the relatively expensive mod, potentially repeatedly, for each value in [1..limit - 1], and as many times as possible for values that aren't summed. As limit grows, that overhead grows.
  • Sets and hash tables build up an underlying data structure that grows with each value added to it. As limit grows, so does RAM usage. (My Python version using this method, handed an upper bound much larger than the exercism.io test values and set up to time with Python's timeit package, quickly sucked down immense amounts of virtual memory and had to be killed.)
There ought to be a way to generalize what we did for the initial [3, 5] case... and indeed there is. What did we do? We took the sum of the multiples of each factor and then subtracted off the sum of the duplicates. If there's just one factor, that sum is zero. If there are two, it's the sum of the multiples of... the product of the factors? Not quite!

Suppose that our factors are 6 and 21. 6 * 21 is 126, but 7 * 6 == 2 * 21 == 42. We want not the product, but the least common multiple of the two factors. 3 and 5 are relatively prime, so their least common multiple is their product (lcm m n == m * n `div` (gcd m n), and m and n are relatively prime iff their greatest common divisor is 1.) Try to fake us out, eh?

More generally, if you have factors f1, f2, ..., fn, each fi gives rise to {lcm fi fj | i < j}  as new sets of factors whose multiples are duplicates of the multiples of fi--but those in turn have duplicate issues, so it's time to recur. You won't recur forever, because the sets of factors get smaller each time.

One more thing... we don't need to evaluate sum [f, f+f, .. limit - 1] at all. There's a better way to do it, and since Young Sheldon had its premiere this week, let's return to the 19th century and Young Carl to learn why we don't have to generate that arithmetic sequence or count on GHC to do fusion. Young Carl Friedrich Gauss was assigned the problem of adding up all the integers from one to one hundred (in the first version I read, the class was assigned the problem, and only after solving it could a student go out to recess). The teacher looked, but Carl wasn't busily adding numbers. Soon he wrote down a number and handed the paper in, and to the teacher's amazement it was correct. The math book I learned it from explained Gauss's insight this way: consider the numbers from 1 to 100 written down twice: once in ascending order, and immediately under that in descending order, and consider corresponding terms in the two sequences. The first, 1 and 100, add up to 101. To get from one pair to the next, the top increases by 1 and the bottom decreases by one, so the sum remains the same: 101. There are 100 such pairs, so collectively they add up to 100 * 101 = 10100... but remember, we wrote the sequence down twice, so that sum is twice the value we want. The number young Carl wrote was 5050, and in general, the sum of the integers from 1 to n is n * (n + 1) / 2, and as an immediate consequence the sum of the multiples of f less than limit is f * n * (n + 1) `div` 2 where n = (limit - 1) `div` f.

Here's what I ended up with. Haskell has both gcd and lcm functions, I'm happy to say.

sumOfMultiples :: [Integer] -> Integer -> Integer
sumOfMultiples factors limit =
    sum (map oneFactorSum factors) -
        sum (map (`sumOfMultiples` limit) (dupFactors factors))
    where sumOneTo     n  = n * (n + 1) `div` 2
          oneFactorSum f  = f * sumOneTo ((limit - 1) `div` f)
          dupFactors   fs = map lcmHead $ take (length fs - 1) (iterate tail fs)
          lcmHead      fs = [lcm (head fs) f | f <- tail fs]

A flashback and analogy

You've probably heard about how the notion of sum types (e.g. Algol 68 union s, Rust enum s, Haskell type s) and product types (e.g. tup...