Tuesday, August 18, 2009

Flipping Coins with Clojure

So... there was a question on Reddit today regarding flipping a series of coins. Quite a bit of dispute arose about the actual phrasing of the question, so rather than plagiarize one of the many suggested variations, I'll spell it out here in plain english.

Bob has a serious problem affecting dozens around the U.S. today. He's a habitual coin flipper. You see... Bob's life fell apart a few years ago after his wife Janet left him, and now he sits around his house flipping coins and writing down how they land. Let's take a peek into Bob's journal for a couple days and see how things have been.

Day 1, Monday:

Woke up today and put some more vaseline on the tip of my thumb. All this flipping has been good, but there's some things that go along with it. Damned blisters keep flaring, and my thumbnail is cracked. Oh well, time to pour us up a drink and see what's what. Today I'll be flipping just to see what the big man upstairs has in store. I'm gonna flip until the sun sets and write it down everytime the coins go head-tails-head.

Day 2, Tuesday:

Well, shit, yesterday was a good time, but all this flippin' just makes me wanna flip more. Time to pour us up a drink and see what ol' mister flipper has ready for me. I'm gonna flip till my flipper-flapper is all tuckered out and write it down everytime the coins go head-tails-tails.

If Bob only knew what Clojure could do for him, he could get his old life back, and Janet and the kids would come home, but I digress... Let's explore Bob's habit and try to answer the following question: "In an infinite sequence of coin flips, which pattern will occur more often: HTH or HTT?" (ok... so I plagiarized a little).

We'll start off by defining the patterns we're looking for and the number of samples we want to take. In this case, we'll be representing a head as zero and a tail as one.
(def head-tail-head [0 1 0])
(def head-tail-tail [0 1 1])
(def num-samples 100000)
Next we'll need a function to flip the coin and calculate the average number of tries before Bob cracks the champagne.
(defn flip-coin []
  (rand-int 2))

(defn avg [coll]
  (/ (apply + coll) (count coll)))
Super easy so far, right? If only Bob knew... Now for something slightly more advanced, our main worker function, which I'll break down in pieces.
(defn num-flips-to-match-pattern [pattern]
  (let [coin-flips (repeatedly flip-coin)]
    (loop [x 0]
      (if (= pattern (nth-flip-sequence x coin-flips))
        (+ 3 x)
        (recur (inc x))))))
This function takes a given pattern and returns the number of flips required to match the pattern. The flipping is obviously random, but with a large enough sample size, some pattern should emerge. Side note: I add three to the return result because it takes a minimum number of three flips to get a valid result and we're starting with zero (offset math FTW).

To make sure we have enough coin flips at our disposal, we'll generate them from an infinite lazy sequence (on line #2). Clojure makes these kinds of infinite sets trivial to define, and they can be very convenient when you're not sure how much data you'll need.

Next up you'll see a yet to be defined function, nth-flip-sequence. I would have defined it up above, but it makes better sense in context, so I'll define it now.
(defn nth-flip-sequence [x coin-flips]
  (nth (partition 3 1 coin-flips) x))
The nth-flip-sequence function takes an offset and an infinite sequence (in this case a sequence of coin flips). From here, it uses partition to grab three elements with a step of one. As an example, it would break up a dataset as follows.
(def coin-flips [0 1 1 0 0 1 1 1])
(nth (partition 3 1 coin-flips) 0)
(0 1 1)
(nth (partition 3 1 coin-flips) 1)
(1 1 0)
(nth (partition 3 1 coin-flips) 2)
(1 0 0)
As usual, Clojure has some powerful batteries included to do most of the work for us. With the hard part out of the way, we write a function to average the number of flips required to match a given pattern for a specified number of samples.
(defn take-samples [pattern num-samples]
  (float (avg (for [x (range num-samples)]
                   (num-flips-to-match-pattern pattern)))))
We'll go ahead and define Bob's two incredibly fulfilling days.
(def monday (future (take-samples head-tail-head num-samples)))
(def tuesday (take-samples head-tail-tail num-samples))
If you noticed the future function above, good eye. Bob likes to get things done fast, so this is a trivial way to calculate both days in parallel. We are using Clojure afterall...

After this, we simply print the results and cleanup the thread pool (since we used future).
(println "mon: " @monday)
(println "tue: " tuesday)
(shutdown-agents)
A sample run confirms that the reasoning in the original article is sound and hopefully brings Bob some peace.

mon:  10.03817 # avg num of coin flips to hit head-tails-head
tue:  8.00416 # avg num of coin flips to hit head-tails-tails