#include <adlib.h>: Max Shinn’s blog

<h1>When to stop and when to keep going</h1>
<p>I was recently posed the following puzzle:</p>
<blockquote>
<p>Imagine you are offered a choice between two different bets. In
(A), you must make 2/3 soccer shots, and in (B), you must make
5/8. In either case, you receive a <span class="MathJax_Preview">$100</span><script type="math/tex">$100</script> prize for winning the bet.
Which bet should you choose?</p>
</blockquote>
<p>Intuitively, a professional soccer player would want to take the
second bet, whereas a hopeless case like me would want to take the
first. However, suppose you have no idea whether your skill level is
closer to Lionel Messi or to Max Shinn. The puzzle continues:</p>
<blockquote>
<p>You are offered the option to take practice shots to determine your
skill level at a cost of <span class="MathJax_Preview">$0.01</span><script type="math/tex">$0.01</script> for each shot. Assuming you and
the goalie never fatigue, how do you decide when to stop taking
practice shots and choose a bet?</p>
</blockquote>
<p>Clearly it is never advisable to take more than <span class="MathJax_Preview">100/.01=10000</span><script type="math/tex">100/.01=10000</script>
practice shots, but how many <em>should</em> we take? A key to this question
is that you do not have to determine the number of shots to take
beforehand. Therefore, rather than determining a fixed number of
shots to take, we will instead need to determine a decision procedure
for when to stop shooting and choose a bet.</p>
<p>There is no single “correct” answer to this puzzle, so I have
documented my approach below.</p>
<h2 id="approach">Approach</h2>
<p>To understand my approach, first realize that there are a finite
number of potential states that the game can be in, and that you can
fully define each state based on how many shots you have made and how
many you have missed. The sum of these is the total number of shots
you have taken, and the order does not matter. Additionally, we
assume that all states exist, even if you will never arrive at that
state by the decision procedure.</p>
<p>An example of a state is taking 31 shots, making 9 of them, and
missing 22 of them. Another example is taking 98 shots, making 1 of
them and missing 97 of them. Even though we may have already made a
decision before taking 98 shots, the concept of a state does not
depend on the procedure used to “get there”.</p>
<p>Using this framework, it is sufficient to show which decision we
should take given what state we are in. My approach is as follows:</p>
<ol>
<li>Find a tight upper bound <span class="MathJax_Preview">B \ll 10000</span><script type="math/tex">B \ll 10000</script> on the number of practice
shots to take. This limits the number of states to work with.</li>
<li>Determine the optimal choice based on each potential state after
taking <span class="MathJax_Preview">B</span><script type="math/tex">B</script> total shots. Once <span class="MathJax_Preview">B</span><script type="math/tex">B</script> shots have been taken, it is
always best to have chosen either bet (A) or bet (B), so choose the
best bet without the option of shooting again.</li>
<li>Working backwards, starting with states with <span class="MathJax_Preview">B-1</span><script type="math/tex">B-1</script> shots and
moving down to <span class="MathJax_Preview">B-2,...,0</span><script type="math/tex">B-2,...,0</script>, determine the expected value of each
of the three choices: select bet (A), select bet (B), or shoot
again. Use this to determine the optimal choice to make at that
position.</li>
</ol>
<p>The advantage of this approach is that the primary criterion we will
work with is the expected value for each decision. This means that if
we play the game many times we will maximize the amount of money we
earn. As a convenient consequence of this, we know how much money we
can expect to earn given our current state.</p>
<p>The only reason this procedure is necessary is because we don’t know
our skill level. If we could determine our skill level with 100%
accuracy, we would never need to take any practice shots at all. Thus,
a key part of this procedure is estimating our skill level.</p>
<h2 id="what-if-you-know-your-skill-level">What if you know your skill level?</h2>
<p>We define skill level as the probability <span class="MathJax_Preview">p_0</span><script type="math/tex">p_0</script> that you will make a
shot. If you knew your probability of making each shot, we could find
your expected payoff from each bet. On the plot below, we show
the payoff (in dollars) of each bet on the y-axis, and how it changes
with skill on the x-axis.</p>
<div>
<figure>
<center><img src="/res/soccer/winning_prob_binom.png" /></center>
<figcaption class="imagecaption">Assuming
we have a precise knowledge of your skill level, we can find how much
money you can expect to make from each bet.</figcaption>
</figure>
</div>
<p>The first thing to notice is the obvious: as our skill improves, the
amount of money we can expect to win increases. Second, we see that
there is some point (the “equivalence point”) at which the bets are
equal; we compute this numerically to be <span class="MathJax_Preview">p_0 = 0.6658</span><script type="math/tex">p_0 = 0.6658</script>. We should
choose bet (A) if our skill level is worse than <span class="MathJax_Preview">0.6658</span><script type="math/tex">0.6658</script>, and bet (B) if
it is greater than <span class="MathJax_Preview">0.6658</span><script type="math/tex">0.6658</script>.</p>
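This equivalence point is quick to check numerically. The sketch below is separate from the full analysis script linked at the end of the post; the helper name <code class="highlighter-rouge">win_prob</code> is illustrative.

```python
from math import comb

def win_prob(p, need, out_of):
    """P(make at least `need` of `out_of` independent shots at skill p)."""
    return sum(comb(out_of, j) * p**j * (1 - p)**(out_of - j)
               for j in range(need, out_of + 1))

# Bisect for the skill level where bet (A) (2/3) and bet (B) (5/8)
# have the same probability of paying out.
lo, hi = 0.5, 1.0
for _ in range(60):
    mid = (lo + hi) / 2
    if win_prob(mid, 2, 3) > win_prob(mid, 5, 8):
        lo = mid
    else:
        hi = mid
print(round(lo, 4))   # ≈ 0.6658
```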
<p>But suppose our guess is poor. We notice that <em>the consequence for
guessing too high is greater than the consequence for guessing too low</em>.
It is better to bias your choice towards (A) unless you obtain
substantial evidence that you have a high skill level and (B) would be
a better choice. In other words, the potential gains from choosing
(A) over (B) are larger than the potential gains for choosing (B) over
(A).</p>
<h2 id="finding-a-tight-upper-bound">Finding a tight upper bound</h2>
<p>Quantifying this intuition, we compute the maximal possible gain of
choosing (A) over (B) and (B) over (A) as the maximum distance between
the curves on each side of the equivalence point. In other words, we
find the skill level at which the incentive is strongest to choose one
bet over the other, and then find what the incentive is at these
points.</p>
<div>
<figure>
<center><img src="/res/soccer/winning_prob_binom_lines.png" /></center>
<figcaption class="imagecaption">We
see here the locations where the distance between the curves is
greatest, showing the skill levels where it is most advantageous to
choose (A) or (B).</figcaption>
</figure>
</div>
<p>This turns out to be <span class="MathJax_Preview">$4.79</span><script type="math/tex">$4.79</script> for choosing (B) over (A), and
<span class="MathJax_Preview">$17.92</span><script type="math/tex">$17.92</script> for choosing (A) over (B). Since each shot costs
<span class="MathJax_Preview">$0.01</span><script type="math/tex">$0.01</script>, we conclude that it is never a good idea to take more than
479 practice shots. Thus, our upper bound <span class="MathJax_Preview">B=479</span><script type="math/tex">B=479</script>.</p>
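These two maximal gaps can be found with a simple grid search over skill levels. This is an illustrative check, not the linked analysis script, and the helper name is mine.

```python
from math import comb

def win_prob(p, need, out_of):
    """P(make at least `need` of `out_of` independent shots at skill p)."""
    return sum(comb(out_of, j) * p**j * (1 - p)**(out_of - j)
               for j in range(need, out_of + 1))

# Scan skill levels for the largest advantage of each bet over the other.
skills = [i / 2000 for i in range(2001)]
gain_a = max(100 * (win_prob(p, 2, 3) - win_prob(p, 5, 8)) for p in skills)
gain_b = max(100 * (win_prob(p, 5, 8) - win_prob(p, 2, 3)) for p in skills)
print(round(gain_a, 2), round(gain_b, 2))   # ≈ 17.92 and ≈ 4.79
```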
<h2 id="determining-the-optimal-choice-at-the-upper-bound">Determining the optimal choice at the upper bound</h2>
<p>Because we will never take more than 479 shots, we use this as a
cutoff point, and force a decision once 479 shots have been taken. So
for each possible combination of successes and failures, we must
find whether bet (A) or bet (B) is better.</p>
<p>In order to determine this, we need two pieces of information: first,
we need the expected value of bets (A) and (B) given <span class="MathJax_Preview">p_0</span><script type="math/tex">p_0</script> (i.e. the
curve shown above); second, we need the distribution representing our
best estimate of <span class="MathJax_Preview">p_0</span><script type="math/tex">p_0</script>. Remember, it is not enough to simply choose
(A) when our predicted skill is less than <span class="MathJax_Preview">0.6658</span><script type="math/tex">0.6658</script> and (B) when it
is greater than <span class="MathJax_Preview">0.6658</span><script type="math/tex">0.6658</script>; since we are biased towards choosing (A),
we need a probability distribution representing potential values of
<span class="MathJax_Preview">p_0</span><script type="math/tex">p_0</script>. Then, we can find the expected value of each bet given the
distribution of <span class="MathJax_Preview">p_0</span><script type="math/tex">p_0</script> (see appendix for more details). This can be
computed with a simple integral, and is easy to approximate
numerically.</p>
<p>Once we have performed these computations, in addition to having
information about whether (A) or (B) was chosen, we also know the
expected value of the chosen bet. This will be critical for
determining whether it is beneficial to take more shots before we have
reached the upper bound.</p>
<h2 id="determining-the-optimal-choice-below-the-upper-bound">Determining the optimal choice below the upper bound</h2>
<p>We then go down one level: if 478 shots have been taken, with <span class="MathJax_Preview">k</span><script type="math/tex">k</script>
successes and <span class="MathJax_Preview">(478-k)</span><script type="math/tex">(478-k)</script> failures, should we choose (A), should we
choose (B), or should we take another shot? Remember, we would like
to select the choice which will give us the highest expected outcome.</p>
<p>Based on this principle, it is only advisable to take another shot if
it would influence the outcome; in other words, if you would choose
the same bet no matter what the outcome of your next shot, it does not
make sense to take another shot, because you lose <span class="MathJax_Preview">$0.01</span><script type="math/tex">$0.01</script> without
gaining any information. It only makes sense to take the shot if the
information gained from taking the shot increases the expected value
by more than <span class="MathJax_Preview">$0.01</span><script type="math/tex">$0.01</script>.</p>
<p>We can compute this tradeoff by finding the
expected value of each of the three options (choose (A), choose (B),
or shoot again). Using our previous experiments to judge the
probability of a successful shot (see appendix), we can find the
expected payoff of taking another shot. If it is greater than
choosing (A) or (B), we take the shot.</p>
<p>Working backwards, we continue until we are on our first shot, where
we assume we have a <span class="MathJax_Preview">50</span><script type="math/tex">50</script>% chance of succeeding. Once we reach this
point, we have a full decision tree, indicating which action we should
take based on the outcome of each shot, and the entire decision
process can be considered solved.</p>
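The backward pass can be sketched in a few dozen lines of Python. This is a simplified illustration rather than the analysis script linked below: it uses a small cutoff (<code class="highlighter-rouge">B = 30</code>) so it runs quickly, the Beta-posterior expectations derived in the appendix, and function names of my own choosing.

```python
import math
from functools import lru_cache

COST = 0.01    # price of one practice shot
PRIZE = 100.0  # prize for winning either bet
B = 30         # small cutoff for illustration; the post derives B = 479

def log_beta(a, b):
    # log of the Beta function B(a, b)
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def expected_win_prob(made, missed, need, out_of):
    """E[P(make >= need of out_of shots)] under the Beta(made+1, missed+1)
    posterior on skill (see the appendix)."""
    a, b = made + 1, missed + 1
    return sum(math.comb(out_of, j)
               * math.exp(log_beta(a + j, b + out_of - j) - log_beta(a, b))
               for j in range(need, out_of + 1))

@lru_cache(maxsize=None)
def value(made, missed):
    """Expected winnings, net of remaining shot costs, under the optimal
    policy starting from this state."""
    ev_a = PRIZE * expected_win_prob(made, missed, 2, 3)  # bet (A): 2/3
    ev_b = PRIZE * expected_win_prob(made, missed, 5, 8)  # bet (B): 5/8
    if made + missed >= B:  # at the cutoff we must pick a bet
        return max(ev_a, ev_b)
    p_hat = (made + 1) / (made + missed + 2)  # posterior mean skill
    ev_shoot = (-COST + p_hat * value(made + 1, missed)
                + (1 - p_hat) * value(made, missed + 1))
    return max(ev_a, ev_b, ev_shoot)
```

The option to stop and take bet (A) is always in the max, so the value at the starting state can never fall below bet (A)'s expected $50 payoff under the uniform prior.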
<h2 id="conclusion">Conclusion</h2>
<p>Here is the decision tree, plotted in raster form.</p>
<div>
<figure>
<center><img src="/res/soccer/decision-tree.png" /></center>
<figcaption class="imagecaption">Starting at
the point (0,0), go one to the right for every shot that you take, and
one up for every shot that you make. Red indicates you should shoot
again, blue indicates you should choose (A), and green indicates you
should choose (B).</figcaption>
</figure>
</div>
<p>Looking more closely at the beginning, we see that unless you are
really good, you should choose (A) rather quickly.</p>
<div>
<figure>
<center><img src="/res/soccer/decision-tree-zoomed.png" /></center>
<figcaption class="imagecaption">An
identical plot to that above, but zoomed in near the beginning.</figcaption>
</figure>
</div>
<p>We can also look at the amount of money you will win on average if you
play by this strategy. As expected, when you make more shots, you
will have a higher chance of winning more money.</p>
<div>
<figure>
<center><img src="/res/soccer/value-tree.png" /></center>
<figcaption class="imagecaption">For each point in
the previous figures, the expected winnings under the optimal
strategy.</figcaption>
</figure>
</div>
<p>We can also look at the zoomed in version.</p>
<div>
<figure>
<center><img src="/res/soccer/value-tree-zoomed.png" /></center>
<figcaption class="imagecaption">An
identical plot to the one above, but zoomed in near the beginning.</figcaption>
</figure>
</div>
<p>This algorithm grows in memory and computation time like <span class="MathJax_Preview">O(B^2)</span><script type="math/tex">O(B^2)</script>,
meaning that if we double the size of the upper bound, we quadruple
the amount of memory and CPU time we require.</p>
<p>This may not be the best strategy, but it seems to be a principled
strategy which works reasonably well with a relatively small runtime.</p>
<h2 id="appendix-determining-the-distribution-of-p0">Appendix: Determining the distribution of <span class="MathJax_Preview">p_0</span><script type="math/tex">p_0</script></h2>
<p>In order to find the distribution for <span class="MathJax_Preview">p_0</span><script type="math/tex">p_0</script>, we consider the
distribution of <span class="MathJax_Preview">p_0</span><script type="math/tex">p_0</script> for a single shot. The chance that we make a
shot is <span class="MathJax_Preview">100</span><script type="math/tex">100</script>% if <span class="MathJax_Preview">p_0=1</span><script type="math/tex">p_0=1</script>, <span class="MathJax_Preview">0</span><script type="math/tex">0</script>% if <span class="MathJax_Preview">p_0=0</span><script type="math/tex">p_0=0</script>, <span class="MathJax_Preview">50</span><script type="math/tex">50</script>% if
<span class="MathJax_Preview">p_0=0.5</span><script type="math/tex">p_0=0.5</script>, and so on. Thus, the distribution of <span class="MathJax_Preview">p_0</span><script type="math/tex">p_0</script> from a
single successful trial is <span class="MathJax_Preview">f(p)=p</span><script type="math/tex">f(p)=p</script> for <span class="MathJax_Preview">0 ≤ p ≤ 1</span><script type="math/tex">0 ≤ p ≤ 1</script>. Similarly,
if we miss the shot, then the distribution is <span class="MathJax_Preview">f(p)=(1-p)</span><script type="math/tex">f(p)=(1-p)</script> for
<span class="MathJax_Preview">0≤p≤1</span><script type="math/tex">0≤p≤1</script>. Since these probabilities are independent, we can multiply
them together and find that, for <span class="MathJax_Preview">n</span><script type="math/tex">n</script> shots, <span class="MathJax_Preview">k</span><script type="math/tex">k</script> successes, and
<span class="MathJax_Preview">(n-k)</span><script type="math/tex">(n-k)</script> failures, we have <span class="MathJax_Preview">f(p)=p^k (1-p)^{n-k}/c</span><script type="math/tex">f(p)=p^k (1-p)^{n-k}/c</script> for some
normalizing constant <span class="MathJax_Preview">c</span><script type="math/tex">c</script>. It turns out, this is identical to the
beta distribution, with parameters <span class="MathJax_Preview">α=k+1</span><script type="math/tex">α=k+1</script> and <span class="MathJax_Preview">β=n-k+1</span><script type="math/tex">β=n-k+1</script>.</p>
<p>However, we need a point estimate of <span class="MathJax_Preview">p_0</span><script type="math/tex">p_0</script> to compute the expected
value of taking another shot. We cannot simply use the ratio <span class="MathJax_Preview">k/n</span><script type="math/tex">k/n</script>
for two practical reasons: first, it is undefined when no shots have
been taken, and second, when the first shot has been taken, we have a
<span class="MathJax_Preview">100</span><script type="math/tex">100</script>% probability of one outcome and a <span class="MathJax_Preview">0</span><script type="math/tex">0</script>% probability of the
other. If we want to assume a <span class="MathJax_Preview">50</span><script type="math/tex">50</script>% probability of making the shot
initially, an easy way to solve this problem is to use the ratio
<span class="MathJax_Preview">(k+1)/(n+2)</span><script type="math/tex">(k+1)/(n+2)</script> instead of <span class="MathJax_Preview">k/n</span><script type="math/tex">k/n</script> to estimate the probability.
Interestingly, this quick and dirty solution is equivalent to finding
the mean of the beta distribution. When no shots have been taken,
<span class="MathJax_Preview">k=0</span><script type="math/tex">k=0</script> and <span class="MathJax_Preview">n=0</span><script type="math/tex">n=0</script>, so <span class="MathJax_Preview">α=1</span><script type="math/tex">α=1</script> and <span class="MathJax_Preview">β=1</span><script type="math/tex">β=1</script>, which is equivalent to the
uniform distribution, hence our non-informative prior.</p>
<h3>Code/Data:</h3>
<ul>
<li><a href="/res/soccer/find-best-strategy.py">Analysis script</a></li>
</ul>
Wed, 08 Mar 2017 00:00:00 -0500
/2017/03/08/when-to-stop-and-when-to-keep-going.html
Tags: algorithm, stats, modeling, puzzle, game

<h1>Which hints are best in Towers?</h1>
<p>There is a wonderful collection of puzzles by Simon Tatham called the
<a href="http://www.chiark.greenend.org.uk/~sgtatham/puzzles/">Portable Puzzle Collection</a>
which serves as a fun distraction. The game “Towers” is a simple puzzle
where you must fill in a
<a href="https://en.wikipedia.org/wiki/Latin_square">Latin square</a> with
numbers <span class="MathJax_Preview">1 \ldots N</span><script type="math/tex">1 \ldots N</script>, only one of each per row/column, as if the
squares contained towers of this height. The numbers of towers visible
from the edges of the rows and columns are given as clues. For example,</p>
<div>
<figure>
<center><img src="/res/towers/example-board.png" /></center>
<figcaption class="imagecaption">An example starting board from the Towers game.</figcaption>
</figure>
</div>
<p>Solved, the board would appear as,</p>
<div>
<figure>
<center><img src="/res/towers/solved-example.png" /></center>
<figcaption class="imagecaption">The previous example solved.</figcaption>
</figure>
</div>
<p>In more advanced levels, not all of the hints are given.
Additionally, in these levels, hints can also be given in the form of
the value of particular cells. For example, the initial conditions of
the puzzle may be,</p>
<div>
<figure>
<center><img src="/res/towers/hard-level.png" /></center>
<figcaption class="imagecaption">A more difficult example board.</figcaption>
</figure>
</div>
<p>With such different types of hints available, a natural question is
whether some hints are better than others.</p>
<h2 id="how-will-we-approach-the-problem">How will we approach the problem?</h2>
<p>We will use an
<a href="https://en.wikipedia.org/wiki/Shannon_information">information-theoretic</a>
framework to understand how useful different hints are. This allows
us to measure the amount of information that a particular hint gives
about the solution to a puzzle in bits, a nearly-identical unit to
that used by computers to measure file and memory size.</p>
<p>Information theory is based on the idea that random variables
(quantities which can take on one of many values probabilistically)
are not always independent, so sometimes knowledge of the value of one
random variable can change the probabilities for a different random
variable. For instance, one random variable may be a number 1-10, and
a second random variable may be whether that number is even or odd. A
bit is an amount of information equal to the best possible yes or no
question, or (roughly speaking) information that can cut the number of
possible outcomes in half. Knowing whether a number is even or odd
gives us one bit of information, since it specifies that the first
random variable can only be one of five numbers instead of one of ten.</p>
<p>Here, we will define a few random variables. Most importantly, we
will have the random variable describing the correct solution of the
board, which could be any possible board. We will also have random
variables which represent hints. There are two types of hints:
initial cell value hints (where one of the cells is already filled in)
and tower visibility hints (which define how many towers are visible
down a row or column).</p>
<p>The number of potential Latin squares of size <span class="MathJax_Preview">N</span><script type="math/tex">N</script> grows very fast.
For a <span class="MathJax_Preview">5×5</span><script type="math/tex">5×5</script> square, there are 161,280 possibilities, and for a
<span class="MathJax_Preview">10×10</span><script type="math/tex">10×10</script>, there are over <span class="MathJax_Preview">10^{36}</span><script type="math/tex">10^{36}</script>. Thus, for computational
simplicity, we analyze a <span class="MathJax_Preview">4×4</span><script type="math/tex">4×4</script> puzzle with a mere 576 possibilities.</p>
<h2 id="how-useful-are-initial-cell-value-hints">How useful are “initial cell value” hints?</h2>
<p>First, we measure the entropy, or the maximal information content that
a single cell will give. For the first cell chosen, there is an equal
probability that any of the values (<code class="highlighter-rouge">1</code>, <code class="highlighter-rouge">2</code>, <code class="highlighter-rouge">3</code>, or <code class="highlighter-rouge">4</code>) will be
found in that cell. Since there are four equally likely options, this
gives us 2 bits of information.</p>
<p>What about the second initial cell value? Interestingly, it depends
both on the location and on the value. If the second clue is in the
same row or column as the first, it will give less information. If it
is the same number as the first, it will also give less information.</p>
<p>Counter-intuitively, in the 4×4 board, this means we gain <em>more</em> than
2 bits of information from the second hint. This is because, once we
reveal the first cell’s value, the probabilities of each of the other
cell’s possible values are not equal as they were before. If we
choose a cell outside the row and column of our first choice, it is
more likely to equal the first cell’s value than any other value.
Therefore, if we reveal a value which is different, it provides more
information.</p>
<p>Intuitively, for the 4×4 board, suppose we reveal the value of a cell
and it is <code class="highlighter-rouge">4</code>. There cannot be another <code class="highlighter-rouge">4</code> in the same column or row,
so if we are to choose a hint from a different column or row, we are
effectively choosing from a 3×3 grid. There must be 3 <code class="highlighter-rouge">4</code>
values in the 3×3 grid, so the probability of selecting one is 1/3. We
have an even probability of selecting a <code class="highlighter-rouge">1</code>, <code class="highlighter-rouge">2</code>, or <code class="highlighter-rouge">3</code>, so each
of these values has a probability of 2/9. Because these are more
surprising finds, each gives us 2.17 bits of information.</p>
<p>Consequently, selecting a cell in the same row or column, or one which
has the same value as the first, will give an additional 1.58 bits of
information.</p>
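These quantities are quick to verify numerically; this small snippet is separate from the analysis script linked below.

```python
from math import log2

# 4×4 board: probabilities for a second revealed cell that shares neither
# the row nor the column of the first revealed cell
p_same = 1 / 3    # same value as the first cell
p_other = 2 / 9   # each of the three other values
print(round(-log2(p_same), 2))    # 1.58 bits
print(round(-log2(p_other), 2))   # 2.17 bits
```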
<h2 id="how-about-tower-visibility-hints">How about “tower visibility” hints?</h2>
<p>In a 4×4 puzzle, it is very easy to compute the information gained if
the hint is a <code class="highlighter-rouge">1</code> or a <code class="highlighter-rouge">4</code>. A hint of <code class="highlighter-rouge">1</code> always gives the same
amount of information as a single square: it tells us that the cell on
the edge of the hint must be a <code class="highlighter-rouge">4</code>, and gives no information about the
rest of the squares. If only one tower can be seen, the tallest tower
must come first. Thus, it must give 2 bits of information.</p>
<p>Additionally, we know that if the hint is equal to <code class="highlighter-rouge">4</code>, the only
possible combination for the row is <code class="highlighter-rouge">1</code>, <code class="highlighter-rouge">2</code>, <code class="highlighter-rouge">3</code>, <code class="highlighter-rouge">4</code>. Thus, this
gives an amount of information equal to the entropy of a single row,
which turns out to be 4.58 bits.</p>
<p>For a hint of <code class="highlighter-rouge">2</code> or <code class="highlighter-rouge">3</code>, the information content is not as
immediately clear, but we can calculate them numerically. For a hint
of <code class="highlighter-rouge">2</code>, we have 1.13 bits, and for a hint of <code class="highlighter-rouge">3</code>, we have 2 bits.</p>
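All four values can be checked by brute force over the 24 orderings of a single row. This is a quick illustrative check, not the linked script; the helper name is mine.

```python
from collections import Counter
from itertools import permutations
from math import log2

def visible(row):
    """Count the towers visible from the left end of a row."""
    count, tallest = 0, 0
    for height in row:
        if height > tallest:
            count, tallest = count + 1, height
    return count

# distribution of visibility hints over all orderings of a length-4 row
counts = Counter(visible(row) for row in permutations(range(1, 5)))
for hint in sorted(counts):
    bits = -log2(counts[hint] / 24)
    print(hint, round(bits, 2))   # 1 → 2.0, 2 → 1.13, 3 → 2.0, 4 → 4.58
```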
<p>Conveniently, due to the fact that the reduction of entropy in a row
must be equal to the reduction of entropy in the entire puzzle, we can
compute values for larger boards. Below, we show the information
gained about the solution from each possible hint (indicated by the
color). In general, it seems higher hints are usually better, but a
hint of <code class="highlighter-rouge">1</code> is generally better than one of <code class="highlighter-rouge">2</code> or <code class="highlighter-rouge">3</code>.</p>
<div>
<figure>
<center><img src="/res/towers/information-by-board-size.png" /></center>
<figcaption class="imagecaption">For each board size, the information content of each
potential hint is plotted.</figcaption>
</figure>
</div>
<h2 id="conclusion">Conclusion</h2>
<p>In summary:</p>
<ul>
<li>The more information given by a hint for a puzzle, the easier that
hint makes it to solve the puzzle.</li>
<li>Of the two types of hints, usually the hints about the tower
visibility are best.</li>
<li>On small boards (of size less than 5), hints about individual cells
are very useful.</li>
<li>The more towers visible from a row or column, the more information is
gained about the puzzle from that hint.</li>
</ul>
<p>Of course, remember that all of the hints combined of any given puzzle
must be sufficient to completely solve the puzzle (assuming the puzzle
is solvable), so the information content provided by the hints must be
equal to the entropy of the puzzle of the given size. However, we saw
in the “initial cell value” section that hints may become more or less
effective when combined, so these information values cannot simply be
added to determine which hints provide the most information. Nevertheless,
this serves as a good starting point in determining which hints are
the most useful.</p>
<h2 id="more-information">More information</h2>
<h3 id="theoretical-note">Theoretical note</h3>
<p>For initial cell hints, it is possible to compute the information
content analytically for any size board. For a board of size <span class="MathJax_Preview">N×N</span><script type="math/tex">N×N</script>
with <span class="MathJax_Preview">N</span><script type="math/tex">N</script> symbols, we know that the information contained in the
first hint is <span class="MathJax_Preview">-\log(1/N)</span><script type="math/tex">-\log(1/N)</script> bits. Suppose this play uncovers token
<code class="highlighter-rouge">X</code>. Using this first play, we construct a sub-board where the row
and column of the first hint are removed, leaving us with an
<span class="MathJax_Preview">(N-1)×(N-1)</span><script type="math/tex">(N-1)×(N-1)</script> board. If we choose a cell from this board, it has a
<span class="MathJax_Preview">1/(N-1)</span><script type="math/tex">1/(N-1)</script> probability of being <code class="highlighter-rouge">X</code>, with the remaining
probability split evenly among the other <span class="MathJax_Preview">N-1</span><script type="math/tex">N-1</script> values, giving a
<span class="MathJax_Preview">\frac{N-2}{(N-1)^2}</span><script type="math/tex">\frac{N-2}{(N-1)^2}</script> probability for each of
the other tokens. Thus, the information gained is
<span class="MathJax_Preview">-\log\left(\frac{N-2}{(N-1)^2}\right)</span><script type="math/tex">-\log\left(\frac{N-2}{(N-1)^2}\right)</script> if the
value is different from the first, and
<span class="MathJax_Preview">-\log\left(1/(N-1)\right)</span><script type="math/tex">-\log\left(1/(N-1)\right)</script> if they are the same; these
expressions are approximately equal for large <span class="MathJax_Preview">N</span><script type="math/tex">N</script>. Note how no
information is gained when the second square is revealed if <span class="MathJax_Preview">N=2</span><script type="math/tex">N=2</script>.</p>
<p>Similarly, when a single row is revealed (for example by knowing that
<span class="MathJax_Preview">N</span><script type="math/tex">N</script> towers are visible from the end of a row or column) we know that
the entropy must be reduced by <span class="MathJax_Preview">-\sum_{i=1}^N \log(1/i)</span><script type="math/tex">-\sum_{i=1}^N \log(1/i)</script>. This is
because the first element revealed in the row gives <span class="MathJax_Preview">-\log(1/N)</span><script type="math/tex">-\log(1/N)</script>
bits, the second gives <span class="MathJax_Preview">-\log(1/(N-1))</span><script type="math/tex">-\log(1/(N-1))</script> bits, and so on.</p>
<h3 id="solving-a-puzzle-algorithmically">Solving a puzzle algorithmically</h3>
<p>Most of these puzzles are solvable without backtracking, i.e. the next
move can always be logically deduced from the state of the board
without the need for trial and error. By incorporating the
information from each hint into the column and row states and then
integrating this information across rows and columns, it turned out to
be surprisingly simple to write a quick and dirty algorithm to solve
the puzzles. This algorithm, while probably not of optimal
computational complexity, worked reasonably well. Briefly,</p>
<ol>
<li>Represent the initial state of the board by a length-<span class="MathJax_Preview">N</span><script type="math/tex">N</script> list of
lists, where each of the <span class="MathJax_Preview">N</span><script type="math/tex">N</script> lists represents a row of the board,
and each sub-list contains all of the possible combinations of this
row (there are <span class="MathJax_Preview">N!</span><script type="math/tex">N!</script> of them to start). Similarly, define an
equivalent (yet redundant) data structure for the columns.</li>
<li>Enforce each condition on the start of the board by eliminating the
impossible combinations using the number of towers visible from
each row and column, and using the cells given at initialization.
Update the row and column lists accordingly.</li>
<li>Now, the possible moves for certain squares will be restricted by
the row and column limitations; for instance, if only 1 tower is
visible in a row or column, the tallest tower in the row or column
must be on the edge of the board. Iterate through the cells,
restricting the potential rows by the limitations on the column and
vice versa. For example, if we know the position of the tallest
tower in a particular <em>column</em>, eliminate the corresponding <em>rows</em>
which do not have the tallest tower in this position in the row.</li>
<li>After sufficient iterations of (3), there should only be one
possible ordering for each row (assuming it is solvable without
backtracking). The puzzle is now solved.</li>
</ol>
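Steps (1) and (2) might be sketched as follows. This is a simplified fragment rather than the linked solver script, and the function names are illustrative.

```python
from itertools import permutations

def visible(row):
    """Count the towers visible from the left end of a row."""
    count, tallest = 0, 0
    for height in row:
        if height > tallest:
            count, tallest = count + 1, height
    return count

def row_candidates(n, left=None, right=None, fixed=None):
    """All orderings of one row consistent with its left/right visibility
    hints and any pre-filled cells (fixed maps position -> value)."""
    candidates = []
    for perm in permutations(range(1, n + 1)):
        if left is not None and visible(perm) != left:
            continue
        if right is not None and visible(perm[::-1]) != right:
            continue
        if fixed and any(perm[i] != v for i, v in fixed.items()):
            continue
        candidates.append(perm)
    return candidates

# a hint of 1 forces the tallest tower to the near edge
assert all(row[0] == 4 for row in row_candidates(4, left=1))
```

Step (3) would then repeatedly intersect these row candidate lists with the corresponding column candidate lists until each list contains a single ordering.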
<p>This is not a very efficient algorithm, but it is fast enough and
memory-efficient enough for all puzzles which might be fun for a human
to solve. This algorithm also does not work with puzzles which
require backtracking, but could be easily modified to do
so.</p>
<h3>Code/Data:</h3>
<ul>
<li><a href="/res/towers/compute_information.py">Analysis script</a></li>
<li><a href="/res/towers/solve.py">Script to solve a Towers puzzle</a></li>
</ul>
Sat, 28 Jan 2017 00:00:00 -0500
/2017/01/28/which-hints-are-best-in-towers.html
Tags: puzzle, game, towers, algorithm, information-theory

<h1>When should you leave for the bus?</h1>
<p>Anyone who has taken the bus has at one time or another wondered,
“When should I plan to be at the bus stop?” or more importantly, “When
should I leave if I want to catch the bus?” Many bus companies
suggest
<a href="http://www.matatransit.com/ridersguide/how-to-ride/">arriving</a>
<a href="http://www.riderta.com/howtoride">a</a>
<a href="http://routes.valleymetro.org/">few</a>
<a href="http://www.metrotransit.org/ride-the-bus">minutes</a>
<a href="http://atltransit.org/guide/tips/">early</a>, but there seem to be no
good analyses on when to leave for the bus. I decided to find out.</p>
<h2 id="finding-a-cost-function">Finding a cost function</h2>
<p>Suppose we have a bus route where a bus runs every <span class="MathJax_Preview">I</span><script type="math/tex">I</script> minutes, so if
you don’t catch your bus, you can always wait for the next bus.
However, since more than just your time is at stake for missing the
bus (e.g. missed meetings, stress, etc.), we assume there is a penalty
<span class="MathJax_Preview">\delta</span><script type="math/tex">\delta</script> for missing the bus in addition to the extra wait time.
<span class="MathJax_Preview">\delta</span><script type="math/tex">\delta</script> here is measured in minutes, i.e. how many minutes of your
time would you exchange to be guaranteed to avoid missing the bus.
<span class="MathJax_Preview">\delta=0</span><script type="math/tex">\delta=0</script> therefore means that you have no reason to prefer one bus
over another, and that you only care about minimizing your lifetime
bus wait time.</p>
<p>Assuming we will not be late enough to need to catch the third bus, we
can model this with two terms, representing the cost to you (in
minutes) of catching each of the next two buses, weighted by the
probability that you will catch that bus:</p>
<div class="MathJax_Preview">C(t) = \left(E(T_B) - t\right) P\left(T_B > t + L_W\right) + \left(I + E(T_B) - t + \delta\right) P(T_B < t + L_W)</div>
<script type="math/tex; mode=display">% <![CDATA[
C(t) = \left(E(T_B) - t\right) P\left(T_B > t + L_W\right) + \left(I + E(T_B) - t + \delta\right) P(T_B < t + L_W) %]]></script>
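<p>If <span class="MathJax_Preview">T_B</span><script type="math/tex">T_B</script> and <span class="MathJax_Preview">L_W</span><script type="math/tex">L_W</script> are taken to be Gaussian (as assumed below), then <span class="MathJax_Preview">T_B - L_W</span><script type="math/tex">T_B - L_W</script> is also Gaussian, so the miss
probability is a normal CDF and the cost can be minimized numerically. A
sketch with illustrative parameter values, not the author's analysis script:</p>

```python
import math

def norm_cdf(x, mu, sigma):
    """CDF of a Gaussian with mean mu and standard deviation sigma."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

def cost(t, mu_B, var_B, mu_W, var_W, I, delta):
    """C(t): since T_B - L_W is Gaussian with mean mu_B - mu_W and
    variance var_B + var_W, the miss probability
    P(T_B < t + L_W) = P(T_B - L_W < t) is a normal CDF."""
    p_miss = norm_cdf(t, mu_B - mu_W, math.sqrt(var_B + var_W))
    return (mu_B - t) * (1 - p_miss) + (I + mu_B - t + delta) * p_miss

# Grid-search the minimum with illustrative numbers: bus expected at
# t = 0, var_B = 5.7, negligible walk, I = 15 minutes, delta = 0.
ts = [i / 100 - 10 for i in range(2000)]
best = min(ts, key=lambda t: cost(t, 0, 5.7, 0, 0, 15, 0))
```

<p>With these numbers the minimizer lands a little over three minutes before
the bus's expected arrival, agreeing with the closed-form optimum derived
next.</p>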
<p>where <span class="MathJax_Preview">T_B</span><script type="math/tex">T_B</script> is the random variable representing the time at which
the bus arrives, <span class="MathJax_Preview">L_W</span><script type="math/tex">L_W</script> is the random variable representing the
amount of time it takes to walk to the bus stop, and <span class="MathJax_Preview">t</span><script type="math/tex">t</script> is the time
you leave. (<span class="MathJax_Preview">E</span><script type="math/tex">E</script> is expected value and <span class="MathJax_Preview">P</span><script type="math/tex">P</script> is the probability.) We
wish to choose a time to leave the office <span class="MathJax_Preview">t</span><script type="math/tex">t</script> which minimizes the cost
function <span class="MathJax_Preview">C</span><script type="math/tex">C</script>.</p>
<p>If we assume that <span class="MathJax_Preview">T_B</span><script type="math/tex">T_B</script> and <span class="MathJax_Preview">L_W</span><script type="math/tex">L_W</script> are Gaussian, then it can be shown that
the optimal time to leave (which minimizes the above function) is</p>
<div class="MathJax_Preview">t = -\mu_W - \sqrt{\left(\sigma_B^2 + \sigma_W^2\right)\left(2\ln\left(\frac{I+\delta}{\sqrt{\sigma_B^2+\sigma_W^2}}\right)-2\ln\left(\sqrt{2\pi}\right)\right)}</div>
<script type="math/tex; mode=display">t = -\mu_W - \sqrt{\left(\sigma_B^2 + \sigma_W^2\right)\left(2\ln\left(\frac{I+\delta}{\sqrt{\sigma_B^2+\sigma_W^2}}\right)-2\ln\left(\sqrt{2\pi}\right)\right)}</script>
<p>where <span class="MathJax_Preview">\sigma_B^2</span><script type="math/tex">\sigma_B^2</script> is the variance of the bus arrival time,
<span class="MathJax_Preview">\sigma_W^2</span><script type="math/tex">\sigma_W^2</script> is the variance of your walk, and <span class="MathJax_Preview">\mu_W</span><script type="math/tex">\mu_W</script> is expected
duration of your walk. In other words, you should plan to arrive at
the bus stop on average <span class="MathJax_Preview">\sqrt{\left(\sigma_B^2 + \sigma_W^2\right)\left(2\ln\left(\left(I+\delta\right)/\sqrt{\sigma_B^2+\sigma_W^2}\right)-2\ln\left(\sqrt{2\pi}\right)\right)}</span><script type="math/tex">\sqrt{\left(\sigma_B^2 + \sigma_W^2\right)\left(2\ln\left(\left(I+\delta\right)/\sqrt{\sigma_B^2+\sigma_W^2}\right)-2\ln\left(\sqrt{2\pi}\right)\right)}</script> minutes before your bus arrives.</p>
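<p>That square-root term can be packaged as a small function. This is a
sketch under the Gaussian assumptions above; <code>minutes_early</code> is a
hypothetical name, and the formula requires <span class="MathJax_Preview">(I+\delta)/\sqrt{\sigma_B^2+\sigma_W^2} > \sqrt{2\pi}</span><script type="math/tex">(I+\delta)/\sqrt{\sigma_B^2+\sigma_W^2} > \sqrt{2\pi}</script> so that the logarithm term stays positive.</p>

```python
import math

def minutes_early(var_B, var_W, I, delta):
    """How many minutes before the bus's expected arrival you should,
    on average, plan to reach the stop (the square-root term above).
    Valid when (I + delta) / sqrt(var_B + var_W) > sqrt(2 * pi)."""
    s = math.sqrt(var_B + var_W)
    return math.sqrt((var_B + var_W) *
                     (2 * math.log((I + delta) / s)
                      - 2 * math.log(math.sqrt(2 * math.pi))))
```

<p>With the estimates used later in the post (<code>var_B = 5.7</code>,
<code>var_W = 0</code>, <code>I = 15</code>), this gives roughly 3.2 minutes
early for <code>delta = 0</code> and roughly 5.4 minutes early for
<code>delta = 60</code>.</p>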
<p>Note that one deliberate oddity of the model is that the cost function
measures not just wait time but also walking time. I chose this
because, in the end, what matters is the total time you spend getting
onto the bus.</p>
<h2 id="what-does-this-mean">What does this mean?</h2>
<p>The most important factor you should consider when choosing which bus
to take is the variability in the bus’ arrival time and the
variability in the time it takes you to walk to the bus. The arrival
time scales approximately linearly with the standard deviation of the
variability.</p>
<p>Additionally, the arrival time scales approximately with the square
root of the log of your value of time and of the interval between
buses. So even very high values of time and very infrequent buses do
not substantially change the time at which you should arrive. For
approximation purposes, you might consider adding a constant in place
of this term, anywhere from 2-5 minutes depending on the frequency of
the bus.</p>
<h2 id="checking-the-assumption">Checking the assumption</h2>
<p>First, we need to collect some data to assess whether the bus
arrival time (<span class="MathJax_Preview">T_B</span><script type="math/tex">T_B</script>) is normally distributed. I wrote scripts to scrape
data from Yale University’s Blue Line campus shuttle route. Many bus
systems (including Yale’s) now have real-time predictions, so I used
many individual predictions by Yale’s real-time arrival prediction
system as the expected arrival time, simulating somebody checking this
to see when the next bus comes.</p>
<p>For our purposes, the distribution of arrival times looks close
enough to Gaussian:</p>
<div>
<figure>
<center><img src="/res/bus/isnormal.png" /></center>
<figcaption class="imagecaption">It actually looks like a Gaussian!</figcaption>
</figure>
</div>
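<p>The linked analysis script performs the real check against the scraped
data; as a rough stand-in, here is a sketch of one simple symmetry check
(sample skewness near zero) run on synthetic residuals. The data below are
simulated purely to illustrate the computation, not taken from the Yale
shuttle.</p>

```python
import random
import statistics

def sample_skewness(xs):
    """Standardized third moment; near 0 for symmetric
    (e.g. Gaussian) data, far from 0 for skewed data."""
    m = statistics.fmean(xs)
    s = statistics.stdev(xs)
    return sum(((x - m) / s) ** 3 for x in xs) / len(xs)

random.seed(0)  # synthetic stand-in for minutes-late residuals
residuals = [random.gauss(0, 2.4) for _ in range(1000)]
```

<p>A skewness close to zero is consistent with (though of course does not
prove) the Gaussian assumption used in the derivation.</p>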
<h2 id="so-what-time-should-i-leave">So what time should I leave?</h2>
<p>When estimating the <span class="MathJax_Preview">\sigma_B^2</span><script type="math/tex">\sigma_B^2</script> parameter, we only examine bus
times which are 10 minutes away or later. This is because you can’t
use a real-time bus system to plan ahead of time to catch something if
it is too near in the future, which defeats the purpose of the present
analysis. The variance in arrival time for the Yale buses is
<span class="MathJax_Preview">\sigma_B^2=5.7</span><script type="math/tex">\sigma_B^2=5.7</script> minutes squared.</p>
<p>We use an inter-bus interval of <span class="MathJax_Preview">I=15</span><script type="math/tex">I=15</script> minutes.</p>
<p>While the variability of the walk to the bus station <span class="MathJax_Preview">\sigma_W^2</span><script type="math/tex">\sigma_W^2</script> is
unique for each person, I consider two cases: first, the case where the
walk time variability is small (<span class="MathJax_Preview">\sigma_W^2=0</span><script type="math/tex">\sigma_W^2=0</script>)
compared to the bus’ variability, representing the case where the bus
stop is (for instance) located right outside one’s office building. I
also consider the case where the walk time variability is comparable to the
variability for the bus (<span class="MathJax_Preview">\sigma_W=5</span><script type="math/tex">\sigma_W=5</script>), representing the case where
one must walk a long distance to the bus stop.</p>
<p>Finally, I consider the case where we strongly prioritize catching the
desired bus (<span class="MathJax_Preview">\delta=60</span><script type="math/tex">\delta=60</script> corresponding to, e.g., an important meeting)
and also the case where we seek to directly minimize the expected wait
time (<span class="MathJax_Preview">\delta=0</span><script type="math/tex">\delta=0</script> corresponding to, e.g., the commute home).</p>
<div>
<figure>
<center><img src="/res/bus/variants.png" /></center>
<figcaption class="imagecaption">Even though the
shape of the optimization function changes greatly, the optimal
arrival time changes very little.</figcaption>
</figure>
</div>
<p>We can also look at a spectrum of different cost tradeoffs for missing
the bus (values of <span class="MathJax_Preview">\delta</span><script type="math/tex">\delta</script>) and variance in the walk time (values
of <span class="MathJax_Preview">\sigma_W^2 = var(W)</span><script type="math/tex">\sigma_W^2 = var(W)</script>). Because they appear similarly in the
equations, we can also consider these values to be changes in the
interval of the bus arrival <span class="MathJax_Preview">I</span><script type="math/tex">I</script> and the variance of the bus’ arrival
time <span class="MathJax_Preview">\sigma_B^2=var(B)</span><script type="math/tex">\sigma_B^2=var(B)</script> respectively.</p>
<div>
<figure>
<center><img src="/res/bus/howearly.png" /></center>
<figcaption class="imagecaption">Across all
reasonable values, the optimal time to plan to arrive is between 3.5
and 8 minutes early.</figcaption>
</figure>
</div>
<h2 id="conclusion">Conclusion</h2>
<p>So to summarize:</p>
<ul>
<li>If it always takes you approximately the same amount of time to walk
to the bus stop, plan to be there 3-4 minutes early on your commute
home, or 5-6 minutes early if it’s the last bus before an important
meeting.</li>
<li>If you have a long walk to the bus stop which can vary in duration,
plan to arrive at the bus stop 4-5 minutes early if you take the bus
every day, or around 7-8 minutes early if you need to be somewhere
for a special event.</li>
<li>These estimations assume that you know how long it takes you on
average to walk to the bus stop. As we saw previously, if you need
to be somewhere at a certain time, arriving a minute early is much
better than arriving a minute late. If you don’t need to be
somewhere, just make your best guess.</li>
<li>The best way to reduce waiting time is to decrease variability.</li>
<li>These estimates also assume that the interval between buses is
sufficiently large. If it is small, as in the case of a subway,
there are
<a href="http://erikbern.com/2016/07/09/waiting-time-math.html">different factors</a>
that govern the time you spend waiting.</li>
<li>This analysis focuses on buses with an expected arrival time, not
with a scheduled arrival time. When buses have schedules, they will
usually wait at the stop if they arrive early. This situation would
require a different analysis than what was performed here.</li>
</ul>
<h3>Code/Data:</h3>
<ul>
<li><a href="/res/bus/2016-07-28-data.csv">Data</a></li>
<li><a href="/res/bus/getyalebus.py">Data collection script</a></li>
<li><a href="/res/bus/analyzebus.py">Data analysis script</a></li>
</ul>
Mon, 01 Aug 2016 00:00:00 -0400
/2016/08/01/when-should-you-leave-for-the-bus.html