# ETOOBUSY ðŸš€ minimal blogging for the impatient

# Some Bayes helps - more

**TL;DR**

Where you are patient enough to continue on the Bayes rabbit hole.

In last post Some Bayes helps we took a look at how we can use a campaign of measurements to get a table that helps us understanding how to infer information from more future measurements.

# (re-)Starting point

We restart from the estimated conditional probabilities table:

Â | $X_1$ | $X_2$ | â€¦ | $X_M$ |
---|---|---|---|---|

C | $P_{X_1|C} = \frac{\Gamma_{X_1} }{\sum_X \Gamma_X}$ | $P_{X_2|C} = \frac{\Gamma_{X_2} }{\sum_X \Gamma_X}$ | â€¦ | $P_{X_M|C} = \frac{\Gamma_{X_M} }{\sum_X \Gamma_X}$ |

D | $P_{X_1|D} = \frac{\Delta_{X_1} }{\sum_X \Delta_X}$ | $P_{X_2|D} = \frac{\Delta_{X_2} }{\sum_X \Delta_X}$ | â€¦ | $P_{X_M|D} = \frac{\Delta_{X_M} }{\sum_X \Delta_X}$ |

Remember that we also have some a-priori estimation of *absolute*
probability of a *contact* $P_C$, and that the maximum likelihood
principle led us to establish the following inference algorithm:

- if $P_C \cdot P_{X|C} \geq (1 - P_C) \cdot P_{X|D}$ we infer $C$ from $X$, otherwise
- if $P_C \cdot P_{X|C} < (1 - P_C) \cdot P_{X|D}$ we infer $D$ from $X$.

This inference algorithm partitions the set of all measured classes $X_1 .. X_M$ into two sets:

# Can we be wrong?

You bet we can. The fact itself that a column in our table can have two non-zero values means that we will be wrong in some cases - the whole point of the maximum likelihood approach is to try and choose the option that makes us fail less often.

There are two kind of errors that we can make:

- saying that a contact happened, when it didnâ€™t - this is called a
*false positive*; - saying that a contact did not happen, when it did - this is called a
*false negative*.

Each of the possible classes from our measurement will contribute to at most one of these errors (it might contribute to neither if the column has one $1$ and one $0$, but this is the real world).

In particular:

- elements in $\mathscr{C}$ all lead to inferring a contact, and can
thus contribute to
*false positives*only; - elements in $\mathscr{D}$ can only contribute to
*false negatives*.

## False positives

Letâ€™s start with *false positives*. As we saw, we chose to infer $C$
from event $X$ because this applies:

which is equal to say:

This basically means that, when $X$ happens, we will be right in inferring $C$ with probability $P_{C|X}$, and fail otherwise.

For this reason, the term on the right is exactly our false positive probability, conditioned to event $X$:

Itâ€™s easy at this point to calculate the overall probability of a false positive, adding up all terms that can lead to this kind of error weighting them with the probability of each event $X$:

The summatory takes all items from the table corresponding to the row
for $D$ and to those columns that provide an inference to $C$; this sum
is weighted with the absolute probability of *distancing*, which makes
sense because a false positive means being distanced (which happens with
probability $(1 - P_C)$) but having ended up with classes that we
consider for a *contact*.

## False Negatives

Calculating the probability of *false negatives* is the dual of what we
considered in the previous section, so we end up with:

# Not all wrongs are created equal

As we saw, for each measured event $X$ with non-zero probability of being associated to $C$ and $D$ (i.e. $X$â€™s column in the table has both entries greater than zero) we end up with some probability of doing errors (respectively false positives and false negatives).

Do we think they are the same? Maybe not.

Just as you would probably pick your umbrella if the wheather forecasts tell you thereâ€™s a 40% probability of rain, at the risk of carrying it unnecessarily most of the times, we might want to be conservative and prefer false positives over false negatives.

For this reason, we might set a higher limit to the probability of false
negatives, and move all classes that would exceed this limit from
$\mathscr{D}$ to $\mathscr{C}$. This would be against the maximum
likelihood principle and would mean being *more wrong* in average, but
at least it would be our favourite flavor of being wrong.

Note that itâ€™s still meaningful to allow for some false negatives, though: in most cases, bringing this number down to zero would mean that every Bluetooth exchange is interpreted as a contact, which might not be economically feasible (e.g. for lack of sufficient testing capabilities).

# Fancier uses?

The mathematical model we ended up with can also be used for a *hybrid*
solution. Many things in life are not black and white, so why should
this be different?

At the basic level, we can consider that measuring event $X$ gives us two probabilities, i.e. one for inferring a contact ($C$) and one for inferring correct distancing ($D$). We already saw these two probabilities, expressed in terms of quantities we have or have assumed:

So, if our health testing capabilities are of $K$ tests per day, we might do like this:

- record all occurrences of every event $X$, without removing any;
- sort them by inverse $P_{C|X}$, i.e. from greater probability of a contact to lower ones;
- each day, remove the top $K$ and apply the health test to them.

This would allow prioritizing the most probable contacts, remain within
the limits of the SSN checking capabilities, and decide where to draw
the line of *excessive testing* when there are more available data.