Introduction: the law of large numbers and its role in insurance

The law of large numbers is a classical result of probability theory that formalizes a simple intuitive idea: the more times we repeat a probabilistic experiment, the closer the average of the outcomes gets to the expected value.

If we toss a coin twice, it is quite probable that we get two heads or two tails. However, after ten thousand tosses we expect roughly half of them to be heads and the other half tails (setting aside the somewhat pedantic possibility of a toss landing on neither heads nor tails).

This idea is the foundation of insurance. For a single individual the variance of the loss distribution can be rather large, but as the number of individuals increases, the variance of the sample average decreases, because large losses "cancel out" with small ones.
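
To make this concrete, here is a minimal simulation sketch in Python (the claim probability, claim-size distribution, and pool sizes are illustrative assumptions, not real data). It estimates how the spread of the average loss per insured shrinks as the pool grows:

```python
import numpy as np

rng = np.random.default_rng(0)

def average_loss_per_insured(n_insureds, n_trials=1_000, p_claim=0.05, mean_claim=10_000.0):
    """Simulate the average loss per insured for a pool of n_insureds, n_trials times.

    Each insured files a claim with probability p_claim; claim sizes are
    exponential with the given mean. All parameters are illustrative.
    """
    has_claim = rng.binomial(1, p_claim, size=(n_trials, n_insureds))
    severity = rng.exponential(mean_claim, size=(n_trials, n_insureds))
    return (has_claim * severity).mean(axis=1)  # one average-loss value per trial

for n in (1, 10, 100, 1_000, 10_000):
    avg = average_loss_per_insured(n)
    # The mean stays near p_claim * mean_claim = 500, while the spread of the
    # average shrinks roughly like 1 / sqrt(n).
    print(f"pool size {n:>6}: mean of average loss ~ {avg.mean():7.1f}, "
          f"std of average loss ~ {avg.std():7.1f}")
```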

The theorem and its versions

There are two main versions of the law of large numbers: the weak and the strong LLN.

The weak version states the following: Let $X, X_1,X_2,\dots$ be i.i.d. random variables with finite expectation $\mathbb{E}X = \mu$. Then, for every $\varepsilon>0$, $$ \lim_{n\to\infty}P\left( \left|\frac{1}{n}\sum_{i=1}^nX_i -\mu\right|>\varepsilon\right) = 0. $$

The strong version states the following: Let $X, X_1,X_2,\dots$ be i.i.d. random variables with finite expectation $\mathbb{E}X = \mu$. Then $$ P\left(\lim_{n\to\infty}\frac{1}{n}\sum_{i=1}^nX_i =\mu\right) = 1. $$

This is the difference between two notions of convergence of random variables: convergence in probability and almost sure convergence. In this setting, convergence in probability means that, for any fixed $\varepsilon>0$, the probability that the sample average deviates from the expectation by more than $\varepsilon$ tends to $0$ as $n$ tends to infinity. Almost sure convergence means that the set of outcomes for which the sample average does not converge to the expectation has measure (probability) $0$. Almost sure convergence is stronger than convergence in probability, hence the respective names of the strong and weak LLNs.

For insurance purposes, the weak version could be enough, at least qualitatively. Let's prove it for the case when $\operatorname{Var}(X)<\infty$. First, we need some results.

Chebyshev's inequality

Suppose that $X\geq 0$, $\mathbb{E}X<\infty$ and $a>0$. Since $X\geq X\mathbb{1}_{X\geq a}\geq a\mathbb{1}_{X\geq a}$ by definition of the indicator function, it also holds that $$ \mathbb{E}X\geq\mathbb{E}(X\mathbb{1}_{X\geq a})\geq\mathbb{E}(a\mathbb{1}_{X\geq a})=aP(X\geq a). $$ That is, $$ P(X\geq a)\leq \frac{\mathbb{E}X}{a}. $$ This is known as Markov's inequality. If $X$ is not necessarily positive but $\mathbb{E}X^2<\infty$, we can apply Markov's inequality to $X^2$: $$ P(X\geq a) \leq P(X^2\geq a^2)\leq \frac{\mathbb{E}X^2}{a^2}. $$ Notice that for a random variable with expectation $0$, this is the same as $$ P(X\geq a) \leq \frac{\operatorname{Var}X}{a^2}, $$ which is the form of Chebyshev's inequality we will use.
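
As a quick numerical sanity check, the following sketch compares both tail probabilities with the corresponding bounds on a simulated sample (the exponential distribution and the threshold $a$ are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative check on an exponential sample with mean 1 (so E[X] = Var(X) = 1).
x = rng.exponential(1.0, size=1_000_000)
a = 3.0

markov_bound = x.mean() / a                          # E[X] / a, valid since X >= 0
chebyshev_bound = x.var() / a**2                     # Var(X) / a^2
tail_prob = (x >= a).mean()                          # estimate of P(X >= a)
deviation_prob = (np.abs(x - x.mean()) >= a).mean()  # estimate of P(|X - E[X]| >= a)

print(f"P(X >= a)          ~ {tail_prob:.4f} <= Markov bound    {markov_bound:.4f}")
print(f"P(|X - E[X]| >= a) ~ {deviation_prob:.4f} <= Chebyshev bound {chebyshev_bound:.4f}")
```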

Variance of a sum of independent random variables

Suppose now that $X_1,X_2,\dots, X_n$, where $n\in\mathbb{N}$, are independent random variables. Then the variance of their sum is the sum of the variances. To see this, first note that the variance of a random variable $X$ is the same as that of the random variable $X-a$, where $a\in\mathbb{R}$:

\[ \begin{aligned} \operatorname{Var}(X-a) &= \mathbb{E}[(X-a)^2] - \bigl(\mathbb{E}[X-a]\bigr)^2 \\ &= \mathbb{E}[X^2] - 2a\mathbb{E}[X] + a^2 - \bigl((\mathbb{E}[X])^2 - 2a\mathbb{E}[X] + a^2\bigr) \\ &= \mathbb{E}[X^2] - (\mathbb{E}[X])^2\\ &= \operatorname{Var}(X). \end{aligned} \]

Therefore, we can assume w.l.o.g. that $\mathbb{E}X_i =0\, \forall i=1,2,\dots,n$. We have:

$$ \begin{aligned} \operatorname{Var}\Bigl(\sum_{i=1}^n X_i\Bigr) &=\mathbb{E}\Bigl[\Bigl(\sum_{i=1}^n X_i\Bigr)^2\Bigr] \\ &=\mathbb{E}\Bigl[\sum_{i=1}^n X_i^2+2\sum_{1\leq i < j\leq n}X_iX_j\Bigr] \\ &=\sum_{i=1}^n \mathbb{E}[X_i^2]+2\sum_{1\leq i < j\leq n}\mathbb{E}[X_iX_j] \\ &=\sum_{i=1}^n \mathbb{E}[X_i^2] \quad \text{(since, by independence, \(\mathbb{E}[X_iX_j]=\mathbb{E}[X_i]\,\mathbb{E}[X_j]=0\) for \(i\neq j\))} \\ &=\sum_{i=1}^n \operatorname{Var}(X_i). \end{aligned} $$

Now, notice that for i.i.d. random variables $\operatorname{Var}(\sum_{i=1}^nX_i)=n\operatorname{Var}(X_1)$, in contrast with adding the same random variable $n$ times, which gives $\operatorname{Var}(nX_1)=n^2\operatorname{Var}(X_1)$. The intuition is that when summing many independent random variables, some take high values and others low values, so their deviations from the mean tend to cancel out, making the variance of the sum grow only linearly. This is at the same time an intuition for the weak law of large numbers, which follows from these facts.
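
The following sketch illustrates this contrast numerically (the standard normal distribution, $n$, and the number of trials are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(2)

n, trials = 100, 100_000
x = rng.normal(0.0, 1.0, size=(trials, n))  # n i.i.d. standard normals per trial

sum_of_iid = x.sum(axis=1)     # X_1 + ... + X_n, independent terms
scaled_single = n * x[:, 0]    # n * X_1, the same variable added n times

print(f"Var(X_1 + ... + X_n) ~ {sum_of_iid.var():8.1f}  (theory: n * Var(X_1)   = {n})")
print(f"Var(n * X_1)         ~ {scaled_single.var():8.1f}  (theory: n^2 * Var(X_1) = {n**2})")
```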

Proof of the weak law of large numbers

Now it's just a matter of applying the previous results to $\left|\frac{1}{n}\sum_{i=1}^nX_i -\mu\right|$:

\[ \begin{aligned} P\left(\left|\frac{1}{n}\sum_{i=1}^nX_i -\mu\right|>\varepsilon\right) &\leq \frac{\operatorname{Var}\bigl(\frac{1}{n}\sum_{i=1}^nX_i\bigr)}{\varepsilon^2}\\ &= \frac{\frac{1}{n^2}\operatorname{Var}\bigl(\sum_{i=1}^nX_i\bigr)}{\varepsilon^2}\\ &= \frac{n\operatorname{Var}(X)}{n^2\varepsilon^2}\\ &= \frac{\operatorname{Var}(X)}{n\varepsilon^2}\to 0 \quad \text{as } n\to\infty. \end{aligned} \]

Q.E.D.
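
As an illustration of the bound just derived, this sketch estimates $P\left(\left|\frac{1}{n}\sum_{i=1}^nX_i-\mu\right|>\varepsilon\right)$ for uniform $(0,1)$ variables and compares it with $\operatorname{Var}(X)/(n\varepsilon^2)$ (the distribution, $\varepsilon$, and sample sizes are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(3)

# Uniform(0, 1) variables: mu = 0.5, Var(X) = 1/12. Epsilon and the sample sizes
# are illustrative; the bound Var(X) / (n * eps^2) is the one from the proof.
mu, var_x, eps, trials = 0.5, 1 / 12, 0.05, 20_000

for n in (10, 100, 1_000):
    sample_means = rng.uniform(0.0, 1.0, size=(trials, n)).mean(axis=1)
    deviation_prob = (np.abs(sample_means - mu) > eps).mean()
    chebyshev_bound = var_x / (n * eps**2)
    print(f"n={n:>5}: P(|mean - mu| > eps) ~ {deviation_prob:.4f}, "
          f"Chebyshev bound = {chebyshev_bound:.4f}")
```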

Premiums in insurance

Typically in insurance, the insurer issues policies (contracts) under which it pays benefits (claim payments) to the insured in exchange for a set of premiums (payments from the insured to the insurer). The expected value of the benefits is often called the pure net premium, or fair value. This is justified by the law of large numbers: for a large number of insureds, the probability that the average benefit deviates from its expected value by more than $\varepsilon$ is very small. This is the source of all formulas for net premiums.
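
As a toy example (with made-up numbers, not a real tariff): if each policyholder files a claim with probability $0.02$ and the expected claim size is $15{,}000$, then the pure net premium is the expected benefit, $0.02 \times 15{,}000 = 300$:

```python
# Toy net-premium calculation with made-up parameters (not a real tariff):
p_claim = 0.02         # assumed annual claim probability per policy
mean_claim = 15_000.0  # assumed expected claim size

net_premium = p_claim * mean_claim  # E[benefit] = pure net premium
print(f"Pure net premium: {net_premium:.2f}")  # 300.00
```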

Gross premiums are then calculated by expense loading: terms are added to the expected losses (the net premium) to account for administrative and operational costs, profit, and risk/uncertainty.
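
A minimal sketch of one possible loading scheme, with assumed (purely illustrative) loading and expense figures:

```python
# One simple (illustrative) loading scheme:
# gross = net * (1 + safety loading) + fixed per-policy expenses.
net_premium = 300.0     # pure net premium from the previous sketch
safety_loading = 0.10   # assumed margin for risk/uncertainty and profit
fixed_expenses = 25.0   # assumed administrative and operational cost per policy

gross_premium = net_premium * (1 + safety_loading) + fixed_expenses
print(f"Gross premium: {gross_premium:.2f}")  # 355.00
```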