### Low-dimensional archers have it easy

Part 1: Archery

Lately I’ve been interested in archery. The target of target archery is to shoot a circle square in the middle. The target is usually placed quite far away, some 70 meters in the case of the Olympic games, so it’s not always very easy. Here is one of my volleys from 18 meters:

If you look closely at the picture, you will see that each ring has a number on it, indicating its score; the closer your arrow hits the center, the more points you get. This particular target has numbers ranging from 5 to 10, with each ring having the same width (i.e for each ring, the radius of the outer circle minus the radius of the inner circle is the same). The image below shows a 2d schematic of where my arrows hit:

In this batch, I got 10+8+8+7+7+7 = 47 points. One of the things you might notice is that while the “average shot” is actually not too far off from the center, only one of my arrows hit the 10 point mark, with the rest being absolute jerks and skirting about the average. This is because in order to hit the exact same spot, it’s generally advisable to fire your arrows from the exact same position, i.e your body posture has to be exactly the same before each shot. But this is usually a hard thing to do: There are a lot of joints and muscles in the human body, and each one offers several degrees of freedom. Can you really duplicate the posture of your elbows, wrists, neck (for aiming), back, hip, chest, shoulders, knees and toes (et. al)? There’s bound to be a bit of variation between one arrow and another, even without considering averse external conditions such as wind and rain. Indeed, in this sense Olympic target archery is quite boring: All you’re trying to do is just stand the same way twenty times in row.

Part 2: Math and arrows

Now, I’m quite certain that people came up with various sophisticated statistical models which try to describe the distribution of the archer’s arrows on the target, but for the purpose of this post I am definitely going to blatantly ignore all of them and just assume that each arrow hits a position $Z \in \mathbb{R}^2$ which distributes as a Gaussian (Normal) random variable $N((0,0), \sigma^2 \cdot \mathrm{Id})$ centered around the 10-point mark. In plain words, the arrows scatter around the 10-point mark in a bell curve, and the width of the curve is determined by the standard deviation $\sigma$. Rookie archers (like me) have a very high $\sigma$, while experienced professionals have $\sigma$ very close to 0.
For simplicity, I’ll assume that the target has radius 1. Since each ring in the target has the same width, the number of points won by a particular shot is then given by a simple function of the magnitude of $Z$:

$f(Z) = \begin{cases} 0 & ~~~1 < ||Z|| \\ 5 & 5/6 < ||Z|| \leq 1 \\ 6 & 4/6 < ||Z|| \leq 5/6 \\ 7 & 3/6 < ||Z|| \leq 4/6 \\ 8 & 2/6 < ||Z|| \leq 3/6 \\ 9 & 1/6 < ||Z|| \leq 2/6 \\ 10 & ~~~~~~~~||Z|| \leq 1/6 \end{cases}$

If $\sigma$ is known, a diligent computer can use this function to easily calculate the probabilities of getting the different scores.

Now, I’m very much a newbie, and most of the time (if I even hit the target at all) I get fives, sixes and sevens. But still, occasionally fortune smiles upon me, and I manage to land an eight, nine, or even the ever-desired ten. And despite the fact that life in general is devoid of any meaning and we are all lost souls drifting hopelessly in space, this still brings some warmth to my heart. So it is at this point that I must say:
Thank you, dear universe, for being three dimensional!
(and not higher). Because otherwise, beginner archers like me would practically never score a 10, and in fact everyone would just score $f(\sigma)$ almost all the time.
I’ll explain what I mean by this. We can imagine a world that is two-dimensional, where cute two-dimensional creatures launch one-dimensional arrows into a one dimensional target. In this case, the position of the arrow is just a univariate Gaussian random variable, and the score function $f$ only depends on $|Z|$.

We can also imagine a world that is three-dimensional, where cute three-dimensional creatures launch one dimensional arrows into a two dimensional target. Incidentally, this is the world we currently live in.

The next worlds we can’t actually really imagine, but we can imagine that we can imagine a general world that is $(d+1)$-dimensional, where cute $(d+1)$-dimensional creatures launch one-dimensional arrows into a $d$-dimensional target. In this case, the position $Z \in \mathbb{R}^d$ of an arrow is a multivariate, zero mean normal random variable $N(\mathbf{0}, \sigma^2/d \cdot \mathrm{Id})$. Naturally, I do not have an image readily available.
(Technical note: The variance of the normal random variable is $\sigma^2 / d$. This is because a standard normal random variable $N(\mathbf{0},\mathrm{Id})$ tends to have norm of about $\sqrt{d}$ (this can be shown by a direct-but-unpleasant calculation which will not be presented here). The normalization $1/d$ is there to make sure that our young archer still hits a target of radius 1).

Now, here’s the thing about the norms of $d$-dimensional Gaussian distributions: They are really, really, really well concentrated around their means.

Theorem (Gaussian concentration): Let $Z \sim N(\mathbf{0}, \sigma^2 / d \cdot \mathrm{Id})$ be a $d$-dimensional Gaussian random variable. Then for any $r > 0$,

$\mathbb{P}[ \left| ~||Z|| - \sigma \right| \geq r \sigma /\sqrt{d}] < 2 \exp(-r^2 / 2).$

Basically, what this theorem tells us is that in high dimensions, there is almost no hope for an archer to hit different rings in the target. She will almost always hit the ring at distance $\sigma$ from the center, and that’s that. As a concrete example, suppose that $\sigma = 0.6$, so that her average arrow gives her just 7 points. To get an 8, she needs her arrows to hit at least $0.1$ closer to the bullseye. In light of the above theorem, this requires choosing $r = 0.1 \sqrt{d} / \sigma = 0.17 \sqrt{d}$. The probability of hitting this far away is smaller than $2 \exp(-0.01 \cdot d)$. That’s exponential in $d$! If the dimension is large enough, this probability is minuscule; our beloved high dimensional rookie archer will need to fire billions and billions of arrows until she succeeds in scoring a value other than 7! Sadly, this would not increase target archery’s reputation as a spectator sport.

Part 3: More proofs, less arrows

One way you can prove the Gaussian concentration theorem is by using moment generating functions, like we did in the post about the Johnson-Lindenstrauss lemma. But math, unlike archery, is not really about doing the exact same thing in perfect succession. Rather, in this post we’ll prove it using “concentration of measure as derived from isoperimetric inequalities”.

Our first task is to make ourselves acquainted with the isoperimetric concentration function, who turns out to be quite an agreeable chap once you get to know him.
Basically, the standard $d$-dimensional Gaussian distribution $N(0, \mathrm{Id})$, which we denote by $\Gamma$, makes $\mathbb{R}^d$ into a mathematician’s favorite kind of space: A metric measure space! (well, favorite for some mathematicians, anyway). It is a metric space, because there is a well defined distance between every two points $x$ and $y$, namely, the norm $||x-y||$. It is a measure space because we can assign to every nice enough set $A \subseteq \mathbb{R}^d$ a volume, namely $\Gamma(A)$. Since $\Gamma$ is a probability distribution, the volume of the entire space is 1: $\Gamma(\mathbb{R}^d) = 1$. Note that this notion of volume, called “Gaussian measure”, differs tremendously from our regular, Euclidean notion of volume: The volume of the entire space is not infinite, multiplying the side-lengths of a box by $2$ doesn’t increase the volume by $2^d$, and in general, Euclidean geometry goes out the window.
Even so, we can still mix up distances and volumes, and ask how they relate to each other. One of the more (in)famous questions in this vein is “Given a volume $v$, what shape of volume $v$ minimizes the surface area?”. For example, in regular good-ol’ Euclidean space, the shape that minimizes the surface area is a sphere. This is intuitive for anyone who played with soap bubbles, but not very easy to prove even after taking numerous baths. These types of problems are called isoperimetric problems, and you can find plenty of literature about them, for all sorts of odd spaces.

We haven’t actually said what surface area means, and it isn’t always all that obvious when dealing with the wildly weird sets that mathematicians sometimes summon from the deep depths of hell. Here I’ll use the following definitions, which work well for pretty much all spaces $X$ with a metric function $\rho(x,y)$ and a probability measure $\mu$. For a set $A \subseteq X$ and a number $r > 0$, the $r$-inflation of $A$ is defined as

$A_r = \{x \in X \mid \rho(x,A) < r \},$

that is, the set of all points which are at distance smaller than $r$ from $A$. You can think that if $r$ is very very tiny, then $A_r$ is just $A$ itself plus a very thin shell around $A$.

The surface area of $A$ can then be defined as

$\mathrm{Surface~area} = \liminf_{r \to 0} \frac{\mu(A_r) - \mu(A)}{r}.$

In other words, you measure the volume of the tiny shell you just added, divide by its thickness, and then send that thickness to $0$.

For our purpose, though, we won’t use the surface area and talk explicitly about the isoperimetric inequality, but rather ask: Suppose we have a set of large size, say, of at least half the total volume of our space. What happens when we inflate it by a small number $r$? Will it only increase marginally in volume, or will it suddenly explode, filling up almost all of the entire space? To this end, we define the concentration function $\alpha(r)$ for the space $(X,\rho,\mu)$:

$\alpha_\mu(r) = \sup \{1 - \mu(A_r) \mid A\subseteq X \mathrm{~s.t~} \mu(A) \geq 1/2 \}.$

Suppose $A$ has volume exactly $1/2$. If inflating $A$ by $r$ doesn’t change its volume too much, then $\mu(A_r)$ will be almost $1/2$, and $1 - \mu(A_r)$ will also be very close to $1/2$ as well. So if there exists even a single set $A$ which doesn’t blow up when inflating, we’ll get a large value in our concentration function. But if, on the other hand, for every $A$ it turns out that $\mu(A_r)$ is enormous and takes up almost all the volume, then $1 - \mu(A_r)$ will be very close to $0$, and the concentration function will be very small.
You can already smell that the concentration function is related to isoperimetric problems I described earlier: If a set has a large surface area, then inflating it by $r$ will cause it to fill up a lot of the space. On the other hand, sets with small surface area might not fill up a lot of space. So finding a set with small surface area is like finding a set for which $1-\mu(A_r)$ is large.
The function $\alpha_\mu$ is known for some metric measure spaces, and amongst them our beloved Euclidean-distance Gaussian-measure space $(\mathbb{R}^d, ||x-y||, \Gamma)$. The following theorem, while appearing as mere harmless pixels on your screen, actually involves considerable effort, mixed in with some tears and toil:

Theorem (Gaussian isoperimetry):

$\alpha_\Gamma(r) \leq e^{-r^2/2}.$

One way to prove this is to roll up your sleeves and find a (unique) set $A \in \mathbb{R}^d$ for which $1 - \Gamma(A_r)$ is maximal. It turns out that this is a half-plane, but showing it is a matter for another post. The thing to take home is this: For even moderately large values of $r$, the expression $e^{-r^2/2}$ is tiny. In this case we say that “$\Gamma$ has normal concentration”, which, given that $\Gamma$ is a normal distribution, isn’t that surprising.

Now that we made friends with the Gaussian isoperimetric concentration function, it’s time to ruthlessly exploit it.

Theorem (Isoperimetric concentration of measure): Let $(X,\rho, \mu)$ be a metric measure space, and let $F: X \to \mathbb{R}$ be a real valued continuous function with Lipschitz constant $||F||_{\mathrm{Lip}}$. Let $m_F$ be a median of $F$, i.e $\mu({F \leq m_F}) \geq 1/2$ and $\mu({F \geq m_F}) \geq 1/2$. Then

$\mu(\{|F - m_F| \geq r \}) \leq 2 \alpha_\mu(r / ||F||_{\mathrm{Lip}}).$

To put it plainly: The isoperimetric concentration function $\alpha_\mu$ puts strong bounds on the concentration of measure of nicely-behaving functions! If $\alpha_\mu$ is small, then continuous functions have a really hard time being far away from their median value; they just can’t help being concentrated around it.
Maybe a few reminders are in order. The Lipschitz constant of a function $F$ is the smallest constant $c$ so that $|f(x) - f(y)| \leq c \cdot \rho(x,y)$ for all $x,y\in X$. In other words, it controls by how much $F$ can change as the inputs moves over the domain. If $Z$ is a random variable, a median $m_Z$ of $Z$ is a number such that $\mu({Z \leq m_Z}) \geq 1/2$ and $\mu({Z \geq m_Z}) \geq 1/2$. In other words, half the time $Z$ is above this value, and half the time it is below.

The keen-eyed among you will have already noticed that in fact, this theorem gives us exactly what we wanted to prove concerning the norms of Gaussians in $\mathbb{R}^d$. We just need to apply it for our specific case: We choose our space to be $X = \mathbb{R}^d$, the distance to be $\rho(x,y) = ||x-y||$, and the measure to be $\mu = \Gamma$. As our Lipschitz function, we choose the norm: $F(x) = ||x||$. This function has a Lipschitz constant of 1, since by the reverse triangle inequality,

$F(x) - F(y) = ||x|| - ||y|| \leq ||x - y||.$

We also need to calculate the median of $F$, i.e the median value of $||Z||$ when $Z$ is a $d$-dimensional standard normal random variable. Now, in this case, by “calculate”, I mean “look up”, since these sorts of calculations often tend to be tedious. Luckily, we live in the golden age of the Internet, and this information is readily accessible if you’re willing to trust the writings of people you’ve never met. Since the real function $x \mapsto x^2$ is strictly increasing on the positive axis, the median value of $||Z||$ will just be the square root of the median value of $||Z||^2$, as squaring does not change the relative order of numbers. The distribution of $||Z||^2$ is the well known $\chi^2$ distribution with $d$ degrees of freedom, which according to Wikipedia has median $d$ (up to some insignificant correction). The median of $F$ is therefore $\sqrt{d}$.

Plugging this into the concentration of measure theorem and using the Gaussian isoperimetric bound, we get that if $Z$ is a standard normal $N(\mathbf{0}, \mathrm{Id})$ in $d$ dimensions,

$\mathbb{P}(\{\left| ~||Z|| - \sqrt{d} \right| \geq r \}) \leq 2 e^{-r^2/2}.$

We now only have to fix the standard deviation: Our original discussion used not a standard Gaussian, but rather one with a variance of $\sigma^2 / d$. This amounts to multiplying both sides of the inequality inside the probability brackets by $\sigma / \sqrt{d}$, giving the desired result.

As a famous professor in my department once said: “Good”. Now we only have to prove the isoperimetric concentration of measure theorem, and we’re done! Luckily, this is not too difficult. Consider the set $A = {x\in X \mid F(x) \leq m_F }$, i.e the set of all points in space on which $F$ is smaller than its median. This is a set of measure at least $1/2$, by the very definition of the median. Now let $y \in X$ be a point which is at most a distance $r$ from $A$, i.e there exists a point $x \in A$ so that $\rho(x,y) \leq r$. Since $F$ is Lipschitz, the value $F(y)$ cannot be too large:

$F(y) \leq F(x) + \rho(x,y) ||F||_{\mathrm{Lip}} \leq m_F + r||F||_{\mathrm{Lip}}.$

This means that for every $y \in A_r$, we have $F(y) \leq m_F + r||F||{\mathrm{Lip}}$. Hence the set of all points for which $F$ obtains a value larger than $m_F + r||F||{\mathrm{Lip}}$ must have a measure smaller than $1 - \mu(A_r)$. But by definition, the concentration function $\alpha_\mu(r)$ is the supremum of $1-\mu(A_r)$ over all sets $A$ of measure at least $1/2$. We thus have

$\mu(\{F > m_F + r ||F||_\mathrm{Lip} \}) \leq \alpha_\mu (r ),$

or, equivalently,

$\mu(\{F > m_F + r \}) \leq \alpha_\mu (r / ||F||_\mathrm{Lip}).$

A similar calculation can be made for $F < m_F - r$ by looking at the function $-F$, and union bounding the two gives us the isoperimetric concentration theorem.