Random graphs: The Erdős–Rényi G(n,p) model

Some mathematicians like probability, and some mathematicians like graphs, so it’s only natural that some mathematicians like probabilistic graphs. That is, they like to generate graphs at random, and then ask all sorts of questions about them: What are the features of a random graph? Will it be connected? Will it contain many triangles? Will it be possible to color it with k colors without coloring two adjacent vertices in the same color? Will it bring me tenure? They may also check if the graphs can be used in order to prove other theorems, or if they can represent real-world networks in some way.

The answers to all these very interesting questions (interesting to some of us, anyway) depend very much on what we mean when we say that we “generate a random graph”. In fact, for the phrase “generate a random graph” to be meaningful, you must tell me the distribution’s probability law; that is, you must tell me, for each graph, what is the probability of obtaining it. Only then can we carry out the calculations.

There are many ways of choosing such a law; in this series of posts I will present the random graphs commonly found in mathematics and network theory, and demonstrate some of their properties.

Erdős and Rényi. Left image taken from the Technion. Right image by Paul Halmos. Again, they coordinated their glasses.

Perhaps the simplest model for generating a random graph is called the Erdős–Rényi G(n,p) model. It goes as follows. You start with some number n of disconnected vertices. You then go over all possible edges one by one, and independently add each one with probability 0 \leq p \leq 1 . That’s it; you now have a random graph on n vertices.

This can also be viewed another way. If a graph has n vertices, then it can have up to {n \choose 2} = (n^2 - n)/2 possible edges: One for each pair of vertices. We can then represent a graph by a vector with {n \choose 2} entries which take values in \{0,1\} . Each entry represents a possible edge between two vertices in the graph: An entry with the value 1 indicates that the corresponding edge appears in the graph, while the value 0 indicates that the edge does not appear. In this view, the Erdős–Rényi G(n,p) model can then be seen as a product measure over vectors X \in \{0,1\}^{n \choose 2} . That is, it is just a sequence of {n \choose 2} i.i.d. random variables, where each one is a Bernoulli variable with success probability p .

The above description is both an algorithmic construction – if you wanted to generate a random graph according to the G(n,p) model with a computer, you now know exactly what to do – and an explicit formula for the probability of obtaining a particular graph. If a graph G has m edges, then we need exactly m successes in our biased coin tosses, and {n \choose 2} - m failures. Hence

\text{Pr} [G(n,p) = G ] = p^{m}(1-p)^{{n \choose 2} - m}.

You can readily check that if p = 1/2 , then the probability for obtaining a graph is independent of the number of edges. In fact, in this case, every graph on n vertices has an equal probability of being generated! In this sense the G(n,1/2) model is a way to define – and construct – a “uniformly random” graph.
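Since the construction is completely explicit, it translates directly into a few lines of code. Here is a minimal sketch in Python (the function name erdos_renyi is my own choice, nothing canonical about it); the parameters match the figure below:

```python
import random
from itertools import combinations

def erdos_renyi(n, p, seed=None):
    """Sample a graph from G(n, p): include each of the C(n, 2) possible edges
    independently with probability p. Returns the edge set as pairs (i, j), i < j."""
    rng = random.Random(seed)
    return {(i, j) for i, j in combinations(range(n), 2) if rng.random() < p}

# A small example: 200 vertices, edge probability 0.01.
g = erdos_renyi(200, 0.01, seed=0)
print(len(g), "edges")   # the expected number of edges is 0.01 * C(200, 2) = 199
```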

Caption: a typical example of a G(200, 0.01) graph.

On the number of edges

What is the number of edges S_n in a random G(n,p) graph? Of course, with non-zero probability we can get a graph with either 0 or {n \choose 2} edges, so the question is actually “how many edges does a typical graph have?”. That is, what range of number of edges appears with highest probability?

Calculating the expected number of edges is easy. If (X_i) is the vector of Bernoulli random variables representing the graph, then by linearity of expectation,

\mathbb{E}[S_n] = \mathbb{E}\left[\sum_{i=1}^{n \choose 2}X_i\right] = \sum_{i=1}^{n \choose 2}\mathbb{E}[X_i] = p{n \choose 2}.

This type of calculation always works, whether we are counting how many edges, or how many triangles, or how many truncated dodecahedrons there are in the graph, because the expectation is always linear, no matter what the dependence is between the edges of subgraphs that we are counting.
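As a quick sanity check (a little sketch of my own, not part of the argument), we can average the number of edges over many samples and compare with p {n \choose 2} :

```python
import random
from itertools import combinations

n, p, samples = 50, 0.3, 200
rng = random.Random(1)

total = 0
for _ in range(samples):
    # count the edges of one G(n, p) sample
    total += sum(1 for _ in combinations(range(n), 2) if rng.random() < p)

print("average number of edges:", total / samples)       # empirical mean
print("p * C(n, 2)            :", p * n * (n - 1) / 2)   # 0.3 * 1225 = 367.5
```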

As for the typical number of edges – it is here that we are in luck, because of the model’s edge independence. The number of edges in the graph is actually just a sum of {n \choose 2} independent Bernoulli random variables. This means we can use a standard concentration of measure trick, the same as was used in the post about the Johnson-Lindenstrauss lemma. More specifically, we ask: For a given \delta > 0 , what is the probability that the number of edges in the graph is greater than (1+\delta) times the expected number of edges, or smaller than (1-\delta) times the expected number of edges? In other words (formulae), what is

\text{Pr}[|S_n - \mathbb{E}[S_n]| > \delta \mathbb{E}[S_n]]?

As it turns out, this probability is exponentially bounded; that is, there exists a C > 0 (which depends on p and \delta ) such that

\text{Pr}[|S_n - \mathbb{E}[S_n]| > \delta \mathbb{E}[S_n]] < e^{-C n^2}.

To show this, we use Markov’s inequality, which states that if W is a non-negative random variable, then for every a > 0 ,

\text{Pr}[W \geq a] \leq \frac{\mathbb{E}W}{a}.

Intuitively, all this inequality says is that if the expectation is small, then it’s impossible to have too large a probability of getting too large a value. Markov’s inequality is innocuous enough, but it becomes powerful when we combine it with exponential moments, that is, when we apply it to e^{tW} instead of to W itself. Let’s see how that works. Denote the total possible number of edges by M = {n \choose 2} .

\text{Pr}[S_n \geq (1+\delta) p M]

= \text{Pr}[S_n - (1+\delta) p M \geq 0]

= \text{Pr}[e^{t(S_n - (1+\delta) p M)} \geq 1].

Here t is some number greater than 0, and we will optimize over it soon in order to get the best possible bounds. Now we invoke Markov’s inequality with a = 1 , getting

\leq  \mathbb{E}[e^{t S_n}] e^{-t(1+\delta)pM}

The random variable S_n is just a sum of independent random variables S_n = \sum_{i=1}^{M}X_i , so the expectation of the product is the product of expectations:

\mathbb{E}[e^{t S_n}] = \mathbb{E}\left[\prod_{i=1}^{M}e^{t X_i}\right] = \prod_{i=1}^{M}\mathbb{E}[e^{t X_i}] = \mathbb{E}[e^{t X_1}]^{M}.

This last expectation is easy, since we know the law of X_i exactly: It is 1 with probability p and 0 with probability 1-p :

\mathbb{E}[e^{t X_i}] = p e^t + (1-p)

so our initial probability becomes

\text{Pr}[S_n \geq (1+\delta) p M] \leq  (p e^t + (1-p))^{M} e^{-t(1+\delta)pM}.

All that is left is to minimize this last expression as a function of t , which you can do for yourself, I guess, if you are that kind of person. A similar bound can also be obtained for the other inequality, although that requires a little more effort which we will not show here.
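If you happen to be that kind of person but would rather let a computer do it, here is a small numerical sketch (plain Python, helper name of my own choosing) that grid-searches over t for concrete values of n , p and \delta ; calculus gives the exact optimum, but the grid search is enough to see how small the bound gets:

```python
import math

def chernoff_log_bound(n, p, delta):
    """Minimize log[(p*e^t + 1 - p)^M * e^(-t*(1+delta)*p*M)] over t > 0,
    where M = C(n, 2) is the number of possible edges."""
    M = n * (n - 1) // 2
    best = 0.0  # t = 0 gives the trivial bound 1 (log = 0)
    for i in range(1, 5000):            # crude grid search over t in (0, 5)
        t = i / 1000.0
        log_bound = M * math.log(p * math.exp(t) + 1 - p) - t * (1 + delta) * p * M
        best = min(best, log_bound)
    return best

# For n = 100, p = 1/2 and delta = 0.1 the bound is already tiny (roughly e^{-25}):
print(math.exp(chernoff_log_bound(100, 0.5, 0.1)))
```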

Unfortunately, this trick does not work if we want to count some other subgraph in G(n,p) . For example, the expected number of triangles in such a graph is p^3 {n \choose 3} (why?). But alas, triangles are not independent: If two triangles share an edge, then knowing whether one triangle exists or not affects the probability that the other one exists. Splitting the exponential moment of the triangle count into a product of individual exponential moments, as we did above for edges, then no longer works.

As it turns out, however, the probability of getting a number of triangles very far away from the mean IS exponentially small. Showing this requires going deeper into the lovely theory of large deviations, which we will postpone to a later post.

Threshold phenomena

One interesting thing we can do with random graphs is have p , the probability for having an edge, go to 0 as a function of n . This allows our random graphs to typically be what is known as “sparse graphs”. Sparse graphs are families of graphs whose number of edges is (eventually) smaller than c \cdot {n \choose 2} for every c > 0 . In other words, these are graphs with o(M) edges. We expect many real-life large networks to be sparse; in fact, a dense graph, one with on the order of n^2 edges, is in some way very non-local, since each vertex on average has many (on the order of n ) neighbors. As an example of a sparse graph, consider the Facebook graph, i.e. the graph whose vertices are Facebook users, with an edge between two users if they are “Facebook friends”. While there can potentially be an unbounded number of users in the system, we expect each particular user to know only a bounded number of people, and in any case, certainly not a constant fraction of the entire user database. If there are n users and the average user has (say) about 500 Facebook friends, then we should expect about 250n edges; certainly smaller than n^2 !

Anyway, once we take p to go to 0, we can speak about threshold phenomena. For example, if p is very large, then we can expect the typical G(n,p) graph to be connected. However, if p is too small, then there will hardly be any edges in the graph at all, and it will almost certainly be disconnected. There might then be some decreasing function p(n) , so that if p is slightly larger than p(n) , the typical graph will be connected, while if p is slightly smaller than p(n) , the typical graph will be disconnected. Such a function is called a “threshold”. More formally, a function p(n) is a threshold for some property A if for every w(n) that goes to \infty with n (no matter how slowly), we have

1)~\text{Pr}[G(n,\frac{p(n)}{w(n)}) \text{ satisfies } A] \rightarrow 0

and

2)~\text{Pr}[G(n,p(n)w(n)) \text{ satisfies } A] \rightarrow 1.

Let’s take the property “G has a triangle”, for example. What is the threshold function for this property? We start out the same as we did for edges: Denote the number of triangles by T , written as

T = \sum_{i=1}^{n \choose 3} T_i

where each T_i corresponds to some triplet of vertices in the graph, and is 1 if there is a triangle between them and 0 otherwise. The expected number of triangles is then

\mathbb{E}[T] = p^3 {n \choose 3}

since the probability for a single triangle to appear in the graph is p^3 . Now, {n \choose 3} is asymptotic to n^3/6 , so the expected number of triangles is pretty well approximated by \approx (pn)^3/6 . The number of triangles is always non-negative, and is always an integer. We can then use Markov’s inequality (the non-exponential version this time), to get a bound on the probability for having at least one triangle:

\text{Pr}[T > 0] = \text{Pr}[T \geq 1]

= \text{Pr}[T \geq  \mathbb{E}[T] \cdot \frac{1}{\mathbb{E}[T]} ] \leq \mathbb{E}[T] \approx (pn)^3 / 6.

This means that p(n) = 1/n is a good candidate for a threshold function! Indeed, if w(n) is a function that tends to \infty as n \rightarrow \infty , then the probability of having a triangle in G(n,p(n)/w(n)) is dominated by

(pn)^3 = (\frac{n}{n \cdot w(n)})^3 = \frac{1}{w^3(n)} \rightarrow 0.

Thus the function p(n) = 1/n satisfies item (1) in our two conditions for being a threshold function. Of course, there are many functions that satisfy this condition; for example, we could have definitely taken e^{-n} as well. But it seems intuitive that 1/n is the right choice, since it is the slowest decreasing function that still sends the probability to 0 for any w(n) ; taking something like e^{-n} would just be overkill in this sense.

Of course, to be formal, we still have to show that item (2) is satisfied. One way to do this is to look at the variance of T , defined as \mathbb{E}[T^2] - (\mathbb{E}[T])^2 . We won’t do the calculation here, but here is an overview: First open up the variance so as to get a sum of products of the form T_i T_j . Then consider the types of dependencies involved (i.e two triangles which don’t share an edge are independent, two triangles which share a single edge are dependent, and two triangles which share two edges are actually the same triangle). This will give a variance which is proportional to p^3 {n \choose 3} , plus some higher order terms.
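We also won’t verify the variance formula here, but if you want to convince yourself numerically, here is a small Monte Carlo sketch (helper name is mine) that estimates the mean and variance of the triangle count and compares them with the leading term p^3 {n \choose 3} :

```python
import random
from itertools import combinations
from math import comb

def triangle_count(n, p, rng):
    """Sample G(n, p) and count its triangles."""
    adj = [set() for _ in range(n)]
    for i, j in combinations(range(n), 2):
        if rng.random() < p:
            adj[i].add(j)
            adj[j].add(i)
    # every triangle is seen once from each of its three edges, hence the division by 3
    return sum(len(adj[i] & adj[j]) for i in range(n) for j in adj[i] if i < j) // 3

rng = random.Random(0)
n, samples = 150, 400
p = 1.5 / n                                   # sparse regime, p = c / n
counts = [triangle_count(n, p, rng) for _ in range(samples)]
mean = sum(counts) / samples
var = sum((c - mean) ** 2 for c in counts) / samples
print("mean    :", mean, "vs p^3 * C(n,3) =", p ** 3 * comb(n, 3))
print("variance:", var, "vs leading term  ", p ** 3 * comb(n, 3))
```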

Having obtained this value, we use Chebyshev’s inequality (also spelled Chebychev, Chebysheff, Chebychov, Chebyshov, Tchebychev, Tchebycheff, Tschebyschev, Tschebyschef, Tschebyscheff or Чебышёв). Chebyshev’s inequality states that for any a>0 and any random variable W ,

\text{Pr}[|W - \mathbb{E}[W]| \geq a] \leq \frac{Var[W]}{a^2} .

The keen-eyed among you will notice that this is in fact just Markov’s inequality applied to the random variable (W - \mathbb{E}[W])^2 , and indeed, some people like to call this Markov’s inequality as well. Anyway, applying this to triangle counts, we notice the following. The event “T = 0 ” is contained in the event “T is at least \mathbb{E}[T] away from its expectation”. Indeed, if T is exactly \mathbb{E}[T] away from \mathbb{E}[T] , then it is either 0 or it is 2\mathbb{E}[T] . We can then write

\text{Pr}[T = 0] \leq \text{Pr}[|T - \mathbb{E}[T]| \geq \mathbb{E}[T]] \leq \frac{Var[T]}{\mathbb{E}[T]^2} \approx (pn)^{-3}.

The last inequality is the application of Chebyshev. And now, if w(n) is a function that tends to \infty as n \rightarrow \infty , then the probability of not having a triangle in G(n,p(n)w(n)) goes to 0, so the probability of having a triangle goes to 1!

Nice. That was both cool and easy. Cool, because it demonstrates that as we change p to depend more strongly on n , there is a “phase transition” in the number of triangles, which goes from some finite number to 0 rather abruptly at p = 1/n . Easy, because we only had to use some variance calculations. There are other properties that exhibit a phase transition – for example, whether or not there exists a giant component in the graph that encompasses a constant fraction of all vertices – but these are usually much harder to prove.
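If you want to see this phase transition with your own eyes, here is a small simulation sketch (helper names are mine) that estimates the probability of containing a triangle for p = c/n with several values of c :

```python
import random
from itertools import combinations

def has_triangle(n, p, rng):
    """Sample G(n, p) and report whether it contains at least one triangle."""
    adj = [set() for _ in range(n)]
    for i, j in combinations(range(n), 2):
        if rng.random() < p:
            adj[i].add(j)
            adj[j].add(i)
    # a triangle exists iff the endpoints of some edge have a common neighbour
    return any(adj[i] & adj[j] for i in range(n) for j in adj[i] if i < j)

rng = random.Random(0)
n, trials = 200, 100
for c in (0.1, 0.5, 1.0, 2.0, 5.0):           # p = c / n, straddling the threshold 1/n
    hits = sum(has_triangle(n, c / n, rng) for _ in range(trials))
    print(f"p = {c}/n: fraction of samples containing a triangle = {hits / trials:.2f}")
```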

An application

Here is a nice application of G(n,p) to the real world: party theory (also known as “Ramsey Theory” among non-hip mathematicians). We’ll begin with an example. Suppose that Jack throws a party, but only six other people show up. Then here is a fact about the guests: out of those six people, either there are three people who all know each other from before the party, or there are three people who are all meeting each other for the first time.

The standard proof goes like this. We’ll model the relationship between the guests with a graph. Each guest will be a vertex, and there will be a red edge between two guests if they knew each other from before, and a blue edge if they were strangers. Then the statement above translates to “either there exists a red triangle, or there exists a blue triangle”. Pick some vertex v from the graph; in the following image, we colored that vertex green, and colored all the edges gray, indicating that we don’t know what colors they are yet.

This vertex has 5 neighbors, and the five edges connecting it to them are each colored either red or blue. This means that one of the colors appears at least three times; suppose that it’s red.

Now look at the endpoints of the three red edges emanating from v . If any two of these endpoints are connected by a red edge, then there exists a red triangle, consisting of this edge together with two of the original three red edges.

And if none of the three endpoints are connected by a red edge, then every pair of them is connected by a blue edge, which means there is a blue triangle. Done!

In general, we define the Ramsey number R(s,t) to be the smallest number n such that if you take the complete graph on n vertices and color its edges in whichever way you want, there will always be either a red clique of size s or a blue clique of size t . What we have shown above is that R(3,3) \leq 6 . In fact, it is equal to exactly 6 , since you can find a coloring of the complete graph on 5 vertices that contains neither a red nor a blue triangle.

Finding Ramsey numbers explicitly is very, very hard, so we try to get bounds for them. Here is a way to do so using random graphs.

Theorem: R(k,k) \geq \sqrt{2}^{k} for every k \geq 3 .

Proof: We adjust the G(n,p) model, so that for every possible edge, we color it red with probability 1/2, and blue with probability 1/2. In essence, we generate a random coloring. What is the probability that there exists a monochromatic clique of size k inside this coloring? Well, for any particular choice of k vertices from the graph, the probability that all the edges between them are red is exactly (1/2)^{k \choose 2} . The same goes for blue, so the probability that a particular choice of k vertices forms a monochromatic clique is 2^{1 - {k \choose 2}} . Now let’s go over all the choices of k vertices from the original graph; by the union bound, the sum of these probabilities is at least the probability of getting a monochromatic clique. Thus, since there are {n \choose k} possible choices, we have

\text{Pr}[\text{Monochromatic clique}] \leq {n \choose k}2^{1 - {k \choose 2}}.

Suppose that n is such that this bound on the probability is smaller than 1 . This means that the probability of getting a random coloring without any monochromatic clique is greater than 0, which in turn means that such a coloring has to exist! Thus R(k,k) > n for all such n . Let’s bound the probability then. Using {n \choose k} \leq n^k / k! together with the fact that k! \geq 2^{1 + k/2} for every k \geq 3 , you can readily verify that 2 {n \choose k} < n^{k} 2^{-k/2} . Our probability can then be bounded by

\text{Pr} < n^{k} 2^{-k/2} \cdot 2^{- {k \choose 2}} = n^{k} 2^{-k^2/2} = e^{k \log{n} - \frac{k^2}{2} \log{2}}

For this to be smaller than 1, it is enough that the exponent is at most 0 (the inequality above is strict, so we have a little slack). Thus

\log{n} \leq \frac{k}{2} \log{2}

which indeed means that we can choose any n \leq \sqrt{2}^{k} , proving our theorem.
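Before moving on, here is a short computational sketch (the helper name is my own) that, for a given k , finds the largest n for which the union bound {n \choose k} 2^{1 - {k \choose 2}} stays below 1 , and compares it with \sqrt{2}^k :

```python
from math import comb

def union_bound_n(k):
    """Largest n for which C(n, k) * 2^(1 - C(k, 2)) < 1; for every such n a
    2-coloring of K_n with no monochromatic K_k must exist, so R(k, k) > n."""
    n = k
    while comb(n + 1, k) * 2 ** (1 - comb(k, 2)) < 1:
        n += 1
    return n

for k in range(3, 11):
    print(k, union_bound_n(k), round(2 ** (k / 2), 1))   # compare with sqrt(2)^k
```

The union bound is in fact a bit stronger than the rounded-off \sqrt{2}^k estimate we settled for in the proof, which is exactly what the comparison shows.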

The above proof is a typical use of G(n,p) graphs: When we want to show that there exist graphs with some property, one way of doing so is to generate a random graph, and show that it satisfies that property with non-zero probability. This is part of a larger tool-set called “the probabilistic method”.

A more complicated application of this method is in the field of expander graphs, which, roughly speaking, are sparse graphs that are so well connected that you can get from any one vertex to another in a small number of steps, even if at every vertex you choose a random edge to walk on. But this will have to wait for another post.

Real life

Unfortunately, apart from the dramatic implications for parties (as well as being a delightful discussion topic at the parties themselves), the Erdős–Rényi G(n,p) model doesn’t describe real-life networks very well. For example, many networks exhibit edge dependencies. In a social network, if two people have many mutual friends, it is more likely that they themselves are friends. This phenomenon cannot be captured by G(n,p) , whose edges are by definition independent. Further, many networks exhibit a power-law distribution of vertex degrees: if we write down the degree of each vertex, sort the vertices in decreasing order according to this degree, and plot the degree against the vertex number, we get a decaying power law. The plot should look something like this:

However, G(n,p) does not exhibit this power-law distribution: the degree of each vertex is a Binomial(n-1, p) random variable, so the degrees concentrate around their common mean (n-1)p . This is the degree plot in a typical G(n,p) graph:
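The plot itself is not reproduced here, but you can generate the data yourself; here is a small sketch (names are mine) that tabulates the degree histogram of a G(n,p) sample, and the degrees indeed pile up around (n-1)p instead of forming a heavy tail:

```python
import random
from collections import Counter
from itertools import combinations

n, p = 1000, 0.01
rng = random.Random(0)

degree = [0] * n
for i, j in combinations(range(n), 2):
    if rng.random() < p:
        degree[i] += 1
        degree[j] += 1

# The degrees cluster tightly around the mean (n - 1) * p = 9.99; no heavy tail.
histogram = Counter(degree)
for d in sorted(histogram):
    print(d, "#" * histogram[d])
```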

These inadequacies, among others, bring us to an agonizing conclusion: the G(n,p) model is fundamentally inapplicable for the analysis of real-world networks. But fear not! We will continue studying it for its simplicity and elegance. And if we want to allude to the real world, we must turn elsewhere, to other random graph distributions; in other words, to the next posts in the series…
