Random graphs: The exponential random graph model

Some mathematicians like probability, and some mathematicians like graphs, so it’s only natural that some mathematicians like probabilistic graphs. That is, they like to generate graphs at random, and then ask all sorts of questions about them: What are the features of a random graph? Will it be connected? Will it contain many triangles? Will it be possible to color it with $k$ colors without coloring two adjacent vertices in the same color? Will it bring me tenure? They may also check if the graphs can be used in order to prove other theorems, or if they can represent real-world networks in some way.

The answers to all these very interesting questions (interesting to some of us, anyway) depend very much on what we mean when we say that we “generate a random graph”. In fact, for the phrase “generate a random graph” to be meaningful, you must tell me the distribution’s probability law; that is, you must tell me, for each graph, what is the probability of obtaining it. Only then can we carry out the calculations.

There are many ways of choosing such a law; in this series of posts I will present the random graphs commonly found in mathematics and network theory, and demonstrate some of their properties.

Last Time we discussed the Erdős–Rényi $G(n,p)$ model for random graphs. It was a simple model, in which whether or not a given edge appears in the graph was independent from all other edges. This independence gave us an easy time both when generating the graph, and when analyzing some of its properties. However, this independence also made it inadequate for representing real-life networks. The model we will discuss now, exponential random graphs, denoted $G_{n}^{f}$, allows for dependence between edges. It is easy to describe, but notoriously difficult to analyze. It goes as follows.

Let $n$ be the number of vertices we want there to be in the graph, and let $f$ be some function on graphs with $n$ vertices. For example, for a given graph $G$, $f(G)$ may count the number of triangles in $G$; it may return the size of the largest clique in $G$; or it may return 1 if $G$ is a tree and 17 otherwise. Borrowing from its physical origins, the function $f$ is often called the Hamiltonian of the model. Using $f$, we define the following probability distribution on graphs with $n$ vertices:

$\text{Pr} [G_{n}^{f} = G ] = e^{f(G)}/Z,$

where $Z$ is a normalizing constant that is just there to make sure that the probabilities sum up to $1$; it may be written as $Z = \sum_{G} e^{f(G)}.$
The important thing to take home is that the probability to get a graph is proportional to $e^{f(G)}$.

Ok, I admit, I haven’t actually told you anything when I wrote this equation, since $f$ was arbitrary, and with arbitrary Hamiltonians we can get arbitrary distributions. That is, for every probability distribution $\pi$ on $n$-vertex graphs, we can always find an $f$ so that $\pi = G_{n}^{f}$; simply take

$f(G) = \log{\pi(G)}.$

This means that in order to get any meaningful analysis, we must restrict ourselves to specific classes Hamiltonians; every different Hamiltonian we choose will probably lead to different behavior (which will probably require different analysis). The decision to have the probability be proportional to $e^f$ and not just $f$ is due to the statistical-mechanics and entropy origin of the formulation, which we won’t discuss here (but it’s also an easy way to keep the probabilities positive).

One very popular class of Hamiltonian are the subgraph-counting functions, which we will use for the rest of this post. These are functions which count how many times a given graph appears inside of $G$. We may count edges, triangles, 27-spoked wheels, cliques, stars; anything really. For example, in the following graph, there are 15 edges, 0 triangles, 12 pentagons and 10 hexagons. However, there are no 27-spoked wheels.

More formally, let $H_{1},\ldots,H_{\ell}$ be fixed finite simple graphs, and denote by $N(H_{i},G)$ the number of copies of $H_{i}$ inside $G$. A subgraph-counting function is then a function of the form

$f(G) = \sum_{i=1}^{\ell} \beta_{i} N(H_{i},G)$

where $\beta_i$ are real constants called weights.

The main idea here is that using subgraph-counting functions, we can more-or-less control the number of subgraphs of different types in $G_{n}^{f}$. Suppose that $\ell = 1$ so that $f$ counts just a single subgraph $H$ with weight $\beta$. If $\beta$ is a large positive number, then each time $H$ appears as a subgraph in $G$, $f$ gets a strong boost. Graphs with lots of copies of $H$ then get a very large values, and will appear with higher probability than graphs without many copies of $H$. On the other hand, if $\beta$ is a large negative number, then $f$ is heavily penalized each time $H$ appears as a subgraph in $G$. Graphs with lots of copies of $H$ then get a very negative weight, and will appear with tiny probability compared to other graphs. Of course, in the general case, $f$ may count multiple subgraphs and give each one a different weight, complicating the matter very much: how will things behave if we give a large bonus to graphs with lots of 4-cliques, but at the same time penalize graphs with lots of triangles?

The relation to G(n,p)

The subgraph-counting exponential random graph model $G_{n}^{f}$ is agonizingly more complicated than the $G(n,p)$ model we discussed last time. In fact, it strictly contains it! Here is a particular case of a subgraph-counting function which reproduces the $G(n,p)$ distribution:

$f(G) = \log\dfrac{p}{1-p} \cdot |E(G)|,$

where $|E(G)|$ is the number of edges in $G$. This is indeed a subgraph-counting function: we have $|E(G)| = N(H,G)$ where $H$ is the edge graph, and $\beta = \log\frac{p}{1-p}$. Under the exponential random graph model, the probability to obtain a graph $G$ with Hamiltonian $f$ is then just

$\text{Pr} [G_{n}^{f} = G ] = e^{f(G)}/Z$

$= e^{\log{p} \cdot |E(G)| -\log{(1-p)}\cdot |E(G)| }/Z$

$=p^{|E(G)|}(1-p)^{-|E(G)|}/Z.$

The denominator $Z$ should be such that the sum over all graphs $\sum_{G} \text{Pr} [G_{n}^{f} = G ]$ equals 1. We will “guess” that

$Z = (1-p)^{-{n \choose 2}},$

giving

$\text{Pr} [G_{n}^{f} = G ] = p^{|E(G)|}(1-p)^{{n \choose 2}-|E(G)|}.$

This is exactly the probability of obtaining a graph in the $G(n,p)$ model! So the exponential random graph model is indeed stronger; we only need to count one tiny graph (an edge) to get $G(n,p)$.

On the other hand, counting pretty much anything else already sets up a dependence between the edges. For example, suppose that you generate a graph just by penalizing the number of triangles in it. Look at some three vertices $v_1$, $v_2$, $v_3$. Conditioned on the event that the edges $v_1 \leftrightarrow v_2$ and $v_2 \leftrightarrow v_3$ exist in the graph, the probability that the graph contains the edge $v_1 \leftrightarrow v_3$ as well is lower than the probability that it doesn’t contain it, because its appearance would add another triangle and therefore give $f$ a smaller value. So the edge $v_1 \leftrightarrow v_3$ depends on the the two other edges $v_1 \leftrightarrow v_2$ and $v_2 \leftrightarrow v_3$.

How to generate

Talking abstractly about probabilities is nice and all, but a good question to ask is how to actually generate an exponential random graph yourself on your computer. There are some good reasons for wanting to do so: For example, if you generate an exponential random graph by counting triangles, you can estimate the expected number of squares by simply generating many exponential graphs and counting the number of squares they contain. At the very least, this might give you some intuition as to the structure of these graphs. We’ll see some more uses up ahead.

This problem is known as “sampling”, and at first glance, doesn’t seem like too big of a deal: for every $n$ there is only a finite number of graphs on $n$ vertices, and our problem is to sample from a finite set with a given probability distribution. This can always be done as follows. First, generate a uniform random number $\theta$ between 0 and 1. Once we have that number, we go over the graphs one after the other in some order, and for each one calculate the probability of obtaining it. When the sum of probabilities of all the graphs we have seen so far exceeds $\theta$, we output the last graph we saw, and we are done.

This approach can work well when $n$ is, say, 5. But the number of graphs on $n$ vertices is $2 ^ {n \choose 2}$, which for all practical purposes can be treated as $\infty$ for all $n > 15$; there are simply too many graphs to go over ($\sim 2^{100}$). Further, we very casually glossed over another problem in the above paragraph: We don’t really know the probability $\text{Pr} [G_{n}^{f} = G ] = e^{f(G)}/Z,$ of obtaining a graph under this model. Sure, we can calculate the numerator $e^{f(G)}$, but the normalizing constant $Z$ again involves going over all graphs! It seems that in this naive approach, there is no avoiding iterating over all graphs.

What people usually do instead is to use a variety of algorithms falling under the name “Markov Chain Monte Carlo Method” (MCMCM). This is a neat class of techniques which use random processes in order to approximate probability distributions.

The general technique is to start with some graph on $n$ vertices, and then proceed in small, random steps. For example, at each step you might add or remove an edge from the graph according to some probability rule. The changes that are made to the graph, and the probability rule that governs them, are determined only by the graph you have in the current step, and not on the graph you started with or on anything else in the history of the computation. This is called a “Markov Chain”. The trick is to choose the changes and the probability of making them carefully, so that after many steps, you end up with a random graph whose distribution is very close to the distribution that you want to sample from. If each step takes only a small amount of time to carry out, and if it doesn’t take too many steps to get close to the original distribution, then this type of algorithm gives an efficient way to sample an exponential random graph!

Let’s see one specific algorithm from this class, called “Glauber dynamics”:

1. Start with any graph $G$; for example, the empty graph.
2. Pick two vertices uniformly at random from $G$. These vertices might or might not be connected by an edge, which we denote $e$.
3. Denote by $G^{+}$ the graph that is identical to $G$, except that it also contains $e$, and by $G^{-}$ the graph that is identical to $G$, except that it does not contain the edge $e$.
4. Add the edge $e$ to $G$ with probability proportional to $e^{f(G^{+})}$ (if $G$ already contains $e$ don’t do anything), and remove the edge $e$ from $G$ with probability proportional to $e^{f(G^{-})}$ (if $G$ doesn’t contain $e$ don’t do anything).
5. Run steps (2)-(4) for $T$ time steps, then stop ($T$ should be chosen beforehand so that the resulting distribution is close to $G_{n}^{f}$; we’ll talk about it later on).

Before we discuss the amount of time it takes to run this algorithm, we should verify that the Glauber dynamics indeed converge to the exponential random graph distribution. That is, we need to show that if we run this process for enough steps, then the graph $G$ will distribute approximately like $G_{n}^{f}$. To do this, let’s first calculate the probabilities of adding / removing an edge.

Denote the probability of adding an edge by $P^{+}$, and the probability of removing an edge by $P^{-}$. First, we have that

$P^{+} + P^{-} = 1,$

since probabilities always sum up to 1. Second, we have that

$\dfrac{P^{+}}{P^{-}} = \dfrac{e^{f(G^{+})}}{e^{f(G^{-})}} = e^{f(G^{+}) - f(G^{-})},$

since the probabilities are proportional to $e^{f}$. Solving these two equations, we get

$P^{-} = \dfrac{1}{1 + e^{f(G^{+}) - f(G^{-})}}$

and

$P^{+} = \dfrac{e^{f(G^{+}) - f(G^{-})}}{1 + e^{f(G^{+}) - f(G^{-})}}.$

Already we are in a better condition than before: the probability to makes a move does not depend on the normalizing constant $Z$. All we have to do at each step is to calculate $f(G^{+})$ and $f(G^{-})$. It is possible to do this in time that depends only polynomially on $n$: if $f$ counts a graph $H$ on $m$ vertices, all we have to do is go over all ordered subsets of vertices of $G^{\pm}$ of size $m$, and see whether or not they form the desired graph $H$. This requires only about $n^m$ computations.

Now that we have calculated the probabilities, we can check that we indeed converge to the exponential random graph distribution. For this we will use a deus-ex-machina well known result about Markov chains. Observe that:

1. At each step, we have a finite probability of not changing the graph at all.
2. If we start with some graph $G$, it is possible (by a miraculous stroke of luck) to obtain any other graph $G'$ after $n^2$ steps: it could be that by chance our algorithm picked all the edges in which $G$ and $G'$ differed and changed them.
3. There are only a finite number of graphs on $n$ vertices.

These three properties imply that the Markov chain is “finite, irreducible and aperiodic”, and a theorem in Markov chain theory (which we won’t prove) states that such chains always converge to a single distribution called the stationary distribution.

Aha! A stationary distribution is a distribution $\pi$ over all possible graphs, so that if we randomly pick $G \sim \pi$ as our initial graph in the algorithm and we take a single step, then the new $G$ is still distributed as $G \sim \pi$ ! In other words, $\pi$ is preserved when applying the steps of the Markov chain. By the above theorem, the Glauber dynamics algorithm has a unique stationary distribution to which it converges. If we can show that the exponential random graph distribution is stationary, we are done!

So, suppose that we start with a graph that distributes according to $G_{n}^{f}$. After one step of Glauber dynamics, what is the probability of obtaining some graph $H$? Since Glauber dynamics can change no more than one edge at a time, we need to go over all graphs $G$ which differ from $H$ by one edge, and calculate the transition probability. We also need to calculate the probability of starting at $H$ and staying there. We do this by going over the edges.

For each edge $e \in H$, there are two ways to get the graph $H$: either we start with the graph without $e$, which we denote $H^{-e}$, and add it; this happens with probability

$\dfrac{e^{f\left(H^{-e}\right)}}{Z} \cdot\dfrac{1}{{n \choose 2}} \cdot \dfrac{e^{f\left(H\right)-f\left(H^{-e}\right)}}{1+e^{f\left(H\right)-f\left(H^{-e}\right)}};$

or, we start with $H$, pick $e$, and choose to keep it; this happens with probability

$\dfrac{e^{f\left(H\right)}}{Z} \cdot \dfrac{1}{{n \choose 2}} \cdot \dfrac{e^{f\left(H\right)-f\left(H^{-e}\right)}}{1+e^{f\left(H\right)-f\left(H^{-e}\right)}}.$

Likewise, for each edge $e \notin H$, there are again two ways to get the graph $H$: either we start with the graph that has $e$, which we denote $H^{+e}$, and remove it; this happens with probability

$\dfrac{e^{f\left(H^{+e}\right)}}{Z} \cdot \dfrac{1}{{n \choose 2}} \cdot \dfrac{1}{1+e^{f\left(H^{+e}\right)-f\left(H\right)}};$

or we start with $H$, pick $e$, and choose not to keep it; this happens with probability

$\dfrac{e^{f\left(H\right)}}{Z} \cdot \dfrac{1}{{n \choose 2}} \cdot \dfrac{1}{1+e^{f\left(H^{+e}\right)-f\left(H\right)}}.$

The probability of getting $H$ after one step is then (ok, you can skip this equation block if it makes you feel queasy)

$=\sum_{e\in H}\left(\dfrac{e^{f\left(H^{-e}\right)}}{Z}\dfrac{1}{{n \choose 2}}\dfrac{e^{f\left(H\right)-f\left(H^{-e}\right)}}{1+e^{f\left(H\right)-f\left(H^{-e}\right)}}+\dfrac{e^{f\left(H\right)}}{Z}\dfrac{1}{{n \choose 2}}\dfrac{e^{f\left(H\right)-f\left(H^{-e}\right)}}{1+e^{f\left(H\right)-f\left(H^{-e}\right)}}\right)+$

$\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,+\sum_{e\notin H}\left(\dfrac{e^{f\left(H^{+e}\right)}}{Z}\dfrac{1}{{n \choose 2}}\dfrac{1}{1+e^{f\left(H^{+e}\right)-f\left(H\right)}}+\dfrac{e^{f\left(H\right)}}{Z}\dfrac{1}{{n \choose 2}}\dfrac{1}{1+e^{f\left(H^{+e}\right)-f\left(H\right)}}\right)$

$=\sum_{e\in H}\left(\dfrac{e^{f\left(H^{-e}\right)}}{Z}\dfrac{1}{{n \choose 2}}\dfrac{e^{f\left(H\right)-f\left(H^{-e}\right)}}{1+e^{f\left(H\right)-f\left(H^{-e}\right)}}+\dfrac{e^{f\left(H\right)}}{Z}\dfrac{1}{{n \choose 2}}\left(1-\dfrac{1}{1+e^{f\left(H\right)-f\left(H^{-e}\right)}}\right)\right)+$

$\,\,\,\,\,\,\,\,\,\,\,\,+\sum_{e\notin H}\left(\dfrac{e^{f\left(H^{+e}\right)}}{Z}\dfrac{1}{{n \choose 2}}\dfrac{1}{1+e^{f\left(H^{+e}\right)-f\left(H\right)}}+\dfrac{e^{f\left(H\right)}}{Z}\dfrac{1}{{n \choose 2}}\left(1-\dfrac{e^{f\left(H^{+e}\right)-f\left(H\right)}}{1+e^{f\left(H^{+e}\right)-f\left(H\right)}}\right)\right)$

$= \dfrac{e^{f(H)}}{Z} + \sum_{e\in H} 0 + \sum_{e \notin H} 0$

$= \dfrac{e^{f(H)}}{Z}.$

It works!

Alright! The calculations are all in order. The nice thing about Glauber dynamics is that they can easily be simulated on a computer: we can start with any initial $G$, choosing two vertices at random is easy, and calculating the value $e^{f(G)}$ isn’t that hard either, as we saw above. Thus each step takes a polynomial time in the number of vertices $n$, which falls under the category of “each step takes only a small amount of time to carry out”. The only problem left is to figure out what $T$ to take in step (5), that is, to know when to stop the algorithm.

Alas, this is a tricky thorn, for two reasons: first, precise analytical results on the rate of convergence for the Markov chain are scarce, and we often don’t really know when we should the stop the algorithm. Second (and worse), some of the results we do possess are quite depressing! For example, it was shown in Bhamidi, Bresler and Sly that if we count subgraphs with only positive weight, then under certain conditions, one of two things happen.

• The Markov chain converges really quickly to the stationary distribution; hurray! However, the stationary distribution in this case is actually pretty much just $G(n,p)$! It turns out that sometimes exponential random graphs behave basically just like $G(n,p)$, even if you count more than just edges. So all the above machinery turns out to be quite an overkill.
• The Markov chain converges really slowly to the stationary distribution (which seems to behave like a mixture of $G(n,p)$‘s). By “really slowly”, I mean a time that is exponential in the number of vertices. Wait, isn’t that what we wanted to avoid in the first place?

Luckily for us, this grim result doesn’t say anything about what happens when any one of the weights $\beta$ is negative. Maybe you, dear reader, are up to the challenge?

Real life and parameter estimation

So, do exponential random graphs have anything to do with real life apart from giving mathematicians a hard time? Many people, especially social scientists, think they do, and they try to “fit” exponential random graph parameters to observed networks. That is, given a real-life network, they try to guess what subgraphs $H_{i}$ are counted and with that weight $\beta_{i}$. Once they have a good estimate for those parameters, they can generate more random graphs and try to learn something interesting about them, perhaps gaining knowledge about the original graph or about the social process that formed it by the way.

One natural way to fit an exponential random graph model to an existing network is as follows. Suppose you already have an observed graph $G$. You notice that it contains very little triangles, but lots of squares; this makes you guess that there is some sort of mechanism that penalizes triangles but rewards squares. A candidate Hamiltonian function $f$ would then be one that counts triangles with a negative weight $\beta_{T}$ and counts squares with a positive weight $\beta_{S}$. But what should the values of $\beta_{T}$ and $\beta_{S}$ be? Well, for every choice of weights, you can calculate the expected number of triangles and the expected number of squares. A likely choice for $\beta_{T}$ and $\beta_{S}$ is to pick them so that the expected number of triangles and squares in $G_{n}^{f}$ is equal to the number of triangles and squares you have in $G$. In other words, if we denote $T(G)$ to be the number of triangles in $G$ and $S(G)$ to be the number of squares, you need to solve the two equations:

$T(G) = \sum_{H} T(H) \cdot \dfrac{e^{\beta_{T}T(H) + \beta_{S}S(H) }}{Z}$

$S(G) = \sum_{H} S(H) \cdot \dfrac{e^{\beta_{T}T(H) + \beta_{S}S(H) }}{Z}$

These are two non-linear equations in two variables ($\beta_{T}$ and $\beta_{S}$), and you can hope that they have a solution of the type you are looking for. In any case, solving them won’t be easy: first, you again need to know the normalizing factor $Z$, and again there is a summation over all graphs $H$ on $n$ vertices; not a pretty sight. Second, how do you even solve such non-linear equations? How do you get guarantees on the solution?

The ability to sample graphs according to $G_{n}^{f}$ offers at least some condolences: we can enumerate over values of $\beta_{T}$ and $\beta_{S}$ which we deem worthy, generate many random graphs, and calculate the empirical average of the number of triangles and squares for those particular $\beta$ values. By the law of large numbers, these should converge to the true values, if we take enough samples (but then again, how many samples is “enough samples”?)

Actually, with regards to modeling the real world, we’ve only scratched the surface. Real networks sometimes have asymmetrical relations (i.e directed edges), they have different types of nodes which correspond to different types of real-life entities, and are in general hard to deal with. For an offbeat worked-out example of this sort of analysis on a modern social network, see this blog post.

A new hope

Luckily, there is still ongoing mathematical research in the field. My advisor and I have recently uploaded a paper to the arXiv which shows that exponential random graphs can actually be seen as a mixture of graphs with independent edges. We don’t know how those mixtures behave yet, but we do have something to say about how those independent edges look like. Generating and modeling graphs with independent edges is certainly easier than dealing with the straightforward exponential definition. And maybe if we put our minds to it, we’ll find out that exponential random graphs aren’t that unmanageable after all.