Not long ago, I started studying for my undergraduate degree in mathematics and physics at the Technion. This by itself is a good prospect, but in the process I also found myself added to various Facebook groups related to the studies. Looking at a particularly small one (11 members), I noticed that most of the people were added to the group by only two members. This got me thinking that it would be interesting to graph the membership dynamics of different groups: who added whom (and when?). Perhaps a tree can be constructed which quickly shows who was able to “draft” the most members, and who joined only because he was passively added. In any case, I conjectured that in many groups, relatively few enthusiastic and energetic “recruiters” were behind most of the members of a group.
Well, you know me. Scripts that download, parse, and analyze data from various websites have been featured several times already in this blog, and today is no Exception().
In the current format, all non-private Facebook groups allow you to see their users – even if they are considered “closed groups”, for which the user has to request permission in order to join. However, only for open groups, or for groups of which you are already a member, does the “members” page show who was added by whom. For a foreign closed group, you can see the members, but not the recruiting data. Hence, all my data came from a few closed groups of which I am a member, and many more open groups.
As far as I understand, Facebook distinguishes between two situations when presenting membership data on the members page. If a person joined the group over a year ago, the site will display “Joined over a year ago”, regardless of how she was added. However, if she joined more recently, the text depends on how she got in: either she added herself, in which case the text is “Joined X time ago”, or she was added by someone else, in which case the text is “Added by…” or “Invited by…”. This information loss means that I can only compute recruitment trees for young groups; that is, groups which are younger than one year.
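The three cases above can be sketched as a small classifier. This is only an illustration, not the actual script: the function name `classify` is made up, and it assumes each label contains nothing beyond the phrases quoted above (the real page markup may well carry extra text, such as a timestamp after the inviter's name).

```python
import re

# Assumption: a recruited member's label is exactly "Added by <name>" or
# "Invited by <name>". The real page may include more text than this.
ADDED_RE = re.compile(r"^(?:Added|Invited) by (.+)$")
OLD_TEXT = "Joined over a year ago"


def classify(label):
    """Return (kind, inviter) for one membership label.

    kind is "unknown" (joined over a year ago, method not shown),
    "added" (recruited by `inviter`), or "self" (joined on their own).
    """
    label = label.strip()
    if label == OLD_TEXT:
        return ("unknown", None)
    match = ADDED_RE.match(label)
    if match:
        return ("added", match.group(1))
    if label.startswith("Joined"):
        return ("self", None)
    raise ValueError(f"unrecognized label: {label!r}")
```

Any member classified as "unknown" carries no recruiting information, which is why groups older than a year are of little use here.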
So in order to get a workable data set, I had to find recent, open groups, with an order of magnitude of 100 people or fewer. Inspired by my studies, I searched for university related groups: many of these are devoted to single-semester topics, so they tend to be both small (how many people take advanced calculus in one university?) and young (groups are typically created per semester). It turns out that many such open groups exist; I used slightly fewer than 30 of them in this afternoon activity, resulting in 1877 memberships analyzed.
Having acquired the data, I wrote a Python script which parses the HTML and generates a list of “who was added by whom” for the group. From this list it builds a hierarchical tree, which shows the data nicely. An arrow from A to B means that A invited B to the group. For my own 11 person group, you can see the result (names changed to maintain privacy) – Roger and John did most of the inviting:
The script then goes over each person, and assigns a “recruiting number” – how many members that person recruited. If she was just added, without adding anyone else, that number is 0. If she added herself to the group, that counts as one recruit and increases her score by 1. In the end, everything is normalized to be the percentage of the group's members that person recruited.
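A minimal sketch of this bookkeeping, assuming the parsed data is a list of `(member, inviter)` pairs with `inviter` set to `None` for people who joined on their own. The function name and the input format are illustrative choices, not taken from the actual script.

```python
from collections import Counter


def recruiting_numbers(records):
    """Compute each person's share of the group's memberships.

    records: list of (member, inviter) pairs; inviter is None when the
    member joined on their own (hypothetical input format).
    Returns {name: percentage of memberships credited to that name}.
    """
    # Start everyone at 0 so that members who recruited nobody still appear.
    counts = Counter({member: 0 for member, _ in records})
    for member, inviter in records:
        # A self-join is credited to the member herself, otherwise to the inviter.
        credited = member if inviter is None else inviter
        counts[credited] += 1
    total = len(records)
    return {name: 100.0 * n / total for name, n in counts.items()}


# Made-up four-person group: Ann joins on her own and invites Bob and Cid;
# Bob invites Dee.
example = [("Ann", None), ("Bob", "Ann"), ("Cid", "Ann"), ("Dee", "Bob")]
shares = recruiting_numbers(example)
```

Since every membership is credited to exactly one person, the percentages always sum to 100.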
And the results? In the deepest crevices of my heart, I really wished the recruitment distribution to be a power law. That is, if I line up all the people by rank, so that the lower the rank, the more people that person invited, the percentage of the group they recruited would be of the form:
$$p(r) = a \cdot r^{c}$$
where $r$ is the rank, $p(r)$ is the percentage recruited by the person of rank $r$, $a$ is a constant, and c < 0. Putting them all in rank yields the following graph:
(Trailing zeros, of people who did not invite anyone, were removed from graph).
Although it may seem similar to a power law, something seems amiss – the curve is too fat in the beginning, and dies off too quickly. Indeed, if it were a power law, we should get a linearly decreasing plot when both axes are logarithmic; however, this is the result:
Not very linearly decreasing. At best, it can be divided into two sections – one for the gentle slope in the beginning, and one for the steep slope at the end. This leads me to think that there is a distinction between different group recruiter types.
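The straight-line test works because taking logarithms of a power law turns it into a line whose slope is the exponent c. One rough way to put a number on this, assuming the shares are already sorted in decreasing order with the zeros removed, is an ordinary least-squares fit in log-log space. The function name is hypothetical, and such a fit is only a crude diagnostic, not a rigorous test for a power law.

```python
import math


def loglog_fit(shares):
    """Least-squares line through (log rank, log share).

    shares: positive values sorted in decreasing order; ranks start at 1.
    Returns (slope, intercept). For an exact power law the slope equals
    the exponent c and the intercept equals the log of the prefactor.
    """
    xs = [math.log(rank) for rank in range(1, len(shares) + 1)]
    ys = [math.log(s) for s in shares]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic check on an exact power law with exponent -1.5 (not real data).
synthetic = [10.0 * rank ** -1.5 for rank in range(1, 51)]
slope, intercept = loglog_fit(synthetic)
```

Fitting the first and last parts of the ranked data separately would be one way to quantify the two-slope impression described above.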
Regardless of the precise mathematical formula that governs the curve, we can definitely say that the majority of the contributions to group sizes are caused by a small minority of group members. In fact, 50% of people who joined groups – about 940 people – did so by the courage and effort of less than 1.5% of the entire group community – about 23 people. That’s not surprising, given that the top two “inviters” in this data set brought 85 and 73 people into their groups. The common everyday inequality usually says 80-20: 80% of something is controlled by 20% of something else. Well, in this case, it’s 80-5: 80% of the members were brought in by 5% of the community.
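Figures like "50% by 1.5%" or "80-5" come from sorting people by how many members they recruited and counting how many of the top recruiters are needed to cover a given fraction of all memberships. A small sketch of that calculation, with a hypothetical function name and made-up counts rather than the actual data set:

```python
def recruiters_needed(counts, fraction):
    """Smallest number of top recruiters whose recruits add up to at least
    `fraction` (between 0 and 1) of all memberships.

    counts: one integer per person, the number of members credited to them.
    """
    target = fraction * sum(counts)
    running = 0
    for i, c in enumerate(sorted(counts, reverse=True), start=1):
        running += c
        if running >= target:
            return i
    return len(counts)


# Made-up group of ten people with ten memberships in total.
toy_counts = [6, 2, 1, 1, 0, 0, 0, 0, 0, 0]
half = recruiters_needed(toy_counts, 0.5)
three_quarters = recruiters_needed(toy_counts, 0.75)
```

Dividing the returned number by the total number of people gives the percentage of the community responsible for that fraction of the memberships.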
Of course, just because there are large inequalities doesn’t mean that they can’t be explained or expected. Could we really expect a high-equality scenario, where everyone invites the same number of people? In the highest-equality case, the group is formed by an initial member, and everybody else joins on their own. The “recruitment number” of everyone, in this case, is 1, and the tree is completely flat:
A similar recruitment number is obtained when the creator of the group invites just one other person, who in turn invites one more, and so on, until the last member invites no one. In this case the creator has a “recruitment number” of 2 (himself + member1), all but the last have 1, and the last has 0. There is a slight inequality compared to the above case, although topologically, instead of a completely flat tree, we get a long vine:
However, these are very unrealistic scenarios. In most cases that I know of, a group creator opens the group for a specific purpose, and proceeds to add all the relevant potential members. This gives him a large advantage on “who invites whom” – you cannot invite the same person twice. This is especially true when many members are mutual friends, so only a few people are needed in order to invite the entire potential group.
In any case, I would think that the above reasoning holds true only for relatively small groups (indeed, like the ones I have tested). Larger ones, which require many more people in order to create a fully connected graph, may behave differently. For example, I am a member of my high school’s Facebook group, which boasts more than 1200 members spanning over twenty years of classes. I could not invite most existing members to the group even if I wanted to. I wonder if Facebook will ever release data like this…